skpipeline-forge Docs
skpipeline-forge lets you define preprocessing systems as composable objects and build repeatable train/score pipelines.
Core idea
Use FeaturePipelineBuilder as a fluent builder:
from pipeline_forge import FeaturePipelineBuilder
pipe = FeaturePipelineBuilder(target_col="target", random_seed=42)
Then stack steps by pipeline layer:
- Validation (validate_schema)
- Feature engineering (encoders, interactions, text/vectorization, scaling)
- Feature selection (select_by_*)
- Postprocessing (clip_outliers, winsorize, transform_distribution)
- Monitoring (monitor_drift)
- Explainability (explain_with_permutation_importance)
- Resampling (random_oversampling, smote_oversampling, etc.)
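To make the fluent-builder idea concrete, here is a minimal, self-contained sketch of the pattern. TinyPipelineBuilder and its internals are hypothetical and written only for illustration; they are not the skpipeline-forge implementation. Only the method names validate_schema, scaler, and random_oversampling are borrowed from the list above. The sketch shows the one mechanism chaining relies on: each step method records its configuration and returns the builder itself.

```python
class TinyPipelineBuilder:
    """Hypothetical stand-in used for illustration; not the skpipeline-forge API."""

    def __init__(self):
        # Recorded steps as (layer, step_name, params), in call order.
        self.steps = []

    def _add(self, layer, name, **params):
        self.steps.append((layer, name, params))
        return self  # returning self is what makes method chaining possible

    def validate_schema(self, required_cols):
        return self._add("validation", "validate_schema", required_cols=required_cols)

    def scaler(self, kind):
        return self._add("feature_engineering", "scaler", kind=kind)

    def random_oversampling(self, sampling_strategy):
        return self._add(
            "resampling", "random_oversampling", sampling_strategy=sampling_strategy
        )


builder = (
    TinyPipelineBuilder()
    .validate_schema(required_cols=["city", "hour"])
    .scaler("robust")
    .random_oversampling(sampling_strategy=1.0)
)

# The layer of each recorded step, in the order the steps were chained.
layers = [layer for layer, _, _ in builder.steps]
```

Because nothing is fitted while chaining, a builder like this can be configured once and reused for both training and scoring, which is presumably the motivation for the separate build step in the workflow.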
Typical workflow
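from pipeline_forge import FeaturePipelineBuilder

# X_train and y_train are assumed to be your training features and labels.
pipe = FeaturePipelineBuilder(target_col="target", random_seed=42)
(
    pipe
    .validate_schema(required_cols=["city", "hour", "age"], strict=False)
    .group_rare_categories(categorical_cols=["city"], min_freq=0.02)
    .target_encode(categorical_cols=["city"], drop_original=True)
    .encode_cyclical_time(period_map={"hour": 24})  # hour repeats every 24
    .scaler("robust")
    .random_oversampling(sampling_strategy=1.0)
)
# Assemble the configured steps before fitting.
pipe.build_pipeline()
# Fit on the training data and return the transformed features.
X_train_trans = pipe.fit_transform_features(X_train, y_train)
# Fit and apply the resampling step, returning balanced features and labels.
X_train_bal, y_train_bal = pipe.fit_resample_features(X_train, y_train)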
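The encode_cyclical_time step in the workflow takes a period per column (24 for hour). As a rough sketch of what cyclical encoding usually means, the example below maps a value onto the unit circle with sine and cosine. The helper names encode_cyclical and cyclical_distance are hypothetical, and the actual output columns and naming used by skpipeline-forge may differ.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value (e.g. hour of day) to a (sin, cos) point on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def cyclical_distance(a, b, period):
    """Euclidean distance between two values after cyclical encoding."""
    sin_a, cos_a = encode_cyclical(a, period)
    sin_b, cos_b = encode_cyclical(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# 23:00 and 00:00 are one hour apart; 12:00 and 00:00 are opposite on the clock.
near = cyclical_distance(23, 0, period=24)
far = cyclical_distance(12, 0, period=24)
```

With the raw hour column, 23 and 0 look 23 units apart. After encoding they sit next to each other on the circle, so distance-based and linear models see midnight as adjacent to 23:00.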