skpipeline-forge Docs

skpipeline-forge lets you define preprocessing systems as composable objects and build repeatable train/score pipelines.

Core idea

Use FeaturePipelineBuilder as a fluent builder:

from pipeline_forge import FeaturePipelineBuilder

pipe = FeaturePipelineBuilder(target_col="target", random_seed=42)

Then stack steps by pipeline layer:

  1. Validation (validate_schema)
  2. Feature engineering (encoders, interactions, text/vectorization, scaling)
  3. Feature selection (select_by_*)
  4. Postprocessing (clip_outliers, winsorize, transform_distribution)
  5. Monitoring (monitor_drift)
  6. Explainability (explain_with_permutation_importance)
  7. Resampling (random_oversampling, smote_oversampling, etc.)
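To make the layering idea concrete, here is a minimal sketch of how a fluent builder could record steps and group them by layer. It is not skpipeline-forge's implementation: `MiniBuilder`, `LAYER_ORDER`, and `ordered_steps` are hypothetical names, and the sketch assumes that steps run in the layer order listed above regardless of the order in which they were chained. Only the step method names are borrowed from this page.

```python
# Hypothetical sketch: NOT skpipeline-forge's implementation.
# Layer names follow the numbered list above.
LAYER_ORDER = [
    "validation",
    "feature_engineering",
    "feature_selection",
    "postprocessing",
    "monitoring",
    "explainability",
    "resampling",
]


class MiniBuilder:
    """Toy fluent builder that records steps and groups them by layer."""

    def __init__(self):
        self._steps = []  # list of (layer, step_name, params)

    def _add(self, layer, name, **params):
        self._steps.append((layer, name, params))
        return self  # returning self is what makes chaining possible

    # Method names mirror the docs; the bodies are illustrative only.
    def validate_schema(self, **params):
        return self._add("validation", "validate_schema", **params)

    def scaler(self, kind):
        return self._add("feature_engineering", "scaler", kind=kind)

    def random_oversampling(self, **params):
        return self._add("resampling", "random_oversampling", **params)

    def ordered_steps(self):
        # sorted() is stable, so steps within one layer keep their call order.
        return sorted(self._steps, key=lambda s: LAYER_ORDER.index(s[0]))


# Steps are chained out of layer order on purpose.
builder = (
    MiniBuilder()
    .random_oversampling(sampling_strategy=1.0)
    .scaler("robust")
    .validate_schema(required_cols=["city"], strict=False)
)
step_names = [name for _, name, _ in builder.ordered_steps()]
print(step_names)  # ['validate_schema', 'scaler', 'random_oversampling']
```

Each step method returns the builder itself, which is what allows calls to be chained. Sorting by layer at build time is one possible design; a builder could equally preserve call order and leave the ordering to the caller.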

Typical workflow

from pipeline_forge import FeaturePipelineBuilder

# X_train and y_train are assumed to be your existing training features and labels.
pipe = FeaturePipelineBuilder(target_col="target", random_seed=42)

# Chain the steps; each call returns the builder.
(
    pipe
    .validate_schema(required_cols=["city", "hour", "age"], strict=False)
    .group_rare_categories(categorical_cols=["city"], min_freq=0.02)
    .target_encode(categorical_cols=["city"], drop_original=True)
    .encode_cyclical_time(period_map={"hour": 24})
    .scaler("robust")
    .random_oversampling(sampling_strategy=1.0)
)

# Build the pipeline from the configured steps, then apply it to the training data.
pipe.build_pipeline()
X_train_trans = pipe.fit_transform_features(X_train, y_train)
X_train_bal, y_train_bal = pipe.fit_resample_features(X_train, y_train)
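
The workflow above passes `period_map={"hour": 24}` to `encode_cyclical_time`. This page does not show what that step outputs, so the sketch below only illustrates the common sine/cosine technique for cyclical features, which the step is assumed to resemble. The helpers `encode_cyclical` and `distance` are hypothetical and not part of the library.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value to (sin, cos) coordinates on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# With period 24, hour 23 and hour 0 are one hour apart on the clock.
h0 = encode_cyclical(0, 24)
h12 = encode_cyclical(12, 24)
h23 = encode_cyclical(23, 24)

# The raw values 23 and 0 look far apart, but the encoded points are close.
print(distance(h23, h0) < distance(h12, h0))  # True
```

The point of the encoding is that a model sees 23:00 and 00:00 as neighbours, which the raw integer column would not convey.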
