Pipeline Layers

FeaturePipelineBuilder supports multiple logical layers. Each layer contributes steps to the final assembled feature pipeline.

ValidationPipeline

Methods:
  • validate_schema(required_cols, expected_dtypes=None, strict=True)

Purpose: Fail fast on missing columns or unexpected dtypes before any feature transforms run.
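
A fail-fast schema check can be sketched without any library as below. The helper name check_schema, the column-to-dtype-name mapping it takes, and the reading of strict (raise when True, otherwise return the list of problems) are assumptions made for illustration; the actual behavior of validate_schema is not documented here.

```python
from typing import List, Mapping, Optional, Sequence


def check_schema(
    actual_dtypes: Mapping[str, str],
    required_cols: Sequence[str],
    expected_dtypes: Optional[Mapping[str, str]] = None,
    strict: bool = True,
) -> List[str]:
    """Collect schema problems; raise ValueError if strict and any were found.

    Hypothetical helper: `actual_dtypes` maps column name -> dtype name.
    """
    problems = [f"missing column: {c}" for c in required_cols if c not in actual_dtypes]
    for col, want in (expected_dtypes or {}).items():
        got = actual_dtypes.get(col)
        # Absent columns are reported via required_cols, not as dtype mismatches.
        if got is not None and got != want:
            problems.append(f"dtype mismatch for {col}: expected {want}, got {got}")
    if strict and problems:
        raise ValueError("; ".join(problems))
    return problems


schema = {"age": "int64", "income": "float64", "city": "object"}
print(check_schema(schema, ["age", "income"], {"age": "int64"}))  # []
print(check_schema(schema, ["age", "zip"], {"income": "int64"}, strict=False))
```

For a pandas DataFrame df, a mapping of this shape can be obtained with df.dtypes.astype(str).to_dict().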

FeatureEngineeringPipeline

Methods (core):
  • drop_columns, select_columns
  • fill_na, mean_imputation_na_list
  • one_hot_encode, woe_categorical_imputer, target_encode, group_rare_categories
  • encode_date_col, add_prophet_features, encode_cyclical_time
  • add_interaction_features, vectorize_text
  • clip_feature_outliers, winsorize_features, transform_skewed_features
  • scaler, reduce_mem_load
  • drop_correlated_features, decision_tree_feat_select, random_forest_feat_select
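
As one illustration of what a step in this layer might do, the name encode_cyclical_time suggests the common sin/cos encoding of periodic values. The sketch below shows that general technique only; the function names are hypothetical and it is not taken from the library's implementation.

```python
import math
from typing import Tuple


def encode_cyclical(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value (e.g. hour 0-23 with period 24) to (sin, cos)."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def circle_distance(a: float, b: float, period: float) -> float:
    """Euclidean distance between the encodings of two periodic values."""
    (s1, c1), (s2, c2) = encode_cyclical(a, period), encode_cyclical(b, period)
    return math.hypot(s1 - s2, c1 - c2)


# Hour 23 and hour 0 are adjacent on the clock, and the encoding reflects that,
# whereas the raw integers 23 and 0 look maximally far apart.
print(circle_distance(23, 0, 24) < circle_distance(12, 0, 24))  # True
```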

FeatureSelectionPipeline

Methods:
  • select_by_correlation(drop_thresh)
  • select_by_decision_tree(importance_thresh)
  • select_by_random_forest(importance_thresh)

Purpose: Isolate model-oriented selection logic from general engineering steps.
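
Threshold-based correlation selection can be sketched as follows. The greedy, order-dependent rule (keep a column only if its absolute Pearson correlation with every already-kept column is at most drop_thresh) is an assumption for illustration; how select_by_correlation chooses which column of a correlated pair to drop is not stated here.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def select_uncorrelated(columns: Dict[str, Sequence[float]], drop_thresh: float) -> List[str]:
    """Keep a column only if |r| with every already-kept column is <= drop_thresh."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= drop_thresh for k in kept):
            kept.append(name)
    return kept


data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # near-duplicate of height_cm
    "shoe_size": [40.0, 38.0, 44.0, 39.0, 43.0],
}
print(select_uncorrelated(data, drop_thresh=0.9))  # ['height_cm', 'shoe_size']
```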

PostProcessingPipeline

Methods:
  • clip_outliers(...)
  • winsorize(...)
  • transform_distribution(...)

Purpose: Late-stage feature shaping and stability adjustments.
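
Winsorizing, one form of late-stage shaping, clamps extreme values to quantile bounds instead of dropping rows. The sketch below is a generic illustration: the function names, the quantile defaults, and the interpolation rule are assumptions, since the arguments of winsorize(...) are elided above.

```python
from typing import List, Sequence, Tuple


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of an already-sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def fit_bounds(
    train: Sequence[float], lower_q: float = 0.05, upper_q: float = 0.95
) -> Tuple[float, float]:
    """Learn the clipping bounds from training data only."""
    ordered = sorted(train)
    return quantile(ordered, lower_q), quantile(ordered, upper_q)


def winsorize(values: Sequence[float], bounds: Tuple[float, float]) -> List[float]:
    """Clamp each value into [low, high]; the number of rows is unchanged."""
    low, high = bounds
    return [min(max(v, low), high) for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]
bounds = fit_bounds(train, lower_q=0.1, upper_q=0.9)
print(bounds)
print(winsorize([0.0, 5.0, 500.0], bounds))
```

Learning the bounds once and reusing them keeps scoring-time values on the same scale the model saw during training.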

MonitoringPipeline

Methods:
  • monitor_drift(numeric_cols=None)

Purpose: Compute lightweight drift statistics against the fitted baseline while passing data through unchanged.
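
A pass-through drift check can be sketched as a step that stores baseline statistics at fit time and compares new data against them at transform time. The class below is hypothetical: the choice of statistic (shift of the mean in baseline standard deviations) and the reading of numeric_cols=None as "all columns" are assumptions, not the behavior of monitor_drift.

```python
import statistics
from typing import Dict, Optional, Sequence


class DriftMonitor:
    """Pass-through step that records a standardized mean shift per column."""

    def __init__(self, numeric_cols: Optional[Sequence[str]] = None):
        self.numeric_cols = numeric_cols  # None is assumed to mean "all columns"

    def fit(self, columns: Dict[str, Sequence[float]]) -> "DriftMonitor":
        names = self.numeric_cols or list(columns)
        # Baseline: mean and population standard deviation at fit time.
        self.baseline_ = {
            n: (statistics.mean(columns[n]), statistics.pstdev(columns[n])) for n in names
        }
        return self

    def transform(self, columns: Dict[str, Sequence[float]]) -> Dict[str, Sequence[float]]:
        # Shift of the new mean, measured in baseline standard deviations.
        self.drift_ = {}
        for name, (mean, std) in self.baseline_.items():
            new_mean = statistics.mean(columns[name])
            self.drift_[name] = (new_mean - mean) / std if std > 0 else 0.0
        return columns  # the data itself is returned untouched


monitor = DriftMonitor().fit({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
new_data = {"x": [3.0, 4.0, 5.0, 6.0, 7.0]}
scored = monitor.transform(new_data)
print(monitor.drift_)
```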

ExplainabilityPipeline

Methods:
  • explain_with_permutation_importance(model, scoring=None, n_repeats=5)

Purpose: Attach permutation-importance reporting as a pipeline-compatible pass-through step.
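
Permutation importance measures how much a model's score drops when one feature column is shuffled. The sketch below implements that idea in plain Python under stated assumptions: the model is any callable that predicts from one row, and the score function is "higher is better". The scoring and n_repeats parameter names match those of scikit-learn's sklearn.inspection.permutation_importance, but whether the step wraps that function is not stated here.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def permutation_importance(
    predict: Callable[[Row], float],
    X: Sequence[Row],
    y: Sequence[float],
    score: Callable[[Sequence[float], Sequence[float]], float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in score when each column is shuffled, one column at a time."""
    rng = random.Random(seed)
    base = score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[j] for row in X]
            rng.shuffle(shuffled)
            # Rebuild each row with only column j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shuffled)]
            drops.append(base - score(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def neg_mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Higher is better, so a drop after shuffling means the feature mattered."""
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy "model" that uses only the first feature and ignores the second.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(lambda row: 2.0 * row[0], X, y, neg_mse)
print(importances)
```

Shuffling the ignored second feature leaves every prediction unchanged, so its importance is zero, while shuffling the first feature degrades the score.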

ResamplingPipeline

Methods:
  • Oversampling: random_oversampling, smote_oversampling, adasyn_oversampling, borderline_smote_oversampling
  • Undersampling: random_undersampling, cluster_centroids_undersampling, tomek_links_undersampling, enn_undersampling, near_miss_undersampling

Purpose: Address target imbalance after feature transformations.
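
The simplest of these strategies, random oversampling, duplicates randomly chosen minority-class rows until every class is as large as the biggest one. The function below is a library-independent sketch of that technique; its name and signature are hypothetical and do not describe the random_oversampling method itself.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(X: Sequence, y: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Duplicate randomly chosen minority rows until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))  # sampling with replacement
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 4 rows
```

Because resampling changes the number of rows, it is normally applied to training data only, never to data being scored.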

Build + Execute

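  • build_pipeline() assembles all configured layers into the final pipeline.
  • fit_transform_features(X, y) fits the feature pipeline and returns the transformed features.
  • fit_resample_features(X, y) applies the feature pipeline, then the sampling pipeline.
  • fit_features(X, y) and transform_features(X) split fitting from transforming, so a pipeline fitted on training data can be reused at scoring time.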
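
The reason for separating fitting from transforming can be shown with a toy step. The MeanImputer class below is hypothetical and not part of FeaturePipelineBuilder; it only illustrates that statistics are learned from training data and then reused unchanged at scoring time.

```python
from typing import List, Optional, Sequence


class MeanImputer:
    """Toy step: learns the training mean, then fills missing values with it."""

    def fit(self, values: Sequence[Optional[float]]) -> "MeanImputer":
        observed = [v for v in values if v is not None]
        self.mean_ = sum(observed) / len(observed)
        return self

    def transform(self, values: Sequence[Optional[float]]) -> List[float]:
        return [self.mean_ if v is None else v for v in values]


train = [10.0, None, 30.0]
score = [None, 100.0]

imputer = MeanImputer().fit(train)   # statistics come from training data only
print(imputer.transform(train))      # [10.0, 20.0, 30.0]
print(imputer.transform(score))      # [20.0, 100.0]
```

Refitting on the scoring data would instead fill the gap with 100.0, leaking information from the data being scored into the transform.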