Pipeline Layers
FeaturePipelineBuilder supports multiple logical layers. Each layer contributes steps to the final assembled feature pipeline.
ValidationPipeline
Methods:
- validate_schema(required_cols, expected_dtypes=None, strict=True)
Purpose:
- Fail fast on missing columns or unexpected dtypes before any feature transforms run.
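A minimal sketch of the fail-fast check. It uses plain dict-of-lists tables in place of whatever frame type the library uses. `check_schema` is hypothetical. The reading of `strict=True` as "raise with every problem found" and `strict=False` as "return the problems" is an assumption, not documented behavior.

```python
from typing import Dict, Iterable, List, Optional

Table = Dict[str, List[object]]  # column name -> column values (stand-in for a DataFrame)


def check_schema(
    table: Table,
    required_cols: Iterable[str],
    expected_dtypes: Optional[Dict[str, type]] = None,
    strict: bool = True,
) -> List[str]:
    """Collect schema problems; raise ValueError if strict and any are found."""
    problems: List[str] = []
    for col in required_cols:
        if col not in table:
            problems.append(f"missing column: {col}")
    for col, dtype in (expected_dtypes or {}).items():
        if col not in table:
            continue  # absence is only an error if the column is in required_cols
        # None is treated as a missing value, not a dtype violation
        bad = [v for v in table[col] if v is not None and not isinstance(v, dtype)]
        if bad:
            problems.append(
                f"column {col!r}: expected {dtype.__name__}, got {type(bad[0]).__name__}"
            )
    if strict and problems:
        raise ValueError("; ".join(problems))
    return problems
```

Because the check runs before any transform, a bad input fails with one readable message instead of an error deep inside a later step.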
FeatureEngineeringPipeline
Methods (core):
- drop_columns, select_columns
- fill_na, mean_imputation_na_list
- one_hot_encode, woe_categorical_imputer, target_encode, group_rare_categories
- encode_date_col, add_prophet_features, encode_cyclical_time
- add_interaction_features, vectorize_text
- clip_feature_outliers, winsorize_features, transform_skewed_features
- scaler, reduce_mem_load
- drop_correlated_features, decision_tree_feat_select, random_forest_feat_select
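encode_cyclical_time is listed without a signature. The usual technique maps a periodic value onto the unit circle so that, for example, hour 23 and hour 0 end up close together. The sketch below shows that technique only. `cyclical_encode`, its `period` argument, and the `<col>_sin`/`<col>_cos` output names are hypothetical.

```python
import math
from typing import Dict, List

Table = Dict[str, List[float]]


def cyclical_encode(table: Table, col: str, period: float, drop_original: bool = True) -> Table:
    """Map a periodic column onto the unit circle as <col>_sin / <col>_cos."""
    out = dict(table)  # shallow copy so the caller's table is not mutated
    angles = [2 * math.pi * v / period for v in table[col]]
    out[f"{col}_sin"] = [math.sin(a) for a in angles]
    out[f"{col}_cos"] = [math.cos(a) for a in angles]
    if drop_original:
        del out[col]
    return out
```

With `period=24`, hours 23 and 0 land about 0.26 apart on the circle, although the raw values differ by 23.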
FeatureSelectionPipeline
Methods:
- select_by_correlation(drop_thresh)
- select_by_decision_tree(importance_thresh)
- select_by_random_forest(importance_thresh)
Purpose:
- Isolate model-oriented selection logic from general engineering steps.
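One common reading of select_by_correlation(drop_thresh) works like this: walk the columns in order and drop any column whose absolute Pearson correlation with an already-kept column exceeds drop_thresh. The greedy order and the strict `>` comparison are assumptions. `drop_correlated` is a hypothetical illustration, not the library's implementation.

```python
import math
from typing import Dict, List, Tuple

Table = Dict[str, List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def drop_correlated(table: Table, drop_thresh: float = 0.9) -> Tuple[Table, List[str]]:
    """Keep columns in order; drop any whose |r| with a kept column exceeds drop_thresh."""
    kept: List[str] = []
    dropped: List[str] = []
    for name in table:
        if any(abs(pearson(table[name], table[k])) > drop_thresh for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return {k: table[k] for k in kept}, dropped
```

The first column of a correlated pair wins, so column order acts as a priority list.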
PostProcessingPipeline
Methods:
- clip_outliers(...)
- winsorize(...)
- transform_distribution(...)
Purpose:
- Late-stage feature shaping and stability adjustments.
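The arguments of clip_outliers(...) and winsorize(...) are not spelled out here. The following is a minimal winsorizing sketch that assumes quantile-based bounds. The bounds are learned on training data and reused on scoring data, so both sides see the same clipping. `Winsorizer`, `lower_q`, and `upper_q` are hypothetical names.

```python
import math
from typing import List


def linear_percentile(sorted_vals: List[float], q: float) -> float:
    """Percentile by linear interpolation between closest ranks (q in [0, 1])."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


class Winsorizer:
    """Learn clipping bounds from training data, then clip any later data to them."""

    def __init__(self, lower_q: float = 0.05, upper_q: float = 0.95):
        self.lower_q, self.upper_q = lower_q, upper_q

    def fit(self, values: List[float]) -> "Winsorizer":
        s = sorted(values)
        self.low_ = linear_percentile(s, self.lower_q)
        self.high_ = linear_percentile(s, self.upper_q)
        return self

    def transform(self, values: List[float]) -> List[float]:
        return [min(max(v, self.low_), self.high_) for v in values]
```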
MonitoringPipeline
Methods:
- monitor_drift(numeric_cols=None)
Purpose:
- Compute lightweight drift stats against the fitted baseline while passing data through unchanged.
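The following is a minimal sketch of a pass-through drift step. It assumes "lightweight drift stats" means a standardized mean shift per numeric column, with the baseline mean and population standard deviation captured on fit. The statistic the library actually computes is not documented here. `DriftMonitor` and `drift_report_` are hypothetical.

```python
import statistics
from numbers import Number
from typing import Dict, List, Optional, Sequence

Table = Dict[str, List[object]]


class DriftMonitor:
    """Pass-through step: records mean drift vs. a training baseline, returns data unchanged."""

    def __init__(self, numeric_cols: Optional[Sequence[str]] = None):
        self.numeric_cols = numeric_cols
        self.baseline_: Dict[str, tuple] = {}
        self.drift_report_: Dict[str, float] = {}

    def _numeric(self, table: Table) -> List[str]:
        if self.numeric_cols is not None:
            return list(self.numeric_cols)
        # auto-detect: every value in the column is a number (bools excluded)
        return [
            c for c, vals in table.items()
            if all(isinstance(v, Number) and not isinstance(v, bool) for v in vals)
        ]

    def fit(self, table: Table, y=None) -> "DriftMonitor":
        for c in self._numeric(table):
            self.baseline_[c] = (statistics.mean(table[c]), statistics.pstdev(table[c]))
        return self

    def transform(self, table: Table) -> Table:
        for c, (mu, sd) in self.baseline_.items():
            shift = abs(statistics.mean(table[c]) - mu)
            # standardized mean shift; a constant baseline column reports only 0 or inf
            if sd > 0:
                self.drift_report_[c] = shift / sd
            else:
                self.drift_report_[c] = 0.0 if shift == 0 else float("inf")
        return table  # pass-through: the same object goes to the next step
```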
ExplainabilityPipeline
Methods:
- explain_with_permutation_importance(model, scoring=None, n_repeats=5)
Purpose:
- Attach permutation-importance reporting as a pipeline-compatible pass-through step.
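The parameter names (scoring, n_repeats=5) match scikit-learn's `sklearn.inspection.permutation_importance`, which the method may wrap; that is not confirmed here. The sketch below shows the technique itself with a plain predict callable. It shuffles one column at a time and records the mean drop in score. `permutation_importance_sketch`, `ExplainStep`, the accuracy default for `scoring`, and the `seed` argument are assumptions.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence

Table = Dict[str, List[float]]


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_sketch(
    predict: Callable[[Table], List],
    X: Table,
    y: Sequence,
    scoring: Callable[[Sequence, Sequence], float] = accuracy,
    n_repeats: int = 5,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean drop in score when one column is shuffled, holding the others fixed."""
    rng = random.Random(seed)
    base = scoring(y, predict(X))
    importances: Dict[str, float] = {}
    for col in X:
        drops = []
        for _ in range(n_repeats):
            shuffled = list(X[col])
            rng.shuffle(shuffled)
            drops.append(base - scoring(y, predict({**X, col: shuffled})))
        importances[col] = mean(drops)
    return importances


class ExplainStep:
    """Pass-through wrapper: computes the report on fit, never alters the data."""

    def __init__(self, predict, **kwargs):
        self.predict, self.kwargs = predict, kwargs

    def fit(self, X: Table, y: Sequence) -> "ExplainStep":
        self.report_ = permutation_importance_sketch(self.predict, X, y, **self.kwargs)
        return self

    def transform(self, X: Table) -> Table:
        return X
```

A column the model ignores scores exactly 0, because shuffling it cannot change any prediction.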
ResamplingPipeline
Methods:
- Oversampling: random_oversampling, smote_oversampling, adasyn_oversampling, borderline_smote_oversampling
- Undersampling: random_undersampling, cluster_centroids_undersampling, tomek_links_undersampling, enn_undersampling, near_miss_undersampling
Purpose:
- Address target imbalance after feature transformations.
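The sampler names mirror those in the imbalanced-learn package (e.g. RandomOverSampler, SMOTE, ADASYN, TomekLinks, NearMiss), which the builder may wrap; that is not confirmed here. The following is a minimal sketch of the simplest one, random oversampling. It assumes row-list features and duplicates minority-class rows at random until each class matches the majority count. `random_oversample` is hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(X: Sequence[list], y: Sequence, seed: int = 0) -> Tuple[List[list], list]:
    """Duplicate randomly chosen minority-class rows until every class matches the majority."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)  # sample with replacement from this class
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out
```

Resampling is normally applied only to training data. Scoring data keeps its natural class balance.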
Build + Execute
- build_pipeline() assembles all configured layers.
- fit_transform_features(X, y) applies the feature pipeline.
- fit_resample_features(X, y) applies the feature pipeline, then the sampling pipeline.
- fit_features(X, y) / transform_features(X) support train/score separation: fit on training data, then transform scoring data with the fitted steps.
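The following is a minimal sketch of how the build and execute calls could fit together. It assumes each step exposes fit(X, y)/transform(X) and each sampler exposes fit_resample(X, y), the conventions scikit-learn and imbalanced-learn use. `PipelineBuilderSketch`, its `add`/`add_sampler` methods, the fixed layer order, and `MeanImputer` are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

Table = Dict[str, List[Optional[float]]]


class MeanImputer:
    """Hypothetical step: learns column means on fit, fills None on transform."""

    def fit(self, X: Table, y=None) -> "MeanImputer":
        self.means_ = {c: mean(v for v in vals if v is not None) for c, vals in X.items()}
        return self

    def transform(self, X: Table) -> Table:
        return {c: [self.means_[c] if v is None else v for v in vals] for c, vals in X.items()}


class PipelineBuilderSketch:
    # Fixed layer order is an assumption, not the builder's documented behavior.
    LAYERS = ("validation", "engineering", "selection",
              "postprocessing", "monitoring", "explainability")

    def __init__(self):
        self._layers: Dict[str, list] = {name: [] for name in self.LAYERS}
        self._samplers: list = []
        self.steps_: list = []

    def add(self, layer: str, step) -> "PipelineBuilderSketch":
        self._layers[layer].append(step)
        return self  # allow chaining

    def add_sampler(self, sampler) -> "PipelineBuilderSketch":
        self._samplers.append(sampler)
        return self

    def build_pipeline(self) -> list:
        # Flatten the configured layers, in layer order, into one step list.
        self.steps_ = [s for name in self.LAYERS for s in self._layers[name]]
        return self.steps_

    def fit_transform_features(self, X: Table, y=None) -> Table:
        self.build_pipeline()
        for step in self.steps_:
            X = step.fit(X, y).transform(X)  # each step is fitted on the previous step's output
        return X

    def fit_features(self, X: Table, y=None) -> "PipelineBuilderSketch":
        self.fit_transform_features(X, y)
        return self

    def transform_features(self, X: Table) -> Table:
        # Scoring path: reuse the fitted steps, never refit.
        for step in self.steps_:
            X = step.transform(X)
        return X

    def fit_resample_features(self, X: Table, y: list):
        X = self.fit_transform_features(X, y)
        for sampler in self._samplers:
            X, y = sampler.fit_resample(X, y)
        return X, y
```

Calling fit_features on training data and transform_features on scoring data reuses the fitted state (here, the training means). That reuse is the point of the train/score split.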