Transformation Catalog
This is the complete list of transformations currently exposed through the builder API.
Validation
validate_schema(required_cols, expected_dtypes=None, strict=True)- Enforces required column presence and optional dtype contracts.
Categorical + Encoding
group_rare_categories(categorical_cols, min_freq=0.01, other_label='__OTHER__')- Replaces infrequent labels with a shared bucket.
target_encode(categorical_cols, smoothing=10.0, suffix='_TE', drop_original=False)- Smoothed target encoding for categorical features.
one_hot_encode(non_ordinal_categorical_cols)- One-hot encoding with train/test column alignment.
woe_categorical_imputer(categorical_cols)- Weight-of-Evidence encoding based on target distribution.
cat_norm_cat_features(categorical_variables, numerical_variables, categorical_alias)- Category-relative normalization for numeric variables.
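To illustrate what the smoothing parameter of target_encode controls, here is a minimal pandas sketch. It is not the builder's implementation: it assumes the common additive-smoothing formula (n * category_mean + m * global_mean) / (n + m), and the helper name smoothed_target_encode and the column names are hypothetical.

```python
import pandas as pd


def smoothed_target_encode(df, col, target, smoothing=10.0):
    """Return smoothed target means for `col` (hypothetical helper)."""
    global_mean = df[target].mean()
    stats = df.groupby(col)[target].agg(["mean", "count"])
    # Blend each category mean with the global mean. Categories with few
    # rows are pulled more strongly toward the global mean.
    encoded = (stats["count"] * stats["mean"] + smoothing * global_mean) / (
        stats["count"] + smoothing
    )
    return df[col].map(encoded)


df = pd.DataFrame({"city": ["a", "a", "a", "b"], "y": [1, 1, 0, 0]})
df["city_TE"] = smoothed_target_encode(df, "city", "y", smoothing=2.0)
```

The sketch fits and applies the encoding on the same rows for brevity. In practice the category statistics should come from training data only, otherwise the target leaks into the feature.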
Missing Values
fill_na(fill_na_dict)- Dictionary-based fill for missing values.
mean_imputation_na_list(fill_na_mean, strategy)- sklearn SimpleImputer wrapper for selected columns.
Time + Date
encode_date_col(date_col, time_col=None)- Adds calendar/time components.
encode_cyclical_time(period_map, drop_original=False)- Adds sin/cos features for periodic numeric fields.
add_prophet_features(date_col, time_col=None, y_col='close')- Adds Prophet-derived time features.
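The sin/cos idea behind encode_cyclical_time can be sketched with NumPy alone. This is an illustration under assumptions, not the library's code: the structure of period_map is not shown in this catalog, so the hypothetical helper cyclical_encode takes a single period directly.

```python
import numpy as np


def cyclical_encode(values, period):
    """Map periodic values onto the unit circle (hypothetical helper)."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)


hours = [0, 6, 12, 23]
hour_sin, hour_cos = cyclical_encode(hours, period=24)

# Distance between two encoded points, to show that hour 23 ends up
# close to hour 0 while hour 12 ends up far from it.
def encoded_distance(i, j):
    return float(np.hypot(hour_sin[i] - hour_sin[j], hour_cos[i] - hour_cos[j]))
```

Encoding each value as a pair keeps the wrap-around: a raw hour column treats 23 and 0 as far apart, whereas their (sin, cos) pairs are neighbours on the circle.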
Numeric Shaping
clip_feature_outliers(numeric_cols=None, lower_quantile=0.01, upper_quantile=0.99)- Quantile clipping.
winsorize_features(numeric_cols=None, lower_tail=0.01, upper_tail=0.99)- Winsorization by tail percentiles.
transform_skewed_features(numeric_cols=None, method='yeo-johnson', standardize=False)- Yeo-Johnson or log1p transform.
scaler(scaler_type)- Numeric scaling (standard, min_max, robust, max_absolute).
reduce_mem_load()- Numeric dtype downcasting helper.
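As a rough picture of quantile clipping, the sketch below caps values at the chosen lower and upper quantiles using NumPy. The helper clip_by_quantiles is hypothetical and only mirrors the parameter names of clip_feature_outliers. How the builder handles NaNs or per-column fitting is not stated here, so neither is modelled.

```python
import numpy as np


def clip_by_quantiles(values, lower_quantile=0.01, upper_quantile=0.99):
    """Cap values at the given quantiles (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Compute both bounds from the data, then clamp everything outside them.
    lo, hi = np.quantile(values, [lower_quantile, upper_quantile])
    return np.clip(values, lo, hi)


data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
# Wide quantiles chosen so the effect is visible on five points.
clipped = clip_by_quantiles(data, lower_quantile=0.0, upper_quantile=0.75)
```

Here the outlier 100.0 is pulled down to the 75th-percentile value while the remaining points are unchanged.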
Feature Construction
add_interaction_features(numeric_cols=None, include_self_interactions=False)- Pairwise interaction products.
vectorize_text(text_col, max_features=200, prefix='TFIDF', drop_original=True)- TF-IDF feature generation for one text column.
Feature Selection
drop_correlated_features(drop_thresh)- Correlation-based drop (engineering layer).
decision_tree_feat_select(importance_thresh)- Tree-importance threshold selector (engineering layer).
random_forest_feat_select(importance_thresh)- RF-importance threshold selector (engineering layer).
select_by_correlation(drop_thresh)- Dedicated feature-selection layer variant.
select_by_decision_tree(importance_thresh)- Dedicated feature-selection layer variant.
select_by_random_forest(importance_thresh)- Dedicated feature-selection layer variant.
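One common way to implement a correlation-based drop is to scan the upper triangle of the absolute correlation matrix and drop any column that correlates above the threshold with an earlier column. The sketch below assumes that approach and Pearson correlation. The catalog does not say which rule drop_correlated_features or select_by_correlation use, and the helper name correlated_columns_to_drop is hypothetical.

```python
import numpy as np
import pandas as pd


def correlated_columns_to_drop(df, drop_thresh=0.9):
    """List columns highly correlated with an earlier column (hypothetical helper)."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once
    # and a column is never compared with itself.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    return [c for c in upper.columns if (upper[c] > drop_thresh).any()]


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 0, 1, 0]})
to_drop = correlated_columns_to_drop(df, drop_thresh=0.95)
reduced = df.drop(columns=to_drop)
```

With this rule the first column of each correlated pair survives, so the result depends on column order.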
Monitoring + Explainability
monitor_drift(numeric_cols=None)- Stores basic baseline/current drift stats for numeric columns.
explain_with_permutation_importance(model, scoring=None, n_repeats=5)- Computes permutation importances and stores report.
Resampling
random_oversampling(sampling_strategy)
smote_oversampling(sampling_strategy, k_neighbors=5)
adasyn_oversampling(sampling_strategy, k_neighbors=5)
borderline_smote_oversampling(sampling_strategy, k_neighbors=5, kind='borderline-1')
random_undersampling(sampling_strategy)
cluster_centroids_undersampling(sampling_strategy)
tomek_links_undersampling()
enn_undersampling(sampling_strategy)
near_miss_undersampling(sampling_strategy, version=1)
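For orientation, random oversampling in its simplest form duplicates minority-class rows at random until every class matches the largest one. The NumPy sketch below shows that idea only. It does not reproduce the builder's random_oversampling, it ignores sampling_strategy (full balancing is an assumption), and the helper name random_oversample is hypothetical.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate minority rows at random until all classes match the largest."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    indices = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Keep every original row and sample, with replacement, only the
        # extra rows this class needs to reach the target count.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        indices.append(np.concatenate([cls_idx, extra]))
    indices = np.concatenate(indices)
    return X[indices], y[indices]


X = np.arange(10).reshape(5, 2)
y = np.array([0, 0, 0, 0, 1])
X_res, y_res = random_oversample(X, y)
```

Resampling of this kind is normally applied to the training split only, so that evaluation data keeps its original class balance.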