How PowerTransformer Enhances Your Data for Better Machine Learning

Well, you might have heard of power transformations like Box-Cox and Yeo-Johnson and wondered what they actually are?

It's okay, I've got you 🙌. Here I will be sharing everything you need to know about these transformers in machine learning.

Here's a data distribution before and after transformation 👇👇 (see the difference!)

In machine learning and data science, transformers are tools that help change your data to make it easier to work with. They adjust your data so it's in a better shape for algorithms to analyze or model. For example, if your data doesn’t follow a normal pattern, transformers can help make it look more normal.

There are three types of transformers in scikit-learn:

  1. Function Transformer

  2. Power Transformer

  3. Quantile Transformer

Let's learn about the Power Transformer, which is commonly used in ML.

Power Transformer

The Power Transformer is a scikit-learn transformer that applies a power transformation to the data. This can help stabilize variance and make the data more normally distributed.

  • Normalizes Data: Makes the data more Gaussian-like, which is beneficial for many machine learning algorithms.

  • Two Methods: It supports two methods—Box-Cox (only for positive data) and Yeo-Johnson (can handle both positive and negative data).

  • Handles Skewed Data: It’s particularly effective for reducing skewness in data, making it more symmetric.
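To see that skew reduction in action, here's a minimal sketch on synthetic log-normal data (the data and random seed are made up purely for illustration):

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Right-skewed sample data drawn from a log-normal distribution
rng = np.random.default_rng(42)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

pt = PowerTransformer()  # the default method is 'yeo-johnson'
X_t = pt.fit_transform(X)

# Skewness should drop from strongly positive to near zero
print("skew before:", skew(X.ravel()))
print("skew after: ", skew(X_t.ravel()))
```

After the transform, the skewness is much closer to zero, which is exactly the "more symmetric" shape the bullet above describes.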

Box-Cox Transformation:

  • The Box-Cox transformation is only applicable to strictly positive data. It transforms the data based on a power parameter λ (lambda), which is determined by maximum likelihood estimation.

  • Formula of Box-Cox:

    x(λ) = (x^λ − 1) / λ   if λ ≠ 0
    x(λ) = ln(x)           if λ = 0

  • Use the Box-Cox transformation when your data is strictly positive and needs to be transformed toward a normal distribution.

import numpy as np
from sklearn.preprocessing import PowerTransformer

# Strictly positive sample data (Box-Cox requires x > 0)
X = np.array([[1.0], [2.0], [5.0], [10.0], [50.0]])

# method='box-cox' must be set explicitly (the default is 'yeo-johnson')
pt = PowerTransformer(method='box-cox')
X_transformed = pt.fit_transform(X)

print(X_transformed)
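Two handy extras worth knowing: the fitted transformer exposes the λ chosen by maximum likelihood through its `lambdas_` attribute (one value per feature), and `inverse_transform` maps the data back to the original scale. A small self-contained sketch with made-up positive data:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Made-up strictly positive sample data
X = np.array([[1.0], [2.0], [5.0], [10.0], [50.0]])

pt = PowerTransformer(method='box-cox')
X_t = pt.fit_transform(X)

# The lambda estimated by maximum likelihood, one per feature
print("fitted lambda:", pt.lambdas_)

# inverse_transform recovers the original values
X_back = pt.inverse_transform(X_t)
print("round-trip ok:", np.allclose(X, X_back))
```

Inspecting `lambdas_` is a quick sanity check: a value near 0 means the fit is close to a log transform, while a value near 1 means the data barely needed transforming.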

Yeo-Johnson Transformation:

  • It is similar to Box-Cox, but it can handle both positive and negative values, making it more flexible.

  • Formula of Yeo-Johnson:

    ψ(x, λ) = ((x + 1)^λ − 1) / λ              if λ ≠ 0, x ≥ 0
    ψ(x, λ) = ln(x + 1)                        if λ = 0, x ≥ 0
    ψ(x, λ) = −((1 − x)^(2−λ) − 1) / (2 − λ)   if λ ≠ 2, x < 0
    ψ(x, λ) = −ln(1 − x)                       if λ = 2, x < 0

Use Yeo-Johnson when your data contains both positive and negative values, and you want to reduce skewness and normalize the distribution.

Guys, don't get scared of the formulas. Just remember that Box-Cox is used when your data is strictly positive, and Yeo-Johnson is used when your data contains both positive and negative values. (Easy, right?)

import numpy as np
from sklearn.preprocessing import PowerTransformer

# Sample data with both positive and negative values
X = np.array([[-3.0], [-1.0], [0.0], [2.0], [8.0]])

# Apply PowerTransformer (Yeo-Johnson, the default method)
pt = PowerTransformer(method='yeo-johnson')
X_transformed = pt.fit_transform(X)

print(X_transformed)
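To make the Box-Cox vs. Yeo-Johnson rule concrete, here's a quick sketch with made-up mixed-sign data: Yeo-Johnson handles it fine, while Box-Cox refuses and raises a ValueError:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Made-up data containing negative values and a zero
X = np.array([[-3.0], [-1.0], [0.0], [2.0], [8.0]])

# Yeo-Johnson handles negatives and zeros without complaint
X_t = PowerTransformer(method='yeo-johnson').fit_transform(X)
print(X_t.ravel())

# Box-Cox requires strictly positive input and raises ValueError otherwise
box_cox_failed = False
try:
    PowerTransformer(method='box-cox').fit_transform(X)
except ValueError:
    box_cox_failed = True
print("Box-Cox failed on non-positive data:", box_cox_failed)
```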

Why Should One Use Such Transformers?

  • Enhances Model Performance: Many machine learning algorithms assume that the data is normally distributed (e.g., linear regression, logistic regression, and support vector machines). By transforming the data to make it more Gaussian-like, these models can perform better, leading to more accurate predictions and better generalization.

  • Reduces the Influence of Outliers: In highly skewed data, outliers can disproportionately influence the model. Transformations can reduce the impact of outliers by compressing the range of the data, making the model more robust.

  • Stabilizes Variance: The Power Transformer stabilizes the variance across the dataset. This makes it easier for models to capture patterns and relationships in the data without being misled by extreme variations.
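In practice, the cleanest way to use these benefits is to put the PowerTransformer inside a scikit-learn Pipeline, so it is fit on the training data only and then applied consistently at prediction time. A minimal sketch on a synthetic classification dataset (the dataset and parameters here are made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic classification data, just for demonstration
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The transformer is fit on the training split only,
# then reused automatically when scoring the test split
model = make_pipeline(PowerTransformer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

Keeping the transform inside the pipeline also prevents data leakage: the test set never influences the fitted λ values.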

Check this blog out to go deeper (highly recommend) 👇

Thank you

Happy Learning 💐