Week 5

Standardization of Data

standardized_x=(x-average)/std

import sklearn.preprocessing as skp
scaler=skp.StandardScaler().fit(Dataset)
standardized_Dataset=scaler.transform(Dataset)

#equivalent one-step function; axis=0 standardizes each column (feature)
standardized_Dataset=skp.scale(Dataset,axis=0)
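As a quick check of the formula, the sketch below standardizes a small made-up array (the values are purely illustrative) and compares the result with a by-hand computation. It assumes that "std" means the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
import numpy as np
import sklearn.preprocessing as skp

# Hypothetical toy data: 4 samples (rows), 2 features (columns)
Dataset = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0],
                    [4.0, 40.0]])

scaler = skp.StandardScaler().fit(Dataset)
standardized_Dataset = scaler.transform(Dataset)

# Same computation by hand: (x - average) / std, column by column.
# np.std defaults to the population std (ddof=0), matching StandardScaler.
manual = (Dataset - Dataset.mean(axis=0)) / Dataset.std(axis=0)

# After standardization every column has mean 0 and std 1
col_means = standardized_Dataset.mean(axis=0)
col_stds = standardized_Dataset.std(axis=0)
```

Both columns end up identical after standardization here, because the second column is just the first one multiplied by 10.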

Normalization of Data

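Normalized_x=(x-min)/(max-min)

Note: the formula above is min-max scaling, which scikit-learn provides as skp.MinMaxScaler. skp.Normalizer, used below, does something different: it rescales each row (sample) to unit norm.

Normalizer=skp.Normalizer().fit(Dataset)
normalized_Dataset =Normalizer.transform(Dataset)
normalized_Dataset=skp.normalize(Dataset, norm="l2")
#norm selects the norm each row is scaled to: "l1", "l2" or "max"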
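A minimal sketch contrasting the two operations that go by the name "normalization": min-max scaling, (x-min)/(max-min) per column, done with skp.MinMaxScaler, and skp.Normalizer, which divides each row by its own norm. The array is made up for illustration.

```python
import numpy as np
import sklearn.preprocessing as skp

# Hypothetical toy data: 3 samples (rows), 2 features (columns)
Dataset = np.array([[3.0, 4.0],
                    [6.0, 8.0],
                    [0.0, 5.0]])

# Min-max scaling: (x - min) / (max - min), computed per column
minmax_Dataset = skp.MinMaxScaler().fit_transform(Dataset)

# Normalizer: each row is divided by its own L2 norm
normalized_Dataset = skp.Normalizer(norm="l2").fit_transform(Dataset)

# Every row now has length 1 in the L2 sense
row_norms = np.linalg.norm(normalized_Dataset, axis=1)
```

The first two rows point in the same direction, so Normalizer maps both to [0.6, 0.8], while min-max scaling keeps them apart.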

Binarization of Data

binarizer=skp.Binarizer(threshold=0.1).fit(Dataset)
binarized_Dataset=binarizer.transform(Dataset)

binarized_Dataset=skp.binarize(Dataset,threshold=0.1)
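To illustrate what the threshold does, here is a small sketch on made-up values: entries strictly greater than the threshold become 1, everything else (including values equal to the threshold) becomes 0.

```python
import numpy as np
import sklearn.preprocessing as skp

# Hypothetical toy data
Dataset = np.array([[0.05, 0.1, 0.2],
                    [-1.0, 0.5, 0.0]])

# Values > 0.1 map to 1, values <= 0.1 map to 0
binarized_Dataset = skp.binarize(Dataset, threshold=0.1)

# The class-based form gives the same result
binarizer = skp.Binarizer(threshold=0.1).fit(Dataset)
binarized_again = binarizer.transform(Dataset)
```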

Missing Data Imputation

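Replace each missing value with the mean, median or most frequent value of its column (strategies: mean/median/most_frequent)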
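The notes do not name a class for this step. The sketch below assumes SimpleImputer from sklearn.impute, whose strategy parameter accepts exactly these three values, and applies each one to a made-up array with np.nan as the missing-value marker.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data with missing entries marked as np.nan
Dataset = np.array([[1.0, 2.0],
                    [1.0, np.nan],
                    [np.nan, 2.0],
                    [4.0, 3.0],
                    [6.0, 9.0]])

imputed = {}
for strategy in ("mean", "median", "most_frequent"):
    # The fill value is computed per column from the non-missing entries
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    imputed[strategy] = imputer.fit_transform(Dataset)

# Fill value used for the first column under each strategy (row 2 was missing)
first_column_fill = {name: data[2, 0] for name, data in imputed.items()}
```

The known values in the first column are 1, 1, 4 and 6, so the three strategies fill the gap with 3 (mean), 2.5 (median) and 1 (most frequent) respectively.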

PCA
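The notes give no code for this step. A minimal sketch, assuming PCA from sklearn.decomposition and random data standing in for a real dataset: n_components sets how many columns are kept, and explained_variance_ratio_ reports the share of variance each kept component carries.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples, 6 features, fixed seed for repeatability
rng = np.random.default_rng(0)
Dataset = rng.normal(size=(100, 6))

# Keep the 2 directions of largest variance
pca = PCA(n_components=2).fit(Dataset)
reduced_Dataset = pca.transform(Dataset)

# Fraction of the total variance captured by each kept component,
# sorted from largest to smallest
explained = pca.explained_variance_ratio_
```

PCA centers the data but does not rescale it, so standardizing the features first is a common choice when they are on different scales.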

Exercise

Load the ‘diabetes’ dataset from the sklearn datasets library, and do the following:

• Standardize the data
• Normalize the data
• Reduce the dimension of the data to 4 columns with PCA
• Cluster the input features into 4 clusters with the k-means clustering library of the scipy package
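One possible starting point for the exercise, offered as a sketch rather than a reference solution. It assumes that each step is applied independently, that PCA and clustering are run on the standardized features, and that "k-means from scipy" means kmeans and vq from scipy.cluster.vq; other readings of the exercise are equally valid.

```python
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
import sklearn.preprocessing as skp
from scipy.cluster.vq import kmeans, vq

# X holds the input features: 442 samples, 10 columns
X, y = load_diabetes(return_X_y=True)

# Standardize: zero mean and unit variance per column
standardized_X = skp.StandardScaler().fit_transform(X)

# Normalize: unit L2 norm per row
normalized_X = skp.normalize(X, norm="l2")

# Reduce to 4 columns with PCA (assumption: run on the standardized data)
reduced_X = PCA(n_components=4).fit_transform(standardized_X)

# kmeans returns the centroids (codebook) and the mean distortion.
# The seed argument needs a reasonably recent SciPy; drop it on older versions.
codebook, distortion = kmeans(standardized_X, 4, seed=0)

# vq assigns every sample to its nearest centroid
labels, _ = vq(standardized_X, codebook)
```

Note that scipy's kmeans can return fewer centroids than requested if a cluster ends up empty, so it is worth checking codebook.shape before using the labels.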
