Week 6

Partitioning Data

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=f)  # f = fraction of data held out for testing

import sklearn.model_selection as skms
x_train, x_test, y_train, y_test = skms.train_test_split(x, y, test_size=f)

OLS (ordinary least squares)

from sklearn import linear_model
reg=linear_model.LinearRegression()
reg.fit(x,y)
reg.predict(x_dash)  # make predictions on new data x_dash

Ridge Regression

Use ridge regression when predictors are highly correlated: it imposes a small (L2) penalty that shrinks the coefficients.

alpha controls the amount of shrinkage (larger alpha = stronger shrinkage)

from sklearn import linear_model
reg = linear_model.Ridge(alpha=n)  # n = penalty strength
reg.fit(x,y)
reg.predict(x_dash)

LASSO Regression (Least Absolute Shrinkage and Selection Operator)

Like ridge, but with an L1 penalty, which can shrink some coefficients to exactly zero — so it also performs feature selection.
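A minimal sketch of fitting a LASSO model, mirroring the Ridge snippet above; the toy data and the alpha value 0.1 are placeholder assumptions:

```python
import numpy as np
from sklearn import linear_model

# Toy data: 100 samples, 5 features; only the first two actually matter
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))
y = 3 * x[:, 0] - 2 * x[:, 1] + rng.normal(scale=0.1, size=100)

reg = linear_model.Lasso(alpha=0.1)  # alpha = strength of the L1 penalty
reg.fit(x, y)

# The L1 penalty drives the coefficients of irrelevant features to exactly zero
print(reg.coef_)
```

Inspecting `reg.coef_` shows which features LASSO kept, which is exactly the feature-selection behaviour described above.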

Regression metrics (The performance of the model)

  • MAE (mean absolute error) -- average magnitude of the errors, ignoring direction

  • RMSE (root mean squared error) -- penalizes large errors more heavily

  • R2 score (coefficient of determination) -- fraction of the variance in y_true explained by y_pred

Exercise

1 - Import the "boston" dataset from the sklearn package. Fit a linear regression by OLS, Ridge and Lasso. For each method, calculate the RMSE under 10-fold cross-validation.

The advantage of k-fold CV is that all observations are used for both training and validation, and each observation is used for validation exactly once. Hint: use the KFold class from sklearn.model_selection.
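A sketch of the cross-validation loop the exercise asks for. Synthetic data stands in for boston here (load_boston has been removed from recent scikit-learn releases), and the alpha values are placeholder assumptions; the KFold / per-fold RMSE logic is the point:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic stand-in for the boston data: 200 samples, 13 features
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 13))
y = x @ rng.normal(size=13) + rng.normal(scale=0.5, size=200)

kf = KFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=0.1))]:
    fold_rmses = []
    for train_idx, test_idx in kf.split(x):
        # Each observation lands in the test fold exactly once
        model.fit(x[train_idx], y[train_idx])
        pred = model.predict(x[test_idx])
        fold_rmses.append(np.sqrt(np.mean((y[test_idx] - pred) ** 2)))
    results[name] = np.mean(fold_rmses)  # average RMSE over the 10 folds

print(results)
```

Swapping the synthetic arrays for the real dataset's features and target completes the exercise.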
