Week 6
Partitioning Data
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=f)  # f = fraction of data held out for testing
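A runnable sketch of the split on synthetic data (the 80/20 split and random_state here are illustrative choices, not from the notes):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# illustrative synthetic data: 100 samples, 3 features
x = np.random.rand(100, 3)
y = np.random.rand(100)

# hold out 20% for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
print(x_train.shape, x_test.shape)  # (80, 3) (20, 3)
```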
import sklearn.model_selection as skms
x_train, x_test, y_train, y_test = skms.train_test_split(x, y, test_size=f)

OLS (ordinary least squares)
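OLS picks the coefficients that minimize the sum of squared residuals. A self-contained sketch (the data and a known linear relationship are made up for illustration); after fitting, coef_ and intercept_ hold the estimated parameters:

```python
import numpy as np
from sklearn import linear_model

# synthetic data with a known relationship: y = 2*x0 - 1*x1 + 3 (no noise)
rng = np.random.default_rng(0)
x = rng.random((200, 2))
y = 2 * x[:, 0] - 1 * x[:, 1] + 3

reg = linear_model.LinearRegression()
reg.fit(x, y)
print(reg.coef_)       # recovers [2, -1]
print(reg.intercept_)  # recovers 3

x_dash = np.array([[0.5, 0.5]])  # a new observation
print(reg.predict(x_dash))       # 2*0.5 - 1*0.5 + 3 = 3.5
```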
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(x, y)
reg.predict(x_dash)  # make predictions

Ridge Regression
Used when predictors are highly correlated. Imposes a small penalty factor that shrinks the coefficients toward zero.
alpha controls the amount of shrinkage (larger alpha means more shrinkage).
from sklearn import linear_model
reg = linear_model.Ridge(alpha=n)  # n = penalty strength, e.g. 1.0
reg.fit(x, y)
reg.predict(x_dash)

LASSO Regression (Least Absolute Shrinkage and Selection Operator)
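The notes give no snippet for LASSO; the sklearn call mirrors Ridge (the alpha value and data below are illustrative). Unlike Ridge, LASSO can shrink some coefficients exactly to zero, so it also performs variable selection:

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
x = rng.random((200, 5))
# only the first two features matter; the other three are pure noise predictors
y = 3 * x[:, 0] + 2 * x[:, 1] + rng.normal(0, 0.1, 200)

reg = linear_model.Lasso(alpha=0.1)  # alpha controls shrinkage, as with Ridge
reg.fit(x, y)
print(reg.coef_)  # coefficients on the noise features are driven to (near) zero
```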
Regression metrics (The performance of the model)
MAE--mean absolute error (average error magnitude; ignores direction)
RMSE--root mean squared error (penalizes large errors more heavily)
R2-Score--coefficient of determination; the proportion of variance in y_true explained by the model
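A sketch of the three metrics on a toy prediction (values made up). sklearn has no direct RMSE function in older versions, so taking the square root of the MSE works everywhere:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)           # mean of |0.5, 0.5, 0, 1| = 0.5
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # sqrt(0.375) ~ 0.612
r2 = r2_score(y_true, y_pred)                       # 1.0 would be a perfect fit
print(mae, rmse, r2)
```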
Exercise
1 - Import the "boston" dataset from the sklearn package (note: load_boston was removed in scikit-learn 1.2, so on recent versions substitute another regression dataset). Fit a linear regression by OLS, Ridge and Lasso. Run a 10-fold cross-validation and calculate the RMSE of each method across the 10 folds.
The advantage of k-fold CV is that all observations are used for both training and validation, and each observation is used for validation exactly once. Hint: use KFold from sklearn.model_selection.
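One possible starting point for the exercise. Since load_boston is gone from recent scikit-learn, this sketch substitutes the bundled diabetes dataset; the alpha values are illustrative, not tuned:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import KFold, cross_val_score

x, y = load_diabetes(return_X_y=True)  # stand-in for the removed boston dataset
kf = KFold(n_splits=10, shuffle=True, random_state=0)  # 10-fold CV

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    # cross_val_score returns negative MSE per fold; negate, take sqrt, then average
    scores = cross_val_score(model, x, y, cv=kf, scoring="neg_mean_squared_error")
    rmse = np.sqrt(-scores).mean()
    print(type(model).__name__, round(rmse, 3))
```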