# Elastic Net and Multicollinearity


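Elastic net is itself a regression method; put simply, it combines LASSO and ridge regression. LASSO uses L1-norm regularization, which shrinks the regression coefficients (the betas) toward zero and sets some of them exactly to zero, so it also performs variable selection. Ridge regression adds L2-norm regularization to ordinary linear regression. When some features are linearly correlated, this reduces the variance of the coefficient estimates (the betas) and makes the estimates more precise.

Each method has a weakness when used alone. Ridge regression can fit the model reasonably well, but its shrinkage biases the coefficients and it keeps every feature, which can distort the picture the regression gives. LASSO yields a sparse model that is easy to read, but the model can be too simple to be realistic: among a group of correlated features it tends to keep one and drop the rest. Elastic net aims for both benefits: like ridge regression it keeps correlated important features together, and like LASSO it removes features that have little effect on the dependent variable.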
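To make the "combination of LASSO and ridge" concrete, here is a minimal NumPy sketch of elastic net fitted by cyclic coordinate descent. It assumes the objective `(1/2n)||y - Xb||^2 + lam * (l1_ratio * ||b||_1 + 0.5 * (1 - l1_ratio) * ||b||_2^2)`, centered data, and no intercept. The names `elastic_net_cd`, `lam`, and `l1_ratio` are hypothetical and chosen for illustration only; this is not a replacement for a library implementation.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def elastic_net_cd(X, y, lam, l1_ratio, n_iter=200):
    """Hypothetical helper: elastic net via cyclic coordinate descent.

    Assumes X and y are centered and that no intercept is fitted.
    l1_ratio=1 gives the LASSO penalty, l1_ratio=0 the ridge penalty.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_j' x_j for each column
    resid = y - X @ beta                   # full residual, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) * x_j' (residual with feature j's contribution added back)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # L1 part: soft-thresholding; L2 part: extra term in the denominator
            new_bj = soft_threshold(rho, lam * l1_ratio) / (
                col_sq[j] + lam * (1.0 - l1_ratio)
            )
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


# Simulated data (an assumption for the demo): x1 and x2 are nearly collinear,
# x3 is unrelated to y.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = x1 + x2 + 0.5 * rng.normal(size=n)
y = y - y.mean()

for name, ratio in [("ridge-like", 0.0), ("elastic net", 0.5), ("lasso", 1.0)]:
    print(name, np.round(elastic_net_cd(X, y, lam=0.1, l1_ratio=ratio), 3))
```

Comparing the three printed coefficient vectors shows how the weight given to the L1 term changes the treatment of the correlated pair `x1`, `x2` and of the irrelevant `x3`.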

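Why does LASSO, with its L1-norm penalty, perform variable selection while ridge, with its L2-norm penalty, does not? This blog post explains it in detail: <http://blog.peachdata.org/2017/02/07/Lasso-Ridge.html>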
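One common way to see the difference, sketched here under the simplifying assumption of an orthonormal design (`X'X = I`): the LASSO solution for `0.5*||y - Xb||^2 + lam*||b||_1` is the OLS coefficient soft-thresholded by `lam`, while the ridge solution for `0.5*||y - Xb||^2 + 0.5*lam*||b||_2^2` is the OLS coefficient divided by `1 + lam`. The function names and the example coefficients below are made up for illustration.

```python
import numpy as np


def lasso_orthonormal(beta_ols, lam):
    """LASSO solution under an orthonormal design: soft-thresholding."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)


def ridge_orthonormal(beta_ols, lam):
    """Ridge solution under an orthonormal design: proportional shrinkage."""
    return beta_ols / (1.0 + lam)


beta_ols = np.array([3.0, -0.5, 1.0, 0.2])   # illustrative OLS coefficients
lam = 1.0

beta_lasso = lasso_orthonormal(beta_ols, lam)
beta_ridge = ridge_orthonormal(beta_ols, lam)

print("lasso:", beta_lasso)   # every entry with |b| <= lam is set to zero
print("ridge:", beta_ridge)   # every entry is halved, none reaches zero
```

Soft-thresholding subtracts a fixed amount, so small coefficients hit zero; ridge only rescales, so a nonzero coefficient stays nonzero for any finite penalty.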

A good course on multicollinearity: <https://onlinecourses.science.psu.edu/stat501/node/347/>

When Can You Safely Ignore Multicollinearity? <https://statisticalhorizons.com/multicollinearity>

#### A good idea is to assess this using neuroimaging data:

The impact of multicollinearity on the variation of coefficient estimation when using logistic regression <https://www.business-school.ed.ac.uk/crc/wp-content/uploads/sites/55/2017/02/The-Impact-of-Multicollinearity-on-the-Variation-of-Coefficient-Estimation-When-Using-Logistic-Regression-de-Jongh-and-Webste.pdf>

### Consequences of multicollinearity, and whether coefficients can be forced to be positive

<https://stats.stackexchange.com/questions/242142/positive-coefficients-in-regression-analysis>

<https://stats.stackexchange.com/questions/198271/regression-coefficients-seem-to-have-the-wrong-sign-can-i-force-them-to-have-a>

<https://stats.stackexchange.com/questions/104890/is-it-possible-in-r-or-in-general-to-force-regression-coefficients-to-be-a-cer>

<https://stats.stackexchange.com/questions/1580/regression-coefficients-that-flip-sign-after-including-other-predictors>
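The sign flip discussed in these threads is easy to reproduce. The sketch below uses simulated data (all numbers are assumptions chosen for the demo): `x1` and `x2` are strongly correlated, and `y` depends negatively on `x1` and positively on `x2`. Regressing `y` on `x1` alone gives a positive slope, because `x1` acts as a proxy for `x2`; adding `x2` turns the `x1` coefficient negative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)            # strongly correlated with x1
y = -1.0 * x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)


def ols(X, y):
    """Least-squares coefficients; the first entry is the intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


b_alone = ols(x1[:, None], y)                  # y ~ x1
b_joint = ols(np.column_stack([x1, x2]), y)    # y ~ x1 + x2

print("x1 coefficient, x1 alone:   ", b_alone[1])
print("x1 coefficient, x1 and x2:  ", b_joint[1])
```

In this setup the negative joint coefficient is the correct one, so forcing it to be positive (for example with non-negative least squares such as `scipy.optimize.nnls`) would fit a different, constrained model and would not recover the data-generating coefficients.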

## Simpson's paradox

<https://en.wikipedia.org/wiki/Simpson's_paradox>

### Ridge Regression with VIF

```python
import numpy as np


def vif_ridge(corr_x, pen_factors, is_corr=True):
    """Variance inflation factors for ridge regression.

    Assumes penalization is on standardized variables.
    Data should not include a constant.

    Parameters
    ----------
    corr_x : array_like
        Correlation matrix if is_corr is True, or original data
        (observations in rows, variables in columns) if is_corr is False.
    pen_factors : iterable
        Iterable of ridge penalization factors.
    is_corr : bool
        Indicates how corr_x is interpreted, see corr_x.

    Returns
    -------
    vif : ndarray
        Variance inflation factors, with parameters in columns and ridge
        penalization factors in rows.

    Notes
    -----
    Could be optimized for repeated calculations.
    """
    corr_x = np.asarray(corr_x)
    if not is_corr:
        # The `bias` argument of np.corrcoef is deprecated and has no effect.
        corr = np.corrcoef(corr_x, rowvar=False)
    else:
        corr = corr_x

    eye = np.eye(corr.shape[1])
    res = []
    for k in pen_factors:
        # VIF(k) = diag((R + kI)^-1 R (R + kI)^-1)
        minv = np.linalg.inv(corr + k * eye)
        vif = minv @ corr @ minv
        res.append(np.diag(vif))
    return np.asarray(res)
```
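As a small worked check of the formula used in `vif_ridge`, the sketch below evaluates `diag((R + kI)^-1 R (R + kI)^-1)` directly for two predictors with correlation 0.9 (an illustrative value). With no penalty (`k = 0`) the expression reduces to the usual VIF, `1 / (1 - r^2)`, and the VIF falls as the penalty grows.

```python
import numpy as np

r = 0.9                                   # illustrative correlation
corr = np.array([[1.0, r], [r, 1.0]])
eye = np.eye(2)

vifs = {}
for k in [0.0, 0.1, 1.0]:
    minv = np.linalg.inv(corr + k * eye)
    vifs[k] = np.diag(minv @ corr @ minv)
    print(f"k={k}: VIF={np.round(vifs[k], 3)}")

# Without a penalty this is the classical VIF for two predictors.
print("1 / (1 - r^2) =", round(1.0 / (1.0 - r ** 2), 3))
```

For these values the VIF is about 5.26 at `k = 0`, about 1.49 at `k = 0.1`, and about 0.15 at `k = 1`, which shows how quickly even a small ridge penalty damps the variance inflation caused by collinearity.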

