Python Machine Learning

Multiple Linear Regression

A detailed, step-by-step tutorial on multiple linear regression
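Multiple linear regression extends simple linear regression by predicting a response from two or more features. Mathematically, it can be described as follows.
Consider a dataset with n observations, p features (the independent variables), and a response y (the dependent variable). Writing $ x_{ij} $ for the value of the j-th feature in the i-th observation, the regression line for the p features is computed as: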
$$ h(x_{i}) = b_{0} + b_{1} x_{i1} + b_{2} x_{i2} + \dotsm + b_{p} x_{ip} $$
Here, $ h(x_{i}) $ is the predicted response value and $ b_{0}, b_{1}, b_{2}, \dotsc, b_{p} $ are the regression coefficients.
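A multiple linear regression model also accounts for the error in the data, known as the residual error. Including it, the observed response is written as:
$$ y_{i} = b_{0} + b_{1} x_{i1} + b_{2} x_{i2} + \dotsm + b_{p} x_{ip} + e_{i} $$
Equivalently, the equation can be written as:
$ y_{i} = h(x_{i}) + e_{i} \quad \text{or} \quad e_{i} = y_{i} - h(x_{i}) $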
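To make the notation concrete, here is a minimal sketch that evaluates $ h(x_{i}) $ and the residual $ e_{i} $ for a single observation. The coefficients, feature values, and observed response are made-up numbers chosen for illustration, and `predict` is a hypothetical helper, not part of any library.

```python
def predict(b, x):
    """Return h(x) = b[0] + b[1]*x[0] + ... + b[p]*x[p-1]."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))

# Hypothetical coefficients b_0, b_1, b_2 (so p = 2 features)
b = [1.0, 2.0, -0.5]

# Hypothetical observation: feature values x_i1, x_i2 and observed response y_i
x_i = [3.0, 4.0]
y_i = 5.5

h_i = predict(b, x_i)  # 1.0 + 2.0*3.0 - 0.5*4.0 = 5.0
e_i = y_i - h_i        # residual: 5.5 - 5.0 = 0.5

print(h_i, e_i)
```

The residual is positive here because the model under-predicts the observed value.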

Python Implementation

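In this example, we use the Boston housing dataset from scikit-learn. Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the code as written requires an older release.
First, import the necessary packages: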
# Filename : example.py
# Copyright : 2020 By Lidihuo
# Author by : www.lidihuo.com
# Date : 2020-08-27
# Jupyter/IPython magic; omit this line when running as a plain Python script
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model, metrics
Next, load the dataset:
boston = datasets.load_boston(return_X_y = False)
The following lines define the feature matrix X and the response vector y:
X = boston.data
y = boston.target
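Next, split the dataset into a training set and a test set. With `test_size = 0.7`, 70% of the samples are held out for testing: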
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.7, random_state = 1)
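For a sense of what `test_size = 0.7` means in sample counts, the arithmetic can be sketched as follows. The figure of 506 samples is the commonly documented size of the Boston housing data and is not stated in this tutorial, and rounding the test count up reflects our understanding of how `train_test_split` handles a fractional `test_size`; treat both as assumptions.

```python
import math

n_samples = 506  # assumed number of rows in the Boston housing data
test_size = 0.7  # fraction passed to train_test_split above

# Assumption: a fractional test_size is rounded up to a whole number of samples
n_test = math.ceil(test_size * n_samples)
# The remaining samples form the training set
n_train = n_samples - n_test

print(n_train, n_test)
```

Under these assumptions only about 30% of the data is used for fitting, a smaller training share than the more common 70/30 or 80/20 train/test splits.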
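Now create the linear regression object, train the model, print the coefficients and the test score, and plot the residual errors: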
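# Create the linear regression model and fit it to the training set
reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)

# One coefficient per feature (b_1 ... b_p); the intercept b_0 is in reg.intercept_
print('Coefficients: \n', reg.coef_)

# score() returns the coefficient of determination R^2 on the test set
print('Variance score: {}'.format(reg.score(X_test, y_test)))

# Plot prediction minus observed value against the prediction.
# Note: this is the negative of the residual e_i = y_i - h(x_i) defined earlier.
plt.style.use('fivethirtyeight')
plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train, color = "green", s = 10, label = 'Train data')
plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test, color = "blue", s = 10, label = 'Test data')
plt.hlines(y = 0, xmin = 0, xmax = 50, linewidth = 2)
plt.legend(loc = 'upper right')
plt.title("Residual errors")
plt.show()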
Output
Coefficients:
[-1.16358797e-01 6.44549228e-02 1.65416147e-01 1.45101654e+00 -1.77862563e+01
   2.80392779e+00 4.61905315e-02 -1.13518865e+00 3.31725870e-01 -1.01196059e-02
   -9.94812678e-01 9.18522056e-03 -7.92395217e-01]
Variance score: 0.709454060230326
[Figure: "Residual errors" plot, with training data in green and test data in blue]
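The "Variance score" printed above comes from `reg.score`, which for `LinearRegression` is the coefficient of determination R^2. As a rough sketch of what that number measures, the function below computes R^2 = 1 - SS_res / SS_tot by hand. The name `r_squared` and the small sample values are hypothetical and are not taken from the Boston data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared residuals (observed minus predicted)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares around the mean of the observed values
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed responses and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

score = r_squared(y_true, y_pred)  # SS_res = 2, SS_tot = 20
print(score)
```

A score of 1.0 means the predictions match the observations exactly, while a score near 0 means the model does no better than always predicting the mean of the observed values.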