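Python Machine Learning

Simple Linear Regression

Simple Linear Regression: A Step-by-Step Tutorial
Simple linear regression (SLR) is the most basic form of linear regression: it predicts a response from a single feature. SLR assumes that the two variables are linearly related.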
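In symbols, and using the names b_0, b_1, SS_xy and SS_xx that appear in the code of this tutorial, the model and its least-squares estimates can be written as follows. This is the standard textbook formulation; \bar{x} and \bar{y} denote the sample means and \varepsilon_i the error term, notation that is assumed here rather than taken from the original text.

```latex
\begin{aligned}
y_i &= b_0 + b_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n \\
SS_{xy} &= \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}, \qquad
SS_{xx} = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2 \\
b_1 &= \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1\,\bar{x}
\end{aligned}
```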

Python Implementation

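SLR can be implemented in Python in two ways: by supplying our own dataset, or by using a dataset from the scikit-learn library.
Example 1 - In the following Python implementation, we use our own dataset.
First, import the necessary packages: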
# Filename : example.py
# Copyright : 2020 By Lidihuo
# Author by : www.lidihuo.com
# Date : 2020-08-27
# Jupyter/IPython magic; remove this line when running as a plain .py script
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
Next, define a function that computes the quantities needed for SLR:
def coef_estimation(x, y):
The following line gives the number of observations n:
    n = np.size(x)
The means of the x and y vectors are computed as follows:
    m_x, m_y = np.mean(x), np.mean(y)
Next, compute the cross-deviation of x and y and the deviation about x:
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
The regression coefficients b_0 (intercept) and b_1 (slope) are then computed as follows:
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)
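As a sanity check, the closed-form estimates can be compared with NumPy's general least-squares polynomial fit. The sketch below repeats coef_estimation so that it runs on its own. The use of np.polyfit and the sample data (copied from the main() function of this example) are choices made here for illustration, not part of the original walkthrough.

```python
import numpy as np

def coef_estimation(x, y):
    """Closed-form least-squares estimates (b_0, b_1) for y = b_0 + b_1*x."""
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([100, 300, 350, 500, 750, 800, 850, 900, 1050, 1250])

b_0, b_1 = coef_estimation(x, y)

# np.polyfit returns coefficients from the highest degree down: [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

# Both comparisons should agree up to floating-point rounding
print(np.isclose(b_1, slope), np.isclose(b_0, intercept))
```

If the two approaches disagree beyond rounding error, the most likely cause is a mistake in the SS_xy or SS_xx expressions.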
Next, define a function that plots the regression line and predicts the response vector:
def plot_regression_line(x, y, b):
The following line plots the actual data points as a scatter plot:
    plt.scatter(x, y, color = "m", marker = "o", s = 30)
The following line computes the predicted response vector:
    y_pred = b[0] + b[1]*x
The following lines plot the regression line and label the axes:
    plt.plot(x, y_pred, color = "g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()
Finally, define a main() function that supplies the dataset and calls the functions defined above:
def main():
    # Sample dataset: 10 observations of x and y
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([100, 300, 350, 500, 750, 800, 850, 900, 1050, 1250])
    b = coef_estimation(x, y)
    print("Estimated coefficients:\nb_0 = {} \nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
Output
Estimated coefficients:
b_0 = 154.5454545454545
b_1 = 117.87878787878788
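Once b_0 and b_1 are known, they can be used to predict the response at a new x and to judge how well the line fits. The following sketch hard-codes the coefficients printed above. The new input x_new = 10 and the manual R^2 computation are illustrative additions and not part of the original example.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([100, 300, 350, 500, 750, 800, 850, 900, 1050, 1250])

# Coefficients as printed in the output above
b_0, b_1 = 154.5454545454545, 117.87878787878788

# Predict the response at a new, hypothetical input
x_new = 10
y_new = b_0 + b_1 * x_new

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
y_fit = b_0 + b_1 * x
ss_res = np.sum((y - y_fit) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r2 = 1 - ss_res / ss_tot

print("Prediction at x = 10: {:.2f}".format(y_new))
print("R^2 on the training points: {:.3f}".format(r2))
```

Note that x_new = 10 lies just outside the observed range of x, so the prediction is an extrapolation and should be treated with more caution than a prediction inside the range.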
Python Implementation
Example 2 - In the following Python implementation, we use the diabetes dataset from scikit-learn.
First, import the necessary packages:
# Jupyter/IPython magic; remove this line when running as a plain .py script
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
Next, load the diabetes dataset into an object:
diabetes = datasets.load_diabetes()
Because we are implementing SLR, we use only one feature (the column at index 2), kept as a two-dimensional column array:
X = diabetes.data[:, np.newaxis, 2]
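The indexing expression above keeps the selected column two-dimensional, which is the shape scikit-learn estimators expect for X. A small sketch with a stand-in array shows the effect; the 4x3 `data` array is made up for illustration and is not the diabetes data.

```python
import numpy as np

# Made-up stand-in for diabetes.data: 4 samples, 3 features
data = np.arange(12).reshape(4, 3)

col_1d = data[:, 2]              # plain column selection, shape (4,)
col_2d = data[:, np.newaxis, 2]  # same values kept as a column, shape (4, 1)

print(col_1d.shape, col_2d.shape)

# Two equivalent spellings of the 2-D selection
print(np.array_equal(col_2d, data[:, [2]]), np.array_equal(col_2d, data[:, 2:3]))
```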
Next, split the feature data into training and test sets, holding out the last 30 samples for testing:
X_train = X[:-30]
X_test = X[-30:]
Split the target values into training and test sets in the same way:
y_train = diabetes.target[:-30]
y_test = diabetes.target[-30:]
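Negative slicing is what produces the hold-out split here: `[:-30]` takes everything except the last 30 rows and `[-30:]` takes the last 30. The sketch below uses placeholder arrays with 442 rows (the number of samples in the diabetes dataset) instead of the real data, purely to show the resulting sizes.

```python
import numpy as np

# Placeholder feature column and target with 442 rows (not the real values)
X = np.arange(442, dtype=float).reshape(-1, 1)
y = np.arange(442, dtype=float)

X_train, X_test = X[:-30], X[-30:]
y_train, y_test = y[:-30], y[-30:]

print(X_train.shape, X_test.shape)  # (412, 1) (30, 1)
print(y_train.shape, y_test.shape)  # (412,) (30,)
```

Because the split is positional rather than shuffled, the result depends on the order of the rows in the dataset.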
To train the model, first create a linear regression object:
regr = linear_model.LinearRegression()
Next, train the model on the training set:
regr.fit(X_train, y_train)
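LinearRegression fits an ordinary least-squares model and includes an intercept by default. As a rough NumPy-only sketch of the same kind of computation, the block below solves the least-squares problem with np.linalg.lstsq. It uses the small dataset from Example 1 rather than the diabetes data, an assumption made here to keep the sketch self-contained.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([100, 300, 350, 500, 750, 800, 850, 900, 1050, 1250], dtype=float)

# Design matrix: a column of ones for the intercept plus the single feature column
A = np.column_stack([np.ones_like(x), x])

# Solve min ||A @ beta - y||^2; beta holds [intercept, slope]
beta, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = beta

print("intercept = {:.4f}, slope = {:.4f}".format(intercept, slope))
```

In scikit-learn terms, these two values correspond to regr.intercept_ and regr.coef_[0] of a model fitted on the same data.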
Next, make predictions on the test set:
y_pred = regr.predict(X_test)
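Next, print the fitted coefficient together with two evaluation metrics, the mean squared error (MSE) and the variance score (R^2):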
print('Coefficients: \n', regr.coef_)
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print('Variance score: %.2f' % r2_score(y_test, y_pred))
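For reference, the two metrics printed above can be computed by hand. The sketch below uses small made-up arrays (not the diabetes predictions) and plain NumPy to show the definitions that mean_squared_error and r2_score implement.

```python
import numpy as np

# Made-up true values and predictions, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])

# Mean squared error: average of the squared residuals
mse = np.mean((y_true - y_hat) ** 2)

# R^2 (coefficient of determination): 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2 = 1 - ss_res / ss_tot

print("MSE: %.3f" % mse)  # MSE: 0.375
print("R^2: %.3f" % r2)   # R^2: 0.925
```

An R^2 of 1 means perfect prediction, while a model that always predicts the mean of y_true scores 0.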
Now plot the results:
plt.scatter(X_test, y_test, color = 'blue')
plt.plot(X_test, y_pred, color = 'red', linewidth = 3)
plt.xticks(())
plt.yticks(())
plt.show()
Output
Coefficients:
   [941.43097333]
Mean squared error: 3035.06
Variance score: 0.41