Configuration
Data Preprocessing
```python
from sklearn import datasets
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
import pandas as pd

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release.
boston = datasets.load_boston()

# Features as a DataFrame with named columns
X = pd.DataFrame(boston.data, columns=boston.feature_names)

# Target as a Series named PRICE
y = pd.Series(boston.target, name='PRICE')
```
Use the Boston Housing dataset from scikit-learn to predict house prices.
Analysis
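```python
# Fit an ordinary least-squares model on all 13 features
linear_regression = linear_model.LinearRegression()
linear_regression.fit(X=X, y=y)

# In-sample predictions on the training data
prediction = linear_regression.predict(X=X)

a = linear_regression.intercept_  # scalar intercept
b = linear_regression.coef_       # array of 13 coefficients
print("a : %.2f" % a)
# b is an array, so format each coefficient separately
for name, coef in zip(X.columns, b):
    print("b (%s) : %.2f" % (name, coef))
```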
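The fitted model has the form $y = a + b_1 x_1 + b_2 x_2 + \dots + b_{13} x_{13}$, where $a$ is the intercept and $b_1, \dots, b_{13}$ are the coefficients of the 13 features.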
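To make the equation concrete, here is a minimal sketch that fits the same kind of model with NumPy and rebuilds the predictions by hand. The toy data, the names `true_a` and `true_b`, and the use of `np.linalg.lstsq` in place of `LinearRegression` are assumptions made for illustration; this is not the Boston data.

```python
import numpy as np

# Hypothetical toy data: 5 samples, 2 features (not the Boston data).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
true_a = 1.5
true_b = np.array([2.0, -0.5])
y = true_a + X @ true_b  # noise-free targets built from known parameters

# Least squares with an intercept: prepend a column of ones to the features.
design = np.column_stack([np.ones(len(X)), X])
params, *_ = np.linalg.lstsq(design, y, rcond=None)
a, b = params[0], params[1:]

# Each prediction is a + b_1*x_1 + b_2*x_2, exactly as in the equation.
prediction = a + X @ b
print("a : %.2f" % a)
print("b :", np.round(b, 2))
```

Because the targets contain no noise, the fit should recover the intercept and coefficients that generated them, which is a quick way to check that the formula and the code agree.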
Residual Calculation
```python
# Residual = observed price minus predicted price
residuals = y - prediction
residuals.describe()
```
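One thing worth checking in the residual summary is the mean: for a least-squares fit that includes an intercept, the residuals average to zero and are uncorrelated with every feature. The sketch below illustrates this on synthetic data; the random data, the seed, and the NumPy-only fit are assumptions for illustration rather than part of the original analysis.

```python
import numpy as np

# Hypothetical synthetic data: 50 samples, 3 features, with some noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=50)

# Least-squares fit with an intercept column.
design = np.column_stack([np.ones(len(X)), X])
params, *_ = np.linalg.lstsq(design, y, rcond=None)

# Residual = observed value minus fitted value.
residuals = y - design @ params

mean_residual = residuals.mean()  # zero up to floating-point error
feature_dot = X.T @ residuals     # each entry is zero up to floating-point error
print("mean residual : %.2e" % mean_residual)
```

If the mean reported by `describe()` were clearly different from zero, that would suggest the model was fitted without an intercept or that the residuals were computed against different data.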
Coefficient of Determination Calculation
```python
SSE = (residuals**2).sum()       # sum of squared errors
SST = ((y - y.mean())**2).sum()  # total sum of squares
r2 = 1 - SSE/SST
print("R Squared : %.3f" % r2)
```
R Squared : 0.741
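The calculation above follows the usual definition of the coefficient of determination. Written out, with $\hat{y}_i$ for the prediction and $\bar{y}$ for the mean of the observed prices (notation introduced here, not taken from the original):

$$
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
$$

A value of 0.741 therefore means that, on the training data, the fitted model leaves about 26% of the variation of the prices around their mean unexplained.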
MSE Calculation
```python
score = linear_regression.score(X=X, y=y)  # R squared of the fitted model
MSE = mean_squared_error(y, prediction)    # arguments are (y_true, y_pred)
print("Score : %.3f" % score)
print("MSE : %.2f" % MSE)
```
Score : 0.741
MSE : 21.89
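The score matches the hand-computed R squared because both are built from the same sum of squared errors, and the MSE is that sum divided by the number of samples. A small worked example, using made-up numbers chosen so the arithmetic can be checked by hand (they are not taken from the Boston data):

```python
import numpy as np

# Hypothetical observed values and predictions, chosen for easy arithmetic.
y = np.array([3.0, 5.0, 7.0, 9.0])
prediction = np.array([2.0, 5.0, 8.0, 9.0])

residuals = y - prediction            # [1, 0, -1, 0]
sse = (residuals ** 2).sum()          # 1 + 0 + 1 + 0 = 2
sst = ((y - y.mean()) ** 2).sum()     # 9 + 1 + 1 + 9 = 20

mse = sse / len(y)                    # 2 / 4 = 0.5
r2 = 1 - sse / sst                    # 1 - 2/20 = 0.9
r2_from_mse = 1 - mse / y.var()       # same value, using the population variance of y

print("MSE : %.2f" % mse)
print("R Squared : %.3f" % r2)
```

The last line of the calculation shows why the two metrics move together: R squared is the MSE rescaled by the variance of the target, so it is unit-free, while the MSE stays in squared units of the target.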