3.6.9.11. A simple regression analysis on the Boston housing data

Here we perform a simple regression analysis on the Boston housing data, comparing two types of regressors: a linear regression and gradient boosted trees.

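# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example requires scikit-learn < 1.2.
from sklearn.datasets import load_boston
data = load_boston()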

Plot a histogram of the quantity to predict: price

import matplotlib.pyplot as plt
plt.figure(figsize=(4, 3))
plt.hist(data.target)
plt.xlabel('price ($1000s)')
plt.ylabel('count')
plt.tight_layout()
../../../_images/sphx_glr_plot_boston_prediction_001.png

Plot the price against each feature as a scatter plot

for index, feature_name in enumerate(data.feature_names):
    plt.figure(figsize=(4, 3))
    plt.scatter(data.data[:, index], data.target)
    plt.ylabel('Price', size=15)
    plt.xlabel(feature_name, size=15)
    plt.tight_layout()
../../../_images/sphx_glr_plot_boston_prediction_002.png to ../../../_images/sphx_glr_plot_boston_prediction_014.png (one scatter plot per feature)

Simple prediction

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target)
from sklearn.linear_model import LinearRegression
clf = LinearRegression()
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)
expected = y_test
plt.figure(figsize=(4, 3))
plt.scatter(expected, predicted)
plt.plot([0, 50], [0, 50], '--k')
plt.axis('tight')
plt.xlabel('True price ($1000s)')
plt.ylabel('Predicted price ($1000s)')
plt.tight_layout()
../../../_images/sphx_glr_plot_boston_prediction_015.png
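
Because train_test_split shuffles the data randomly, the points in the figure above change from run to run. The following is a minimal sketch of how the split could be made reproducible by passing random_state. It uses synthetic data from make_regression as a stand-in, which is an assumption made for illustration and not the Boston data; the sample count, noise level and seed are likewise arbitrary choices.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration, not the Boston data)
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=0)

# Fixing random_state makes the train/test partition identical on every run
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)
predicted = model.predict(X_test)

# With the default test_size, a quarter of the samples are held out
print(X_train.shape, X_test.shape)
```

With a fixed seed, any difference between two runs comes from the model or the code, not from a different partition of the data.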

Prediction with gradient boosted trees

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor()
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)
expected = y_test
plt.figure(figsize=(4, 3))
plt.scatter(expected, predicted)
plt.plot([0, 50], [0, 50], '--k')
plt.axis('tight')
plt.xlabel('True price ($1000s)')
plt.ylabel('Predicted price ($1000s)')
plt.tight_layout()
../../../_images/sphx_glr_plot_boston_prediction_016.png

Print the root-mean-square (RMS) error of the gradient boosted trees' predictions

import numpy as np
print("RMS: %r " % np.sqrt(np.mean((predicted - expected) ** 2)))
plt.show()

Out:

RMS: 3.2883542139192028
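
The script reports the RMS error only for the gradient boosted model, because predicted is overwritten after the linear regression. One possible way to compare the two regressors is to compute the same quantity for each on the same held-out split, as sketched below. The helper rms_error and the synthetic make_regression data are hypothetical, introduced here for illustration, so the printed values are not comparable to the Boston result above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def rms_error(predicted, expected):
    """Root-mean-square error: sqrt(mean((predicted - expected) ** 2))."""
    predicted = np.asarray(predicted, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sqrt(np.mean((predicted - expected) ** 2)))


# Synthetic stand-in data (an assumption for illustration, not the Boston data)
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit each regressor on the same training set and score it on the same test set
results = {}
for name, model in [
    ("linear regression", LinearRegression()),
    ("gradient boosting", GradientBoostingRegressor(random_state=0)),
]:
    model.fit(X_train, y_train)
    results[name] = rms_error(model.predict(X_test), y_test)
    print("%s RMS: %.3f" % (name, results[name]))
```

The RMS error is expressed in the units of the target, so on the Boston data it would read as a typical prediction error in thousands of dollars.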
