3.6.9.11. A simple regression analysis on the Boston housing data

Here we perform a simple regression analysis on the Boston housing data, exploring two types of regressors.

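# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example needs an older scikit-learn release.
from sklearn.datasets import load_boston
data = load_boston()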

Plot a histogram of the quantity to predict: the price

import matplotlib.pyplot as plt
plt.figure(figsize=(4, 3))
plt.hist(data.target)
plt.xlabel('price ($1000s)')
plt.ylabel('count')
plt.tight_layout()

[Figure: histogram of the house prices]

Plot the price against each feature, one scatter plot per feature

for index, feature_name in enumerate(data.feature_names):
    plt.figure(figsize=(4, 3))
    plt.scatter(data.data[:, index], data.target)
    plt.ylabel('Price', size=15)
    plt.xlabel(feature_name, size=15)
    plt.tight_layout()

Simple prediction with linear regression

from sklearn.model_selection import train_test_split
# Hold out part of the data (25% by default) to evaluate the model
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target)

from sklearn.linear_model import LinearRegression
clf = LinearRegression()
clf.fit(X_train, y_train)

predicted = clf.predict(X_test)
expected = y_test

plt.figure(figsize=(4, 3))
plt.scatter(expected, predicted)
# Dashed diagonal: where perfect predictions would fall
plt.plot([0, 50], [0, 50], '--k')
plt.axis('tight')
plt.xlabel('True price ($1000s)')
plt.ylabel('Predicted price ($1000s)')
plt.tight_layout()

[Figure: predicted versus true price, linear regression]
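The split above is random, so the fitted model and its plot change from run to run. The following is a minimal sketch of the same split-fit-evaluate pattern made reproducible by fixing `random_state`. It uses synthetic data from `make_regression` as a stand-in for the housing data (an assumption made here so that the example runs without the Boston dataset), and the `model` and `r2` names are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (assumption: not the real dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# Fixing random_state makes the train/test split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# score() returns the coefficient of determination R^2 on held-out data
r2 = model.score(X_test, y_test)
print("R^2 on held-out data: %.3f" % r2)
```

With the default `test_size`, a quarter of the samples are held out for testing.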

Prediction with gradient-boosted trees

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor()
clf.fit(X_train, y_train)

predicted = clf.predict(X_test)
expected = y_test

plt.figure(figsize=(4, 3))
plt.scatter(expected, predicted)
plt.plot([0, 50], [0, 50], '--k')
plt.axis('tight')
plt.xlabel('True price ($1000s)')
plt.ylabel('Predicted price ($1000s)')
plt.tight_layout()

[Figure: predicted versus true price, gradient-boosted trees]
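To compare the two types of regressors numerically rather than by eye, both can be fitted on the same split and scored with the same metric. The sketch below does this on synthetic nonlinear data from `make_friedman1` (an assumption: it stands in for the housing data and is not part of the original script). The `rms` helper and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic nonlinear regression problem (assumption: stand-in data)
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def rms(model):
    """Fit the model on the training set and return its RMS error on the test set."""
    model.fit(X_train, y_train)
    residuals = model.predict(X_test) - y_test
    return np.sqrt(np.mean(residuals ** 2))


rms_lin = rms(LinearRegression())
rms_gbr = rms(GradientBoostingRegressor(random_state=0))

print("Linear regression RMS: %.3f" % rms_lin)
print("Gradient boosting RMS: %.3f" % rms_gbr)
```

On data with nonlinear structure such as this, the tree ensemble would typically reach a lower RMS error than the linear model, though the gap depends on the dataset and the split.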

Print the root-mean-square (RMS) error of the gradient-boosted model

import numpy as np
print("RMS: %r " % np.sqrt(np.mean((predicted - expected) ** 2)))
plt.show()

Out:

RMS: 3.2883542139192028
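The RMS error is the square root of the mean squared difference between the predicted and the true values, expressed in the same units as the target (here, $1000s). As a small worked sketch, the hypothetical helper `rms_error` below (not part of the original script) wraps the same NumPy expression and applies it to two toy arrays.

```python
import numpy as np


def rms_error(predicted, expected):
    """Root-mean-square difference between two equally sized sequences."""
    predicted = np.asarray(predicted, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.sqrt(np.mean((predicted - expected) ** 2))


# Errors are 3 and 4, so the mean squared error is (9 + 16) / 2 = 12.5
print(rms_error([3.0, 5.0], [0.0, 1.0]))  # approximately 3.54
```

Because the differences are squared before averaging, a few large errors weigh more heavily on this score than many small ones.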

Total running time of the script: ( 0 minutes 1.277 seconds)
