3.6.9.1. Measuring Decision Tree performance

Demonstrates overfitting when a model is tested on its own training set.

Get the data

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
from sklearn.datasets import load_boston
data = load_boston()

Train and test a model

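from sklearn.tree import DecisionTreeRegressor
# Fit the tree on the full dataset
clf = DecisionTreeRegressor().fit(data.data, data.target)
# Predict on the very same samples the tree was trained on
predicted = clf.predict(data.data)
expected = data.target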

Plot predicted as a function of expected

from matplotlib import pyplot as plt
plt.figure(figsize=(4, 3))
plt.scatter(expected, predicted)
plt.plot([0, 50], [0, 50], '--k')
plt.axis('tight')
plt.xlabel('True price ($1000s)')
plt.ylabel('Predicted price ($1000s)')
plt.tight_layout()

[Figure: predicted price plotted against true price, with a dashed line marking perfect prediction]

Pretty much no errors!
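The visual impression can be backed by a number. The following is a minimal sketch, not part of the original example: it assumes a synthetic dataset from `make_regression` as a stand-in for the housing data (the sample count, feature count, and noise level are arbitrary choices) and reports the error of a fully grown tree on its own training samples.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; the example above uses housing prices)
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)

# An unconstrained tree keeps splitting until each leaf is pure
tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# Score the tree on the same samples it was fitted on
train_mae = mean_absolute_error(y, tree.predict(X))
train_r2 = tree.score(X, y)

print(f"MAE on training data: {train_mae:.3f}")
print(f"R^2 on training data: {train_r2:.3f}")
```

With default settings the tree can grow one leaf per training sample, so it reproduces the training targets almost exactly, noise included.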

This is too good to be true: we are testing the model on the data it was trained on, which is not a measure of generalization.

The results are therefore not a valid estimate of how the model performs on unseen data.
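A more honest measurement holds some data out of training. The sketch below is an illustration under stated assumptions rather than part of the original example: it uses synthetic data from `make_regression` in place of the housing data, and the 25% test fraction and `random_state` values are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; any regression dataset would do)
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)

# Keep 25% of the samples aside; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# R^2 on the training samples versus on the held-out samples
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)

print(f"R^2 on training data: {train_score:.3f}")
print(f"R^2 on held-out data: {test_score:.3f}")
```

The gap between the two scores is the overfitting that the train-on-everything plot hides. Only the held-out score says something about generalization.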
