3.6.9.13. Simple visualization and classification of the digits dataset¶

Plot the first few samples of the digits dataset and a 2D representation built using PCA, then train a simple classifier.

from sklearn.datasets import load_digits
digits = load_digits()
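Before plotting, it can help to look at what `load_digits` returns. The following is a minimal sketch that only inspects array shapes; the attribute names `data`, `images` and `target` are those of the returned dataset object.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Each sample is stored twice: as a flat 64-feature vector and as an 8x8 image
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)
print(digits.target.shape)  # (1797,)

# `data` is the `images` array flattened row by row
flattened = digits.images.reshape(len(digits.images), -1)
print((digits.data == flattened).all())  # True
```

The flat `data` array is what the estimators consume, while `images` is convenient for `imshow`.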

Plot the data: images of digits¶

Each sample is an 8x8 image.

from matplotlib import pyplot as plt
fig = plt.figure(figsize=(6, 6))  # figure size in inches
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
for i in range(64):
    ax = fig.add_subplot(8, 8, i + 1, xticks=[], yticks=[])
    ax.imshow(digits.images[i], cmap=plt.cm.binary, interpolation='nearest')
    # label the image with the target value
    ax.text(0, 7, str(digits.target[i]))

../../../_images/sphx_glr_plot_digits_simple_classif_001.png

Plot a projection on the first two principal axes¶

plt.figure()
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
proj = pca.fit_transform(digits.data)
plt.scatter(proj[:, 0], proj[:, 1], c=digits.target)
plt.colorbar()

../../../_images/sphx_glr_plot_digits_simple_classif_002.png
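A 2D projection necessarily discards information. As a rough sketch of how to check how much variance the two components retain, one can read the fitted `explained_variance_ratio_` attribute; the variable name `retained` is introduced here for illustration only.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
pca = PCA(n_components=2)
proj = pca.fit_transform(digits.data)

# One row per sample, one column per principal component
print(proj.shape)  # (1797, 2)

# Fraction of the total variance captured by each component, largest first
print(pca.explained_variance_ratio_)

# Fraction captured by the 2D projection as a whole
retained = pca.explained_variance_ratio_.sum()
print(retained)
```

If the retained fraction is small, overlapping clusters in the scatter plot do not necessarily mean the classes overlap in the full 64-dimensional space.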

Classify with Gaussian naive Bayes¶

from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
# split the data into training and validation sets
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)
# train the model
clf = GaussianNB()
clf.fit(X_train, y_train)
# use the model to predict the labels of the test data
predicted = clf.predict(X_test)
expected = y_test
# Plot the prediction
fig = plt.figure(figsize=(6, 6))  # figure size in inches
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
# plot the digits: each image is 8x8 pixels
for i in range(64):
    ax = fig.add_subplot(8, 8, i + 1, xticks=[], yticks=[])
    ax.imshow(X_test.reshape(-1, 8, 8)[i], cmap=plt.cm.binary,
              interpolation='nearest')
    # label the image with the target value
    if predicted[i] == expected[i]:
        ax.text(0, 7, str(predicted[i]), color='green')
    else:
        ax.text(0, 7, str(predicted[i]), color='red')

../../../_images/sphx_glr_plot_digits_simple_classif_003.png
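Because `train_test_split` shuffles the data, the split (and therefore the scores) changes from run to run unless a seed is fixed. The sketch below assumes `random_state=0`, an arbitrary value that is not part of the original example, and shows the sizes produced by the default 25% test fraction.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

digits = load_digits()

# random_state=0 is an arbitrary seed chosen here only for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# By default 25% of the 1797 samples are held out for testing
print(X_train.shape, X_test.shape)  # (1347, 64) (450, 64)

clf = GaussianNB().fit(X_train, y_train)

# score() returns the mean accuracy on the given data
score = clf.score(X_test, y_test)
print(score)
```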

Quantify the performance¶

First print the number of correct matches

matches = (predicted == expected)
print(matches.sum())

Out:

The total number of data points

print(len(matches))

Out:

And now, the ratio of correct predictions

matches.sum() / float(len(matches))
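This ratio is the classification accuracy. As a small sketch on made-up labels (the arrays below are hypothetical toy data, not the digits predictions), it agrees with the mean of the boolean array and with `metrics.accuracy_score`:

```python
import numpy as np
from sklearn import metrics

# Hypothetical toy labels, used only to illustrate the computation
expected = np.array([0, 1, 2, 2, 1, 0])
predicted = np.array([0, 1, 1, 2, 1, 2])

matches = (predicted == expected)
ratio = matches.sum() / float(len(matches))
print(ratio)  # 4 correct out of 6

# Same quantity, two other ways
print(matches.mean())
print(metrics.accuracy_score(expected, predicted))
```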

Print the classification report

from sklearn import metrics
print(metrics.classification_report(expected, predicted))

Out:

             precision    recall  f1-score   support

          0       0.98      1.00      0.99        53
          1       0.82      0.88      0.85        42
          2       0.90      0.91      0.90        57
          3       0.94      0.76      0.84        38
          4       0.97      0.85      0.91        40
          5       0.93      0.96      0.95        45
          6       0.95      0.97      0.96        40
          7       0.78      0.98      0.87        51
          8       0.78      0.87      0.82        46
          9       1.00      0.66      0.79        38

avg / total       0.90      0.89      0.89       450

Print the confusion matrix

print(metrics.confusion_matrix(expected, predicted))
plt.show()

Out:

[[53  0  0  0  0  0  0  0  0  0]
 [ 0 37  1  0  0  0  2  1  1  0]
 [ 0  2 52  0  1  0  0  0  2  0]
 [ 0  0  3 29  0  1  0  1  4  0]
 [ 0  1  0  0 34  0  0  4  1  0]
 [ 0  0  0  1  0 43  0  0  1  0]
 [ 0  0  0  0  0  1 39  0  0  0]
 [ 0  0  0  0  0  1  0 50  0  0]
 [ 0  4  0  0  0  0  0  2 40  0]
 [ 1  1  2  1  0  0  0  6  2 25]]
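The per-class precision and recall in the classification report can be recovered from the confusion matrix, where rows are the expected labels and columns the predicted ones. The sketch below transcribes the matrix from the output above (the values belong to that particular random split) and recomputes both quantities with NumPy.

```python
import numpy as np

# Confusion matrix transcribed from the output above:
# rows are expected labels, columns are predicted labels
cm = np.array([
    [53,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0, 37,  1,  0,  0,  0,  2,  1,  1,  0],
    [ 0,  2, 52,  0,  1,  0,  0,  0,  2,  0],
    [ 0,  0,  3, 29,  0,  1,  0,  1,  4,  0],
    [ 0,  1,  0,  0, 34,  0,  0,  4,  1,  0],
    [ 0,  0,  0,  1,  0, 43,  0,  0,  1,  0],
    [ 0,  0,  0,  0,  0,  1, 39,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  1,  0, 50,  0,  0],
    [ 0,  4,  0,  0,  0,  0,  0,  2, 40,  0],
    [ 1,  1,  2,  1,  0,  0,  0,  6,  2, 25],
])

correct = np.diag(cm)                 # correctly classified samples per class
recall = correct / cm.sum(axis=1)     # divide by the number of true samples
precision = correct / cm.sum(axis=0)  # divide by the number of predictions
accuracy = correct.sum() / cm.sum()   # overall fraction of correct predictions

for label in range(10):
    print(label, round(precision[label], 2), round(recall[label], 2))
print(round(accuracy, 2))
```

Reading the last row, for instance, only 25 of the 38 true nines are recognized (recall 0.66), yet every sample predicted as a nine really is one (precision 1.00).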

Total running time of the script: ( 0 minutes 5.056 seconds)
