Ibnovate Course 2 · The Rising Builders
⏱ 75 min · Live session · ages 12–15

Session 20 — Build an Image Classifier

Duration: 75 min · Format: live online · Ages: 12–15

Session goal: by the end, students can train a real image classifier on handwritten digits, measure its accuracy honestly on images it has never seen, and use a confusion matrix and a misread image to describe where and why it fails.

Before class — prep (5 min)

Agenda

Time Segment
0:00 Hook — could you tell a 4 from a 9 by numbers alone? (5 min)
0:05 Teach — flatten the image, then it's just Unit 1 (13 min)
0:18 Teach — honest evaluation: the test set doesn't lie (13 min)
0:31 Activity — train, test, and interrogate your classifier (27 min)
0:58 Check for understanding (10 min)
1:08 Wrap-up + homework (7 min)

0:00 · Hook (5 min)

Ask the class and take a few answers (chat or unmute):

Land it: today they'll train a model that reads handwritten digits from numbers alone — and, just as importantly, they'll measure how often it's wrong and what it confuses. Honest evaluation is the real skill.


0:05 · Teach — Flatten the image, then it's just Unit 1 (13 min)

Explain: scikit-learn learns from a table — one row per example, one column per feature. An 8×8 image has 64 pixels, so we lay those 64 numbers out in a single row. That "flattened" image is just 64 features. After that, it's the exact same .fit() / .predict() recipe from Unit 1.

Type/run this together in Colab:

from sklearn.datasets import load_digits

digits = load_digits()
X = digits.data     # each image already flattened to 64 numbers (the features)
y = digits.target   # the correct digit 0–9 (the label we want to predict)

print("X shape:", X.shape)   # (1797, 64) = 1797 images, 64 features each
print("First image as 64 numbers:", X[0])
print("Its correct label:", y[0])

Ask: "What is one feature here, and what is the label?" (Answer: each of the 64 pixel-brightness numbers is a feature; the label is the digit 0–9.)

⚠ Watch for: students expect to feed the model a picture. It only takes a row of numbers (X). The picture is for us to look at; the model never sees an image, just features.
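To make the flattening step concrete, here is a minimal sketch. It relies on digits.images, the 8×8 grid version of the same data that scikit-learn ships alongside digits.data, and checks that laying a grid out row by row gives exactly the 64 features the model sees. The names grid, row, and same are only for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

grid = digits.images[0]     # the first image as an 8x8 grid (the picture for humans)
row = grid.reshape(-1)      # the same pixels laid out in one row of 64 numbers

print("Grid shape:", grid.shape)
print("Row shape:", row.shape)

# Check that the flattened grid matches the row the model learns from
same = np.array_equal(row, digits.data[0])
print("Same numbers as X[0]?", same)
```

Students can swap the 0 for any other position to confirm that the picture and the row always hold the same numbers.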


0:18 · Teach — Honest evaluation: the test set doesn't lie (13 min)

Explain: exactly like Unit 1, we split the data — the model learns from the training set and is graded on a hidden test set it never saw. Accuracy on the test set is the only honest number. But accuracy alone hides what it gets wrong, so we also read a confusion matrix — a grid showing which digits get mistaken for which.

Picture the train/test split: split the data into training and test sets, train on one, then check accuracy on the unseen data.

[Diagram: the data is split into a training set and a test set; the model trains on one, then accuracy is checked on the unseen test data.]

Type/run this together in Colab:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)          # learn from the training images

preds = model.predict(X_test)        # guess on images it has NEVER seen
print("Accuracy:", accuracy_score(y_test, preds))

Run it live — students see something around 0.95–0.97. Celebrate, then immediately push: "That still means a few out of every hundred are wrong. Which ones?"

Ask: "Why do we test on X_test and never on X_train?" (Answer: the model already saw the training answers; grading on them is cheating and hides real mistakes — the Unit 1 rule.)

⚠ Watch for: students want to stop at the accuracy number and call it a win. Push them to ask which digits it confuses and why — that's the difference between a demo and real evaluation.
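If students doubt that grading on the training set flatters the model, a side-by-side comparison can help. The sketch below rebuilds the same split and model and prints both scores using .score(), which is a shortcut for predicting and then computing accuracy. The names train_acc and test_acc are illustrative, and the exact numbers may vary with the scikit-learn version.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Score on images the model studied vs. images it has never seen
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print("Training images:", len(X_train), "| Test images:", len(X_test))
print("Accuracy on training set (flattering):", round(train_acc, 3))
print("Accuracy on test set (honest):", round(test_acc, 3))
```

The training score is typically the higher of the two, which is exactly why it is not the number to report.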


0:31 · Activity — Train, test, and interrogate your classifier (27 min)

Have students open their own Google Colab notebook (New notebook), build the classifier above, then interrogate it. Screen-share and go line by line.

Type/run this together in Colab — read the confusion matrix:

from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test, preds))

Explain how to read it: row = the true digit, column = the model's guess. Big numbers on the diagonal = correct. Any big number off the diagonal is a specific confusion (e.g. some 8s guessed as 1).
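A tiny made-up example can make the row/column rule easier to see before students face the full 10×10 grid. The sketch below uses six invented answers (not real model output) with only the digits 4 and 9, then finds the largest off-diagonal cell. The names true_digits, guesses, and mistakes are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up answers for illustration only (not real model output)
true_digits = [4, 4, 4, 9, 9, 9]
guesses = [4, 9, 9, 9, 9, 4]
labels = [4, 9]

cm = confusion_matrix(true_digits, guesses, labels=labels)
print(cm)
# Row 0 = real 4s, row 1 = real 9s
# Column 0 = guessed 4, column 1 = guessed 9

# Zero out the diagonal (correct answers) so only mistakes remain
mistakes = cm.copy()
np.fill_diagonal(mistakes, 0)

# Find the cell with the most mistakes
r, c = np.unravel_index(np.argmax(mistakes), mistakes.shape)
print("Biggest confusion: real", labels[r], "guessed as", labels[c],
      "(", mistakes[r, c], "times )")
```

Here two of the three real 4s were guessed as 9, so the cell at row "4", column "9" holds 2. The same zero-the-diagonal trick works on the full matrix from the digits model.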

Now look a wrong prediction in the eye:

import numpy as np
import matplotlib.pyplot as plt

wrong = np.where(preds != y_test)[0]   # positions the model got wrong
print("Number wrong out of", len(y_test), ":", len(wrong))

i = wrong[0]
plt.imshow(X_test[i].reshape(8, 8), cmap="gray")
plt.show()
print("Model said:", preds[i], " | Real answer:", y_test[i])

Ask as they run it: "Look at the messy digit — can you even tell what it is? Is the model's mistake understandable?" (Often the handwriting is genuinely ambiguous — a great honesty moment.)

Then have them experiment and report in the chat:

Circulate for the two common errors: forgetting .reshape(8, 8) when showing a flattened row (it's 64 numbers, not a grid), and calling .predict() before .fit().
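Both errors can be shown safely on your shared screen. The sketch below prints the shape of a row before and after reshaping, then deliberately calls .predict() on an untrained model and catches the NotFittedError that scikit-learn raises. The names untrained and fit_error_seen are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)

# Error 1: a row is 64 numbers in a line, not a grid
row = X[0]
print("Before reshape:", row.shape)
print("After reshape:", row.reshape(8, 8).shape)

# Error 2: asking for predictions before the model has been trained
untrained = LogisticRegression(max_iter=10000)
try:
    untrained.predict(X[:1])
    fit_error_seen = False
except NotFittedError as err:
    fit_error_seen = True
    print("scikit-learn refused:", type(err).__name__)
```

Seeing the error name once in class makes it much easier for students to recognize it in their own notebooks.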


0:58 · Check for understanding (10 min)

Ask these aloud or drop them in the chat. Answer key (for you):

  1. Why do we flatten each image into 64 numbers? → scikit-learn learns from a table of features — one row per example; the 64 pixels become 64 features.
  2. Why is test-set accuracy the honest score? → It's measured on images the model never saw; testing on training data is cheating and hides mistakes.
  3. What does a confusion matrix tell you that accuracy doesn't? → Which digits get mistaken for which: the specific errors, not just the overall rate.

1:08 · Wrap-up + homework (7 min)


Teaching notes

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
print("KNN accuracy:", knn.score(X_test, y_test))

# per-digit precision/recall for the first model
print(classification_report(y_test, preds))

Ask: which model wins? Which single digit does the model handle worst (lowest score in the report), and why might that digit be hard? Connect back to Unit 1: a single accuracy number can hide a weak spot.

Low-tech fallback: if devices can't run Colab, build it on your shared screen and have students predict each output before you press Run. As a no-code contrast, demo Teachable Machine: train a webcam image classifier with a few examples and watch it succeed and fail live.

Vocabulary

Term | Meaning
Classifier | A model that sorts inputs into categories (here, digits 0–9)
Flatten | Lay an image's pixels out as one row of numbers
Accuracy | Fraction of test images the model gets right
Confusion matrix | A grid of which classes get mistaken for which
Held-out test set | Data kept hidden to grade the model honestly

Resources

Practice set

A mix of concept questions and short coding tasks on training, testing, and honest evaluation — easy to hard. Use for lab time or homework.

1. Vocabulary check: in this project, what is X and what is y? → X = the flattened images (64 features each); y = the correct digit label 0–9.

2. Spot the "learning" line: which line teaches the model? → model.fit(X_train, y_train). .fit() learns the pattern; .predict() only uses it.

3. Fix the bug: this errors when showing a test image — why, and how do you fix it? → a flattened row is 64 numbers, not a grid; reshape it: X_test[i].reshape(8, 8).

import matplotlib.pyplot as plt
plt.imshow(X_test[0], cmap="gray")   # errors: X_test[0] is 64 numbers in a row

4. Reasoning: a classmate reports 100% accuracy but tested on X_train. Trustworthy? → No — the model already saw those answers; grade on the held-out X_test instead.

5. Read the matrix: in a confusion matrix, the cell at row 4, column 9 holds 6. What does that mean? → Six images that were really a 4 were guessed as a 9 — a specific 4↔9 confusion.

6. Write it (harder): print how many test images the model got wrong. → count the mismatches:

import numpy as np
# preds and y_test already exist
# print the number of wrong predictions here

print(np.sum(preds != y_test))

7. Reasoning (hardest): your model is 96% accurate overall but only 80% on the digit 8. Why does the overall number hide this, and why does it matter? → Accuracy averages over all digits, so a weak class gets buried; it matters because the model is quietly unreliable for 8s — check per-class scores to catch it.
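The arithmetic behind question 7 can be worked through with invented counts. The sketch below assumes a hypothetical test set of 1,000 images, 100 of them 8s. These numbers are chosen only to reproduce the 96% and 80% in the question and are not real results.

```python
# Hypothetical counts chosen to match the numbers in question 7
eights_total, eights_correct = 100, 80      # the digit 8 only
others_total, others_correct = 900, 880     # all the other digits together

# Overall accuracy pools every image, so the 8s are only a tenth of the score
overall = (eights_correct + others_correct) / (eights_total + others_total)

# Per-class accuracy looks at the 8s on their own
eights_only = eights_correct / eights_total

print("Overall accuracy:", overall)
print("Accuracy on 8s only:", eights_only)
print("Wrong 8s:", eights_total - eights_correct,
      "| Wrong other digits:", others_total - others_correct)
```

In this made-up case half of all the mistakes are 8s, yet the overall score still reads 96%.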

Going deeper (optional)

For a strong class, expose that the model gives a confidence, not a certainty — and that low confidence often lines up with its mistakes. predict_proba returns the probability the model assigns to each digit:

import numpy as np

probs = model.predict_proba(X_test)     # probability for each digit, per image
i = 0
print("Model's guess:", model.predict(X_test)[i])
print("Its confidence:", np.max(probs[i]).round(3))   # highest probability

Have them find a wrong prediction (from the wrong array earlier) and print its confidence — it's usually lower than a correct one. Land the lesson: a responsible builder doesn't just take the guess, they check how sure the model is, and can flag low-confidence cases for a human. This is the honest-evaluation mindset they'll carry into their own project in Session 22.
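One possible way to run that comparison for every test image at once is sketched below. It rebuilds the same model so the block runs on its own, then averages the confidence for right and wrong predictions. The 0.8 cutoff for flagging is an arbitrary assumption for illustration, not a recommended value, and the names confidence, is_wrong, and flagged are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

preds = model.predict(X_test)
probs = model.predict_proba(X_test)   # one row of 10 probabilities per image
confidence = probs.max(axis=1)        # the highest probability in each row

is_wrong = preds != y_test
print("Average confidence when right:", confidence[~is_wrong].mean().round(3))
print("Average confidence when wrong:", confidence[is_wrong].mean().round(3))

# Flag unsure cases for a human to check (0.8 is an arbitrary cutoff)
flagged = np.where(confidence < 0.8)[0]
print("Images flagged for a human:", len(flagged), "out of", len(y_test))
```

Students can try other cutoffs and discuss the trade-off: a higher cutoff catches more mistakes but sends more images to the human.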

Common mistakes & fixes

Next session

Session 21 — How Machines Read: students switch from images to text — turning words into numbers (tokens) and building a small sentiment classifier in Python.

Ibnovate · Build · Innovate