Ibnovate Course 2 · The Rising Builders
⏱ 75 min · Live session · ages 12–15

Session 19 — How Machines See

Duration: 75 min · Format: live online · Ages: 12–15

Session goal: by the end, students can explain that a picture is really a grid of brightness numbers, describe what a feature is and how a CNN stacks simple patterns into whole objects, and load and inspect a real image dataset in Colab.

Before class — prep (5 min)

Agenda

Time Segment
0:00 Hook — how does a phone know it's a cat? (5 min)
0:05 Teach — an image is a grid of numbers (14 min)
0:19 Teach — features, and what a CNN does (14 min)
0:33 Activity — inspect a real image dataset in Colab (25 min)
0:58 Check for understanding (10 min)
1:08 Wrap-up + homework (7 min)

0:00 · Hook (5 min)

Ask the class and take a few answers (chat or unmute):

Let them guess, then reveal the big idea for today: to a computer, a picture is not a picture at all — it's a grid of numbers. "Seeing" is just doing math on those numbers. Tell them that by the end they'll have printed a real image as numbers and watched a computer pull it apart.


0:05 · Teach — An image is a grid of numbers (14 min)

Explain: a screen is made of tiny dots called pixels. Each pixel stores a number for how bright it is. A small grey image is just a grid — rows and columns of brightness numbers. Low number = dark, high number = bright.

Share this diagram so students can picture an image as a grid of brightness numbers, where each cell is one pixel's value:

[Diagram: a small image shown as a grid of squares, each square labelled with a brightness number. Low numbers are dark cells, high numbers are bright cells, and together they trace the shape of a digit.]

Type/run this together in Colab:

from sklearn.datasets import load_digits
import matplotlib.pyplot as plt

digits = load_digits()

first = digits.images[0]   # one small picture of a handwritten digit
print(first)               # the SAME picture, printed as a grid of numbers

You'll see an 8×8 grid of numbers. Point out that the big numbers trace the shape of the digit and the zeros are the dark background. Now show them it really is a picture:

plt.imshow(first, cmap="gray")
plt.show()
print("This digit is labelled:", digits.target[0])

Ask: "The grid of numbers and the picture are the same thing — where are the biggest numbers, and what part of the digit do they line up with?" (Answer: the bright ink strokes; the zeros are the empty background.)
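If students want to check their answer with code rather than by eye, a minimal sketch like the one below may help. It is an optional extra, not part of the original walkthrough: np.argmax and np.unravel_index are NumPy helpers the lesson has not introduced, so treat them as "magic words" for now.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
first = digits.images[0]          # the same 8x8 grid printed earlier

brightest = first.max()           # the largest brightness number in the grid

# argmax gives a position in the flattened grid; unravel_index turns it
# back into a (row, column) pair. If several pixels tie, the first is used.
row, col = np.unravel_index(np.argmax(first), first.shape)

print("Brightest value:", brightest)
print("First found at row", row, "column", col)
print("Background (zero) pixels:", int(np.sum(first == 0)))
```

Have students find that row and column in the printed grid and confirm it sits on an ink stroke in the picture.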

⚠ Watch for the #1 misconception: students think the computer "sees" the way they do. It doesn't — there is no picture inside the computer, only numbers. Every bit of computer vision is math done on that grid.


0:19 · Teach — Features, and what a CNN does (14 min)

Explain: a feature is a useful clue the computer measures from the numbers — not the whole image, just something telling. The simplest possible feature is how much ink is in the image: add up all the brightness numbers.

Type/run this together in Colab:

import numpy as np

ink = np.sum(first)              # add up every brightness number
print("Total ink in this digit:", ink)

Explain that a fat digit like 8 has more ink than a thin 1. That single number already helps a computer tell some digits apart. Real vision uses thousands of smarter features — edges, corners, curves.
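To test the "fat 8 versus thin 1" idea on the real data instead of taking it on faith, one possible sketch is to average the total ink for each digit label. The names ink_per_image and average_ink are made up for this example, and the result depends on how people happened to write the digits in this dataset, so let the class read the numbers off the output rather than assuming them.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Total ink of every image: add up each 8x8 grid (sum over rows and columns)
ink_per_image = digits.images.sum(axis=(1, 2))

# Average ink for each digit label 0-9
average_ink = {}
for label in range(10):
    average_ink[label] = ink_per_image[digits.target == label].mean()
    print(label, "average ink:", round(average_ink[label], 1))
```

Ask which digits come out close together: those are the ones the ink feature cannot separate on its own.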

Explain how a CNN builds them up (intuition only, no math): a CNN (Convolutional Neural Network) uses tiny grids called filters that slide across the image looking for one simple pattern each — one filter fires on horizontal edges, another on a curve. Then it stacks them:

  1. Layer 1 finds tiny edges and dots.
  2. Layer 2 combines edges into shapes — corners, curves, loops.
  3. Later layers combine shapes into parts and whole objects — an eye, a wheel, a face.
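For teachers who want to show one filter actually sliding, here is a minimal sketch using only NumPy. Everything in it is an illustration: the 6×6 image and the filter values are hand-made, slide_filter is a hypothetical helper name, and a real CNN learns its filter values from examples instead of having them typed in.

```python
import numpy as np

# A hand-made 6x6 image: dark on top (0), bright on the bottom (9),
# so there is one horizontal edge across the middle.
image = np.array([[0] * 6] * 3 + [[9] * 6] * 3)

# A 3x3 filter that responds where brightness jumps from top to bottom.
horizontal_edge = np.array([
    [-1, -1, -1],
    [ 0,  0,  0],
    [ 1,  1,  1],
])

def slide_filter(image, filt):
    """Slide filt over image; each output cell is the sum of (patch * filt)."""
    fh, fw = filt.shape
    out_rows = image.shape[0] - fh + 1
    out_cols = image.shape[1] - fw + 1
    output = np.zeros((out_rows, out_cols))
    for r in range(out_rows):
        for c in range(out_cols):
            patch = image[r:r + fh, c:c + fw]   # the 3x3 window under the filter
            output[r, c] = np.sum(patch * filt)
    return output

response = slide_filter(image, horizontal_edge)
print(response)
```

The output is a 4×4 grid: the two middle rows are 27 (the window straddles the edge, so the filter "fires") and the top and bottom rows are 0 (the window sits on a flat region, so there is nothing to find).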

Share this diagram to show how a vision model builds up in stages — from edges, to shapes, to the whole object and its label:

[Diagram: pipeline of a convolutional neural network. An input image flows left to right through a first stage detecting simple edges and dots, a middle stage combining them into shapes such as corners and curves, and a final stage assembling shapes into whole objects that produce the predicted label.]

Ask: "Why start with edges instead of jumping straight to 'that's a cat'?" (Answer: every object is built from simple edges and curves, so learning those first lets the same building blocks recognise many objects.)

⚠ Watch for: students imagine the computer stores a "cat photo" to compare against. It doesn't — it learned patterns of numbers (edges → shapes → objects) from many examples, exactly like the pattern-learning from Unit 1.


0:33 · Activity — Inspect a real image dataset (25 min)

Have students open a new notebook of their own in Google Colab and explore the same handwritten-digits dataset. Screen-share your notebook as a model and build it with them.

Type/run this together in Colab:

from sklearn.datasets import load_digits
import matplotlib.pyplot as plt

digits = load_digits()

print("How many images:", len(digits.images))
print("Size of one image:", digits.images[0].shape)   # (8, 8) = 8 rows, 8 columns

Then show a wall of examples so they feel the dataset:

for i in range(8):
    plt.subplot(1, 8, i + 1)
    plt.imshow(digits.images[i], cmap="gray")
    plt.axis("off")
    plt.title(digits.target[i])
plt.show()

Have them try these on their own and report answers in the chat:

Circulate (or watch the chat) for the common mix-up: digits.images holds each picture as an 8×8 grid, while digits.data holds the same image flattened into 64 numbers in a row. They'll use data next session to train a model.
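If the images-versus-data mix-up comes up, a short check like this sketch can settle it on screen. The variable names flattened and same are only for this example.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

print("images shape:", digits.images.shape)   # (number of images, 8, 8)
print("data shape:  ", digits.data.shape)     # (number of images, 64)

# Flatten the first 8x8 grid into one row of 64 numbers
flattened = digits.images[0].reshape(-1)

# True if the flattened grid equals the matching row of digits.data
same = np.array_equal(flattened, digits.data[0])
print("Same numbers, different layout:", same)
```

The point to land: nothing is lost by flattening, the 64 numbers are simply laid out in one row instead of 8 rows of 8.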


0:58 · Check for understanding (10 min)

Ask these aloud or drop them in the chat. Answer key (for you):

  1. What is an image, to a computer? → A grid of numbers — one brightness number per pixel. There's no picture inside, only numbers.
  2. What is a feature? → A useful clue measured from the image (e.g. total ink, an edge, a curve) that helps tell things apart.
  3. What does a CNN do, in order? → It stacks filters: finds edges first, combines them into shapes, then into whole objects.

1:08 · Wrap-up + homework (7 min)


Teaching notes

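import numpy as np
import matplotlib.pyplot as plt

darker = np.clip(first - 4, 0, 16)   # subtract 4 from every brightness number, never going below 0
# fix the scale at 0 to 16 (this dataset's range); otherwise imshow rescales automatically and the picture looks unchanged
plt.imshow(darker, cmap="gray", vmin=0, vmax=16)
plt.show()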

Ask what changed and why: it is proof that editing an image is just doing arithmetic on the numbers.

- Low-tech fallback: if devices can't run Colab, draw a 5×5 grid on the shared screen, fill some squares with numbers to "draw" a letter, and have students read the shape out of the numbers. Then reveal that a real image is exactly this, just bigger.
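If you later want to turn the low-tech fallback into code, the sketch below is one way to do it in plain Python with no libraries. The grid values and the render helper are invented for this demo; the numbers happen to trace a capital T.

```python
# A hand-made 5x5 "image": 9 = bright ink, 0 = dark background.
# The numbers are made up for this demo and trace a capital T.
grid = [
    [9, 9, 9, 9, 9],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
]

def render(grid):
    """Turn each number into a character: '#' for ink, '.' for background."""
    return ["".join("#" if value > 0 else "." for value in row) for row in grid]

# Print the numbers and the picture side by side, one row at a time
for numbers, picture in zip(grid, render(grid)):
    print(numbers, "  ", picture)
```

Students can edit the numbers to draw their own letter and swap grids with a partner to read each other's shapes.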

Vocabulary

Term          Meaning
Pixel         One dot of an image, storing a brightness number
Grid / array  Rows and columns of numbers making up an image
Feature       A useful clue measured from the image
CNN           A model that stacks filters: edges → shapes → objects
Filter        A tiny grid that slides across an image to find one pattern

Resources

Practice set

A mix of concept questions and short coding tasks on pixels, features, and CNNs — easy to hard. Use for lab time or homework.

1. Define it: in one sentence, what is a pixel? → The smallest dot of an image; it stores a number for how bright that spot is.

2. Predict the output: what shape does this print, and what do the two numbers mean? → (8, 8) — the image has 8 rows and 8 columns of pixels.

from sklearn.datasets import load_digits
digits = load_digits()
print(digits.images[0].shape)

3. Reasoning: two digits have ink totals of 55 and 12. Which is more likely the fat 8 and which the thin 1? → 55 = the 8 (more ink); 12 = the 1 (less ink).

4. Order the layers: put these CNN stages in the order they happen — whole object, edges, shapes. → edges → shapes → whole object.

5. Write it: print the total ink of the image at index 3 in the digits dataset. → use np.sum:

import numpy as np
from sklearn.datasets import load_digits
digits = load_digits()
# print the total ink of digits.images[3] here

Answer: print(np.sum(digits.images[3]))

6. Reasoning (harder): why is "total ink" a weak feature for telling a 6 from a 9? → They have almost the same amount of ink; ink says nothing about where it is, so the model needs shape/position features too.
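A possible demo for question 6, assuming a rough hand-drawn 6 on a tiny 5×3 grid (the numbers are made up, not taken from the dataset). Turning the grid by 180 degrees with np.rot90 gives a rough 9, and the ink total does not change.

```python
import numpy as np

# A rough hand-drawn "6" on a 5x3 grid (made-up numbers: 1 = ink, 0 = background)
six = np.array([
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
])

# Turning the grid by 180 degrees gives a rough "9"
nine = np.rot90(six, 2)

print(nine)
print("Ink in the 6:", six.sum())
print("Ink in the 9:", nine.sum())
```

Both totals come out as 12, even though the two grids are different pictures: the ink feature cannot see where the ink is.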

7. Count the numbers (hardest): a colour photo is 200×200 pixels with 3 colour channels (Red, Green, Blue). How many brightness numbers is that in total? → 200 × 200 × 3 = 120,000 numbers.
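Students can verify the answer to question 7 with a short sketch like this one. It builds an all-black stand-in for the photo (no real picture is loaded), and the variable name photo is just for this example.

```python
import numpy as np

# An all-black colour image: 200 rows, 200 columns, 3 channels (R, G, B)
photo = np.zeros((200, 200, 3), dtype=np.uint8)

print("Shape:", photo.shape)
print("Total brightness numbers:", photo.size)   # rows * columns * channels
print("One pixel holds:", photo[0, 0])           # three numbers, one per channel
```

The size printed is 120000, matching 200 × 200 × 3, and each pixel now holds three numbers instead of one.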

Going deeper (optional)

For a strong class, make "the model only knows numbers" concrete by editing the picture through the numbers. Flip a digit left-to-right with nothing but array math, then point out that no pixel value has changed, only the positions have moved. (The first image is a 0, which still reads as a 0 when mirrored; an asymmetric digit such as a 3 makes the change in appearance obvious.)

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()
first = digits.images[0]

flipped = first[:, ::-1]        # reverse the columns = mirror the image
plt.imshow(flipped, cmap="gray")
plt.show()

Have them predict what [:, ::-1] does before running it, then explain the result: mirroring an image is just reordering the numbers. Land the point — because a picture is only a grid of numbers, every filter, edit, and effect is arithmetic. This is exactly why a CNN can be trained: patterns in numbers are things math can learn.
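To back up "mirroring is just reordering the numbers" with a check students can run, here is a small sketch. The names same_values and restored are only for this example.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
first = digits.images[0]
flipped = first[:, ::-1]                      # mirror left-to-right

# Same numbers overall: sorting both grids gives identical lists
same_values = np.array_equal(np.sort(first, axis=None), np.sort(flipped, axis=None))

# Flipping twice puts every number back where it started
restored = np.array_equal(flipped[:, ::-1], first)

print("Same set of numbers:", same_values)
print("Flip twice = original:", restored)
```

Both checks print True: the flip moved numbers around without creating or destroying any.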

Common mistakes & fixes

Next session

Session 20 — Build an Image Classifier: students train a real model on these handwritten digits, test it on images it has never seen, and evaluate it honestly.
