
Session 21 — How Machines Read

Duration: 75 min · Format: live online · Ages: 12–15

Session goal: by the end, students can explain how a sentence is split into tokens and counted into numbers, build and test a small sentiment classifier in Python, and name real limits like sarcasm and unseen words.

Before class — prep (5 min)

Open Google Colab → New notebook, ready to screen-share. You'll build the text classifier live. (scikit-learn is already in Colab — no setup.)
Reminder for yourself: last session images became rows of numbers; today words become numbers too, then it's the same .fit()/.predict() recipe from Unit 1.
Optional: if you want the wow-moment at the end, have the Hugging Face pipeline demo (in Going deeper) ready — but it needs a one-time pip install, so test it beforehand.

Agenda

Time	Segment
0:00	Hook — how does a phone know a review is happy? (5 min)
0:05	Teach — text becomes tokens, then numbers (14 min)
0:19	Teach — a bag of words can be classified (13 min)
0:32	Activity — build a sentiment classifier in Colab (26 min)
0:58	Check for understanding (10 min)
1:08	Wrap-up + homework (7 min)

0:00 · Hook (5 min)

Ask the class and take a few answers (chat or unmute):

"An app flags a review as happy or angry before a human reads it. How could a computer possibly 'read' the feeling?"
"A computer only does math on numbers. So how do you turn the sentence 'I love this' into numbers?"

Land it: computers can't read words — but they can count them. Today they'll turn sentences into numbers and train a model to tell happy text from unhappy text — then find exactly where it gets fooled.

0:05 · Teach — Text becomes tokens, then numbers (14 min)

Explain: the first step in almost every language system is tokenizing, which means chopping text into pieces called tokens (here, simply words). Then each token becomes a number the computer can count.

Share this diagram so students can follow how text is split into tokens, counted into numbers, and read by a model that predicts the mood:

[Diagram: a sentence is split into word tokens, counted into a bag-of-words table, then fed to a model that outputs a positive or negative sentiment label.]

Type/run this together in Colab:

text = "I really love this movie"

tokens = text.lower().split()   # lowercase, then split on spaces
print(tokens)                   # ['i', 'really', 'love', 'this', 'movie']
print("Number of tokens:", len(tokens))

Explain each move: lower() so Love and love count as the same word, split() to break on spaces. Now show how a computer turns a whole set of sentences into a table of word counts — the "bag of words":

from sklearn.feature_extraction.text import CountVectorizer

texts = ["I love this", "I hate this"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)

print("Words it found:", vectorizer.get_feature_names_out())
print(counts.toarray())         # one row per sentence, one column per word

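Walk through the grid: each column is a word, each row is a sentence, and each number is how many times that word appeared. Point out that "I" is missing: CountVectorizer lowercases automatically and, by default, skips one-letter tokens, so the columns are just hate, love, and this. The sentence is now just numbers.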
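If students ask what CountVectorizer is doing behind the scenes, this sketch rebuilds the same kind of grid in plain Python. It is a simplified stand-in, not scikit-learn's implementation: `bag_of_words` is a hypothetical helper name, and because it keeps one-letter words and does no punctuation handling, its table has an extra `i` column that CountVectorizer leaves out.

```python
from collections import Counter


def bag_of_words(texts):
    """Build a bag-of-words table by hand: lowercase, split on whitespace, count.

    Simplified on purpose: unlike CountVectorizer's defaults, this keeps
    one-letter words such as "i" and does not strip punctuation.
    """
    token_lists = [text.lower().split() for text in texts]
    # The vocabulary (the columns) is every distinct token, sorted A to Z
    vocabulary = sorted({tok for tokens in token_lists for tok in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Counter returns 0 for a word that is missing from this sentence
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


vocab, rows = bag_of_words(["I love this", "I hate this"])
print(vocab)  # ['hate', 'i', 'love', 'this']
print(rows)   # [[0, 1, 1, 1], [1, 1, 0, 1]]
```

Reading across a row gives one sentence; reading down a column shows how often one word appears in each sentence, exactly like the CountVectorizer grid.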

Ask: "Why lowercase everything first?" (Answer: so Love, love, and LOVE are treated as the same word instead of three different ones.)
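A two-line check makes the answer concrete. This small sketch uses Python's built-in `collections.Counter` on a made-up phrase to show the same word in three spellings before and after `lower()`.

```python
from collections import Counter

words = "Love love LOVE this".split()

# Without lowercasing, the three spellings are counted as three different words
print(Counter(words))                      # 4 distinct keys: 'Love', 'love', 'LOVE', 'this'

# After lower(), they merge into one word with a count of 3
print(Counter(w.lower() for w in words))   # Counter({'love': 3, 'this': 1})
```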

⚠ Watch for the #1 misconception: students think the model understands the words. It doesn't — it only counts them. It has no idea what "love" means; it just learns that the count of certain words goes with "positive."

0:19 · Teach — A bag of words can be classified (13 min)

Explain: once every sentence is a row of word-counts, text classification is the same train/test/.fit() recipe from Unit 1 — the features are just word counts instead of pixels. We give the model labelled examples (positive / negative) and it learns which words lean which way.

Type/run this together in Colab:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["I love this", "This is great", "Absolutely wonderful", "Best day ever",
         "I hate this", "This is terrible", "So boring", "Worst day ever"]
labels = ["positive", "positive", "positive", "positive",
          "negative", "negative", "negative", "negative"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)   # words -> numbers (the features)

model = LogisticRegression()
model.fit(X, labels)                  # learn which words lean positive/negative
print("Trained on", len(texts), "examples.")

Explain that X is the bag-of-words table (the features) and labels is the target: the same X-and-y setup as every model they've built.

Ask: "This model saw only 8 tiny sentences. Do you trust it yet?" (Answer: no — far too little data; a great honesty setup for the activity.)

⚠ Watch for: students assume more clever wording helps the model. What helps is more, varied, labelled examples — the same data lesson from Unit 1, now for text.
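To show what "learning which words lean which way" means without any library, here is a deliberately simple counting rule on the same eight sentences. It is not logistic regression (LogisticRegression finds its word weights by optimisation, not by this subtraction), and `lean` and `predict` are hypothetical helper names. It does make the key point visible, though: the rule only ever sees counts, never meaning.

```python
from collections import Counter

texts = ["I love this", "This is great", "Absolutely wonderful", "Best day ever",
         "I hate this", "This is terrible", "So boring", "Worst day ever"]
labels = ["positive"] * 4 + ["negative"] * 4

# Tally how often each word appears in positive vs negative examples
pos_counts, neg_counts = Counter(), Counter()
for text, label in zip(texts, labels):
    tokens = text.lower().split()
    if label == "positive":
        pos_counts.update(tokens)
    else:
        neg_counts.update(tokens)


def lean(word):
    """Above 0: seen more in positive examples. Below 0: more in negative."""
    return pos_counts[word] - neg_counts[word]


def predict(text):
    """Add up the lean of every word; unknown words contribute 0."""
    score = sum(lean(tok) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unsure"


print(lean("love"), lean("this"), lean("day"))  # 1 0 0
print(predict("I really love this movie"))      # positive
print(predict("This is fantastic"))             # unsure: every word has a lean of 0
```

Notice that "this" and "day" end up neutral simply because they appear equally often on both sides, which is the data, not the meaning, talking.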

0:32 · Activity — Build a sentiment classifier (26 min)

Have students open their own Google Colab → New notebook, build the classifier above, then test it and try to break it. Screen-share and build line by line.

Type/run this together in Colab — predict on brand-new sentences:

new_texts = ["I really love this movie", "This was so boring"]
new_X = vectorizer.transform(new_texts)     # SAME vectorizer, don't refit
print(model.predict(new_X))

Point out the crucial detail: use vectorizer.transform (not fit_transform) on new text, so it uses the same word columns it learned. Then turn students loose to stress-test it and report in the chat:

Try sentences that should be positive/negative and see if it agrees.
Try to fool it with sarcasm: "Oh great, another rainy day". Ask: "What does it say, and is it right? Why does it fail?"
Try a word the model never saw, like "This is fantastic" (if fantastic wasn't in training). Ask: "What happens to an unknown word?" (Answer: it's ignored — the model has no column for it.)

Then check what happens to unknown words. Have them see that words outside the learned vocabulary simply vanish:

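mystery = vectorizer.transform(["This is fantastic and superb"])
print(vectorizer.get_feature_names_out())   # the only words the model knows
print(mystery.toarray())     # only 'this' and 'is' get counts; 'fantastic', 'and', 'superb' have no column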

Circulate for the classic mistakes: calling fit_transform on new text (which re-learns the vocabulary and breaks alignment) and expecting the model to handle words it never trained on.
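To make the fit_transform mistake visible, this plain-Python sketch mimics the fit/transform split with two hypothetical helpers, `fit_vocabulary` and `transform`. It is a simplified stand-in for CountVectorizer (it keeps one-letter words like "i"), not its real code. Here the two tables happen to have the same number of columns, which is the dangerous case: nothing crashes, yet the model would read the counts under the wrong words.

```python
def fit_vocabulary(texts):
    """'Fit': learn the word columns (sorted A to Z) from the given texts."""
    return sorted({tok for text in texts for tok in text.lower().split()})


def transform(texts, vocabulary):
    """'Transform': count only words in a fixed vocabulary; others are dropped."""
    rows = []
    for text in texts:
        tokens = text.lower().split()
        rows.append([tokens.count(word) for word in vocabulary])
    return rows


train_vocab = fit_vocabulary(["I love this", "I hate this"])
print(train_vocab)                                    # ['hate', 'i', 'love', 'this']

# Right: reuse the training vocabulary; the unknown word 'movie' just vanishes
print(transform(["I love this movie"], train_vocab))  # [[0, 1, 1, 1]]

# Wrong: re-learn the vocabulary from the new text
new_vocab = fit_vocabulary(["I love this movie"])
print(new_vocab)                                      # ['i', 'love', 'movie', 'this']
print(transform(["I love this movie"], new_vocab))    # [[1, 1, 1, 1]]
# Same width, but column 0 now counts 'i' while the trained model reads it as 'hate'
```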

0:58 · Check for understanding (10 min)

Ask these aloud or drop them in the chat. Answer key (for you):

What is a token, and what's the first step to "read" text? → A token is a piece of text (here, a word); the first step is tokenizing — splitting text into tokens.
How does "I love this" become numbers? → Bag of words — count how many times each known word appears; each count is a feature.
Name one honest limit of this model. → e.g. sarcasm, unknown words it never saw, tiny/biased training data, or it ignores word order.

1:08 · Wrap-up + homework (7 min)

Ask one student to explain, in their own words, why the model doesn't actually understand the sentence.
Homework — Break your classifier: find 3 sentences your model gets wrong. For each, write one line on why — sarcasm? an unknown word? word order? Then write one sentence: what data would you add to fix it? Bring it to Session 22 — next session you pick vision or text and build your own project.

Teaching notes

Correct this misconception: "the model understands language." It only counts words; it has no meaning, no context, no idea what a word refers to.
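fit_transform vs transform: call fit_transform once on the training texts to learn the vocabulary, then transform on all new text to reuse the same columns. Refitting on new text throws away the learned vocabulary: usually predict then fails with a feature-count error, and if the column count happens to match, the predictions are silently wrong. Flag it before they hit it.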
Word order is thrown away: "dog bites man" and "man bites dog" produce the identical bag of words. Mention this as a real limitation and a reason more advanced models (which read order) exist.
Fast finishers (extension) — measure it, then peek inside: real evaluators split text data and check accuracy too. Have them build a bigger labelled list (12–20 sentences), do a train/test split, and print accuracy — then read which words the model treats as most positive/negative:

import numpy as np

words = vectorizer.get_feature_names_out()
weights = model.coef_[0]                       # > 0 pushes toward 'positive', < 0 toward 'negative'
order = np.argsort(weights)
print("Most negative words:", words[order[:3]])
print("Most positive words:", words[order[-3:]])

Ask whether the learned "positive" and "negative" words make sense, and what a weird one reveals about small, biased data (a word looks positive only because it happened to sit in positive examples). This ties straight back to Unit 1's bias lesson.
Low-tech fallback: if devices can't run Colab, do bag-of-words on the shared screen. Tally word counts for two happy and two angry sentences by hand, then have students "predict" a new sentence by which word-counts it matches. Reveal that scikit-learn does exactly this counting.
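For the fast finishers' "print accuracy" step, it can help to show that accuracy is just the fraction of held-out sentences the model labels correctly. This sketch computes it by hand on made-up labels (purely illustrative, not results from the class model); `accuracy` is a hypothetical helper, while scikit-learn's `accuracy_score` in `sklearn.metrics` reports the same fraction.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of examples where the prediction matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("need exactly one prediction per true label")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up test-set results for four held-out sentences
true_labels = ["positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "positive", "positive"]
print(accuracy(true_labels, predicted))  # 0.75, i.e. 3 of 4 correct
```

With only a handful of test sentences, one mistake moves accuracy by a big step, which is another honest reason to want more data.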

Vocabulary

Term	Meaning
Token	A piece of text, usually a word
Tokenize	Split text into tokens
Bag of words	Counting how often each word appears, ignoring order
Sentiment	Whether text is positive or negative
Vectorizer	The tool that turns text into number counts

Resources

Google Colab — where you build it all (free).
scikit-learn — text feature extraction — how CountVectorizer works.
Hugging Face — pipelines — a free, one-line sentiment model (see Going deeper).
Kaggle — Natural Language Processing — free next-step lessons.

Practice set

A mix of concept questions and short coding tasks on tokens, bag of words, and honest limits — easy to hard. Use for lab time or homework.

1. Define it: what does it mean to tokenize a sentence? → Split it into pieces (tokens) — here, individual words.

2. Predict the output: what does this print? → ['i', 'love', 'pizza'] — lowercased and split on spaces.

print("I Love pizza".lower().split())

3. Reasoning: why do we lower() text before counting words? → So Love, love, and LOVE count as the same word, not three different ones.

4. Read the bag: for the sentences ["good good movie", "bad movie"], the word movie appears in both. In the counts table, what number sits in the movie column for each row? → 1 and 1 — it appears once in each sentence.

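5. Fix the bug: why does predicting on new text with fit_transform misbehave? → fit_transform re-learns the vocabulary from the new text and overwrites the trained one, so the columns no longer match what the model learned; here predict raises an error because the new table has far fewer columns. Use vectorizer.transform(...) instead.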

new_X = vectorizer.fit_transform(["I love this"])   # wrong on new text
print(model.predict(new_X))

6. Reasoning (harder): the model gets "Oh great, another Monday" wrong and calls it positive. Why? → It counts the positive word great and can't detect sarcasm — it has no sense of tone or context.

7. Reasoning (hardest): "dog bites man" and "man bites dog" get the exact same bag of words. What limitation does this reveal, and why does it matter? → Bag of words ignores order, so it can't tell who did what — meaning that depends on order is lost.
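Students can confirm the answer to question 7 in a few lines of plain Python: a bag of words is just a tally, and two tallies with the same words and counts compare equal, whatever order the words came in. This sketch uses the built-in `collections.Counter` as the tally.

```python
from collections import Counter

bag_a = Counter("dog bites man".split())
bag_b = Counter("man bites dog".split())

print(bag_a == bag_b)          # True: same words, same counts
print(sorted(bag_a.items()))   # [('bites', 1), ('dog', 1), ('man', 1)]

# The original word lists do differ; the bag simply throws the order away
print("dog bites man".split() == "man bites dog".split())  # False
```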

Going deeper (optional)

For a class that's flying, show a modern model that does handle unseen words and some context — a pretrained sentiment model in one line with Hugging Face. It's free but downloads a model the first time, so run it once yourself before class:

!pip install -q transformers
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I really love this movie"))
print(classifier("Oh great, another rainy day"))    # try to fool it too

Contrast it honestly with their own model: this one was trained on millions of examples, so it knows far more words and some tone — but it's still not perfect (test the sarcasm line and see). Land the lesson: bigger training data buys more coverage, but no text model truly understands — they all have limits worth naming. This is exactly the honesty mindset for their Session 22 project.

Common mistakes & fixes

Mistake: believing the model understands the words. → Fix: it only counts them; it learns which counts go with which label, nothing more.
Mistake: calling fit_transform on new text. → Fix: fit_transform once on training data, then transform on new text so the word columns stay aligned.
Mistake: expecting it to handle words it never trained on. → Fix: unknown words have no column, so they're ignored — add them to the training data to teach them.
Mistake: trusting it on sarcasm or tone. → Fix: bag of words has no sense of tone; sarcasm regularly fools it — name this as a real limit.
Mistake: thinking word order is captured. → Fix: bag of words ignores order — "dog bites man" equals "man bites dog" to the model.

Next session

Session 22 — Your AI Mini-Project & Showcase: students pick vision or text, build their own small classifier, evaluate it honestly, and present it — the build project for this unit.