⏱ 75 min · Live session

Session 9 — Think Like a Researcher

Duration: 75 min · Format: live online

What you'll learn: by the end, you can turn a fuzzy question into a testable hypothesis, design a fair experiment to test it, read a research paper by its four key parts, and tell the difference between a claim that's trustworthy and one that just sounds impressive.

Soft skill focus — Critical thinking

Today you'll also grow your critical thinking. A researcher's real superpower isn't running experiments; it's refusing to be fooled, especially by their own hopes. Every claim you meet today gets the same question: "How would I know if this were wrong?"

Try this: for every result you hear today — yours or anyone else's — say out loud one way it could be misleading (too little data? the wrong comparison? a lucky run?). You're not being negative; you're stress-testing the idea before you trust it.
Think about: "What would have to be true for me to change my mind about this? And has anyone actually checked?"

What you'll need

Google Colab open in a tab — you'll write down an experiment plan, not run a model yet.
The diagram below open so you can trace the research cycle as you read.
One AI claim you've seen online recently ("this model beats humans at X"). You'll test it against today's checklist.

Hook

Two teams announce the same thing: "Our model reads chest X-rays better than doctors." One team is right and about to save lives. One team fooled itself and will waste millions. From the headline alone, they look identical.

The difference is never the excitement — it's the method. Good research isn't a pile of impressive results; it's a chain of careful choices you could check, repeat, and disagree with. Today you learn to see that chain. Once you can, you'll never read an AI claim the same way again — and your own projects stop being lucky and start being real.

Teach — From a fuzzy idea to a testable hypothesis

Research starts with a question, but a vague question can't be answered. "Is this model good?" leads nowhere — good at what, compared to what, measured how?

A research question is specific enough to test. A hypothesis is your predicted answer, stated so clearly that an experiment could prove it wrong.

Fuzzy: "Does more data help?"
Question: "Does training on 10,000 images instead of 1,000 improve accuracy on the same test set?"
Hypothesis: "Accuracy on the same test set will rise by at least 5 percentage points when training data goes from 1,000 to 10,000 images."

Notice the hypothesis names a number and a comparison. That's what makes it testable — after the experiment, you can look and say "yes" or "no", not "sort of".
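To see how mechanical that yes/no check becomes, here is a minimal sketch in Python. The function name `hypothesis_holds` and both accuracy values are invented for illustration; only the 5-point threshold comes from the hypothesis above.

```python
def hypothesis_holds(acc_small: float, acc_large: float,
                     min_gain_points: float = 5.0) -> bool:
    """Return True if accuracy rose by at least `min_gain_points`.

    Both accuracies are percentages measured on the same test set.
    """
    gain = acc_large - acc_small
    return gain >= min_gain_points


# Hypothetical numbers, invented for illustration only.
acc_1k = 81.0    # accuracy (%) after training on 1,000 images
acc_10k = 84.5   # accuracy (%) after training on 10,000 images

print(hypothesis_holds(acc_1k, acc_10k))
```

With these made-up numbers the gain is 3.5 points, so the check prints `False`: accuracy improved, but the hypothesis as written failed. That is a useful result, and it is only possible because the threshold was fixed before looking at the numbers.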

⚠ Watch out: a hypothesis you can't be wrong about is worthless. "AI will change the world" can never fail a test, so it teaches you nothing. If there's no result that would make you say "I was wrong", it's an opinion, not a hypothesis.

Teach — What makes an experiment fair

Diagram: the research cycle (ask, hypothesise, experiment, conclude, then ask again)

An experiment is fair when the only thing that could explain your result is the thing you meant to test. That means:

Change one thing (the variable), hold the rest constant. If you use more data and a bigger model and train longer, and accuracy improves — which one did it? You can't know. That's a confounded experiment.
Compare against a baseline. A number alone ("92% accuracy!") is meaningless until you know what to beat. Random guessing? The old model? A human?
Test on data the model has never seen. Grading a model on its training data is like giving students the exam answers in advance — of course they ace it.
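The three rules above can be sketched as one small experiment. This is a toy under stated assumptions, not a real image model: the data is synthetic, the "model" is a simple threshold, and every name (`make_example`, `train_threshold`, and so on) is made up for illustration. What matters is the shape: a fixed seed, a held-out test set, a baseline, and exactly one thing changing between runs.

```python
import random


def make_example(rng):
    """Synthetic 1-D example: label 1 tends to have a larger feature value."""
    label = 1 if rng.random() < 0.3 else 0           # imbalanced on purpose
    feature = rng.gauss(2.0 if label else 0.0, 1.0)
    return feature, label


def train_threshold(train):
    """Toy 'model': a threshold halfway between the two class means."""
    means = []
    for cls in (0, 1):
        values = [x for x, y in train if y == cls]
        means.append(sum(values) / len(values))
    return (means[0] + means[1]) / 2


def accuracy(threshold, data):
    """Fraction of examples where 'feature above threshold' matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in data)
    return correct / len(data)


rng = random.Random(0)                       # fixed seed: same data every run
pool = [make_example(rng) for _ in range(12_000)]
test_set = pool[:2_000]                      # held out, never used for training
train_pool = pool[2_000:]                    # 10,000 examples for training

# Baseline: always predict the most common label seen in training.
majority = round(sum(y for _, y in train_pool) / len(train_pool))
baseline_acc = sum(y == majority for _, y in test_set) / len(test_set)

# The one variable we change: training-set size. Model and test set stay fixed.
results = {}
for n_train in (1_000, 10_000):
    model = train_threshold(train_pool[:n_train])
    results[n_train] = accuracy(model, test_set)

print(f"baseline: {baseline_acc:.3f}")
for n_train, acc in results.items():
    print(f"train size {n_train}: {acc:.3f}")
```

Because both runs share the same model code, the same seed and the same test set, any gap between the two printed accuracies can only come from the training-set size. The baseline line tells you whether either number is worth getting excited about.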

The cycle in the diagram never really ends: every conclusion raises a sharper question, and you go round again. That loop is research.

Teach — How to read a paper (four parts)

You don't read a research paper front to back like a story. You read its four load-bearing parts, and with practice you can form a first judgement of most papers in about ten minutes:

Title & Abstract — the claim, in one sentence and one paragraph. What do they say they found?
Method — exactly what they did: data, model, how they measured. Could I repeat this?
Results — the numbers and figures. Do the numbers actually support the claim in the abstract?
Limitations / Conclusion — what they admit they didn't show. An honest paper tells you where it's weak.

A claim is trustworthy when the method is clear enough to reproduce, the comparison is fair, the result is tested on unseen data, and the authors are honest about limits. When any of those is missing, stay sceptical — no matter how exciting the headline.
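If you like to keep your notes in Colab, the four conditions above can be written down as a tiny checklist. This is only one possible way to record a review; the class `ClaimReview`, its field names and the example answers are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class ClaimReview:
    """One yes/no answer per trust condition (field names are illustrative)."""
    method_reproducible: bool
    comparison_fair: bool
    tested_on_unseen_data: bool
    honest_about_limits: bool


def missing_conditions(review: ClaimReview) -> list[str]:
    """Return the names of the conditions the claim fails to meet."""
    return [f.name for f in fields(review) if not getattr(review, f.name)]


# Hypothetical review of a made-up claim.
review = ClaimReview(
    method_reproducible=True,
    comparison_fair=False,
    tested_on_unseen_data=True,
    honest_about_limits=False,
)
gaps = missing_conditions(review)
print("stay sceptical:" if gaps else "looks trustworthy", gaps)
```

The code adds nothing clever. Its value is that it forces an explicit yes or no for each condition, so a missing answer can't hide behind an exciting headline.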

Activity — Interrogate a claim

You won't train anything today. Instead you'll practise the researcher's core move: taking a claim apart. Open a Colab notebook (or just use paper) and work through this.

First, pick a claim. Use the AI claim you brought, or this one: "A new chatbot scored 95% on a medical exam, beating most doctors."

Now write short answers to these five questions:

Question & hypothesis: what precise question is this claim answering, and what would the hypothesis have been? (e.g. "The chatbot answers exam questions at least as accurately as licensed doctors.")
The comparison: 95% vs what? What's the baseline — a random guess, last year's model, the average doctor? A number with no baseline is a red flag.
The test set: was it tested on questions it might have already seen during training? (Many exam questions are public and can end up in training data. This is called data leakage, and it can inflate scores without the model getting any better.)
The method: is there enough detail that you could repeat the test? If not, why should you trust it?
The limits: what does "beat doctors on an exam" not prove about real medicine? (Exams aren't patients.)

Write your verdict in one sentence: trustworthy, or not-yet-proven — and the single weakest link in the chain.

You just did what peer reviewers do for a living.
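When you do have access to both the training text and the test questions, the data-leakage question from the activity can be checked in a few lines. The sketch below is deliberately crude: it only catches questions copied word for word, so paraphrased leaks slip through, and for many published models the training data isn't available to check at all. The function names and the miniature example texts are made up.

```python
def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't hide a match."""
    return " ".join(text.lower().split())


def leaked_items(test_questions, training_texts):
    """Return test questions whose normalised text appears in any training text."""
    corpus = [normalise(t) for t in training_texts]
    return [q for q in test_questions
            if any(normalise(q) in doc for doc in corpus)]


# Made-up miniature example.
training_texts = [
    "Forum post: Which vitamin deficiency causes scurvy?  Answer: vitamin C.",
    "Lecture notes on the circulatory system.",
]
test_questions = [
    "Which vitamin deficiency causes scurvy?",
    "Which organ produces insulin?",
]

leaked = leaked_items(test_questions, training_texts)
print(f"{len(leaked)} of {len(test_questions)} test questions appear in the training text")
```

Here one of the two test questions turns up inside a training text, so a correct answer to it could be recall rather than reasoning.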

Check yourself

What's the difference between a research question and a hypothesis? → The question is what you want to find out; the hypothesis is your specific, testable predicted answer — clear enough that an experiment could prove it wrong.
What makes an experiment "fair"? → You change one variable and hold everything else constant, compare against a baseline, and test on unseen data — so the result can only be explained by the thing you tested.
Which four parts do you read to judge a paper fast? → Title/Abstract (the claim), Method (what they did), Results (the numbers), and Limitations (what they admit they didn't show).

Wrap-up

You've learned the shape of real research: a sharp question, a testable hypothesis, a fair experiment, and honest reading. This is the lens for the whole unit — next you'll do it, not just judge it.

Try this at home: find one AI headline this week and run it through the five interrogation questions. Post or note your one-sentence verdict and the weakest link. Do this three times and you'll start spotting weak claims before you finish the headline.

Tips & extra challenges

Watch out: "state of the art" and "beats humans" are marketing phrases as often as they are facts. The claim isn't stronger because the words are bigger — check the method, always.
Want more? Try this: find a real paper on arXiv (search "image classification"), read only the abstract and results figure, and write down the single claim plus the single number that supports it. If you can't find the number, that tells you something.

Vocabulary

Term	Meaning
Hypothesis	A specific, testable predicted answer that an experiment could prove wrong
Variable	The one thing you deliberately change in an experiment
Confounded	When two or more changes happen at once, so you can't tell which caused the result
Baseline	The reference score a new result must beat to be meaningful
Data leakage	When test data sneaks into training, faking a high score

Resources

Google Colab — where you'll write and run everything this unit.
arXiv — the open archive where most AI papers appear first, free to read.
Papers with Code — papers paired with the code and datasets to reproduce them.

Practice set

Practise on your own — work these easy → hard. Answers follow each arrow.

1. Testable or not? Is "AI is amazing" a hypothesis? → No — nothing could ever prove it wrong, so it isn't testable. A hypothesis must name a specific, checkable outcome.

2. Spot the confound. A team uses more data and a bigger model, and accuracy goes up. What's the flaw? → The experiment is confounded — two changes at once, so you can't tell which one caused the improvement.

3. Find the missing baseline. A model "gets 80%". Why isn't that enough to celebrate? → There's no baseline. If a trivial strategy, such as always guessing the most common answer, already scores 78%, then 80% is barely an improvement.

4. Name the parts. Which part of a paper tells you whether you could repeat the experiment? → The Method — it should give the data, model and measurement in enough detail to reproduce.

5. Diagnose the leak. A chatbot scores 95% on a public exam whose questions were on the web for years. Why be suspicious? → Likely data leakage — the questions were probably in its training data, so it may be recalling answers, not reasoning them out.

6. Design a fair test (harder). You think a spam filter works better with 5,000 emails than 500. Write the hypothesis and name what you must hold constant. → Hypothesis: "Accuracy on the same held-out test set rises by at least 5 percentage points when training grows from 500 to 5,000 emails." (Choose your own threshold, but fix it before you run the test.) Hold constant: the model, the test set, and all settings. Change only the training-set size.

Going deeper (optional)

Optional — for when you want to know why one careful experiment beats ten flashy demos.

Why "one variable at a time" really matters. Imagine you tweak five things at once and the score jumps. It feels efficient, but you've learned almost nothing, because any of the five (or a lucky combination) could be responsible, and next time it might not repeat. Researchers call the disciplined alternative an ablation study: start from a working system and remove or change one piece at a time, measuring each effect on its own. It's slower, and it's the most direct way to earn a sentence like "this change caused that improvement." Trustworthy claims about cause and effect are usually paid for in exactly this kind of patience.
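As a rough sketch, an ablation loop looks like this in Python. The `evaluate` function is a stand-in for a real train-and-test run, and its component names and effect sizes are invented purely so the example runs.

```python
def evaluate(config: dict) -> float:
    """Stand-in for a real train-and-test run; the effect sizes are invented."""
    score = 70.0
    if config["data_augmentation"]:
        score += 4.0
    if config["pretrained_weights"]:
        score += 9.0
    if config["learning_rate_schedule"]:
        score += 1.5
    return score


full_system = {
    "data_augmentation": True,
    "pretrained_weights": True,
    "learning_rate_schedule": True,
}

full_score = evaluate(full_system)
effects = {}
for component in full_system:
    # Switch off exactly one piece; everything else stays as in the full system.
    ablated = dict(full_system, **{component: False})
    effects[component] = full_score - evaluate(ablated)

for component, drop in sorted(effects.items(), key=lambda kv: -kv[1]):
    print(f"without {component}: score drops by {drop:.1f} points")
```

In this toy the effects simply add up, so each drop is clean. In a real system the pieces can interact, which is one more reason to measure each one separately instead of guessing.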

Common mistakes & fixes

Mistake: Writing a hypothesis so vague it can't fail. → Fix: add a number and a comparison ("at least 5 points higher than the baseline") so a result could actually contradict it.
Mistake: Getting excited by a single accuracy number. → Fix: always ask "compared to what baseline?" before you judge whether it's good.
Mistake: Testing a model on the data it trained on. → Fix: hold out unseen data for the test. A score on training data tells you almost nothing about how the model handles new examples.
Mistake: Trusting a claim because the paper is long or the words are impressive. → Fix: judge the method and the honesty about limits, not the polish.
Mistake: Changing several things at once to "save time". → Fix: change one variable and hold the rest constant. Otherwise the experiment can't tell you which change caused the result.

What's next

Session 10 — Reproduce a Result: you've learned to judge a claim from the outside. Next you'll test one from the inside — take a real dataset and a stated method, run it yourself in Colab, and see whether your numbers match theirs. Reproducing a result is the truest test of whether it was real.