Ibnovate Course 3 · The Future Builders
⏱ 75 min · Live session

Session 9 — Think Like a Researcher

Duration: 75 min · Format: live online

What you'll learn: by the end, you can turn a fuzzy question into a testable hypothesis, design a fair experiment to test it, read a research paper by its four key parts, and tell the difference between a claim that's trustworthy and one that just sounds impressive.

Soft skill focus — Critical thinking

Today you'll also grow your critical thinking. A researcher's real superpower isn't running experiments; it's refusing to be fooled, especially by their own hopes. Every claim you meet today gets the same question: "How would I know if this were wrong?"

What you'll need

Hook

Two teams announce the same thing: "Our model reads chest X-rays better than doctors." One team is right and about to save lives. One team fooled itself and will waste millions. From the headline alone, they look identical.

The difference is never the excitement — it's the method. Good research isn't a pile of impressive results; it's a chain of careful choices you could check, repeat, and disagree with. Today you learn to see that chain. Once you can, you'll never read an AI claim the same way again — and your own projects stop being lucky and start being real.

Teach — From a fuzzy idea to a testable hypothesis

Research starts with a question, but a vague question can't be answered. "Is this model good?" leads nowhere — good at what, compared to what, measured how?

A research question is specific enough to test. A hypothesis is your predicted answer, stated so clearly that an experiment could prove it wrong.

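For example: "Accuracy on the same held-out test set rises when training grows from 500 to 5,000 emails." Notice the hypothesis names numbers and a comparison. That's what makes it testable: after the experiment, you can look and say "yes" or "no", not "sort of".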

⚠ Watch out: a hypothesis you can't be wrong about is worthless. "AI will change the world" can never fail a test, so it teaches you nothing. If there's no result that would make you say "I was wrong", it's an opinion, not a hypothesis.
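One way to feel the difference in a Colab cell is to write a hypothesis as a check that is able to come back False. This is only a sketch: the function name `hypothesis_holds`, the 2-point threshold, and the accuracy figures are all invented for illustration, not taken from any real study.

```python
def hypothesis_holds(new_accuracy: float, baseline_accuracy: float, min_gain: float) -> bool:
    """Return True only if the new result beats the baseline by at least min_gain."""
    return (new_accuracy - baseline_accuracy) >= min_gain


# Hypothetical hypothesis: "The new model beats the old one by at least
# 2 percentage points on the same held-out test set."
# All numbers below are made up for the example.
print(hypothesis_holds(new_accuracy=0.91, baseline_accuracy=0.88, min_gain=0.02))   # True
print(hypothesis_holds(new_accuracy=0.885, baseline_accuracy=0.88, min_gain=0.02))  # False
```

Try writing "AI will change the world" in this form. There is no number to compare and no result that returns False, which is exactly why it isn't a hypothesis.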

Teach — What makes an experiment fair

Figure: The research cycle (ask, hypothesise, experiment, conclude, then ask again).

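An experiment is fair when the only thing that could explain your result is the thing you meant to test. That means:

  1. Change one variable and hold everything else constant.
  2. Compare against a baseline.
  3. Test on unseen data.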

The cycle in the diagram never really ends: every conclusion raises a sharper question, and you go round again. That loop is research.
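If you keep your experiment settings in a dictionary, you can check the "only one thing changed" rule automatically. The sketch below assumes that habit; the helper `changed_variables` and the setting names (`model`, `train_size`, `test_set`, `seed`) are hypothetical, chosen just for this example.

```python
def changed_variables(run_a: dict, run_b: dict) -> list:
    """List the settings that differ between two experiment runs."""
    keys = sorted(set(run_a) | set(run_b))
    return [k for k in keys if run_a.get(k) != run_b.get(k)]


# Hypothetical run settings, invented for illustration.
baseline_run = {"model": "small", "train_size": 500, "test_set": "held_out_v1", "seed": 0}
fair_run = {**baseline_run, "train_size": 5000}
confounded_run = {**baseline_run, "train_size": 5000, "model": "large"}

print(changed_variables(baseline_run, fair_run))        # ['train_size']
print(changed_variables(baseline_run, confounded_run))  # ['model', 'train_size']
```

One changed setting means the comparison can be fair. Two or more means any difference in the score has more than one possible explanation.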

Teach — How to read a paper (four parts)

You don't read a research paper front to back like a story. You read its four load-bearing parts, and you can judge most papers in ten minutes:

  1. Title & Abstract — the claim, in one sentence and one paragraph. What do they say they found?
  2. Method — exactly what they did: data, model, how they measured. Could I repeat this?
  3. Results — the numbers and figures. Do the numbers actually support the claim in the abstract?
  4. Limitations / Conclusion — what they admit they didn't show. An honest paper tells you where it's weak.

A claim is trustworthy when the method is clear enough to reproduce, the comparison is fair, the result is tested on unseen data, and the authors are honest about limits. When any of those is missing, stay sceptical — no matter how exciting the headline.
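The four trust conditions above can be kept as a small checklist while you read. The sketch below is one possible way to record it; the class `ClaimCheck`, its field names, and the example answers are assumptions made up for this illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ClaimCheck:
    method_reproducible: bool    # Could someone repeat the test from the Method section?
    comparison_fair: bool        # Is there a sensible baseline, measured the same way?
    tested_on_unseen_data: bool  # Was the test data kept out of training?
    limits_stated: bool          # Do the authors admit what they didn't show?


def missing_links(check: ClaimCheck) -> list:
    """Return the names of the trust conditions that are not met."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


# Hypothetical review of an imaginary paper; the answers are invented.
review = ClaimCheck(
    method_reproducible=True,
    comparison_fair=True,
    tested_on_unseen_data=False,
    limits_stated=False,
)
gaps = missing_links(review)
print("trustworthy" if not gaps else f"stay sceptical, missing: {gaps}")
```

The code doesn't judge the paper for you. It only makes sure you answered all four questions before deciding.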

Activity — Interrogate a claim

You won't train anything today — you'll practise the researcher's core move: taking a claim apart. Open a Colab notebook (or just paper) and work through this.

First, pick a claim. Use the AI claim you brought, or this one: "A new chatbot scored 95% on a medical exam, beating most doctors."

Now write short answers to these five questions:

  1. Question & hypothesis: what precise question is this claim answering, and what would the hypothesis have been? (e.g. "The chatbot answers exam questions at least as accurately as licensed doctors.")
  2. The comparison: 95% vs what? What's the baseline — a random guess, last year's model, the average doctor? A number with no baseline is a red flag.
  3. The test set: was it tested on questions it might have already seen during training? (Many exam questions are public and end up in training data — this is called data leakage, and it fakes high scores.)
  4. The method: is there enough detail that you could repeat the test? If not, why should you trust it?
  5. The limits: what does "beat doctors on an exam" not prove about real medicine? (Exams aren't patients.)

Write your verdict in one sentence: trustworthy, or not-yet-proven — and the single weakest link in the chain.

You just did what peer reviewers do for a living.
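If you chose the Colab route, question 3 (the test set) can be tried in code. The sketch below is a deliberately crude leakage check: it only catches test questions that appear word-for-word in the training text after lowercasing and whitespace clean-up. Real leakage checks are harder, because questions are often reworded. The helper names and the sample questions are invented for illustration.

```python
def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't hide a match."""
    return " ".join(text.lower().split())


def leaked_fraction(test_questions: list, training_texts: list) -> float:
    """Fraction of test questions that also appear (exact match) in the training texts."""
    if not test_questions:
        return 0.0
    seen = {normalise(t) for t in training_texts}
    leaked = sum(1 for q in test_questions if normalise(q) in seen)
    return leaked / len(test_questions)


# Made-up example data.
training_texts = ["What is the normal resting heart rate?", "Name the largest artery."]
test_questions = ["what is the normal  resting heart rate?", "Which organ filters blood?"]

print(leaked_fraction(test_questions, training_texts))  # 0.5
```

A result above zero means at least part of the score could be memory rather than reasoning. A result of zero from this exact-match check does not prove the test set is clean.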

Check yourself

  1. What's the difference between a research question and a hypothesis? → The question is what you want to find out; the hypothesis is your specific, testable predicted answer — clear enough that an experiment could prove it wrong.
  2. What makes an experiment "fair"? → You change one variable and hold everything else constant, compare against a baseline, and test on unseen data — so the result can only be explained by the thing you tested.
  3. Which four parts do you read to judge a paper fast? → Title/Abstract (the claim), Method (what they did), Results (the numbers), and Limitations (what they admit they didn't show).

Wrap-up

You've learned the shape of real research: a sharp question, a testable hypothesis, a fair experiment, and honest reading. This is the lens for the whole unit — next you'll do it, not just judge it.

Tips & extra challenges

Vocabulary

| Term | Meaning |
| --- | --- |
| Hypothesis | A specific, testable predicted answer that an experiment could prove wrong |
| Variable | The one thing you deliberately change in an experiment |
| Confounded | When two changes happen at once, so you can't tell which caused the result |
| Baseline | The reference score a new result must beat to be meaningful |
| Data leakage | When test data sneaks into training, faking a high score |
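To make "Baseline" concrete: one common reference score is what you would get by always predicting the most frequent label. The sketch below computes that for a made-up set of labels. The function name `majority_baseline`, the label counts, and the reported model score are all assumptions for illustration.

```python
from collections import Counter


def majority_baseline(labels: list) -> float:
    """Accuracy you would get by always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Made-up test labels: 78 of 100 emails are not spam.
test_labels = ["not spam"] * 78 + ["spam"] * 22
baseline = majority_baseline(test_labels)
model_accuracy = 0.80  # hypothetical reported score

print(f"baseline: {baseline:.2f}, model: {model_accuracy:.2f}, gain: {model_accuracy - baseline:.2f}")
# baseline: 0.78, model: 0.80, gain: 0.02
```

Seen next to its baseline, a score of 80% turns into a gain of 2 points, which is a much smaller claim.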

Resources

Practice set

Practise on your own — work these easy → hard. Answers follow each arrow.

1. Testable or not? Is "AI is amazing" a hypothesis? → No — nothing could ever prove it wrong, so it isn't testable. A hypothesis must name a specific, checkable outcome.

2. Spot the confound. A team uses more data and a bigger model, and accuracy goes up. What's the flaw? → The experiment is confounded — two changes at once, so you can't tell which one caused the improvement.

3. Find the missing baseline. A model "gets 80%". Why isn't that enough to celebrate? → There's no baseline: if a trivial strategy, such as always guessing the most common answer, already scores 78%, then 80% is barely better than nothing.

4. Name the parts. Which part of a paper tells you whether you could repeat the experiment? → The Method — it should give the data, model and measurement in enough detail to reproduce.

5. Diagnose the leak. A chatbot scores 95% on a public exam whose questions were on the web for years. Why be suspicious? → Likely data leakage — the questions were probably in its training data, so it may be recalling answers, not reasoning them out.

6. Design a fair test (harder). You think a spam filter works better with 5,000 emails than 500. Write the hypothesis and name what you must hold constant. → Hypothesis: "Accuracy on the same held-out test set rises when training grows from 500 to 5,000 emails." Hold constant: the model, the test set, and all settings — change only the training-set size.
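Here is how question 6 might look as a runnable toy experiment. Everything in it is an assumption made for illustration: the emails are synthetic and generated from a seed, the word lists are invented, and the word-weight classifier is a simple stand-in, not a recommended spam filter. The point is the design: the test set, the model code, and the settings stay fixed, and only the training-set size changes.

```python
import math
import random

SPAM_WORDS = ["prize", "winner", "free", "urgent"]
HAM_WORDS = ["meeting", "homework", "lunch", "report"]


def make_dataset(n, seed):
    """Generate n fake (words, is_spam) emails; the same seed gives the same data."""
    rng = random.Random(seed)
    emails = []
    for _ in range(n):
        is_spam = rng.random() < 0.5
        typical, other = (SPAM_WORDS, HAM_WORDS) if is_spam else (HAM_WORDS, SPAM_WORDS)
        # Each of the 3 words comes from the "typical" list 80% of the time (noisy on purpose).
        words = [rng.choice(typical if rng.random() < 0.8 else other) for _ in range(3)]
        emails.append((words, is_spam))
    return emails


def train(emails):
    """Learn one weight per word: positive leans spam, negative leans not-spam."""
    spam_counts, ham_counts = {}, {}
    for words, is_spam in emails:
        target = spam_counts if is_spam else ham_counts
        for word in words:
            target[word] = target.get(word, 0) + 1
    vocab = set(spam_counts) | set(ham_counts)
    return {w: math.log((spam_counts.get(w, 0) + 1) / (ham_counts.get(w, 0) + 1)) for w in vocab}


def accuracy(weights, emails):
    """Share of emails where the summed word weights point to the right label."""
    correct = sum(
        (sum(weights.get(w, 0.0) for w in words) > 0) == is_spam for words, is_spam in emails
    )
    return correct / len(emails)


# Held constant: the model code, the data generator, and the held-out test set.
TEST_SET = make_dataset(1000, seed=999)
TRAIN_POOL = make_dataset(5000, seed=1)

# Changed: only the training-set size.
results = {size: accuracy(train(TRAIN_POOL[:size]), TEST_SET) for size in (500, 5000)}
print(results)
```

Whatever numbers you get, compare them honestly. If accuracy does not rise from 500 to 5,000, the hypothesis was wrong for this setup, and that is still a real result.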

Going deeper (optional)

Optional — for when you want to know why one careful experiment beats ten flashy demos.

Why "one variable at a time" really matters. Imagine you tweak five things at once and the score jumps. It feels efficient — but you've learned almost nothing, because any of the five (or a lucky combination) could be responsible, and next time it might not repeat. Researchers call the disciplined alternative an ablation study: start from a working system and remove or change one piece at a time, measuring each effect on its own. It's slower, and it's the only way to earn a sentence like "this change caused that improvement." Every trustworthy result you've ever read was paid for in exactly this patience.
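The loop behind an ablation study is short enough to sketch. In the example below, `score` is a stand-in for "train and evaluate the system", and the component names and their contributions are invented numbers. The toy also assumes contributions simply add up, which real systems often do not.

```python
def score(components: set) -> int:
    """Stand-in for 'train and evaluate'; returns a made-up accuracy in percent."""
    contribution = {"more_data": 4, "bigger_model": 1, "data_cleaning": 0}
    return 80 + sum(contribution[c] for c in components)


full_system = {"more_data", "bigger_model", "data_cleaning"}
full_score = score(full_system)

# Remove one component at a time and record how much the score drops.
drops = {}
for component in sorted(full_system):
    ablated_score = score(full_system - {component})
    drops[component] = full_score - ablated_score
    print(f"without {component}: {ablated_score} (drop of {drops[component]})")
# without bigger_model: 84 (drop of 1)
# without data_cleaning: 85 (drop of 0)
# without more_data: 81 (drop of 4)
```

In this made-up case, most of the improvement came from the extra data, and one component contributed nothing. Changing all three at once would never have shown that.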

Common mistakes & fixes

What's next

Session 10 — Reproduce a Result: you've learned to judge a claim from the outside. Next you'll test one from the inside — take a real dataset and a stated method, run it yourself in Colab, and see whether your numbers match theirs. Reproducing a result is the truest test of whether it was real.
