Ibnovate Course 3 · The Future Builders
⏱ 75 min · Live session

Session 10 — Reproduce a Result

Duration: 75 min · Format: live online

What you'll learn: by the end, you can take a claimed result, load the data, run the stated method yourself in Colab, compare your numbers to theirs, and report honestly what you found — including when it doesn't match.

Soft skill focus — Resilience

Today you'll also grow Resilience. Reproduction is where research gets humbling: your first run will probably not match, the code will throw errors, and the temptation to fudge the numbers is real. Resilience is staying honest and steady through all of that.

What you'll need

Hook

In 2015, a large team of researchers published their attempt to reproduce 100 published psychology experiments. Fewer than half came out the same. It shook the field, and it's why "reproducibility" became one of the most important words in science, AI included.

A result that only works once, in one lab, on one lucky run, isn't knowledge — it's an anecdote. The way you tell the difference is simple and brutal: do it again yourself. Today you reproduce a real machine-learning result from scratch, and you'll feel exactly why this is the truest test there is.

Teach — What reproducibility actually means

A result is reproducible if someone else, following the same method on the same data, gets the same answer. It sounds obvious, but it's the thing most claims quietly fail.
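
To make that definition concrete, here is a minimal sketch that runs the same method on the same data twice and checks that the answer is identical. The helper name run_once is hypothetical, and the 80/20 split and seed of 42 are illustrative choices, not part of any official recipe.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_once(seed):
    """Split Iris with the given seed, fit logistic regression, return test accuracy."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)
    return float(accuracy_score(y_test, model.predict(X_test)))


# Same method, same data, same seed: the two numbers should be identical.
first = run_once(seed=42)
second = run_once(seed=42)
print("run 1:", first)
print("run 2:", second)
print("identical:", first == second)
```

If the last line ever prints False, something in the pipeline is uncontrolled, and that is the first thing to track down.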

Three habits make your own work reproducible:

Teach — Control your variables

A fair test: change one thing, keep everything else the same

Reproducing isn't just re-running — it's re-running fairly. When you compare "their method" to "another method", the only thing allowed to differ is the method. Same data, same split, same seed, same test set. Change one thing; hold everything else constant.

If you change the method and the train/test split at the same time and the score moves, you've learned nothing about the method. This is the single most common way people fool themselves — and the fair-test diagram is your guard against it.
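
A fair test can be sketched in a few lines. The example below assumes the "other method" is a k-nearest-neighbours classifier, which is only an illustrative choice. The point is the structure: the split is made once and shared, and the model is the only thing that changes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One split, made once, shared by every method under test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The ONLY thing that differs between runs is the model.
methods = {
    "logistic regression": LogisticRegression(max_iter=200),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),  # illustrative choice
}

scores = {}
for name, model in methods.items():
    model.fit(X_train, y_train)
    scores[name] = float(accuracy_score(y_test, model.predict(X_test)))
    print(f"{name}: {scores[name]:.4f}")
```

Because both models see exactly the same training rows and are scored on exactly the same test rows, any gap between the two scores can be attributed to the method rather than to the split.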

Teach — Honest reporting when it doesn't match

Here's the rule that separates scientists from salespeople: if your number doesn't match the claim, you report your number — not theirs.

A mismatch isn't failure. It's information. Maybe they used a different data version, a preprocessing step they didn't mention, or a lucky seed. Your job is to state clearly: "The paper claims X. Following their stated method, I got Y. Here is one likely reason for the gap." That single honest paragraph is worth more than any number you could have faked.
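
One way to keep yourself honest is to let code write the report sentence, so the number in it is always the one you measured. This is a rough sketch: the function name reproduction_report and the 2-point tolerance are assumptions for illustration. Whatever tolerance you use, pick it before you run the experiment.

```python
def reproduction_report(claimed, reproduced, tolerance=0.02,
                        likely_reason="not yet investigated"):
    """Build an honest one-paragraph report from the claimed and measured accuracy.

    claimed and reproduced are fractions (0.95 means 95%).
    tolerance is an assumed threshold for calling the result a match.
    """
    gap = reproduced - claimed
    verdict = "held" if abs(gap) <= tolerance else "did not hold"
    return (
        f"The paper claims {claimed:.1%}. Following their stated method, "
        f"I got {reproduced:.1%} (gap {gap:+.1%}). "
        f"The claim {verdict} within a tolerance of {tolerance:.0%}. "
        f"Likely reason for the gap: {likely_reason}."
    )


# Example: expected 95%, measured 91%.
print(reproduction_report(0.95, 0.91, likely_reason="a different train/test split"))
```

The function never sees the number you hoped for, only the claim and the measurement, which is exactly the discipline the rule above asks for.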

⚠ Watch out: never tweak your experiment after seeing the answer just to hit the number you wanted (people call this "p-hacking" or "shopping for a result"). Decide your method first, run it once, and report what came out — even if it's disappointing.

Activity — Reproduce a classic claim

The claim you'll test: "On the classic Iris flower dataset, a simple logistic regression classifier reaches around 95% accuracy." Let's find out if that holds up when you run it.

First, load the data and check what you've got:

from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

print("shape:", X.shape)          # how many samples, how many features?
print("classes:", iris.target_names)
print(X.head())

Now run the stated method — with the variables controlled:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Fix the split AND the seed so this is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

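# Note: random_state only affects the 'sag', 'saga' and 'liblinear' solvers.
# With the default solver it is harmless; the split seed does the real work here.
model = LogisticRegression(max_iter=200, random_state=42)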
model.fit(X_train, y_train)

preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
print("my reproduced accuracy:", round(acc, 4))

Now compare and report honestly:

  1. What accuracy did you get? Write it down to 4 decimals.
  2. Does it match the claim of "around 95%"? (It should land close — but note your exact number, not "about right".)
  3. Change only the seed to random_state=0 in both the split and the model, and re-run. Did the number move? By how much? This shows you why fixing the seed matters: a different seed produces a different train/test split, so the "result" wobbles from run to run unless the seed is pinned.
  4. Write one sentence: "The claim was ~95%. I reproduced X% with seed 42. It held / didn't hold because ___."

You just did real reproduction — the same act that keeps the whole field honest.

Check yourself

  1. What does it mean for a result to be reproducible? → Someone else following the same method on the same data gets the same answer — so it's real knowledge, not a lucky one-off.
  2. Why set a random seed? → It fixes the randomness (shuffling, initial weights) so every run is identical and your reported number is stable and repeatable.
  3. Your number doesn't match the claim — what do you do? → Report your number honestly, then investigate why it differs (data version, preprocessing, seed). A mismatch is information, not something to hide.

Wrap-up

You reproduced a real result end to end: loaded the data, ran the stated method with controlled variables, compared your number, and reported it honestly. That loop — and the discipline to report what actually happened — is the backbone of trustworthy AI.

Tips & extra challenges

Vocabulary

Term                | Meaning
Reproducibility     | Getting the same result by repeating the same method on the same data
Random seed         | A fixed number that makes randomness identical on every run
Controlled variable | Something you deliberately hold constant so a comparison stays fair
Baseline            | The reference method or score a new result is compared against
p-hacking           | Tweaking an experiment after seeing results to fake a desired number

Resources

Practice set

Practise on your own — work these easy → hard. Answers follow each arrow.

1. Define it. In one sentence, what is reproducibility? → Getting the same result when you repeat the same method on the same data.

2. Why the seed? Your accuracy changes every time you run the cell. What's the one-line fix? → Set a random_state (a random seed) in the split and the model so runs are identical.

3. Spot the unfair test. You compare two models but each got a different train/test split. What's wrong? → The variable isn't controlled — different splits mean any score gap could be the split, not the model. Use the same split and seed.

4. Honest reporting. You expected 95% and got 91%. What do you report? → You report 91% (your real number) and investigate the gap — never the number you wished for.

5. Load and check (code). After importing load_iris from sklearn.datasets, write the two lines that load Iris into X, y and print the shape of X. → X, y = load_iris(return_X_y=True) then print(X.shape). (Any correct load + .shape earns it.)

6. Fair comparison (harder, code). You want to test whether a decision tree beats logistic regression on Iris. Describe the setup that keeps it fair. → Use the same X_train/X_test, same random_state, same test set; change only the model class. Then compare the two accuracy_score values.

Going deeper (optional)

Optional — for when you want to know why one accuracy number is never enough.

The danger of a single run. Accuracy from one train/test split is itself a random draw — a slightly lucky or unlucky slice of the data. That's why a serious reproduction reports accuracy across many splits (you'll meet cross-validation in Session 12), giving a mean and a spread. If a paper's claim is "95%" but across ten seeds you see anywhere from 88% to 96%, the honest reproduction isn't "it matched" — it's "their number sits at the top of a wide range." Learning to report the range, not just the friendliest point in it, is what turns a re-run into real evidence.
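
As a rough sketch of that idea, the loop below repeats the same logistic regression on ten different train/test splits and reports the mean, the spread, and the range. The choice of seeds 0 to 9 is arbitrary and only for illustration. This is repeated splitting, not the cross-validation procedure itself.

```python
import statistics

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

accuracies = []
for seed in range(10):  # ten different splits; the method itself never changes
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)
    accuracies.append(float(accuracy_score(y_test, model.predict(X_test))))

print("per-seed:", [round(a, 4) for a in accuracies])
print("mean:", round(statistics.mean(accuracies), 4))
print("spread (std):", round(statistics.stdev(accuracies), 4))
print("range:", round(min(accuracies), 4), "to", round(max(accuracies), 4))
```

Reporting all four lines, rather than the single best seed, is what lets a reader judge where a claimed number sits within the range you observed.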

Common mistakes & fixes

What's next

Session 11 — AI Ethics, Bias & Safety: you can now test whether a result is true. Next you'll ask a harder question — whether it's fair. Where bias comes from, how to audit a model across different groups of people, and why a technically-accurate model can still do real harm.

Ibnovate · Build · Innovate