Ibnovate Course 2 · The Rising Builders
⏱ 1–2 sessions · Project · ages 12–15

Unit 1 Project — Build a Predictor

Run after: Sessions 1–4 · Time: 1–2 sessions (75 min each) · Ages: 12–15

Project goal: students train and test a machine-learning model on a real dataset, report its accuracy honestly, and name one fairness or bias risk in their data.

What students build

A short, self-contained Google Colab notebook that loads a dataset, trains a prediction model with scikit-learn, tests it on data it has never seen, and reports how accurate it is. The notebook ends with a written reflection on one bias or fairness issue in the data.

This is not about getting the highest score — it is about doing the method correctly and being honest about what the model can and cannot do.

Example ideas (let students choose one, or bring their own):
- Survival predictor: use the classic Titanic passenger dataset to predict who survived from features like age, sex, and ticket class. (Great for the fairness discussion.)
- Flower classifier: use the Iris dataset to predict a flower's species from its petal and sepal measurements.
- Grades predictor: use a small student-performance dataset to predict pass/fail from study hours and attendance.
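For students who pick the flower classifier, one possible starting point is the copy of Iris that ships with scikit-learn, so no file upload is needed. This is only a sketch of one way to get the data into pandas; the variable names are illustrative, and a CSV loaded with pd.read_csv works just as well.

```python
from sklearn.datasets import load_iris

# Iris is bundled with scikit-learn; as_frame=True returns pandas objects
iris = load_iris(as_frame=True)
data = iris.frame                 # four measurements plus a "target" column

print(data.shape)                 # (rows, columns)
print(data.head())                # first five rows

# Features: the petal and sepal measurements. Target: the species code.
feature_names = list(iris.feature_names)
X = data[feature_names]
y = data["target"]                # species encoded as 0, 1, 2
print(list(iris.target_names))    # the species name behind each code
```

From here the notebook can continue with the split, train, and test steps on X and y.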

Steps

  1. Pick a dataset and a question. Decide clearly what you are predicting (the target) and what you are predicting it from (the features). Write the question in one sentence at the top of the notebook.
  2. Load and look at the data with pandas. Show the first rows, count how many rows there are, and note anything strange (missing values, odd numbers).
  3. Split the data into a training set and a test set. The model learns from training data only; the test set is held back to check it fairly.
  4. Train the model by calling fit on the training data.
  5. Test the model by calling predict on the test set and comparing the predictions to the real answers.
  6. Measure accuracy — report the score as a percentage and say in words what it means.
  7. Discuss one fairness issue — look for a group in the data that the model might treat unfairly, and write 3–4 sentences about it.
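Step 2 asks students to note anything strange in the data. A minimal sketch of that check, using a tiny made-up table (the column names and values are invented for illustration, not taken from a real dataset):

```python
import pandas as pd

# Made-up toy rows for illustration only; a real notebook would use pd.read_csv(...)
data = pd.DataFrame({
    "study_hours": [2.0, 5.5, None, 8.0, 1.0, 40.0],   # one gap, one suspicious value
    "attendance":  [0.90, 0.75, 0.60, None, 0.95, 0.80],
    "passed":      [0, 1, 0, 1, 0, 1],
})

print(len(data))              # how many rows there are
print(data.head())            # the first rows

missing = data.isna().sum()   # number of missing values in each column
print(missing)

print(data.describe())        # min and max help spot odd numbers, such as 40 study hours
```

Anything this turns up (a column with gaps, a value far outside the normal range) is worth one sentence in the notebook.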

A minimal scikit-learn skeleton students can adapt:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1–2. Load and look
data = pd.read_csv("dataset.csv")
print(data.shape)   # (rows, columns)
print(data.head())  # print so the rows show even when this is not the last line of the cell

# choose features (X) and target (y)
X = data[["feature_a", "feature_b"]]
y = data["target"]

# 3. Split: train on 80%, test on the held-back 20%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# 4. Train (random_state makes the tree come out the same on every run)
model = DecisionTreeClassifier(max_depth=4, random_state=1)
model.fit(X_train, y_train)

# 5–6. Test and measure (accuracy_score returns a fraction; show it as a percentage)
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.1%}")
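Steps 6 and 7 ask what the accuracy means and whether any group is treated unfairly. One way to approach both, sketched here with made-up answers and predictions (the "group" column and all values are invented for illustration), is to compare the score with an "always guess the most common answer" baseline and then break accuracy down by group:

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Made-up values for illustration; in the notebook, "actual" would be y_test
# and "predicted" would be the model's predictions on the test set
results = pd.DataFrame({
    "group":     ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "actual":    [1, 1, 1, 1, 0, 1, 0, 1, 0, 1],
    "predicted": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
})

overall = accuracy_score(results["actual"], results["predicted"])

# Baseline: the score you would get by always guessing the most common answer
most_common = results["actual"].mode()[0]
baseline = (results["actual"] == most_common).mean()

# Accuracy for each group separately
correct = results["actual"] == results["predicted"]
per_group = correct.groupby(results["group"]).mean()

print(f"Overall accuracy: {overall:.0%}")
print(f"Baseline accuracy: {baseline:.0%}")
print(per_group)
```

In this toy example the model scores no better than the baseline, and it is right much more often for group A than for group B. Either observation would make a good starting point for the written reflection.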

Deliverable

One Colab notebook (shared with view access, or exported as a .ipynb / PDF) that contains, in order:
- the prediction question in one sentence,
- the data loaded and shown with pandas,
- a clear train/test split,
- a trained model with a reported accuracy score,
- a short written fairness note (3–4 sentences) naming one bias risk and who it could affect.

The rubric scores four rising levels:

[Figure: assessment ladder showing the four rubric levels rising from the lowest to the highest]

Assessment rubric

| Criterion | Emerging (1) | Developing (2) | Proficient (3) | Exemplary (4) |
|---|---|---|---|---|
| Data handling (pandas) | Data barely loads; no exploration | Loads data and shows rows | Loads, explores, and notes an issue (e.g. missing values) | Cleans or handles a data problem and explains the choice |
| Train/test method | No split; model tested on training data | Split present but reasoning unclear | Correct train/test split; explains why we hold data back | Explains why testing on unseen data prevents fooling yourself |
| Model & prediction | Model does not run | Model runs but wrong features/target | fit and predict used correctly on the right columns | Tries a setting (e.g. depth) and compares the effect |
| Evaluation honesty | No accuracy reported | Accuracy shown but not interpreted | Accuracy reported and explained in plain words | Discusses when the score is misleading (e.g. imbalanced classes) |
| Fairness / bias reflection | Missing or generic | Mentions bias vaguely | Names a real bias in the data and who it affects | Names the bias, the affected group, and a way to reduce it |
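The Exemplary column asks students to try a setting such as depth and to explain why testing on unseen data matters. A possible sketch of both, assuming the Iris data bundled with scikit-learn and an arbitrary choice of depths to try:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Bundled Iris data, used here only so the example runs without a file
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

results = {}
for depth in [1, 2, 4]:  # the depths are an arbitrary choice for this sketch
    model = DecisionTreeClassifier(max_depth=depth, random_state=1)
    model.fit(X_train, y_train)

    # Score on data the model has seen (train) and data it has not (test)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    results[depth] = {"train": train_acc, "test": test_acc}
    print(f"max_depth={depth}: train {train_acc:.1%}, test {test_acc:.1%}")
```

Students can compare the two columns: the training score shows how well the model fits examples it has already seen, while the test score is the honest estimate to report.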

Instructor tips
