Ibnovate Course 3 · The Future Builders
⏱ 75 min · Live session

Session 11 — AI Ethics, Bias & Safety

Duration: 75 min · Format: live online

What you'll learn: by the end, you can explain where bias creeps into an AI system, run a hands-on check for it, weigh fairness, safety and privacy, and make a clear, evidence-based case about an ethical problem rather than relying on a gut feeling.

Soft skill focus — Communication

Today you'll also grow Communication. Spotting an ethical problem is only half the job — the harder half is making people act on it. That means turning "this feels unfair" into a specific claim, backed by a number, explained so a non-expert gets it in one breath.

What you'll need

Hook

A hospital used an algorithm to decide which patients needed extra care. It was accurate, widely trusted, and quietly wrong: it used past healthcare spending as a stand-in for how sick you are. But society had long spent less on Black patients — so the model learned to rank equally-sick Black patients as healthier, and steered care away from them. Nobody wrote a racist rule. The data carried the bias, and the model faithfully passed it on.

That's the uncomfortable truth of this session: an AI can be technically excellent and ethically harmful at the same time. Accuracy is not fairness. Today you learn where the harm comes from, how to measure it, and how to make the case to fix it.

Teach — Where bias comes from

Bias isn't usually a villain writing a bad rule. It slips in at three points:

  1. Data bias — the training data doesn't represent everyone equally. A face system trained mostly on light-skinned faces fails on dark-skinned ones. The model can only learn from what it's shown.
  2. Label bias — the "correct answers" it learns from were themselves unfair. If past hiring managers rejected qualified women, a model trained on those decisions learns to reject them too. It copies human prejudice and calls it a pattern.
  3. Deployment bias — a model built for one setting is used in another where it doesn't fit. A tool trained on one country's data, rolled out globally, quietly fails the groups it never saw.

[Figure: Bias in, bias out. Unfair training data leads to an unfair model.]

The diagram says it in four words: bias in, bias out. A model is a mirror of its data — polish the model all you like; if the data is skewed, so is the reflection.
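To see data bias in numbers, here is a minimal sketch under invented assumptions: two groups whose true pattern differs (group A's label flips at a feature value of 0, group B's at 1), a training set that is 95% group A, and a deliberately simple "model" that picks one cut-off. None of these numbers come from a real system; they are chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_group(n, boundary):
    """Draw n examples whose true label is 1 when the feature exceeds `boundary`."""
    x = rng.normal(0.0, 1.0, n)
    return x, (x > boundary).astype(int)

# Training set (assumed split): 950 examples from group A, only 50 from group B
xa, ya = make_group(950, boundary=0.0)
xb, yb = make_group(50, boundary=1.0)   # group B follows a different pattern
x_train = np.concatenate([xa, xb])
y_train = np.concatenate([ya, yb])

# "Model": the single cut-off that makes the fewest mistakes on the training set
candidates = np.linspace(-2, 2, 401)
errors = [np.mean((x_train > c).astype(int) != y_train) for c in candidates]
cutoff = float(candidates[int(np.argmin(errors))])

# Balanced test set: 1000 fresh examples from each group
xa_te, ya_te = make_group(1000, boundary=0.0)
xb_te, yb_te = make_group(1000, boundary=1.0)
acc_a = float(np.mean((xa_te > cutoff).astype(int) == ya_te))
acc_b = float(np.mean((xb_te > cutoff).astype(int) == yb_te))

print(f"learned cut-off:  {cutoff:.2f}")
print(f"group A accuracy: {acc_a:.3f}")
print(f"group B accuracy: {acc_b:.3f}")
```

Because group A dominates the training set, the cut-off settles where group A's pattern is, and group B pays for it at test time. Nothing in the code singles out group B; the imbalance alone does the damage.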

Teach — Fairness, safety and privacy

Responsible AI is bigger than bias. Three ideas travel together: fairness, safety and privacy.

Teach — Auditing a model across groups

You can't fix what you don't measure. An audit means: split your test set by group (gender, age, region, skin tone — whatever's relevant), then compute the model's accuracy separately for each group and compare.

The overall number lies by averaging. The per-group numbers tell the truth. If accuracy is 94% for one group and 71% for another, you've found real, reportable evidence of unfairness — exactly the kind of claim-plus-number you can act on.
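A small sketch of how an overall score can hide a gap. The arrays below are made up for illustration (10 test rows per group, with the model wrong on 4 of group B's rows), and `accuracy_by_group` is a hypothetical helper name, not a library function.

```python
import numpy as np

# Hypothetical toy test set: 10 rows from group "A", then 10 rows from group "B"
groups = np.array(["A"] * 10 + ["B"] * 10)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
                   1, 0, 1, 1, 0, 1, 0, 0, 1, 1])

# The model is right on every group A row and wrong on 4 group B rows
y_pred = y_true.copy()
wrong_rows = [10, 12, 15, 18]
y_pred[wrong_rows] = 1 - y_pred[wrong_rows]

def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy}, computed separately for each group value."""
    correct = (y_true == y_pred)
    return {str(g): float(correct[groups == g].mean()) for g in np.unique(groups)}

overall = float((y_true == y_pred).mean())
per_group = accuracy_by_group(y_true, y_pred, groups)

print(f"overall accuracy: {overall:.2f}")
for g, acc in per_group.items():
    print(f"group {g}: accuracy {acc:.2f}")
```

Here the overall figure is 0.80, which sounds acceptable, while the split shows 1.00 for group A and 0.60 for group B. The per-group line is the one worth reporting.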

⚠ Watch out: "we removed the race/gender column, so the model can't be biased" is a trap. Models rediscover protected traits through proxies — a postcode can stand in for race, a first name for gender. Deleting the label doesn't delete the bias; only measuring outcomes across groups reveals it.
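The proxy effect can be sketched with synthetic data. Everything here is an assumption made for illustration: a `postcode` feature that matches the protected group 90% of the time, historical labels that demanded a higher score from group 1, and a logistic regression as the model. The model is never given `group` in either run.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)      # protected attribute, never shown to the model
score = rng.normal(50, 10, n)      # same distribution for both groups
# Hypothetical proxy: postcode matches group 90% of the time
postcode = np.where(rng.random(n) < 0.9, group, 1 - group)
# Biased historical labels: group 1 needed a higher score to be approved
approved = (score >= np.where(group == 1, 60, 45)).astype(int)

z = (score - 50) / 10              # rescale so the solver converges easily
half = n // 2                      # first half trains, second half is audited

def approval_gap(features):
    """Train on the first half; return group 0 rate minus group 1 rate on the second half."""
    model = LogisticRegression(max_iter=1000).fit(features[:half], approved[:half])
    preds = model.predict(features[half:])
    g = group[half:]
    return float(preds[g == 0].mean() - preds[g == 1].mean())

gap_score_only = approval_gap(z.reshape(-1, 1))
gap_with_proxy = approval_gap(np.column_stack([z, postcode]))

print(f"approval gap, score only:       {gap_score_only:+.3f}")
print(f"approval gap, score + postcode: {gap_with_proxy:+.3f}")
```

With score alone the gap should sit near zero, because the model has no way to tell the groups apart. Once the postcode column is available, the model can use it to reconstruct the old double standard, and the gap between groups reappears even though `group` was deleted.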

Activity — Run a bias check

Let's audit a model the honest way — by group. We'll build a tiny loan-approval dataset where a hidden bias lives in the labels, train a model, then check whether it treats two groups equally.

First, build a dataset with a baked-in unfairness:

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 600
# 'group' is a protected attribute (e.g. two communities), 'score' is a fair qualification
group = rng.integers(0, 2, n)
score = rng.normal(50, 10, n)

# Unfair labels: past approvers demanded a HIGHER score from group 1
threshold = np.where(group == 1, 60, 45)
approved = (score >= threshold).astype(int)

data = pd.DataFrame({"group": group, "score": score, "approved": approved})
print(data.groupby("group")["approved"].mean().round(3))  # approval rate per group

Now train a model that only sees score — and audit it by group:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = data[["score"]]           # note: 'group' is NOT given to the model
y = data["approved"]
g = data["group"]

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, g, test_size=0.3, random_state=42
)

model = LogisticRegression().fit(X_tr, y_tr)
preds = model.predict(X_te)

# Audit: approval rate and accuracy PER GROUP
for grp in [0, 1]:
    mask = (g_te == grp)
    rate = preds[mask].mean()
    acc = accuracy_score(y_te[mask], preds[mask])
    print(f"group {grp}: approval rate {rate:.3f}, accuracy {acc:.3f}")

Now make the case:

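  1. Compare the two printouts. The historical labels (first printout) approve the two groups at very different rates. Does the model, which never saw group, do the same? In this dataset score is drawn from the same distribution for both groups, so expect the model's approval rates to land much closer together: it learns a single cut-off somewhere between the two historical thresholds. Its mistakes still fall differently on each group. Measured against the labels, it rejects some group 0 applicants the old approvers accepted and approves some group 1 applicants they rejected. Why? (The unfairness lives in the labels it learned from: bias in, bias out.)
  2. Write the one-sentence evidence-based claim about the historical decisions: "Group ___ was approved ___× less often, because past approvers demanded a higher score from them."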
  3. Suggest one fix and one cost of that fix. (E.g. re-label using a fair threshold — but who decides what's fair?)
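The blank in the claim is a ratio of approval rates, and it can be computed from any array of decisions, whether historical labels or model predictions. A minimal sketch, where `approval_rate_ratio` is a hypothetical helper and the dataset follows the same recipe as the activity:

```python
import numpy as np

def approval_rate_ratio(decisions, groups):
    """Return (rates, ratio): per-group approval rates and highest rate / lowest rate."""
    decisions = np.asarray(decisions)
    groups = np.asarray(groups)
    rates = {int(g): float(decisions[groups == g].mean()) for g in np.unique(groups)}
    lowest = min(rates.values())
    ratio = max(rates.values()) / lowest if lowest > 0 else float("inf")
    return rates, ratio

# Same recipe as the activity's dataset
rng = np.random.default_rng(42)
n = 600
group = rng.integers(0, 2, n)
score = rng.normal(50, 10, n)
approved = (score >= np.where(group == 1, 60, 45)).astype(int)

rates, ratio = approval_rate_ratio(approved, group)
low = min(rates, key=rates.get)   # the group with the lowest approval rate
print(f"Group {low} is approved {ratio:.1f}x less often (rates: {rates})")
```

Passing model predictions and the matching test-set groups instead of `approved` and `group` gives the same statistic for the model, which makes the two easy to compare side by side.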

You just ran a real fairness audit and turned it into an argument.

Check yourself

  1. Name the three places bias enters an AI system. → Data (unrepresentative training data), labels (the "right answers" were themselves unfair), and deployment (a model used where it doesn't fit).
  2. Why isn't overall accuracy enough to call a model fair? → An average hides gaps between groups — a model can score high overall while performing badly for one group. You must check per-group numbers.
  3. Why doesn't deleting the race or gender column fix bias? → Models rediscover those traits through proxies (postcode, name), so the bias survives. Only measuring outcomes across groups reveals it.

Wrap-up

You've learned that accuracy and fairness are different questions, that bias enters through data, labels and deployment, and that the way to prove it is a per-group audit turned into a clear claim. This is what "responsible AI" means in practice — measure, then make the case.

Tips & extra challenges

Vocabulary

Term | Meaning
Bias | A systematic unfairness in a model's behaviour, usually inherited from data or labels
Label bias | When the training "answers" were themselves the product of unfair decisions
Proxy variable | A feature (postcode, name) that secretly stands in for a protected trait
Audit | Measuring a model's performance separately for each group to check fairness
Privacy | Protecting personal data used to train or query a model

Resources

Practice set

Practise on your own — work these easy → hard. Answers follow each arrow.

1. Where's the bias? A face model works on light skin but fails on dark skin. Which source is it? → Data bias — the training set didn't represent darker-skinned faces well enough.

2. Accuracy vs fairness. A model is 96% accurate overall. Is it fair? → Can't tell — you must check per-group accuracy; the average can hide a group being served badly.

3. The proxy trap. A team removes the "gender" column and says the model is now unbiased. Why might they be wrong? → A proxy like first name or profession can still encode gender, so bias can survive — you must measure outcomes, not just delete labels.

4. Make the case. Turn "this hiring model feels unfair" into an evidence-based claim. → Something like: "The model recommends group A twice as often as equally-qualified group B (evidence), so qualified group-B applicants are wrongly filtered out (who it affects)."

5. Label bias. Past managers rejected qualified women; a model trained on those decisions does the same. Which source is this? → Label bias — the "correct answers" it learned from were themselves unfair.

6. Audit it (harder, code). Given predictions preds, true labels y_te, and group array g_te, write the loop that prints accuracy per group. → for grp in set(g_te): m = (g_te==grp); print(grp, accuracy_score(y_te[m], preds[m])). (Any correct per-group masking earns it.)

Going deeper (optional)

Optional — for when you want to know why fairness can't be fully automated.

Why you can't always be fair to everyone at once. There are several reasonable definitions of fairness: equal approval rates across groups, equal accuracy across groups, equal false-positive rates, and more. It's a proven mathematical fact that, when base rates differ between groups, you usually cannot satisfy all of them simultaneously — improving one worsens another. This is why fairness is never a checkbox a script can tick. It forces a human choice about which fairness matters most for this use, and who bears the cost of the trade-off. The engineer's job isn't to find the one true metric; it's to measure the trade-offs clearly and put the value decision in front of the people accountable for it.
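The trade-off can be seen in a worked example. This is one illustrative construction with assumed base rates (60% of group A and 30% of group B are truly qualified), not a proof of the general result. It compares a perfect predictor with a predictor forced to approve both groups at the same rate.

```python
import numpy as np

def metrics(y_true, y_pred):
    """Approval rate, accuracy and false-positive rate for one group."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        "approval": float(y_pred.mean()),
        "accuracy": float((y_true == y_pred).mean()),
        "fpr": float(y_pred[y_true == 0].mean()),  # unqualified people approved
    }

# Assumed base rates: 60 of 100 qualified in group A, 30 of 100 in group B
true_a = np.array([1] * 60 + [0] * 40)
true_b = np.array([1] * 30 + [0] * 70)

# Option 1: a perfect predictor. Accuracy and FPR match, approval rates do not.
perfect_a, perfect_b = metrics(true_a, true_a), metrics(true_b, true_b)

# Option 2: force both approval rates to 45%.
# Group A: approve only 45 of its 60 qualified people.
pred_a = np.array([1] * 45 + [0] * 55)
# Group B: approve all 30 qualified people plus 15 unqualified ones.
pred_b = np.array([1] * 45 + [0] * 55)
equal_a, equal_b = metrics(true_a, pred_a), metrics(true_b, pred_b)

for name, a, b in [("perfect predictor", perfect_a, perfect_b),
                   ("equal approval rates", equal_a, equal_b)]:
    print(name)
    print("  group A:", {k: round(v, 3) for k, v in a.items()})
    print("  group B:", {k: round(v, 3) for k, v in b.items()})
```

The perfect predictor approves 60% of group A and 30% of group B. Forcing both to 45% equalises approval rates and even keeps accuracy equal at 85%, but the false-positive rate moves from 0 for both groups to 0 for group A and about 0.214 for group B, while 15 qualified people in group A are now turned away. Which of these outcomes is acceptable is the human decision the paragraph above describes.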

Common mistakes & fixes

What's next

Session 12 — Evaluate Like a Pro: you've learned to check whether a model is fair. Next you'll master checking whether it's actually good — train/validation/test splits, the confusion matrix, precision and recall versus plain accuracy, cross-validation, and model cards. Honest evaluation is where research and ethics meet.
