Session 11 — AI Ethics, Bias & Safety
Duration: 75 min · Format: live online
What you'll learn: by the end, you can explain where bias creeps into an AI system, run a hands-on check for it, weigh fairness, safety and privacy, and make a clear, evidence-based case about an ethical problem instead of relying on a gut feeling.
Soft skill focus — Communication
Today you'll also practise communication. Spotting an ethical problem is only half the job; the harder half is getting people to act on it. That means turning "this feels unfair" into a specific claim, backed by a number, explained so a non-expert gets it in one breath.
- Try this: every time you sense something's wrong today, force it into this shape: claim + evidence + who it affects. "The model rejects group A twice as often as group B (evidence), so qualified people in group A get wrongly denied (who it affects)." Feelings persuade no one; that sentence does.
- Think about: "If I had 30 seconds to convince a decision-maker this model is unfair, what one number and one sentence would I use?"
What you'll need
- Google Colab open in a tab, ready for a new notebook.
- The diagram below open — you'll refer to "bias in, bias out" throughout.
- A real example in mind: an AI system you've read about that treated some group unfairly. You'll sharpen it into a clear argument.
Hook
A hospital used an algorithm to decide which patients needed extra care. It was accurate, widely trusted, and quietly wrong: it used past healthcare spending as a stand-in for how sick you are. But society had long spent less on Black patients — so the model learned to rank equally-sick Black patients as healthier, and steered care away from them. Nobody wrote a racist rule. The data carried the bias, and the model faithfully passed it on.
That's the uncomfortable truth of this session: an AI can be technically excellent and ethically harmful at the same time. Accuracy is not fairness. Today you learn where the harm comes from, how to measure it, and how to make the case to fix it.
Teach — Where bias comes from
Bias isn't usually a villain writing a bad rule. It slips in at three points:
- Data bias — the training data doesn't represent everyone equally. A face system trained mostly on light-skinned faces fails on dark-skinned ones. The model can only learn from what it's shown.
- Label bias — the "correct answers" it learns from were themselves unfair. If past hiring managers rejected qualified women, a model trained on those decisions learns to reject them too. It copies human prejudice and calls it a pattern.
- Deployment bias — a model built for one setting is used in another where it doesn't fit. A tool trained on one country's data, rolled out globally, quietly fails the groups it never saw.
The diagram says it in four words: bias in, bias out. A model is a mirror of its data — polish the model all you like; if the data is skewed, so is the reflection.
Teach — Fairness, safety and privacy
Responsible AI is bigger than bias. Three ideas travel together:
- Fairness: does the model perform equally well across groups? A model that's 95% accurate overall but 99% for one group and 80% for another is not one model — it's a good service for some people and a bad one for others.
- Safety: what happens when the model is wrong, or is misused? A wrong movie recommendation is harmless; a wrong medical or driving decision is not. Higher stakes demand higher caution.
- Privacy: models trained on personal data can leak it. Was consent given? Could the model reveal something about a real person? Data about people is a responsibility, not just fuel.
Teach — Auditing a model across groups
You can't fix what you don't measure. An audit means: split your test set by group (gender, age, region, skin tone — whatever's relevant), then compute the model's accuracy separately for each group and compare.
The overall number lies by averaging. The per-group numbers tell the truth. If accuracy is 94% for one group and 71% for another, you've found real, reportable evidence of unfairness — exactly the kind of claim-plus-number you can act on.
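To see how an average can hide a gap, here is a minimal sketch using made-up test results. The group sizes (900 and 100) and the column names are assumptions chosen so the per-group numbers match the 94% and 71% mentioned above; they are not real data.

```python
import pandas as pd

# HYPOTHETICAL test results: 900 rows for group A, 100 rows for group B.
# 'correct' is 1 where the model's prediction matched the true label.
results = pd.DataFrame({
    "group": ["A"] * 900 + ["B"] * 100,
    "correct": [1] * 846 + [0] * 54 + [1] * 71 + [0] * 29,
})

# The single overall number, then the same number split by group
overall = results["correct"].mean()
per_group = results.groupby("group")["correct"].mean()

print(f"overall accuracy: {overall:.3f}")
print(per_group.round(3))
```

Because group A supplies nine rows in every ten, the overall figure sits close to group A's accuracy, and group B's much lower number barely moves it.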
⚠ Watch out: "we removed the race/gender column, so the model can't be biased" is a trap. Models rediscover protected traits through proxies — a postcode can stand in for race, a first name for gender. Deleting the label doesn't delete the bias; only measuring outcomes across groups reveals it.
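The proxy effect can be sketched on synthetic data. Everything below is an assumption for illustration: the loan-style labels, the `postcode` flag that matches `group` 90% of the time, and the helper name `approval_gap`. For brevity the model is fitted and checked on the same rows, which is fine for a demonstration but not for a real audit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)      # protected attribute, never given to the model
score = rng.normal(50, 10, n)      # qualification, same distribution in both groups
# Unfair historical labels: group 1 needed a higher score to be approved
approved = (score >= np.where(group == 1, 60, 45)).astype(int)

# HYPOTHETICAL proxy: a 'postcode' flag that matches group 90% of the time
flip = rng.random(n) < 0.10
postcode = np.where(flip, 1 - group, group)

def approval_gap(features):
    """Fit on the given features; return the absolute gap in predicted approval rate."""
    model = LogisticRegression(max_iter=1000).fit(features, approved)
    preds = model.predict(features)
    return abs(preds[group == 0].mean() - preds[group == 1].mean())

gap_score_only = approval_gap(score.reshape(-1, 1))
gap_with_proxy = approval_gap(np.column_stack([score, postcode]))

print(f"approval gap, score only:       {gap_score_only:.3f}")
print(f"approval gap, score + postcode: {gap_with_proxy:.3f}")
```

Neither model is given `group`, yet the one that sees `postcode` should show a much wider approval gap, because the proxy lets it rebuild the two historical thresholds.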
Activity — Run a bias check
Let's audit a model the honest way: by group. We'll build a tiny loan-approval dataset where a hidden bias lives in the labels, train a model, then check whether it treats the two groups equally.
First, build a dataset with a baked-in unfairness:
import numpy as np
import pandas as pd
rng = np.random.default_rng(42)
n = 600
# 'group' is a protected attribute (e.g. two communities), 'score' is a fair qualification
group = rng.integers(0, 2, n)
score = rng.normal(50, 10, n)
# Unfair labels: past approvers demanded a HIGHER score from group 1
threshold = np.where(group == 1, 60, 45)
approved = (score >= threshold).astype(int)
data = pd.DataFrame({"group": group, "score": score, "approved": approved})
print(data.groupby("group")["approved"].mean().round(3)) # approval rate per group
Now train a model that only sees score — and audit it by group:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
X = data[["score"]] # note: 'group' is NOT given to the model
y = data["approved"]
g = data["group"]
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, g, test_size=0.3, random_state=42
)
model = LogisticRegression().fit(X_tr, y_tr)
preds = model.predict(X_te)
# Audit: approval rate and accuracy PER GROUP
for grp in [0, 1]:
    mask = (g_te == grp)
    rate = preds[mask].mean()
    acc = accuracy_score(y_te[mask], preds[mask])
    print(f"group {grp}: approval rate {rate:.3f}, accuracy {acc:.3f}")
Now make the case:
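- Even though the model never saw group, do the two groups get approved at different rates? Compare the model's per-group rates with the historical rates from the first printout, and explain what you see. (Here score is drawn the same way for both groups and nothing acts as a proxy for group, so the model's approval rates should come out close. The unfair labels show up instead as errors: the model learns a single cut-off, most likely between the two historical thresholds, so it disagrees with the labels for both groups. Add a proxy for group to the features and the gap would reappear. Bias in, bias out.)
- Write the one-sentence evidence-based claim about the historical labels: "Group ___ was approved ___× less often, because past approvers demanded a higher score from them."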
- Suggest one fix and one cost of that fix. (E.g. re-label using a fair threshold — but who decides what's fair?)
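The re-labelling fix suggested in the last bullet could be tried along these lines. This is a sketch under a loud assumption: a single cut-off of 50 for everyone, which is an arbitrary illustrative value and not a recommendation. The names `fair_cutoff`, `fair_label` and `fair_acc` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Rebuild the same scores and groups as in the activity
rng = np.random.default_rng(42)
n = 600
group = rng.integers(0, 2, n)
score = rng.normal(50, 10, n)

# ASSUMPTION: one cut-off applied to everyone; 50 is only an example value
fair_cutoff = 50
fair_label = (score >= fair_cutoff).astype(int)

data = pd.DataFrame({"group": group, "score": score, "fair_label": fair_label})
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    data[["score"]], data["fair_label"], data["group"],
    test_size=0.3, random_state=42,
)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
preds = model.predict(X_te)

# Audit again: accuracy per group against the re-labelled data
fair_acc = {}
for grp in [0, 1]:
    mask = (g_te == grp).to_numpy()
    fair_acc[grp] = accuracy_score(y_te[mask], preds[mask])
    print(f"group {grp}: accuracy vs re-labelled data {fair_acc[grp]:.3f}")
```

Both groups should now score high and close together. The cost is the value judgment hidden in `fair_cutoff`: someone has to choose that number and defend it.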
You just ran a real fairness audit and turned it into an argument.
Check yourself
- Name the three places bias enters an AI system. → Data (unrepresentative training data), labels (the "right answers" were themselves unfair), and deployment (a model used where it doesn't fit).
- Why isn't overall accuracy enough to call a model fair? → An average hides gaps between groups — a model can score high overall while performing badly for one group. You must check per-group numbers.
- Why doesn't deleting the race or gender column fix bias? → Models rediscover those traits through proxies (postcode, name), so the bias survives. Only measuring outcomes across groups reveals it.
Wrap-up
You've learned that accuracy and fairness are different questions, that bias enters through data, labels and deployment, and that the way to prove it is a per-group audit turned into a clear claim. This is what "responsible AI" means in practice — measure, then make the case.
- Try this at home: take any model you've built in this course, split its test set into two groups (even something simple like short vs long inputs), and report accuracy for each. Write one sentence stating whether it's fair, with the numbers. Practising the audit on your own work is how it becomes a habit.
Tips & extra challenges
- Watch out: there's no single "fairness" number everyone agrees on — equal approval rates and equal accuracy can conflict, and you sometimes can't satisfy both. Fairness is a decision about values, not just a metric.
- Want more? Try this: research one real case — the COMPAS recidivism tool or a facial-recognition audit like Gender Shades. Write a three-sentence brief: what the system did, the evidence of harm (with a number), and who it affected. That's a professional-grade ethics summary.
Vocabulary
| Term | Meaning |
|---|---|
| Bias | A systematic unfairness in a model's behaviour, usually inherited from data or labels |
| Label bias | When the training "answers" were themselves the product of unfair decisions |
| Proxy variable | A feature (postcode, name) that secretly stands in for a protected trait |
| Audit | Measuring a model's performance separately for each group to check fairness |
| Privacy | Protecting personal data used to train or query a model |
Resources
- Google Colab — run today's bias audit here.
- Google — What-If Tool — poke a model's fairness across groups, visually, no heavy code.
- Gender Shades — a landmark audit showing face systems failing on darker-skinned women.
Practice set
Practise on your own — work these easy → hard. Answers follow each arrow.
1. Where's the bias? A face model works on light skin but fails on dark skin. Which source is it? → Data bias — the training set didn't represent darker-skinned faces well enough.
2. Accuracy vs fairness. A model is 96% accurate overall. Is it fair? → Can't tell — you must check per-group accuracy; the average can hide a group being served badly.
3. The proxy trap. A team removes the "gender" column and says the model is now unbiased. Why might they be wrong? → A proxy like first name or profession can still encode gender, so bias can survive — you must measure outcomes, not just delete labels.
4. Make the case. Turn "this hiring model feels unfair" into an evidence-based claim. → Something like: "The model recommends group A twice as often as equally-qualified group B (evidence), so qualified group-B applicants are wrongly filtered out (who it affects)."
5. Label bias. Past managers rejected qualified women; a model trained on those decisions does the same. Which source is this? → Label bias — the "correct answers" it learned from were themselves unfair.
6. Audit it (harder, code). Given predictions preds, true labels y_te, and group array g_te, write the loop that prints accuracy per group. → for grp in set(g_te): m = (g_te==grp); print(grp, accuracy_score(y_te[m], preds[m])). (Any correct per-group masking earns it.)
Going deeper (optional)
Optional — for when you want to know why fairness can't be fully automated.
Why you can't always be fair to everyone at once. There are several reasonable definitions of fairness: equal approval rates across groups, equal accuracy across groups, equal false-positive rates, and more. It's a proven mathematical fact that, when base rates differ between groups, you usually cannot satisfy all of them simultaneously — improving one worsens another. This is why fairness is never a checkbox a script can tick. It forces a human choice about which fairness matters most for this use, and who bears the cost of the trade-off. The engineer's job isn't to find the one true metric; it's to measure the trade-offs clearly and put the value decision in front of the people accountable for it.
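A tiny hand-made example can show how the definitions pull apart. The labels and predictions below are invented so that the base rate differs between groups (75% versus 25%), and `group_metrics` is a hypothetical helper, not a library function.

```python
import numpy as np

def group_metrics(y_true, y_pred):
    """Return approval rate, accuracy and false-positive rate for one group."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    negatives = y_true == 0  # people who should not have been approved
    return {
        "approval_rate": float(y_pred.mean()),
        "accuracy": float((y_pred == y_true).mean()),
        "false_positive_rate": float(y_pred[negatives].mean()),
    }

# HYPOTHETICAL data: 3 of 4 truly qualify in group A, 1 of 4 in group B
metrics_a = group_metrics(y_true=[1, 1, 1, 0], y_pred=[1, 1, 0, 0])
metrics_b = group_metrics(y_true=[1, 0, 0, 0], y_pred=[1, 1, 0, 0])

print("A:", metrics_a)
print("B:", metrics_b)
```

Approval rate and accuracy come out identical for the two groups, while the false-positive rate does not, so the verdict depends on which definition of fairness you decide to report.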
Common mistakes & fixes
- Mistake: Believing a high-accuracy model must be fair. → Fix: accuracy and fairness are different questions — always audit performance per group.
- Mistake: "We deleted the sensitive column, so it's unbiased now." → Fix: proxies leak the trait back in; measure outcomes across groups instead of trusting deletion.
- Mistake: Arguing an ethics point on feeling alone. → Fix: back it with claim + number + who's affected — evidence persuades, feelings don't.
- Mistake: Assuming there's one "fair" metric. → Fix: name which definition of fairness you're using; different ones can conflict.
- Mistake: Ignoring privacy because the model "just uses data". → Fix: ask whether the data is personal, consented, and safe from leaking — data about people is a responsibility.
What's next
Session 12 — Evaluate Like a Pro: you've learned to check whether a model is fair. Next you'll master checking whether it's actually good — train/validation/test splits, the confusion matrix, precision and recall versus plain accuracy, cross-validation, and model cards. Honest evaluation is where research and ethics meet.