Ibnovate Course 3 · The Future Builders
⏱ 75 min · Live session

Session 2 — How Networks Learn

Duration: 75 min · Format: live online

What you'll learn: by the end, you can explain how a network measures its own mistakes (the loss function), how it improves by rolling downhill toward less error (gradient descent), and how these fit into the training loop that powers all of deep learning — and you'll run a tiny loop in Python that fixes a weight by itself.

Soft skill focus — Problem-solving

Today you'll also build your problem-solving skills. Training a network is problem-solving made mechanical: measure how wrong you are, take one small step in the direction that helps, then measure again. That "measure → adjust → repeat" loop is exactly how you crack hard problems in life, not just in code.

What you'll need

Hook

In Session 1 you built a neuron and chose its weights by hand. That works for two inputs. But a real network has millions of weights — nobody could ever set those by hand. So here's the question that built the entire field:

How can a network find its own weights?

The answer is beautifully simple. First, give the network a way to measure how wrong it is — one number. Then keep changing the weights in whatever direction makes that number smaller. Do it enough times and the network teaches itself. Today you'll see exactly how, and you'll watch a weight fix itself in front of you.

Teach — Loss: one number for "how wrong"

Before a network can improve, it needs to know how bad its current answer is. That measurement is the loss — a single number where big = very wrong and 0 = perfect.

[Figure: gradient descent. The model rolls down an error curve to find the weights with the lowest error.]

A common loss for numbers is squared error: take the prediction, subtract the true answer, and square it.
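To make that concrete, here is a minimal sketch of squared error in Python. The function name squared_error and the sample predictions are made up for illustration; they are not part of the course code.

```python
def squared_error(pred, truth):
    """Return the squared difference between a prediction and the true answer."""
    return (pred - truth) ** 2

# Illustrative values only: the true answer is 3.0 in every case.
for pred in [0.0, 2.0, 3.0, 4.0]:
    print(f"pred={pred:.1f}  loss={squared_error(pred, 3.0):.1f}")
```

Notice that guessing 2.0 and guessing 4.0 give the same loss. Squaring throws away the sign, so being too low and being too high count equally, and big misses are punished much more than small ones.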

Picture the loss as a curve: on the bottom axis is the weight you could pick, and going up is how much error that weight causes. Somewhere on that curve is a lowest point — the weight with the least error. Learning is the search for the bottom of that valley.
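If you'd like to see that valley as numbers, the sketch below tries a handful of weights and prints the loss for each one. It assumes the simplest possible model, prediction = weight × input, with one example (input 1.5, truth 3.0). The helper name loss_at and the list of candidate weights are hypothetical choices for this illustration.

```python
x = 1.5        # assumed input
y_true = 3.0   # assumed true answer, so the best weight is 2.0

def loss_at(weight):
    """Squared-error loss of prediction = weight * x for the single example above."""
    return (weight * x - y_true) ** 2

# Walk along the bottom axis of the curve and read off the height at each point.
candidates = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
for w in candidates:
    print(f"weight={w:.1f}  loss={loss_at(w):.2f}")

best = min(candidates, key=loss_at)
print("lowest loss at weight", best)
```

Trying every weight works for one weight and nine candidates. It is hopeless for millions of weights, which is why a smarter search is needed.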

Teach — Gradient descent: roll downhill

So you have an error curve and you want its lowest point. How do you get there without seeing the whole curve at once?

Imagine standing on a foggy hillside, wanting the valley floor, able to see only the ground at your feet. The trick: feel which way is downhill, take a small step that way, and repeat. You'll reach the bottom without ever seeing the whole hill. That is gradient descent.

The "which way is downhill" part is the gradient — it's the slope of the loss curve at your current weight. It tells you two things at once: which direction reduces the loss, and how steep it is right there.
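One way to "feel the ground at your feet" in code is to nudge the weight a tiny bit in each direction and see how the loss changes. The sketch below does that for an assumed one-weight model (prediction = weight × input, input 1.5, truth 3.0). The names loss_at and slope_at and the nudge size eps are illustrative choices, and real libraries compute slopes in a more exact way.

```python
x = 1.5        # assumed input
y_true = 3.0   # assumed true answer

def loss_at(weight):
    """Squared-error loss for the one-weight model prediction = weight * x."""
    return (weight * x - y_true) ** 2

def slope_at(weight, eps=1e-6):
    """Estimate the slope of the loss by nudging the weight slightly both ways."""
    return (loss_at(weight + eps) - loss_at(weight - eps)) / (2 * eps)

for w in [0.0, 1.0, 2.0, 3.0]:
    print(f"weight={w:.1f}  slope={slope_at(w):+.3f}")
```

A negative slope means the loss falls as the weight goes up, so downhill is "increase the weight". A positive slope means the opposite. A slope near zero means you are at (or very close to) the bottom.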

The update rule is one line, and it's the heart of all training:

new weight = old weight − (learning rate × gradient)

⚠ Watch out: the learning rate is the setting people get wrong most often. Too large and the loss bounces around or blows up to NaN (you overshot the valley and shot up the far side). Too small and training barely moves and seems "stuck." When training misbehaves, the learning rate is the first dial to check.
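Here is the update rule as a tiny Python function, as a sketch rather than library code. The name gradient_step and the starting numbers (weight 0.5, gradient −3.0) are hypothetical; they only show how the learning rate scales the size of a single step.

```python
def gradient_step(weight, grad, lr):
    """Apply the update rule: new weight = old weight - (learning rate * gradient)."""
    return weight - lr * grad

weight = 0.5   # hypothetical current weight
grad = -3.0    # hypothetical slope: negative, so the loss falls as the weight rises

# Same weight, same gradient, three different step sizes.
for lr in [0.01, 0.1, 1.0]:
    print(f"lr={lr}  new weight={gradient_step(weight, grad, lr):.2f}")
```

The gradient decides the direction of the step. The learning rate only decides how far you go in that direction.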

Teach — The training loop

Loss and gradient descent come together in a cycle the network repeats thousands of times. This loop is training.

[Figure: the training loop. Data in, prediction, loss, adjust the weights, repeat.]

  1. Data in — feed the network an example (or a batch of them).
  2. Predict — it runs the forward pass you built in Session 1 and produces an output.
  3. Loss — compare the prediction to the true answer; get one number for how wrong it is.
  4. Adjust the weights — compute the gradient and nudge every weight a little downhill (gradient descent).
  5. Repeat — go back to step 1 with slightly better weights.

Each full pass over your data is called an epoch. After enough epochs the loss shrinks toward the bottom of the valley, the weights settle, and the network has learned. That's it — no magic, just this loop, run at enormous scale.
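The five steps and the idea of an epoch can be sketched in a few lines of Python. This is a toy illustration, not how a real library is organised: the three-example dataset, the learning rate 0.05, and the choice of 10 epochs are all made up, and the model is a single weight with prediction = weight × input.

```python
# Made-up dataset that follows the rule y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0   # start wrong on purpose
lr = 0.05      # assumed learning rate

for epoch in range(10):                      # one epoch = one full pass over data
    total_loss = 0.0
    for x, y_true in data:                   # 1: data in
        pred = weight * x                    # 2: predict
        total_loss += (pred - y_true) ** 2   # 3: loss for this example
        grad = 2 * (pred - y_true) * x       # 4: gradient for this example...
        weight -= lr * grad                  #    ...and one small step downhill
    # 5: repeat with the slightly better weight
    print(f"epoch {epoch}  weight={weight:.4f}  total loss={total_loss:.4f}")
```

Here the weight is updated after every single example. Real training usually updates once per batch of examples, but the cycle is the same.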

Activity — Train one weight in Python

Let's make a weight fix itself. You have a tiny model: prediction = weight × input. The true rule is prediction = 2 × input, so the correct weight is 2, but your weight starts wrong at 0.0. You will not set it to 2; the loop will find it.

First, by hand (30 seconds): input 1.5, truth 3.0, weight 0.0. The prediction is 0.0, so it's way too low. Which way must the weight move — up or down? Write it down, then let the loop prove you right.

Type and run this:

x = 1.5          # one input
y_true = 3.0     # the correct answer (because 2 * 1.5 = 3)
weight = 0.0     # our weight starts wrong
lr = 0.1         # learning rate: the size of each step

for step in range(20):
    pred = weight * x               # 1-2: predict
    loss = (pred - y_true) ** 2     # 3: squared-error loss
    grad = 2 * (pred - y_true) * x  # 4: slope of loss w.r.t. weight
    weight = weight - lr * grad     # 4: step downhill
    print(f"step {step:2d}  weight={weight:.3f}  loss={loss:.3f}")

Now watch what happened:

  1. Did the loss shrink toward 0 as the steps went on? That's the network getting less wrong.
  2. Did the weight climb toward 2.0 on its own? Nobody told it 2 — it found it by rolling downhill.
  3. Change lr to 0.9 and re-run. Does the loss bounce around or explode? You just overshot the valley — that's a learning rate too big.
  4. Change lr to 0.01. Does it crawl and never quite arrive in 20 steps? That's a learning rate too small.

You just ran the exact process — loss, gradient, update, repeat — that trains every deep-learning model, from a two-line demo to a model with billions of weights.
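Once you've tried the experiments by hand, you may want to compare learning rates side by side. The sketch below wraps the same loop in a function so you can call it with any learning rate. The function name train and its default arguments are choices made for this illustration.

```python
def train(lr, steps=20, x=1.5, y_true=3.0):
    """Run the one-weight training loop and return the final weight."""
    weight = 0.0
    for _ in range(steps):
        pred = weight * x                 # predict
        grad = 2 * (pred - y_true) * x    # slope of squared error w.r.t. weight
        weight = weight - lr * grad       # step downhill
    return weight

for lr in [0.01, 0.1, 0.9]:
    print(f"lr={lr}  final weight={train(lr):.3f}")
```

Only the middle learning rate lands close to 2.0 in 20 steps. The small one is still on its way, and the large one has been flung far from the valley.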

Check yourself

  1. What does the loss function measure? → How wrong the model's prediction is, as a single number: big means very wrong, 0 means perfect.
  2. In gradient descent, why do we subtract the gradient? → The gradient points uphill (toward more loss), so subtracting it moves us downhill (toward less loss).
  3. What are the five steps of the training loop? → Data in → predict → loss → adjust the weights → repeat.

Wrap-up

You now know how a network learns without anyone setting its weights: it measures its error with a loss function, uses gradient descent to step the weights downhill, and repeats that in a training loop until the loss is small. Every model you'll ever train runs this exact cycle.

Tips & extra challenges

Vocabulary

Term | Meaning
Loss function | A number measuring how wrong a prediction is (0 = perfect)
Gradient | The slope of the loss: which way, and how steeply, error changes
Gradient descent | Repeatedly stepping the weights downhill to reduce the loss
Learning rate | The size of each downhill step (e.g. 0.1)
Epoch | One full pass of the training loop over all the data

Resources

Practice set

Practise on your own — work these easy → hard. Answers follow each arrow.

1. Read the loss. Model A has loss 0.2; model B has loss 8.0. Which one is more wrong? → Model B — a bigger loss means a worse prediction.

2. Compute a loss. Prediction 5, truth 8, using squared error. What is the loss? → (5 − 8)² = (−3)² = 9.

3. Pick the direction. Your prediction is too high and you want less loss. Should the weight (with a positive input) go up or down? → Down — a smaller weight lowers the prediction toward the truth.

4. Diagnose the run. You train and the loss reads 2.1 → 40 → 900 → NaN. What single setting is almost certainly wrong? → The learning rate is too large — the steps are overshooting and the loss is exploding.

5. Trace one step (harder). Weight 1.0, input 2.0, truth 6.0, learning rate 0.1. Compute the prediction, the gradient 2 * (pred − truth) * input, and the new weight. → pred = 1.0 × 2.0 = 2.0; gradient = 2 × (2.0 − 6.0) × 2.0 = −16; new weight = 1.0 − 0.1 × (−16) = 2.6 (it moved up toward the true weight of 3).

Going deeper (optional)

Optional — for when you want to know where that gradient formula comes from.

Why is the gradient 2 * (pred − truth) * input? The loss is (pred − truth)² and pred = weight × input. Calculus asks: if I wiggle the weight a tiny bit, how much does the loss change? The chain rule answers it in two links — the loss changes by 2 × (pred − truth) for each unit of prediction, and the prediction changes by input for each unit of weight. Multiply the links and you get 2 × (pred − truth) × input. You don't need to derive this by hand in real projects — libraries like TensorFlow compute every gradient for you automatically (it's called backpropagation, and you'll rely on it next session). But seeing it once, on one weight, means the automatic version will never feel like magic.
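Written out as math, the two links of the chain rule look like this. The symbols are shorthand chosen for this sketch and do not appear elsewhere in the session: L is the loss, ŷ is pred, y is truth, w is the weight, and x is the input.

```latex
\begin{aligned}
L &= (\hat{y} - y)^2, \qquad \hat{y} = w \, x \\[4pt]
\frac{\partial L}{\partial w}
  &= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w}
   = 2\,(\hat{y} - y) \cdot x
\end{aligned}
```

The first factor is the "loss per unit of prediction" link and the second is the "prediction per unit of weight" link, which matches the line grad = 2 * (pred - y_true) * x in the activity code.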

Common mistakes & fixes

What's next

Session 3 — Build a Neural Network: you've trained one weight by hand. Next you'll hand the whole loop — loss, gradients, and all — to Keras/TensorFlow and train a real multi-layer network on a real image dataset with just a few lines: Sequential, Dense, compile, fit, evaluate.
