ML One
Lecture 07
Introduction to (artificial) neural networks
+
Multi-Layer Perceptron
Welcome 👩‍🎤🧑‍🎤👨‍🎤
By the end of this lecture, we'll have learnt about:
The theoretical:
- Neurons in biological neural network
- Neurons in artificial neural network (ANN)
- Neurons grouped in layers in artificial neural network
- Using functions and matrix multiplication to describe what happens between layers in an ANN
- A simple neural network - Multilayer Perceptron (MLP)
The practical:
- MLP implemented in Python
Recommended learning resource for this lecture's content: this playlist of video tutorials by 3B1B, providing a visual understanding of neural networks.
First of all, don't forget to confirm your attendance on Seats App!
introducing "repeated exposure": the amazingness of our built-in perceptual adaptation effect
Recap
Today we are going to see how dots (adding/multiplying matrices, functions) are connected!!!
Scalar, vector and matrix ๐Ÿง‘โ€๐ŸŽจ
- how to describe their shapes
-- number of rows x number of columns
Scalar, vector and matrix ๐Ÿง‘โ€๐ŸŽจ
- how to multiply a row vector and a column vector?
-- dot product which results in a scalar
-- the shape rule: these two vectors have to be of the same length.
Scalar, vector and matrix ๐Ÿง‘โ€๐ŸŽจ
- how to multiply two matrices?
-- the shape rule:
-- the shapes of the two matrices should be: M x K and K x N
-- the shape of the product matrix would be: M x N
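A quick NumPy sketch of these shape rules (illustrative code, not part of the original slides):

import numpy as np

# dot product: two vectors of the same length -> a scalar
row = np.array([1.0, 2.0, 3.0])
col = np.array([4.0, 5.0, 6.0])
print(np.dot(row, col))    # 32.0, a single scalar

# matrix multiplication: (M x K) @ (K x N) -> (M x N)
A = np.ones((2, 3))        # M = 2, K = 3
B = np.ones((3, 4))        # K = 3, N = 4
print((A @ B).shape)       # (2, 4)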
Functions ๐Ÿง‘โ€๐ŸŽจ
- A function relates an input to an output.
-- Chain functions together to make a new function
-- Function graphs
-- Exp, sigmoid, quadratic, ReLU, sine, tanh (and they each have their own characteristics)
Functions ๐Ÿง‘โ€๐ŸŽจ
-- Function graphs
-- Exp, sigmoid, quadratic, ReLU, sine, tanh
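Here is a tiny Python sketch of a few of these functions and of chaining one function's output into the next (function names are my own, just for illustration):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def quadratic(x):
    return x ** 2

# chaining: one function's output becomes the next function's input
x = np.linspace(-3, 3, 7)
print(sigmoid(relu(quadratic(x))))   # a new function built by chaining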
end of recap
An artificial neural network is fun, computationally capable, and made up of smaller components, including neurons.
We'll meet quite a few new terms today - they are easy concepts, just have faith in perceptual adaptation through repetition!
Let's forget about math for now
the story starts from a real biological neuron (a simulation) 🤘
As humans, we have roughly 86 billion (some say 100 billion) neurons. A neuron is an electrically excitable cell that fires electric signals across a neural network. [wikipedia]
It is the fundamental unit of the brain and nervous system.
The cells are responsible for receiving sensory input, for sending motor commands to our muscles, and for transforming and relaying the electrical signals at every step in between.
Neurons are connected in some structure.
Connected neurons communicate with each other via electrical impulses. ⚡️
one neuron with dendrites, axon and transmitters
when did you last have a biology lesson?
Think of your happiest moment in memory, and this is probably what was going on in your brain during that moment.
Recap of the simulated neural process:
-- A neuron is charged by signals from other connected neurons.
-- We can refer to the level of accumulated charges in one neuron as its activation.
-- A neuron receives different levels of signals from different neurons.
-- Once a neuron is sufficiently charged, it fires off a charge to the next neurons.
The myth of the grandma neuron ⭐️:
A hypothetical neuron that has high activation
when a person "sees, hears, or otherwise sensibly discriminates" a specific entity, such as their grandmother.
But does the grandma neuron actually look like a grandma? 😜
Nope, the information it carries is encoded as its conditional activation,
which can be loosely depicted as a number that increases when you see your grandma and decreases when you don't.
What are the mathsy parts in the neural process? 🧮
Recap of the simulated neural process:
-- A neuron is charged by signals from other connected neurons.
-- We can refer to the level of accumulated charges in one neuron as its activation value.
-- There are usually different levels of signals emitted from different neurons.
-- Once a neuron is sufficiently charged, it fires off a signal to the next neurons.
let's do something quite interdisciplinary
--- extracting maths ideas from dat biology class ---> ๐Ÿ’ก๐Ÿ‘พ๐Ÿงช๐Ÿงฎ๐Ÿ’ก
Maths extraction 00
-- Numberify each neuron's activation:
a number representing how much electrical charge a neuron receives and fires ☝️
Maths extraction 01
-- View the charging process through arithmetic:
accumulation, addition ➕
Maths extraction 02
-- A neuron does NOT fire immediately upon receiving charge,
instead it waits until it is sufficiently charged before firing:
a sense of thresholding 🪜
hint hint, functions: ReLU, sigmoid
Maths extraction 03
-- A bird's-eye view of neuron connectivity:
there is a hierarchical process where neurons are both receivers of preceding neurons and transmitters to the next ones.
Recall how function chaining works? It routes one function's output to be the next function's input. ⛓️
Maths extraction 04
-- Numberify the connection strength:
Not every two neurons are connected with equal strength.
Perhaps we can use a number for each connection, referred to as a weight, to depict the different strengths? 🔋
☁️🌫️☁️
Introducing now: (artificial) neural networks, the anatomy
finally!!! 🔥
⚠️ Note:
For this lecture, we are not looking at what the numbers actually mean or how to interpret them.
We are only looking at how this neural process is portrayed computationally.
Today's road map:
1. Introduction to neurons: what a neuron does
2. Introduction to layers: what a layer does and what the computation underneath is
3. Introduction to MLP: a combination of what we have introduced so far
Starting from (artificial) neuron {
One neuron holds a single number indicating its activation 🪫🔋
on whiteboard
Neurons are grouped in layers, reflecting the hierarchical structure (the order is from left to right) 🏘️
let's draw another two layers of neurons because I want to
Connectivity between !consecutive! layers
ps: it is actually connectivity between neurons in !consecutive! layers
A neuron receives signals (numbers, or activations) from all neurons in the previous layer, let's draw out the links 🔗
and so does every single neuron!
⚠️ note: neurons inside the same layer are NOT connected
Different connection strengths: every link indicates a different connection strength 🔌
that is to say every link also indicates a number, let's call it a weight
Note that a weight is different from an activation that is stored in each neuron
One activation is contextualised in one single neuron,
whereas one weight is contextualised in the link between two connected neurons
/*end of (artificial) neuron */
}
Now that we know what neurons are and that they are grouped in layers 🥰
let's look from the perspective of layers and build our first multilayer perceptron by hand 🤑
layers and Multilayer perceptron MLP {
aka vanilla neural networks,
aka fully connected feedforward neural network (don't memorise this, MLP sounds way cooler, but this lengthy name carries some meaning we'll see shortly)
Let's contextualise the MLP in an example image classification task.
summoning the "hello world" of ML: MNIST, handwritten digit recognition
It is a dataset comprising 28*28 images of handwritten digits. Each image is labelled with its digit class (from 0 to 9).
The task is to take an input image and output its digit class.
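For reference, MNIST can be downloaded in a couple of lines with torchvision (a sketch; the prepared notebook may load the data differently):

from torchvision import datasets

# 60,000 training images of handwritten digits (28*28 pixels, labels 0 to 9)
mnist_train = datasets.MNIST(root="./data", train=True, download=True)
image, label = mnist_train[0]   # a 28*28 image and its digit class
print(image.size, label)        # (28, 28) and an integer from 0 to 9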
Since an MLP is characterised by its layer types, let's introduce layers.
Through the lens of layers{
Neurons are holding numbers (activations), so in one layer there is a vertical layout of one column of numbers. Does this one vertical column of numbers sound familiar?
Neuron activations in one layer form a vector, let's call this the "layer vector" or "layer activation vector"
There are different types of layers:
First we have input and output layers
Input layer is where the input data is loaded (e.g. one neuron holds one pixel's grayscale value)
The number of neurons in input layer is pre-defined by the specific task and data.
🌶️🌶️🌶️ How many neurons should there be in the input layer for MNIST (which is a dataset of 28*28 images)? hint: one neuron for one pixel
28*28 = 784
Because the input to an MLP has to be a vertical column vector, the flattening giant has stepped over...
For instance, a 2*2 image/matrix after flattening becomes a 4*1 col vector (see the code sketch after the questions below)
🌶️🌶️ What is the shape of the input layer vector in MNIST (which is a dataset of 28*28 images)?
784*1
🌶️🌶️🌶️ What if we have a dataset of small images of size 20*20? How many neurons should we put in the input layer and what should the new shape of the input layer vector be?
400*1
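A minimal NumPy sketch of the flattening step (made-up pixel values, just for illustration):

import numpy as np

image = np.array([[0.0, 0.5],
                  [0.5, 1.0]])           # a tiny 2*2 "image"
col_vector = image.reshape(-1, 1)        # flatten into a 4*1 column vector
print(col_vector.shape)                  # (4, 1)

# the same step turns a 28*28 MNIST image into a 784*1 column vector
print(np.zeros((28, 28)).reshape(-1, 1).shape)   # (784, 1)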
Output layer is where the output is held
For classification tasks, the output is categorical. How do we encode categorical data?
One-hot encoding: it depends on how many classes there are (see the code sketch after the questions below)
Another way to interpret one-hot encoding output: each neuron holds the "probability" of the output belonging to that class
It is just another number container anyway 🤪
🌶️🌶️ What is the shape of the output layer vector for MNIST?
10*1 (10 classes of digits)
🌶️🌶️🌶️ What if my task changes to recognising whether the digit is zero or non-zero? How many neurons should we put in the output layer and what should the new shape of the output layer vector be?
2*1 (only 2 classes of digits!)
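A small sketch of one-hot encoding the output, assuming the 10 MNIST digit classes (illustrative only):

import numpy as np

num_classes = 10
digit = 3                               # the label "3"
one_hot = np.zeros((num_classes, 1))    # a 10*1 output layer vector
one_hot[digit] = 1.0
print(one_hot.ravel())                  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]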
The number of neurons in the input and output layers is determined by the task and the dataset.
Next:
Hidden layers: any layer in between the input and output layers 😅
How many neurons should we put in each hidden layer? Is it pre-defined by the task and the dataset like input/output layers?
No, it is all up to you, woo hoo! It is part of the fun neural net designing process 🥰
Here I choose...
Let's connect these layers following our previous connection rule: only consecutive layers are *directly* linked
ATTENTION 💿
The last piece of the puzzle 🧩
Recall the biological process of charging, accumulation and firing
Let's simulate the ANN process from biological analogies, from input layer to output layer
⚠️ Note:
For this lecture, we are not looking at what the numbers actually mean or how to interpret them.
We are only looking at how this neural process is portrayed computationally.
Recall that each link has a number ("weight") for connection strength
Activations in each layer's activation vector are computed using the previous layer's activation vector and the corresponding connection weights
For example, to calculate the first neuron's activation in the first hidden layer: the input layer's neuron activations are multiplied by the corresponding connection weights and summed up
Wait, did that look just like a dot product? 💿
Indeed, we can simulate the "charging" and "accumulating" process using matrix multiplication ✌️😎
A layer's weights matrix: made of the weight of every connection link this layer has with the *previous* layer 🧮 (see the shape-check sketch after the question below)
Demonstration of the first hidden layer's weights matrix multiplied with the input layer's activation vector on the whiteboard 🤘
🌶️🌶️🌶️🌶️ What is the shape of the weights matrix?
# of neurons in THIS layer
x
# of neurons in PREVIOUS layer
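A quick shape check in NumPy (the hidden layer size of 16 is made up, just to illustrate):

import numpy as np

prev_activations = np.random.rand(784, 1)   # previous (input) layer: 784 neurons
weights = np.random.rand(16, 784)           # this hidden layer: 16 neurons -> shape 16 x 784
raw = weights @ prev_activations            # (16 x 784) @ (784 x 1) -> (16 x 1)
print(raw.shape)                            # (16, 1)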
For the "wait till sufficiently charging or thresholding" part, let's introduce bias vector and activation function โœŒ๏ธ
🌶️🌶️ Recall the graph of the ReLU function: what is its active zone? aka the range of inputs that map to non-zero outputs
from 0 to ∞ !
🌶️🌶️🌶️🌶️ The effect of the bias vector:
What if I want to have an active zone from 1 to ∞?
Add a bias of -1 to the input.
(you can take some time to ponder and wonder here later)
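A tiny sketch of that bias effect with ReLU (illustrative numbers):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

x = np.array([0.5, 1.0, 1.5])
print(relu(x))         # [0.5 1.  1.5] -> active zone starts at 0
print(relu(x - 1.0))   # [0.  0.  0.5] -> a bias of -1 shifts the active zone to start at 1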
Clarifying the input and output of the activation function:
The raw activation vector, after matrix multiplication and bias vector addition, is the input to the activation function (e.g. ReLU).
The activation function's output is the actual activation (which fires to the next layer)
Demonstration on the whiteboard: the layer vector added to the bias vector (adding or removing extra difficulty for a neuron's activation to reach the "active zone" of the activation function)
🌶️🌶️🌶️🌶️ What is the shape of the bias vector?
# of neurons in THIS layer
x
1
Demonstration of layer vector wrapped with activation function on whiteboard
Puzzle almost finished!
/*end of through the lens of layers */
}
Let's write down what just happened using a function expression 🤘😎
1. wrap each layer's charging (aka weights matrix multiplication) and thresholding (bias vector and activation function) process as a function
-- function input: a vector, the previous layer's activation vector
-- function output: a vector, this layer's activation vector
-- the function body: the input multiplied by this layer's weights matrix, added to the bias vector, wrapped with the activation function
V_output =
ReLU(WeightsMat * V_input + Bias)
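Written as code, one layer's function might look like this minimal NumPy sketch (names and layer sizes are my own):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def layer(v_input, weights, bias):
    # charging: weights matrix multiplication; thresholding: bias vector + activation function
    return relu(weights @ v_input + bias)

v_in = np.random.rand(784, 1)   # previous layer's activation vector
W = np.random.rand(16, 784)     # this layer's weights matrix
b = np.random.rand(16, 1)       # this layer's bias vector
print(layer(v_in, W, b).shape)  # this layer's activation vector: (16, 1)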
Next, how to connect different layers using function expression?
Function chaining!!! demonstration on whiteboard ⛓⛓⛓
Puzzle finished, recall that a model is roughly a big function?
An MLP is a model, and a function. Let's write down the final BIG function for this neural network
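And the whole MLP is just those layer functions chained together; a minimal sketch with made-up layer sizes (784 -> 16 -> 16 -> 10):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def layer(v, W, b):
    return relu(W @ v + b)

# made-up (untrained) weights matrices and bias vectors
W1, b1 = np.random.rand(16, 784), np.random.rand(16, 1)
W2, b2 = np.random.rand(16, 16), np.random.rand(16, 1)
W3, b3 = np.random.rand(10, 16), np.random.rand(10, 1)

def mlp(v_input):
    # the one BIG function: input layer -> hidden layers -> output layer
    return layer(layer(layer(v_input, W1, b1), W2, b2), W3, b3)

print(mlp(np.random.rand(784, 1)).shape)   # (10, 1)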
🎉🥂🎊
Done! ✌️😤
Note that I made up all the numbers in the weights matrices and bias vectors during the demo 🙂
In practice, these numbers are learned through the training process.
We'll leave the training process to next week's lecture.
The process we talked about today, with assumed weights matrices and bias vectors, is the forward pass of a neural network,
aka how information (activations) is propagated from input to output ⏩.
The training process will be about how information is propagated backwards ⏪,
from output to input, to find the proper numbers in the weights matrices and bias vectors that make the neural network work.
That's quite a lot, congrats! 🎉
Here is a video from 3B1B that explains MLP from another perspective, very nice.
Next, we are going to:
- take a look at how an MLP can be implemented in Python with help from NumPy and PyTorch (a very popular deep learning library in Python)!
Alert: you are going to see quite advanced Python and neural network programming stuff; we are not expected to understand it all at the moment.
Let's take a look at how some of the ideas we talked about today are reflected in the code,
especially how we set up a layer by specifying how many neurons it should have.
A prepared Google Colab notebook
1. Click on the link and open this Google Colab notebook
Let's take a look at the notebook!

- 1. Make sure you have saved a copy to your GDrive or opened it in playground mode. 🎉
- 2. Most parts are out of the range of the content we have covered so far.
- 3. We only need to take a look at the few lines in the "Defining the Model" section.
- 4. IMPORTANT: In practice, we just need to specify the number of neurons in each layer and all the computation is left to the computer (see the sketch below).
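We won't reproduce the notebook here, but a typical PyTorch definition of such an MLP looks roughly like the sketch below (not the exact notebook code; the hidden layer sizes are made up):

import torch.nn as nn

# we only specify the number of neurons per layer; PyTorch creates
# (and later trains) the weights matrices and bias vectors for us
model = nn.Sequential(
    nn.Flatten(),        # 28*28 image -> 784 values
    nn.Linear(784, 16),  # input layer -> first hidden layer (weights + bias)
    nn.ReLU(),           # activation function
    nn.Linear(16, 16),   # first hidden layer -> second hidden layer
    nn.ReLU(),
    nn.Linear(16, 10),   # second hidden layer -> output layer (10 digit classes)
)
print(model)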
Today we have looked at:
- Neurons as a number container (activation)
- Neurons are grouped in layers
- Layers are connected hierarchically (from left to right)
- Input layer
- Output layer
- Hidden layer
- Weights matrix
- Bias vector
- Activation function
- Write the MLP into one big function
We'll see you next Thursday same time and same place!