ML One
Lecture 02
Introduction to representation, numbers and image classification
Welcome
By the end of this lecture, we'll have learnt about:
The theoretical:
- Introduction to representation
- How to use numbers to represent things
- Introduction to machine learning [model]
- Introduction to image classification
The practical:
- How to use the image classification API from Apple's Vision framework
First of all, don't forget to confirm your attendance on the Seats app!
Fun (vintage) AI for today: Pix2Pix, an image-to-image translation model (with an interactive demo web page)
Recap
- Pattern recognition is everywhere and amazing.
- We'd love to understand intelligence better by prototyping it using human-made machines ("AI" systems).
- One intermediate goal of prototyping intelligence is to make machines do pattern recognition.
(because we guess that pattern recognition is an essential part of intelligence.)
Representation
What is apple
Pattern recognition question 01
How do you distinguish apple as a fruit vs. Apple as a tech company?
Here are some possible representations:
- is it edible or not?
- is it a fruit?
- is the word spelled with an uppercase "A"?
Pattern recognition question 02
How do you distinguish apple vs. pear?
Do any of our previous representations work?
- is it edible or not?
- is it a fruit?
- is the word spelled with an uppercase "A"?
No, but here are some possible representations:
- their shapes
- their tastes
Good representation simplifies our task!
To excel at pattern recognition ~= To find a good representation
Meet papple
Representation can be contextual.
Depending on the problem context, different tasks may call for different efficient representations of the same objects.
My take on representation:
- It can be descriptive and capture some characteristics.
- It can reflect a particular perspective and be partial.
- It can be contextual where "good" ones are task-dependent.
- A side note on abstraction (in the context of computational thinking) -
Abstraction is related to representation in the sense that it is about taking away irrelevant details and reducing the representation to essential characteristics.
(Shoutout to Joel's slides on "what is computational thinking?")
Until now we have been using English words to describe representations.
How about using machine words (numbers) to represent objects, concepts, etc.?
Question for curiosity: what are the differences between using our natural language and using numbers for representation?
Any questions so far?
Numbers and Numeric representation
What do we use (real) numbers for?
- To count things
- To measure things
- To label things
- ...
007
A side note:
- Do we really have numbers in the real world? Have you seen numbers walking on the street on their own?
- Numbers are usually used in context. To connect numbers with reality, we always need an interpretation guide (e.g. a protocol, a unit, etc.) when using numbers in real life.
- Interpreting numbers produced by computer programs is a helpful skill you can gain from this unit.
My number of chaos:
-858993460
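To make the need for an interpretation guide concrete, here is a minimal Python sketch (Python is used only for brevity; the playground later in the lecture is Swift). It stores the number above as four bytes and then reads the same bytes back under different conventions. The 32-bit width and little-endian byte order are assumptions made for illustration, not something the slide states.

```python
import struct

# The number from the slide, stored as a 32-bit signed integer.
# Assumption for illustration: 32 bits, little-endian byte order.
raw_bytes = struct.pack("<i", -858993460)

# Same four bytes, three different interpretation guides.
as_hex = raw_bytes.hex()                         # the raw bit pattern
as_signed = struct.unpack("<i", raw_bytes)[0]    # read as a signed integer
as_unsigned = struct.unpack("<I", raw_bytes)[0]  # read as an unsigned integer

print(as_hex)       # cccccccc
print(as_signed)    # -858993460
print(as_unsigned)  # 3435973836
```

The bits never change; only the protocol used to read them does.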
Why are numbers important in this unit?
- Our poor human-made machines can only deal with numbers
- Numbers can introduce maths, which is our DOMAIN EXPANSION
"Information era"
The information we receive from the world comes mainly in four modalities:
- Image (video)
- Text (written language)
- Audio (music, speech)
- Tabular (the weather in degrees Celsius, your birthday, etc.)
Can you think of any information that is not from the four categories?
My mind-blowing moment:
- Information from any of those three categories (image, text and audio) can be digitized / numberified, i.e. represented by just a bunch of numbers.
How to numberify the size of digital images?
- Two numbers for its width and height (how many pixels).
e.g. 3840 x 2160 for 4K resolution
- Sometimes another number for how many color channels there are.
e.g. 256 x 256 x 3 for an RGB color image
Here is one way to numberify digital images:
- Three numbers for each pixel representing the RGB values in color images
- One number for each pixel representing the greyscale value in grey images
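As a rough illustration of the two bullets above, the Python sketch below writes out a tiny image by hand as nested lists of numbers. The pixel values are made up, and averaging the channels is just one simple greyscale convention among several, chosen here as an assumption.

```python
# A tiny RGB image, 3 pixels wide and 2 pixels high, written out by hand.
# Each pixel is three numbers (R, G, B), each in the range 0-255.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],        # red, green, blue
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],  # black, grey, white
]

# The size of the image is itself just a few numbers.
height = len(image)
width = len(image[0])
channels = len(image[0][0])
print(width, height, channels)  # 3 2 3


def to_grey(pixel):
    """Collapse three channel values into one by averaging them."""
    r, g, b = pixel
    return (r + g + b) // 3


# A grey image needs only one number per pixel.
grey_image = [[to_grey(pixel) for pixel in row] for row in image]
print(grey_image)  # [[85, 85, 85], [0, 128, 255]]
```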
How to numberify digital audio?
Here is one way to numberify digital audio:
Each number in a digital audio file represents the instantaneous amplitude (height) of the sound wave.
- that is, how far the air pressure at that moment is above or below the neutral "silence" level.
Slightly more poetic: a series of numbers where each number denotes a displacement value from silence at that point in time.
A little more detail:
- The sound wave is sampled at regular time intervals, e.g. 44,100 times per second in CD-quality audio (that's the sampling rate).
- Each sample corresponds to one number.
- Together, these numbers trace out how the air pressure (or voltage, in an electrical signal) changes over time.
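The sampling idea can be sketched in a few lines of Python. Instead of recording a real sound, this sketch computes a pure sine tone; the 441 Hz frequency and the one-second duration are hypothetical values picked for illustration, and only the 44,100 sampling rate comes from the slide.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (CD quality, as on the slide)
FREQUENCY = 441.0     # hypothetical tone in Hz, chosen for illustration
DURATION = 1.0        # hypothetical length in seconds

# One number per sample: the displacement from silence (0.0) at that instant.
num_samples = int(SAMPLE_RATE * DURATION)
samples = [
    math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE)
    for n in range(num_samples)
]

print(len(samples))  # 44100
print(samples[:4])   # the first few displacement values
```

One second of sound becomes 44,100 numbers, each between -1 and 1 in this sketch. Real audio files typically store scaled integer versions of such values.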
Here is one way to numberify text:
- Think about looking up a word in a dictionary using a page number and an index.
- To decode the pair of (page number, index) back to actual words, you will need to pass that dictionary around.
- OR?
- More on this in week 11.
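The dictionary idea can be sketched in Python as follows. The seven-word dictionary, the page size of three words, and the function names `encode` and `decode` are all hypothetical, made up for illustration.

```python
PAGE_SIZE = 3  # hypothetical: number of words printed on each page

# A toy shared dictionary. Sender and receiver must hold the same copy.
dictionary = sorted(["apple", "cat", "dog", "image", "number", "pear", "sound"])


def encode(word):
    """Turn a word into a (page number, index on that page) pair."""
    position = dictionary.index(word)
    return divmod(position, PAGE_SIZE)


def decode(pair):
    """Turn a (page number, index) pair back into a word."""
    page, index = pair
    return dictionary[page * PAGE_SIZE + index]


message = ["dog", "number", "apple"]
codes = [encode(word) for word in message]
print(codes)                               # [(0, 2), (1, 1), (0, 0)]
print([decode(pair) for pair in codes])    # ['dog', 'number', 'apple']
```

Without the shared dictionary, the pairs of numbers cannot be turned back into words, which is why the dictionary has to be passed around.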
A tricky question for you: how can we use numbers to represent information that is more abstract than images or audio?
How to numberify the label of an image being a dog image or a cat image?
You are free to define any dictionary/protocol to explain the number(s) you have used.
One possible way:
- I will use two numbers and each number is either 0 or 1.
- I will have the first number indicating if there is a dog in the image.
- And I will have the second number indicating if there is a cat in the image.
- If it is a dog-only image: [1, 0] (square brackets here are optional)
- If it is a cat-only image: [0, 1]
- If it is a dog and cat image: [1, 1]
- What does this numeric representation [0, 0] mean in this context?
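The protocol above can be written down as a small Python sketch. The function names `encode_label` and `decode_label` are hypothetical; the only thing taken from the slide is the rule "first number for dog, second number for cat".

```python
# The order of this list is the protocol: first number = dog, second = cat.
CLASSES = ["dog", "cat"]


def encode_label(present):
    """Return one 0/1 number per class: 1 if that class is in the image."""
    return [1 if name in present else 0 for name in CLASSES]


def decode_label(numbers):
    """Read the numbers back into class names using the same protocol."""
    return [name for name, flag in zip(CLASSES, numbers) if flag == 1]


print(encode_label({"dog"}))         # [1, 0]
print(encode_label({"cat"}))         # [0, 1]
print(encode_label({"dog", "cat"}))  # [1, 1]
print(decode_label([1, 1]))          # ['dog', 'cat']
```

Try calling `decode_label([0, 0])` and compare the result with your own answer to the question above.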
Any questions so far?
Image classification
What we will talk about today:
- What is image classification?
- What can an image classification model do?
What we will NOT talk about today:
- How to make an image classification model from scratch?
- How to modify an image classification model?
Image classification
- It is a machine learning and computer vision task.
- It is a stepping stone task in computer vision and computational pattern recognition.
- The goal is to train a model to recognize and assign a predefined label or class to an input image.
Your first AI model to inspect!!!
One hack for inspecting any model:
Because a model is about "input -> process -> output",
to inspect what a model does, one helpful starting point is to check its input and output!
Input? Output?
Image classification model:
- Input: a digital image
- Output: the predicted classes of the input image
Image classification model:
Input (in a bit more detail):
An image with a pre-defined size.
(but you can get around the size limit simply by resizing and/or cropping images.)
(and be careful of the aspect ratio!)
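One common way to respect the aspect ratio is to resize so the shorter side matches the model's input size and then crop the centre. The Python sketch below only plans the numbers and does not touch real pixels. The square 224 x 224 input size and the function name `resize_then_center_crop` are assumptions for illustration; check the playground model for its actual input size.

```python
def resize_then_center_crop(width, height, target):
    """Plan a resize followed by a centre crop to a target x target square.

    Returns the resized (width, height) and the crop box
    (left, top, right, bottom) inside the resized image.
    """
    # Scale both sides by the same factor so the aspect ratio is kept.
    scale = target / min(width, height)
    new_width = max(target, round(width * scale))
    new_height = max(target, round(height * scale))

    # Cut the square out of the middle of the resized image.
    left = (new_width - target) // 2
    top = (new_height - target) // 2
    return (new_width, new_height), (left, top, left + target, top + target)


# A 4K image going into a hypothetical 224 x 224 model input.
resized, box = resize_then_center_crop(3840, 2160, 224)
print(resized)  # (398, 224)
print(box)      # (87, 0, 311, 224)
```

Squashing 3840 x 2160 straight to 224 x 224 would distort the picture, whereas this plan keeps shapes intact at the cost of losing the left and right edges.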
Image classification model:
Output (in a bit more detail):
Probabilities assigned to each class from a set of pre-defined classes.
Image classification model:
It outputs probabilities assigned to each class from a set of pre-defined classes.
Let's dive into the "pre-defined classes" bit
Back to the example of the dog-or-cat image classification model:
What are the pre-defined classes that this model can predict?
- a class for "dog" and a class for "cat"
- Can this model tell us which vehicle is in the input image?
- No.
- But it would be interesting to feed a car image into the model and see how it improvises.
Image classification model:
It outputs probabilities assigned to each class from a set of pre-defined classes.
Let's dive into the "probabilities" bit
- It primarily outputs a list of probabilities, one for each class in the pre-defined set of classes.
- From there, we can easily post-process the list of probabilities and produce a more human-friendly result like "which one is the most probable class?"
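The post-processing step can be as small as sorting the list. In this Python sketch the class names follow the dog-or-cat example, and the probabilities are made-up numbers, not real model output.

```python
classes = ["dog", "cat"]
probabilities = [0.92, 0.08]  # made-up values, one per class, in the same order

# Pair each class with its probability and sort from most to least probable.
ranked = sorted(
    zip(classes, probabilities),
    key=lambda pair: pair[1],
    reverse=True,
)

top_class, top_probability = ranked[0]
print(f"{top_class} ({top_probability:.0%})")  # dog (92%)
```

The model itself only ever produces the list of numbers. The friendly answer "dog" is something we add on top.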
Any questions so far?
Takeaway messages from our first encounter with image classification models
- Each model has a pre-defined input image size.
- Each model has a pre-defined set of output classes.
A toy example of image classification in a playground
Has everyone installed Xcode?
Download the playground here and double-click the file to open it (Xcode should launch automatically).
Don't panic about the chunks of Swift code
- The code has been prepared and is ready to run.
- It can predict the given input image's class, one image at a time.
All we have to do for this lecture is to:
1. Run the playground as it is.
2. Check the classification results.
3. Inspect the loaded classification model's input image size and output classes.
4. Swap to another input image and run the playground.
This is how you can run the playground and check the classification results
This is where you can inspect the model's input image size
- Bonus question: go back to the main file and check the code on line 25. Do you see some familiar numbers? What do these numbers mean?
This is where you can inspect the model's output classes
- Bonus question: how many classes can this model predict from?
This is how you can swap the input image
- Bonus question: try using other images from your laptop (you can download new interesting ones from the internet) as the input image
A nice app that makes good use of an image classification model
Today we have talked about:
Representation
- descriptive, perspective-dependent, and contextual
Numeric representation
- How image, audio and text can be represented by numbers
Image classification
- Given an input image of a pre-defined size, an image classification (IC) model predicts the probability of that image belonging to each class in a pre-defined set of classes.
Homework :)
Envision an app that leverages image classification model(s)
Some possible starting points:
- What is the scenario where an image classification model could be helpful / fun?
- What are the classes you envision the model to predict from input images?
- Classes can be any categories, objective or subjective (e.g. different emotions)!
We'll see you next Thursday, same time, same place!