ML One
Lecture 02
Introduction to representation, numbers and image classification 🍡
Welcome 👩‍🎤🧑‍🎤👨‍🎤
By the end of this lecture, we'll have learnt about:
The theoretical:
- Introduction to representation
- How to use numbers to represent things
- Introduction to machine learning [model]
- Introduction to image classification
The practical:
- How to use the image classification API from Apple's Vision framework
First of all, don't forget to confirm your attendance on the Seats app!
Fun (vintage) AI for today: Pix2Pix, an image-to-image translation model (with an interactive demo web page)
Recap
- Pattern recognition is everywhere and amazing. ⭐️
- It'd be cool to better understand intelligence by prototyping it using human-made machines (“AI”). ⭐️
- One intermediate goal of prototyping intelligence is to make machines do pattern recognition. ⭐️
(because we guess that pattern recognition is an essential part of intelligence.)
from "Introduction to AI" by Mick Grierson:
- We can also view AI as statistical, probabilistic, stochastic and other mathematical processes that can be used to learn procedures and perform tasks automatically.
- Symbolic and Connectionist approaches to AI
(We'll see more on this in later lectures)
Representation
What is apple
Pattern recognition question 01
How do you distinguish apple as a fruit vs. Apple as a tech company?
Here are some possible representations:
- is it edible or not?
- is it a fruit?
- is the word spelled with an uppercase "A"? 🤪
Pattern recognition question 02
How do you distinguish apple vs. pear? 🍎🍏🍐
Do any of our previous representations work?
- is it edible or not?
- is it a fruit?
- is the word spelled with an uppercase "A"? 🤪
No, but here are some possible representations:
- their shapes
- their tastes
Good representation simplifies our task!
To excel at pattern recognition ~= To find a good representation
🌪️
Meet papple
Representation can be contextual.
Depending on the problem context, different tasks may have different efficient representations for the same object.
My take on representation:
- It can be descriptive and capture some characteristics.
- It can reflect a particular perspective and be partial.
- It can be contextual where "good" ones are task-dependent.
- A side note on abstraction (in the context of computational thinking) -
Abstraction is related to representation in the sense that it is about taking away irrelevant details and reducing the representation to essential characteristics.
(Shoutout to Joel's slides on "what is computational thinking?")
Speaking of representation...
A name can represent a person, and we have not heard your names yet 😜
any questions so far?
Numbers and Numeric representation
What do we use (real) numbers for?
- To count things
- To measure things
- To label things
- ...
007
A side note:
- Do we really have numbers in the real world? Have you seen numbers walking on the street on their own?
- Numbers are usually used in context. To connect numbers with reality, we always need an interpretation guide (e.g. a protocol, a unit, etc.) when using numbers in real life.
- Interpreting the numbers produced by computer programs is a helpful skill you can gain from this unit.
My number of chaos:
-858993460
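One possible reading of this number, offered as an assumption since the slides do not say where it comes from: it is what you get when the 32-bit pattern 0xCCCCCCCC is read as a signed integer. Read as an unsigned integer, the very same bits give a completely different number, which is why an interpretation guide matters. A minimal Python sketch:

```python
import struct

# The same four bytes, read under two different interpretation guides.
raw = bytes([0xCC, 0xCC, 0xCC, 0xCC])

as_unsigned = struct.unpack("<I", raw)[0]  # unsigned 32-bit integer
as_signed = struct.unpack("<i", raw)[0]    # signed 32-bit integer

print(as_unsigned)  # 3435973836
print(as_signed)    # -858993460
```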
Why are numbers important in this unit?
- Our poor human-made machines can only deal with numbers 🫠
- Numbers let us bring in maths, which is our DOMAIN EXPANSION 💥
“Information era” 🍎💻
The information we receive from the world comes mainly in four modalities:
- Image (video)
- Text (written language)
- Audio (music, speech)
- Number (the temperature in degrees Celsius, your birthday, etc.)
Can you think of any information that does not fall into these four categories?
My mind-blowing moment:
- Information from any of the first three categories (image, text and audio) can be digitized / numberified, i.e. represented by just a bunch of numbers.
How to numberify the size of digital images?
- Two numbers for its width and height (how many pixels).
e.g. 3840 x 2160 for 4K resolution
- Sometimes another number for how many color channels there are.
e.g. 256 x 256 x 3 for an RGB color image
Here is one way to numberify digital images:
- Three numbers for each pixel representing the RGB values in color images
- One number for each pixel representing the greyscale value in grey images
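To make this concrete, here is a minimal Python sketch of a made-up 2 x 2 image stored as plain numbers. The pixel values are invented for illustration, and the greyscale conversion uses a plain average of R, G and B, which is only one of several possible choices.

```python
# A hypothetical 2 x 2 RGB image: rows of pixels, three numbers (R, G, B) per pixel.
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],      # red, green
    [(0, 0, 255), (255, 255, 255)],  # blue, white
]

height = len(rgb_image)
width = len(rgb_image[0])
channels = len(rgb_image[0][0])
print(width, height, channels)    # 2 2 3
print(width * height * channels)  # 12 numbers in total

# The same picture as a greyscale image: one number per pixel.
# The grey value here is the plain average of R, G and B (an assumption;
# other weightings are common).
grey_image = [[sum(pixel) // 3 for pixel in row] for row in rgb_image]
print(grey_image)  # [[85, 85], [85, 255]]
```

By the same counting, a 256 x 256 x 3 image is 196608 numbers.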
How to numberify digital audio?
Here is one way to numberify digital audio:
- A series of numbers where each number denotes a displacement value from silence at that point in time
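As a rough sketch of what such a series looks like, the Python snippet below generates a short pure tone. The sample rate, pitch and duration are illustrative choices, not values from the lecture.

```python
import math

sample_rate = 8000  # how many numbers we store per second (illustrative)
frequency = 440.0   # pitch of the tone in Hz (illustrative)
duration_ms = 10    # length of the clip in milliseconds

num_samples = sample_rate * duration_ms // 1000

# Each number is the displacement from silence (0.0) at one point in time.
samples = [
    math.sin(2 * math.pi * frequency * n / sample_rate)
    for n in range(num_samples)
]

print(num_samples)  # 80
print(samples[0])   # 0.0
```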
Here is one way to numberify text:
- Think about looking up a word in a dictionary 📕 using a page number and an index
- To decode the pair of (page number, index) back to actual words, you will need to pass that dictionary around.
- OR?
- More on this in week 11
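The dictionary idea above can be sketched in a few lines of Python. The tiny dictionary and the helper names `encode` and `decode` are made up for illustration; the point is that the numbers only make sense to someone who holds the same dictionary.

```python
# A hypothetical tiny dictionary, split into pages of two words each.
pages = [
    ["apple", "audio"],
    ["image", "number"],
    ["pear", "text"],
]

def encode(word):
    """Return the (page number, index) pair for a word, both counted from 0."""
    for page_number, page in enumerate(pages):
        if word in page:
            return (page_number, page.index(word))
    raise KeyError(f"{word!r} is not in the dictionary")

def decode(pair):
    """Look the pair up in the same dictionary to recover the word."""
    page_number, index = pair
    return pages[page_number][index]

codes = [encode(w) for w in ["apple", "pear", "number"]]
print(codes)                       # [(0, 0), (2, 0), (1, 1)]
print([decode(c) for c in codes])  # ['apple', 'pear', 'number']
```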
A tricky question for you!
How would you numberify the label that says whether an image is a dog🐢 image or a cat😼 image?
You are free to define any dictionary/protocol to explain the number(s) you have used.
One possible way:
- I will use two numbers and each number is either 0 or 1.
- I will have the first number indicating if there is a dog in the image.
- And I will have the second number indicating if there is a cat in the image.
- If it is a dog-only image: [1, 0] (square brackets here are optional)
- If it is a cat-only image: [0, 1]
- If it is a dog and cat image: [1, 1]
- What does this numeric representation [0, 0] mean in this context?
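The protocol above can be written down as a small Python function. The name `encode_label` is hypothetical; it simply turns two yes/no answers into the two numbers.

```python
def encode_label(has_dog, has_cat):
    """First number: is there a dog? Second number: is there a cat?"""
    return [int(has_dog), int(has_cat)]

print(encode_label(True, False))  # [1, 0]  dog-only image
print(encode_label(False, True))  # [0, 1]  cat-only image
print(encode_label(True, True))   # [1, 1]  dog and cat image
```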
any questions so far?
Image classification
What we will talk about today:
- What is image classification?
- What can an image classification model do?
What we will NOT talk about today:
- How to make an image classification model from scratch?
- How to modify an image classification model?
Image classification
- It is a machine learning and computer vision task.
- It is a stepping stone task in computer vision and computational pattern recognition.
- The goal is to train a model to recognize and assign a predefined label or class to an input image.
Your first AI model to inspect!!!
One hack for inspecting any model:
Because a model is about "input -> process -> output",
to inspect what a model does, one helpful starting point is to check its input and output!
Input? Output?
Image classification model:
- Input: a digital image
- Output: the predicted classes of the input image
Image classification model:
Input (in a bit more detail):
An image with a pre-defined size.
(But you can get around the size limit simply by resizing and/or cropping images.)
(And be careful with the aspect ratio!)
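One common way to respect the aspect ratio is to crop the centre of the image to the model's shape first, and only then resize. The Python sketch below computes such a crop box; the function name `center_crop_box` and the 256 x 256 model input size are assumptions for illustration, not taken from the playground.

```python
def center_crop_box(width, height, target_width, target_height):
    """Return (left, top, right, bottom) of the largest centred crop
    that has the same aspect ratio as the target size."""
    # Compare aspect ratios using integer cross-multiplication.
    if width * target_height > height * target_width:
        # Image is too wide: keep the full height, trim the sides.
        crop_w = height * target_width // target_height
        crop_h = height
    else:
        # Image is too tall (or already matches): keep the full width.
        crop_w = width
        crop_h = width * target_height // target_width
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)

# A 4K frame cropped for a hypothetical model that expects 256 x 256 input.
print(center_crop_box(3840, 2160, 256, 256))  # (840, 0, 3000, 2160)
```

The cropped 2160 x 2160 region can then be scaled down to 256 x 256 without stretching the picture.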
Image classification model:
Output (in a bit more detail):
Probabilities assigned to each class from a set of pre-defined classes.
Image classification model:
It outputs probabilities assigned to each class from a set of pre-defined classes.
Let's dive into the "pre-defined classes" bit
Back to the example of dog🐢-or-cat😼 image classification model:
What are the pre-defined classes that this model can predict?
- a class for "dog" and a class for "cat"
- Can this model tell us which vehicle is in the input image?
- 🚫
- But it would be interesting to feed a 🚗 image into the model and see how it improvises.
Let's dive into the "probabilities" bit
- The model primarily outputs a list of probabilities, one for each class in the pre-defined set of classes.
- From there, we can easily post-process the list of probabilities and output a more human-friendly result like "which one is the most probable class?"
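The post-processing step can be as small as picking the class with the largest probability. In this Python sketch the class names follow the dog-or-cat example, and the probabilities are invented rather than produced by a real model.

```python
# Pre-defined classes and a hypothetical model output for one image.
classes = ["dog", "cat"]
probabilities = [0.8, 0.2]

# Pair each class with its probability and pick the most probable one.
best_class, best_probability = max(
    zip(classes, probabilities), key=lambda pair: pair[1]
)
print(best_class, best_probability)  # dog 0.8
```

Whatever image goes in, the answer can only ever be one of the pre-defined classes.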
any questions so far?
Takeaway messages from our first encounter with image classification models
- Each model has a pre-defined input image size.
- Each model has a pre-defined set of output classes.
A toy example on image classification in playground
Has everyone installed Xcode?
Download the playground here and double-click the file to open it (Xcode should start up automatically)
Don't panic about the chunks of Swift code 💙
- The code has been prepared and is ready to run.
- It can predict the given input image's class, one image at a time.
All we have to do for this lecture is to:
1. Run the playground as it is.
2. Check the classification results.
3. Inspect the loaded classification model's input image size and output classes.
4. Swap to another input image and run the playground.
This is how you can run the playground and check the classification results 👻
This is where you can inspect the model's input image size 👋
- Bonus question: go back to the main file and check the code on line 25. Do you see some familiar numbers? What do these numbers mean?
This is where you can inspect the model's output classes 🌛
- Bonus question: how many classes can this model predict?
This is how you can swap the input image 💅
- Bonus question: try using other images from your laptop (you can download interesting new ones from the internet) as the input image
A nice app that makes good use of an image classification model
Today we have talked about:
Representation 🧠
- descriptive, perspective, and contextual
Numeric representation 🌢️
- How image, audio and text can be represented by numbers
Image classification 🕹️
- Given an input image of a pre-defined size, an image classification model predicts the probability of that image belonging to each class from a pre-defined set of classes.
Homework :)
Envision an app that leverages image classification model(s)
Some possible starting points:
- What is the scenario where an image classification model could be helpful / fun?
- What are the classes you envision the model to predict from input images?
- Classes can be any categories, both objective and subjective (e.g. different emotions)!
We'll see you next Thursday same time and same place!