Machine Learning One

By the end of this lecture, we'll have learnt about:
The theoretical:
- Introduction to representation
- How to use numbers to represent things
- Introduction to machine learning [model]
- Introduction to image classification
The practical:
- How to use the image classification API from Apple's Vision framework

- Pattern recognition is everywhere and amazing. ⭐️
- We'd love to understand intelligence better by prototyping it using human-made machines (“AI” systems).⭐️
- One intermediate goal of prototyping intelligence is to make machines do pattern recognition. ⭐️
(because we guess that pattern recognition is an essential part of intelligence.)

How do you distinguish apple vs. pear? 🍎🍏🍐
Do any of our previous representations work?
- is it edible or not?
- is it a fruit?
- is the word spelled with an uppercase "A"? 🤪

Representation can be contextual.
Depending on the problem context, different tasks may call for different efficient representations of the same objects.

My take on representation:
- It can be descriptive and capture some characteristics.
- It can be perspective and partial.
- It can be contextual where "good" ones are task-dependent.

- A side note on abstraction (in the context of computational thinking) -
Abstraction is related to representation in the sense that it is about taking away irrelevant details and reducing the representation to essential characteristics.
(Shoutout to Joel's slides on "what is computational thinking?")

Till now we have been using English words to describe representations. 📙
How about using machine words (numbers) to represent objects, concepts, etc.? 🤖
🕶️Question for curiosity: what are the differences between using natural language and using numbers for representation?

A side note:
Have you seen numbers walking on the street on their own?
- Numbers are usually used in context. To connect numbers with reality, we always need an interpretation guide (e.g. a protocol, a unit, etc.) when using numbers in real life.
- Interpreting the numbers produced by computer programs is a helpful skill you can gain from this unit.

Why are numbers important in this unit?
- Our poor human-made machines can only deal with numbers 🫠
- Numbers can introduce maths, which is our DOMAIN EXPANSION 💥

“Information era”🍎💻
The information we receive from the world comes mainly in four modalities:
- Image (video)
- Text (written language)
- Audio (music, speech)
- Tabular (the weather in degrees Celsius, your birthday, etc.)
Can you think of any information that does not fit into these four categories?

🧠 My mind-blowing moment:
- Information from any of these three categories (image, text and audio) can be digitized / numberified, aka represented by just a bunch of numbers.

How to numberify the size of digital images?
- Two numbers for its width and height (how many pixels).
e.g. 3840 x 2160 for 4K resolution
- Sometimes another number for how many color channels there are.
e.g. 256 x 256 x 3 for an RGB color image
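
To make the arithmetic concrete, here is a minimal Swift sketch that counts how many numbers an image of a given size needs. The helper name `numberCount` is made up for illustration and is not part of the lecture's playground.

```swift
// Hypothetical helper: how many numbers are needed for an image of this size?
// `channels` defaults to 1 (e.g. a greyscale image).
func numberCount(width: Int, height: Int, channels: Int = 1) -> Int {
    return width * height * channels
}

// Pixel count of a 4K frame (width x height only).
let pixelsIn4K = numberCount(width: 3840, height: 2160)

// A 256 x 256 RGB image: three numbers per pixel.
let valuesInSmallRGB = numberCount(width: 256, height: 256, channels: 3)

print(pixelsIn4K)        // 8294400
print(valuesInSmallRGB)  // 196608
```

So even a small 256 x 256 color image is already a list of almost two hundred thousand numbers.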

Here is one way to numberify digital images:
- Three numbers for each pixel representing the RGB values in color images
- One number for each pixel representing the greyscale value in grey images
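
As a rough illustration of the two bullet points above, here is a tiny 2 x 2 image written out as numbers in Swift. The type `RGBPixel`, the helper `flatten`, and the assumption that each value lies in 0...255 are choices made for this sketch; real image formats may store pixels differently.

```swift
// One color pixel = three numbers (assumed here to be in 0...255).
struct RGBPixel {
    let red: UInt8
    let green: UInt8
    let blue: UInt8
}

// A 2 x 2 color image, stored as rows of pixels.
let colorImage: [[RGBPixel]] = [
    [RGBPixel(red: 255, green: 0, blue: 0), RGBPixel(red: 0, green: 255, blue: 0)],     // red, green
    [RGBPixel(red: 0, green: 0, blue: 255), RGBPixel(red: 255, green: 255, blue: 255)], // blue, white
]

// A 2 x 2 grey image: one number per pixel (0 = black, 255 = white).
let greyImage: [[UInt8]] = [
    [0, 128],
    [200, 255],
]

// Hypothetical helper: write the color image out as one flat list of numbers.
func flatten(_ image: [[RGBPixel]]) -> [UInt8] {
    var numbers: [UInt8] = []
    for row in image {
        for pixel in row {
            numbers += [pixel.red, pixel.green, pixel.blue]
        }
    }
    return numbers
}

print(flatten(colorImage))        // [255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255]
print(flatten(colorImage).count)  // 12 = 2 x 2 x 3
```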

Here is one way to numberify digital audio:
Each number in a digital audio file represents the instantaneous amplitude (height) of the sound wave.
- that is, how far the air pressure at that moment is above or below the neutral “silence” level.
Slightly more poetic: a series of numbers where each number denotes a displacement from silence at that point in time.
A little more detail:
- The sound wave is sampled at regular time intervals — e.g. 44,100 times per second in CD-quality audio (that’s the sampling rate).
- Each sample corresponds to one number.
- Together, these numbers trace out how the air pressure (or voltage, in an electrical signal) changes over time.
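
The sampling idea can be sketched in a few lines of Swift. This is only an illustration under assumptions: the function `sampleSineWave` is made up, the sound is an idealized pure tone, and amplitudes are scaled to the range -1...1 (real audio files usually store integers instead).

```swift
import Foundation

// Hypothetical helper: measure a pure sine tone at regular time intervals.
// Each returned number is the amplitude at one sample time (0 = silence).
func sampleSineWave(frequency: Double, sampleRate: Double, duration: Double) -> [Double] {
    let sampleCount = Int(sampleRate * duration)
    return (0..<sampleCount).map { (index: Int) -> Double in
        let time = Double(index) / sampleRate  // time of this sample in seconds
        return sin(2.0 * Double.pi * frequency * time)
    }
}

// One second of a 440 Hz tone at the CD-quality sampling rate.
let samples = sampleSineWave(frequency: 440, sampleRate: 44_100, duration: 1.0)

print(samples.count)      // 44100 numbers for one second of sound
print(samples.prefix(5))  // the first few amplitudes, starting from silence
```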

Here is one way to numberify text:
- Think about looking up a word in a dictionary 📕 using a page number and an index on that page.
- To decode a (page number, index) pair back into the actual word, you will need to pass that dictionary around.
- OR?
- more on this in week 11
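
Here is a minimal Swift sketch of the dictionary idea, simplified to one number per word (its position in a shared word list) instead of a (page number, index) pair. The names `vocabulary`, `encode` and `decode`, and the use of -1 for unknown words, are assumptions made for this sketch.

```swift
// A tiny shared "dictionary": the position of each word is its number.
let vocabulary = ["a", "an", "apple", "is", "not", "pear"]

// Hypothetical helper: turn a sentence into numbers.
// Words that are not in the vocabulary become -1 (a choice made for this sketch).
func encode(_ sentence: String, with vocabulary: [String]) -> [Int] {
    return sentence.lowercased().split(separator: " ").map { word in
        vocabulary.firstIndex(of: String(word)) ?? -1
    }
}

// Hypothetical helper: turn numbers back into words using the same vocabulary.
func decode(_ numbers: [Int], with vocabulary: [String]) -> String {
    return numbers.map { index in
        vocabulary.indices.contains(index) ? vocabulary[index] : "?"
    }.joined(separator: " ")
}

let numbers = encode("A pear is not an apple", with: vocabulary)
print(numbers)                            // [0, 5, 3, 4, 1, 2]
print(decode(numbers, with: vocabulary))  // a pear is not an apple
```

Without the same `vocabulary` on both sides, the numbers cannot be turned back into words, which is the "pass that dictionary around" point.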

🤗A tricky question for you! - how can we use numbers to represent information that is more abstract than images or audio?
How to numberify the label of an image being a dog🐶 image or a cat😼 image?
You are free to define any dictionary/protocol to explain the number(s) you have used.

One possible way:
- I will use two numbers and each number is either 0 or 1.
- I will have the first number indicating if there is a dog in the image.
- And I will have the second number indicating if there is a cat in the image.
- If it is a dog-only image: [1, 0] (square brackets here are optional)
- If it is a cat-only image: [0, 1]
- If it is a dog and cat image: [1, 1]
- What does this numeric representation [0, 0] mean in this context?
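
The protocol above can be written down as a small Swift sketch. The function names `encodeLabel` and `describeLabel` are made up for illustration; the meaning of each position follows the bullets above.

```swift
// Protocol from the slide: [is there a dog?, is there a cat?], each number 0 or 1.
func encodeLabel(hasDog: Bool, hasCat: Bool) -> [Int] {
    return [hasDog ? 1 : 0, hasCat ? 1 : 0]
}

// Hypothetical helper: read the two numbers back using the same protocol.
func describeLabel(_ label: [Int]) -> String {
    guard label.count == 2 else { return "not a valid label" }
    switch (label[0], label[1]) {
    case (1, 0): return "dog only"
    case (0, 1): return "cat only"
    case (1, 1): return "dog and cat"
    case (0, 0): return "neither a dog nor a cat"
    default:     return "not a valid label"
    }
}

print(encodeLabel(hasDog: true, hasCat: false))  // [1, 0]
print(describeLabel([1, 1]))                     // dog and cat
print(describeLabel([0, 0]))                     // neither a dog nor a cat
```

The numbers only make sense together with the protocol: someone who assumes the first position means "cat" would read every label the wrong way round.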

What we will talk about today: 🤗
- What is image classification?
- What can an image classification model do?
What we will NOT talk about today: ✋
- How to make an image classification model from scratch?
- How to modify an image classification model?

Image classification
- It is a machine learning and computer vision task. 👁️
- It is a stepping stone task in computer vision and computational pattern recognition.
- The goal is to train a model to recognize and assign a predefined label or class to an input image.

🕶️ One hack for inspecting any model:
Because a model is about "input -> process -> output",
to inspect what a model does, one helpful starting point is to check its input and output!

🕶️ Image classification model:
Input (in a bit more detail):
An image with a pre-defined size.
(but you can get around the size limit simply by resizing and/or cropping images.)
(and be careful of the aspect ratio!)
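
To see why the aspect ratio matters, here is a small Swift sketch of one common approach: scale the image so it fully covers the target size, then crop the overflow from the center. The type `PixelSize`, the function `scaledSizeCoveringTarget`, and the example sizes are assumptions for illustration, not the playground's actual preprocessing.

```swift
struct PixelSize {
    let width: Double
    let height: Double
    var aspectRatio: Double { return width / height }
}

// Hypothetical helper: scale the image uniformly (same factor for width and height)
// so that it covers the target; the overflow can then be cropped away.
func scaledSizeCoveringTarget(image: PixelSize, target: PixelSize) -> PixelSize {
    let scale = max(target.width / image.width, target.height / image.height)
    return PixelSize(width: image.width * scale, height: image.height * scale)
}

let original = PixelSize(width: 400, height: 200)  // a wide image, aspect ratio 2.0
let modelInput = PixelSize(width: 100, height: 100) // made-up square model input

let scaled = scaledSizeCoveringTarget(image: original, target: modelInput)
print(scaled.width, scaled.height)  // 200.0 100.0 (aspect ratio is still 2.0)

// Center crop: remove the same amount from the left and from the right.
let cropPerSide = (scaled.width - modelInput.width) / 2
print(cropPerSide)                  // 50.0
```

Stretching the 400 x 200 image directly to 100 x 100 would also fit the model, but it would change the aspect ratio from 2.0 to 1.0 and squash everything in the picture.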

🕶️ Image classification model:
Output (in a bit more detail):
Probabilities assigned to each class from a set of pre-defined classes.

Image classification model:
It outputs probabilities assigned to each class from a set of pre-defined classes.
Let's dive into the "pre-defined classes" bit

Back to the example of dog🐶-or-cat😼 image classification model:
What are the pre-defined classes that this model can predict?
- a class for "dog" and a class for "cat"
- Can this model tell us which vehicle is in the input image?
- 🚫
- But it would be interesting to feed a 🚗 image into the model and see how it improvises.

Image classification model:
It outputs probabilities assigned to each class from a set of pre-defined classes.
Let's dive into the "probabilities" bit

- It primarily outputs a list of probabilities, one for each class in the pre-defined set of classes.
- From there, we can easily post-process the list of probabilities and output a more human-friendly result, like "which one is the most probable class?"
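
The post-processing step can be sketched in Swift as picking the class with the largest probability. The class list, the probability values, and the function name `mostProbableClass` are made up for illustration; they are not the output of a real model.

```swift
// Made-up model output: one probability per pre-defined class, in the same order.
let classes = ["dog", "cat"]
let probabilities = [0.92, 0.08]

// Hypothetical helper: find the class with the highest probability.
// Returns nil if the two lists do not line up or are empty.
func mostProbableClass(classes: [String],
                       probabilities: [Double]) -> (label: String, probability: Double)? {
    guard classes.count == probabilities.count else { return nil }
    guard let best = zip(classes, probabilities).max(by: { $0.1 < $1.1 }) else {
        return nil
    }
    return (label: best.0, probability: best.1)
}

if let best = mostProbableClass(classes: classes, probabilities: probabilities) {
    print("Most probable class: \(best.label)")  // Most probable class: dog
}
```

Note that the helper always names one of the pre-defined classes, even when the highest probability is low, so it is worth looking at the probability itself and not only at the winning label.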

Takeaway messages from our first encounter with image classification models
- Each model has a pre-defined input image size.
- Each model has a pre-defined set of output classes.

Don't panic about the chunks of Swift code 💙
- The code has been prepared and is ready to run.
- It can predict the given input image's class, one image at a time.

All we have to do for this lecture is to:
1. Run the playground as it is.
2. Check the classification results.
3. Inspect the loaded classification model's input image size and output classes.
4. Swap to another input image and run the playground.

This is how you can run the playground and check the classification results 👻

This is where you can inspect the model's input image size👋
- Bonus question: go back to the main file and check the code in line 25. Have you seen some familiar numbers? What do these numbers mean?

This is where you can inspect the model's output classes 🌛
- Bonus question: how many classes can this model choose from?

This is how you can swap the input image 💅
- Bonus question: try using other images from your laptop (you can download new interesting ones from the internet) as the input image

Today we have talked about:
Representation 🧠
- descriptive, perspective, and contextual
Numeric representation 🌶️
- How image, audio and text can be represented by numbers
Image classification 🕹️
- Given an input image of a pre-defined size, an image classification model predicts a probability for each class in a pre-defined set of classes.

Homework :)
Envision an app that leverages image classification model(s) 👐
Some possible starting points:
- What is the scenario where an image classification model could be helpful / fun?
- What are the classes you envision the model predicting from input images?
- Classes can be any categories, both objective and subjective (e.g. different emotions)!