ML Zoomcamp 2025 : Week 1. Intro to Machine Learning

Week 1. Intro to Machine Learning

Machine Learning

Definition

A process of extracting patterns from data
An algorithm that learns patterns from data and predicts outcomes.

Related concepts

Model: The output of a machine learning algorithm after it has been trained on data. A model encapsulates all the patterns learned during training.

Model Training

2 Types of Data

- feature: the information we know about the object

- target: what we want to predict about the object

Comparison: Rule-Based Systems VS Machine Learning

Example: Spam Mail

Case 1. Rule-Based System

Observe the data and write rules by hand to filter spam mail
ex) if a mail contains ‘review, promotion, …’ then this mail is spam

-> but spam keeps changing… so it is hard to maintain the code and keep adding new rules…
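To make the maintenance problem concrete, here is a minimal sketch of such a rule-based filter. The keyword set and the function name `is_spam_rule_based` are made up for illustration; they are not from the course.

```python
# Hypothetical keyword list; a real filter would need many more rules.
SPAM_KEYWORDS = {"review", "promotion"}

def is_spam_rule_based(message):
    """Flag a message as spam if any word matches a known spam keyword."""
    words = message.lower().split()
    # Strip basic punctuation so "promotion!" still matches "promotion".
    return any(word.strip(".,!?") in SPAM_KEYWORDS for word in words)

print(is_spam_rule_based("Big promotion today!"))  # True
print(is_spam_rule_based("Lunch at noon?"))        # False
```

Every time spammers change their wording, someone has to edit `SPAM_KEYWORDS` or add another rule by hand, which is exactly the problem described above.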

JUST USE MACHINE LEARNING!

Case 2. Machine Learning

Get Data
Define & Calculate features
Train and use the model - in this case, use model to classify messages into spam or not spam
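The first two steps could look roughly like the sketch below. The messages, labels, and the three features are invented for illustration, and the training step is left out because it depends on which library and model you choose.

```python
# Step 1: get data (two made-up messages with known labels: 1 = spam, 0 = not spam).
messages = ["Huge promotion, leave a review now", "See you at lunch"]
labels = [1, 0]

# Step 2: define and calculate features for each message.
def extract_features(message):
    text = message.lower()
    return [
        int("promotion" in text),  # hypothetical feature: mentions "promotion"
        int("review" in text),     # hypothetical feature: mentions "review"
        len(text.split()),         # hypothetical feature: number of words
    ]

X = [extract_features(m) for m in messages]  # one row of features per message
y = labels                                   # the target we want to predict

print(X)
print(y)
```

A model trained on `X` and `y` would then learn which feature combinations indicate spam, instead of us writing those rules ourselves.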

Process of ML

Features:

In traditional software, we put in data and code and get the output.

In machine learning, on the other hand, we put in data and the desired output and get the model as the result!

Supervised Machine Learning

Definition

We teach the computer by giving it a feature matrix X together with the known target y. The model makes predictions from X, and training tries to get those predictions as close as possible to the target variable.

Expression

g(X) ≈ y

* g: model, X: features, y: target

Features: a matrix (two-dimensional array), one row per object and one column per feature
Target: a one-dimensional array, one value per object
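As a small illustration of these shapes, the sketch below builds a made-up feature matrix and target in NumPy. The function `g` is a hand-written stand-in for a trained model, not something an algorithm actually learned.

```python
import numpy as np

# Made-up data: 3 objects (rows), each described by 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = np.array([5.0, 11.0, 17.0])  # one target value per row of X

def g(X):
    """Stand-in for a trained model: predicts 1*x1 + 2*x2 for each row."""
    return X @ np.array([1.0, 2.0])

print(X.shape, y.shape)  # (3, 2) (3,)
print(g(X))              # predictions, one per row of X
```

Here the predictions happen to match `y` exactly because the data was constructed that way; on real data a model only gets close.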

Types of Machine Learning

Regression

The function g (the model) outputs a number.
example: predicting the price of a car or the rent of a house

Classification

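The model outputs a category. With two categories it is binary classification; with more than two it is multiclass classification.
example: this animal is a cat/dog/hamster (multiclass), this mail is spam or not (binary)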

CRISP-DM (Cross-Industry Standard Process for Data Mining)

Process

Business understanding
Data understanding
Data preparation
Modeling
Evaluation
Deployment

Model Selection Process

Selecting the best model from several candidates, e.g. Model1, Model2, and Model3

Data Preparation

We split the data into training and validation sets. For example, 80% of the data is used for training, and 20% is used for validation.

After training the models, we compare their results on the validation data and select the one with the best score.
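One way to do the 80/20 split with NumPy is sketched below. The data is a placeholder and the seed is arbitrary; shuffling the indices first avoids putting only the last rows of the dataset into validation.

```python
import numpy as np

# Placeholder data: 10 objects with 2 features each, plus a target.
n = 10
X = np.arange(n * 2).reshape(n, 2)
y = np.arange(n)

# Shuffle the row indices (seed chosen arbitrarily for reproducibility).
rng = np.random.default_rng(seed=1)
idx = rng.permutation(n)

# The first 80% of the shuffled indices go to training, the rest to validation.
n_train = int(n * 0.8)
train_idx, val_idx = idx[:n_train], idx[n_train:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

print(X_train.shape, X_val.shape)  # (8, 2) (2, 2)
```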

Multiple Comparison Problem

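What if a model is not really good, but gets the highest validation score just by chance? The more models we compare, the more likely this becomes.

-> To check this, we need test data that was not used for training or for selecting the model!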

Whole Process Of Model Selection

Split data into training, validation, and test.
Train the models
Evaluate those models
Select the best one and apply it to the test data
Compare its performance on the validation and test data; if the two scores are close, the model was not just lucky
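The whole process can be sketched end to end with NumPy. Everything here is an assumption made for illustration: the data is synthetic, the three candidate "models" are polynomial fits of degree 1, 2, and 3, the split is 60/20/20, and the score is RMSE (lower is better).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data: a noisy quadratic relationship between x and y.
x = rng.uniform(-3, 3, size=200)
y = 1.5 * x**2 - x + rng.normal(0, 1, size=200)

# 1. Split into training (60%), validation (20%), and test (20%).
idx = rng.permutation(len(x))
train, val, test = idx[:120], idx[120:160], idx[160:]

def rmse(coeffs, x, y):
    """Root mean squared error of a polynomial with the given coefficients."""
    return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))

# 2-3. Train each candidate on the training set, evaluate it on validation.
candidates = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(x[train], y[train], deg=degree)
    candidates[degree] = (coeffs, rmse(coeffs, x[val], y[val]))

# 4. Select the candidate with the lowest validation error.
best_degree = min(candidates, key=lambda d: candidates[d][1])
best_coeffs, val_rmse = candidates[best_degree]

# 5. Apply it to the test data and compare the two scores.
test_rmse = rmse(best_coeffs, x[test], y[test])
print(best_degree, round(val_rmse, 3), round(test_rmse, 3))
```

If the test error is close to the validation error, the selected model probably did not win by chance.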

Linear Algebra Refresher

* I used GPT to help me understand the concepts.

Vector-vector multiplication (dot product)

Definition & Concepts

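An operation that takes two vectors of the same length and returns a single scalar value.
The dot product measures how much two vectors point in the same direction, scaled by their lengths: it is positive when they point the same way, zero when they are perpendicular, and negative when they point in opposite directions.
This property makes the dot product useful in machine learning for things like calculating similarity or computing predictions and cost functions in linear regression.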
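A small NumPy sketch of the dot product, computed both by hand and with the built-in method. The vectors are arbitrary examples.

```python
import numpy as np

u = np.array([2, 4, 5])
v = np.array([1, 0, 2])

# Multiply corresponding components and sum the results.
manual = sum(u[i] * v[i] for i in range(len(u)))

print(manual)    # 12
print(u.dot(v))  # 12, the same result from NumPy

# Perpendicular vectors have a dot product of zero.
print(np.array([1, 0]).dot(np.array([0, 1])))  # 0
```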

Transpose

- If you want to turn a column vector into a row vector, use this!

* Why do we need to use transpose?

The dot product indicates how much two vectors point in the same direction, and it can be written as a matrix multiplication: a row vector times a column vector.
In linear algebra, a vector is usually represented as a column vector. The dot product multiplies corresponding components and sums the results, so to express it with the rules of matrix multiplication, the first vector must be a row vector and the second must be a column vector. The operation that turns the column vector $v$ into a row vector is called the transpose ( $v^{T}$ ), which gives $u^{T}v = \sum_i u_i v_i$.
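The row-times-column idea can be checked in NumPy by storing the vectors explicitly as 3×1 columns. The values are arbitrary examples.

```python
import numpy as np

# Two column vectors, stored explicitly as 3x1 matrices.
u = np.array([[2], [4], [5]])
v = np.array([[1], [0], [2]])

# u.T is a 1x3 row vector, so (1x3) @ (3x1) gives a 1x1 matrix.
result = u.T @ v

print(u.shape, u.T.shape)  # (3, 1) (1, 3)
print(result.shape)        # (1, 1)
print(result[0, 0])        # 12, the dot product of u and v
```

Note that for plain one-dimensional NumPy arrays, `.T` changes nothing, which is why the vectors above are written as two-dimensional columns.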

Matrix-vector Multiplication

Matrix-vector multiplication acts as a linear transformation of a vector.

Concept: It's an operation where a matrix acts on a vector, changing its direction and/or magnitude.
How It Works: A new vector is created by taking the dot product of each row of the matrix with the vector. This operation is only possible if the number of columns in the matrix is equal to the number of rows in the vector.
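The "dot product of each row" description can be written out directly and compared with NumPy's `@` operator. The matrix and vector are arbitrary examples.

```python
import numpy as np

U = np.array([[2, 4, 5],
              [1, 2, 1],
              [3, 1, 2]])
v = np.array([1, 0, 2])

# The columns of U (3) must match the number of elements in v (3).
assert U.shape[1] == v.shape[0]

# Each output element is the dot product of one row of U with v.
by_rows = np.array([U[i].dot(v) for i in range(U.shape[0])])

print(by_rows)  # [12  3  7]
print(U @ v)    # the same result using the built-in operator
```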

Matrix-Matrix Multiplication

Matrix-matrix multiplication can be seen as applying multiple linear transformations in sequence.

Concept: It's an operation for transforming multiple vectors at once or for combining two or more linear transformations.
How It Works: The second matrix is broken down into several column vectors. Then, matrix-vector multiplication is performed repeatedly, multiplying the first matrix with each of the second matrix's column vectors. The results are then combined to form a single new matrix.
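The column-by-column procedure described above can be sketched as follows and compared with `U @ V`. The matrices are arbitrary examples.

```python
import numpy as np

U = np.array([[2, 4, 5],
              [1, 2, 1],
              [3, 1, 2]])
V = np.array([[1, 1],
              [0, 2],
              [2, 1]])

# Multiply U by each column of V (a matrix-vector multiplication each time).
columns = [U @ V[:, j] for j in range(V.shape[1])]

# Stack the resulting vectors side by side to form the product matrix.
by_columns = np.column_stack(columns)

print(by_columns)
print(np.array_equal(by_columns, U @ V))  # True
```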

Identity Matrix

Concept: The identity matrix, denoted as $I$, is a square matrix with all ones on the main diagonal and zeros everywhere else. Just like the number 1, multiplying any matrix by an identity matrix of matching size leaves the original matrix unchanged.
How to make: np.eye(n) creates an n×n identity matrix
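A quick check of this property in NumPy, using an arbitrary 3×3 matrix:

```python
import numpy as np

I = np.eye(3)  # 3x3 identity matrix (floating-point values)
A = np.array([[2, 4, 5],
              [1, 2, 1],
              [3, 1, 2]])

print(I)
# Multiplying by I on either side leaves A unchanged.
print(np.array_equal(A @ I, A))  # True
print(np.array_equal(I @ A, A))  # True
```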

Inverse Matrix

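Concept: For a square matrix $A$, the inverse $A^{-1}$ is the matrix that gives the identity matrix when multiplied with $A$: $A^{-1}A = I$. Not every square matrix has an inverse.
How to make: np.linalg.inv()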
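A minimal check with a small invertible matrix chosen for illustration. `np.allclose` is used instead of an exact comparison because the inverse is computed in floating point.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])

A_inv = np.linalg.inv(A)

print(A_inv)
# Multiplying the inverse by the original matrix gives the identity matrix.
print(np.allclose(A_inv @ A, np.eye(2)))  # True
```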

Quick Look at Numpy

What is NumPy?

NumPy is an open source project that enables numerical computing with Python. Because NumPy is implemented in C, its array operations run faster than the same operations on Python lists.

Core Features

- ndarray: N-dimensional array object, the basic data type of NumPy.
- Vectorization: Operations (or calculations) performed on the entire array without explicit loops.
- Broadcasting: Enables operations between arrays of different shapes (or sizes).

Array Operations and Attributes

1. Array Creation

- np.array(): Creates an array from a Python list.
- np.zeros(): An array filled with zeros.
- np.ones(): An array filled with ones.
- np.arange(): An array of consecutive numbers.
- np.linspace(): An array with evenly spaced numbers.
- np.random: Module for generating random arrays.

2. Array Attributes

- shape: The array's dimensions/axes (rows, columns).
- dtype: Data type.
- ndim: Number of dimensions.
- size: Total number of elements.

3. Array Indexing and Slicing

- Basic Indexing: arr[0], arr[1, 2]
- Slicing: arr[1:3]...
