
ML Zoomcamp 2025 : Week 1. Intro to Machine Learning


Machine Learning

Definition

  • A process of extracting patterns from data
  • An algorithm that learns patterns from data and predicts outcomes.

Related concepts

  • Model: The output of a machine learning algorithm after it has been trained on data. A model encapsulates all the patterns learned during training.

Model Training


Two Types of Data

- Feature: all the information we have about the object

- Target: what we want to predict about the object



Comparison: Rule-Based Systems VS Machine Learning

Example: Spam Mail

Case 1. Rule-Based System

  • Observe the data and write rules to filter spam mail
  • ex) if a mail contains ‘review, promotion, …’ then it is spam

-> But spam keeps changing, so it is hard to maintain the code and keep adding new rules.
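To make the rule-based idea concrete, here is a minimal sketch. The keyword set and the function name `is_spam_rule_based` are my own assumptions for illustration; a real filter would need many more rules, which is exactly the maintenance problem.

```python
# Hypothetical keyword list taken from the example above.
SPAM_KEYWORDS = {"review", "promotion"}

def is_spam_rule_based(subject, body):
    """Flag a mail as spam if it contains any hand-picked keyword."""
    text = f"{subject} {body}".lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)

print(is_spam_rule_based("Big promotion today", "Buy now"))  # True
print(is_spam_rule_based("Meeting notes", "See attached"))   # False
```

Every new kind of spam means editing `SPAM_KEYWORDS` or adding another `if` by hand.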

JUST USE MACHINE LEARNING!

Case 2. Machine Learning

  • Get Data
  • Define & Calculate features
  • Train and use the model - in this case, use the model to classify messages as spam or not spam
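The "Define & Calculate features" step could look roughly like the sketch below. The three features, the sample mails, and the name `extract_features` are assumptions I made up for illustration; the training step itself is left out.

```python
def extract_features(subject, body):
    """Turn one mail into a list of binary features (all hypothetical)."""
    text = f"{subject} {body}".lower()
    return [
        int(len(subject) > 10),    # is the subject longer than 10 characters?
        int("promotion" in text),  # does the mail mention "promotion"?
        int("review" in text),     # does the mail mention "review"?
    ]

# Made-up mails: (subject, body, target), where target 1 = spam, 0 = not spam.
mails = [
    ("Big promotion today", "Buy now", 1),
    ("Lunch?", "Are you free at noon?", 0),
]

X = [extract_features(subject, body) for subject, body, _ in mails]
y = [target for _, _, target in mails]

print(X)  # [[1, 1, 0], [0, 0, 0]]
print(y)  # [1, 0]
```

The features `X` and the targets `y` are what we would hand to a machine learning algorithm to train the model.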

Process of ML

In usual software, we put in data and code and get an output.

In machine learning, on the other hand, we put in data and the desired output, and get the model as the output!


Supervised Machine Learning

Definition

  • Teaching the computer by example: we give it a feature matrix X together with the known target y, and it learns to make predictions from X that are as close as possible to y.

Expression

g(X) ≈ y
* g: model, X: features, y: target
  • Features: a matrix (two-dimensional array)
  • Target: a vector (one-dimensional array)
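Here is a small NumPy sketch of the shapes involved. The numbers, the weights `w`, and the linear form of `g` are assumptions chosen by hand so that the predictions match the target exactly; a trained model would usually only get close.

```python
import numpy as np

# Feature matrix X: 3 objects (rows) x 2 features (columns). Made-up values.
X = np.array([
    [1.0, 2.0],
    [2.0, 0.0],
    [3.0, 1.0],
])

# Target y: one value per object.
y = np.array([5.0, 2.0, 5.0])

def g(X, w):
    """A hypothetical linear model: one prediction per row of X."""
    return X @ w

w = np.array([1.0, 2.0])  # weights picked by hand, not learned
y_pred = g(X, w)

print(X.shape)  # (3, 2) -> two-dimensional
print(y.shape)  # (3,)   -> one-dimensional
print(y_pred)   # [5. 2. 5.]
```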

Types of Machine Learning

Regression

  • the function g (the model) outputs a number
  • example: predicting car prices or house rents

Classification

  • the model outputs a category (two classes or more)
  • example: this animal is a cat/dog/hamster (multiclass), this mail is spam or not (binary)


CRISP-DM (Cross-Industry Standard Process for Data Mining)



Process

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment


Model Selection Process

Selecting the best model from Model1, Model2, and Model3

Data Preparation

  • We split the data into training and validation sets. For example, 80% of the data is used for training, and 20% is used for validation. 

After training the models, we compare their results on the validation data and select the one with the best score.
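An 80/20 split can be sketched with NumPy like this. The data is made up, and the seed is only there to make the shuffle repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so the split is repeatable

n = 10
X = np.arange(n).reshape(-1, 1)      # one made-up feature per object
y = (X[:, 0] >= 5).astype(int)       # made-up binary target

idx = rng.permutation(n)             # shuffle the row indices
n_val = int(n * 0.2)                 # 20% of the rows go to validation
val_idx, train_idx = idx[:n_val], idx[n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

print(len(X_train), len(X_val))  # 8 2
```

Shuffling before splitting matters when the rows are stored in some order, for example sorted by date or by target.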

Multiple Comparison Problem

What if a model is not really good, but gets the highest validation score just by chance?

-> To check for this, we need test data!

Whole Process Of Model Selection

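  1. Split the data into training, validation, and test sets.
  2. Train the models on the training data.
  3. Evaluate the models on the validation data.
  4. Select the best one and apply it to the test data.
  5. Compare its validation and test scores to check that it did not just get lucky.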
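The five steps can be sketched end to end as below. Everything here is an assumption for illustration: the 60/20/20 ratio, the made-up data, and the three "models", which are fixed threshold rules rather than trained models.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Made-up data: one feature, target is 1 when the feature is at least 50.
X = np.arange(100)
y = (X >= 50).astype(int)

# 1. Split into 60% training, 20% validation, 20% test.
idx = rng.permutation(len(X))
train_idx, val_idx, test_idx = idx[:60], idx[60:80], idx[80:]

# 2. "Train" the models. These hypothetical models are fixed rules, so there is
#    nothing to fit; a real model would learn from X[train_idx] and y[train_idx].
models = {
    "model1": lambda x: (x >= 30).astype(int),
    "model2": lambda x: (x >= 50).astype(int),
    "model3": lambda x: (x >= 80).astype(int),
}

def accuracy(model, X_part, y_part):
    """Share of correct predictions."""
    return float((model(X_part) == y_part).mean())

# 3. Evaluate every model on the validation data.
val_scores = {name: accuracy(m, X[val_idx], y[val_idx]) for name, m in models.items()}

# 4. Select the best one and apply it to the test data.
best_name = max(val_scores, key=val_scores.get)
test_score = accuracy(models[best_name], X[test_idx], y[test_idx])

# 5. Compare validation and test performance of the selected model.
print(val_scores)
print(best_name, val_scores[best_name], test_score)
```

If the test score is much lower than the validation score, the selected model probably just got lucky on the validation data.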

Linear Algebra Refresher

* I used GPT to help me understand the concepts.

Vector-vector multiplication (dot product)

Definition & Concepts

  • An operation that takes two vectors of the same length and returns a single scalar value: multiply the corresponding components and sum the results.
  • The dot product measures how much two vectors point in the same direction.
  • This property makes the dot product useful in machine learning for things like calculating similarity, and in linear regression for computing predictions and the cost function.
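A quick check of the definition in NumPy, with made-up vectors:

```python
import numpy as np

u = np.array([2, 4, 5])
v = np.array([1, 0, 3])

# Multiply matching components and sum: 2*1 + 4*0 + 5*3 = 17
manual = sum(u_i * v_i for u_i, v_i in zip(u, v))

print(manual)        # 17
print(np.dot(u, v))  # 17
print(u.dot(v))      # 17

# Perpendicular vectors do not point in the same direction at all.
a = np.array([1, 0])
b = np.array([0, 1])
print(a.dot(b))      # 0
```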

Transpose

- If you want to turn a column vector into a row vector (or the other way around), use this!

* Why do we need to use transpose?

  • The dot product multiplies the corresponding components of two vectors and sums the results. We can express it as a matrix multiplication: a row vector times a column vector.
  • In linear algebra, it's common to represent a vector as a column vector. By the rules of matrix multiplication, the first vector must therefore be turned into a row vector while the second stays a column vector. The operation that turns the column vector v into a row vector is the transpose (v^T), so the dot product of u and v is written u^T v.
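To see the row-times-column rule in NumPy, the vectors have to be stored as two-dimensional arrays with shape (3, 1). The values are made up.

```python
import numpy as np

u = np.array([[2], [4], [5]])  # column vector, shape (3, 1)
v = np.array([[1], [0], [3]])  # column vector, shape (3, 1)

print(u.T.shape)  # (1, 3): the transpose is a row vector

result = u.T @ v  # (1, 3) @ (3, 1) -> (1, 1)
print(result)     # [[17]]
```

With plain one-dimensional arrays, `u.dot(v)` gives the same number without any transpose; the transpose only matters once vectors are treated as matrices.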

Matrix-vector Multiplication


Matrix-vector multiplication acts as a linear transformation of a vector.

  • Concept: It's an operation where a matrix acts on a vector, changing its direction and/or magnitude.

  • How It Works: A new vector is created by taking the dot product of each row of the matrix with the vector. This operation is only possible if the number of columns in the matrix is equal to the number of elements in the vector.
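The row-by-row description can be checked against NumPy's `@` operator. The matrix and vector are made up.

```python
import numpy as np

U = np.array([
    [2, 4, 5],
    [1, 2, 1],
    [3, 1, 2],
])
v = np.array([1, 0, 2])

# Dot product of each row of U with v.
manual = np.array([U[i].dot(v) for i in range(U.shape[0])])

print(manual)                         # values 12, 3, 7
print(np.array_equal(manual, U @ v))  # True
```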

Matrix-Matrix Multiplication



Matrix-matrix multiplication can be seen as applying multiple linear transformations in sequence.

  • Concept: It's an operation for transforming multiple vectors at once or for combining two or more linear transformations.

  • How It Works: The second matrix is broken down into several column vectors. Then, matrix-vector multiplication is performed repeatedly, multiplying the first matrix with each of the second matrix's column vectors. The results are then combined to form a single new matrix.
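The column-by-column description can be sketched the same way, again with made-up matrices:

```python
import numpy as np

U = np.array([
    [2, 4, 5],
    [1, 2, 1],
    [3, 1, 2],
])
V = np.array([
    [1, 1],
    [0, 2],
    [2, 0],
])

# Multiply U by each column of V, then put the results side by side as columns.
columns = [U @ V[:, j] for j in range(V.shape[1])]
manual = np.column_stack(columns)

print(manual.shape)                   # (3, 2)
print(np.array_equal(manual, U @ V))  # True
```

A (3, 3) matrix times a (3, 2) matrix gives a (3, 2) matrix: the inner dimensions must match, and the outer ones give the shape of the result.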

Identity Matrix



  • Concept: The identity matrix, denoted as I, is a square matrix with all ones on the main diagonal and zeros everywhere else. Just like the number 1, multiplying any matrix by the identity matrix leaves the original matrix unchanged.
  • How to make: np.eye(n) creates an n x n identity matrix
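A short check that multiplying by the identity matrix changes nothing. The matrix `U` is made up.

```python
import numpy as np

I = np.eye(3)  # 3 x 3 identity matrix
U = np.array([
    [2, 4, 5],
    [1, 2, 1],
    [3, 1, 2],
])

print(I)
print(np.array_equal(U @ I, U))  # True
print(np.array_equal(I @ U, U))  # True
```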

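Inverse Matrix

  • Concept: The inverse of a square matrix A, denoted as A^-1, is the matrix that gives the identity matrix I when multiplied with A. Only square matrices can have an inverse, and not every square matrix has one.
  • How to make: np.linalg.inv()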
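A minimal sketch with a made-up 2 x 2 matrix, checking that a matrix times its inverse gives the identity matrix:

```python
import numpy as np

A = np.array([
    [2.0, 1.0],
    [1.0, 1.0],
])

A_inv = np.linalg.inv(A)  # mathematically [[1, -1], [-1, 2]]

# Floating point results are compared with allclose, not exact equality.
print(np.allclose(A_inv, [[1.0, -1.0], [-1.0, 2.0]]))  # True
print(np.allclose(A @ A_inv, np.eye(2)))               # True
```

If the matrix has no inverse (it is singular), `np.linalg.inv` raises `np.linalg.LinAlgError`.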
