Machine Learning

Machine Learning

Definition: Field of study that gives computers the ability to learn without being explicitly programmed.

In general, there are three fields in machine learning, that is, supervised learning, unsupervised learning and reinforcement learning. Their relationships are as follow:

1

Fig. 1. Fields of machine learning

Supervised learning

Supervised learning is the most commonly used machine learning. It's definition is:

Supervised learning refers to algorithms that learn input to output mappings. The key of it is that you give the input(x) and the correct output(y) for the algorithms to learn. Then, the learning algorithms eventually learn to get the input alone and give a reasonably accurate prediction or guess of the output. In short, supervised learning learns from data labeled with the "right answers".

There are generally two types of supervised learning, regression and classification.

Regression

Regression is a kind of supervised learning that predicts a number, so its output contains infinitely many possible results. For example, housing price prediction:

2

Fig. 2. House price prediction

Classification

Classification is a kind of supervised learning that predicts categories which can be number or non numeric, so its output just contains a small number of results. For example, breast cancer detection:

3

Fig. 3. Breast cancer detection

Both regression and classification can have more than one inputs.

Unsupervised learning

Unsupervised learning's definition:

Unsupervised learning refers to algorithm that find something interesting in unlabed data. In unsupervised learning,the algorithms figure out all by themselves what's interesting or what patterns or structures might be in the dataset. Data only comes with input(x), but not output labels(y). Algorithm has to find structure in the data.

There are generally three types of unsupervised learning, clustering, anomaly detection and dimensionality reduction.

Clustering

Clustering is a kind of unsupervised learning that groups data into different categories by any standard. For example, grouping customers:

4

Fig. 4. Grouping customers

Anomaly detection

Anomaly detection is a kind of unsupervised learning that finds unusual data points. It is very useful in financial system.

Dimensionality reduction

This kind of unsupervised learning compresses data using fewer numbers.

Reinforcement learning

Reinforcement learning is an new field in machine learning that has not yet been well applied but has a promising future.

Tool

Jupyter notebook

Jupyter notebook is the default environment that most researchers use to code up and experiment.

PyTorch

see PyTorch.

Tensorflow

See Tensorflow.

NumOy

See NumPy.

Scikit-learn

See Scikit-learn.

Terminology

  • Training set: Data used to train a model.
  • Test set: Data used to evaluate a model.

Notation

  • ${x}$ = "input" variable or feature;
  • ${y}$ = "output" variable or "target" variable;
  • ${m}$ = number of training examples;
  • $(x, y)$ = a single training example;
  • $(x^i,y^i)$ = $i^{th}$ training example;
  • $x_j$ = $j^{th}$ feature;
  • $n$ = number of features;
  • ${\vec{x}^{(i)}}$ = features of $i^{th}$ training example.
  • $x_j^{(i)}$ = value of feature $j$ in $i^{th}$ training example ($x$ can also be $\vec{x}$).