Transformer: Self-Attention and Parallelization
The transformer, distinguished by its use of self-attention and multi-head attention, is a deep learning model built on the encoder-decoder architecture. Because self-attention relates all positions of a sequence to one another in a single matrix operation rather than step by step, the model parallelizes far better than recurrent architectures. It has been applied in both computer vision (CV) and natural language processing (NLP). BERT is built from the transformer's encoder stack, while GPT is built from its decoder stack.
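To make the self-attention mechanism concrete, here is a minimal sketch of single-head scaled dot-product attention in NumPy. The weight matrices `W_q`, `W_k`, `W_v` and the toy dimensions are illustrative assumptions, not values from the text above; multi-head attention would simply run several such heads in parallel and concatenate their outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Project the input sequence into queries, keys, and values.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    # Scaled dot-product attention: every position is compared with
    # every other position in one matrix product, which is why the
    # computation parallelizes across the whole sequence.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy example (assumed shapes): 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one attended representation per token
```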