Paper001: Dataset Condensation with Gradient Matching
This paper improves on classical dataset distillation (Dataset Distillation) and names its approach "dataset condensation" (Dataset Condensation). It is one of the most influential works in the dataset-distillation field: it is the first to propose a gradient-matching strategy, and datasets distilled with this strategy achieve substantially better test accuracy and generalization.
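The core idea can be sketched on a toy problem. The sketch below is an illustration of gradient matching, not the authors' code: it uses a linear regression model (so gradients have a closed form) and a finite-difference gradient of the matching loss in place of autodiff; all names, sizes, and step sizes are made up for the example. The synthetic set is updated so that the gradient it induces matches the gradient from the real data, and the model is then trained on the synthetic set alone.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse_grad(w, X, y):
    # Gradient of the mean-squared-error loss of a linear model X @ w.
    return X.T @ (X @ w - y) / len(y)

def match_loss(X_syn, w, g_real, y_syn):
    # Squared distance between synthetic-data and real-data gradients.
    g_syn = mse_grad(w, X_syn, y_syn)
    return np.sum((g_syn - g_real) ** 2)

# Toy "real" regression data (sizes are illustrative).
X_real = rng.normal(size=(256, 10))
y_real = X_real @ rng.normal(size=10) + 0.1 * rng.normal(size=256)

# Condensed set: only 8 synthetic samples, initialised randomly.
X_syn = rng.normal(size=(8, 10))
y_syn = rng.normal(size=8)

w = np.zeros(10)
eta_syn, eta_w, eps = 0.01, 0.05, 1e-5
for step in range(100):
    g_real = mse_grad(w, X_real, y_real)
    # Update the synthetic data to better match g_real (finite-difference
    # gradient of the matching loss, standing in for autodiff).
    grad_X = np.zeros_like(X_syn)
    for i in range(X_syn.shape[0]):
        for j in range(X_syn.shape[1]):
            X_p = X_syn.copy(); X_p[i, j] += eps
            X_m = X_syn.copy(); X_m[i, j] -= eps
            grad_X[i, j] = (match_loss(X_p, w, g_real, y_syn)
                            - match_loss(X_m, w, g_real, y_syn)) / (2 * eps)
    X_syn -= eta_syn * grad_X
    # Then train the model one step on the synthetic data only.
    w -= eta_w * mse_grad(w, X_syn, y_syn)
```

The alternation mirrors the paper's outer/inner loop: the condensed data is optimised so that training on it produces the same parameter updates as training on the full set.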
The Transformer, distinguished by its use of self-attention and multi-head attention, is a deep learning model built on the encoder-decoder architecture. It can be used in both CV and NLP. BERT is derived from the Transformer's encoder, and GPT from its decoder.
An attention mechanism is a neural-network layer added to a deep learning model so that it can focus on specific parts of the data, according to the different weights assigned to those parts. Just as a neural network is an effort to mimic the actions of the human brain in a simplified manner, the attention mechanism is an attempt to implement, in neural networks, the same action of selectively concentrating on a few relevant things while ignoring others.
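The weighting described above can be made concrete with a minimal sketch of scaled dot-product attention in NumPy (shapes and names here are illustrative): each query is compared against all keys, and the resulting softmax weights decide how much each value contributes to the output.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of queries and keys
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries, dimension 8
K = rng.normal(size=(6, 8))   # 6 keys
V = rng.normal(size=(6, 8))   # 6 values
out, w = attention(Q, K, V)
print(out.shape)              # (4, 8): one weighted mix of values per query
```

The rows of `w` are exactly the "different weights assigned to different parts" of the input.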
Some optimization algorithms more advanced than plain SGD: Momentum, AdaGrad, RMSProp, Adam...
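As one example beyond plain SGD, here is a minimal NumPy sketch of the Adam update rule (hyperparameter values are the common defaults, and the quadratic objective is just for illustration): it keeps running averages of the gradient (`m`) and its square (`v`), with bias correction, so each parameter effectively gets its own step size.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # first moment (running mean)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (running square)
    m_hat = m / (1 - b1 ** t)              # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Usage: minimise f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):                   # t starts at 1 for bias correction
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
```

After the loop, `w` is driven close to the minimiser at the origin.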
The encoder-decoder architecture views neural networks from a new perspective: it treats a neural network as a kind of signal processor that encodes the input and then decodes it to generate the output.
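This "signal processor" view can be illustrated with a toy NumPy sketch (the linear maps and sizes are made up for the example): the encoder compresses the input into a short code, and the decoder produces the output from that code alone.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W_enc):
    # Input -> compact hidden code (the "encoded signal").
    return np.tanh(W_enc @ x)

def decoder(h, W_dec):
    # Hidden code -> output, using only the code.
    return W_dec @ h

x = rng.normal(size=16)            # 16-dim input
W_enc = rng.normal(size=(4, 16))   # compress to a 4-dim code
W_dec = rng.normal(size=(16, 4))   # expand back to 16 dims
h = encoder(x, W_enc)
y = decoder(h, W_dec)
print(h.shape, y.shape)            # (4,) (16,)
```

Real encoder-decoder models (seq2seq, Transformers, autoencoders) follow the same interface, just with learned, much deeper encode/decode functions.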
Some commonly used RNN models: GRU, LSTM, DRNN, BRNN...
CNN is good at processing spatial information, but it is not good at processing sequence information. An RNN (Recurrent Neural Network) can process sequence information better than other neural networks.
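What makes an RNN suited to sequences is its hidden state, which carries information forward across time steps. A minimal vanilla-RNN forward pass in NumPy (all sizes and the small weight scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, T = 3, 5, 7                 # input dim, hidden dim, sequence length
W_xh = rng.normal(size=(d_h, d_in)) * 0.1
W_hh = rng.normal(size=(d_h, d_h)) * 0.1
b_h = np.zeros(d_h)

xs = rng.normal(size=(T, d_in))        # an input sequence of length T
h = np.zeros(d_h)                      # hidden state, initially empty
for x_t in xs:
    # The same weights are reused at every step; h mixes the current
    # input with everything seen so far.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
print(h.shape)                         # final hidden state: (5,)
```

GRU and LSTM cells replace the single `tanh` update with gated updates, but the recurrence over `h` is the same idea.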
CPU, GPU, DSP, FPGA, ASIC (TPU).
Some commonly used CNN models: LeNet-5, AlexNet, VGG, NiN, GoogLeNet, ResNet.
CNN is a special kind of MLP. Why do we still need CNNs when MLPs can work well? This touches a classic problem in computing: the trade-off between memory and computation speed. CNNs are widely used in image processing. An image is represented in the computer by millions of pixels, and each pixel is a feature of the image. A GPU cannot afford to store the model parameters a fully connected network would need for so many features. Hence, we need CNNs to compress the number of parameters while still extracting features from an image.
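A back-of-the-envelope count makes the trade-off concrete (the layer sizes below are illustrative choices, not from the text): a fully connected layer connects every input pixel to every hidden unit, while a conv layer shares one small kernel across the whole image.

```python
# Parameter counts for one dense layer vs one conv layer on an RGB image.
H, W, C = 224, 224, 3              # a typical RGB input image
hidden = 1000                      # hidden units in the MLP layer

fc_params = H * W * C * hidden     # every pixel connected to every unit
conv_params = 3 * 3 * C * 64       # a 3x3 kernel with 64 output channels

print(f"dense layer: {fc_params:,} parameters")   # 150,528,000
print(f"conv layer:  {conv_params:,} parameters") # 1,728
```

Weight sharing and local connectivity cut the count by about five orders of magnitude here, which is exactly why CNNs are feasible on images where plain MLPs are not.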