JourneyToCoding

Code for Fun

The method proposed in this paper improves on the authors' earlier dataset condensation based on gradient matching. That earlier method requires a bi-level optimization, running gradient descent on both the network parameters and the synthetic data, and also needs second-order derivatives; this is computationally expensive and limits its use on large datasets. The distribution-matching-based dataset condensation proposed in this paper neatly solves that problem.
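
To make the contrast concrete, here is a minimal sketch of the distribution matching idea in PyTorch. The embedding network, data shapes, and hyperparameters below are all illustrative assumptions, not the paper's settings; the point is that the loss compares mean feature embeddings of real and synthetic batches, so only first-order gradients of the synthetic images are needed.

```python
import torch
import torch.nn as nn

# Illustrative feature extractor: a randomly initialized embedding network.
# Distribution matching only needs forward passes through such networks,
# so no bi-level optimization or second-order gradients are required.
embed = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in embed.parameters():
    p.requires_grad_(False)  # the network is frozen; only the images learn

real = torch.randn(256, 1, 28, 28)                     # stand-in real batch
syn = torch.randn(10, 1, 28, 28, requires_grad=True)   # learnable synthetic set
opt = torch.optim.SGD([syn], lr=0.1)

for step in range(100):
    # Squared distance between mean embeddings of real and synthetic data
    # (an empirical MMD with a linear kernel); only d(loss)/d(syn) is needed.
    loss = (embed(real).mean(0) - embed(syn).mean(0)).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```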

Read more »

This paper improves on classic Dataset Distillation and names its method Dataset Condensation; it is one of the most groundbreaking works in the dataset distillation field. The paper is the first to propose the gradient matching strategy, and datasets distilled with this strategy show greatly improved test accuracy and generalization.
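
A minimal sketch of the gradient matching idea, assuming a toy linear classifier and random stand-in data (the paper uses ConvNets and a layer-wise matching distance; every name below is illustrative): the synthetic set is updated so that the gradients it induces in the network match those induced by real data. Note the `create_graph=True`, which is where the second-order cost mentioned above comes from.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Linear(784, 10)                           # toy stand-in classifier
real_x, real_y = torch.randn(256, 784), torch.randint(0, 10, (256,))
syn_x = torch.randn(10, 784, requires_grad=True)   # one learnable image per class
syn_y = torch.arange(10)
opt = torch.optim.SGD([syn_x], lr=0.1)

def grads(x, y, create_graph):
    loss = F.cross_entropy(net(x), y)
    return torch.autograd.grad(loss, tuple(net.parameters()),
                               create_graph=create_graph)

for step in range(50):
    g_real = grads(real_x, real_y, create_graph=False)
    # create_graph=True keeps the graph so the synthetic gradients can be
    # differentiated w.r.t. syn_x -- the second-order term that makes
    # gradient matching expensive.
    g_syn = grads(syn_x, syn_y, create_graph=True)
    match = sum((gr - gs).pow(2).sum() for gr, gs in zip(g_real, g_syn))
    opt.zero_grad()
    match.backward()
    opt.step()
```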

Read more »

The transformer, distinguished by its use of self-attention and multi-head attention, is a deep learning model built on the encoder-decoder architecture. It can be used in both CV and NLP: BERT is derived from the transformer's encoder, and GPT from its decoder.
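
As a minimal sketch of the whole architecture, PyTorch ships it as `nn.Transformer`, with multi-head self-attention inside both stacks; the dimensions below are illustrative, not the original paper's.

```python
import torch
import torch.nn as nn

# Encoder-decoder transformer: 2 encoder and 2 decoder layers, 4 heads each.
model = nn.Transformer(d_model=64, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)
src = torch.randn(8, 20, 64)   # 8 source sequences of 20 token embeddings
tgt = torch.randn(8, 15, 64)   # 8 target sequences of 15 token embeddings
out = model(src, tgt)          # (8, 15, 64): one vector per target position
```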

Read more »

An attention mechanism is a neural-network layer added to a deep learning model so that it focuses on specific parts of the data, according to the different weights assigned to those parts. Just as the neural network is an attempt to mimic the human brain in a simplified manner, the attention mechanism is an attempt to implement, in neural networks, the same act of selectively concentrating on a few relevant things while ignoring others.
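
That weighting can be written in a few lines. Below is a minimal sketch of scaled dot-product attention, one common form of the mechanism; the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Scores measure how relevant each key is to each query; softmax turns
    # them into weights, so the output concentrates on the relevant values.
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    weights = F.softmax(scores, dim=-1)   # each row of weights sums to 1
    return weights @ v

q = torch.randn(1, 5, 16)   # 5 queries of dimension 16
k = torch.randn(1, 7, 16)   # 7 keys
v = torch.randn(1, 7, 16)   # 7 values, one per key
out = attention(q, k, v)    # (1, 5, 16): each query gets a weighted mix of v
```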

Read more »

The encoder-decoder architecture views neural networks from a new perspective: it treats a neural network as a kind of signal processor that encodes the input and then decodes it to generate the output.
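
A minimal sketch of this signal-processor view, assuming a toy fully connected encoder and decoder with made-up sizes:

```python
import torch
import torch.nn as nn

# The encoder compresses the input into a code; the decoder reconstructs
# an output from that code -- the "signal processor" view of the network.
encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

x = torch.rand(4, 784)      # e.g. four flattened 28x28 images
code = encoder(x)           # compact internal representation
y = decoder(code)           # output generated from the code
```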

Read more »

A CNN is good at processing spatial information, but not at processing sequential information. An RNN (Recurrent Neural Network), which carries a hidden state from one time step to the next, handles sequential information better than other neural networks.
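
A minimal sketch of that recurrence using PyTorch's built-in `nn.RNN`; the shapes are illustrative.

```python
import torch
import torch.nn as nn

# An RNN carries a hidden state across time steps, so step t can use
# information from steps 1..t-1 -- something a plain CNN cannot do.
rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
seq = torch.randn(3, 50, 10)   # 3 sequences, 50 steps, 10 features per step
outputs, h_n = rnn(seq)        # outputs: (3, 50, 20); h_n: final hidden state
```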

Read more »