Transformer: Self-Attention and Parallelization
The transformer, distinguished by its use of self-attention and multi-head attention, is a deep learning model built on the encoder-decoder architecture. Because self-attention relates all positions of a sequence to one another in a single matrix operation rather than step by step, the model parallelizes far better than recurrent architectures. It has been applied in both computer vision (CV) and natural language processing (NLP). BERT is built from the transformer's encoder stack, while GPT is built from its decoder stack.
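To make the self-attention mechanism concrete, here is a minimal sketch of single-head scaled dot-product attention in NumPy. The weight matrices `W_q`, `W_k`, `W_v` and the toy dimensions are illustrative assumptions, not values from the text above; multi-head attention would simply run several such heads in parallel and concatenate their outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Project the input sequence into queries, keys, and values.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    # Scaled dot-product attention: every position is compared with
    # every other position in one matrix product, which is why the
    # computation parallelizes across the whole sequence.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy example (assumed shapes): 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one attended representation per token
```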