The core strength of Transformer models is their ability to process all positions of a sequence in parallel, which makes them far more efficient to train on language tasks than sequential, recurrent architectures. This lesson explores the intricacies of the Transformer architecture, delving into its two primary components: attention mechanisms and the encoder-decoder structure. Learning about these elements will enable us to better understand how modern LLMs like generative pre-trained transformers (GPT) function and excel at language tasks.
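To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention (the core operation from "Attention Is All You Need") in plain Python. The function names and the toy matrices are my own illustrative choices, not from any particular library; real implementations use batched tensor operations, but the math is the same.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors.

    Q: queries, K: keys, V: values -- each a list of equal-length vectors.
    Every query attends to all keys at once and is processed independently
    of the other queries; that independence is what lets Transformers
    handle a whole sequence in parallel.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        w = softmax(scores)  # attention weights, sum to 1
        # output = weighted average of the value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

# toy example: 2 queries, 3 key/value pairs, dimension 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = attention(Q, K, V)
print(len(out), len(out[0]))  # 2 2
```

Each output row is a blend of the value vectors, weighted by how well the corresponding query matches each key; the videos and book below build multi-head attention and the full encoder-decoder stack on top of exactly this operation.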

Here is everything you need to learn about transformers (no, not the movie or the electrical ones).

YouTube Videos
  1. Attention is all you need by Umar Jamil
  2. Code a Transformer from scratch by Umar Jamil
  3. Transformer Neural Networks, ChatGPT’s foundation, Clearly Explained!!! by StatQuest --- This is by far my favourite one

If you are just looking for a quick introduction or recap, go with this:

  1. Transformers, explained: Understand the model behind GPT, BERT, and T5

I have built two small projects based on my learnings and written them up as Medium articles. If hands-on learning is your thing, do give them a try!

  1. Text Summarization with Transformers
  2. Semantic Search with Transformers


There is no resource more valuable than a single book that explains each concept in a lucid yet comprehensive manner, covering all the aspects. Luckily, I found such a book for learning about transformers.

Transformers for Natural Language Processing by Denis Rothman