The core strength of Transformer models is their ability to process text in parallel, which makes them far more efficient for language tasks than sequential architectures. This lesson explores the intricacies of the Transformer architecture, delving into its two primary components: attention mechanisms and the encoder-decoder structure. Learning about these elements will help us understand how modern LLMs like generative pre-trained transformers (GPT) function and excel at language tasks.
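To make the parallelism concrete, here is a minimal sketch of scaled dot-product attention, the operation at the heart of the architecture. It is illustrative only (plain NumPy, no masking, single head); the function and variable names are my own, chosen to follow the usual Q/K/V convention.

```python
# Minimal sketch of scaled dot-product attention (single head, no mask).
# Every token's output is computed at once via matrix multiplies,
# which is where the parallelism comes from.
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q, k, v: arrays of shape (seq_len, d_k)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # (seq_len, seq_len) similarities
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v               # weighted mix of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # 4 tokens, 8-dim embeddings
out = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v
print(out.shape)                     # (4, 8)
```

In a real Transformer, q, k, and v come from learned linear projections of the input, and several such heads run side by side, but the core computation is exactly this.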
Here is everything you need to learn about transformers (no, not the movie or the electrical ones).
YouTube Videos
- Attention is all you need by Umar Jamil
- Code a Transformer from scratch by Umar Jamil
- Transformer Neural Networks, ChatGPT's foundation, Clearly Explained!!! by StatQuest --- This is by far my favourite one. If you are just looking for a quick introduction or a recap, go with this one.
I have built two small projects based on my learnings and written them up as Medium articles. If hands-on learning is your thing, do give them a try!
Books
There is no resource more valuable than a single book that explains every concept in a lucid yet comprehensive manner, covering all the aspects. Luckily, I found such a book for learning about transformers.
Transformers for Natural Language Processing by Denis Rothman