
Vision Models

While LLMs are changing the world, we have also witnessed rapid progress in AI image and video generation. AI models can now synthesize high-quality, high-fidelity, high-resolution images from text prompts. Mainstream generative model architectures have shifted from VAEs, normalizing flows, and GANs to diffusion models and transformers. In this post I take notes on several vision models. It may be updated regularly to reflect the latest research developments...

February 12, 2024 · 28 min · Fei Li

Better Transformers

In this post I will walk through the transformer layer and several improvements over this architecture that are commonly employed in many popular open-source large language models (LLMs) today, such as Llama. Topics discussed include the SwiGLU and RMSNorm layers, the RoPE and ALiBi position embeddings, and finally Flash Attention for scaling attention computation to long sequences. We will use the Llama source code as an example implementation, and toward the end I’ll go through the rest of Llama’s source code...

August 31, 2023 · 28 min · Fei Li