
LoRA

LoRA adapters are, to me, one of the smartest strategies used in Machine Learning in recent years! It is one of those things that makes me think, "Wait! How did we not think of that before?"


LoRA adapters came as a very natural strategy for fine-tuning models. The key realization is that any matrix of model parameters in a trained neural network is just the sum of its initial values and the gradient descent updates accumulated over the training data mini-batches:


𝜃(trained model) = 𝜃(initial value) + gradient descent updates


From there, we can view a fine-tuned model as the same set of parameters, where we simply kept accumulating gradient updates on some specialized dataset:


𝜃(fine-tuned) = 𝜃(trained model) + more gradient descent updates


Once we decompose the learning into those 2 terms, the pretrained weights on one side and the fine-tuning updates on the other:


𝜃(fine-tuned) = 𝜃(trained model) + ΔW


then we realize that the 2 terms don't need to live in the same matrix; we can keep them in 2 separate matrices and sum their outputs instead. That is the idea behind LoRA: we freeze the original weights and allocate new weight parameters that specialize in learning the fine-tuning data.
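Here is a minimal PyTorch sketch of that naive version (the class and variable names are mine, not from any library): the pretrained weight W is frozen, and a separate, same-sized trainable matrix ΔW is learned, with the two outputs summed.

```python
import torch
import torch.nn as nn

class NaiveDeltaLinear(nn.Module):
    """Illustrative sketch: freeze the pretrained weight W and learn a
    separate full-size update delta_W, summing the two output paths."""
    def __init__(self, pretrained_linear: nn.Linear):
        super().__init__()
        self.base = pretrained_linear
        # Freeze the original pretrained weights (and bias, if any)
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Full-size trainable update, initialized to zero so the model
        # starts out exactly equivalent to the pretrained model.
        self.delta_W = nn.Parameter(torch.zeros_like(self.base.weight))

    def forward(self, x):
        # x @ W^T + x @ delta_W^T  ==  x @ (W + delta_W)^T
        return self.base(x) + x @ self.delta_W.T
```

Because ΔW starts at zero, wrapping a layer this way changes nothing at step 0; only the fine-tuning gradients flow into ΔW.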


On its own, though, this is not very interesting: a full-size update matrix would just double the memory allocated to the model. So the trick is to use a low-rank matrix approximation to reduce the number of operations and the required memory. We introduce 2 new matrices, B and A, to approximate ΔW:


ΔW ≈ BA

where, if ΔW has shape d × k, B has shape d × r and A has shape r × k for some small rank r ≪ min(d, k). Storing B and A takes r(d + k) parameters instead of the d × k parameters of the full ΔW.


It is important to realize that, typically, the amount of training data used for fine-tuning is much smaller than the data used for pretraining. As a consequence, it is unlikely that we would even have enough data to reach good statistical convergence on the full matrix ΔW. The low-rank approximation acts as a regularization technique that helps the model generalize better to unseen data.
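Here is a hedged sketch of the low-rank version in PyTorch (names like LoRALinear, r, and alpha are illustrative; production implementations live in libraries such as peft). B is initialized to zero so that BA = 0 at the start of fine-tuning, and the alpha/r scaling follows the convention from the LoRA paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA adapter wrapped around a frozen nn.Linear."""
    def __init__(self, pretrained_linear: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = pretrained_linear
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze W (and bias)
        d, k = self.base.out_features, self.base.in_features
        # Low-rank factors: delta_W ≈ B @ A, with r << min(d, k).
        # A gets a small random init; B starts at zero so B @ A = 0 at step 0.
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)
        self.B = nn.Parameter(torch.zeros(d, r))
        self.scale = alpha / r

    def forward(self, x):
        # Frozen base path plus the scaled low-rank update path
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Parameter count for a 4096 x 4096 layer with r = 8:
# full delta_W: 4096 * 4096 ≈ 16.8M trainable params
# LoRA (B, A):  8 * (4096 + 4096) = 65,536 (~0.4% of the full update)
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65536
```

The parameter count at the bottom makes the memory argument concrete: only the small B and A matrices receive gradients, while the frozen base weights are shared with the pretrained model.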


--


👉 Learn more about ML in my newsletter https://newsletter.TheAiEdge.io


--

