
AdamW optimizer

 Nowadays, most LLMs get trained with the AdamW optimizer as opposed to the Adam optimizer. Why? 


There used to be a time when Adam was the king among optimizers, and it didn't make much sense to spend too much time trying to find a better one. This has changed recently, and AdamW has become the default optimizer for LLM practitioners.


The difference comes down to how the regularization term is applied to the weight parameters. With the plain gradient descent algorithm, if we want to apply an L2 regularization term with strength λ, we modify the loss function such that:


regularized loss = loss + (λ/2) · ‖w‖²


Then, we compute the gradient of that new loss to update the model parameters. The gradient of the L2 term is simply λw, so at every step a small multiple of each weight gets subtracted off: the regularization term keeps the weights from growing too large, and it acts as a weight decay mechanism when we update the weights.
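
To make this concrete, here is a minimal NumPy sketch of one gradient descent step with L2 regularization. All names (sgd_l2_step, grad_loss, and the default hyperparameters) are illustrative, not from any particular library:

```python
import numpy as np

# One vanilla gradient-descent step with L2 regularization.
# `grad_loss` is the gradient of the *unregularized* loss.
def sgd_l2_step(w, grad_loss, lr=0.01, weight_decay=0.01):
    # Gradient of: loss + (weight_decay / 2) * ||w||^2
    grad = grad_loss + weight_decay * w
    # Expanding the update shows the decay acting directly on w:
    #   w <- w - lr * grad_loss - lr * weight_decay * w
    return w - lr * grad

w = np.array([1.0, -2.0, 3.0])
g = np.array([0.1, 0.2, -0.1])
print(sgd_l2_step(w, g))  # each weight is also shrunk toward zero by lr * weight_decay * w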


In Adam, when we apply L2 regularization, we also add the term to the loss, but it gets used differently. The gradient of that regularized loss feeds into both the first moment (the numerator of the update) and the second moment (the denominator). Because the second moment normalizes the update, weights with a large gradient history receive far less decay than intended: the effect of the L2 term is diluted, and it can no longer act as a true weight decay mechanism.
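
Here is a sketch of one Adam step with the L2 term folded into the gradient (the "coupled" behavior described above). The names are illustrative, and the state (m, v, t) would normally persist across steps:

```python
import numpy as np

def adam_l2_step(w, grad_loss, m, v, t, lr=1e-3, beta1=0.9,
                 beta2=0.999, eps=1e-8, weight_decay=0.01):
    g = grad_loss + weight_decay * w      # L2 term enters the gradient...
    m = beta1 * m + (1 - beta1) * g       # ...so it lands in the numerator (m)
    v = beta2 * v + (1 - beta2) * g**2    # ...and in the denominator (v)
    m_hat = m / (1 - beta1**t)            # bias-corrected moments
    v_hat = v / (1 - beta2**t)
    # The decay gets divided by sqrt(v_hat): weights with large
    # gradient history receive less decay than intended.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v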


In AdamW, on the other hand, we DO NOT regularize the loss function: the gradient update is computed independently of the regularization term. Only during the weight update do we subtract the decay term, so it acts directly on the weights rather than through the loss. Because of this, training with AdamW tends to be more stable and leads to models that generalize better! Good to know, right?
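
And here is the corresponding AdamW sketch: the moments see only the loss gradient, and the decay is applied directly to the weights at update time. Again, all names and defaults are illustrative:

```python
import numpy as np

def adamw_step(w, grad_loss, m, v, t, lr=1e-3, beta1=0.9,
               beta2=0.999, eps=1e-8, weight_decay=0.01):
    g = grad_loss                         # no L2 term in the gradient
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w = w - lr * weight_decay * w         # decoupled decay, applied to w itself
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

This mirrors how decoupled weight decay is implemented in common libraries (e.g. torch.optim.AdamW): the decay step never passes through the moment estimates.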

