Is this the end of LoRA as a fine-tuning approach? Singular Value Fine-tuning is here!
We have a new paper in town called "Transformer²" (Transformer-Squared). It promises to adapt an LLM to new tasks at inference time, without external intervention.
The core idea of the paper is to use Singular Value Decomposition (SVD) to factorize the weight matrices of the transformer. During training, we learn to scale the singular values differently for different tasks, which can be as diverse as math reasoning or coding. The possibilities are endless.
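To make the idea concrete, here is a minimal PyTorch sketch of rescaling singular values with a learned vector, one scale per singular value. The shapes, the `z_*` names, and the two-expert setup are illustrative assumptions, not the paper's actual code.

```python
import torch

# Illustrative sketch (not the paper's code): rescale the singular values
# of a frozen weight matrix with a small learned vector per task.

torch.manual_seed(0)
W = torch.randn(512, 512)  # pretrained weight matrix (kept frozen)

# Factorize once: W = U @ diag(S) @ Vh
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

# One learnable scaling vector per task ("expert") -- hypothetical names.
z_math = torch.nn.Parameter(torch.ones_like(S))  # e.g. tuned on math data
z_code = torch.nn.Parameter(torch.ones_like(S))  # e.g. tuned on coding data

def adapted_weight(z: torch.Tensor) -> torch.Tensor:
    """Rebuild the weight with rescaled singular values: W' = U diag(S * z) Vh."""
    return U @ torch.diag(S * z) @ Vh

# Only the z vectors are trained; U, S, Vh stay frozen, so each task adds
# just len(S) parameters per weight matrix.
W_math = adapted_weight(z_math)
```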
During inference, we do two passes. In the first pass, the LLM identifies the kind of task and picks the corresponding set of scales. In the second pass, the LLM, now adapted into a specialist for that task, responds to our prompt.
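Below is a rough, hypothetical sketch of what such a two-pass loop could look like. The `model.generate` / `model.set_expert` calls and the expert names are made up for illustration; the paper also describes richer adaptation strategies (e.g. mixing several experts) that this toy version skips.

```python
# Hypothetical two-pass inference: dispatch first, then answer with the
# task-specific singular-value scales swapped in.

EXPERTS = {"math": "z_math", "code": "z_code", "reasoning": "z_reasoning"}

def two_pass_answer(model, prompt: str) -> str:
    # Pass 1: ask the base model which kind of task the prompt is.
    dispatch_prompt = (
        "Classify the following request as one of: math, code, reasoning.\n"
        f"Request: {prompt}\nAnswer with a single word:"
    )
    task = model.generate(dispatch_prompt).strip().lower()
    task = task if task in EXPERTS else "reasoning"  # simple fallback

    # Pass 2: load the matching scaling vector(s), then answer the prompt.
    model.set_expert(EXPERTS[task])
    return model.generate(prompt)
```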
How cool is that?
Paper Title: Transformer²: Self-adaptive LLMs
Paper: https://arxiv.org/abs/2501.06252
Blog: https://sakana.ai/transformer-squared
Video Explanation: https://youtu.be/r4UG8YfKseE?si=SiaFCH4rJ9UF-T1y
Hope it's useful!