Zhang Muhan's team at Peking University has released the Long Input Fine-Tuning (LIFT) framework, a breakthrough in long-text processing for artificial intelligence. Rather than following the traditional approach of extending the context window, LIFT trains the long input text into the model's parameters, allowing any model with a short context window to acquire long-text processing capabilities.

Large models currently face two major challenges when processing long texts: first, the quadratic complexity of the standard attention mechanism incurs enormous computational and memory overhead on long inputs; second, models struggle to capture long-range dependencies scattered throughout a long text. Existing solutions such as RAG and long-context adaptation each have limitations: RAG depends on accurate retrieval and can introduce noise that causes hallucinations, while long-context adaptation carries high inference complexity and still leaves the context window bounded.
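To make the quadratic cost concrete, the attention-score matrix alone holds a number of entries equal to the square of the sequence length. A minimal back-of-the-envelope sketch (the sequence lengths are illustrative and not taken from the paper):

```python
# Rough memory footprint of a single fp16 attention-score matrix,
# per head and per layer, as the sequence length grows.
for seq_len in (4_096, 32_768, 131_072):
    bytes_needed = seq_len * seq_len * 2  # seq_len^2 entries, 2 bytes each in fp16
    print(f"{seq_len:>7} tokens -> {bytes_needed / 2**30:7.2f} GiB")
```

Going from 4K to 128K tokens multiplies this single matrix from roughly 0.03 GiB to about 32 GiB, which is why naively lengthening the context quickly becomes impractical.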
The LIFT framework comprises three key components: dynamic and efficient long-input training, a Gated Memory Adapter that balances the model's capabilities, and auxiliary task training. Segmented language modeling splits the long text into overlapping segments, avoiding the inference-complexity blow-up and the loss of long-range dependencies that an excessively long context would cause (see the sketches below). A dedicated Gated Memory Adapter architecture dynamically balances the original model's in-context learning ability against its memorized understanding of the long input. Finally, a pre-trained LLM automatically generates question-answering auxiliary tasks from the long text, compensating for capabilities the model might lose during segmented training.
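The segmentation step can be pictured with a short sketch; the segment length and overlap below are illustrative placeholders, not the paper's actual hyperparameters:

```python
def split_into_segments(token_ids, segment_len=2048, overlap=256):
    """Split a long token sequence into overlapping segments that each fit a
    short context window; the overlap preserves some cross-segment continuity.
    (Illustrative values; the paper's exact segmentation is not reproduced here.)"""
    stride = segment_len - overlap
    segments = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        segments.append(token_ids[start:start + segment_len])
    return segments
```

Each overlapping segment then serves as an ordinary causal language-modeling training example, so the content of the long input is written into the model's parameters instead of being held in the context window at inference time.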

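The article does not spell out the adapter's internals, but the gating idea it describes can be sketched roughly as follows (a hypothetical module illustrating the concept, not the paper's exact Gated Memory architecture):

```python
import torch
import torch.nn as nn

class GatedMemorySketch(nn.Module):
    """Sketch of a gated adapter: a learned gate mixes the frozen model's
    hidden state with a trainable memory branch tuned on the long input."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.memory_proj = nn.Linear(hidden_size, hidden_size)  # trainable memory branch
        self.gate = nn.Linear(hidden_size, 1)                   # per-token mixing weight

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(hidden))   # near 0: keep original in-context behavior
        memory_out = self.memory_proj(hidden)  # near 1: rely on memorized long input
        return (1 - g) * hidden + g * memory_out
```

The gate lets the model fall back on its original in-context learning behavior or lean on the parameters that memorized the long input, which is the balancing role described above.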
LIFT achieves significant improvements on multiple long-context benchmarks. For example, on LooGLE long-dependency QA, the accuracy of Llama 3 8B rises from 15.44% to 29.97%; on LooGLE short-dependency QA, the accuracy of Gemma 2 9B rises from 37.37% to 50.33%; and on LongBench, Llama 3 with LIFT improves significantly on 4 of the 5 subtasks. Ablation experiments show that the Gated Memory architecture improves the GPT-4 score on the LooGLE ShortQA dataset by 5.48% compared with the original model fine-tuned using PiSSA.
Although LIFT delivers notable results, some limitations remain: performance on needle-in-a-haystack tasks that demand precise information extraction is still unsatisfactory; the model's ability to extract the parametric knowledge acquired through LIFT needs further optimization; the design of auxiliary tasks depends heavily on the downstream test tasks, which limits its generality; and how to better balance memorization against the model's original capabilities remains an open research focus. The research team encourages the community to explore LIFT's potential with broader training data, richer models, more advanced auxiliary task designs, and stronger computing resources.

LIFT offers a new paradigm for long-text processing that converts contextual knowledge into parametric knowledge, much as human short-term memory is consolidated into long-term memory. While it does not yet fully solve the long-context challenge, LIFT opens up a promising research direction. Paper address: https://arxiv.org/abs/2502.14644