Libro Library Management System
Bibliographic record

Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting

Authors
Howard Chen, Noam Razin, Karthik Narasimhan, Danqi Chen
Publication year
2025
OA status
oa_green

Abstract

Adapting language models (LMs) to new tasks via post-training carries the
risk of degrading existing capabilities -- a phenomenon classically known as
catastrophic forgetting. In this paper, toward identifying guidelines for
mitigating this phenomenon, we systematically compare the forgetting patterns
of two widely adopted post-training methods: supervised fine-tuning (SFT) and
reinforcement learning (RL). Our experiments reveal a consistent trend across
LM families (Llama, Qwen) and tasks (instruction following, general knowledge,
and arithmetic reasoning): RL leads to less forgetting than SFT while achieving
comparable or higher target task performance. To investigate the cause of this
difference, we consider a simplified setting in which the LM is modeled as a
mixture of two distributions, one corresponding to prior knowledge and the
other to the target task. We identify that the mode-seeking nature of RL, which
stems from its use of on-policy data, enables keeping prior knowledge intact
when learning the target task. We then verify this insight by demonstrating
that the use of on-policy data underlies the robustness of RL to forgetting in
practical settings, as opposed to other algorithmic choices such as KL
regularization or advantage estimation. Lastly, as a practical implication, our
results highlight the potential of mitigating forgetting using approximately
on-policy data, which can be substantially more efficient to obtain than fully
on-policy data.
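
The mode-seeking behavior mentioned in the abstract can be illustrated with a generic toy example. The sketch below is not the paper's mixture setting or its experiments. It assumes a hypothetical bimodal target on a 1D grid and fits a single Gaussian by grid search under two objectives: forward KL, KL(target || q), which is the objective commonly associated with maximum-likelihood training on fixed target samples (as in SFT), and reverse KL, KL(q || target), which is commonly associated with training on the model's own samples (as in RL). All names, mode locations, and grid ranges are illustrative assumptions.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def kl(p, q, dx):
    """Riemann-sum approximation of KL(p || q); eps guards against log(0)."""
    eps = 1e-300
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))) * dx)

# Hypothetical bimodal target with modes at -3 and +3 (assumed for illustration).
x = np.linspace(-12.0, 12.0, 2401)
dx = x[1] - x[0]
target = 0.5 * gaussian(x, -3.0, 0.5) + 0.5 * gaussian(x, 3.0, 0.5)

def fit(direction):
    """Grid-search the single Gaussian q that minimizes the chosen KL direction."""
    best = None
    for mu in np.linspace(-4.0, 4.0, 81):
        for sigma in np.linspace(0.3, 4.0, 38):
            q = gaussian(x, mu, sigma)
            if direction == "forward":
                loss = kl(target, q, dx)   # mass-covering objective
            else:
                loss = kl(q, target, dx)   # mode-seeking objective
            if best is None or loss < best[0]:
                best = (loss, float(mu), float(sigma))
    return best

forward_loss, forward_mu, forward_sigma = fit("forward")
reverse_loss, reverse_mu, reverse_sigma = fit("reverse")
print(f"forward KL fit: mu={forward_mu:.2f}, sigma={forward_sigma:.2f}")
print(f"reverse KL fit: mu={reverse_mu:.2f}, sigma={reverse_sigma:.2f}")
```

Under these assumptions, the forward-KL fit sits between the two modes with a large standard deviation, while the reverse-KL fit collapses onto a single mode and leaves the other untouched. This shows only the qualitative contrast between the two objectives; how that contrast relates to forgetting is the subject of the paper's own analysis.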
