Fu, Haowei; Ni, Bo; Xu, Han; Liu, Kunpeng; Lin, Dan; Derr, Tyler. (2026). In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics: Findings of EACL 2026, 2786–2799.
Retrieval-Augmented Generation, or RAG, and Supervised Fine-Tuning, or SFT, are two common ways to give large language models outside knowledge so they can perform better on knowledge-intensive tasks. But adding this knowledge can also create new privacy risks. One concern is membership inference attacks, or MIAs, which try to figure out whether a particular piece of data was used to train the model. That can be especially serious in sensitive settings where privacy and trust matter. In this study, the researchers first tested how vulnerable RAG-based and SFT-based language models are to different kinds of these attacks. They then introduced a new defense method called Ensemble Privacy Defense, or EPD, which combines the outputs of three systems: the knowledge-enhanced language model, a base language model, and a separate judge model. Together, these models help decide on answers in a way that makes it harder for attackers to infer whether a given piece of data was included. The experiments showed that EPD reduced the success of membership inference attacks while preserving answer quality.
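
To make the idea of a membership inference attack concrete, here is a minimal sketch of one of the simplest textbook variants, a loss-threshold attack: the attacker guesses that an example was seen by the model if the model's loss on it is unusually low. This is an illustration only and is not taken from the study. The function names, the threshold value, and the loss numbers are all hypothetical, and the attacks the researchers evaluated may work quite differently.

```python
from typing import List, Sequence


def loss_threshold_attack(losses: Sequence[float], threshold: float) -> List[bool]:
    """Guess 'member' for every example whose loss falls below the threshold.

    Hypothetical helper: assumes the attacker can obtain a per-example loss
    (or a similar confidence score) from the target model.
    """
    return [loss < threshold for loss in losses]


def attack_accuracy(guesses: Sequence[bool], is_member: Sequence[bool]) -> float:
    """Fraction of examples where the attacker's guess matches the truth."""
    correct = sum(g == m for g, m in zip(guesses, is_member))
    return correct / len(is_member)


# Toy, made-up losses: members tend to have lower loss than non-members.
member_losses = [0.2, 0.4, 0.3]
nonmember_losses = [1.1, 0.9, 0.35]

losses = member_losses + nonmember_losses
truth = [True] * len(member_losses) + [False] * len(nonmember_losses)

# The threshold is an assumption; in practice an attacker would tune it.
guesses = loss_threshold_attack(losses, threshold=0.5)
accuracy = attack_accuracy(guesses, truth)
print(f"attack accuracy: {accuracy:.2f}")
```

In this toy setting, a defense succeeds when it pushes the attacker's accuracy toward chance (0.5 for a balanced set of members and non-members) without making the model's answers noticeably worse.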
