large language model | VALIANT

A Survey on LLM-based Conversational User Simulation

waddelma — Wed, 17 Jun 2026 19:19:23 +0000

Ni, Bo; Wang, Leyao; Wang, Yu; Kveton, Branislav; Dernoncourt, Franck; Xia, Yu; Chen, Hongjie; Leura, Reuben; Basu, Samyadeep; Mukherjee, Subhojyoti; Mathur, Puneet; Ahmed, Nesreen; Wu, Junda; Li, Li; Zhang, Huixin; Zhang, Ruiyi; Yu, Tong; Kim, Sungchul; Gu, Jiuxiang; Tu, Zhengzhong; Siu, Alexa; Wang, Zichao; Yoon, David Seunghyun; Lipka, Nedim; Park, Namyong; Lin, Zihao; Bui, Trung; Zhao, Yue; Derr, Tyler; Rossi, Ryan A. (2026). EACL 2026 – 19th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Vol. 1 (Long Papers), 4266–4301.

User simulation has long been important in computer science because it helps researchers build and test many kinds of systems. Since language is the main way people communicate, being able to simulate conversations has become especially valuable. Recent advances in large language models, or LLMs, have made this much more realistic by allowing computers to generate synthetic user conversations that closely resemble human dialogue. This paper reviews recent progress in LLM-based conversational user simulation. It introduces a new framework for organizing the field based on how detailed the simulated user is and what the simulation is meant to achieve. The paper also examines the main methods used and how researchers evaluate these systems. Overall, it aims to give readers a clear view of the latest developments in conversational user simulation and to identify the main challenges that still need to be solved.

Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks

waddelma — Wed, 17 Jun 2026 15:47:09 +0000

Fu, Haowei; Ni, Bo; Xu, Han; Liu, Kunpeng; Lin, Dan; Derr, Tyler. (2026). In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics: Findings of EACL 2026, 2786–2799.

Retrieval-Augmented Generation, or RAG, and Supervised Fine-Tuning, or SFT, are two common ways to give large language models outside knowledge so they can do better on tasks that require a lot of information. But adding this knowledge can also create new privacy risks. One concern is membership inference attacks, or MIAs, which try to figure out whether a particular piece of data was used to train the model. That can be especially serious in sensitive settings where privacy and trust matter. In this study, the researchers first tested how vulnerable RAG-based and SFT-based language models are to different kinds of these attacks. They then introduced a new defense method called Ensemble Privacy Defense, or EPD, which combines the outputs of three systems: the knowledge-enhanced language model, a base language model, and a separate judge model. Together, these models help decide on answers in a way that makes it harder for attackers to infer whether training data was included. The experiments showed that EPD reduced the success of membership inference attacks while still keeping answer quality strong.

A Novel Approach to Evaluating the Effectiveness of Large Language Models for Multimodal Analysis of Embodied Learning in Classrooms

waddelma — Wed, 17 Jun 2026 15:45:48 +0000

Fonteles, Joyce Horn; Sivakumaran, Nithin; Cohn, Clayton; Coursey, Austin; Yu, Shoubin; Stengel-Eskin, Elias; Ashwin, T. S.; Bansal, Mohit; Biswas, Gautam. (2026). 16th International Learning Analytics and Knowledge Conference, LAK 2026, 536–546.

This paper describes a method that uses large language models, or LLMs, to combine information from several sources and infer students’ metacognitive behaviors, meaning the ways they plan, monitor, reflect on, and adjust their own learning. The study analyzes multimodal classroom data, including students’ movements, gaze, gestures, and speech, collected during a mixed-reality simulation shown on a classroom screen. Instead of processing each type of data separately, the researchers use the LLM at a late-fusion stage, meaning after the different signals have already been analyzed, to bring them together through prompting strategies such as zero-shot prompting, self-consistency reasoning, and carefully designed prompts. They also test whether an “LLM-as-a-Judge,” an LLM used to evaluate model outputs, can reliably assess these behavior labels at scale and reduce the need for manual review. Using a balanced set of human-checked examples and control cases, they compare text-based LLMs, such as GPT-5, with visual-language models, or VLMs, such as Qwen2.5-VL, which can directly process images or video. The results show that text-based LLMs used in this late-fusion way can outperform VLMs even without raw video, and that prompt design can shift the model toward being more precise or more sensitive when the behavior is subtle or brief. Overall, the findings suggest that LLMs can be effective tools for combining multimodal learning data and that LLM-as-a-Judge can support scalable, human-in-the-loop evaluation.
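Self-consistency, one of the prompting strategies named above, can be illustrated with a minimal sketch: sample several answers for the same input and keep the majority label. This is only an illustration under stated assumptions, not the authors' implementation. The function `self_consistent_label`, the stubbed sampler, and the example labels are all hypothetical; a real setup would call an LLM with a nonzero sampling temperature in place of the stub.

```python
from collections import Counter


def self_consistent_label(sample_label, n_samples=5):
    """Call a label sampler several times and return the majority label.

    sample_label is any zero-argument callable that returns one label per
    call (hypothetical stand-in for a single sampled LLM response).
    Returns the winning label and the fraction of samples that agreed.
    """
    votes = Counter(sample_label() for _ in range(n_samples))
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


# Stub sampler: canned responses instead of real model calls (assumption).
canned = iter(["monitoring", "planning", "monitoring", "monitoring", "reflecting"])
label, agreement = self_consistent_label(lambda: next(canned), n_samples=5)
print(label, agreement)
```

The agreement fraction is one simple way to flag low-confidence cases for human review, which fits the human-in-the-loop evaluation the study describes.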

Figure 1:

On-screen, each tracked student is represented as a molecule. As they move through the classroom space, their corresponding molecule moves in real time, allowing them to explore the photosynthesis cycle through embodied interaction.

Using Large Language Models to Detect Socially Shared Regulation of Collaborative Learning

waddelma — Wed, 17 Jun 2026 15:43:51 +0000

Zhang, Jiayi; Borchers, Conrad; Cohn, Clayton; Srivastava, Namrata; Snyder, Caitlin; Guo, Siyuan; Ashwin, T. S.; Mohammed, Naveeduddin; Noh, Haley; Biswas, Gautam. (2026). 16th International Learning Analytics and Knowledge Conference, LAK 2026, 883–890.

The field of learning analytics has made progress in automatically identifying complicated learning behaviors from different kinds of data, including text, audio, and system logs. But most of this work has focused on individual problem-solving, not on collaborative, open-ended problem-solving, where students work together on tasks and their behavior can be harder to interpret because group interactions are more varied and less predictable. This study extended predictive models to detect socially shared regulation of learning, or SSRL, which refers to the ways group members plan, monitor, and adjust their learning together, in collaborative computational modeling environments. The researchers used large language models, or LLMs, as summarization tools to turn student dialogue into task-relevant representations that could be matched with system log data. They compared several kinds of inputs, including text-only embeddings, which are numerical representations of the dialogue; context-rich embeddings, which include more information about the situation; and features derived from the logs. The results showed that text-only embeddings were often best at detecting SSRL behaviors tied to carrying out the task or group activity, such as going off task or asking for help. Meanwhile, the context-rich and multimodal features were especially useful for identifying planning and reflection. Overall, the study suggests that embedding-based models could make it possible to detect group learning behaviors at scale and eventually support real-time feedback and adaptive help in collaborative learning settings.

Figure 1:

Example solution for the C2STEM Truck Task with Task Context categories.

The Role of LLM-Powered Conversational Agents in Supporting Inquiry in a Narrative-Centered Learning Environment: A Learning Analytics Study

waddelma — Wed, 17 Jun 2026 14:26:31 +0000

Srivastava, Namrata; Humburg, Megan; Burriss, Sarah; Jain, Shruti; Cohn, Clayton; Kim, Yeojin; Timalsina, Umesh; Danish, Joshua; Hmelo-Silver, Cindy E.; Glazewski, Krista; Lester, James; Biswas, Gautam. (2026). In 16th International Learning Analytics and Knowledge Conference, LAK 2026 (pp. 947–954).

This study looked at how students use AI chat agents in problem-based learning, or PBL, where learners solve open-ended problems by exploring information and ideas. The researchers focused on conversational agents, or CAs, powered by large language models (LLMs), which are AI systems that can generate human-like text. These agents were built to support different parts of inquiry in a story-based learning environment: one provided content knowledge, one gave feedback on arguments, and one evaluated arguments. Using log data from 15 student groups and Pedaste et al.’s inquiry cycle, a framework that describes the main stages of inquiry learning, the researchers examined how students interacted with each agent over time. They found that the agents played different but complementary roles: they could help students search for information, revise their ideas, and reflect on their thinking. At the same time, the agents sometimes steered students in ways that limited exploration. Overall, the study shows that learning analytics, which uses digital data to study learning behavior, can help educators understand how students work with AI support and design more flexible tools for classroom use.

Figure 1:

Example of game characters and conversational agent interface.

ComCat: Expertise-Guided Context Generation to Enhance Code Comprehension

waddelma — Thu, 26 Mar 2026 19:50:56 +0000

Skyler Grandel; Scott Thomas Andersen; Yu Huang; Kevin Leach (2026). ACM Transactions on Software Engineering and Methodology, 35(3), Article 82.

Software maintenance makes up a large share of the total cost of software over its lifetime, and a big part of that cost comes from understanding existing code. One way to make code easier to understand is through documentation, especially comments that summarize what the code does or explain why it does it. In this work, we introduce ComCat, a system that uses large language models (LLMs, which are AI models trained on very large amounts of text) together with expert guidance to automatically generate useful comments for source code. ComCat is designed to choose the most relevant and informative comment for a specific piece of code. For C/C++ files, the system works in three steps: it first finds places where comments would be most helpful, then decides what kind of comment is needed, and finally writes the comment. In a study with human participants, ComCat’s comments improved code understanding on three software engineering tasks by up to 13% for most participants. The generated comments were also judged to be at least as accurate and readable as human-written comments, and they were preferred over standard ChatGPT-generated comments for up to 92% of code snippets. We also released a dataset containing code snippets, human-written comments, and human-labeled comment categories. Overall, ComCat shows that LLMs can be used to meaningfully improve how well people understand code.

Fig. 1.

ComCat pipeline and study procedure. We use three instances of HSR to inform ComCat’s design (1) and evaluate developer performance (2) and preference (3) with our tool. ComCat takes C/C++ code as input, using a Code Parser to identify code Snippets to be commented. These Snippets are classified, and the class of each Snippet is used in combination with our Template Catalog to create a prompt for each Snippet. These prompts are passed to ChatGPT, which outputs the commented code. This pipeline is informed by developer expertise, but it is fully automated and requires no human intervention.

Demystifying the Power of Large Language Models in Graph Structure Generation

waddelma — Wed, 25 Feb 2026 02:21:16 +0000

Wang, Yu; Rossi, Ryan A.; Park, Namyong; Ahmed, Nesreen K.; Koutra, Danai; Dernoncourt, Franck; & Derr, Tyler. (2025). 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference Findings, NAACL 2025, 8189–8204.

Large Language Models (LLMs) have been very successful at graph analysis tasks such as node classification (labeling items in a network) and link prediction (predicting missing connections). However, little research has explored whether LLMs can actually generate new graph structures. This study investigates that question.

We designed prompts that guide LLMs to write code that creates graphs with specific structural properties, using ideas from network science. Different types of networks, such as social networks or transportation networks, have different structural patterns. For example, the clustering coefficient measures how often a node's neighbors are also connected to each other, forming the triangles that are common in social networks, while square patterns may reflect road layouts in transportation systems. We first tested whether LLMs could generate graphs that match these kinds of domain-specific structural properties.
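To make the triangle-versus-square contrast concrete, here is a minimal sketch that computes the average local clustering coefficient of two toy graphs. It uses the standard definition (the fraction of a node's neighbor pairs that are themselves linked). The helper names and the toy graphs are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of this node's neighbor pairs that are directly connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair can close a triangle
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Toy graphs (hypothetical): a closed triangle and a 4-cycle "city block".
triangle = build_adjacency([("a", "b"), ("b", "c"), ("a", "c")])
square = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])

print(average_clustering(triangle))  # 1.0: every neighbor pair is linked
print(average_clustering(square))    # 0.0: a square contains no triangles
```

A generated graph's score can be compared against a target value for the domain, which is one simple way to check whether it has the intended structural property.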

Next, we selected the best-performing configurations and compared LLM-generated graphs with those produced by established graph generative models across multiple domains. Our results provide insight into how well LLMs can generate realistic network structures and where their strengths and limitations lie.

Knowledge distillation and dataset distillation of large language models: emerging trends, challenges, and future directions

waddelma — Fri, 19 Dec 2025 16:23:50 +0000

Fang, L., Yu, X., Cai, J., Chen, Y., Wu, S., Liu, Z., Yang, Z., Lu, H., Gong, X., Liu, Y., Ma, T., Ruan, W., Abbasi, A., Zhang, J., Wang, T., Latif, E., Liu, W., Zhang, W., Kolouri, S., Zhai, X., Zhu, D., Zhong, W., Liu, T., & Ma, P. (2026). Artificial Intelligence Review, 59(1), 17.

The exponential growth of Large Language Models (LLMs) continues to highlight the need for efficient strategies to meet ever-expanding computational and data demands. This survey provides a comprehensive analysis of two complementary paradigms: Knowledge Distillation (KD) and Dataset Distillation (DD), both aimed at compressing LLMs while preserving their advanced reasoning capabilities and linguistic diversity. We first examine key methodologies in KD, such as task-specific alignment, rationale-based training, and multi-teacher frameworks, alongside DD techniques that synthesize compact, high-impact datasets through optimization-based gradient matching, latent space regularization, and generative synthesis. Building on these foundations, we explore how integrating KD and DD can produce more effective and scalable compression strategies. Together, these approaches address persistent challenges in model scalability, architectural heterogeneity, and the preservation of emergent LLM abilities. We further highlight applications across domains such as healthcare and education, where distillation enables efficient deployment without sacrificing performance. Despite substantial progress, open challenges remain in preserving emergent reasoning and linguistic diversity, enabling efficient adaptation to continually evolving teacher models and datasets, and establishing comprehensive evaluation protocols. By synthesizing methodological innovations, theoretical foundations, and practical insights, our survey charts a path toward sustainable, resource-efficient LLMs through the tighter integration of KD and DD principles.

Fig. 2

Overview of knowledge distillation in LLMs. Knowledge is distilled from a teacher LLM, which has been trained on a large existing database. This knowledge, potentially enriched with current, task-specific data, is transferred to a smaller student LLM. By learning from both the teacher’s guidance and the current data, the student LLM becomes more efficient and effective at performing downstream tasks.

Enhancing Code LLM Training with Programmer Attention

waddelma — Fri, 26 Sep 2025 19:52:44 +0000

Zhang, Yifan, Huang, Chen, Karas, Zachary, Nguyen, Thuy Dung, Leach, Kevin, & Huang, Yu. (2025). Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering.

Human attention, such as where programmers look while reading or writing code, provides valuable signals that are not yet fully used in training large language models (LLMs) for code. These signals offer insights that go beyond machine-driven attention. However, collecting eye-tracking data is complex and costly, and there has been little progress in systematically applying these signals for training code LLMs.

To address this, we propose a full pipeline that combines data augmentation and reward-based fine-tuning. Specifically, we introduce: (1) an eye-tracking path augmentation method to expand programmer attention datasets, (2) a pattern abstraction step that transforms raw fixations into learnable attention motifs, and (3) a reward-guided strategy that integrates these insights into a CodeT5 supervised fine-tuning process.

Our experiments show a +7.16 improvement in CodeBLEU on the CodeXGLUE benchmark for code summarization, demonstrating that combining human and machine attention can significantly enhance code intelligence. We hope this work encourages further exploration of human-centered approaches in next-generation AI for Software Engineering (AI4SE).

What are the future directions for microplastics characterization? A regex-llama data mining approach for identifying emerging trends

waddelma — Mon, 25 Aug 2025 21:03:50 +0000

Gomes, Fernando, Bhansali, Shekhar, da Silveira Maranhão, Fabiola, Valladão, Viviane Silva, & Velasco, Karine. (2025). Anais da Academia Brasileira de Ciências, 97, e20241345.

This study presents a new hybrid method to identify and analyze techniques used to study microplastics. By combining regular-expression (regex) pattern matching with the Llama 3.2:3b language model, we can better detect and understand both traditional and emerging techniques. Established methods like Raman and FTIR spectroscopy are examined alongside advanced tools such as X-ray Photoelectron Spectroscopy (XPS) and Surface-Enhanced Raman Spectroscopy (SERS). This approach improves both the speed and accuracy of identifying complex terms used in microplastics research. Using VOSDataAnalyzer and VOSviewer, we mapped connections and trends among related terms, identifying the 15 most commonly used and emerging techniques. Our analysis shows a shift toward more sensitive and innovative methods in microplastic studies. This Regex-Llama approach, introduced here for the first time, can be applied broadly to tasks such as studying pollutants in the environment, evaluating material breakdown in engineering, and assessing the health impacts of tiny contaminants. Overall, this strategy helps support environmental assessments and guide pollution reduction efforts across multiple fields.
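The regex half of such a hybrid approach might look like the sketch below, which counts how many texts mention each characterization technique. The patterns, the sample sentences, and the function name `count_techniques` are illustrative assumptions rather than the authors' code, and the language-model stage is omitted entirely.

```python
import re
from collections import Counter

# Hypothetical patterns for the four techniques named in the abstract.
TECHNIQUE_PATTERNS = {
    "Raman": re.compile(r"\bRaman\b", re.IGNORECASE),
    "FTIR": re.compile(r"\bFT-?IR\b", re.IGNORECASE),
    "XPS": re.compile(r"\bXPS\b|X-ray photoelectron spectroscopy", re.IGNORECASE),
    "SERS": re.compile(r"\bSERS\b|surface-enhanced Raman", re.IGNORECASE),
}


def count_techniques(texts):
    """Count the number of texts in which each technique is mentioned."""
    counts = Counter()
    for text in texts:
        for name, pattern in TECHNIQUE_PATTERNS.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


# Made-up example sentences standing in for paper abstracts.
sample_texts = [
    "Particles were identified by FTIR and Raman spectroscopy.",
    "SERS improved detection limits for nanoplastics.",
    "We used FT-IR imaging on filtered samples.",
]
counts = count_techniques(sample_texts)
print(counts.most_common())
```

Fixed patterns like these are fast but only find spellings that were anticipated in advance, which is presumably where a language model helps with variant or newly coined terms.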

Figure 1. Representation of the chemical structure of the most common polymers found in microplastic pollution, in sequence: Polyethylene (PE), Polypropylene (PP), Polystyrene (PS), and Polyethylene Terephthalate (PET).