
arXiv AI Publications - 2025 Week 49

Published:  at  11:00 AM

Week 49 Publications - 2025

Here are the five most relevant AI papers on arXiv from week 49 of 2025, with analysis and insights for each.

Publications at a Glance


When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents

Published
12/2/2025
arXiv ID
Authors
Tsimur Hadeliya, Mohammad Ali Jauhar, Nidhi Sakpal, Diogo Cruz

Key Insights

This research uncovers significant vulnerabilities in large language model agents operating over extended context windows, showing that safety performance can degrade dramatically depending on the length and nature of the input. It highlights the unpredictable behavior of refusal mechanisms in these models, which prior studies have not examined thoroughly.
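The kind of instability the authors describe can be probed with a simple evaluation harness. The sketch below is hypothetical (it is not the paper's methodology), and `query_model` is a stand-in for any real LLM API call that returns response text:

```python
# Hypothetical harness (not the paper's methodology): measure how often a
# model refuses a fixed disallowed request as benign filler context grows.
# `query_model` stands in for a real LLM call returning the response text.
def refusal_rate(query_model, request, filler, context_lengths, trials=20):
    rates = {}
    for n in context_lengths:
        context = filler * n                 # pad with n copies of benign filler
        refusals = sum(
            "cannot" in query_model(context + request).lower()
            for _ in range(trials)
        )
        rates[n] = refusals / trials         # fraction of refusals at this length
    return rates
```

A stable safety mechanism would keep this rate roughly flat as the context grows; the paper's finding is that, in long-context agent settings, it does not.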

Potential Impact

The findings suggest a need to reevaluate the safety protocols and performance metrics for LLM agents, particularly in applications requiring long-term reasoning or tool usage. This could lead to improved design and implementation of LLMs to ensure safer interactions in complex scenarios, ultimately influencing how these models are integrated into various fields.


Rectifying LLM Thought from Lens of Optimization

Published
12/1/2025
arXiv ID
Authors
Junnan Liu, Hongwei Liu, Songyang Zhang, Kai Chen

Key Insights

This research introduces a novel framework, RePro, to refine the reasoning capabilities of large language models by viewing chain-of-thought prompting through an optimization lens. By defining a surrogate objective function and employing a dual scoring mechanism, the study addresses and mitigates common suboptimal behaviors in LLM reasoning.
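Viewing reasoning refinement as optimization can be illustrated generically: propose candidate rewrites of a reasoning trace and keep whichever one a scoring function ranks highest. The sketch below is only an illustration of that framing, not RePro's actual algorithm; `propose` and `score` are hypothetical stand-ins:

```python
# Generic illustration (not RePro's actual algorithm): treat chain-of-thought
# refinement as greedy ascent on a surrogate score over candidate traces.
# `propose` and `score` are hypothetical stand-ins for a rewriter and scorer.
def refine(trace, propose, score, steps=5):
    best, best_score = trace, score(trace)
    for _ in range(steps):
        candidate = propose(best)
        s = score(candidate)
        if s > best_score:               # accept only score-improving rewrites
            best, best_score = candidate, s
    return best
```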

Potential Impact

The implementation of RePro could significantly enhance the performance of LLMs in various applications by leading to more efficient and effective reasoning processes, particularly in complex tasks like mathematics and coding. This advancement may reshape the development and deployment strategies for LLMs, promoting their use in scenarios where optimal reasoning is critical.


Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs

Published
12/2/2025
arXiv ID
Authors
Julian Ma, Jun Wang, Zafeirios Fountas

Key Insights

This research introduces a novel behavioral benchmark, BayesBench, to evaluate the multimodal integration capabilities of large language models (LLMs) through a psychophysics lens, revealing that these models can exhibit Bayesian-like behavior even without explicit training. The study highlights a critical distinction between performance accuracy and the robustness of uncertainty handling, suggesting that existing benchmarks may overlook important aspects of model behavior.
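The normative baseline such a psychophysics-style benchmark compares against is classical Bayesian cue combination, where the optimal fused estimate weights each cue by its precision (inverse variance). A minimal sketch of that textbook rule (this is the standard result, not BayesBench itself):

```python
# Classical optimal cue combination: the minimum-variance fusion of two
# noisy cues weights each by its precision (inverse variance).
def combine_cues(mu_a, var_a, mu_b, var_b):
    w_a = (1 / var_a) / (1 / var_a + 1 / var_b)
    w_b = 1 - w_a
    mu = w_a * mu_a + w_b * mu_b
    var = 1 / (1 / var_a + 1 / var_b)   # fused variance is below either cue's
    return mu, var

# A reliable cue (var=1) dominates an unreliable one (var=4):
mu, var = combine_cues(10.0, 1.0, 14.0, 4.0)
# mu = 0.8*10 + 0.2*14 = 10.8; var = 1/(1 + 0.25) = 0.8
```

A model behaving "Bayesian-like" on such tasks would shift its estimates toward the more reliable cue in just this precision-weighted way.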

Potential Impact

By providing tools like BayesBench and the Bayesian Consistency Score, this research could reshape how LLMs are evaluated and developed, emphasizing the need for rigorous assessments of uncertainty handling in addition to traditional accuracy metrics. This shift could influence the design of future multimodal architectures, leading to more reliable and adaptable AI systems in practical applications.


H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs

Published
12/1/2025
arXiv ID
Authors
Cheng Gao, Huimin Chen, Chaojun Xiao, Zhiyi Chen, Zhiyuan Liu, Maosong Sun

Key Insights

This research identifies a specific subset of neurons, termed H-Neurons, that can predict hallucinations in large language models, revealing a novel neuron-level mechanism behind these inaccuracies. The study establishes a causal link between these neurons and over-compliance behaviors, contributing new understanding to the emergence of hallucinations during pre-training.

Potential Impact

By elucidating the neuron-level origins of hallucinations, this work paves the way for more targeted interventions in large language models, potentially enhancing their reliability and reducing misinformation. The findings may influence future model design and training strategies, leading to improved applications in critical fields such as healthcare, law, and education where accuracy is paramount.



