
Here are the five most relevant AI papers from arXiv for week 50 of 2025, complete with analysis and insights.
Publications at a Glance
rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection | Sijia Chen, Baochun Li, Di Niu | 12/9/2025
ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning | Nearchos Potamitis, Lars Klein, Akhil Arora | 12/8/2025
FOAM: Blocked State Folding for Memory-Efficient LLM Training | Ziqing Wen, Jiahuan Wang, Ping Luo, Dongsheng Li, Tao Sun | 12/8/2025
GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering | Jehyeok Yeon, Federico Cinus, Yifan Wu, Luca Luceri | 12/7/2025
WOLF: Werewolf-based Observations for LLM Deception and Falsehoods
Key Insights
The research introduces WOLF, a benchmark that measures both deception production and deception detection in large language models through an interactive multi-agent framework inspired by the social deduction game Werewolf. It addresses the limitations of static evaluations by combining dynamic, adversarial interactions with a detailed taxonomy of deceptive behaviors.
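To make the evaluation setup concrete, here is a minimal sketch of a Werewolf-style interactive deception benchmark. The Agent, speak, and vote pieces below are hypothetical placeholders standing in for real LLM calls; WOLF's actual roles, protocol, and scoring are defined in the paper.
```python
# Minimal sketch of an interactive deception benchmark in the spirit of WOLF.
# Agent, speak, and vote are hypothetical placeholders, not the paper's implementation.
import random
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    role: str  # "werewolf" agents try to deceive; "villager" agents try to detect lies

def speak(agent: Agent, transcript: list[str]) -> str:
    # Stand-in for an LLM call that produces the agent's next statement.
    if agent.role == "werewolf":
        return f"{agent.name}: I am certainly a villager."  # deception production
    return f"{agent.name}: One of you is not telling the truth."

def vote(agent: Agent, transcript: list[str], agents: list[Agent]) -> str:
    # Stand-in for an LLM call that names a suspected deceiver (deception detection).
    return random.choice([a.name for a in agents if a.name != agent.name])

def play_round(agents: list[Agent]) -> float:
    transcript: list[str] = []
    for a in agents:
        transcript.append(speak(a, transcript))
    werewolves = {a.name for a in agents if a.role == "werewolf"}
    votes = [vote(a, transcript, agents) for a in agents if a.role != "werewolf"]
    # Detection rate: fraction of non-werewolf voters who named an actual werewolf.
    return sum(v in werewolves for v in votes) / max(len(votes), 1)

print(play_round([Agent("A", "werewolf"), Agent("B", "villager"), Agent("C", "villager")]))
```
The point of the interactive format is that a single transcript yields both a deception-production signal (what the werewolf says) and a detection signal (whom the others vote for), which a static benchmark cannot provide.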
Potential Impact
WOLF could sharpen both the understanding and the development of deception detection mechanisms in AI systems by reflecting interactive, real-world scenarios more faithfully than traditional static tests. By providing a structured, reproducible environment for evaluating deception, it could improve applications in security, negotiation, and social robotics, where trust and honesty are critical.
rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection
Key Insights
This research introduces rSIM, a mechanism that enhances the reasoning capabilities of large language models by coupling them with a small planner trained through multi-agent reinforcement learning. By injecting strategies that guide the model's thought process, the planner lets even smaller LLMs achieve substantial reasoning gains and outperform larger counterparts.
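As a rough illustration of this plug-in pattern, the sketch below has a small planner choose a high-level strategy that is injected into the solver's prompt. The strategy list, planner heuristic, and llm_generate callable are assumptions for illustration only; rSIM's planner is a trained model, and its multi-agent RL procedure is described in the paper.
```python
# Illustrative sketch of the "planner injects a strategy, solver reasons" pattern.
# STRATEGIES and planner_pick_strategy are stand-ins for rSIM's trained planner.

STRATEGIES = [
    "Break the problem into sub-goals and solve them one at a time.",
    "Work backwards from the desired answer.",
    "Enumerate the possible cases and eliminate the impossible ones.",
]

def planner_pick_strategy(question: str) -> str:
    # rSIM trains a small planner with multi-agent RL; a fixed heuristic stands in here.
    return STRATEGIES[2] if "which" in question.lower() else STRATEGIES[0]

def solve(question: str, llm_generate) -> str:
    # llm_generate is any text-completion callable (model API, local model, ...).
    strategy = planner_pick_strategy(question)
    prompt = (
        f"Strategy hint: {strategy}\n"
        f"Question: {question}\n"
        "Reason step by step, following the strategy hint, then state the final answer."
    )
    return llm_generate(prompt)

# Usage with a trivial echo "model", just to show the injected prompt:
print(solve("What is 17 * 24?", lambda p: p))
```
Because the injection happens at the prompt level in this sketch, the same planner could in principle sit in front of different solver models, which is what makes the plug-in framing attractive.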
Potential Impact
The rSIM approach could change how LLMs are applied to complex reasoning and problem solving, making them more efficient and accessible. Its plug-in design allows easy integration into existing systems and supports continual learning and adaptation across diverse tasks, which may broaden applications in fields such as education, healthcare, and automated reasoning.
ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning
Key Insights
This research introduces ReasonBENCH, a benchmark that quantifies instability in large language model (LLM) reasoning, addressing a gap in current evaluation practices, which often overlook variability in model performance. Its modular evaluation library and multi-run protocol enable more reliable assessment of LLMs on reasoning tasks, with an emphasis on reproducibility and cost consistency.
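The multi-run idea is easy to illustrate: evaluate the same configuration several times and report the mean and spread rather than a single score. This is a generic sketch, not ReasonBENCH's own library or metrics; run_once is a hypothetical callable that performs one full evaluation pass.
```python
# Variance-aware, multi-run evaluation sketch (not ReasonBENCH's actual code).
import random
import statistics

def evaluate_multi_run(run_once, n_runs: int = 5) -> dict:
    """run_once() -> (accuracy, cost_usd) for one full evaluation pass."""
    accuracies, costs = [], []
    for _ in range(n_runs):
        acc, cost = run_once()
        accuracies.append(acc)
        costs.append(cost)
    return {
        "accuracy_mean": statistics.mean(accuracies),
        "accuracy_std": statistics.stdev(accuracies) if n_runs > 1 else 0.0,
        "cost_mean_usd": statistics.mean(costs),
    }

# Demo with a simulated noisy evaluation pass:
print(evaluate_multi_run(lambda: (random.uniform(0.60, 0.80), 0.05)))
```
Reporting accuracy as mean plus standard deviation across runs, together with cost, is what makes comparisons between reasoning strategies variance-aware rather than anecdotal.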
Potential Impact
ReasonBENCH could fundamentally change how practitioners evaluate and select reasoning strategies for LLMs, encouraging a shift towards variance-aware reporting that enhances the reliability of model performance assessments. This benchmark may lead to the development of more robust LLMs, ultimately improving their application in critical areas requiring stable and reproducible reasoning, such as decision-making and automated problem-solving.
FOAM: Blocked State Folding for Memory-Efficient LLM Training
Key Insights
The research introduces FOAM, an optimizer that substantially reduces memory usage during large language model training by compressing optimizer states while maintaining convergence comparable to conventional methods. The approach combines block-wise gradient means with residual corrections to improve memory efficiency without compromising model performance.
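As a toy illustration of the block-wise idea, the sketch below compresses a vector into per-block means and keeps a residual correction. This is only an assumed reading of the block-mean-plus-residual description; FOAM's actual state-folding scheme, update rule, and convergence analysis are in the paper.
```python
# Toy block-wise compression: store one mean per block instead of every entry.
# Not FOAM's algorithm; an illustration of the general idea only.
import torch
import torch.nn.functional as F

def block_means(x: torch.Tensor, block_size: int) -> torch.Tensor:
    # Zero-pad so the length divides evenly, then average within each block.
    flat = x.flatten()
    pad = (-flat.numel()) % block_size
    return F.pad(flat, (0, pad)).view(-1, block_size).mean(dim=1)

def expand(means: torch.Tensor, block_size: int, n: int) -> torch.Tensor:
    # Broadcast each block mean back to full resolution.
    return means.repeat_interleave(block_size)[:n]

g = torch.randn(10_000)                      # a gradient or optimizer-state vector
means = block_means(g, block_size=64)        # ~64x fewer values than g itself
residual = g - expand(means, 64, g.numel())  # correction term; when and how FOAM
                                             # applies it is part of the paper's design
```
The memory saving comes from persisting only the per-block statistics between steps; anything kept at full resolution (such as the residual above) would erase that saving, so a real method must fold away or approximate such terms as well.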
Potential Impact
FOAM could make it feasible to train larger, more complex models on hardware with limited memory, broadening access for researchers and practitioners. Its compatibility with existing memory-efficient optimizers could also ease adoption and improve training efficiency across applications in natural language processing and beyond.
GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering
Key Insights
Graph-Regularized Sparse Autoencoders (GSAEs) advance safety steering for large language models by representing complex safety concepts across multiple latent features rather than a single dimension. This approach improves a model's ability to refuse harmful prompts while preserving its utility on benign queries.
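For intuition, here is a generic sketch of a sparse autoencoder with a graph-based regularizer over its latent features, which is one way to encourage a safety concept to spread over several related features instead of a single direction. The graph construction, loss weights, and steering procedure used by GSAE are defined in the paper; the objective below is an assumed illustrative form, not the paper's exact loss.
```python
# Generic sparse autoencoder with a graph (Laplacian) regularizer on latent features.
# Assumed illustrative objective; not GSAE's exact formulation.
import torch
import torch.nn as nn

class GraphRegularizedSAE(nn.Module):
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, h: torch.Tensor):
        z = torch.relu(self.encoder(h))  # sparse, non-negative latent features
        return self.decoder(z), z

def gsae_loss(model, h, adjacency, l1_weight=1e-3, graph_weight=1e-3):
    h_hat, z = model(h)
    recon = ((h_hat - h) ** 2).mean()  # reconstruct the input activation h
    sparsity = z.abs().mean()          # L1 term keeps few features active at once
    # Graph smoothness: features connected in the adjacency matrix are pushed to
    # take similar activations, so one concept is represented by a cluster of features.
    laplacian = torch.diag(adjacency.sum(dim=1)) - adjacency
    smooth = torch.einsum("bi,ij,bj->b", z, laplacian, z).mean()
    return recon + l1_weight * sparsity + graph_weight * smooth

# Demo with random activations and a random symmetric feature graph:
model = GraphRegularizedSAE(d_model=16, d_latent=64)
h = torch.randn(8, 16)
adj = (torch.rand(64, 64) > 0.9).float()
adj = ((adj + adj.T) > 0).float().fill_diagonal_(0)
print(gsae_loss(model, h, adj).item())
```
Steering would then act on the identified cluster of safety-related features rather than a single latent direction, which is plausibly why distributing a concept over multiple features improves robustness to manipulation.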
Potential Impact
GSAEs could transform the landscape of LLM safety by providing a more nuanced and adaptive framework for handling adversarial inputs, significantly improving the robustness of these models against manipulation. This could pave the way for broader applications of LLMs in sensitive areas, as enhanced safety measures might increase trust and reliance on their outputs across various domains.