
Here are the five most relevant AI papers from arXiv in week 45 of 2025, complete with analysis and insights.
Publications at a Glance
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
Large language models require a new form of oversight: capability-based monitoring | Katherine C. Kellogg, Bingyang Ye, Yifan Hu, Guergana K. Savova, Byron Wallace, Danielle S. Bitterman | 11/5/2025
DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models | Ruofan Zhang, Bin Xia, Zhen Cheng, Cairen Jian, Minglun Yang, Ngai Wong, Yuan Cheng | 11/3/2025
Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting | Enhong Mu, Jinyu Cai, Yijun Lu, Mingyue Zhang, Kenji Tei, Jialong Li | 11/4/2025
How Focused Are LLMs? A Quantitative Study via Repetitive Deterministic Prediction Tasks | Wanda Hou, Leon Zhou, Hong-Ye Hu, Yi-Zhuang You, Xiao-Liang Qi | 11/2/2025
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
Key Insights
This research reveals that Large Language Models (LLMs) can accurately model the intercorrelation of psychological traits using minimal quantitative inputs, achieving performance that rivals traditional machine learning methods. The study highlights LLMs' ability to generate compressed, interpretable summaries of personality data, capturing complex psychological interactions.
Potential Impact
By enabling precise psychological profiling with minimal data, this approach could revolutionize applications in mental health assessment, personalized therapy, and human-computer interaction. Furthermore, it offers a novel framework for understanding the emergent reasoning capabilities of LLMs, potentially influencing future research in both psychology and AI.
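To ground the idea, here is a minimal sketch of what minimal-input profiling could look like in practice. The `query_llm` helper, the prompt format, and the synthetic data are assumptions for illustration, not the paper's actual protocol; the linear-regression model stands in for the "traditional machine learning methods" the study compares against.

```python
# Hypothetical sketch: predicting a downstream psychological trait from
# Big Five scores, once via an LLM prompt and once via a classic baseline.
import numpy as np
from sklearn.linear_model import LinearRegression

def query_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API; returns the model's reply."""
    raise NotImplementedError("wire up your LLM client here")

def llm_predict_trait(big_five: dict[str, float], target: str) -> float:
    # The prompt format is an assumption, not the paper's exact protocol.
    scores = ", ".join(f"{k}={v:.2f}" for k, v in big_five.items())
    prompt = (
        f"A person has Big Five scores (0-1 scale): {scores}. "
        f"Estimate their {target} score on the same scale. "
        "Reply with a single number."
    )
    return float(query_llm(prompt).strip())

# Classic baseline: fit trait intercorrelations from tabular (here synthetic) data.
X = np.random.rand(200, 5)                                       # Big Five scores
y = 0.6 * X[:, 0] - 0.3 * X[:, 2] + 0.1 * np.random.randn(200)   # target trait
baseline = LinearRegression().fit(X, y)
print("baseline prediction:", baseline.predict(X[:1])[0])
```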
Large language models require a new form of oversight: capability-based monitoring
Key Insights
This research introduces capability-based monitoring, a novel oversight framework for large language models (LLMs) that shifts the focus from traditional task-based evaluations to assessing the shared capabilities underlying many tasks. The approach addresses the distinctive challenges LLMs pose in healthcare, emphasizing systemic evaluation over isolated task assessments.
Potential Impact
By implementing capability-based monitoring, healthcare organizations can enhance the safety and effectiveness of LLMs, enabling more robust detection of systemic weaknesses and emergent behaviors across various applications. This approach could ultimately lead to more reliable and adaptive uses of AI in healthcare, fostering a collaborative environment for ongoing model improvement and oversight.
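As a concrete illustration of the shift from task-based to capability-based views, below is a minimal sketch in Python. The capability taxonomy, task names, and scores are invented for demonstration and are not from the paper.

```python
# Illustrative sketch: aggregating task-level evaluations by shared
# capability rather than reporting each task in isolation.
from collections import defaultdict
from statistics import mean

# Each evaluation result: (task, capability it exercises, accuracy).
results = [
    ("discharge_summarization", "summarization", 0.91),
    ("radiology_report_summary", "summarization", 0.74),
    ("icd_code_lookup", "structured_extraction", 0.88),
    ("med_dose_extraction", "structured_extraction", 0.69),
]

by_capability = defaultdict(list)
for task, capability, accuracy in results:
    by_capability[capability].append((task, accuracy))

# A capability-level view can surface systemic weaknesses that single-task
# dashboards hide, e.g. one weak task dragging down a shared capability.
for capability, tasks in by_capability.items():
    avg = mean(acc for _, acc in tasks)
    worst = min(tasks, key=lambda t: t[1])
    print(f"{capability}: mean={avg:.2f}, weakest task={worst[0]} ({worst[1]:.2f})")
```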
DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models
Key Insights
The DART framework adapts the length of a model's reasoning chain to the difficulty of each problem, balancing efficiency and accuracy in large language models. The method substantially reduces computation while maintaining or improving performance, reporting 81.2% reasoning truncation alongside a 5.33x speedup.
Potential Impact
DART could transform the way large language models are used in practical applications by enabling more efficient resource allocation and faster response times, particularly in complex problem-solving scenarios. This advancement may lead to broader adoption of LLMs in real-time applications, making them more accessible and practical for users across various fields.
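A minimal sketch of the general idea, difficulty-gated reasoning budgets, is shown below. The difficulty proxy, the budget formula, and the `generate` call are hypothetical stand-ins; DART's actual estimator and truncation mechanism may work quite differently.

```python
# Hypothetical sketch of difficulty-adaptive reasoning truncation: cap the
# reasoning budget according to an estimated problem difficulty, instead of
# letting every problem consume a full-length chain of thought.
def estimate_difficulty(problem: str) -> float:
    """Stand-in difficulty score in [0, 1]; a toy proxy, not DART's estimator."""
    return min(len(problem) / 500, 1.0)  # longer prompt = assumed harder

def reasoning_budget(problem: str, max_tokens: int = 2048) -> int:
    d = estimate_difficulty(problem)
    # Easy problems get a heavily truncated budget; hard ones keep more.
    return max(64, int(max_tokens * d))

def solve(problem: str) -> str:
    budget = reasoning_budget(problem)
    # generate(...) is a placeholder for any LLM decode call that accepts
    # a cap on the reasoning segment's length.
    return generate(problem, max_reasoning_tokens=budget)  # hypothetical API

print(reasoning_budget("2 + 2 = ?"))                     # small budget: easy input
print(reasoning_budget("long multi-step proof " * 30))   # larger budget: hard input
```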
Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting
Key Insights
This research introduces the KLPEG framework, which integrates Knowledge Graphs with Large Language Models to make automated playtesting of modern video games more efficient and more targeted. Multi-hop reasoning over the graph offers a structured way to identify which functionalities an incremental update affects, setting a new standard for automated testing methodologies.
Potential Impact
By improving the accuracy and efficiency of playtesting, this framework could significantly reduce the time and resources required for game developers to ensure quality in frequent updates. It has the potential to transform the quality assurance process in the gaming industry, allowing for more responsive and adaptive testing strategies that keep pace with rapid development cycles.
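Below is a minimal sketch of the multi-hop impact analysis such a framework could perform, using networkx. The graph schema, node names, and dependency direction are invented for illustration and are not KLPEG's actual design.

```python
# Minimal sketch of multi-hop impact analysis over a game knowledge graph:
# given an incrementally updated entity, walk dependency edges to find the
# functionalities whose tests should be re-run. The schema is illustrative.
import networkx as nx

kg = nx.DiGraph()
# Edges point from a game element to functionality that depends on it.
kg.add_edges_from([
    ("weapon_stats_table", "damage_calculation"),
    ("damage_calculation", "boss_fight_balance"),
    ("damage_calculation", "pvp_matchmaking"),
    ("ui_theme", "menu_rendering"),
])

def impacted_functionality(changed: str) -> set[str]:
    """Multi-hop traversal: everything reachable from the changed node."""
    return nx.descendants(kg, changed)

patch_notes = ["weapon_stats_table"]   # entities touched by the update
to_test = set().union(*(impacted_functionality(c) for c in patch_notes))
print("re-test:", sorted(to_test))
# An LLM could then turn each impacted node into concrete playtest steps,
# e.g. by prompting it with the node name plus its graph neighborhood.
```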
How Focused Are LLMs? A Quantitative Study via Repetitive Deterministic Prediction Tasks
Key Insights
This research introduces a novel quantitative framework for assessing the performance of large language models (LLMs) in repetitive deterministic tasks, revealing a sharp double exponential drop in accuracy that indicates a significant transition from reliable to unstable generation. The study also establishes a connection between attention-induced interference and sequence-level failures, providing valuable insights into the limitations of LLMs in executing independent operations.
Potential Impact
This work could fundamentally alter how researchers and developers approach the design and application of LLMs, particularly in tasks requiring high levels of precision and reliability. By identifying the intrinsic error rates and mechanisms behind performance degradation, it opens pathways for more targeted improvements in model architecture and training strategies, potentially enhancing the usability of LLMs in critical applications.
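The flavor of such a probe can be sketched in a few lines. The copy-task format and the `query_llm` placeholder below are assumptions; the paper's repetitive deterministic prediction tasks may be defined differently.

```python
# Hypothetical probe in the spirit of the paper: ask a model to repeat a
# deterministic transformation many times and record where the first error
# occurs. query_llm is a placeholder for any chat-completion client.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM client here")

def first_failure_step(seq: str, repeats: int) -> int:
    """Ask for `repeats` verbatim copies of seq and find the first mistake."""
    prompt = f"Repeat the string '{seq}' exactly {repeats} times, separated by spaces."
    chunks = query_llm(prompt).split()
    for i, chunk in enumerate(chunks[:repeats]):
        if chunk != seq:
            return i          # step index of the first unstable copy
    return repeats            # survived all repetitions

# Sweeping `repeats` and plotting per-step accuracy would expose the sharp
# reliable-to-unstable transition the paper reports.
```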
AiBrain