Here are the top five AI papers from arXiv in week 37 of 2025, each summarized with key insights and an assessment of potential impact.
Publications at a Glance
SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers | Ran Xin, Zeyu Zheng, Yanchen Nie, Kun Yuan, Xia Xiao | 9/8/2025
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs | Akshit Sinha, Arvindh Arun, Shashwat Goel, Steffen Staab, Jonas Geiping | 9/11/2025
Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning | Song Yu, Xiaofei Xu, Ke Deng, Li Li, Lin Tian | 9/8/2025
Rethinking Reasoning Quality in Large Language Models through Enhanced Chain-of-Thought via RL | Haoyang He, Zihua Rong, Kun Ji, Chenyang Li, Qing Huang, Chong Xia, Lan Yang, Honggang Zhang | 9/7/2025
SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs
Key Insights
SCoder implements a three-step self-distillation process: (1) a 7B teacher model generates high-quality code, (2) a 1B student model learns from the teacher's outputs, and (3) the student self-improves through iterative refinement. Quality filtering retains only the top 20% of generated data, and the resulting 1B-parameter model reaches 89% of 7B+ model performance while cutting training-data requirements by 70% without degrading code quality.
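To make the loop concrete, here is a minimal sketch of iterative self-distillation with top-fraction quality filtering. The helpers (generate_samples, score_quality, finetune) are hypothetical stand-ins for the paper's actual generation, filtering, and fine-tuning machinery, not SCoder's API:

```python
"""Minimal sketch of an iterative self-distillation loop in the spirit of
SCoder. All helpers below are invented placeholders for illustration."""
import heapq
import random

random.seed(0)

def generate_samples(model, prompts):
    # Placeholder: each sample is (prompt, completion, producing model).
    return [(p, f"solution_by_{model}", model) for p in prompts]

def score_quality(sample):
    # Placeholder quality score; the real system uses learned/heuristic filters.
    return random.random()

def top_fraction(samples, frac=0.2):
    # Keep only the highest-scoring fraction of generated data.
    k = max(1, int(len(samples) * frac))
    return heapq.nlargest(k, samples, key=score_quality)

def finetune(student, data):
    # Placeholder for a supervised fine-tuning step on the filtered data.
    return f"{student}+ft({len(data)})"

prompts = [f"task_{i}" for i in range(100)]

# Step 1: the large teacher seeds the first training set.
teacher_data = top_fraction(generate_samples("teacher_7b", prompts))
student = finetune("student_1b", teacher_data)

# Steps 2..N: the student re-synthesizes data from its own outputs,
# keeping only the top 20% each round (iterative self-distillation).
for round_idx in range(3):
    self_data = top_fraction(generate_samples(student, prompts))
    student = finetune(student, self_data)
    print(f"round {round_idx}: trained on {len(self_data)} filtered samples")
```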
Potential Impact
SCoder democratizes high-quality code generation by letting small models approach large-model performance, reducing deployment costs by 80% and inference latency by 60%. This matters most for edge computing and mobile development, where computational resources are limited: teams can run capable coding assistants on local devices, improving developer productivity while reducing cloud dependency.
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
Key Insights
This research introduces a multi-turn off-policy reinforcement learning framework and a planner-enhanced multi-agent search architecture, designed to improve LLM step-provers for automated theorem proving. By addressing scaling challenges at both training time (multi-turn off-policy RL) and inference time (multi-agent tree search), the proposed system, BFS-Prover-V2, achieves state-of-the-art results on formal mathematics benchmarks.
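As an illustration of the inference-time side, the sketch below runs a best-first search over candidate tactic sequences, ranking partial proofs by cumulative log-probability. The toy policy and proof check are invented, and BFS-Prover-V2's planner-enhanced multi-agent architecture is considerably richer than this single-queue loop:

```python
"""Illustrative best-first search over tactic steps, loosely in the spirit
of inference-time proof search. The policy and goal test are toy stand-ins."""
import heapq
import itertools

def propose_tactics(state):
    # Placeholder policy: a real LLM step-prover proposes candidate
    # tactics for the current proof state with log-probabilities.
    return [(state + (t,), -0.1 * (t + 1)) for t in range(3)]

def is_proved(state):
    # Toy goal: a specific tactic suffix closes the proof.
    return state[-3:] == (0, 1, 0)

def best_first_search(max_expansions=1000):
    counter = itertools.count()             # unique tie-breaker for the heap
    frontier = [(0.0, next(counter), ())]   # (cost, id, tactic sequence)
    for _ in range(max_expansions):
        if not frontier:
            return None
        cost, _, state = heapq.heappop(frontier)
        if is_proved(state):
            return state
        for child, logp in propose_tactics(state):
            # Lower cumulative cost == higher cumulative log-probability.
            heapq.heappush(frontier, (cost - logp, next(counter), child))
    return None

print("found proof tactic sequence:", best_first_search())
```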
Potential Impact
The innovations presented in this paper could make LLMs markedly more efficient and capable on automated reasoning tasks involving complex proofs. The same techniques may also extend to other domains that require sophisticated reasoning over multi-turn interactions, broadening the reach of LLM-based systems well beyond formal mathematics.
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Key Insights
This research shows that although gains in single-step accuracy appear to yield diminishing returns, they compound into exponential growth in the length of tasks an LLM can complete, so scaling model size continues to pay off on long-horizon execution. It also identifies self-conditioning: models become more error-prone when their own earlier mistakes appear in the context, challenging the conventional understanding of LLM limitations.
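The compounding argument can be made concrete with a back-of-the-envelope calculation: if each step succeeds independently with probability p, the longest task completed with at least 50% reliability satisfies p^H >= 0.5, giving H(p) = ln(0.5) / ln(p). The paper's actual metric and experimental setup differ in detail, but the shape of the curve is the point:

```python
"""Back-of-the-envelope illustration: small gains in per-step accuracy
buy exponentially longer task horizons at a fixed reliability target."""
import math

def horizon(p, target=0.5):
    # Largest number of steps n with p**n >= target.
    return math.log(target) / math.log(p)

for p in [0.90, 0.95, 0.99, 0.995, 0.999]:
    print(f"step accuracy {p:.3f} -> ~{horizon(p):7.1f}-step horizon")
```

Moving per-step accuracy from 0.99 to 0.999 looks like a marginal gain on single-step benchmarks, yet it stretches the 50%-reliable horizon from roughly 69 steps to roughly 693.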
Potential Impact
By shifting the focus from single-step accuracy to execution capability, this work could change how LLMs are designed and evaluated, emphasizing the importance of long-horizon task performance. Additionally, it may influence the development of new models and techniques that better handle complex, multi-step reasoning tasks, ultimately enhancing applications in fields like natural language understanding, robotics, and decision-making systems.
Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning
Key Insights
The Tree of Agents (TOA) framework introduces a multi-agent reasoning approach that improves long-context handling by mitigating the "lost in the middle" problem without sacrificing important information. Dynamic information exchange among agents produces a multi-perspective understanding of the context and reduces hallucinations.
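A minimal sketch of one plausible reading of this flow appears below: the long context is split across agents, each forms its own notes, the agents exchange those notes once, and a final step fuses the perspectives. The chunking, exchange, and summarize/answer helpers are invented placeholders, not the paper's API:

```python
"""TOA-style multi-agent pass over a long document (illustrative only)."""

def chunk(text, size=200):
    # Split a long context into per-agent segments.
    return [text[i:i + size] for i in range(0, len(text), size)]

def agent_view(segment, question):
    # Placeholder: a real agent would call an LLM over its segment.
    return f"notes({question!r} over {len(segment)} chars)"

def exchange(views):
    # Each agent sees the other agents' notes (one exchange round),
    # approximating the dynamic information sharing TOA describes.
    return [v + " | peers: " + "; ".join(o for o in views if o is not v)
            for v in views]

def aggregate(views, question):
    # Placeholder final step: fuse per-agent perspectives into one answer.
    return f"answer to {question!r} from {len(views)} perspectives"

document = "x" * 1000  # stand-in for a long context
question = "What changed in section 3?"

views = [agent_view(seg, question) for seg in chunk(document)]
views = exchange(views)
print(aggregate(views, question))
```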
Potential Impact
By improving the long-context capabilities of large language models, TOA could significantly enhance their applicability in complex tasks such as summarization, content generation, and dialogue systems, ultimately leading to more robust and efficient AI applications. This innovation may shift the landscape of LLM development, encouraging a move towards collaborative agent-based architectures rather than solely larger models.
Rethinking Reasoning Quality in Large Language Models through Enhanced Chain-of-Thought via RL
Key Insights
This research introduces the Dynamic Reasoning Efficiency Reward (DRER), a novel reinforcement learning framework that enhances the Chain-of-Thought (CoT) capabilities of large language models by assigning fine-grained credit to reasoning processes that lead to correct answers. Additionally, it emphasizes the importance of controlling logical depth in reasoning tasks, addressing limitations of traditional reward functions that focus only on answer correctness.
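A hedged sketch of what such a composite reward could look like: an outcome term for answer correctness plus per-step credit for reasoning that raises the likelihood of the correct answer. The likelihood probe and the weights are invented stand-ins for DRER's actual formulation:

```python
"""Sketch of a DRER-like composite reward (illustrative, not the paper's)."""

def answer_likelihood(model, context, answer):
    # Placeholder: probability the model assigns to the gold answer
    # given the partial chain of thought accumulated in `context`.
    return min(1.0, 0.1 + 0.2 * context.count("therefore"))

def drer_style_reward(model, cot_steps, answer, correct,
                      alpha=1.0, beta=0.5):
    # Base term: traditional outcome reward (answer correctness only).
    base = alpha if correct else 0.0
    # Shaping term: credit each reasoning step by how much it increases
    # the likelihood of the correct answer (fine-grained credit).
    bonus, context = 0.0, ""
    prev = answer_likelihood(model, context, answer)
    for step in cot_steps:
        context += step
        cur = answer_likelihood(model, context, answer)
        bonus += max(0.0, cur - prev)   # only reward helpful steps
        prev = cur
    return base + beta * bonus

steps = ["x = 2 so x^2 = 4; ", "therefore the area is 4. "]
print(drer_style_reward(None, steps, "4", correct=True))
```

The design choice this illustrates is the paper's stated departure from outcome-only rewards: credit attaches to the reasoning trajectory itself, not just to the final answer.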
Potential Impact
By improving the reasoning quality and generalization capabilities of large language models, this approach could significantly advance their applications in complex problem-solving scenarios, such as mathematics and programming. The introduction of the Logictree dataset also provides a valuable resource for future research and benchmarking, potentially setting new standards for evaluating reasoning in artificial intelligence.