
Here are the top 5 most relevant AI papers from arXiv for week 44 of 2025, each with key insights and a look at its potential impact.
Publications at a Glance
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning Qianli Shen, Daoyuan Chen, Yilun Huang, Zhenqing Ling, Yaliang Li, Bolin Ding, Jingren Zhou | 10/30/2025
Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank Jiayu Liu, Wei Dai, Zhenya Huang, Ning Miao, Enhong Chen | 10/28/2025
Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion Xianjun Gao, Jianchun Liu, Hongli Xu, Liusheng Huang | 10/28/2025
Zero Reinforcement Learning Towards General Domains Yuyuan Zeng, Yufei Huang, Can Xu, Qingfeng Sun, Jianfeng Yan, Guanghui Xu, Tao Yang, Fengzong Lian | 10/29/2025
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
Key Insights
The Multi-Agent Evolve (MAE) framework introduces a novel approach to improving large language models (LLMs) by enabling them to self-improve through co-evolution: a triplet of interacting agents generates questions, attempts solutions, and evaluates the results. This loop sharply reduces dependence on human-curated datasets and yields measurable gains in reasoning capability across diverse tasks.
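To make the loop concrete, here is a minimal sketch of how a question-generating, solving, and judging triplet could be wired together. The `call_llm` stub, the prompts, and the reward wiring are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of one co-evolution step, assuming a generic
# chat-completion callable `call_llm`; prompts and reward parsing
# are placeholders, not MAE's actual implementation.
from typing import Callable

def evolve_step(call_llm: Callable[[str], str], topic: str) -> dict:
    # Proposer: generate a new question on the given topic.
    question = call_llm(f"Propose a challenging question about {topic}.")
    # Solver: attempt an answer with step-by-step reasoning.
    answer = call_llm(f"Answer step by step: {question}")
    # Judge: score the answer; the score becomes the RL reward for the
    # solver and, inverted, a difficulty signal for the proposer.
    verdict = call_llm(
        f"Rate this answer from 0 to 1.\nQ: {question}\nA: {answer}\nScore:"
    )
    try:
        reward = max(0.0, min(1.0, float(verdict.strip().split()[0])))
    except (ValueError, IndexError):
        reward = 0.0
    return {"question": question, "answer": answer, "reward": reward}
```

Because all three roles can be played by the same model, the loop generates its own curriculum without any human-labeled data.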
Potential Impact
By minimizing reliance on human annotation and allowing LLMs to evolve autonomously, MAE could make reinforcement learning for language models substantially more scalable and general. That would expand the practical deployment of LLMs across domains, making them more adaptable and efficient in real-world scenarios.
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Key Insights
The BOTS framework introduces a novel approach to task selection in reinforcement finetuning of large language models, using Bayesian inference to adaptively estimate task difficulty. The method improves data efficiency and model performance by striking a principled balance between exploration and exploitation without incurring high rollout costs.
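One plausible instantiation of the idea, assuming a Beta-Bernoulli model of per-task solve rates: keep a posterior per task, Thompson-sample it, and prefer tasks near an intermediate difficulty target. The target value and the update rule here are illustrative, not BOTS's exact model.

```python
# Illustrative Bayesian online task selection via Thompson sampling.
# The Beta-Bernoulli posterior and the 0.5 difficulty target are
# assumptions for this sketch, not the paper's exact formulation.
import random

class TaskSelector:
    def __init__(self, n_tasks: int, target: float = 0.5):
        self.alpha = [1.0] * n_tasks  # pseudo-counts of successes
        self.beta = [1.0] * n_tasks   # pseudo-counts of failures
        self.target = target          # desired solve rate (moderate difficulty)

    def select(self) -> int:
        # Thompson sampling: draw a plausible solve rate per task, then
        # prefer tasks near the target. Posterior spread drives
        # exploration; learned difficulty drives exploitation.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return min(range(len(samples)),
                   key=lambda i: abs(samples[i] - self.target))

    def update(self, task: int, solved: bool) -> None:
        # Cheap posterior update from the rollout outcome: no extra
        # rollouts are needed to maintain the difficulty estimates.
        if solved:
            self.alpha[task] += 1.0
        else:
            self.beta[task] += 1.0
```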
Potential Impact
By optimizing task selection in reinforcement finetuning, BOTS could lead to more effective training protocols for LLMs, enabling them to better align with human preferences and improve reasoning capabilities. This advancement may transform applications across various domains, allowing for more nuanced and efficient deployment of LLMs in real-world scenarios.
Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank
Key Insights
This research introduces Self-Indicator, a method that assesses the credibility of a large language model's (LLM's) reasoning paths using the rank of a correlation matrix computed from the model's own internal representations, with no external resources. The approach significantly improves the accuracy of reasoning-path verification while adding minimal computational overhead.
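As a rough illustration of the rank signal, the sketch below computes the effective rank of a correlation matrix over per-step hidden states. Which layer the states come from, and how the rank is turned into a credibility verdict, are the paper's details and are left abstract here.

```python
# Illustrative effective-rank computation over per-step hidden states.
# The choice of representation (layer, tokens) and the mapping from
# rank to a credibility verdict are assumptions left open; only the
# correlation-matrix-rank mechanism itself is sketched.
import numpy as np

def effective_rank(step_states: np.ndarray) -> float:
    """step_states: (n_steps, hidden_dim) array, one row per reasoning step."""
    # Correlation matrix across reasoning steps (rows are variables).
    corr = np.corrcoef(step_states)
    # Effective rank = exp of the entropy of normalized singular values;
    # it varies smoothly between 1 and n_steps, unlike hard matrix rank.
    s = np.linalg.svd(corr, compute_uv=False)
    p = s / s.sum()
    p = p[p > 1e-12]
    return float(np.exp(-(p * np.log(p)).sum()))
```

Since the score is computed from quantities the model already produces during generation, verification adds little overhead compared with sampling extra reasoning paths or calling an external verifier.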
Potential Impact
By providing a more efficient and resource-independent method for verifying reasoning in LLMs, this research could streamline the deployment of these models in practical applications where accuracy is critical, such as in healthcare or legal domains. This innovation may lead to broader adoption of LLMs by reducing reliance on complex external verification systems and improving trust in their outputs.
Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion
Key Insights
This research introduces Orion, a framework that improves Large Language Model (LLM) reasoning through dependency-aware query decomposition and logic-parallel content expansion, targeting both efficiency and quality. By separating reasoning into key-point generation followed by content expansion, Orion significantly increases token-generation speed and reduces response latency while maintaining higher reasoning accuracy.
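A minimal sketch of the two-phase idea, assuming key points are modeled as a dependency DAG: points whose dependencies are complete are expanded concurrently. The `expand` callable stands in for an LLM call, and the level-by-level scheduler is an assumption about how logic-parallel expansion could be realized, not Orion's actual scheduler.

```python
# Sketch: expand key points level by level, running all points whose
# dependencies are satisfied in parallel. `expand` is a hypothetical
# stand-in for an LLM content-expansion call.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def expand_plan(points: dict[str, list[str]],
                expand: Callable[[str], str]) -> dict[str, str]:
    """points: key-point id -> ids of the key points it depends on."""
    done: dict[str, str] = {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(points):
            # Every point whose dependencies are already expanded can
            # run concurrently -- this is where the latency win comes from.
            ready = [p for p, deps in points.items()
                     if p not in done and all(d in done for d in deps)]
            if not ready:
                raise ValueError("dependency cycle in key points")
            for p, text in zip(ready, pool.map(expand, ready)):
                done[p] = text
    return done
```

Independent key points (for example, parallel sub-arguments) then stream out concurrently, while dependent ones wait only for their own prerequisites rather than for the whole chain.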
Potential Impact
Orion's approach could transform how LLMs are integrated into real-time applications, enabling more sophisticated and responsive AI-powered search and conversational agents that meet the latency demands of the modern web. This advancement could lead to broader adoption of LLMs in various domains, enhancing user experiences and expanding the capabilities of interactive services.
Zero Reinforcement Learning Towards General Domains
Key Insights
This research introduces a zero reinforcement learning (Zero-RL) paradigm that strengthens the reasoning capabilities of large language models (LLMs) by integrating verifiable and non-verifiable reward signals, addressing a significant gap in existing methods. The approach improves reasoning across complex and diverse scenarios and adds a smooth length penalty that discourages reward hacking via inflated response length.
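The sketch below shows one way such a blended reward with a smooth length penalty might look. The blend logic, the sigmoid penalty shape, and all parameter values are illustrative assumptions, not the paper's formulation.

```python
# Sketch of a reward that handles both verifiable and non-verifiable
# domains, with a smooth length penalty. Blend logic, penalty shape,
# and parameter values are illustrative assumptions only.
import math

def blended_reward(verifiable: bool, correct: bool, judge_score: float,
                   n_tokens: int, soft_cap: int = 1024,
                   sharpness: float = 0.01) -> float:
    # Verifiable domains (math, code): binary reward from a checker.
    # Non-verifiable domains: a [0, 1] score from a reward model.
    base = (1.0 if correct else 0.0) if verifiable else judge_score
    # Smooth length penalty: a sigmoid ramp past a soft cap, so padding
    # the response stops paying off gradually rather than at a cliff.
    penalty = 1.0 / (1.0 + math.exp(-sharpness * (n_tokens - soft_cap)))
    return base * (1.0 - 0.5 * penalty)
```

The smoothness matters: a hard length cutoff creates a reward cliff the policy can exploit or get stuck against, whereas a gradual ramp simply makes padding unprofitable.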
Potential Impact
By enabling LLMs to operate effectively across a broader range of domains, including those with less straightforward reward verification, this research could significantly enhance the versatility of AI in real-world applications. This advancement may lead to more robust AI systems capable of tackling a variety of reasoning tasks, thus expanding the potential uses of LLMs in fields such as education, decision-making, and complex problem-solving.