
Here are the five most relevant AI papers from arXiv in week 42 of 2025, complete with analysis and insights.
Publications at a Glance
Can MLLMs Absorb Math Reasoning Abilities from LLMs as Free Lunch?
Adaptive Selection of Symbolic Languages for Improving LLM Logical Reasoning | Xiangyu Wang, Haocheng Yang, Fengxiang Cheng, Fenrong Liu | 10/12/2025
Confidence as a Reward: Transforming LLMs into Reward Models | He Du, Bowen Li, Chengxing Xie, Chang Gao, Kai Chen, Dacheng Tao | 10/15/2025
Boosting Instruction Following at Scale | Ben Elder, Evelyn Duesterwald, Vinod Muthusamy | 10/16/2025
GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning | Yao Zhang, Yu Wu, Haowei Zhang, Weiguo Li, Haokun Chen, Jingpei Wu, Guohao Li, Zhen Han, Volker Tresp | 10/16/2025
Can MLLMs Absorb Math Reasoning Abilities from LLMs as Free Lunch?
Key Insights
This research introduces IP-Merging, a tuning-free method that lets multi-modal large language models (MLLMs) absorb mathematical reasoning abilities from off-the-shelf large language models (LLMs). The study identifies the layers most associated with reasoning and addresses the mismatch between the two models' parameter spaces, significantly enhancing MLLMs' math reasoning performance.
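To make the idea concrete, here is a minimal Python sketch of tuning-free parameter merging in this spirit: rank layers by how far math tuning moved them from the base LLM, then interpolate only those layers into the MLLM. The layer names, selection rule, and interpolation weight are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of tuning-free, layer-selective parameter merging.
# All names and the 0.5 interpolation weight are assumptions for illustration.
import torch

def select_reasoning_layers(base_llm, math_llm, top_k=2):
    """Rank layers by how much math tuning moved them away from the base LLM."""
    deltas = {
        name: (math_llm[name] - base_llm[name]).abs().mean().item()
        for name in base_llm
    }
    return sorted(deltas, key=deltas.get, reverse=True)[:top_k]

def ip_merge(mllm, math_llm, layers, alpha=0.5):
    """Pull the selected layers of the MLLM toward the math LLM's parameters."""
    merged = {name: tensor.clone() for name, tensor in mllm.items()}
    for name in layers:
        # Simple interpolation stands in for the paper's parameter-space alignment.
        merged[name] = (1 - alpha) * mllm[name] + alpha * math_llm[name]
    return merged

# Toy state dicts (three tiny "layers") standing in for real checkpoints.
torch.manual_seed(0)
shape = (4, 4)
base_llm = {f"layers.{i}.mlp.weight": torch.randn(shape) for i in range(3)}
math_llm = {name: w + 0.1 * torch.randn(shape) for name, w in base_llm.items()}
mllm = {name: w + 0.05 * torch.randn(shape) for name, w in base_llm.items()}

layers = select_reasoning_layers(base_llm, math_llm, top_k=2)
merged = ip_merge(mllm, math_llm, layers, alpha=0.5)
print("reasoning-associated layers:", layers)
```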
Potential Impact
By enabling MLLMs to leverage the math reasoning capabilities of LLMs seamlessly, this approach could revolutionize applications in education, scientific research, and any domain requiring advanced mathematical problem-solving. It may also lead to more efficient model development practices by reducing the need for extensive retraining while retaining the models' broader functionalities.
Adaptive Selection of Symbolic Languages for Improving LLM Logical Reasoning
Key Insights
This research introduces a novel approach for improving logical reasoning in Large Language Models (LLMs) by adaptively selecting the most suitable symbolic language (SL) for translating each natural language problem. It highlights that different logical reasoning tasks are best served by specific SL types, a factor overlooked in prior work.
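As a rough illustration, the sketch below routes each problem to a symbolic language before translation. In the paper the selector is presumably the LLM itself; the keyword heuristic, the set of languages, and the translation stub are placeholders, not the paper's method.

```python
# Hedged sketch of adaptive symbolic-language selection before translation.
from typing import Literal

SymbolicLanguage = Literal["first_order_logic", "logic_programming", "sat_constraints"]

def select_symbolic_language(problem: str) -> SymbolicLanguage:
    """Choose the formalism whose solvers best fit the reasoning task (toy heuristic)."""
    text = problem.lower()
    if any(w in text for w in ("assign", "schedule", "at most", "exactly one")):
        return "sat_constraints"      # constraint-style puzzles
    if any(w in text for w in ("if", "then", "rule", "implies")):
        return "logic_programming"    # rule-chaining problems
    return "first_order_logic"        # default for quantified statements

def translate(problem: str, language: SymbolicLanguage) -> str:
    """Placeholder for the LLM translation step (one prompt template per language)."""
    return f"[{language}] formalization of: {problem}"

problems = [
    "Every dragon breathes fire. Smaug is a dragon. Does Smaug breathe fire?",
    "If it rains then the ground is wet. It rains. Is the ground wet?",
    "Assign three talks to two rooms so that at most one talk per room overlaps.",
]
for p in problems:
    lang = select_symbolic_language(p)
    print(lang, "->", translate(p, lang))
```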
Potential Impact
By enhancing the translation accuracy of logical reasoning problems through targeted SL selection, this method could significantly improve the performance of LLMs in applications requiring complex reasoning, such as automated theorem proving and decision-making systems. This advancement may also inspire new frameworks for integrating symbolic reasoning with LLMs, potentially reshaping the landscape of artificial intelligence in reasoning tasks.
Confidence as a Reward: Transforming LLMs into Reward Models
Key Insights
This research introduces Confidence-as-a-Reward (CRew), a training-free method that uses the token-level confidence of large language models (LLMs) as a reward signal for evaluating reasoning, without the need for extensive curated data. The study demonstrates that CRew not only outperforms existing training-free reward approaches but also aligns closely with actual reasoning performance, showcasing its potential as a robust evaluation metric.
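A minimal sketch of the core idea, assuming confidence is aggregated as the mean per-token probability of a candidate answer (the paper's exact confidence measure may differ): score each candidate by the model's own generation confidence and rank the candidates by that score.

```python
# Hedged sketch: use token-level confidence as a training-free reward for ranking.
import math

def confidence_reward(token_logprobs: list[float]) -> float:
    """Average per-token probability of a generated answer (assumed aggregation)."""
    if not token_logprobs:
        return 0.0
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)

# Mock per-token log-probabilities for three candidate answers;
# in practice these come from the LLM's own generation traces.
candidates = {
    "answer_a": [-0.05, -0.10, -0.08, -0.12],  # high confidence
    "answer_b": [-0.60, -0.90, -0.40, -1.20],  # shakier
    "answer_c": [-0.20, -0.15, -0.30, -0.25],
}

ranked = sorted(candidates, key=lambda k: confidence_reward(candidates[k]), reverse=True)
for name in ranked:
    print(name, round(confidence_reward(candidates[name]), 3))
```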
Potential Impact
By leveraging model confidence as a reward metric, CRew could streamline the development of more efficient LLMs, reducing the reliance on costly training datasets and enabling quicker iterations in model training. Furthermore, the proposed CRew-DPO strategy has the potential to significantly improve self-training methods, thereby advancing applications in areas requiring high-quality reasoning and decision-making, such as education and automated systems.
Boosting Instruction Following at Scale
Key Insights
This research introduces Instruction Boosting, a novel post-generation method that makes instruction following in large language models (LLMs) more reliable, and demonstrates significant improvements in instruction adherence rates. The study also presents the SCALEDIF benchmark to analyze how performance degrades as the number of instructions grows, revealing the underlying instruction conflicts that contribute to this trend.
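The sketch below shows one way a post-generation adherence loop could look: check a draft against each instruction and revise only the violated ones. The checker and reviser are toy stubs; the paper's actual boosting procedure and the SCALEDIF conflict scoring are not reproduced here.

```python
# Hedged sketch of a post-generation check-and-revise loop for instruction adherence.
def violates(draft: str, instruction: str) -> bool:
    """Toy checker: flag simple 'do not use the word X' instructions found in the draft."""
    prefix = "do not use the word "
    if instruction.lower().startswith(prefix):
        banned = instruction.lower()[len(prefix):].strip(" '\".")
        return banned in draft.lower()
    return False  # a real system would call an LLM judge per instruction

def revise(draft: str, violated: list[str]) -> str:
    """Stub for the boosting step: re-prompt the model with the violated instructions."""
    note = "; ".join(violated)
    return draft + f"\n[revised to satisfy: {note}]"

instructions = [
    "Do not use the word 'very'.",
    "Answer in one paragraph.",
]
draft = "The results are very promising and suggest a clear next step."

violated = [ins for ins in instructions if violates(draft, ins)]
print(revise(draft, violated) if violated else draft)
```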
Potential Impact
By providing a systematic approach to improve instruction adherence in LLMs, Instruction Boosting could fundamentally change how developers create and optimize prompts, leading to more effective applications in various domains. The quantitative conflict scoring tool also offers actionable feedback, enabling a more nuanced understanding of instruction dynamics and potentially enhancing the overall performance of LLMs in complex tasks.
GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning
Key Insights
GroundedPRM introduces a novel tree-guided and fidelity-aware framework that significantly enhances the training of Process Reward Models (PRMs) by reducing noise in reward signals and improving step-level validation through external tool verification. This approach effectively combines structured reasoning paths and a hybrid reward aggregation mechanism, achieving notable performance improvements with a fraction of the data typically required.
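To illustrate fidelity-aware, hybrid reward aggregation, here is a toy Python sketch that blends an external tool check (a calculator verifying an arithmetic claim in a reasoning step) with a model-estimated plausibility score. The equal weighting and the verification rule are assumptions, not the paper's aggregation scheme.

```python
# Hedged sketch: blend external tool verification with a model score per reasoning step.
import re

def tool_verify(step: str) -> float:
    """Return 1.0 if an 'a op b = c' claim in the step checks out, 0.0 if it fails."""
    m = re.search(r"(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)", step)
    if not m:
        return 0.5  # nothing to verify; neutral fidelity signal
    a, op, b, c = int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4))
    result = {"+": a + b, "-": a - b, "*": a * b}[op]
    return 1.0 if result == c else 0.0

def hybrid_reward(step: str, model_score: float, w_tool: float = 0.5) -> float:
    """Blend tool fidelity with a model-estimated plausibility score."""
    return w_tool * tool_verify(step) + (1 - w_tool) * model_score

steps = [
    ("Compute 12 * 7 = 84, so each crate holds 84 apples.", 0.9),
    ("Then 84 - 10 = 72 apples remain after spoilage.", 0.8),  # arithmetic error caught by the tool
]
for text, model_score in steps:
    print(round(hybrid_reward(text, model_score), 2), text)
```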
Potential Impact
By providing a scalable and more reliable method for process supervision in Large Language Models, GroundedPRM could revolutionize multi-step reasoning applications, making them more efficient and accurate. Its ability to outperform existing methods, including those trained with human-labeled supervision, suggests a shift towards automated, high-quality reasoning models that can be more widely adopted across various fields.
AiBrain