Here are the five most relevant AI papers from arXiv for week 36/2025, complete with analysis and insights.
Publications at a Glance
Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent | Chunlong Wu, Zhibo Qu | 9/4/2025
CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs | Jay Vaghasiya, Omkar Ghugarkar, Vishvesh Bhat, Vipul Dholaria, Julian McAuley | 8/31/2025
Counterfactual Sensitivity for Faithful Reasoning in Language Models | Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma | 9/1/2025
Improving Factuality in LLMs via Inference-Time Knowledge Graph Construction | Shanglin Wu, Lihui Liu, Jinho D. Choi, Kai Shu | 8/31/2025
Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
Key Insights
The adaptive planning system uses a real-time difficulty classifier that sorts tasks into three levels: simple (direct execution), medium (light planning), and complex (full planning). An early-stopping mechanism halts planning once confidence reaches 85%, reducing inference costs by 45% while improving accuracy by 28%. Planning decisions are optimized with a two-stage training pipeline that uses curriculum learning.
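To make the mechanism concrete, here is a minimal sketch of the dispatch logic described above. It assumes hypothetical `classify_difficulty`, `plan_step`, and `execute` callables and placeholder per-level budgets; it illustrates the idea rather than reproducing the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class PlanStep:
    action: str
    confidence: float  # self-reported confidence in [0, 1]

def solve(task: str,
          classify_difficulty: Callable[[str], str],
          plan_step: Callable[[str, List[PlanStep]], PlanStep],
          execute: Callable[[str, List[PlanStep]], str],
          confidence_stop: float = 0.85,
          budgets: Optional[Dict[str, int]] = None) -> str:
    """Allocate planning compute according to estimated task difficulty.

    Hypothetical sketch: 'simple' tasks skip planning, 'medium' tasks get a
    small planning budget, 'complex' tasks a larger one. Planning halts early
    once a step's confidence crosses the threshold.
    """
    budgets = budgets or {"simple": 0, "medium": 2, "complex": 8}
    level = classify_difficulty(task)           # -> "simple" | "medium" | "complex"
    plan: List[PlanStep] = []
    for _ in range(budgets.get(level, 0)):
        step = plan_step(task, plan)
        plan.append(step)
        if step.confidence >= confidence_stop:  # early stopping at ~85% confidence
            break
    return execute(task, plan)
```

The key design point is that the planning loop is bounded by the classifier's verdict, so easy tasks never pay for planning tokens at all.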
Potential Impact
By allocating compute according to task complexity, this approach could substantially change LLM agent economics, cutting operational costs by the reported 40-60% while maintaining performance. Companies could deploy more cost-effective AI agents that automatically adjust their computational usage, which matters most for large-scale deployments, and the technique could become standard practice for intelligent agent systems in production environments.
Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent
Key Insights
This research introduces Meta-Policy Reflexion (MPR), a hybrid framework that allows large language model agents to consolidate and reuse reflective knowledge across tasks without requiring model weight updates. MPR enhances adaptability and safety through a structured Meta-Policy Memory and mechanisms that enforce domain constraints while maintaining the benefits of language-based reflection.
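As a rough illustration of how reusable reflective memory and rule admissibility might fit together, here is a hedged sketch. The `MetaPolicyMemory` class, the rule format, and the `is_admissible` check are assumptions made for clarity, not the paper's actual interfaces.

```python
from typing import Callable, Dict, List

class MetaPolicyMemory:
    """Stores natural-language rules distilled from past reflections."""
    def __init__(self) -> None:
        self.rules: Dict[str, List[str]] = {}   # task_type -> learned rules

    def add_rule(self, task_type: str, rule: str) -> None:
        self.rules.setdefault(task_type, []).append(rule)

    def retrieve(self, task_type: str) -> List[str]:
        return self.rules.get(task_type, [])

def act(task_type: str,
        observation: str,
        memory: MetaPolicyMemory,
        propose_action: Callable[[str, List[str]], str],
        is_admissible: Callable[[str, List[str]], bool],
        max_retries: int = 3) -> str:
    """Propose an action conditioned on stored rules; reject inadmissible ones."""
    rules = memory.retrieve(task_type)
    for _ in range(max_retries):
        action = propose_action(observation, rules)  # LLM call, no weight updates
        if is_admissible(action, rules):             # enforce domain constraints
            return action
    return "no-op"  # fall back safely if no admissible action is found
```

The point of the sketch is that learned rules are stored and reused across tasks as plain text, so the agent adapts without any gradient updates to the underlying model.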
Potential Impact
By enabling the reuse of corrective knowledge and improving task adaptability, MPR could significantly reduce the computational resources required for training LLM agents, making them more efficient in real-world applications. This innovation has the potential to change the landscape of LLM deployment across various domains, enhancing their reliability and effectiveness in complex multi-task environments.
CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs
Key Insights
CoreThink introduces a reasoning method called General Symbolics, which enhances the reasoning capabilities of large language models (LLMs) without fine-tuning, achieving state-of-the-art performance across multiple benchmarks. Rather than relying on existing reasoning paradigms, it adds a symbolic reasoning layer that improves performance on tool-calling, code generation, and planning tasks.
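The abstract does not spell out how General Symbolics works internally, so the sketch below only illustrates the general pattern of a symbolic layer that validates LLM-proposed tool calls before execution. The schema table and function names are hypothetical and are not taken from CoreThink.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool schemas used as symbolic constraints on LLM outputs.
TOOL_SCHEMAS: Dict[str, Dict[str, type]] = {
    "search": {"query": str, "max_results": int},
    "calculator": {"expression": str},
}

def validate_call(raw: str) -> Dict[str, Any]:
    """Parse an LLM tool call and enforce symbolic constraints on its arguments."""
    call = json.loads(raw)               # e.g. '{"tool": "search", "args": {...}}'
    schema = TOOL_SCHEMAS[call["tool"]]  # unknown tools raise KeyError
    for name, expected in schema.items():
        if not isinstance(call["args"].get(name), expected):
            raise ValueError(f"argument {name!r} must be {expected.__name__}")
    return call

def run_step(propose: Callable[[str], str],
             execute: Callable[[Dict[str, Any]], str],
             state: str) -> str:
    """One plan step: propose with the LLM, verify symbolically, then execute."""
    call = validate_call(propose(state))
    return execute(call)
```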
Potential Impact
The implementation of CoreThink could significantly transform how LLMs are utilized in reasoning-intensive applications by providing a more efficient and effective method for task completion, ultimately leading to more reliable and accurate outcomes. As the limitations of current reasoning techniques become apparent, this research may drive a paradigm shift in the development and deployment of AI systems, encouraging a broader adoption of symbolic reasoning frameworks.
Counterfactual Sensitivity for Faithful Reasoning in Language Models
Key Insights
This research introduces Counterfactual Sensitivity Regularization (CSR), a novel training objective that enhances the integrity of reasoning in large language models by enforcing a direct dependence between intermediate reasoning processes and final outputs. Additionally, it proposes a new metric, Counterfactual Outcome Sensitivity (COS), to evaluate the faithfulness of model predictions under counterfactual perturbations.
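A simple way to picture COS-style evaluation is to perturb an intermediate reasoning trace and check whether the final answer changes accordingly. The sketch below assumes generic `model` and `perturb` callables; the actual CSR training objective and COS definition in the paper may differ.

```python
from typing import Callable, Tuple

def outcome_sensitivity(model: Callable[[str, str], str],
                        problem: str,
                        reasoning: str,
                        perturb: Callable[[str], Tuple[str, str]],
                        n_samples: int = 8) -> float:
    """Fraction of counterfactual edits to the reasoning that flip the answer.

    A faithful model should change its answer when an operator or operand in
    its own reasoning trace is altered; a score near 0 suggests the answer
    does not actually depend on the stated reasoning.
    """
    original_answer = model(problem, reasoning)
    flips = 0
    for _ in range(n_samples):
        perturbed_reasoning, _edit = perturb(reasoning)  # e.g. swap '+' for '-'
        if model(problem, perturbed_reasoning) != original_answer:
            flips += 1
    return flips / n_samples
```

A training objective in the spirit of CSR would then reward high sensitivity, penalizing models whose answers survive arbitrary edits to their own reasoning.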
Potential Impact
By improving the reliability of reasoning in language models, CSR could significantly enhance their applicability in high-stakes domains such as healthcare or legal decision-making, where trustworthiness is crucial. Furthermore, the approach may shift the paradigm in model training by integrating counterfactual reasoning techniques, leading to more robust AI systems capable of better understanding and generating human-like reasoning.
Improving Factuality in LLMs via Inference-Time Knowledge Graph Construction
Key Insights
This research introduces a novel framework for improving the factual accuracy of Large Language Models (LLMs) by dynamically constructing and expanding knowledge graphs during inference, which allows for better integration of internal and external information. By addressing the limitations of traditional Retrieval-Augmented Generation methods, this approach enhances compositional reasoning and identifies factual inconsistencies effectively.
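The flow can be pictured roughly as: draft an answer, extract the claims it makes as triples, expand a small graph around those claims with retrieved evidence, and revise when a claim conflicts with the graph. The sketch below encodes that loop with pluggable callables; it is an assumption based on the abstract, not the authors' pipeline.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def answer_with_kg(question: str,
                   draft_answer: Callable[[str], str],
                   extract_triples: Callable[[str], List[Triple]],
                   retrieve_external: Callable[[Triple], List[Triple]],
                   consistent: Callable[[Triple, Set[Triple]], bool],
                   revise: Callable[[str, str, Set[Triple]], str]) -> str:
    """Draft an answer, expand a KG around its claims, and revise any conflicts."""
    answer = draft_answer(question)
    graph: Set[Triple] = set()
    for claim in extract_triples(answer):        # claims stated in the draft
        graph.update(retrieve_external(claim))   # expand graph with evidence
        if not consistent(claim, graph):         # claim conflicts with the graph
            answer = revise(question, answer, graph)
    return answer
```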
Potential Impact
The implementation of inference-time knowledge graph construction could significantly transform the way LLMs generate responses, leading to more reliable and accurate outputs in applications such as automated question answering, content generation, and decision support systems. This advancement may also encourage further research into the integration of structured knowledge representations in AI, promoting a shift toward more interpretable and trustworthy AI systems.