arXiv AI Publications - 2025 Week 33

Publications of the Week #33 - 2025

Here are the five most relevant AI papers from arXiv for week 33 of 2025, with analysis and insights.

Publications at a Glance

UrzaGPT: LoRA-Tuned Large Language Models for Card Selection in Collectible Card Games | Timo Bertram | 8/11/2025

From Natural Language to Solver-Ready Power System Optimization: An LLM-Assisted, Validation-in-the-Loop Framework | Yunkai Hu, Tianqiao Zhao, Meng Yue | 8/11/2025

EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making | Yang Cheng, Zilai Wang, Weiyu Ma, Wenhui Zhu, Yue Deng, Jian Zhao | 8/13/2025

What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles | Mengtao Zhou, Sifan Wu, Huan Zhang, Qi Sima, Bang Liu | 8/14/2025

The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference | Maël Jullien, Marco Valentino, André Freitas | 8/14/2025

UrzaGPT: LoRA-Tuned Large Language Models for Card Selection in Collectible Card Games

Published

8/11/2025

arXiv ID

[2508.08382v1]

Authors

Timo Bertram

Key Insights

The research introduces UrzaGPT, an approach that uses Low-Rank Adaptation (LoRA) to fine-tune large language models (LLMs) for real-time drafting decisions in collectible card games, specifically Magic: The Gathering. The work shows that fine-tuned LLMs can perform drafting effectively, with clearly higher accuracy than untuned models, which makes them a promising alternative to purpose-built, domain-specific drafting AIs.
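Low-Rank Adaptation itself is a general technique: a pretrained weight matrix W stays frozen, and two small matrices B (d_out x r) and A (r x d_in) are trained so the layer computes W x + (alpha / r) B A x. The pure-Python sketch below shows that forward pass and the parameter savings it buys. The function names, rank, scaling factor, and 4096 x 4096 layer size are illustrative assumptions, not UrzaGPT's actual configuration.

```python
# Generic LoRA forward pass in pure Python (illustrative, not UrzaGPT's code).

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Return W x + (alpha / r) * B (A x); W stays frozen, only A and B train."""
    r = len(A)                          # rank = rows of A (A is r x d_in, B is d_out x r)
    frozen = matvec(W, x)               # pretrained path
    adapter = matvec(B, matvec(A, x))   # low-rank path; never forms the full d_out x d_in update
    scale = alpha / r
    return [f + scale * a for f, a in zip(frozen, adapter)]


def trainable_params(d_out, d_in, r):
    """(full fine-tuning, LoRA adapter) parameter counts for one d_out x d_in layer."""
    return d_out * d_in, r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1, 0], [0, 1]]
    A = [[1, 1]]          # 1 x 2, so rank r = 1
    B = [[0.5], [0.0]]    # 2 x 1
    print(lora_forward(W, A, B, [2, 3], alpha=1))   # [4.5, 3.0]
    print(trainable_params(4096, 4096, 8))          # (16777216, 65536)
```

At rank 8 on a 4096 x 4096 layer, the adapter trains 65,536 parameters instead of 16,777,216, about 0.4% of the layer. Because LoRA conventionally initializes B to zero, the adapted model starts out identical to the pretrained one.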

Potential Impact

UrzaGPT's ability to adapt to different game expansions and improve drafting performance could make AI drafting agents for collectible card games more competitive with human players. The approach may also carry over to other complex, changing environments where adaptability and decision-making are crucial, in strategic games and beyond.


From Natural Language to Solver-Ready Power System Optimization: An LLM-Assisted, Validation-in-the-Loop Framework

Published

8/11/2025

arXiv ID

[2508.08147v1]

Authors

Yunkai Hu, Tianqiao Zhao, Meng Yue

Key Insights

This research presents a framework that uses large language models (LLMs) to convert natural-language descriptions of power system optimization problems into solver-ready formulations. Because each generated formulation goes through systematic validation and iterative repair before it reaches the solver, the approach improves both feasibility and solution quality compared with using an LLM directly, which can produce infeasible or suboptimal formulations.
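The validation-and-repair loop amounts to: generate a formulation, check it, feed any errors back, and retry. The minimal sketch below illustrates that control flow on a toy single-bus economic dispatch with linear costs. `fake_llm` is a hypothetical stand-in for a real LLM call, the validator checks and generator data are invented, and the problem is solved by merit order rather than an external solver; none of this reflects the paper's actual prompts, checks, or solver interface.

```python
# Hypothetical validation-in-the-loop sketch; not the paper's implementation.

def fake_llm(description, feedback):
    """Stand-in for an LLM that turns text into a formulation dict."""
    # Each generator: (pmin MW, pmax MW, linear cost $/MWh); values are made up.
    gens = {"G1": (0.0, 50.0, 20.0), "G2": (0.0, 100.0, 30.0)}
    if feedback is None:
        return {"generators": gens}                 # first draft forgets the demand
    return {"generators": gens, "demand": 120.0}    # "repaired" draft after feedback


def validate(formulation):
    """Return a list of detected problems; an empty list means solver-ready."""
    if "demand" not in formulation:
        return ["missing demand"]
    errors = []
    gens = formulation["generators"]
    for name, (pmin, pmax, _cost) in gens.items():
        if pmin > pmax:
            errors.append(f"{name}: pmin exceeds pmax")
    total_min = sum(g[0] for g in gens.values())
    total_max = sum(g[1] for g in gens.values())
    if not total_min <= formulation["demand"] <= total_max:
        errors.append("demand outside total generation limits")
    return errors


def solve_dispatch(formulation):
    """Merit-order dispatch: optimal for linear costs, unit limits, one power balance."""
    gens = formulation["generators"]
    dispatch = {name: g[0] for name, g in gens.items()}     # start every unit at pmin
    remaining = formulation["demand"] - sum(dispatch.values())
    for name, (pmin, pmax, _cost) in sorted(gens.items(), key=lambda kv: kv[1][2]):
        extra = min(pmax - pmin, remaining)                 # cheapest units fill first
        dispatch[name] += extra
        remaining -= extra
    return dispatch


def formulate_with_repair(description, max_rounds=3):
    """Generate, validate, and feed errors back until the formulation passes."""
    feedback = None
    for _ in range(max_rounds):
        formulation = fake_llm(description, feedback)
        errors = validate(formulation)
        if not errors:
            return formulation, solve_dispatch(formulation)
        feedback = "; ".join(errors)
    raise RuntimeError(f"no valid formulation after {max_rounds} rounds: {feedback}")
```

Calling `formulate_with_repair("Serve a 120 MW load at minimum cost")` fails validation once (missing demand), then returns the dispatch `{'G1': 50.0, 'G2': 70.0}`: the cheaper unit runs at its limit and the other covers the rest.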

Potential Impact

This framework could make power system optimization more accessible to non-experts, broadening participation in energy decision-making. It may also make optimization workflows more efficient and reliable, supporting better resource allocation and operational efficiency in energy systems.


EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making

Published

8/13/2025

arXiv ID

[2508.09586v1]

Authors

Yang Cheng, Zilai Wang, Weiyu Ma, Wenhui Zhu, Yue Deng, Jian Zhao

Key Insights

The research introduces EvoCurr, a self-evolving curriculum framework that tailors a sequence of problem instances to the learning progress of a large language model (LLM) solver, significantly improving its performance on complex decision-making tasks. By adjusting the difficulty of the challenges as the solver improves, the framework gives the LLM a more effective learning trajectory than asking it to solve the hardest problems directly.
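The principle of matching difficulty to the learner's progress can be illustrated with a generic adaptive curriculum. In the sketch below the curriculum is a simple success-rate threshold rule and the learner is a hypothetical `ToySolver` with a scalar skill level. EvoCurr itself uses an LLM to generate problem instances for an LLM solver that writes behavior code, so the class, thresholds, and learning rule here are assumptions chosen only to show why a curriculum can succeed where direct attempts stall.

```python
# Generic adaptive-curriculum sketch (not EvoCurr's LLM-based generator).

class ToySolver:
    """Hypothetical learner with a scalar skill level."""

    def __init__(self, skill=1.0, step=0.5):
        self.skill = skill
        self.step = step

    def attempt(self, difficulty):
        """Succeed if the task is within skill; learn only from near-miss failures."""
        if difficulty <= self.skill:
            return True
        if difficulty <= self.skill + 1:    # hard but learnable: skill improves
            self.skill += self.step
        return False                        # far beyond ability: no learning signal


def run_curriculum(solver, target, max_rounds=50, batch=4, promote=0.75, demote=0.25):
    """Adjust difficulty from the batch success rate until the target is mastered."""
    difficulty = 1
    history = []
    for _ in range(max_rounds):
        rate = sum(solver.attempt(difficulty) for _ in range(batch)) / batch
        history.append((difficulty, rate))
        if rate >= promote:
            if difficulty == target:
                return history                   # target mastered
            difficulty += 1                      # ready for something harder
        elif rate < demote:
            difficulty = max(1, difficulty - 1)  # too hard: step back
    return history
```

Starting from skill 1.0, `run_curriculum(ToySolver(), target=5)` masters difficulty 5 in nine rounds, whereas a fresh `ToySolver` attacking difficulty 5 directly never improves, because in this toy model failures far beyond the current skill carry no learning signal.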

Potential Impact

EvoCurr could extend the use of LLMs in high-complexity domains by helping them handle intricate decision-making scenarios more efficiently and accurately. This curriculum learning approach may also advance automated reasoning and benefit real-world applications such as autonomous systems, robotics, and complex planning.


What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles

Published

8/14/2025

arXiv ID

[2508.10358v1]

Authors

Mengtao Zhou, Sifan Wu, Huan Zhang, Qi Sima, Bang Liu

Key Insights

This research introduces TurtleSoup-Bench, an interactive benchmark that assesses the imaginative reasoning of large language models (LLMs) through dynamic, exploratory puzzles. It also presents Mosaic-Agent, an assessment tool whose evaluations reveal significant performance gaps between LLMs and human reasoners.
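Turtle soup puzzles (also called situation puzzles) are solved by asking yes/no questions of a host who knows a hidden story. The sketch below shows such an interactive session in its simplest form. The story, the fixed question list standing in for an LLM questioner, the three-way answer set, and the score (share of true facts confirmed within a question budget) are all hypothetical choices for illustration, not TurtleSoup-Bench's actual protocol or metrics.

```python
# Hypothetical interactive yes/no puzzle session; not TurtleSoup-Bench's protocol.

def judge(story, topic):
    """Host reply: 'yes'/'no' for topics in the hidden story, else 'irrelevant'."""
    if topic not in story:
        return "irrelevant"
    return "yes" if story[topic] else "no"


def play(story, questions, budget):
    """Ask up to `budget` questions; score = share of true facts confirmed."""
    transcript = [(q, judge(story, q)) for q in questions[:budget]]
    confirmed = {q for q, answer in transcript if answer == "yes"}
    key_facts = {topic for topic, is_true in story.items() if is_true}
    return transcript, len(confirmed) / len(key_facts)


# Hypothetical hidden story: topic -> whether it holds in the story.
STORY = {"shipwreck": True, "ate soup before": True, "poison": False, "was a chef": False}
QUESTIONS = ["poison", "weather", "shipwreck", "was a chef", "ate soup before"]
```

With a budget of three questions, `play(STORY, QUESTIONS, 3)` confirms one of the two key facts (score 0.5); with a budget of five it confirms both.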

Potential Impact

By providing a comprehensive framework for evaluating imaginative reasoning in LLMs, this work could lead to models that better mimic human-like exploratory behavior, benefiting creative problem-solving and interactive AI systems. It may also become a reference point for future research, influencing how LLMs are trained and assessed across domains.


The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference

Published

8/14/2025

arXiv ID

[2508.10777v1]

Authors

Maël Jullien, Marco Valentino, André Freitas

Key Insights

This research introduces a novel Clinical Trial Natural Language Inference benchmark that effectively distinguishes between factual knowledge and reasoning capabilities in large language models (LLMs). The findings highlight that while LLMs may possess substantial clinical knowledge, they struggle with complex reasoning tasks, revealing critical structural limitations in their internal representations.
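One way to make a knowledge-reasoning dissociation concrete is to pair each inference item with a probe of the fact it depends on, then measure how often a model knows the fact yet still gets the inference wrong. The metric name, record fields, and the four toy records below are invented for illustration; they are not the benchmark's actual design or results.

```python
# Hypothetical knowledge-vs-reasoning scoring sketch; toy data, not paper results.

def accuracy(items, key):
    """Fraction of items where item[key] is True."""
    return sum(item[key] for item in items) / len(items)


def dissociation_rate(items):
    """Among items whose knowledge probe is correct, share with a wrong inference."""
    knew = [item for item in items if item["knowledge_ok"]]
    if not knew:
        return 0.0
    return sum(not item["reasoning_ok"] for item in knew) / len(knew)


# Toy records: each pairs a knowledge probe with the inference that uses it.
ITEMS = [
    {"knowledge_ok": True, "reasoning_ok": False},
    {"knowledge_ok": True, "reasoning_ok": False},
    {"knowledge_ok": True, "reasoning_ok": True},
    {"knowledge_ok": False, "reasoning_ok": False},
]
```

On these toy records, knowledge accuracy is 0.75 and reasoning accuracy is 0.25, and the inference fails on two of the three items where the fact was known (a dissociation rate of 2/3).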

Potential Impact

The explicit dissociation of knowledge and reasoning could reshape how LLMs are evaluated and utilized in high-stakes fields like healthcare, emphasizing the need for improved model architectures that integrate structured reasoning. This research may drive the development of more reliable AI systems for clinical decision-making, ultimately enhancing patient outcomes and safety.
