Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
2024-07-01
The paper introduces Q*, a framework that improves the multi-step reasoning of Large Language Models (LLMs) through deliberative planning. Q* uses a plug-and-play Q-value model as a heuristic function that guides the LLM during decoding toward the most promising next reasoning step, reducing the errors that accumulate across steps without fine-tuning the LLM for each task. Experiments on GSM8K, MATH, and MBPP show that Q* improves the reasoning performance of existing open-source LLMs.
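To make the idea of a Q-value heuristic guiding step selection concrete, here is a minimal sketch of a best-first search over partial reasoning traces. It is a toy illustration, not the paper's implementation: the names `best_first_search`, `propose_steps`, `q_value`, and `solve_toy` are hypothetical, the "LLM" is replaced by a fixed rule that proposes two arithmetic moves, and the "Q-value model" is a hand-written distance score. The priority uses only the heuristic score, which is a simplification of whatever scoring the paper actually uses.

```python
import heapq
from typing import Callable, Iterable, Optional

State = tuple[int, ...]  # a partial reasoning trace: the values reached so far


def best_first_search(
    start: State,
    propose_steps: Callable[[State], Iterable[int]],
    q_value: Callable[[State, int], float],
    is_terminal: Callable[[State], bool],
    max_expansions: int = 1000,
) -> Optional[State]:
    """Expand the partial trace whose last step has the highest heuristic score."""
    counter = 0  # tie-breaker so the heap never has to compare states
    frontier = [(0.0, counter, start)]
    visited = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, state = heapq.heappop(frontier)
        if is_terminal(state):
            return state
        for step in propose_steps(state):
            nxt = state + (step,)
            if nxt in visited:
                continue
            visited.add(nxt)
            counter += 1
            # heapq is a min-heap, so negate the score to pop the best first.
            heapq.heappush(frontier, (-q_value(state, step), counter, nxt))
    return None  # budget exhausted without reaching a terminal trace


def solve_toy(target: int, max_expansions: int = 1000) -> Optional[State]:
    """Toy task (assumption): reach `target` from 1 using the moves +3 and *2."""

    def propose_steps(state: State) -> list[int]:
        # Stand-in for an LLM proposing candidate next steps.
        last = state[-1]
        return [last + 3, last * 2]

    def q_value(state: State, step: int) -> float:
        # Stand-in for a learned Q-value model: closer to the target is better.
        return -abs(target - step)

    def is_terminal(state: State) -> bool:
        return state[-1] == target

    return best_first_search(
        (1,), propose_steps, q_value, is_terminal, max_expansions
    )


if __name__ == "__main__":
    print(solve_toy(11))  # the trace of intermediate values found by the search
```

The search never modifies the step proposer; it only reorders which candidate gets expanded next. That mirrors the plug-and-play framing in the summary, where the heuristic sits alongside the LLM instead of requiring task-specific fine-tuning.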