Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

2024-07-01

The paper introduces Q*, a framework that improves the multi-step reasoning of Large Language Models (LLMs) through deliberative planning. Q* uses a plug-and-play Q-value model as a heuristic to guide the LLM toward the most promising next step during decoding, which reduces reasoning errors without fine-tuning the LLM for each task. The authors report that Q* outperforms the baselines they compare against across several datasets.
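To make the idea of heuristic-guided decoding concrete, here is a minimal sketch of a best-first search over partial reasoning traces. It is a simplified illustration, not the paper's algorithm or code: `best_first_decode`, `propose_steps`, `q_value`, and `is_terminal` are hypothetical names, and a toy arithmetic puzzle stands in for both the LLM (which would propose candidate next steps) and the learned Q-value model (which would score them).

```python
import heapq
from typing import Callable, List, Tuple

State = Tuple[str, ...]  # a partial reasoning trace, one string per step


def best_first_decode(
    propose_steps: Callable[[State], List[str]],
    q_value: Callable[[State, str], float],
    is_terminal: Callable[[State], bool],
    max_expansions: int = 1000,
) -> State:
    """Repeatedly expand the partial trace whose latest step scored highest."""
    # heapq is a min-heap, so scores are stored negated.
    frontier: List[Tuple[float, int, State]] = [(0.0, 0, ())]
    counter = 1  # unique tie-breaker so states are never compared directly
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)
        if is_terminal(state):
            return state
        for step in propose_steps(state):
            score = q_value(state, step)  # hypothetical heuristic call
            heapq.heappush(frontier, (-score, counter, state + (step,)))
            counter += 1
    raise RuntimeError("no complete trace found within the budget")


# Toy stand-ins (assumptions for illustration only): start at 1 and
# reach TARGET using the steps "+3" and "*2".
TARGET = 10


def value_of(state: State) -> int:
    x = 1
    for step in state:
        x = x + 3 if step == "+3" else x * 2
    return x


def toy_propose(state: State) -> List[str]:
    return ["+3", "*2"]


def toy_q(state: State, step: str) -> float:
    # Closer to the target means a higher score.
    return -abs(TARGET - value_of(state + (step,)))


def toy_terminal(state: State) -> bool:
    return value_of(state) == TARGET


trace = best_first_decode(toy_propose, toy_q, toy_terminal)
print(trace, value_of(trace))
```

In this sketch the search first follows a greedy path that overshoots the target, then backs up to a lower-scored alternative still on the frontier. That ability to revisit earlier choices is what separates planning-style decoding from committing to one step at a time.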

AI MachineLearning NLP LLMs DeliberativePlanning
