Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs

The architecture of Tru-POMDP. (a) Task Input: Human instruction and the observed scene graph. (b) Tree of Hypotheses: An LLM infers target objects, target areas, and initial locations, producing weighted particles. (c) Hybrid Belief Update: Bayesian filtering updates the belief using particle prediction and elimination, and augments the filtered belief with LLM particles. (d) Online POMDP Planning: Belief tree search computes the optimal action with the help of dynamic action branching and an LLM-written rollout policy.
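The hierarchical belief generation in panel (b) can be sketched as a three-level enumeration, where each LLM query expands one level of the tree and a particle's weight is the product of the branch probabilities along its path. The `query_llm` callable below is a hypothetical stand-in for the paper's structured LLM queries, not the actual interface.

```python
from dataclasses import dataclass

@dataclass
class Particle:
    """One hypothesis over the hidden state: the human's goal and object placement."""
    target_object: str
    target_area: str
    initial_location: str
    weight: float

def tree_of_hypotheses(query_llm, instruction, scene_graph):
    """Sketch of the Tree of Hypotheses (TOH).

    `query_llm(level, ...)` is a hypothetical callable that returns a list of
    (candidate, probability) pairs for one level of the tree.
    """
    particles = []
    # Level 1: which object does the instruction refer to?
    for obj, p_obj in query_llm("target_objects", instruction, scene_graph):
        # Level 2: where should that object end up?
        for area, p_area in query_llm("target_areas", instruction, scene_graph, obj):
            # Level 3: where might the object currently be (possibly hidden)?
            for loc, p_loc in query_llm("initial_locations", instruction, scene_graph, obj):
                particles.append(Particle(obj, area, loc, p_obj * p_area * p_loc))
    # Normalize so the weighted particle set forms a probability distribution.
    total = sum(p.weight for p in particles)
    for p in particles:
        p.weight /= total
    return particles
```

Factoring the queries hierarchically keeps each individual LLM call small while still covering the joint hypothesis space.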

Abstract

Task planning under uncertainty is essential for home-service robots operating in the real world. Tasks involve ambiguous human instructions, hidden or unknown object locations, and open-vocabulary object types, leading to significant open-ended uncertainty and a boundlessly large planning space. To address these challenges, we propose Tru-POMDP, a planner that combines structured belief generation using Large Language Models (LLMs) with principled POMDP planning. Tru-POMDP introduces a hierarchical Tree of Hypotheses (TOH), which systematically queries an LLM to construct high-quality particle beliefs over possible world states and human goals. We further formulate an open-ended POMDP model that enables rigorous Bayesian belief tracking and efficient belief-space planning over these LLM-generated hypotheses. Experiments on complex object rearrangement tasks across diverse kitchen environments show that Tru-POMDP significantly outperforms state-of-the-art LLM-based and LLM-tree-search hybrid planners, achieving higher success rates, substantially better plans, stronger robustness to ambiguity and occlusion, and greater planning efficiency.

Video Presentation

We propose Tru-POMDP, a new algorithm for task planning under uncertainty that tightly integrates commonsense reasoning by LLMs with explicit belief tracking and principled POMDP planning. Our main contributions include:

  • The first framework to integrate LLM-based reasoning with principled POMDP planning for household tasks.
  • A novel hybrid belief modeling approach that integrates LLM-generated hypotheses with principled Bayesian filtering.
  • A POMDP model for open-ended object rearrangement tasks and a practical belief-tree search planner for solving such tasks efficiently under large-scale uncertainty.
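The hybrid belief modeling above can be sketched as a particle filter whose survivors are mixed with fresh LLM-generated particles. The `transition`, `obs_likelihood`, and `llm_particles` callables and the `mix_ratio` parameter are illustrative assumptions, not the paper's exact formulation.

```python
def hybrid_belief_update(particles, action, observation,
                         transition, obs_likelihood, llm_particles,
                         mix_ratio=0.2):
    """Sketch of a hybrid Bayesian belief update over (state, weight) pairs.

    1. Predict: propagate each particle through the transition model.
    2. Eliminate/reweight: drop particles inconsistent with the observation
       and reweight the survivors by observation likelihood (Bayes' rule).
    3. Augment: mix in fresh LLM-generated particles so the belief can
       recover hypotheses that filtering alone would never reintroduce.
    """
    # 1. Predict the next state of every particle.
    predicted = [(transition(s, action), w) for s, w in particles]
    # 2. Bayesian reweighting; zero likelihood eliminates a particle.
    filtered = [(s, w * obs_likelihood(observation, s, action)) for s, w in predicted]
    filtered = [(s, w) for s, w in filtered if w > 0.0]
    total = sum(w for _, w in filtered)
    if total == 0.0:
        # Every hypothesis was refuted: rebuild the belief from the LLM alone.
        return llm_particles(observation)
    filtered = [(s, w / total) for s, w in filtered]
    # 3. Augment the filtered belief with LLM particles at a fixed mixing ratio.
    kept = [(s, w * (1.0 - mix_ratio)) for s, w in filtered]
    fresh = [(s, w * mix_ratio) for s, w in llm_particles(observation)]
    return kept + fresh
```

The augmentation step is what distinguishes this from a plain particle filter: even when filtering concentrates the belief, the LLM can reinject plausible hypotheses about goals and hidden object locations.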

Experiment

Planning results.

We compare Tru-POMDP against a set of strong baselines, including both pure LLM-based planners and tree search planners integrated with LLM reasoning.

  • ReAct: A closed-loop LLM-based planner that selects actions based on current observations and immediate feedback from the environment.
  • Reflexion: An extension of ReAct that adds a reflection module. Upon repeated failures, it analyzes the history and generates a revised plan. For fair comparison under online planning, we disable environment resets and trigger reflection after three consecutive failed actions.
  • ReAct* and Reflexion*: Prompt-augmented variants that provide the LLM with additional structured descriptions of the task domain, including object types, action semantics, and goal structures.
  • LLM-MCTS: A tree search method that generates a maximum-likelihood hypothesis of the task goal and hidden object placements using an LLM. It then performs Monte Carlo Tree Search (MCTS), repeatedly querying the LLM to guide action selection during simulation.
Performance comparison of Tru-POMDP and baselines. Each bar represents the average value with standard error (SE). In (c), the dashed line indicates the maximum allowed step number.

Tru-POMDP significantly outperforms all baselines, demonstrating its capability for planning under uncertainty.