Solutions
Planners in SymbolicPlanners.jl return Solutions, which represent (potentially partial) descriptions of how a problem should be solved.
SymbolicPlanners.Solution — Type

abstract type Solution

Abstract type for solutions to planning problems. Minimally, a Solution should define what action should be taken at a particular step t, or at a particular state, by implementing get_action.

SymbolicPlanners.get_action — Method

get_action(sol, t, state)

Return an action for step t at state.
In the case that a problem is unsolved or unsolvable, planners may return a NullSolution:
SymbolicPlanners.NullSolution — Type

NullSolution([status])

Null solution that indicates the problem was unsolved. The status field can be used to denote why the problem was unsolved. Defaults to :failure.
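For instance, a solution can be checked against NullSolution before its actions are used. The following is a minimal sketch, assuming a PDDL domain and problem have already been loaded (e.g. with PDDL.jl's load_domain and load_problem), and using AStarPlanner with the HAdd heuristic purely as an illustrative planner:

```julia
using PDDL, SymbolicPlanners

# Assumes `domain` and `problem` were loaded earlier with PDDL.jl
state = initstate(domain, problem)   # initial state of the loaded problem
spec = Specification(problem)        # goal specification derived from the problem

planner = AStarPlanner(HAdd())
sol = planner(domain, state, spec)

if sol isa NullSolution
    @warn "Problem was not solved" sol.status
else
    act = get_action(sol, 1, state)  # action to take at step 1 in the initial state
end
```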
Ordered Solutions
One class of solutions returned by planners is OrderedSolutions, which define an ordered sequence of actions (i.e. a plan) that must be taken to reach a goal.
SymbolicPlanners.OrderedSolution — Type

abstract type OrderedSolution <: Solution

Abstract type for ordered planning solutions. OrderedSolutions should satisfy the iteration interface. Calling get_action(sol, t::Int) on an ordered solution should return the intended action for step t.

SymbolicPlanners.get_action — Method

get_action(sol, t)

Return action for step t.

SymbolicPlanners.OrderedPlan — Type

OrderedPlan(plan::AbstractVector{<:Term})

Generic solution type for fully ordered plans.
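As a sketch (reusing the sol computed above and assuming it is an ordered solution such as an OrderedPlan or a PathSearchSolution), an ordered solution can be iterated, indexed by step, or collected into a plan:

```julia
# Iterate over the actions of an ordered solution
for (t, act) in enumerate(sol)
    println("Step $t: $act")
end

# Per-step lookup of the intended action
first_act = get_action(sol, 1)

# Collect the full action sequence as a vector of Terms
plan = collect(sol)
```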
Path Search Solutions
A particular type of OrderedSolution is returned by search-based shortest-path planners (i.e. BreadthFirstPlanner, ForwardPlanner, and BackwardPlanner). These PathSearchSolutions may store information about the search process in addition to the discovered plan, allowing such information to be used by future searches (e.g. through calls to refine!).
SymbolicPlanners.PathSearchSolution — Type

PathSearchSolution(status, plan)
PathSearchSolution(status, plan, trajectory)
PathSearchSolution(status, plan, trajectory, expanded,
                   search_tree, search_frontier, search_order)

Solution type for search-based planners that produce fully ordered plans.
Fields
status: Status of the returned solution.
plan: Sequence of actions that reach the goal. May be partial / incomplete.
trajectory: Trajectory of states that will be traversed while following the plan.
expanded: Number of nodes expanded during search.
search_tree: Tree of nodes expanded or encountered during search.
search_frontier: Frontier of the search tree when search terminated.
search_order: Order in which nodes were expanded during search.
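The following sketch shows how stored search information can be reused via refine!. It assumes AStarPlanner's save_search and max_nodes options, and the :max_nodes status value, behave as described in the planner documentation; the node budget shown is arbitrary:

```julia
# Retain the search tree and frontier so later searches can reuse them
planner = AStarPlanner(HAdd(), save_search=true, max_nodes=1000)
sol = planner(domain, state, spec)

# If the node budget was exhausted before reaching the goal, continue
# the search from where it left off instead of starting from scratch
if sol.status == :max_nodes
    sol = refine!(sol, planner, domain, state, spec)
end
```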
Bidirectional search-based planners also have a corresponding solution type:
SymbolicPlanners.BiPathSearchSolution — Type

BiPathSearchSolution(status, plan)
BiPathSearchSolution(status, plan, trajectory)
BiPathSearchSolution(status, plan, trajectory, expanded,
                     f_search_tree, f_frontier, f_expanded, f_trajectory,
                     b_search_tree, b_frontier, b_expanded, b_trajectory)

Solution type for bidirectional search-based planners.
Fields
status: Status of the returned solution.
plan: Sequence of actions that reach the goal. May be partial / incomplete.
trajectory: Trajectory of states that will be traversed while following the plan.
expanded: Number of nodes expanded during search.
f_search_tree: Forward search tree.
f_frontier: Forward search frontier.
f_expanded: Number of nodes expanded via forward search.
f_trajectory: Trajectory of states returned by forward search.
b_search_tree: Backward search tree.
b_frontier: Backward search frontier.
b_expanded: Number of nodes expanded via backward search.
b_trajectory: Trajectory of states returned by backward search.
Nodes in a search tree have the following type:
SymbolicPlanners.PathNode — Type

PathNode(id::UInt, state::State, path_cost::Float32,
         parent = nothing, child = nothing)

Representation of a search node with optional parent and child pointers, used by search-based planners. One or more parents or children may be stored as a linked list using the LinkedNodeRef data type.
Policy Solutions
Another important class of solutions are PolicySolutions, which specify the action to be taken at a given state. This is especially useful when the true environment is stochastic, so that agents may end up in states different from those they expected, or when it is desirable to reuse a solution computed for one initial state from a different initial state.
SymbolicPlanners.PolicySolution — Type

abstract type PolicySolution <: Solution

Abstract type for policy solutions. Minimally, PolicySolutions should implement the get_action(sol, state::State) method, defining the (potentially random) action to be taken at a particular state.
The following methods constitute the interface for PolicySolutions:
SymbolicPlanners.get_action — Method

get_action(sol, state)

Return an action for the given state. If no actions are available, return missing.

SymbolicPlanners.best_action — Function

best_action(sol, state)

Return the best action for the given state. If no actions are available, return missing.

SymbolicPlanners.rand_action — Function

rand_action(sol, state)

Sample an action according to the policy for the given state. If no actions are available, return missing.

SymbolicPlanners.get_action_probs — Function

get_action_probs(sol, state)

Return a dictionary of action probabilities for the given state. If no actions are available, return an empty dictionary.

SymbolicPlanners.get_action_prob — Function

get_action_prob(sol, state, action)

Return the probability of taking an action at the given state. If the action is not available, return zero.
Some policies also store the value function (or equivalently, the negative cost-to-go) associated with states and actions:
SymbolicPlanners.get_value — Function

get_value(sol, state)
get_value(sol, state, action)

Return the value (i.e. expected future reward) of the given state (and action).

SymbolicPlanners.get_action_values — Function

get_action_values(sol, state)

Return a dictionary of action Q-values for the given state.

SymbolicPlanners.has_values — Function

has_values(sol)

Trait that denotes whether the solution stores a value function.

SymbolicPlanners.has_cached_value — Function

has_cached_value(sol, state)
has_cached_value(sol, state, action)

Return true if the value of state (and action) is cached in the state value or action value table of sol.

SymbolicPlanners.has_cached_action_values — Function

has_cached_action_values(sol, state)

Return true if all actions that are available at state have cached values associated with them.
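A sketch of querying stored values, reusing the hypothetical policy and state from above:

```julia
if has_values(policy)
    v = get_value(policy, state)           # state value, i.e. negative expected cost-to-go
    qs = get_action_values(policy, state)  # Dict of Q-values for each available action
end

# Check whether values are already cached before relying on cheap lookups
cached = has_cached_value(policy, state)
all_cached = has_cached_action_values(policy, state)
```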
A NullPolicy can be used as a default when no information is known.
SymbolicPlanners.NullPolicy — Type

NullPolicy()

Null policy which returns missing for calls to get_action, etc.
Deterministic Policies
SymbolicPlanners.jl provides the following deterministic policies, i.e., policies that always return the (estimated) best action for a given state:
SymbolicPlanners.TabularPolicy — Type

TabularPolicy(V::Dict, Q::Dict, default)
TabularPolicy(default = NullPolicy())

Policy solution where state values and action Q-values are stored in lookup tables V and Q, where V maps state hashes to values, and Q maps state hashes to dictionaries of Q-values for each action in the corresponding state.
A default policy can be specified, so that if a state doesn't already exist in the lookup tables, the value returned by default will be used instead.
SymbolicPlanners.TabularVPolicy — Type

TabularVPolicy(V::Dict, domain, spec, default)
TabularVPolicy(domain, spec, default = NullPolicy())

Policy solution where state values are stored in a lookup table V that maps state hashes to values. The domain and specification also have to be provided, so that the policy knows how to derive action Q-values in each state.
A default policy can be specified, so that if a state doesn't already exist in the lookup table, the value returned by default will be used instead.
SymbolicPlanners.FunctionalVPolicy — Type

FunctionalVPolicy(evaluator, domain, spec)

Policy solution where state values are defined by an evaluator, a one-argument function that outputs a value estimate for each state. The domain and specification also have to be provided, so that the policy knows how to derive action Q-values for each state.

SymbolicPlanners.HeuristicVPolicy — Type

HeuristicVPolicy(heuristic::Heuristic, domain, spec)

Policy solution where state values are defined by the (negated) goal distance estimates computed by a Heuristic for a domain and goal spec.
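As a sketch of how these types compose (reusing domain, state, and spec from the earlier examples, and HAdd as an illustrative heuristic), a heuristic-derived value policy can serve as the fallback for a tabular one:

```julia
# Value estimates given by the negated HAdd goal-distance heuristic
heuristic_policy = HeuristicVPolicy(HAdd(), domain, spec)

# Tabular policy that falls back to the heuristic estimate for states
# not yet present in its lookup table
policy = TabularVPolicy(domain, spec, heuristic_policy)

act = best_action(policy, state)
```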
SymbolicPlanners.ReusableTreePolicy — Type

ReusableTreePolicy(
    value_policy::PolicySolution,
    search_sol::PathSearchSolution,
    [goal_tree::Dict{UInt, PathNode}]
)

The policy returned by RealTimeHeuristicSearch, which stores a value table in the nested value_policy, a forward search tree in search_sol, and (when reuse_paths is true) a reusable goal_tree of cost-optimal paths to the goal.
When taking actions at states along some stored cost-optimal path, actions along that path are followed, as in Tree-Adaptive A* [1]. Otherwise, the highest value action according to value_policy is returned, with ties broken by the (possibly incomplete) plan in search_sol.
[1] C. Hernández, X. Sun, S. Koenig, and P. Meseguer, "Tree Adaptive A*," AAMAS (2011), pp. 123–130. https://dl.acm.org/doi/abs/10.5555/2030470.2030488.
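A sketch of obtaining and using such a policy; the zero-argument construction of RealTimeHeuristicSearch is an assumption here (options such as the heuristic, number of iterations, and reuse_paths are configurable, but their exact constructor arguments are not shown in this section):

```julia
planner = RealTimeHeuristicSearch()  # default configuration (assumed)
sol = planner(domain, state, spec)   # sol is a ReusableTreePolicy

# Act according to the stored values and reusable cost-optimal paths
act = get_action(sol, state)
```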
Stochastic Policies
SymbolicPlanners.jl also provides stochastic policies, some of which are intended for use as wrappers around deterministic policies:
SymbolicPlanners.RandomPolicy — Type

RandomPolicy(domain, [rng::AbstractRNG])

Policy that selects available actions uniformly at random. The domain has to be provided to determine the actions available in each state.

SymbolicPlanners.EpsilonGreedyPolicy — Type

EpsilonGreedyPolicy(domain, policy, epsilon, [rng::AbstractRNG])

Policy that acts uniformly at random with epsilon chance, but otherwise selects the best action(s) according to the underlying policy. If there is more than one best action, ties are broken randomly. The domain has to be provided to determine the actions available in each state.

SymbolicPlanners.BoltzmannPolicy — Type

BoltzmannPolicy(policy, temperature, [epsilon, rng::AbstractRNG])

Policy that samples actions according to a Boltzmann distribution with the specified temperature. The unnormalized log probability of taking an action $a$ in state $s$ corresponds to its Q-value $Q(s, a)$ divided by the temperature $T$:
\[P(a|s) \propto \exp(Q(s, a) / T)\]
Higher temperatures lead to an increasingly random policy, whereas a temperature of zero corresponds to a deterministic policy. Q-values are computed according to the underlying policy provided as an argument to the constructor.
To prevent zero-probability actions, an epsilon chance of taking an action uniformly at random can be specified.
Note that wrapping an existing policy in a BoltzmannPolicy does not ensure consistency of the state values $V$ and Q-values $Q$ according to the Bellman equation, since this would require repeated Bellman updates to ensure convergence.
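A sketch of wrapping a deterministic policy, reusing the hypothetical policy, domain, and state from the examples above (the epsilon and temperature values are arbitrary):

```julia
# Act uniformly at random 10% of the time, otherwise take the best action
eps_policy = EpsilonGreedyPolicy(domain, policy, 0.1)

# Sample actions in proportion to exp(Q(s, a) / T) with temperature T = 1.0
boltzmann = BoltzmannPolicy(policy, 1.0)

act = rand_action(boltzmann, state)
probs = get_action_probs(boltzmann, state)
```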
Mixture Policies
A subclass of stochastic policies are mixture policies, which randomly select one of their underlying policies to sample an action from:
SymbolicPlanners.MixturePolicy — Type

MixturePolicy(policies, [weights, rng::AbstractRNG])

A mixture of underlying policies with associated weights. If provided, weights must be non-negative and sum to one. Otherwise a uniform mixture is assumed.

SymbolicPlanners.EpsilonMixturePolicy — Type

EpsilonMixturePolicy(domain, policy, epsilons, [weights, rng::AbstractRNG])

A mixture of epsilon-greedy policies with different epsilons and mixture weights, specified as Vectors. If provided, weights must be non-negative and sum to one. Otherwise a uniform mixture is assumed. The domain is required to determine the actions available in each state.

SymbolicPlanners.BoltzmannMixturePolicy — Type

BoltzmannMixturePolicy(policy, temperatures, [weights,]
                       [epsilon, rng::AbstractRNG])

A mixture of Boltzmann policies with different temperatures and mixture weights, specified as Vectors. If provided, weights must be non-negative and sum to one. Otherwise a uniform mixture is assumed. Q-values are computed according to the underlying policy provided as an argument to the constructor.

Similar to the BoltzmannPolicy, epsilon can be specified to add an epsilon chance of taking an action uniformly at random.
Mixture policies are associated with a set of mixture weights. These can be accessed with get_mixture_weights:
SymbolicPlanners.get_mixture_weights — Function

get_mixture_weights(sol)

Returns the mixture weights for a mixture policy.

get_mixture_weights(sol, state, action; normalize = true)

Returns the posterior mixture weights for a mixture policy after an action has been taken at state. Posterior weights are normalized by default, but this can be disabled by setting normalize = false.
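A sketch of constructing a Boltzmann mixture and inspecting its weights, with names reused from the examples above (the specific temperatures are arbitrary):

```julia
# Mixture over two temperatures with uniform prior weights
mixture = BoltzmannMixturePolicy(policy, [0.5, 2.0])

prior = get_mixture_weights(mixture)   # uniform weights, e.g. [0.5, 0.5]
act = rand_action(mixture, state)

# Posterior weights over the two components after observing `act` at `state`
posterior = get_mixture_weights(mixture, state, act)
```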
Combined Solutions
It is possible to combine multiple solutions into a single solution using a MultiSolution:
SymbolicPlanners.MultiSolution — Type

MultiSolution(solutions::Solution...)
MultiSolution(solutions::Tuple, [selector])

A combination of multiple Solutions, which are selected between according to a selector function (solutions, [state]) -> sol that returns the solution to use (which may depend on the current state). The selector defaults to always returning the first solution.
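A sketch of combining two previously computed solutions, here a hypothetical path_sol (e.g. a PathSearchSolution) and the policy from earlier; the custom selector shown is purely illustrative:

```julia
# Default selector: always use the first solution in the tuple
combined = MultiSolution(path_sol, policy)

# A custom selector can choose based on the current state; this one
# (illustrative only) always defers to the second solution
selector(solutions, state=nothing) = solutions[2]
combined2 = MultiSolution((path_sol, policy), selector)

act = get_action(combined, 1, state)
```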