Solutions
Planners in SymbolicPlanners.jl return `Solution`s, which represent (potentially partial) descriptions of how a problem should be solved.
SymbolicPlanners.Solution — Type

    abstract type Solution

Abstract type for solutions to planning problems. Minimally, a `Solution` should define what action should be taken at a particular step `t`, or at a particular `state`, by implementing `get_action`.
SymbolicPlanners.get_action — Method

    get_action(sol, t, state)

Return an action for step `t` at `state`.
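For example, the solution returned by a planner can be queried for an action (a minimal sketch; `domain`, `state`, and `spec` stand in for a domain, initial state, and goal specification constructed elsewhere, e.g. with PDDL.jl):

```julia
using PDDL, SymbolicPlanners

# `domain`, `state`, and `spec` are assumed to be defined already.
planner = AStarPlanner(HAdd())
sol = planner(domain, state, spec)

act = get_action(sol, 1, state)  # intended action for step 1 at `state`
```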
In the case that a problem is unsolved or unsolvable, planners may return a `NullSolution`:
SymbolicPlanners.NullSolution — Type

    NullSolution([status])

Null solution that indicates the problem was unsolved. The `status` field can be used to denote why the problem was unsolved. Defaults to `:failure`.
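Because planning can fail, it is often worth checking for a `NullSolution` before using the result (continuing the sketch above):

```julia
if sol isa NullSolution
    @warn "Problem unsolved, status: $(sol.status)"
else
    act = get_action(sol, 1, state)
end
```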
Ordered Solutions
One class of solutions returned by planners are `OrderedSolution`s, which define an ordered sequence of actions (i.e. a plan) that must be taken to reach a goal.
SymbolicPlanners.OrderedSolution — Type

    abstract type OrderedSolution <: Solution

Abstract type for ordered planning solutions. `OrderedSolution`s should satisfy the iteration interface. Calling `get_action(sol, t::Int)` on an ordered solution should return the intended action for step `t`.
SymbolicPlanners.get_action — Method

    get_action(sol, t)

Return action for step `t`.
SymbolicPlanners.OrderedPlan — Type

    OrderedPlan(plan::AbstractVector{<:Term})

Generic solution type for fully ordered plans.
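As a sketch, an `OrderedPlan` can be constructed directly from a vector of action terms and consumed via iteration (the actions below are placeholders from a hypothetical Blocksworld domain):

```julia
using PDDL, SymbolicPlanners

plan = OrderedPlan(@pddl("(pick-up a)", "(stack a b)"))
for act in plan            # OrderedSolutions support iteration
    println(act)
end
act = get_action(plan, 1)  # first action in the plan
```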
Path Search Solutions
A particular type of `OrderedSolution` is returned by search-based shortest-path planners (i.e. `BreadthFirstPlanner`, `ForwardPlanner`, and `BackwardPlanner`). These `PathSearchSolution`s may store information about the search process in addition to the discovered plan, allowing such information to be used by future searches (e.g. through calls to `refine!`).
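For instance, a solution that saved its search information can be refined when the situation changes, instead of planning from scratch (a sketch; `new_state` is a hypothetical updated state, and `domain` and `spec` are assumed to exist):

```julia
planner = ForwardPlanner(heuristic=HAdd(), save_search=true)
sol = planner(domain, state, spec)

# Suppose the agent ends up somewhere slightly different; reuse the
# previous search instead of starting over.
sol = refine!(sol, planner, domain, new_state, spec)
```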
SymbolicPlanners.PathSearchSolution — Type

    PathSearchSolution(status, plan)
    PathSearchSolution(status, plan, trajectory)
    PathSearchSolution(status, plan, trajectory, expanded,
                       search_tree, search_frontier, search_order)

Solution type for search-based planners that produce fully ordered plans.

Fields

- `status`: Status of the returned solution.
- `plan`: Sequence of actions that reach the goal. May be partial / incomplete.
- `trajectory`: Trajectory of states that will be traversed while following the plan.
- `expanded`: Number of nodes expanded during search.
- `search_tree`: Tree of search nodes expanded or evaluated during search.
- `search_frontier`: Frontier of nodes that have yet to be expanded.
- `search_order`: Order in which nodes were expanded.
Bidirectional search-based planners also have a corresponding solution type:
SymbolicPlanners.BiPathSearchSolution — Type

    BiPathSearchSolution(status, plan)
    BiPathSearchSolution(status, plan, trajectory)
    BiPathSearchSolution(status, plan, trajectory, expanded,
                         f_search_tree, f_frontier, f_expanded, f_trajectory,
                         b_search_tree, b_frontier, b_expanded, b_trajectory)

Solution type for bidirectional search-based planners.

Fields

- `status`: Status of the returned solution.
- `plan`: Sequence of actions that reach the goal. May be partial / incomplete.
- `trajectory`: Trajectory of states that will be traversed while following the plan.
- `expanded`: Number of nodes expanded during search.
- `f_search_tree`: Forward search tree.
- `f_frontier`: Forward search frontier.
- `f_expanded`: Number of nodes expanded via forward search.
- `f_trajectory`: Trajectory of states returned by forward search.
- `b_search_tree`: Backward search tree.
- `b_frontier`: Backward search frontier.
- `b_expanded`: Number of nodes expanded via backward search.
- `b_trajectory`: Trajectory of states returned by backward search.
Nodes in a search tree have the following type:
SymbolicPlanners.PathNode — Type

    PathNode(id::UInt, state::State, path_cost::Float32,
             parent = nothing, child = nothing)

Representation of a search node with optional parent and child pointers, used by search-based planners. One or more parents or children may be stored as a linked list using the `LinkedNodeRef` data type.
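As an illustration, the states along a discovered path can be recovered by following parent references through the search tree (a hedged sketch, assuming `sol.search_tree` maps node IDs to `PathNode`s and that each node stores at most one parent):

```julia
function trace_states(sol, id::UInt)
    states = State[]
    node = sol.search_tree[id]
    while !isnothing(node)
        push!(states, node.state)
        # Follow the parent reference (if any) back toward the root.
        node = isnothing(node.parent) ? nothing : sol.search_tree[node.parent.id]
    end
    return reverse!(states)
end
```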
Policy Solutions
Another important class of solutions are `PolicySolution`s, which specify the action to be taken given a particular state. This is especially useful when the true environment is stochastic, such that agents may end up in a state that is different than expected, or when it is desirable to reuse solutions from one initial state in a different initial state.
SymbolicPlanners.PolicySolution — Type

    abstract type PolicySolution <: Solution

Abstract type for policy solutions. Minimally, `PolicySolution`s should implement the `get_action(sol, state::State)` method, defining the (potentially random) action to be taken at a particular `state`.
The following methods constitute the interface for `PolicySolution`s:
SymbolicPlanners.get_action — Method

    get_action(sol, state)

Return an action for the given state. If no actions are available, return `missing`.
SymbolicPlanners.best_action — Function

    best_action(sol, state)

Return the best action for the given state. If no actions are available, return `missing`.
SymbolicPlanners.rand_action — Function

    rand_action(sol, state)

Sample an action according to the policy for the given state. If no actions are available, return `missing`.
SymbolicPlanners.get_action_probs — Function

    get_action_probs(sol, state)

Return a dictionary of action probabilities for the given `state`. If no actions are available, return an empty dictionary.
SymbolicPlanners.get_action_prob — Function

    get_action_prob(sol, state, action)

Return the probability of taking an `action` at the given `state`. If the `action` is not available, return zero.
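Taken together, the interface can be exercised as follows (a sketch using the package's real-time dynamic programming planner; `domain`, `state`, and `spec` are assumed to exist, and the `RTDP` keyword arguments shown are illustrative):

```julia
planner = RTDP(heuristic=HAdd(), n_rollouts=10)
sol = planner(domain, state, spec)

act = best_action(sol, state)          # highest-value action
act = rand_action(sol, state)          # action sampled from the policy
probs = get_action_probs(sol, state)   # Dict mapping actions to probabilities
p = get_action_prob(sol, state, act)   # probability of a particular action
```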
Some policies also store the value function (or equivalently, the negative cost-to-go) associated with states and actions:
SymbolicPlanners.get_value — Function

    get_value(sol, state)
    get_value(sol, state, action)

Return the value (i.e. expected future reward) of the given `state` (and `action`).
SymbolicPlanners.get_action_values — Function

    get_action_values(sol, state)

Return a dictionary of action Q-values for the given `state`.
SymbolicPlanners.has_values — Function

    has_values(sol)

Trait that denotes whether the solution stores a value function.
SymbolicPlanners.has_cached_value — Function

    has_cached_value(sol, state)
    has_cached_value(sol, state, action)

Return `true` if the value of `state` (and `action`) is cached in the state value or action value table of `sol`.
SymbolicPlanners.has_cached_action_values — Function

    has_cached_action_values(sol, state)

Return `true` if all actions that are available at `state` have cached values associated with them.
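These accessors can be combined to inspect a policy's value estimates (continuing the sketch above):

```julia
if has_values(sol)
    v = get_value(sol, state)           # estimated value of `state`
    qs = get_action_values(sol, state)  # Dict mapping actions to Q-values
    has_cached_value(sol, state)        # whether `state` is already tabulated
end
```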
A `NullPolicy` can be used as a default when no information is known.
SymbolicPlanners.NullPolicy — Type

    NullPolicy()

Null policy which returns `missing` for calls to `get_action`, etc.
Deterministic Policies
SymbolicPlanners.jl provides the following deterministic policies, i.e., policies that always return the (estimated) best action for a given state:
SymbolicPlanners.TabularPolicy — Type

    TabularPolicy(V::Dict, Q::Dict, default)
    TabularPolicy(default = NullPolicy())

Policy solution where state values and action Q-values are stored in lookup tables `V` and `Q`, where `V` maps state hashes to values, and `Q` maps state hashes to dictionaries of Q-values for each action in the corresponding state.

A `default` policy can be specified, so that if a state doesn't already exist in the lookup tables, the value returned by `default` will be used instead.
SymbolicPlanners.TabularVPolicy — Type

    TabularVPolicy(V::Dict, domain, spec, default)
    TabularVPolicy(domain, spec, default = NullPolicy())

Policy solution where state values are stored in a lookup table `V` that maps state hashes to values. The domain and specification also have to be provided, so that the policy knows how to derive action Q-values in each state.

A `default` policy can be specified, so that if a state doesn't already exist in the lookup table, the value returned by `default` will be used instead.
SymbolicPlanners.FunctionalVPolicy — Type

    FunctionalVPolicy(evaluator, domain, spec)

Policy solution where state values are defined by an `evaluator`, a one-argument function that outputs a value estimate for each `state`. The domain and specification also have to be provided, so that the policy knows how to derive action Q-values for each state.
SymbolicPlanners.HeuristicVPolicy — Type

    HeuristicVPolicy(heuristic::Heuristic, domain, spec)

Policy solution where state values are defined by the (negated) goal distance estimates computed by a `Heuristic` for a `domain` and goal `spec`.
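For example, a heuristic can be lifted directly into a greedy policy (sketch; `domain`, `state`, and `spec` assumed to exist):

```julia
policy = HeuristicVPolicy(HAdd(), domain, spec)
act = best_action(policy, state)  # action minimizing estimated goal distance
```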
SymbolicPlanners.ReusableTreePolicy — Type

    ReusableTreePolicy(
        value_policy::PolicySolution,
        search_sol::PathSearchSolution,
        [goal_tree::Dict{UInt, PathNode}]
    )

The policy returned by `RealTimeHeuristicSearch`, which stores a value table in the nested `value_policy`, a forward search tree in `search_sol`, and (when `reuse_paths` is `true`) a reusable `goal_tree` of cost-optimal paths to the goal.

When taking actions at states along some stored cost-optimal path, actions along that path are followed, as in Tree-Adaptive A* [1]. Otherwise, the highest-value action according to `value_policy` is returned, with ties broken by the (possibly incomplete) plan in `search_sol`.

[1] C. Hernández, X. Sun, S. Koenig, and P. Meseguer, "Tree Adaptive A*," AAMAS (2011), pp. 123–130. https://dl.acm.org/doi/abs/10.5555/2030470.2030488
Stochastic Policies
SymbolicPlanners.jl also provides stochastic policies, some of which are intended for use as wrappers around deterministic policies:
SymbolicPlanners.RandomPolicy — Type

    RandomPolicy(domain, [rng::AbstractRNG])

Policy that selects available actions uniformly at random. The `domain` has to be provided to determine the actions available in each state.
SymbolicPlanners.EpsilonGreedyPolicy — Type

    EpsilonGreedyPolicy(domain, policy, epsilon, [rng::AbstractRNG])

Policy that acts uniformly at random with `epsilon` chance, but otherwise selects the best action(s) according to the underlying `policy`. If there is more than one best action, tie-breaking occurs randomly. The `domain` has to be provided to determine the actions available in each state.
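A sketch of epsilon-greedy wrapping (`sol` is a hypothetical value-storing policy, e.g. returned by RTDP):

```julia
noisy = EpsilonGreedyPolicy(domain, sol, 0.1)
act = rand_action(noisy, state)  # best action w.p. 0.9, uniform otherwise
```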
SymbolicPlanners.BoltzmannPolicy — Type

    BoltzmannPolicy(policy, temperature, [rng::AbstractRNG])

Policy that samples actions according to a Boltzmann distribution with the specified `temperature`. The unnormalized log probability of taking an action $a$ in state $s$ corresponds to its Q-value $Q(s, a)$ divided by the temperature $T$:
\[P(a|s) \propto \exp(Q(s, a) / T)\]
Higher temperatures lead to an increasingly random policy, whereas a temperature of zero corresponds to a deterministic policy. Q-values are computed according to the underlying `policy` provided as an argument to the constructor.

Note that wrapping an existing policy in a `BoltzmannPolicy` does not ensure consistency of the state values $V$ and Q-values $Q$ according to the Bellman equation, since this would require repeated Bellman updates to ensure convergence.
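A corresponding sketch for Boltzmann wrapping (again assuming a hypothetical value-storing `sol`):

```julia
soft = BoltzmannPolicy(sol, 1.0)       # temperature T = 1.0
act = rand_action(soft, state)
p = get_action_prob(soft, state, act)  # proportional to exp(Q(s, a) / T)
```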
Mixture Policies
One subclass of stochastic policies consists of mixture policies, which randomly select one of their underlying policies to sample an action from:
SymbolicPlanners.MixturePolicy — Type

    MixturePolicy(policies, [weights, rng::AbstractRNG])

A mixture of underlying `policies` with associated `weights`. If provided, `weights` must be non-negative and sum to one. Otherwise a uniform mixture is assumed.
SymbolicPlanners.EpsilonMixturePolicy — Type

    EpsilonMixturePolicy(domain, policy, epsilons, [weights, rng::AbstractRNG])

A mixture of epsilon-greedy policies with different `epsilons` and mixture `weights`, specified as `Vector`s. If provided, `weights` must be non-negative and sum to one. Otherwise a uniform mixture is assumed. The `domain` is required to determine the actions available in each state.
SymbolicPlanners.BoltzmannMixturePolicy — Type

    BoltzmannMixturePolicy(policy, temperatures, [weights, rng::AbstractRNG])

A mixture of Boltzmann policies with different `temperatures` and mixture `weights`, specified as `Vector`s. If provided, `weights` must be non-negative and sum to one. Otherwise a uniform mixture is assumed. Q-values are computed according to the underlying `policy` provided as an argument to the constructor.
Mixture policies are associated with a set of mixture weights. These can be accessed with `get_mixture_weights`:
SymbolicPlanners.get_mixture_weights — Function

    get_mixture_weights(sol)

Return the mixture weights for a mixture policy.

    get_mixture_weights(sol, state, action)

Return the posterior mixture weights for a mixture policy after an `action` has been taken at `state`.
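One use of the posterior weights is to infer which mixture component best explains observed behavior (a sketch; `sol` is a hypothetical value-storing policy and `act` an observed action):

```julia
mixture = BoltzmannMixturePolicy(sol, [0.5, 1.0, 2.0])
w0 = get_mixture_weights(mixture)              # prior weights (uniform here)
w1 = get_mixture_weights(mixture, state, act)  # posterior after observing `act`
```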
Combined Solutions
It is possible to combine multiple solutions into a single solution using a `MultiSolution`:
SymbolicPlanners.MultiSolution — Type

    MultiSolution(solutions::Solution...)
    MultiSolution(solutions::Tuple, [selector])

A combination of multiple `Solution`s, which are selected between according to a `selector` function `(solutions, [state]) -> sol` that returns the solution to use (which may depend on the current `state`). The `selector` defaults to always returning the first solution.
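A minimal sketch of combining solutions (`policy_sol` and `fallback_sol` are hypothetical policy solutions produced elsewhere):

```julia
# With no selector, the first solution is always used.
combined = MultiSolution(policy_sol, fallback_sol)
act = get_action(combined, state)

# A custom selector may depend on the current state (hypothetical logic):
selector(solutions, state) = ismissing(get_action(solutions[1], state)) ?
    solutions[2] : solutions[1]
combined = MultiSolution((policy_sol, fallback_sol), selector)
```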