Solutions

Planners in SymbolicPlanners.jl return Solutions, which represent (potentially partial) descriptions of how a problem should be solved.

SymbolicPlanners.Solution (Type)
abstract type Solution

Abstract type for solutions to planning problems. Minimally, a Solution should define what action should be taken at a particular step t, or at a particular state, by implementing get_action.

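For example, a solution returned by a planner can be queried via get_action. The following is a minimal sketch; the PDDL file paths are placeholders, and the planner and heuristic choices (AStarPlanner, HAdd) are illustrative:

using PDDL, SymbolicPlanners

# Load a domain and problem (paths are placeholders).
domain = load_domain("domain.pddl")
problem = load_problem("problem.pddl")
state = initstate(domain, problem)
spec = Specification(problem)

# Plan with A* search guided by the additive heuristic.
planner = AStarPlanner(HAdd())
sol = planner(domain, state, spec)

# Query the action intended for the first step.
act = get_action(sol, 1)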

In the case that a problem is unsolved or unsolvable, planners may return a NullSolution:

SymbolicPlanners.NullSolution (Type)
NullSolution([status])

Null solution that indicates the problem was unsolved. The status field can be used to denote why the problem was unsolved. Defaults to :failure.

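For example, a caller can detect failure by checking the returned solution's type and status (a sketch, continuing the example above):

sol = planner(domain, state, spec)
if sol isa NullSolution
    # The status field records why no solution was found (defaults to :failure).
    @warn "Problem was not solved" sol.status
end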

Ordered Solutions

One class of solutions returned by planners is the OrderedSolution, which defines an ordered sequence of actions (i.e. a plan) that must be taken to reach a goal.

SymbolicPlanners.OrderedSolution (Type)
abstract type OrderedSolution <: Solution

Abstract type for ordered planning solutions. OrderedSolutions should satisfy the iteration interface. Calling get_action(sol, t::Int) on an ordered solution should return the intended action for step t.

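For example, an ordered solution can be iterated to inspect each action in the plan (a sketch, reusing sol from the example above):

# Iterate over the actions in the plan.
for (t, act) in enumerate(sol)
    println("Step $t: $act")
end

# Equivalently, look up the action intended for a given step.
act1 = get_action(sol, 1)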

Path Search Solutions

A particular type of OrderedSolution is returned by search-based shortest-path planners (i.e. BreadthFirstPlanner, ForwardPlanner, and BackwardPlanner). These PathSearchSolutions may store information about the search process in addition to the discovered plan, allowing such information to be used by future searches (e.g. through calls to refine!).

SymbolicPlanners.PathSearchSolution (Type)
PathSearchSolution(status, plan)
PathSearchSolution(status, plan, trajectory)
PathSearchSolution(status, plan, trajectory, expanded,
                   search_tree, search_frontier, search_order)

Solution type for search-based planners that produce fully ordered plans.

Fields

  • status: Status of the returned solution.

  • plan: Sequence of actions that reach the goal. May be partial / incomplete.

  • trajectory: Trajectory of states that will be traversed while following the plan.

  • expanded: Number of nodes expanded during search.

  • search_tree: Tree of search nodes expanded or encountered during search.

  • search_frontier: Frontier of nodes that have yet to be expanded.

  • search_order: Order in which nodes were expanded during search.

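For example, a search that exhausts its node budget can be resumed with a larger budget via refine!. A sketch, assuming the :max_nodes status and the max_nodes keyword used by ForwardPlanner and its variants:

# Run a budget-limited search, then resume it where it left off.
planner = AStarPlanner(HAdd(), max_nodes=100)
sol = planner(domain, state, spec)
if sol.status == :max_nodes
    # Reuse the stored search tree and frontier instead of starting over.
    planner = AStarPlanner(HAdd(), max_nodes=10_000)
    sol = refine!(sol, planner, domain, state, spec)
end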

Bidirectional search-based planners also have a corresponding solution type:

SymbolicPlanners.BiPathSearchSolution (Type)
BiPathSearchSolution(status, plan)
BiPathSearchSolution(status, plan, trajectory)
BiPathSearchSolution(status, plan, trajectory, expanded,
                     f_search_tree, f_frontier, f_expanded, f_trajectory,
                     b_search_tree, b_frontier, b_expanded, b_trajectory)

Solution type for bidirectional search-based planners.

Fields

  • status: Status of the returned solution.

  • plan: Sequence of actions that reach the goal. May be partial / incomplete.

  • trajectory: Trajectory of states that will be traversed while following the plan.

  • expanded: Number of nodes expanded during search.

  • f_search_tree: Forward search tree.

  • f_frontier: Forward search frontier.

  • f_expanded: Number of nodes expanded via forward search.

  • f_trajectory: Trajectory of states returned by forward search.

  • b_search_tree: Backward search tree.

  • b_frontier: Backward search frontier.

  • b_expanded: Number of nodes expanded via backward search.

  • b_trajectory: Trajectory of states returned by backward search.


Nodes in a search tree have the following type:

SymbolicPlanners.PathNode (Type)
PathNode(id::UInt, state::State, path_cost::Float32,
         parent = nothing, child = nothing)

Representation of a search node with optional parent and child pointers, used by search-based planners. One or more parents or children may be stored as a linked list using the LinkedNodeRef data type.


Policy Solutions

Another important class of solutions is the PolicySolution, which specifies the action to be taken in a given state. This is especially useful when the true environment is stochastic, so that agents may end up in states different from those they expected, or when it is desirable to reuse a solution computed for one initial state from a different initial state.

SymbolicPlanners.PolicySolution (Type)
abstract type PolicySolution <: Solution

Abstract type for policy solutions. Minimally, PolicySolutions should implement the get_action(sol, state::State) method, defining the (potentially random) action to be taken at a particular state.


The following methods constitute the interface for PolicySolutions:

SymbolicPlanners.rand_action (Function)
rand_action(sol, state)

Sample an action according to the policy for the given state. If no actions are available, return missing.

SymbolicPlanners.get_action_probs (Function)
get_action_probs(sol, state)

Return a dictionary of action probabilities for the given state. If no actions are available, return an empty dictionary.

SymbolicPlanners.get_action_prob (Function)
get_action_prob(sol, state, action)

Return the probability of taking an action at the given state. If the action is not available, return zero.

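A sketch of this interface in use, assuming a policy computed by real-time dynamic programming (the RTDP planner; the keyword names shown are assumptions and should be checked against the RTDP docstring):

# Compute a policy via real-time dynamic programming.
planner = RTDP(heuristic=HAdd(), n_rollouts=50)
sol = planner(domain, state, spec)

act = rand_action(sol, state)          # sample an action for this state
probs = get_action_probs(sol, state)   # dictionary of action probabilities
p = get_action_prob(sol, state, act)   # probability of one specific action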

Some policies also store the value function (or equivalently, the negative cost-to-go) associated with states and actions:

SymbolicPlanners.get_value (Function)
get_value(sol, state)
get_value(sol, state, action)

Return value (i.e. expected future reward) of the given state (and action).

SymbolicPlanners.has_cached_value (Function)
has_cached_value(sol, state)
has_cached_value(sol, state, action)

Returns true if the value of state (and action) is cached in the state value or action value table of sol.

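A sketch of these accessors, continuing the policy example above:

v = get_value(sol, state)              # state value, i.e. negative cost-to-go
q = get_value(sol, state, act)         # Q-value of an action in this state
cached = has_cached_value(sol, state)  # true if the value is stored in sol's tables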

A NullPolicy can be used as a default when no information is known.

Deterministic Policies

SymbolicPlanners.jl provides the following deterministic policies, i.e., policies that always return the (estimated) best action for a given state:

SymbolicPlanners.TabularPolicy (Type)
TabularPolicy(V::Dict, Q::Dict, default)
TabularPolicy(default = NullPolicy())

Policy solution where state values and action Q-values are stored in lookup tables V and Q, where V maps state hashes to values, and Q maps state hashes to dictionaries of Q-values for each action in the corresponding state.

A default policy can be specified, so that if a state doesn't already exist in the lookup tables, the value returned by default will be used instead.

SymbolicPlanners.TabularVPolicy (Type)
TabularVPolicy(V::Dict, domain, spec, default)
TabularVPolicy(domain, spec, default = NullPolicy())

Policy solution where state values are stored in a lookup table V that maps state hashes to values. The domain and specification also have to be provided, so that the policy knows how to derive action Q-values in each state.

A default policy can be specified, so that if a state doesn't already exist in the lookup table, the value returned by default will be used instead.

SymbolicPlanners.FunctionalVPolicy (Type)
FunctionalVPolicy(evaluator, domain, spec)

Policy solution where state values are defined by an evaluator, a one-argument function that outputs a value estimate for each state. The domain and specification also have to be provided, so that the policy knows how to derive action Q-values for each state.

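For example, a value policy can be derived from a heuristic by negating its cost-to-go estimate (a sketch; HMax and compute are part of the package's heuristic interface):

# State values are negated cost-to-go estimates from a heuristic.
heuristic = HMax()
evaluator = state -> -compute(heuristic, domain, state, spec)
policy = FunctionalVPolicy(evaluator, domain, spec)
act = get_action(policy, state)
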
SymbolicPlanners.ReusableTreePolicy (Type)
ReusableTreePolicy(
    value_policy::PolicySolution,
    search_sol::PathSearchSolution,
    [goal_tree::Dict{UInt, PathNode}]
)

The policy returned by RealTimeHeuristicSearch, which stores a value table in the nested value_policy, a forward search tree in search_sol, and (when reuse_paths is true) a reusable goal_tree of cost-optimal paths to the goal.

When taking actions at states along some stored cost-optimal path, actions along that path are followed, as in Tree-Adaptive A* [1]. Otherwise, the highest value action according to value_policy is returned, with ties broken by the (possibly incomplete) plan in search_sol.

[1] C. Hernández, X. Sun, S. Koenig, and P. Meseguer, "Tree Adaptive A*," AAMAS (2011), pp. 123–130. https://dl.acm.org/doi/abs/10.5555/2030470.2030488.

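A sketch of constructing such a policy; the n_iters and reuse_paths keyword names are assumptions and should be checked against the RealTimeHeuristicSearch docstring:

# Real-time heuristic search returns a ReusableTreePolicy.
planner = RealTimeHeuristicSearch(heuristic=HAdd(), n_iters=10, reuse_paths=true)
sol = planner(domain, state, spec)
act = get_action(sol, state)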

Stochastic Policies

SymbolicPlanners.jl also provides stochastic policies, some of which are intended for use as wrappers around deterministic policies:

SymbolicPlanners.RandomPolicy (Type)
RandomPolicy(domain, [rng::AbstractRNG])

Policy that selects available actions uniformly at random. The domain has to be provided to determine the actions available in each state.

SymbolicPlanners.EpsilonGreedyPolicy (Type)
EpsilonGreedyPolicy(domain, policy, epsilon, [rng::AbstractRNG])

Policy that acts uniformly at random with epsilon chance, but otherwise selects the best action(s) according to the underlying policy. If there is more than one best action, ties are broken randomly. The domain has to be provided to determine the actions available in each state.

SymbolicPlanners.BoltzmannPolicy (Type)
BoltzmannPolicy(policy, temperature, [rng::AbstractRNG])

Policy that samples actions according to a Boltzmann distribution with the specified temperature. The unnormalized log probability of taking an action $a$ in state $s$ corresponds to its Q-value $Q(s, a)$ divided by the temperature $T$:

\[P(a|s) \propto \exp(Q(s, a) / T)\]

Higher temperatures lead to an increasingly random policy, whereas a temperature of zero corresponds to a deterministic policy. Q-values are computed according to the underlying policy provided as an argument to the constructor.

Note that wrapping an existing policy in a BoltzmannPolicy does not ensure consistency of the state values $V$ and Q-values $Q$ according to the Bellman equation, since this would require repeated Bellman updates to ensure convergence.

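For example, a deterministic policy (such as sol from the RTDP sketch above) can be wrapped for stochastic action selection:

eps_policy = EpsilonGreedyPolicy(domain, sol, 0.1)  # 10% chance of a random action
boltz_policy = BoltzmannPolicy(sol, 1.0)            # Boltzmann with temperature 1.0
act = rand_action(boltz_policy, state)
p = get_action_prob(boltz_policy, state, act)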

Mixture Policies

A subclass of stochastic policies are mixture policies, which randomly select one of their underlying policies to sample an action from:

SymbolicPlanners.MixturePolicy (Type)
MixturePolicy(policies, [weights, rng::AbstractRNG])

A mixture of underlying policies with associated weights. If provided, weights must be non-negative and sum to one. Otherwise a uniform mixture is assumed.

SymbolicPlanners.EpsilonMixturePolicy (Type)
EpsilonMixturePolicy(domain, policy, epsilons, [weights, rng::AbstractRNG])

A mixture of epsilon-greedy policies with different epsilons and mixture weights, specified as Vectors. If provided, weights must be non-negative and sum to one. Otherwise a uniform mixture is assumed. The domain is required to determine the actions available in each state.

SymbolicPlanners.BoltzmannMixturePolicy (Type)
BoltzmannMixturePolicy(policy, temperatures, [weights, rng::AbstractRNG])

A mixture of Boltzmann policies with different temperatures and mixture weights, specified as Vectors. If provided, weights must be non-negative and sum to one. Otherwise a uniform mixture is assumed. Q-values are computed according to the underlying policy provided as an argument to the constructor.


Mixture policies are associated with a set of mixture weights. These can be accessed with get_mixture_weights:

SymbolicPlanners.get_mixture_weights (Function)
get_mixture_weights(sol)

Returns the mixture weights for a mixture policy.

get_mixture_weights(sol, state, action)

Returns the posterior mixture weights for a mixture policy after an action has been taken at state.

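A sketch of constructing a Boltzmann mixture and updating its weights from an observed action, reusing the policy sol from above:

# Uniform mixture over Boltzmann policies at three temperatures.
mix = BoltzmannMixturePolicy(sol, [0.5, 1.0, 2.0])
act = rand_action(mix, state)

w_prior = get_mixture_weights(mix)             # uniform prior, [1/3, 1/3, 1/3]
w_post = get_mixture_weights(mix, state, act)  # posterior after observing act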

Combined Solutions

It is possible to combine multiple solutions into a single solution using a MultiSolution:

SymbolicPlanners.MultiSolution (Type)
MultiSolution(solutions::Solution...)
MultiSolution(solutions::Tuple, [selector])

A combination of multiple Solutions, which are selected between using a selector function (solutions, [state]) -> sol that returns the solution to use (and may depend on the current state). The selector defaults to always returning the first solution.

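For example, a policy and a plan can be combined so that the plan serves as a fallback in states the policy has not evaluated. A sketch; policy_sol and path_sol are hypothetical solutions from earlier planner calls:

# Use the policy when it has a cached value for the state, else fall back to the plan.
selector(sols, state) = has_cached_value(sols[1], state) ? sols[1] : sols[2]
combined = MultiSolution((policy_sol, path_sol), selector)
act = get_action(combined, state)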