Solutions
Planners in SymbolicPlanners.jl return Solutions, which represent (potentially partial) descriptions of how a problem should be solved.
SymbolicPlanners.Solution — Type

abstract type Solution

Abstract type for solutions to planning problems. Minimally, a Solution should define what action should be taken at a particular step t, or at a particular state, by implementing get_action.

SymbolicPlanners.get_action — Method

get_action(sol, t, state)

Return an action for step t at state.
In the case that a problem is unsolved or unsolvable, planners may return a NullSolution:
SymbolicPlanners.NullSolution — Type

NullSolution([status])

Null solution that indicates the problem was unsolved. The status field can be used to denote why the problem was unsolved. Defaults to :failure.
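For instance, a solution can be checked against NullSolution before its actions are used. The following is a minimal sketch, assuming a PDDL domain and problem have already been loaded (e.g. with PDDL.jl's load_domain and load_problem), and using AStarPlanner with the HAdd heuristic purely as an illustrative planner:

```julia
using PDDL, SymbolicPlanners

# Assumes `domain` and `problem` were loaded earlier with PDDL.jl
state = initstate(domain, problem)   # initial state of the loaded problem
spec = Specification(problem)        # goal specification derived from the problem

planner = AStarPlanner(HAdd())
sol = planner(domain, state, spec)

if sol isa NullSolution
    @warn "Problem was not solved" sol.status
else
    act = get_action(sol, 1, state)  # action to take at step 1 in the initial state
end
```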
Ordered Solutions
One class of solutions returned by planners is OrderedSolutions, which define an ordered sequence of actions (i.e. a plan) that must be taken to reach a goal.
SymbolicPlanners.OrderedSolution — Type

abstract type OrderedSolution <: Solution

Abstract type for ordered planning solutions. OrderedSolutions should satisfy the iteration interface. Calling get_action(sol, t::Int) on an ordered solution should return the intended action for step t.

SymbolicPlanners.get_action — Method

get_action(sol, t)

Return action for step t.

SymbolicPlanners.OrderedPlan — Type

OrderedPlan(plan::AbstractVector{<:Term})

Generic solution type for fully ordered plans.
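As a sketch (reusing the sol computed above and assuming it is an ordered solution such as an OrderedPlan or a PathSearchSolution), an ordered solution can be iterated, indexed by step, or collected into a plan:

```julia
# Iterate over the actions of an ordered solution
for (t, act) in enumerate(sol)
    println("Step $t: $act")
end

# Per-step lookup of the intended action
first_act = get_action(sol, 1)

# Collect the full action sequence as a vector of Terms
plan = collect(sol)
```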
Path Search Solutions
A particular type of OrderedSolution is returned by search-based shortest-path planners (i.e. BreadthFirstPlanner, ForwardPlanner, and BackwardPlanner). These PathSearchSolutions may store information about the search process in addition to the discovered plan, allowing such information to be used by future searches (e.g. through calls to refine!).
SymbolicPlanners.PathSearchSolution — Type

PathSearchSolution(status, plan)
PathSearchSolution(status, plan, trajectory)
PathSearchSolution(status, plan, trajectory, expanded,
                   search_tree, search_frontier, search_order)

Solution type for search-based planners that produce fully ordered plans.
Fields
status: Status of the returned solution.
plan: Sequence of actions that reach the goal. May be partial / incomplete.
trajectory: Trajectory of states that will be traversed while following the plan.
expanded: Number of nodes expanded during search.
search_tree: Tree of nodes expanded or encountered during search.
search_frontier: Frontier of the search tree when search terminated.
search_order: Order in which nodes were expanded during search.
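The following sketch shows how stored search information can be reused via refine!. It assumes AStarPlanner's save_search and max_nodes options, and the :max_nodes status value, behave as described in the planner documentation; the node budget shown is arbitrary:

```julia
# Retain the search tree and frontier so later searches can reuse them
planner = AStarPlanner(HAdd(), save_search=true, max_nodes=1000)
sol = planner(domain, state, spec)

# If the node budget was exhausted before reaching the goal, continue
# the search from where it left off instead of starting from scratch
if sol.status == :max_nodes
    sol = refine!(sol, planner, domain, state, spec)
end
```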
Bidirectional search-based planners also have a corresponding solution type:
SymbolicPlanners.BiPathSearchSolution — Type

BiPathSearchSolution(status, plan)
BiPathSearchSolution(status, plan, trajectory)
BiPathSearchSolution(status, plan, trajectory, expanded,
                     f_search_tree, f_frontier, f_expanded, f_trajectory,
                     b_search_tree, b_frontier, b_expanded, b_trajectory)

Solution type for bidirectional search-based planners.
Fields
status: Status of the returned solution.
plan: Sequence of actions that reach the goal. May be partial / incomplete.
trajectory: Trajectory of states that will be traversed while following the plan.
expanded: Number of nodes expanded during search.
f_search_tree: Forward search tree.
f_frontier: Forward search frontier.
f_expanded: Number of nodes expanded via forward search.
f_trajectory: Trajectory of states returned by forward search.
b_search_tree: Backward search tree.
b_frontier: Backward search frontier.
b_expanded: Number of nodes expanded via backward search.
b_trajectory: Trajectory of states returned by backward search.
Nodes in a search tree have the following type:
SymbolicPlanners.PathNode — Type

PathNode(id::UInt, state::State, path_cost::Float32,
         parent = nothing, child = nothing)

Representation of a search node with optional parent and child pointers, used by search-based planners. One or more parents or children may be stored as a linked list using the LinkedNodeRef data type.
Policy Solutions
Another important class of solutions are PolicySolutions, which specify the action to be taken at a given state. This is especially useful when the true environment is stochastic, so that agents may end up in states different from those they expected, or when it is desirable to reuse a solution computed for one initial state from a different initial state.
SymbolicPlanners.PolicySolution — Type

abstract type PolicySolution <: Solution

Abstract type for policy solutions. Minimally, PolicySolutions should implement the get_action(sol, state::State) method, defining the (potentially random) action to be taken at a particular state.
The following methods constitute the interface for PolicySolutions:
SymbolicPlanners.get_action — Method

get_action(sol, state)

Return an action for the given state. If no actions are available, return missing.

SymbolicPlanners.best_action — Function

best_action(sol, state)

Return the best action for the given state. If no actions are available, return missing.

SymbolicPlanners.rand_action — Function

rand_action(sol, state)

Sample an action according to the policy for the given state. If no actions are available, return missing.

SymbolicPlanners.get_action_probs — Function

get_action_probs(sol, state)

Return a dictionary of action probabilities for the given state. If no actions are available, return an empty dictionary.

SymbolicPlanners.get_action_prob — Function

get_action_prob(sol, state, action)

Return the probability of taking an action at the given state. If the action is not available, return zero.
Some policies also store the value function (or equivalently, the negative cost-to-go) associated with states and actions:
SymbolicPlanners.get_value — Function

get_value(sol, state)
get_value(sol, state, action)

Return the value (i.e. expected future reward) of the given state (and action).

SymbolicPlanners.get_action_values — Function

get_action_values(sol, state)

Return a dictionary of action Q-values for the given state.

SymbolicPlanners.has_values — Function

has_values(sol)

Trait that denotes whether the solution stores a value function.

SymbolicPlanners.has_cached_value — Function

has_cached_value(sol, state)
has_cached_value(sol, state, action)

Return true if the value of state (and action) is cached in the state value or action value table of sol.

SymbolicPlanners.has_cached_action_values — Function

has_cached_action_values(sol, state)

Return true if all actions that are available at state have cached values associated with them.
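A sketch of querying stored values, reusing the hypothetical policy and state from above:

```julia
if has_values(policy)
    v = get_value(policy, state)           # state value, i.e. negative expected cost-to-go
    qs = get_action_values(policy, state)  # Dict of Q-values for each available action
end

# Check whether values are already cached before relying on cheap lookups
cached = has_cached_value(policy, state)
all_cached = has_cached_action_values(policy, state)
```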
A NullPolicy can be used as a default when no information is known.
SymbolicPlanners.NullPolicy — Type

NullPolicy()

Null policy which returns missing for calls to get_action, etc.
Deterministic Policies
SymbolicPlanners.jl provides the following deterministic policies, i.e., policies that always return the (estimated) best action for a given state:
SymbolicPlanners.TabularPolicy — Type

TabularPolicy(V::Dict, Q::Dict, default)
TabularPolicy(default = NullPolicy())

Policy solution where state values and action Q-values are stored in lookup tables V and Q, where V maps state hashes to values, and Q maps state hashes to dictionaries of Q-values for each action in the corresponding state.
A default policy can be specified, so that if a state doesn't already exist in the lookup tables, the value returned by default will be used instead.
SymbolicPlanners.TabularVPolicy — Type

TabularVPolicy(V::Dict, domain, spec, default)
TabularVPolicy(domain, spec, default = NullPolicy())

Policy solution where state values are stored in a lookup table V that maps state hashes to values. The domain and specification also have to be provided, so that the policy knows how to derive action Q-values in each state.
A default policy can be specified, so that if a state doesn't already exist in the lookup table, the value returned by default will be used instead.
SymbolicPlanners.FunctionalVPolicy — Type

FunctionalVPolicy(evaluator, domain, spec)

Policy solution where state values are defined by an evaluator, a one-argument function that outputs a value estimate for each state. The domain and specification also have to be provided, so that the policy knows how to derive action Q-values for each state.

SymbolicPlanners.HeuristicVPolicy — Type

HeuristicVPolicy(heuristic::Heuristic, domain, spec)

Policy solution where state values are defined by the (negated) goal distance estimates computed by a Heuristic for a domain and goal spec.
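As a sketch of how these types compose (reusing domain, state, and spec from the earlier examples, and HAdd as an illustrative heuristic), a heuristic-derived value policy can serve as the fallback for a tabular one:

```julia
# Value estimates given by the negated HAdd goal-distance heuristic
heuristic_policy = HeuristicVPolicy(HAdd(), domain, spec)

# Tabular policy that falls back to the heuristic estimate for states
# not yet present in its lookup table
policy = TabularVPolicy(domain, spec, heuristic_policy)

act = best_action(policy, state)
```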
SymbolicPlanners.ReusableTreePolicy — Type

ReusableTreePolicy(
    value_policy::PolicySolution,
    search_sol::PathSearchSolution,
    [goal_tree::Dict{UInt, PathNode}]
)

The policy returned by RealTimeHeuristicSearch, which stores a value table in the nested value_policy, a forward search tree in search_sol, and (when reuse_paths is true) a reusable goal_tree of cost-optimal paths to the goal.
When taking actions at states along some stored cost-optimal path, actions along that path are followed, as in Tree-Adaptive A* [1]. Otherwise, the highest value action according to value_policy is returned, with ties broken by the (possibly incomplete) plan in search_sol.
[1] C. Hernández, X. Sun, S. Koenig, and P. Meseguer, "Tree Adaptive A*," AAMAS (2011), pp. 123–130. https://dl.acm.org/doi/abs/10.5555/2030470.2030488.
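A sketch of obtaining and using such a policy; the zero-argument construction of RealTimeHeuristicSearch is an assumption here (options such as the heuristic, number of iterations, and reuse_paths are configurable, but their exact constructor arguments are not shown in this section):

```julia
planner = RealTimeHeuristicSearch()  # default configuration (assumed)
sol = planner(domain, state, spec)   # sol is a ReusableTreePolicy

# Act according to the stored values and reusable cost-optimal paths
act = get_action(sol, state)
```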
Stochastic Policies
SymbolicPlanners.jl also provides stochastic policies, some of which are intended for use as wrappers around deterministic policies:
SymbolicPlanners.RandomPolicy — Type

RandomPolicy(domain, [rng::AbstractRNG])

Policy that selects available actions uniformly at random. The domain has to be provided to determine the actions available in each state.

SymbolicPlanners.EpsilonGreedyPolicy — Type

EpsilonGreedyPolicy(domain, policy, epsilon, [rng::AbstractRNG])

Policy that acts uniformly at random with epsilon chance, but otherwise selects the best action(s) according to the underlying policy. If there is more than one best action, ties are broken randomly. The domain has to be provided to determine the actions available in each state.

SymbolicPlanners.BoltzmannPolicy — Type

BoltzmannPolicy(policy, temperature, [epsilon, rng::AbstractRNG])

Policy that samples actions according to a Boltzmann distribution with the specified temperature. The unnormalized log probability of taking an action $a$ in state $s$ corresponds to its Q-value $Q(s, a)$ divided by the temperature $T$:
\[P(a|s) \propto \exp(Q(s, a) / T)\]
Higher temperatures lead to an increasingly random policy, whereas a temperature of zero corresponds to a deterministic policy. Q-values are computed according to the underlying policy provided as an argument to the constructor.
To prevent zero-probability actions, an epsilon chance of taking an action uniformly at random can be specified.
Note that wrapping an existing policy in a BoltzmannPolicy does not ensure consistency of the state values $V$ and Q-values $Q$ according to the Bellman equation, since this would require repeated Bellman updates to ensure convergence.
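A sketch of wrapping a deterministic policy, reusing the hypothetical policy, domain, and state from the examples above (the epsilon and temperature values are arbitrary):

```julia
# Act uniformly at random 10% of the time, otherwise take the best action
eps_policy = EpsilonGreedyPolicy(domain, policy, 0.1)

# Sample actions in proportion to exp(Q(s, a) / T) with temperature T = 1.0
boltzmann = BoltzmannPolicy(policy, 1.0)

act = rand_action(boltzmann, state)
probs = get_action_probs(boltzmann, state)
```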
Mixture Policies
A subclass of stochastic policies are mixture policies, which randomly select one of their underlying policies to sample an action from:
SymbolicPlanners.MixturePolicy — Type

MixturePolicy(policies, [weights, rng::AbstractRNG])

A mixture of underlying policies with associated weights. If provided, weights must be non-negative and sum to one. Otherwise a uniform mixture is assumed.

SymbolicPlanners.EpsilonMixturePolicy — Type

EpsilonMixturePolicy(domain, policy, epsilons, [weights, rng::AbstractRNG])

A mixture of epsilon-greedy policies with different epsilons and mixture weights, specified as Vectors. If provided, weights must be non-negative and sum to one. Otherwise a uniform mixture is assumed. The domain is required to determine the actions available in each state.

SymbolicPlanners.BoltzmannMixturePolicy — Type

BoltzmannMixturePolicy(policy, temperatures, [weights,]
                       [epsilon, rng::AbstractRNG])

A mixture of Boltzmann policies with different temperatures and mixture weights, specified as Vectors. If provided, weights must be non-negative and sum to one. Otherwise a uniform mixture is assumed. Q-values are computed according to the underlying policy provided as an argument to the constructor.

Similar to the BoltzmannPolicy, epsilon can be specified to add an epsilon chance of taking an action uniformly at random.
Mixture policies are associated with a set of mixture weights. These can be accessed with get_mixture_weights:
SymbolicPlanners.get_mixture_weights — Function

get_mixture_weights(sol)

Returns the mixture weights for a mixture policy.

get_mixture_weights(sol, state, action; normalize = true)

Returns the posterior mixture weights for a mixture policy after an action has been taken at state. Posterior weights are normalized by default, but this can be disabled by setting normalize = false.
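A sketch of constructing a Boltzmann mixture and inspecting its weights, with names reused from the examples above (the specific temperatures are arbitrary):

```julia
# Mixture over two temperatures with uniform prior weights
mixture = BoltzmannMixturePolicy(policy, [0.5, 2.0])

prior = get_mixture_weights(mixture)   # uniform weights, e.g. [0.5, 0.5]
act = rand_action(mixture, state)

# Posterior weights over the two components after observing `act` at `state`
posterior = get_mixture_weights(mixture, state, act)
```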
Combined Solutions
It is possible to combine multiple solutions into a single solution using a MultiSolution:
SymbolicPlanners.MultiSolution — Type

MultiSolution(solutions::Solution...)
MultiSolution(solutions::Tuple, [selector])

A combination of multiple Solutions, which are selected between according to a selector function (solutions, [state]) -> sol that returns the solution to use (which may depend on the current state). The selector defaults to always returning the first solution.
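A sketch of combining two previously computed solutions, here a hypothetical path_sol (e.g. a PathSearchSolution) and the policy from earlier; the custom selector shown is purely illustrative:

```julia
# Default selector: always use the first solution in the tuple
combined = MultiSolution(path_sol, policy)

# A custom selector can choose based on the current state; this one
# (illustrative only) always defers to the second solution
selector(solutions, state=nothing) = solutions[2]
combined2 = MultiSolution((path_sol, policy), selector)

act = get_action(combined, 1, state)
```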