kitae.operations package#
Submodules#
kitae.operations.loss module#
Collection of loss functions for reinforcement learning.
- kitae.operations.loss.loss_policy_ppo(dist: Distribution, log_probs: Array, log_probs_old: Array, gaes: Array, clip_eps: float, entropy_coef: float) tuple[float, dict[str, Array]] #
Proximal Policy Optimization’s policy loss function.
- Parameters:
dist – A dx.Distribution used to compute the entropy
log_probs – An Array of shape (…, 1)
log_probs_old – An Array of shape (…, 1)
gaes – An Array of shape (…, 1)
clip_eps – A float
entropy_coef – A float
- Returns:
A tuple of the loss value (a float) and a LossDict with the following keys: [“loss_policy”, “entropy”, “kl_divergence”]
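A minimal sketch of what this clipped policy loss typically computes, assuming a distrax-style dist exposing entropy() and the simple log_probs_old - log_probs estimator for the reported KL divergence; the actual kitae implementation may differ in details such as the KL estimator or the reduction:

```python
import jax.numpy as jnp

def loss_policy_ppo_sketch(dist, log_probs, log_probs_old, gaes, clip_eps, entropy_coef):
    # Probability ratio between the current and old policies.
    ratio = jnp.exp(log_probs - log_probs_old)

    # Clipped surrogate objective: take the pessimistic (minimum) of the
    # unclipped and clipped advantage-weighted ratios.
    unclipped = ratio * gaes
    clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * gaes
    loss_policy = -jnp.mean(jnp.minimum(unclipped, clipped))

    # Entropy bonus encourages exploration.
    entropy = jnp.mean(dist.entropy())

    # Crude KL estimate between old and new policies, kept for logging only.
    kl_divergence = jnp.mean(log_probs_old - log_probs)

    loss = loss_policy - entropy_coef * entropy
    return loss, {"loss_policy": loss_policy, "entropy": entropy, "kl_divergence": kl_divergence}
```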
- kitae.operations.loss.loss_shannon_jensen_divergence(average_logits: Array, average_entropy: Array) float #
Jensen-Shannon divergence loss function.
The Jensen-Shannon divergence loss is used to increase the behaviour diversity of a population.
average_logits is obtained by averaging the population's logits over the last axis.
average_entropy is obtained by averaging the members' entropies over the last axis.
- Parameters:
average_logits – An Array of shape (…, N_actions)
average_entropy – An Array of shape (…, N_actions)
- Returns:
A float
- Return type:
shannon_jensen_divergence_loss
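A minimal sketch of this diversity loss, under the assumption that average_logits are the population-averaged logits, that average_entropy is the averaged per-member entropy, and that the Jensen-Shannon divergence is formed as the entropy of the averaged policy minus the averaged entropy; the names and the exact formula here are illustrative only:

```python
import jax
import jax.numpy as jnp

def loss_shannon_jensen_divergence_sketch(average_logits, average_entropy):
    # Entropy of the population-averaged policy.
    probs = jax.nn.softmax(average_logits, axis=-1)
    log_probs = jax.nn.log_softmax(average_logits, axis=-1)
    entropy_of_average = -jnp.sum(probs * log_probs, axis=-1)

    # Jensen-Shannon divergence = H(averaged policy) - averaged entropy.
    # Negated so that minimising the loss increases behavioural diversity.
    return -jnp.mean(entropy_of_average - average_entropy)
```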
- kitae.operations.loss.loss_value_clip(values: Array, targets: Array, values_old: Array, clip_eps: float) tuple[float, dict[str, Array]] #
Clipped value loss function
Clipping the value predictions around the previous values limits the size of each value-function update.
- Parameters:
values – An Array of shape (…, 1)
targets – An Array of shape (…, 1)
values_old – An Array of shape (…, 1)
clip_eps – A float
- Returns:
A tuple of the loss value (a float) and a LossDict with the following key: [“loss_value”]
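A minimal sketch of the clipped value loss in the PPO style, assuming a squared-error base loss and the element-wise maximum of the clipped and unclipped errors; the exact scaling used by kitae may differ:

```python
import jax.numpy as jnp

def loss_value_clip_sketch(values, targets, values_old, clip_eps):
    # Keep the new value predictions within clip_eps of the old ones.
    values_clipped = values_old + jnp.clip(values - values_old, -clip_eps, clip_eps)

    # Pessimistic (maximum) of the clipped and unclipped squared errors.
    loss_unclipped = jnp.square(values - targets)
    loss_clipped = jnp.square(values_clipped - targets)
    loss_value = 0.5 * jnp.mean(jnp.maximum(loss_unclipped, loss_clipped))

    return loss_value, {"loss_value": loss_value}
```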
kitae.operations.timesteps module#
Timestep operations useful for reinforcement learning.
- kitae.operations.timesteps.calculate_gaes_targets(values: Array, next_values: Array, discounts: Array, rewards: Array, _lambda: float, normalize: bool) tuple[Array, Array] #
Computes Generalized Advantage Estimation (GAE) advantages and the corresponding value targets.
- Parameters:
values – An Array of shape (T, 1)
next_values – An Array of shape (T, 1)
discounts – An Array of shape (T, 1)
rewards – An Array of shape (T, 1)
_lambda – A float
normalize – A boolean indicating if the advantages should be normalized
- Returns:
A tuple of two Arrays of shape (T, 1): the advantages (gaes) and the value targets.
- Return type:
gaes, targets
values are estimated from observations, next_values from next_observations, and discounts are computed from dones and the discount factor.
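The backward GAE recursion can be written with jax.lax.scan. The sketch below follows the documented shapes and arguments but is not the library's own code; in particular, the 1e-8 normalisation epsilon is an assumption:

```python
import jax
import jax.numpy as jnp

def calculate_gaes_targets_sketch(values, next_values, discounts, rewards, _lambda, normalize):
    # TD residuals: delta_t = r_t + gamma_t * V(s_{t+1}) - V(s_t)
    deltas = rewards + discounts * next_values - values

    def backward_step(gae, inputs):
        delta, discount = inputs
        gae = delta + discount * _lambda * gae
        return gae, gae

    # Accumulate advantages backwards in time.
    _, gaes = jax.lax.scan(
        backward_step,
        jnp.zeros_like(deltas[-1]),
        (deltas, discounts),
        reverse=True,
    )

    # Value targets are the advantages plus the value baseline.
    targets = gaes + values

    if normalize:
        gaes = (gaes - gaes.mean()) / (gaes.std() + 1e-8)
    return gaes, targets
```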
- kitae.operations.timesteps.compute_td_targets(rewards: Array, discounts: Array, next_values: Array)#
Computes one-step temporal-difference (TD) targets.
- Parameters:
rewards – An Array of shape (T, 1)
discounts – An Array of shape (T, 1)
next_values – An Array of shape (T, 1)
- Returns:
An array of shape (T, 1)
- Return type:
targets
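For reference, the one-step TD target is the reward plus the discounted next value; a minimal sketch following the documented arguments:

```python
import jax.numpy as jnp

def compute_td_targets_sketch(rewards, discounts, next_values):
    # One-step temporal-difference target: r_t + gamma_t * V(s_{t+1})
    return rewards + discounts * next_values
```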
kitae.operations.transformation module#
- kitae.operations.transformation.action_clip(x: Array, action_space: Box) Array #
- kitae.operations.transformation.flatten(x: Array) Array #
- kitae.operations.transformation.inverse_linear_interpolation(y: Array, a: float, b: float)#
- kitae.operations.transformation.linear_interpolation(x: Array, a: float, b: float)#
- kitae.operations.transformation.normalize_frames(x: Array) Array #