kitae.operations package#

Submodules#

kitae.operations.loss module#

Collection of loss functions for reinforcement learning.

kitae.operations.loss.loss_policy_ppo(dist: Distribution, log_probs: Array, log_probs_old: Array, gaes: Array, clip_eps: float, entropy_coef: float) tuple[float, dict[str, Array]]#

Proximal Policy Optimization’s policy loss function.

Parameters:
  • dist – A dx.Distribution used to compute the entropy

  • log_probs – An Array of shape (…, 1), log-probabilities of the taken actions under the current policy

  • log_probs_old – An Array of shape (…, 1), log-probabilities of the taken actions under the policy that collected the data

  • gaes – An Array of shape (…, 1), generalized advantage estimates

  • clip_eps – A float, the PPO clipping range

  • entropy_coef – A float, the entropy bonus coefficient

Returns:

A tuple of a float corresponding to the loss value and a LossDict with the keys ["loss_policy", "entropy", "kl_divergence"].
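
A minimal usage sketch (not taken from the library's own documentation): the batch shapes, logits and hyperparameter values below are made up, and dx is assumed to refer to distrax. The loss is presumably the standard PPO clipped surrogate built from the ratio exp(log_probs - log_probs_old), with an entropy bonus weighted by entropy_coef.

    import distrax as dx
    import jax.numpy as jnp

    from kitae.operations.loss import loss_policy_ppo

    # Hypothetical batch of 4 timesteps over 3 discrete actions.
    dist = dx.Categorical(logits=jnp.zeros((4, 3)))

    log_probs = jnp.log(jnp.full((4, 1), 1.0 / 3.0))      # current policy
    log_probs_old = jnp.log(jnp.full((4, 1), 1.0 / 3.0))  # rollout-time policy
    gaes = jnp.ones((4, 1))                               # advantage estimates

    loss, logs = loss_policy_ppo(
        dist, log_probs, log_probs_old, gaes, clip_eps=0.2, entropy_coef=0.01
    )
    print(loss, logs["kl_divergence"])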

kitae.operations.loss.loss_shannon_jensen_divergence(average_logits: Array, average_entropy: Array) float#

Shannon Jensen divergence loss function (also known as the Jensen-Shannon divergence).

The Shannon Jensen divergence loss is used to increase the behavioural diversity of a population of agents. Both inputs are expected to be pre-computed:

  • average_logits is obtained by averaging the logits over the last axis

  • average_entropy is obtained by averaging the entropies over the last axis

Parameters:
  • average_logits – An Array of shape (…, N_actions)

  • average_entropy – An Array of shape (…, N_actions)

Returns:

A float corresponding to the Shannon Jensen divergence loss.

Return type:

float
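
For reference, the generalized Jensen-Shannon divergence of K distributions with uniform weights is the entropy of the averaged distribution minus the average of the individual entropies:

    \mathrm{JSD}(\pi_1, \dots, \pi_K) = H\left(\tfrac{1}{K}\sum_{k=1}^{K} \pi_k\right) - \frac{1}{K}\sum_{k=1}^{K} H(\pi_k)

Mapping the two terms onto the average_logits and average_entropy arguments, and whether the returned loss negates the divergence so that minimising it increases diversity, is an assumption drawn from the description above rather than something stated by the library.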

kitae.operations.loss.loss_value_clip(values: Array, targets: Array, values_old: Array, clip_eps: float) tuple[float, dict[str, Array]]#

Clipped value loss function.

Clipping the value loss limits how far the value estimates can move away from their previous (rollout-time) values in a single update.

Parameters:
  • values – An Array of shape (…, 1)

  • targets – An Array of shape (…, 1)

  • values_old – An Array of shape (…, 1)

  • clip_eps – A float

Returns:

A tuple of a float corresponding to the loss value and a LossDict with the key ["loss_value"].
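
A minimal usage sketch with made-up value predictions and targets; the clipping keeps values close to values_old, mirroring the clipped policy objective:

    import jax.numpy as jnp

    from kitae.operations.loss import loss_value_clip

    values = jnp.array([[0.5], [0.7], [0.9]])       # current value predictions
    targets = jnp.array([[0.6], [0.6], [1.0]])      # bootstrapped value targets
    values_old = jnp.array([[0.4], [0.8], [0.85]])  # predictions at rollout time

    loss, logs = loss_value_clip(values, targets, values_old, clip_eps=0.2)
    print(loss, logs["loss_value"])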

kitae.operations.timesteps module#

Timestep operations useful for reinforcement learning.

kitae.operations.timesteps.calculate_gaes_targets(values: Array, next_values: Array, discounts: Array, rewards: Array, _lambda: float, normalize: bool) tuple[Array, Array]#

Calculates generalized advantage estimates (GAE) and the corresponding value targets.

Parameters:
  • values – An Array of shape (T, 1)

  • next_values – An Array of shape (T, 1)

  • discounts – An Array of shape (T, 1)

  • rewards – An Array of shape (T, 1)

  • _lambda – A float, the GAE lambda parameter

  • normalize – A boolean indicating if the advantages should be normalized

Returns:

A tuple (gaes, targets) of Arrays, each of shape (T, 1).

Return type:

tuple[Array, Array]

values are estimated from observations, next_values are estimated from next_observations, and discounts are computed from dones and the discount factor.
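
A minimal usage sketch, building discounts from dones and a discount factor as described above; all numbers are illustrative. GAE accumulates the TD errors delta_t = r_t + discount_t * v(s_{t+1}) - v(s_t) backwards through time via A_t = delta_t + discount_t * lambda * A_{t+1}.

    import jax.numpy as jnp

    from kitae.operations.timesteps import calculate_gaes_targets

    gamma = 0.99
    values = jnp.array([[0.5], [0.6], [0.7], [0.8]])
    next_values = jnp.array([[0.6], [0.7], [0.8], [0.0]])
    rewards = jnp.array([[1.0], [0.0], [0.0], [1.0]])
    dones = jnp.array([[0.0], [0.0], [0.0], [1.0]])
    discounts = gamma * (1.0 - dones)  # zero discount on terminal steps

    gaes, targets = calculate_gaes_targets(
        values, next_values, discounts, rewards, _lambda=0.95, normalize=True
    )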

kitae.operations.timesteps.compute_td_targets(rewards: Array, discounts: Array, next_values: Array)#

Calculates temporal-difference (TD) targets.

Parameters:
  • rewards – An Array of shape (T, 1)

  • discounts – An Array of shape (T, 1)

  • next_values – An Array of shape (T, 1)

Returns:

An Array of shape (T, 1) containing the TD targets.

Return type:

Array
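
A minimal usage sketch; that the targets take the one-step form rewards + discounts * next_values is an assumption based on the argument names, and the numbers are illustrative.

    import jax.numpy as jnp

    from kitae.operations.timesteps import compute_td_targets

    gamma = 0.99
    rewards = jnp.array([[1.0], [0.0], [1.0]])
    dones = jnp.array([[0.0], [0.0], [1.0]])
    discounts = gamma * (1.0 - dones)
    next_values = jnp.array([[0.5], [0.4], [0.0]])

    targets = compute_td_targets(rewards, discounts, next_values)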

kitae.operations.transformation module#

kitae.operations.transformation.action_clip(x: Array, action_space: Box) Array#
kitae.operations.transformation.flatten(x: Array) Array#
kitae.operations.transformation.inverse_linear_interpolation(y: Array, a: float, b: float)#
kitae.operations.transformation.linear_interpolation(x: Array, a: float, b: float)#
kitae.operations.transformation.normalize_frames(x: Array) Array#
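
These functions are undocumented on this page, so the sketch below only illustrates assumed semantics inferred from the names (interpolation between a and b over [0, 1], and rescaling of image frames); treat it as illustrative rather than a specification.

    import jax.numpy as jnp

    from kitae.operations.transformation import (
        inverse_linear_interpolation,
        linear_interpolation,
        normalize_frames,
    )

    # Assumed: linear_interpolation maps x in [0, 1] onto [a, b],
    # and inverse_linear_interpolation maps a value in [a, b] back to [0, 1].
    y = linear_interpolation(jnp.asarray(0.5), a=-1.0, b=1.0)
    x = inverse_linear_interpolation(y, a=-1.0, b=1.0)

    # Assumed: normalize_frames rescales image frames to a float range.
    frames = jnp.zeros((84, 84, 4), dtype=jnp.uint8)
    norm_frames = normalize_frames(frames)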

Module contents#