kitae.operations package#

Submodules#

kitae.operations.loss module#

Collection of loss functions for reinforcement learning.

kitae.operations.loss.loss_policy_ppo(dist: Distribution, log_probs: Array, log_probs_old: Array, gaes: Array, clip_eps: float, entropy_coef: float) tuple[float, dict[str, Array]]#

Proximal Policy Optimization’s policy loss function.

Parameters:
  • dist – A dx.Distribution used to compute the entropy

  • log_probs – An Array of shape (…, 1), log-probabilities of the taken actions under the current policy

  • log_probs_old – An Array of shape (…, 1), log-probabilities of the taken actions under the policy that collected the data

  • gaes – An Array of shape (…, 1), generalized advantage estimates

  • clip_eps – A float, the PPO clipping range

  • entropy_coef – A float, the entropy bonus coefficient

Returns:

A tuple of a float corresponding to the loss value and a LossDict with the keys ["loss_policy", "entropy", "kl_divergence"].
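
A minimal usage sketch (not taken from the library's own documentation): the batch shapes, logits and hyperparameter values below are made up, and dx is assumed to refer to distrax. The loss is presumably the standard PPO clipped surrogate built from the ratio exp(log_probs - log_probs_old), with an entropy bonus weighted by entropy_coef.

    import distrax as dx
    import jax.numpy as jnp

    from kitae.operations.loss import loss_policy_ppo

    # Hypothetical batch of 4 timesteps over 3 discrete actions.
    dist = dx.Categorical(logits=jnp.zeros((4, 3)))

    log_probs = jnp.log(jnp.full((4, 1), 1.0 / 3.0))      # current policy
    log_probs_old = jnp.log(jnp.full((4, 1), 1.0 / 3.0))  # rollout-time policy
    gaes = jnp.ones((4, 1))                               # advantage estimates

    loss, logs = loss_policy_ppo(
        dist, log_probs, log_probs_old, gaes, clip_eps=0.2, entropy_coef=0.01
    )
    print(loss, logs["kl_divergence"])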

kitae.operations.loss.loss_shannon_jensen_divergence(average_logits: Array, average_entropy: Array) float#

Shannon Jensen divergence loss function (also known as the Jensen-Shannon divergence).

The Shannon Jensen divergence loss is used to increase the behavioural diversity of a population of agents. Both inputs are expected to be pre-computed:

  • average_logits is obtained by averaging the logits over the last axis

  • average_entropy is obtained by averaging the entropies over the last axis

Parameters:
  • average_logits – An Array of shape (…, N_actions)

  • average_entropy – An Array of shape (…, N_actions)

Returns:

A float corresponding to the Shannon Jensen divergence loss.

Return type:

float
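
For reference, the generalized Jensen-Shannon divergence of K distributions with uniform weights is the entropy of the averaged distribution minus the average of the individual entropies:

    \mathrm{JSD}(\pi_1, \dots, \pi_K) = H\left(\tfrac{1}{K}\sum_{k=1}^{K} \pi_k\right) - \frac{1}{K}\sum_{k=1}^{K} H(\pi_k)

Mapping the two terms onto the average_logits and average_entropy arguments, and whether the returned loss negates the divergence so that minimising it increases diversity, is an assumption drawn from the description above rather than something stated by the library.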

kitae.operations.loss.loss_value_clip(values: Array, targets: Array, values_old: Array, clip_eps: float) tuple[float, dict[str, Array]]#

Clipped value loss function.

Clipping the value loss limits how far the value estimates can move away from their previous (rollout-time) values in a single update.

Parameters:
  • values – An Array of shape (…, 1)

  • targets – An Array of shape (…, 1)

  • values_old – An Array of shape (…, 1)

  • clip_eps – A float

Returns:

A tuple of a float corresponding to the loss value and a LossDict with the key ["loss_value"].
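
A minimal usage sketch with made-up value predictions and targets; the clipping keeps values close to values_old, mirroring the clipped policy objective:

    import jax.numpy as jnp

    from kitae.operations.loss import loss_value_clip

    values = jnp.array([[0.5], [0.7], [0.9]])       # current value predictions
    targets = jnp.array([[0.6], [0.6], [1.0]])      # bootstrapped value targets
    values_old = jnp.array([[0.4], [0.8], [0.85]])  # predictions at rollout time

    loss, logs = loss_value_clip(values, targets, values_old, clip_eps=0.2)
    print(loss, logs["loss_value"])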

kitae.operations.timesteps module#

Timestep operations useful for reinforcement learning.

kitae.operations.timesteps.calculate_gaes_targets(values: Array, next_values: Array, discounts: Array, rewards: Array, _lambda: float, normalize: bool) tuple[Array, Array]#

Calculates generalized advantage estimates (GAE) and the corresponding value targets.

Parameters:
  • values – An Array of shape (T, 1)

  • next_values – An Array of shape (T, 1)

  • discounts – An Array of shape (T, 1)

  • rewards – An Array of shape (T, 1)

  • _lambda – A float, the GAE lambda parameter

  • normalize – A boolean indicating if the advantages should be normalized

Returns:

A tuple (gaes, targets) of Arrays, each of shape (T, 1).

Return type:

tuple[Array, Array]

values are estimated from observations, next_values are estimated from next_observations, and discounts are computed from dones and the discount factor.
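
A minimal usage sketch, building discounts from dones and a discount factor as described above; all numbers are illustrative. GAE accumulates the TD errors delta_t = r_t + discount_t * v(s_{t+1}) - v(s_t) backwards through time via A_t = delta_t + discount_t * lambda * A_{t+1}.

    import jax.numpy as jnp

    from kitae.operations.timesteps import calculate_gaes_targets

    gamma = 0.99
    values = jnp.array([[0.5], [0.6], [0.7], [0.8]])
    next_values = jnp.array([[0.6], [0.7], [0.8], [0.0]])
    rewards = jnp.array([[1.0], [0.0], [0.0], [1.0]])
    dones = jnp.array([[0.0], [0.0], [0.0], [1.0]])
    discounts = gamma * (1.0 - dones)  # zero discount on terminal steps

    gaes, targets = calculate_gaes_targets(
        values, next_values, discounts, rewards, _lambda=0.95, normalize=True
    )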

kitae.operations.timesteps.compute_td_targets(rewards: Array, discounts: Array, next_values: Array)#

Calculates temporal-difference (TD) targets.

Parameters:
  • rewards – An Array of shape (T, 1)

  • discounts – An Array of shape (T, 1)

  • next_values – An Array of shape (T, 1)

Returns:

An Array of shape (T, 1) containing the TD targets.

Return type:

Array
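
A minimal usage sketch; that the targets take the one-step form rewards + discounts * next_values is an assumption based on the argument names, and the numbers are illustrative.

    import jax.numpy as jnp

    from kitae.operations.timesteps import compute_td_targets

    gamma = 0.99
    rewards = jnp.array([[1.0], [0.0], [1.0]])
    dones = jnp.array([[0.0], [0.0], [1.0]])
    discounts = gamma * (1.0 - dones)
    next_values = jnp.array([[0.5], [0.4], [0.0]])

    targets = compute_td_targets(rewards, discounts, next_values)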

kitae.operations.transformation module#

kitae.operations.transformation.action_clip(x: Array, action_space: Box) Array#
kitae.operations.transformation.flatten(x: Array) Array#
kitae.operations.transformation.inverse_linear_interpolation(y: Array, a: float, b: float)#
kitae.operations.transformation.linear_interpolation(x: Array, a: float, b: float)#
kitae.operations.transformation.normalize_frames(x: Array) Array#
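
These functions are undocumented on this page, so the sketch below only illustrates assumed semantics inferred from the names (interpolation between a and b over [0, 1], and rescaling of image frames); treat it as illustrative rather than a specification.

    import jax.numpy as jnp

    from kitae.operations.transformation import (
        inverse_linear_interpolation,
        linear_interpolation,
        normalize_frames,
    )

    # Assumed: linear_interpolation maps x in [0, 1] onto [a, b],
    # and inverse_linear_interpolation maps a value in [a, b] back to [0, 1].
    y = linear_interpolation(jnp.asarray(0.5), a=-1.0, b=1.0)
    x = inverse_linear_interpolation(y, a=-1.0, b=1.0)

    # Assumed: normalize_frames rescales image frames to a float range.
    frames = jnp.zeros((84, 84, 4), dtype=jnp.uint8)
    norm_frames = normalize_frames(frames)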

Module contents#