Skip to content

Stepper API

This page details the API for the pse_core.Stepper class. Steppers track the current position within a StateMachine during generation. Developers typically don't instantiate Steppers directly but interact with them via the StructuringEngine. Understanding the Stepper API can be helpful for debugging or advanced customization.

class pse_core.Stepper(
    state_machine: StateMachine,
    current_state: StateId | None = None
)

Represents a position within a state machine and manages traversal.

Parameters:

  • state_machine (StateMachine): The StateMachine instance this stepper will traverse.
  • current_state (StateId | None, optional): The initial state ID within the state_machine. If None, defaults to the state_machine's start_state.

Methods

clone

clone() -> Stepper

Creates a deep copy of this stepper, including its current state, history, and associated sub-stepper (if any). Essential for exploring multiple grammar paths.

Returns:

  • (Stepper): A new Stepper instance identical to the original.

consume

consume(
    token: str
) -> list[Stepper]

Consumes a token string and advances the stepper's state according to its StateMachine. This is the primary method for processing tokens during generation, often called internally by the StructuringEngine. It handles transitions, sub-stepper management, and branching.

Parameters:

  • token (str): The token string to consume.

Returns:

  • (list[Stepper]): A list of new Stepper instances representing all possible valid states after consuming the token. Returns an empty list if the token leads to no valid states.

get_current_value

get_current_value() -> Any

Returns the accumulated value parsed from the raw string generated along the stepper's path so far. Attempts to parse as JSON (number, boolean, object, array) first, falling back to the raw string if JSON parsing fails.

Returns:

  • (Any): The parsed Python object (e.g., int, float, bool, dict, list) or the raw str. Returns None if no value has been accumulated.

get_raw_value

get_raw_value() -> str

Returns the raw, concatenated string output generated along the stepper's path (including history and any active sub-stepper) without attempting any parsing or type conversion.

Returns:

  • (str): The raw accumulated string.

get_valid_continuations

get_valid_continuations() -> list[str]

Returns a list of strings representing all valid token sequences that can legally follow the stepper's current state according to its StateMachine. Used by the StructuringEngine to determine which tokens to allow during logit processing.

Returns:

  • (list[str]): A list of valid continuation strings.

get_invalid_continuations

get_invalid_continuations() -> list[str]

Returns a list of strings that are explicitly invalid continuations from the current state. This is less commonly used than get_valid_continuations but can be implemented by custom StateMachine subclasses for specific exclusion rules.

Returns:

  • (list[str]): A list of invalid continuation strings.

has_reached_accept_state

has_reached_accept_state() -> bool

Checks if the stepper (and its sub-stepper, if active) is currently in a state designated as an end_state by its StateMachine.

Returns:

  • (bool): True if the stepper is in a valid terminal state, False otherwise.

can_accept_more_input

can_accept_more_input() -> bool

Checks if the stepper can consume more tokens based on the rules of its current StateMachine (e.g., character limits in a CharacterStateMachine).

Returns:

  • (bool): True if more input can be processed, False otherwise.

is_within_value

is_within_value() -> bool

Indicates if the stepper is currently in the process of accumulating characters for a specific value (e.g., inside a string literal, number, etc.), as opposed to being between structural elements.

Returns:

  • (bool): True if actively consuming value characters.

accepts_any_token

accepts_any_token() -> bool

Indicates if the stepper's current state allows any token as a valid continuation (often true for free-form text states).

Returns:

  • (bool): True if any token is currently valid.

get_identifier

get_identifier() -> str | None

Returns the identifier string associated with the stepper's current StateMachine or its active sub-stepper's StateMachine. Used by get_stateful_structured_output.

Returns:

  • (str | None): The identifier string, or None.

get_token_ids_history

get_token_ids_history() -> list[int]

Returns the sequence of token IDs that were consumed along the path taken by this stepper and its history. Used for accurate output reconstruction via get_token_safe_output.

Returns:

  • (list[int]): The list of consumed token IDs.

get_token_safe_output

get_token_safe_output(
    decode_function: Callable[[list[int]], str]
) -> str

Reconstructs the generated string output accurately by decoding the stored token_ids_history using the provided decode_function. This avoids potential errors from decoding the raw_value directly, especially if token healing occurred.

Parameters:

  • decode_function (Callable[[list[int]], str]): The tokenizer's decode function.

Returns:

  • (str): The accurately reconstructed output string.

Properties

  • state_machine (StateMachine): The StateMachine this stepper is traversing. Read/write.
  • current_state (StateId): The current state ID within the state_machine. Read/write.
  • target_state (StateId | None): The target state ID for an in-progress transition. Read/write.
  • sub_stepper (Stepper | None): The active sub-stepper handling a nested StateMachine traversal, if any. Read/write.
  • history (list[Stepper]): The list of completed sub-steppers that led to the current state. Read/write.
  • consumed_character_count (int): The number of characters consumed along this stepper's path. Read/write.
  • remaining_input (str | None): Any portion of the last consumed token that was not processed by the current step. Read/write.
  • _raw_value (str | None): The raw accumulated string value. Use get_raw_value() for access. Read/write (internal use primarily).