Structuring Engine API
This page details the API for the main pse.StructuringEngine class, which is the primary interface for using the Proxy Structuring Engine.
class pse.StructuringEngine(
tokenizer: PreTrainedTokenizerFast | PreTrainedTokenizerBase,
whitelist_control_tokens: list[str] | None = None,
multi_token_sampling: bool = False,
max_resample_attempts: int = 5
)
Inherits from the core C++ Engine. Orchestrates token processing and interfaces with language models.
Parameters:
- tokenizer (PreTrainedTokenizerFast | PreTrainedTokenizerBase): An initialized tokenizer instance from the Hugging Face transformers library. Used to access the vocabulary, encode text to token IDs, and decode token IDs to text.
- whitelist_control_tokens (list[str] | None, optional): A list of control token strings (e.g., "<|eot_id|>") that should not be automatically masked by the engine, even if the grammar would otherwise consider them invalid near the end of generation. This prevents the engine from blocking essential control tokens such as EOS. Defaults to None.
- multi_token_sampling (bool, optional): Enables or disables the Multi-Token Processing optimization. When True, allows the engine to return multiple tokens at once if the grammar requires an unambiguous multi-token sequence. Defaults to False.
- max_resample_attempts (int, optional): The maximum number of times the engine will ask the base sampler for a new token if the initially sampled token is invalid according to the grammar. Helps find a valid token when the probability mass is concentrated on invalid options. Defaults to 5.
Methods
configure
configure(
structure: JSONSchemaSource | StateMachine,
**kwargs: Any
) -> None
Configures the engine with the desired output structure. This translates the provided schema into the internal StateMachine representation used for enforcement.
Parameters:
- structure (JSONSchemaSource | StateMachine): The schema definition. Can be a Pydantic BaseModel class, a JSON Schema dictionary, a Python callable (function signature), a sequence (list or tuple) of these types (interpreted as anyOf), or a direct StateMachine instance. See the Schema Sources Guide.
- **kwargs: Additional keyword arguments passed to the schema conversion process (e.g., delimiters or buffer_length when using EncapsulatedStateMachine or WaitFor implicitly or explicitly).
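The accepted schema source shapes can be illustrated with plain Python values (a hypothetical sketch; the engine and its configure call are omitted, and the names below are illustrative):

```python
# A JSON Schema dictionary constraining output to an object with one field.
user_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

# A Python callable: its signature describes the expected arguments.
def get_weather(city: str, unit: str = "celsius") -> str:
    """Illustrative tool whose signature doubles as a schema."""
    return f"weather for {city} in {unit}"

# A sequence of sources is interpreted as anyOf: output may match either one.
any_of_sources = [user_schema, get_weather]

# engine.configure(any_of_sources)  # would accept output matching either schema
```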
process_logits
process_logits(
input_ids: Any, # Framework-specific tensor/array
scores: Any # Framework-specific tensor/array
) -> Any # Framework-specific tensor/array
The primary logits processing hook. This method should be added to the logits_processor list in your generation call. It queries the internal Stepper(s) to determine valid next tokens based on the StateMachine and masks the scores (logits) tensor, setting the logits of invalid tokens to negative infinity.
Parameters:
- input_ids: The input token IDs tensor/array provided by the generation framework.
- scores: The logits tensor/array (typically of shape (batch_size, vocab_size)) produced by the LLM for the current step.
Returns:
- The modified logits tensor/array with invalid tokens masked.
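Framework specifics aside, the core masking step can be sketched in pure Python (a hypothetical stand-in; the real engine derives the valid-token set from its Steppers and operates on tensors):

```python
import math

def mask_invalid_tokens(scores: list[float], valid_token_ids: set[int]) -> list[float]:
    """Set the logit of every token the grammar disallows to -inf, so the
    subsequent softmax/sampling step can never select it."""
    return [
        score if token_id in valid_token_ids else -math.inf
        for token_id, score in enumerate(scores)
    ]

# Toy vocabulary of 5 tokens; suppose the grammar only allows tokens 1 and 3.
scores = [0.5, 2.0, -1.0, 0.7, 3.0]
masked = mask_invalid_tokens(scores, valid_token_ids={1, 3})
print(masked)  # tokens 0, 2, and 4 are now -inf; 1 and 3 keep their logits
```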
sample
sample(
logprobs: Any, # Framework-specific tensor/array
sampler: Callable[..., Any]
) -> Any # Framework-specific tensor/array
The sampling hook. This method should be used as the sampler function in your generation call. It takes the processed logits (logprobs) and calls the provided base sampler function. It then checks the validity of the sampled token(s), resampling up to max_resample_attempts times if necessary, handles multi-token processing, advances the internal Stepper(s), and returns the final chosen token ID(s).
Parameters:
- logprobs: The processed logits tensor/array (shape (batch_size, vocab_size)) after process_logits has been applied.
- sampler: The base sampling function from your framework (e.g., torch.multinomial, jax.random.categorical, tf.random.stateless_categorical), which takes the logits and returns sampled token ID(s).
Returns:
- A tensor/array containing the chosen token ID(s) for the current step.
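The resampling behaviour can be sketched as follows (a hypothetical stand-in; the real method also advances the Steppers and handles multi-token sequences):

```python
def sample_with_resampling(sampler, is_valid, max_resample_attempts: int = 5):
    """Draw tokens from the base sampler until one is grammar-valid,
    giving up after max_resample_attempts draws."""
    token = None
    for _ in range(max_resample_attempts):
        token = sampler()
        if is_valid(token):
            return token
    return token  # caller may fall back to forcing a valid token

# Fake base sampler that yields a fixed sequence of draws: 7 and 9 are
# rejected by the grammar check, 3 is accepted on the third attempt.
draws = iter([7, 9, 3])
token = sample_with_resampling(lambda: next(draws), is_valid=lambda t: t == 3)
print(token)  # 3
```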
get_structured_output
get_structured_output(
output_type: type[OutputType] | None = None,
raise_on_error: bool = False
) -> OutputType | Any
Retrieves the final generated output string from the engine's state, parses it (primarily as JSON), and optionally validates/casts it to a specified Python type (like a Pydantic model).
Parameters:
- output_type (type[OutputType] | None, optional): The target Python type (e.g., a Pydantic BaseModel subclass) to parse and validate the output against. If None, the raw parsed JSON object (or string, if JSON parsing fails) is returned. Defaults to None.
- raise_on_error (bool, optional): If True, raises an error if JSON parsing or Pydantic validation fails. If False, logs the error and returns the raw string or partially parsed object. Defaults to False.
Returns:
- An instance of output_type if provided and validation succeeds; otherwise the parsed JSON object, or the raw string if parsing fails and raise_on_error is False.
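The parse-then-validate behaviour can be sketched with the standard library (a hypothetical stand-in for the real method, which reads the raw string from the engine's own state):

```python
import json

def parse_structured_output(raw: str, output_type=None, raise_on_error: bool = False):
    """Parse raw engine output as JSON, optionally casting to output_type;
    on failure, either raise or fall back to the raw string."""
    try:
        parsed = json.loads(raw)
        # A Pydantic-style output_type can be constructed from the parsed dict.
        return output_type(**parsed) if output_type is not None else parsed
    except (json.JSONDecodeError, TypeError) as exc:
        if raise_on_error:
            raise
        return raw  # degrade gracefully to the unparsed string

print(parse_structured_output('{"name": "Ada"}'))  # {'name': 'Ada'}
print(parse_structured_output("not json at all"))  # returned unchanged
```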
get_stateful_structured_output
get_stateful_structured_output(
output_type: type[OutputType] | None = None,
raise_on_error: bool = False
) -> Iterator[tuple[str, OutputType | Any]]
Retrieves the generated output segmented by the state identifier that produced each part. Useful for complex state machines (like in PBA) where different parts of the output correspond to different logical steps (e.g., "thinking", "tool_call").
Parameters:
- output_type (type[OutputType] | None, optional): The target Python type to parse/validate each segment against (applied individually). Defaults to None.
- raise_on_error (bool, optional): Whether to raise errors during parsing/validation of segments. Defaults to False.
Returns:
- An iterator yielding tuples of (state_identifier: str, parsed_output: OutputType | Any).
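Conceptually the method walks the recorded (state, text) segments and parses each one individually; a hypothetical sketch:

```python
import json
from typing import Any, Iterator

def iter_stateful_output(
    segments: list[tuple[str, str]],
) -> Iterator[tuple[str, Any]]:
    """Yield (state_identifier, parsed_output) per recorded segment,
    falling back to the raw string when a segment is not JSON."""
    for state_id, raw in segments:
        try:
            yield state_id, json.loads(raw)
        except json.JSONDecodeError:
            yield state_id, raw

# Illustrative segments from a run with "thinking" and "tool_call" states.
segments = [
    ("thinking", "Let me check the weather."),
    ("tool_call", '{"name": "get_weather", "arguments": {"city": "Oslo"}}'),
]
for state, value in iter_stateful_output(segments):
    print(state, value)
```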
get_live_structured_output
get_live_structured_output() -> tuple[str, str] | None
Attempts to retrieve the current, potentially incomplete output being generated, along with the identifier of the state currently being processed. Useful for streaming or live display. Relies on Stepper.get_token_safe_output.
Returns:
- A tuple (state_identifier, current_output_string) if available, otherwise None.
reset
reset(
hard_reset: bool = False
) -> None
Resets the engine's internal state, clearing active Steppers. If hard_reset is True, it also removes the configured StateMachine.
Parameters:
- hard_reset (bool, optional): If True, removes the configured StateMachine in addition to resetting steppers. Defaults to False.
Properties
- has_reached_accept_state (bool, read-only): Returns True if any of the active Steppers have reached a valid end state according to the configured StateMachine.
- state_machine (StateMachine | None): The currently configured root StateMachine instance. Can be set directly or via configure.
- steppers (list[Stepper]): The list of currently active Stepper objects representing the engine's state within the StateMachine. Can be read or set directly (advanced use).
- vocabulary (TrieMap, read-only): The vocabulary map (string to token ID list) used by the engine, derived from the tokenizer.
- reverse_vocabulary (dict[int, str], read-only): The reverse vocabulary map (token ID to string).
- multi_token_mapping (dict[int, list[int]]): The internal mapping used for multi-token processing. Can be read or set (advanced use).
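The relationship between vocabulary and reverse_vocabulary can be illustrated with a toy mapping (hypothetical token strings and IDs; the real maps come from the tokenizer):

```python
# Toy vocabulary: token string -> token ID list, as in the engine's TrieMap.
vocabulary = {"{": [90], ":": [25], '"': [5]}

# Inverting it yields the reverse map (token ID -> string) the engine uses
# to decode sampled IDs back into text during validity checks.
reverse_vocabulary = {
    token_id: piece
    for piece, token_ids in vocabulary.items()
    for token_id in token_ids
}
print(reverse_vocabulary[90])  # {
```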