Token Healing

Token Healing is a robustness feature in the Proxy Structuring Engine (PSE) that handles minor discrepancies between the exact token a large language model (LLM) generates and the canonical token the grammar (StateMachine) expects. It lets PSE recover gracefully when the generated token contains slight variations (such as extra spaces or minor artifacts) but still clearly corresponds to a valid structural continuation.

The Problem: Tokenization Artifacts

LLMs operate on tokens, and sometimes the tokenization process introduces subtle variations compared to the clean strings defined in a grammar.

Example:

  • A StateMachine expects the exact string "example" as the next valid token.
  • The vocabulary contains a canonical token for "example" (e.g., ID 456).
  • However, due to tokenization quirks, the LLM might generate a slightly different token, like ID 123, which decodes to " example " (with leading/trailing spaces).

Without Token Healing, when the engine receives " example ", the StateMachine might only consume the "example" part, leaving the spaces as remaining_input. This could cause the generation path to be incorrectly penalized or discarded, even though the LLM likely intended to produce the correct structural element. Note that ordinary multi-token sequences (such as generating "exam" then "ple" to form "example") are handled naturally by the Stepper's recursive consumption and are not the target of Token Healing.
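
To make the mismatch concrete, here is a minimal sketch at the level of plain strings (not the PSE API), reusing the hypothetical token IDs from the example above:

    #include <iostream>
    #include <string>
    #include <unordered_map>

    int main() {
        // Hypothetical slice of a tokenizer vocabulary, reusing the IDs
        // from the example above.
        std::unordered_map<int, std::string> decode = {
            {456, "example"},   // canonical token the grammar expects
            {123, " example "}, // artifact variant the model may emit
        };

        const std::string expected = "example";
        const std::string sampled  = decode.at(123);

        // The strings differ only by whitespace, yet an exact match fails.
        std::cout << std::boolalpha
                  << (sampled == expected) << '\n'                           // false
                  << (sampled.find(expected) != std::string::npos) << '\n'; // true
    }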

How Token Healing Works

Token Healing activates specifically when a call to stepper->consume(token) results in a stepper state where remaining_input is not null. This indicates the provided token was longer than what the current StateMachine step could fully process.
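
As a rough illustration, the trigger condition looks like the following sketch. The Stepper here is stubbed down to a single expected string, and only the trailing-space artifact is modeled, since a stub this small has no way to recover from a leading space; the real consume and remaining_input signatures differ:

    #include <optional>
    #include <string>

    // Stubbed stand-in for the real Stepper, just to show the trigger.
    struct Stepper {
        std::string expected = "example"; // what the grammar wants next

        // Consume the longest valid prefix; report what is left over.
        std::optional<std::string> consume(const std::string& token) const {
            if (token.rfind(expected, 0) == 0)        // token starts with it
                return token.substr(expected.size()); // remaining_input
            return std::nullopt;                      // no valid transition
        }
    };

    int main() {
        Stepper stepper;
        auto remaining = stepper.consume("example "); // trailing-space artifact
        if (remaining && !remaining->empty()) {
            // Partial consumption: exactly the state in which Token
            // Healing activates, via the steps below.
        }
    }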

  1. Partial Consumption: The StateMachine consumes the longest valid prefix it can from the input token. The unconsumed part is left as remaining_input.
  2. Prefix Identification: The portion of the input token that was successfully consumed is identified (let's call this consumed_prefix).
  3. Vocabulary Lookup: PSE performs an efficient lookup in the vocabulary (the HAT-trie provided to the Engine) using vocab->longest_prefix(consumed_prefix). This search finds the longest token string stored in the vocabulary that starts with consumed_prefix (a simplified version of steps 2 through 6 appears in the sketch after this list).
  4. Canonical Match Found: If the lookup finds such a token in the vocabulary (let's call it canonical_token), PSE assumes this canonical_token was the LLM's true intention.
  5. Healing Applied:
    • A new StepperDelta is created, associating the transition with the canonical_token string found in the vocabulary (not the original, slightly deviant input token).
    • This StepperDelta is marked with was_healed = true.
    • The remaining_input on the resulting Stepper is cleared, as the healing process assumes the full canonical_token was intended and processed.
  6. Path Selection: When choosing the best path forward (StepperDelta::choose_best_path), the engine prioritizes paths that did not require healing (was_healed = false). Healing serves as a fallback to recover potentially valid paths that would otherwise be discarded due to minor token deviations.
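
Taken together, steps 2 through 6 amount to roughly the following sketch. Everything here is a simplified stand-in: PSE's real vocabulary is a HAT-trie, replaced by a flat vector scan so the example stays self-contained, and only the names StepperDelta and was_healed, plus the longest-prefix semantics, come from the description above:

    #include <optional>
    #include <string>
    #include <vector>

    struct StepperDelta {
        std::string token;       // token string credited for the transition
        bool was_healed = false; // set when healing substituted the token
    };

    // Step 3: longest vocabulary entry that starts with consumed_prefix.
    std::optional<std::string> longest_prefix_match(
            const std::vector<std::string>& vocab,
            const std::string& consumed_prefix) {
        std::optional<std::string> best;
        for (const auto& tok : vocab)
            if (tok.rfind(consumed_prefix, 0) == 0 && // starts with prefix
                (!best || tok.size() > best->size()))
                best = tok;
        return best;
    }

    // Steps 4-5: substitute the canonical token and flag the delta.
    std::optional<StepperDelta> try_heal(const std::vector<std::string>& vocab,
                                         const std::string& consumed_prefix) {
        if (auto canonical = longest_prefix_match(vocab, consumed_prefix))
            return StepperDelta{*canonical, /*was_healed=*/true};
        return std::nullopt; // no canonical match: the path is not recovered
    }

    // Step 6: prefer paths that never needed healing.
    const StepperDelta* choose_best_path(const std::vector<StepperDelta>& paths) {
        for (const auto& p : paths)
            if (!p.was_healed) return &p;
        return paths.empty() ? nullptr : &paths.front();
    }

    int main() {
        const std::vector<std::string> vocab = {"exam", "example", " example "};

        // Step 1 happened elsewhere: " example " was partially consumed,
        // leaving "example" as the consumed prefix.
        auto healed = try_heal(vocab, "example");
        // healed->token == "example", healed->was_healed == true

        // With no unhealed alternative, the healed path is the fallback.
        std::vector<StepperDelta> paths;
        if (healed) paths.push_back(*healed);
        const StepperDelta* best = choose_best_path(paths);
        (void)best;
    }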

Benefits

  • Resilience to Tokenization Noise: Makes generation robust against common tokenization artifacts like inconsistent spacing around words or punctuation.
  • Reduced False Negatives: Prevents valid generation paths from being prematurely abandoned due to minor, recoverable token mismatches.
  • Improved Reliability: Increases the overall success rate of generating complex, guaranteed structures, especially with models or tokenizers prone to minor inconsistencies.

Token Healing enhances PSE's practical reliability by bridging the gap between the precise requirements of the StateMachine and the sometimes noisy reality of token-based LLM generation. It operates automatically when a vocabulary is provided to the engine.
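
For completeness, wiring this up might look as follows. The tsl::htrie_map calls match the public API of one widely used C++ HAT-trie implementation (Tessil/hat-trie), but the Engine type shown is a placeholder, not PSE's actual interface:

    #include <tsl/htrie_map.h>

    // Placeholder engine type; the real PSE engine differs. The point is
    // only that no extra configuration is needed: supplying a vocabulary
    // enables Token Healing automatically.
    struct Engine {
        explicit Engine(const tsl::htrie_map<char, int>& vocab) { (void)vocab; }
    };

    int main() {
        // Map every decoded token string to its token ID.
        tsl::htrie_map<char, int> vocab;
        vocab.insert("example", 456);
        vocab.insert(" example ", 123);

        Engine engine(vocab); // Token Healing is now active
    }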