# Token Healing
Token Healing is a robustness feature within the Proxy Structuring Engine (PSE) designed to handle minor discrepancies between the exact token generated by the language model (LLM) and the canonical token expected by the grammar (`StateMachine`). It allows PSE to recover gracefully when the generated token contains slight variations (such as extra spaces or minor artifacts) but still clearly corresponds to a valid structural continuation.
## The Problem: Tokenization Artifacts
LLMs operate on tokens, and sometimes the tokenization process introduces subtle variations compared to the clean strings defined in a grammar.
Example:

* A `StateMachine` expects the exact string `"example"` as the next valid token.
* The vocabulary contains a canonical token for `"example"` (e.g., ID 456).
* However, due to tokenization quirks, the LLM might generate a slightly different token, such as ID 123, which decodes to `" example "` (with leading/trailing spaces).
Without Token Healing, when the engine receives `" example "`, the `StateMachine` might only consume the `"example"` part, leaving the spaces as `remaining_input`. This could cause the generation path to be incorrectly penalized or discarded, even though the LLM likely intended to produce the correct structural element. Standard multi-token sequences (such as generating `"exam"` then `"ple"` to form `"example"`) are handled naturally by the Stepper's recursive consumption and are not the target of Token Healing.
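The partial-consumption failure mode can be modeled in a few lines of Python. This is a toy sketch, not PSE's actual C++ implementation: `consume` here is a hypothetical stand-in for a single `StateMachine` step matching a literal string.

```python
def consume(expected: str, token: str) -> tuple[str, str]:
    """Match `token` against the literal `expected`, character by character.

    Returns (consumed_prefix, remaining_input) -- a toy stand-in for one
    StateMachine step.
    """
    i = 0
    while i < len(token) and i < len(expected) and token[i] == expected[i]:
        i += 1
    return token[:i], token[i:]

# The LLM emits a token with a trailing space instead of the clean literal:
consumed, remaining = consume("example", "example ")
print(consumed, repr(remaining))  # "example" is consumed; " " is left over
```

Without healing, the non-empty `remaining` would leave this otherwise-correct path carrying `remaining_input`, and it could be penalized or discarded.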
## How Token Healing Works
Token Healing activates specifically when a call to `stepper->consume(token)` results in a stepper state where `remaining_input` is not null. This indicates the provided `token` was longer than what the current `StateMachine` step could fully process.
- **Partial Consumption:** The `StateMachine` consumes the longest valid prefix it can from the input `token`. The unconsumed part is left as `remaining_input`.
- **Prefix Identification:** The portion of the input `token` that was successfully consumed is identified (let's call this `consumed_prefix`).
- **Vocabulary Lookup:** PSE performs an efficient lookup in the vocabulary (the HAT-trie provided to the `Engine`) using `vocab->longest_prefix(consumed_prefix)`. This search finds the longest token string stored in the vocabulary that starts with `consumed_prefix`.
- **Canonical Match Found:** If the lookup finds such a token in the vocabulary (let's call it `canonical_token`), PSE assumes this `canonical_token` was the LLM's true intention.
- **Healing Applied:**
  - A new `StepperDelta` is created, associating the transition with the `canonical_token` string found in the vocabulary (not the original, slightly deviant input `token`).
  - This `StepperDelta` is marked with `was_healed = true`.
  - The `remaining_input` on the resulting `Stepper` is cleared, as the healing process assumes the full `canonical_token` was intended and processed.
- **Path Selection:** When choosing the best path forward (`StepperDelta::choose_best_path`), the engine prioritizes paths that did not require healing (`was_healed = false`). Healing serves as a fallback to recover potentially valid paths that would otherwise be discarded due to minor token deviations.
## Benefits
- Resilience to Tokenization Noise: Makes generation robust against common tokenization artifacts like inconsistent spacing around words or punctuation.
- Reduced False Negatives: Prevents valid generation paths from being prematurely abandoned due to minor, recoverable token mismatches.
- Improved Reliability: Increases the overall success rate of generating complex, guaranteed structures, especially with models or tokenizers prone to minor inconsistencies.
Token Healing enhances PSE's practical reliability by bridging the gap between the precise requirements of the `StateMachine` and the sometimes noisy reality of token-based LLM generation. It operates automatically when a vocabulary is provided to the engine.