Data Type State Machines

PSE provides specialized state machines for handling common data types, allowing precise control over the format and content of generated values.

StringStateMachine

StringStateMachine handles JSON-style string parsing with double quotation marks and proper escaping of special characters.

from pse.types.string import StringStateMachine

# Create a basic string state machine that matches quoted strings like "hello"
string_sm = StringStateMachine()

# Create a string state machine with length constraints
constrained_string_sm = StringStateMachine(
    min_length=3,  # Minimum string length (excluding quotes)
    max_length=50  # Maximum string length (excluding quotes)
)

Implementation details: - State machine structure: - Initial state (0): Expects opening double quote (") - String contents state (1): Processes regular characters - Escape sequence state (2): Handles escape sequences - Unicode escape state (3): Processes \uXXXX Unicode escapes

Escape sequence handling:
Standard JSON escape sequences: \", \, \/, \b, \f, \n, \r, \t
Unicode escape sequences: \uXXXX (4 hex digits)
Automatically rejects invalid escape sequences
Character handling:
Properly validates all string content
Automatically rejects control characters that must be escaped
Maintains proper JSON string compliance

StringSchemaStateMachine extension: For JSON Schema integration, the StringSchemaStateMachine adds pattern and format validation:

from pse.types.json.json_string import StringSchemaStateMachine

# Create a string state machine with JSON Schema constraints
email_string_sm = StringSchemaStateMachine(
    schema={"type": "string", "format": "email"},
    context={}
)

# Create a string state machine with pattern validation
pattern_string_sm = StringSchemaStateMachine(
    schema={"type": "string", "pattern": "^[A-Z]{2}-\\d{4}$"},  # Format like "AB-1234"
    context={}
)

Example: String with escape sequences

from pse.types.string import StringStateMachine

# Define a string state machine
string_sm = StringStateMachine()

# This will match: "Hello\nWorld" and correctly handle the newline
# The value returned will be the string without quotes: Hello\nWorld

Example: Unicode handling

from pse.types.string import StringStateMachine

# Define a string state machine
string_sm = StringStateMachine()

# This will match: "Hello\u0057orld" and correctly interpret it as "HelloWorld"
# This also matches: "Hello 世界" as Unicode is supported via direct inclusion or escapes

Example: Length validation

from pse.types.string import StringStateMachine

# Create a string state machine with length constraints
username_sm = StringStateMachine(min_length=3, max_length=20)

# This will match: "user123"
# This will not match: "a" (too short)
# This will not match: "this_username_is_way_too_long" (too long)

NumberStateMachine

NumberStateMachine parses JSON-compliant numeric values, handling integers, decimals, and scientific notation.

from pse.types.number import NumberStateMachine

# Create a basic number state machine for all numeric formats
number_sm = NumberStateMachine()

Implementation details: - State machine structure: - State 0: Starting state that optionally accepts a negative sign (-) - State 1: Expects an integer part (mandatory) - State 2: Accept state for integers, can also transition to decimal part - State 3: Decimal part state (after the decimal point), also an accept state - State 4: Exponential notation marker 'e' or 'E' - State 5: Optional sign for the exponent (+ or -) - Final state: Exponent integer value, if present

Decimal handling:
Properly handles decimal points followed by integer part
Maintains leading zeros after decimal point (e.g., 0.001)
Validates full decimal number structure
Scientific notation:
Supports 'e' or 'E' notation (e.g., 1.5e3, 2E-4)
Handles optional sign in exponent
Properly combines base number with exponent for final value

NumberSchemaStateMachine extension: For JSON Schema integration, use the NumberSchemaStateMachine to add validation constraints:

from pse.types.json.json_number import NumberSchemaStateMachine

# Create a number state machine with JSON Schema constraints
constrained_number_sm = NumberSchemaStateMachine(
    schema={
        "type": "number",
        "minimum": 0,
        "maximum": 100,
        "exclusiveMaximum": True
    },
    context={}
)

# Create an integer-only state machine
integer_number_sm = NumberSchemaStateMachine(
    schema={
        "type": "integer",
        "multipleOf": 5
    },
    context={}
)

Example: Basic number parsing

from pse.types.number import NumberStateMachine

# Define a number state machine
number_sm = NumberStateMachine()

# This will match: "42" (integer)
# This will match: "-3.14" (negative decimal)
# This will match: "1.618e2" (scientific notation, equal to 161.8)

Example: JSON Schema validation

from pse.types.json.json_number import NumberSchemaStateMachine

# Define a number state machine with constraints
temperature_sm = NumberSchemaStateMachine(
    schema={
        "type": "number",
        "minimum": -273.15,  # Absolute zero in Celsius
        "maximum": 1000
    },
    context={}
)

# This will match: "25.5" (valid temperature)
# This will not match: "-300" (below absolute zero)
# This will not match: "1200" (above maximum)

Example: Integer validation

from pse.types.json.json_number import NumberSchemaStateMachine

# Define an integer state machine with constraints
page_number_sm = NumberSchemaStateMachine(
    schema={
        "type": "integer",
        "minimum": 1
    },
    context={}
)

# This will match: "42" (valid page number)
# This will not match: "2.5" (not an integer)
# This will not match: "0" (below minimum)

IntegerStateMachine

IntegerStateMachine is a specialized state machine for parsing positive integer values with configurable handling of leading zeros.

from pse.types.integer import IntegerStateMachine

# Create a basic integer state machine
int_sm = IntegerStateMachine()

# Create an integer state machine that preserves leading zeros
raw_int_sm = IntegerStateMachine(drop_leading_zeros=False)

Implementation details: - Extends CharacterStateMachine with digit characters (0-9) - Only accepts positive integers (no negative sign support) - By default, removes leading zeros and returns an int value - When drop_leading_zeros=False, preserves the original string representation

Example: Basic integer parsing

from pse.types.integer import IntegerStateMachine

# Define an integer state machine
int_sm = IntegerStateMachine()

# This will match: "42" -> 42 (int)
# This will match: "007" -> 7 (int, leading zeros removed)
# This will not match: "-5" (negative not supported)
# This will not match: "3.14" (decimals not supported)

Example: Preserving leading zeros

from pse.types.integer import IntegerStateMachine

# Create an integer state machine that preserves leading zeros
raw_int_sm = IntegerStateMachine(drop_leading_zeros=False)

# This will match: "42" -> "42" (string)
# This will match: "007" -> "007" (string, leading zeros preserved)
# This will not match: "-5" (negative not supported)
# This will not match: "3.14" (decimals not supported)

Note: For more complex integer requirements (negative numbers, range validation), use NumberSchemaStateMachine with a JSON Schema:

from pse.types.json.json_number import NumberSchemaStateMachine

# Create a state machine for integers with validation constraints
integer_sm = NumberSchemaStateMachine(
    schema={
        "type": "integer",
        "minimum": 1,
        "maximum": 1000
    },
    context={}
)

BooleanStateMachine

BooleanStateMachine parses JSON-compliant boolean literals ("true"/"false").

from pse.types.boolean import BooleanStateMachine

# Create a boolean state machine
bool_sm = BooleanStateMachine()

Implementation details: - State machine structure: - Initial state (0): Expects exact match of "true" or "false" literals - Uses PhraseStateMachine for exact string matching - Transitions directly to accept states with corresponding boolean values - Returns Python True or False values

JSON compliance:
Strictly case-sensitive (only "true"/"false", not "True"/"FALSE")
No whitespace allowed (unlike some other PSE state machines)
No partial matches or extra characters permitted

Example: Boolean parsing

from pse.types.boolean import BooleanStateMachine

# Define a boolean state machine
bool_sm = BooleanStateMachine()

# This will match: "true" -> True (Python bool)
# This will match: "false" -> False (Python bool)
# This will not match: "True" (incorrect case)
# This will not match: " true" (has whitespace)
# This will not match: "truthy" (extra characters)

Integration example:

from pse.types.boolean import BooleanStateMachine
from pse.types.json.json_object import ObjectSchemaStateMachine

# Using boolean state machine inside a JSON object
settings_sm = ObjectSchemaStateMachine(
    schema={
        "type": "object",
        "properties": {
            "enabled": {"type": "boolean"},
            "notifications": {"type": "boolean"}
        }
    },
    context={}
)

# This will match: {"enabled": true, "notifications": false}

ArrayStateMachine

ArrayStateMachine handles JSON array structures, processing sequences of elements enclosed in square brackets.

from pse.types.array import ArrayStateMachine

# Create a basic array state machine for JSON arrays
array_sm = ArrayStateMachine()

Implementation details: - State machine structure: - State 0: Initial state that accepts opening bracket "[" - State 1: Whitespace handling or immediate closing bracket for empty arrays - State 2: Value state that uses JsonStateMachine to parse elements - State 3: Whitespace handling after each value - State 4: Decision point (comma for next item or "]" to close array)

Array validation:
Enforces proper structure with opening/closing brackets
Ensures comma-separated values with no trailing commas
Properly validates nested array structures
Supports whitespace between array elements and brackets
Value collection:
Uses ArrayStepper to track parsed values
Maintains immutability by cloning steppers during processing
Returns a list of parsed JSON values

ArraySchemaStateMachine extension: For JSON Schema integration and type-specific arrays, use ArraySchemaStateMachine:

from pse.types.json.json_array import ArraySchemaStateMachine

# Create a typed array state machine for arrays of strings
string_array_sm = ArraySchemaStateMachine(
    schema={
        "type": "array",
        "items": {"type": "string"}
    },
    context={}
)

# Create an array with length constraints and unique items
constrained_array_sm = ArraySchemaStateMachine(
    schema={
        "type": "array",
        "items": {"type": "number"},
        "minItems": 1,
        "maxItems": 5,
        "uniqueItems": True
    },
    context={}
)

Example: Basic array parsing

from pse.types.array import ArrayStateMachine

# Define an array state machine
array_sm = ArrayStateMachine()

# This will match: "[]" -> [] (empty array)
# This will match: "[1, 2, 3]" -> [1, 2, 3] (array of numbers)
# This will match: '["a", "b", "c"]' -> ["a", "b", "c"] (array of strings)
# This will match: '[true, false]' -> [True, False] (array of booleans)

Example: Nested array parsing

from pse.types.array import ArrayStateMachine

# Define an array state machine
array_sm = ArrayStateMachine()

# This will match: "[[1, 2], [3, 4]]" -> [[1, 2], [3, 4]] (nested array)
# This will match: '[{"a": 1}, {"b": 2}]' -> [{"a": 1}, {"b": 2}] (array of objects)

Example: Schema validation

from pse.types.json.json_array import ArraySchemaStateMachine

# Create a schema-enforced array of integers
integers_array_sm = ArraySchemaStateMachine(
    schema={
        "type": "array",
        "items": {"type": "integer"},
        "minItems": 2
    },
    context={}
)

# This will match: "[1, 2, 3]" -> [1, 2, 3]
# This will not match: "[1]" (too few items)
# This will not match: "[1.5, 2]" (non-integer value)

ObjectStateMachine

ObjectStateMachine processes JSON object structures, handling key-value pairs enclosed in curly braces.

from pse.types.object import ObjectStateMachine

# Create a basic object state machine
object_sm = ObjectStateMachine()

# Create an object state machine that can be optional (empty objects allowed)
optional_object_sm = ObjectStateMachine(is_optional=True)

Implementation details: - State machine structure: - State 0: Initial state that accepts opening brace "{" - State 1: Whitespace handling after opening brace - State 2: Key-value pair processing using KeyValueStateMachine - State 3: Whitespace handling after key-value pair - State 4: Decision point (comma for next property or "}" to close object)

Object validation:
Enforces proper structure with opening/closing braces
Ensures comma-separated properties with no trailing commas
Properly handles nested object structures
Supports whitespace between properties and braces
Value collection:
Uses ObjectStepper to accumulate key-value pairs in a dictionary
Maintains immutability by cloning steppers during processing
Returns a dictionary of parsed property values
Parameters:
is_optional: When True, allows empty objects (default: False)

ObjectSchemaStateMachine extension: For JSON Schema integration and property validation, use ObjectSchemaStateMachine:

from pse.types.json.json_object import ObjectSchemaStateMachine

# Create a schema-based object state machine
person_object_sm = ObjectSchemaStateMachine(
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer", "minimum": 0},
            "email": {"type": "string", "format": "email"}
        },
        "required": ["name", "email"]
    },
    context={}
)

Example: Basic object parsing

from pse.types.object import ObjectStateMachine

# Define an object state machine
object_sm = ObjectStateMachine()

# This will match: "{}" -> {} (empty object, only if is_optional=True)
# This will match: '{"a": 1}' -> {"a": 1} (simple object)
# This will match: '{"name": "Alice", "age": 30}' -> {"name": "Alice", "age": 30} (multiple properties)

Example: Nested object parsing

from pse.types.object import ObjectStateMachine

# Define an object state machine
object_sm = ObjectStateMachine()

# This will match complex nested objects:
# {
#   "user": {
#     "name": "Alice",
#     "address": {
#       "city": "New York",
#       "zip": "10001"
#     }
#   }
# }

Example: Working with object values

from pse.types.object import ObjectStateMachine

object_sm = ObjectStateMachine()
input_str = '{"a": 1, "b": true, "c": "hello"}'

# Parse the object string
steppers = object_sm.get_steppers()
for char in input_str:
    steppers = object_sm.advance_all_steppers(steppers, char)

# Get the parsed object from the first valid stepper
if steppers:
    parsed_object = steppers[0].value  # {"a": 1, "b": True, "c": "hello"}

    # Access individual properties
    value_a = parsed_object["a"]  # 1

KeyValueStateMachine

KeyValueStateMachine handles JSON key-value pairs within objects, processing the structure of "key": value.

from pse.types.key_value import KeyValueStateMachine

# Create a basic key-value pair state machine
kv_sm = KeyValueStateMachine()

# Create an optional key-value pair state machine
optional_kv_sm = KeyValueStateMachine(is_optional=True)

Implementation details: - State machine structure: - By default, uses a sequence of five state machines: 1. StringStateMachine() for the key 2. WhitespaceStateMachine() for whitespace after key 3. PhraseStateMachine(":") for the colon separator 4. WhitespaceStateMachine() for whitespace after colon 5. JsonStateMachine() for the value - Transitions through each component in sequence - Returns a tuple of (key, value) upon successful parsing

Parameters:
sequence: Optional customized sequence of state machines (default: as described above)
is_optional: Whether the key-value pair is optional (default: False)
Key-value tracking:
Uses KeyValueStepper to track property name and value
Extracts key using json.loads() to handle quote removal and escaping
Value is processed by JsonStateMachine and can be any valid JSON type
Maintains immutability through proper stepper cloning

Example: Basic key-value parsing

from pse.types.key_value import KeyValueStateMachine

# Define a key-value state machine
kv_sm = KeyValueStateMachine()

# This will match: '"name": "Alice"' -> ("name", "Alice")
# This will match: '"age": 30' -> ("age", 30)
# This will match: '"active": true' -> ("active", True)
# This will match: '"data": {"x": 1, "y": 2}' -> ("data", {"x": 1, "y": 2})

Example: Using key-value pairs in objects

from pse.types.object import ObjectStateMachine
from pse.types.key_value import KeyValueStateMachine

# ObjectStateMachine uses KeyValueStateMachine internally
object_sm = ObjectStateMachine()

# This uses KeyValueStateMachine to parse each property
input_str = '{"name": "Alice", "age": 30}'

steppers = object_sm.get_steppers()
for char in input_str:
    steppers = object_sm.advance_all_steppers(steppers, char)

# Result is a dictionary built from key-value pairs
result = steppers[0].value  # {"name": "Alice", "age": 30}

KeyValueSchemaStateMachine extension: For JSON Schema validation of property pairs, the schema-aware extension provides additional validation:

from pse.types.json.json_key_value import KeyValueSchemaStateMachine

# Create a schema-validated key-value pair
email_property_sm = KeyValueSchemaStateMachine(
    property_name="email",
    property_schema={"type": "string", "format": "email"},
    context={},
    required=True
)

# This will match: '"email": "user@example.com"'
# This will not match: '"email": 123' (invalid type)
# This will not match: '"email": "invalid"' (invalid format)

WhitespaceStateMachine

WhitespaceStateMachine handles whitespace characters (spaces, tabs, newlines, carriage returns) in structured formats.

from pse.types.whitespace import WhitespaceStateMachine

# Create a whitespace state machine with default settings
# This makes whitespace optional (min=0) with a maximum of 20 characters
ws_sm = WhitespaceStateMachine()

# Create a whitespace state machine with specific constraints
required_ws_sm = WhitespaceStateMachine(
    min_whitespace=1,  # Require at least one whitespace character
    max_whitespace=10  # Allow at most 10 whitespace characters
)

Implementation details: - State machine structure: - Extends CharacterStateMachine with whitespace characters (" \t\n\r") - Accepts consecutive whitespace characters up to max_whitespace - Becomes optional when min_whitespace is set to 0 (default) - Returns the matched whitespace characters as a string

Parameters:
min_whitespace: Minimum required whitespace characters (default: 0)
max_whitespace: Maximum allowed whitespace characters (default: 20)
When min_whitespace=0, acts as an optional whitespace matcher
Matching behavior:
Greedy matching: consumes as many whitespace characters as possible
Stops consuming when encountering non-whitespace characters
Successfully validates when at least min_whitespace characters have been consumed
Rejects input when more than max_whitespace characters are encountered

Example: Basic whitespace handling

from pse.types.whitespace import WhitespaceStateMachine

# Define a whitespace state machine
ws_sm = WhitespaceStateMachine()

# This will match: "" (empty string, since min_whitespace=0)
# This will match: " " (single space)
# This will match: "\t\n  " (mixed whitespace)
# This will not match: "a" (non-whitespace character)

Example: Required whitespace

from pse.types.whitespace import WhitespaceStateMachine

# Define a whitespace state machine requiring at least one character
required_ws_sm = WhitespaceStateMachine(min_whitespace=1)

# This will not match: "" (empty string)
# This will match: " " (single space)
# This will match: "\t\n  " (mixed whitespace)

Example: Integration with other state machines

from pse.types.base.chain import ChainStateMachine
from pse.types.base.phrase import PhraseStateMachine
from pse.types.whitespace import WhitespaceStateMachine

# Create a state machine that parses "key = value" format
assignment_sm = ChainStateMachine([
    PhraseStateMachine("key"),         # Match "key"
    WhitespaceStateMachine(),          # Optional whitespace
    PhraseStateMachine("="),           # Match "="
    WhitespaceStateMachine(),          # Optional whitespace
    PhraseStateMachine("value")        # Match "value"
])

# This will match: "key=value"
# This will match: "key = value"
# This will match: "key  =  value"

Common usage in PSE: - Handles optional whitespace between tokens in structured formats - Creates flexible parsers that handle various formatting styles - Enables human-readable output with proper spacing - Used extensively in JSON, XML, and other format-specific state machines