Defining Structure: Schema Sources
The Proxy Structuring Engine (PSE) is designed to be flexible, allowing you to define the desired output structure using several common and convenient methods in Python. The StructuringEngine.configure()
method accepts various types as its structure
argument.
Supported Schema Sources
-
Pydantic Models:
- How: Pass a Pydantic
BaseModel
class directly. - Mechanism: PSE introspects the model's fields, types, validators, default values, and descriptions to automatically generate the corresponding JSON Schema, which is then converted into a
StateMachine
. It respects nested models, standard Python types (str
,int
,list
,dict
, etc.),Union
,Optional
,Literal
, and more. Field descriptions from the model or its docstring are used. - Use Case: Ideal when you already use Pydantic for data validation or prefer a Python-native way to define complex, typed structures. This is often the most convenient method for defining nested JSON objects.
- Example:
from pydantic import BaseModel class User(BaseModel): name: str id: int engine.configure(User)
- How: Pass a Pydantic
-
JSON Schema (Dictionary):
- How: Pass a Python dictionary representing a valid JSON Schema object.
- Mechanism: PSE directly interprets the standard JSON Schema keywords (
type
,properties
,items
,required
,enum
,pattern
,minLength
,format
,anyOf
,$ref
, etc.) and converts them into the equivalentStateMachine
structure. It supports local$defs
and$ref
for defining and reusing schema components. - Use Case: Useful when working with existing JSON Schemas, integrating with systems that produce/consume JSON Schema, or when needing schema features not directly representable in Pydantic (though PSE's Pydantic conversion is quite comprehensive).
- Example:
user_schema = { "type": "object", "properties": { "name": {"type": "string"}, "id": {"type": "integer"} }, "required": ["name", "id"] } engine.configure(user_schema)
-
Function Signatures (Callable):
- How: Pass a Python function or callable.
- Mechanism: PSE introspects the function's signature (parameter names, type hints, default values) and parses its docstring (using
docstring-parser
) to generate a JSON Schema representing the function's arguments. This schema is then converted to aStateMachine
. It aims to produce a structure suitable for generating function call arguments. - Use Case: Convenient for generating structured arguments intended for calling a specific Python function, leveraging existing type hints and docstrings.
- Example:
def get_weather(city: str, unit: str = "celsius"): """Fetches weather for a city.""" # ... function body ... engine.configure(get_weather) # PSE will expect JSON like: {"city": "...", "unit": "..."}
-
Sequence of Schemas (
list
ortuple
):- How: Pass a list or tuple where each element is one of the above supported types (Pydantic model, JSON Schema dict, Callable).
- Mechanism: PSE interprets this as an "anyOf" or "oneOf" condition. It generates the schema for each item in the sequence and combines them using an
AnyStateMachine
. The LLM output must conform to at least one of the provided schemas. - Use Case: When the LLM's output could validly take one of several different structures depending on the context or input.
- Example:
from pydantic import BaseModel class User(BaseModel): # ... class Product(BaseModel): # ... engine.configure([User, Product]) # PSE expects output matching either User OR Product schema
-
Direct
StateMachine
Instance:- How: Pass an instance of a
StateMachine
class (either a base type likeChainStateMachine
or a custom subclass). - Mechanism: PSE uses the provided
StateMachine
directly without any conversion. - Use Case: For advanced users who need fine-grained control over the grammar or want to implement complex, non-standard structures or control flows by composing base
StateMachine
types. - Example:
from pse.types.base import ChainStateMachine, PhraseStateMachine custom_sm = ChainStateMachine([PhraseStateMachine("START"), PhraseStateMachine("END")]) engine.configure(custom_sm)
- How: Pass an instance of a
By supporting these diverse sources, PSE allows developers to choose the most natural and efficient way to define their desired output structure based on their existing code, data models, or specific grammatical requirements.