Schema System
The schema system in Airtrain provides a robust foundation for defining data structures, validating inputs/outputs, and ensuring consistency across tasks and AI employees. Each schema must be published and registered in the system before it can be used in any task or AI employee implementation.
Schema Definition
Schemas in Airtrain can be defined using two methods:
Using Pydantic Models
Pydantic provides a Python-native way to define schemas with type hints and validation:
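```python
from typing import List, Optional

from pydantic import BaseModel, Field


class UserProfile(BaseModel):
    user_id: str
    name: str
    age: int
    email: Optional[str] = None  # optional, matching the JSON Schema example
    interests: List[str] = Field(default_factory=list)  # optional, defaults to an empty list
```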
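A minimal sketch of how such a model validates incoming data, assuming Pydantic v2 (`model_validate` and `ValidationError` are Pydantic APIs; the sample payloads are invented for illustration). The model is restated so the block runs on its own; giving `email` and `interests` defaults makes them optional, so only `user_id`, `name`, and `age` are required, as in the JSON Schema example.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class UserProfile(BaseModel):
    user_id: str
    name: str
    age: int
    email: Optional[str] = None  # optional: may be omitted
    interests: List[str] = Field(default_factory=list)  # optional: defaults to []


# Well-formed payload: in Pydantic's default (lax) mode the string "36" is coerced to the int 36.
profile = UserProfile.model_validate({"user_id": "u-1", "name": "Ada", "age": "36"})
print(profile.age, profile.email, profile.interests)

# Malformed payload: "age" is missing, so validation fails before any task runs.
errors = []
try:
    UserProfile.model_validate({"user_id": "u-2", "name": "Grace"})
except ValidationError as exc:
    errors = exc.errors()
    for error in errors:
        print(error["loc"], error["type"])
```

Rejecting the payload at the boundary means a task never receives a profile with a missing age.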
Using JSON Schema
To define a schema directly, you can write it in JSON Schema format:
```json
{
  "type": "object",
  "properties": {
    "user_id": { "type": "string" },
    "name": { "type": "string" },
    "age": { "type": "integer" },
    "email": { "type": "string", "format": "email" },
    "interests": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["user_id", "name", "age"]
}
```
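Both methods can describe the same structure. Assuming Pydantic v2, `model_json_schema()` generates a JSON Schema from a model, which gives a quick way to check that a Pydantic model and a hand-written schema agree on properties and required fields. The generated document is not byte-for-byte identical: Pydantic adds `title` keys, represents `Optional[str]` as an `anyOf` that includes `null`, and emits `"format": "email"` only if the field uses Pydantic's `EmailStr` type (which needs the optional `email-validator` package).

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class UserProfile(BaseModel):
    user_id: str
    name: str
    age: int
    email: Optional[str] = None
    interests: List[str] = Field(default_factory=list)


hand_written = {
    "type": "object",
    "properties": {
        "user_id": {"type": "string"},
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string", "format": "email"},
        "interests": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["user_id", "name", "age"],
}

generated = UserProfile.model_json_schema()

# Same property names and the same required fields in both definitions.
same_properties = set(generated["properties"]) == set(hand_written["properties"])
same_required = sorted(generated["required"]) == sorted(hand_written["required"])
print(same_properties, same_required)
print(generated["required"])
```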
Why Use Schemas Instead of Direct Function Arguments
Although LLM providers such as OpenAI and Anthropic offer SDKs whose functions accept positional and keyword arguments directly, Airtrain deliberately uses Pydantic schemas for several critical reasons:
LLM-to-LLM Communication
When one LLM consumes the output of another, fixed input and output schemas with explicit validation rules make the hand-off much more reliable:
- Structured Output Parsing: Fixed schemas allow one LLM to easily generate valid input for the next LLM in a chain.
- Validation Guarantees: Schema validation ensures that outputs meet the expected structure before being passed to downstream processes.
- Constrained Generation: LLM providers like Anthropic, OpenAI, and Fireworks support structured output generation through constrained sampling when given explicit schemas.
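A minimal sketch of a validation gate between two chained LLM calls, assuming Pydantic v2. The upstream model's raw text is parsed with `model_validate_json` before anything reaches the downstream step. `SentimentResult`, `SummaryRequest`, `hand_off`, and the hard-coded raw outputs are hypothetical stand-ins for real model calls. The same model's `model_json_schema()` is the kind of document one would hand to a provider's structured-output feature, though the request parameter that carries it differs by provider.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class SentimentResult(BaseModel):
    """Hypothetical output schema of an upstream classification LLM."""
    label: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)


class SummaryRequest(BaseModel):
    """Hypothetical input schema of a downstream summarization LLM."""
    text: str
    tone: str


def hand_off(raw_output: str, text: str) -> Optional[SummaryRequest]:
    """Validate upstream output; only well-formed results become downstream input."""
    try:
        result = SentimentResult.model_validate_json(raw_output)
    except ValidationError:
        return None  # reject (or retry) instead of passing bad data along
    return SummaryRequest(text=text, tone=result.label)


# Schema document that could be sent to a provider for constrained generation.
schema_for_provider = SentimentResult.model_json_schema()

good = hand_off('{"label": "positive", "confidence": 0.93}', "Great product!")
bad = hand_off('{"label": "ecstatic", "confidence": 1.7}', "Great product!")
print(good)
print(bad)
```

The second output is rejected on two counts (an unknown label and an out-of-range confidence) without the downstream model ever seeing it.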
Scalability for Multiple Agents
The goal of Airtrain is to enable hundreds or even millions of LLM agents to run in parallel:
- Consistent Interfaces: When output schemas are clearly defined, it becomes much easier to manage and optimize communication between many agents.
- Predictable Processing: Fixed schemas allow for predictable processing pipelines across large-scale agent deployments.
- Optimization Opportunities: Known output structures enable advanced optimizations like:
  - Lookahead Techniques: Pre-computing likely responses based on schema knowledge
  - Speculative Decoding: Accelerating generation by predicting outputs based on schema constraints
  - Parallel Processing: Running multiple agents efficiently with known data exchange formats
Instead of arbitrary, free-form outputs that are difficult to manage at scale, schemas provide a structured approach that makes large-scale LLM agent systems practical and efficient.
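To make the optimization argument concrete: when the output schema is known, much of the serialized output is fixed before generation starts. The sketch below is plain Python and is not Airtrain's implementation. It splits a flat object schema into fixed text (braces, keys, separators) and value slots that actually have to be generated, so a decoder that knows the skeleton could emit the fixed parts without sampling them. It assumes every property is present, emitted in declaration order, and serialized with `json.dumps` default separators.

```python
import json
from typing import Any, Dict, List, Tuple

Part = Tuple[str, str]  # ("fixed", literal text) or ("slot", property name)


def output_skeleton(schema: Dict[str, Any]) -> List[Part]:
    """Split the serialized form of a flat object schema into fixed text and value slots."""
    parts: List[Part] = []
    fixed = "{"
    for i, name in enumerate(schema["properties"]):
        if i > 0:
            fixed += ", "
        fixed += json.dumps(name) + ": "
        parts.append(("fixed", fixed))
        parts.append(("slot", name))
        fixed = ""
    parts.append(("fixed", fixed + "}"))
    return parts


def render(parts: List[Part], values: Dict[str, Any]) -> str:
    """Fill each slot with its JSON-encoded value; fixed text is copied verbatim."""
    return "".join(
        text if kind == "fixed" else json.dumps(values[text]) for kind, text in parts
    )


schema = {
    "type": "object",
    "properties": {"label": {"type": "string"}, "score": {"type": "number"}},
}
skeleton = output_skeleton(schema)
values = {"label": "positive", "score": 0.9}
output = render(skeleton, values)
fixed_chars = sum(len(text) for kind, text in skeleton if kind == "fixed")
print(skeleton)
print(output)
print(f"{fixed_chars} of {len(output)} characters known before generation")
```

Only the slot contents depend on the model; everything else follows from the schema.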
Schema Publishing
Before a schema can be used in any task or AI employee, it must be published to the Airtrain system. This process:
- Validates the schema structure
- Generates a unique UUID for the schema
- Makes the schema available for reference
Publishing Process
- Submit schema for validation
- Receive unique schema ID
- Schema becomes available in the system registry
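The three publishing steps can be sketched with the standard library. `publish`, `validate_structure`, and the in-memory `REGISTRY` are hypothetical illustrations of the process described above, not Airtrain's API. A real validator would check the schema against the JSON Schema meta-schema rather than a few hand-picked rules, and this one assumes each property declares a single `type` string.

```python
import copy
import uuid
from typing import Any, Dict

# Hypothetical in-memory registry: schema ID -> published schema.
REGISTRY: Dict[str, Dict[str, Any]] = {}

ALLOWED_TYPES = {"object", "array", "string", "integer", "number", "boolean", "null"}


def validate_structure(schema: Dict[str, Any]) -> None:
    """Step 1: minimal structural checks (illustrative only)."""
    if schema.get("type") != "object":
        raise ValueError("top-level schema must be an object")
    properties = schema.get("properties", {})
    for name, spec in properties.items():
        if spec.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"property {name!r} has an unknown type")
    missing = set(schema.get("required", [])) - set(properties)
    if missing:
        raise ValueError(f"required fields are not declared: {sorted(missing)}")


def publish(schema: Dict[str, Any]) -> str:
    validate_structure(schema)                   # step 1: validate
    schema_id = str(uuid.uuid4())                # step 2: assign a unique ID
    REGISTRY[schema_id] = copy.deepcopy(schema)  # step 3: make it available for reference
    return schema_id


user_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}
sid = publish(user_schema)
print(sid, REGISTRY[sid]["required"])
```

Storing a deep copy means later edits to the caller's dictionary cannot silently change what was published.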
Schema UUID
Each published schema receives a unique identifier that:
- Is immutable
- Serves as a permanent reference
- Can be used to fetch the complete schema
- Links to all versions of the schema
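One way to model an immutable identifier that links to every version: a frozen record holds the UUID, and each published version is stored under it. `SchemaRecord`, `add_version`, and `fetch` are hypothetical names used only for this sketch, which assumes versions are `MAJOR.MINOR.PATCH` strings and compares them numerically.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SchemaRecord:
    # Fixed at creation; frozen=True prevents reassigning the attribute.
    schema_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # version string -> schema document; the dict stays mutable so versions can be added
    versions: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def add_version(self, version: str, schema: Dict[str, Any]) -> None:
        if version in self.versions:
            raise ValueError(f"version {version} already exists")  # never overwrite a published version
        self.versions[version] = schema

    def fetch(self, version: Optional[str] = None) -> Dict[str, Any]:
        """Return a specific version, or the latest one when version is None."""
        if version is None:
            version = max(self.versions, key=lambda v: tuple(int(p) for p in v.split(".")))
        return self.versions[version]


record = SchemaRecord()
record.add_version("1.0.0", {"type": "object", "required": ["name"]})
record.add_version("1.9.0", {"type": "object", "required": ["name", "age"]})
record.add_version("1.10.0", {"type": "object", "required": ["name", "age", "email"]})
print(record.schema_id, record.fetch()["required"])
```

Comparing versions as integer tuples keeps `1.10.0` ahead of `1.9.0`, which a plain string comparison would get wrong.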
Schema Management UI
Airtrain provides a user interface for managing schemas:
UI Features
- Visual schema editor
- Version history viewer
- Example data generator
- Schema validation tester
- Documentation generator
UI Operations
- Create new schemas
- Edit existing schemas
- View version differences
- Test schema validation
- Generate example data
- Manage access permissions
Schema Versioning
Airtrain uses semantic versioning (MAJOR.MINOR.PATCH) for schema version control. Each component of the version number indicates the type of changes made.
Version Number Components
MAJOR version (x.0.0)
- Breaking changes to the schema
- Examples:
  - Adding/removing required fields
  - Changing field types
  - Renaming attributes
  - Restructuring schema hierarchy
MINOR version (1.x.0)
- Backwards-compatible functionality changes
- Examples:
  - Modifying field constraints
  - Updating validation rules
  - Adding optional fields
  - Changing field descriptions
PATCH version (1.0.x)
- Backwards-compatible documentation updates
- Examples:
  - Updating example configurations
  - Adding more examples
  - Improving documentation
  - Fixing typos
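These rules can be approximated by comparing two JSON Schema documents. `classify_change` is a hypothetical sketch, not the UI's actual logic, and it inspects only top-level properties. Changes to the required set, changed property types, and removed or renamed properties count as MAJOR; added optional properties and changed constraints or field descriptions count as MINOR (following the MINOR list above); anything else, such as adding `examples`, counts as PATCH.

```python
from typing import Any, Dict

# Per-property keys whose changes are treated as MINOR in this sketch.
MINOR_KEYS = {
    "minimum", "maximum", "minLength", "maxLength", "pattern",
    "enum", "format", "minItems", "maxItems", "description",
}


def classify_change(old: Dict[str, Any], new: Dict[str, Any]) -> str:
    """Return "major", "minor", or "patch" for a change between two object schemas."""
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    if set(old.get("required", [])) != set(new.get("required", [])):
        return "major"  # added or removed required fields
    if set(old_props) - set(new_props):
        return "major"  # removed or renamed attributes
    for name in old_props:
        if old_props[name].get("type") != new_props[name].get("type"):
            return "major"  # changed field types
    if set(new_props) - set(old_props):
        return "minor"  # added optional fields (new required ones were caught above)
    for name in old_props:
        old_rules = {k: v for k, v in old_props[name].items() if k in MINOR_KEYS}
        new_rules = {k: v for k, v in new_props[name].items() if k in MINOR_KEYS}
        if old_rules != new_rules:
            return "minor"  # modified constraints, validation rules, or descriptions
    return "patch"  # e.g. examples and other documentation-only edits


base = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}
with_max_age = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer", "maximum": 150}},
    "required": ["name"],
}
print(classify_change(base, with_max_age))
```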
Automatic Version Updates
The UI automatically handles version updates based on the type of change:
- Example Updates
  - Only updates PATCH version
  - Example: 1.0.1 → 1.0.2
  - No impact on schema behavior
- Constraint Changes
  - Updates MINOR version
  - Example: 1.0.2 → 1.1.0
  - Backwards-compatible changes
- Structure Changes
  - Updates MAJOR version
  - Example: 1.1.0 → 2.0.0
  - Potentially breaking changes
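The bump follows the usual semantic-versioning reset rule: incrementing one component sets every component to its right back to zero. A short sketch (the function name `bump` is illustrative) that reproduces the three examples above:

```python
def bump(version: str, level: str) -> str:
    """Increment a MAJOR.MINOR.PATCH version string at the given level."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown level: {level!r}")


print(bump("1.0.1", "patch"))  # 1.0.2
print(bump("1.0.2", "minor"))  # 1.1.0
print(bump("1.1.0", "major"))  # 2.0.0
```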
Version History
- All versions are preserved
- Previous versions remain accessible
- Change logs are automatically generated
- Version differences can be visualized
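Version differences can be approximated with a plain text diff of two schema documents. This sketch uses Python's standard `difflib.unified_diff` on pretty-printed JSON with sorted keys, so reordered keys do not show up as changes; `schema_diff` is an illustrative name, and the UI's actual diff view may work differently.

```python
import difflib
import json
from typing import Any, Dict, List


def schema_diff(old: Dict[str, Any], new: Dict[str, Any], old_label: str, new_label: str) -> List[str]:
    """Return unified-diff lines between two schema versions."""
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_label, tofile=new_label, lineterm="")
    )


v1 = {"type": "object", "properties": {"age": {"type": "integer"}}, "required": ["age"]}
v2 = {"type": "object", "properties": {"age": {"type": "integer", "minimum": 0}}, "required": ["age"]}
diff = schema_diff(v1, v2, "1.0.0", "1.1.0")
print("\n".join(diff))
```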
Schema Usage
Published schemas serve multiple purposes in the Airtrain ecosystem:
Documentation
- Provides clear interface definitions
- Shows expected data structures
- Documents validation rules
- Helps users understand task requirements
Runtime Optimization
- Enables efficient data validation
- Allows performance optimizations
- Supports type checking
- Facilitates error handling
Integration Support
- Ensures consistent data formats
- Enables automatic SDK generation
- Supports multiple programming languages
- Facilitates API documentation
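At its core, automatic SDK generation maps schema types to a target language's types. The sketch below emits Python source for a `TypedDict` from a flat JSON Schema; `generate_typeddict`, `py_type`, and the type mapping are illustrative assumptions, not Airtrain's generator. The emitted code uses `typing.NotRequired`, which needs Python 3.11 or later to import, and a real generator would also handle nesting, `$ref`, and other languages.

```python
from typing import Any, Dict

JSON_TO_PY = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def py_type(spec: Dict[str, Any]) -> str:
    """Map a JSON Schema property to a Python annotation (unknown types become Any)."""
    if spec.get("type") == "array":
        return f"List[{py_type(spec.get('items', {}))}]"
    return JSON_TO_PY.get(spec.get("type"), "Any")


def generate_typeddict(name: str, schema: Dict[str, Any]) -> str:
    """Render a flat object schema as Python source for a TypedDict."""
    required = set(schema.get("required", []))
    lines = [
        "from typing import Any, List, NotRequired, TypedDict",
        "",
        "",
        f"class {name}(TypedDict):",
    ]
    for field_name, spec in schema.get("properties", {}).items():
        annotation = py_type(spec)
        if field_name not in required:
            annotation = f"NotRequired[{annotation}]"  # optional keys may be absent
        lines.append(f"    {field_name}: {annotation}")
    if not schema.get("properties"):
        lines.append("    pass")
    return "\n".join(lines) + "\n"


user_profile_schema = {
    "type": "object",
    "properties": {
        "user_id": {"type": "string"},
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string", "format": "email"},
        "interests": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["user_id", "name", "age"],
}
source = generate_typeddict("UserProfile", user_profile_schema)
print(source)
```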
Best Practices
When working with schemas:
- Version Management
  - Use semantic versioning
  - Maintain backward compatibility when possible
  - Document breaking changes
  - Test against dependent tasks
- Schema Design
  - Keep schemas focused and specific
  - Use clear, descriptive names
  - Include field descriptions
  - Define proper constraints
- Publishing
  - Test schemas before publishing
  - Include comprehensive examples
  - Document any special requirements
  - Specify dependencies if any
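With Pydantic, field descriptions and constraints can live directly on the model, where they also flow into the generated JSON Schema. A minimal sketch assuming Pydantic v2; `SupportTicket` and its limits are invented for illustration.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class SupportTicket(BaseModel):
    """A focused schema: one ticket, clearly named fields, explicit constraints."""

    ticket_id: str = Field(description="Unique ticket identifier", min_length=1)
    priority: int = Field(description="1 (highest) to 5 (lowest)", ge=1, le=5)
    tags: List[str] = Field(default_factory=list, description="Free-form labels", max_length=10)


# Descriptions and constraints are carried into the generated JSON Schema.
schema = SupportTicket.model_json_schema()
print(schema["properties"]["priority"])

# Testing before publishing: an out-of-range priority is rejected.
errors = []
try:
    SupportTicket(ticket_id="T-1", priority=9)
except ValidationError as exc:
    errors = exc.errors()
    print(errors[0]["loc"], errors[0]["type"])
```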
Next Steps
Learn how to create and publish your first schema, or explore the schema registry to see examples of well-designed schemas in action.