Schema System

Airtrain's schema system defines data structures, validates inputs and outputs, and keeps tasks and AI employees consistent with one another. A schema must be published and registered in the system before any task or AI employee implementation can use it.

Schema Definition

Schemas in Airtrain can be defined using two methods:

Using Pydantic Models

Pydantic provides a Python-native way to define schemas with type hints and validation:

from pydantic import BaseModel
from typing import List, Optional

class UserProfile(BaseModel):
    user_id: str
    name: str
    age: int
    email: Optional[str] = None  # explicit default, so the field may be omitted
    interests: List[str]
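
As a usage illustration, the sketch below validates two payloads against the model above. The helper name `try_parse` is hypothetical, and the explicit `= None` default on `email` is an assumption made so the field can be left out regardless of Pydantic version.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class UserProfile(BaseModel):
    user_id: str
    name: str
    age: int
    email: Optional[str] = None  # assumption: email may be omitted
    interests: List[str]


def try_parse(payload: dict) -> Optional[UserProfile]:
    """Return a validated UserProfile, or None when validation fails.

    Hypothetical helper for illustration only.
    """
    try:
        return UserProfile(**payload)
    except ValidationError:
        return None


# A well-formed payload is accepted and exposed as typed attributes.
valid = try_parse(
    {"user_id": "u-1", "name": "Ada", "age": 36, "interests": ["chess"]}
)

# A payload whose age cannot be read as an integer is rejected.
invalid = try_parse(
    {"user_id": "u-2", "name": "Bob", "age": "not a number", "interests": []}
)
```

Returning `None` keeps the example short; real code would more likely surface the `ValidationError` details to the caller.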

Using JSON Schema

For direct schema definition, you can use JSON Schema format:

{
  "type": "object",
  "properties": {
    "user_id": { "type": "string" },
    "name": { "type": "string" },
    "age": { "type": "integer" },
    "email": { "type": "string", "format": "email" },
    "interests": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["user_id", "name", "age"]
}
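
To make concrete what validating against this schema involves, here is a minimal standard-library sketch that checks only the `required` and `type` keywords. The function `check_instance` is hypothetical and covers a small subset of JSON Schema (for example, it ignores `format`); in practice a full validator library would normally be used instead.

```python
import json

USER_PROFILE_SCHEMA = json.loads("""
{
  "type": "object",
  "properties": {
    "user_id": { "type": "string" },
    "name": { "type": "string" },
    "age": { "type": "integer" },
    "email": { "type": "string", "format": "email" },
    "interests": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["user_id", "name", "age"]
}
""")

# Subset of JSON Schema type names mapped to Python types.
_TYPES = {"string": str, "integer": int, "array": list, "object": dict}


def check_instance(instance: dict, schema: dict) -> list:
    """Return a list of error messages; an empty list means the checks passed."""
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required field: {name}")
    for name, rules in schema.get("properties", {}).items():
        if name not in instance:
            continue  # optional field that was not supplied
        value = instance[name]
        expected = _TYPES.get(rules.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected is not None and (
            not isinstance(value, expected) or isinstance(value, bool)
        ):
            errors.append(f"{name}: expected {rules['type']}")
        elif rules.get("type") == "array" and "items" in rules:
            item_type = _TYPES.get(rules["items"].get("type"))
            if item_type and not all(isinstance(v, item_type) for v in value):
                errors.append(f"{name}: items must be {rules['items']['type']}")
    return errors
```

Collecting all errors instead of stopping at the first one makes it easier to report every problem in a payload at once.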

Why Use Schemas Instead of Direct Function Arguments

While many LLM providers like OpenAI and Anthropic offer SDKs that support direct arguments and keyword arguments for function calls, Airtrain deliberately uses Pydantic schemas for several critical reasons:

LLM-to-LLM Communication

When one LLM needs to use the output from another LLM, having fixed input and output schemas with clear validations makes this interaction much more reliable:

Structured Output Parsing: Fixed schemas allow one LLM to easily generate valid input for the next LLM in a chain.
Validation Guarantees: Schema validation ensures that outputs meet the expected structure before being passed to downstream processes.
Constrained Generation: LLM providers like Anthropic, OpenAI, and Fireworks support structured output generation through constrained sampling when given explicit schemas.
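
The validation step between two chained models might look like the sketch below. Everything named here (`SummaryOutput`, `parse_step_output`, `run_chain`, and the two stub steps) is hypothetical and stands in for real model calls; it is not Airtrain's API.

```python
import json
from typing import Callable, List

from pydantic import BaseModel, ValidationError


class SummaryOutput(BaseModel):
    """Hypothetical output schema of the first step in the chain."""
    title: str
    key_points: List[str]


def parse_step_output(raw_text: str) -> SummaryOutput:
    """Parse and validate raw model text before it goes downstream."""
    try:
        return SummaryOutput(**json.loads(raw_text))
    except (json.JSONDecodeError, ValidationError, TypeError) as exc:
        raise ValueError(f"upstream output rejected: {exc}") from exc


def run_chain(
    first_step: Callable[[str], str],
    second_step: Callable[[SummaryOutput], str],
    prompt: str,
) -> str:
    """Run two steps, passing only schema-validated data between them."""
    validated = parse_step_output(first_step(prompt))
    return second_step(validated)


# Stubs standing in for LLM calls.
def fake_summarizer(prompt: str) -> str:
    return '{"title": "Q3", "key_points": ["a", "b"]}'


def fake_writer(summary: SummaryOutput) -> str:
    return f"{summary.title}: {len(summary.key_points)} points"


result = run_chain(fake_summarizer, fake_writer, "Summarize the report")
```

Because the second step receives a `SummaryOutput` instead of raw text, malformed upstream output fails at the boundary and never reaches the downstream model.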

Scalability for Multiple Agents

The goal of Airtrain is to enable hundreds or even millions of LLM agents running in parallel:

Consistent Interfaces: When output schemas are clearly defined, it becomes much easier to manage and optimize communication between many agents.
Predictable Processing: Fixed schemas allow for predictable processing pipelines across large-scale agent deployments.
Optimization Opportunities: Known output structures enable advanced optimizations like:
- Lookahead Techniques: Pre-computing likely responses based on schema knowledge
- Speculative Decoding: Accelerating generation by predicting outputs based on schema constraints
- Parallel Processing: Running multiple agents efficiently with known data exchange formats

Instead of arbitrary, free-form outputs that are difficult to manage at scale, schemas provide a structured approach that makes large-scale LLM agent systems practical and efficient.

Schema Publishing

Before a schema can be used in any task or AI employee, it must be published to the Airtrain system. This process:

Validates the schema structure
Generates a unique identifier (UUID) for the schema
Makes the schema available for reference

Publishing Process

Submit schema for validation
Receive unique schema ID
Schema becomes available in the system registry

Schema UUID

Each published schema receives a unique identifier that:

Is immutable
Serves as a permanent reference
Can be used to fetch the complete schema
Links to all versions of the schema
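
The publishing steps and the role of the identifier can be pictured with a small in-memory model. This is only a sketch: `InMemorySchemaRegistry` and its methods are hypothetical names, and the real Airtrain registry may validate, store, and version schemas quite differently.

```python
import copy
import uuid


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a schema registry; not the Airtrain API."""

    def __init__(self):
        # schema_id -> {version string -> schema dict}
        self._versions = {}

    def publish(self, schema: dict, version: str = "1.0.0") -> str:
        # Step 1: validate the schema structure (a deliberately small check).
        if schema.get("type") != "object" or not isinstance(
            schema.get("properties"), dict
        ):
            raise ValueError("expected an object schema with 'properties'")
        # Step 2: generate the identifier that will refer to this schema.
        schema_id = str(uuid.uuid4())
        # Step 3: make the schema available for lookup.
        self._versions[schema_id] = {version: copy.deepcopy(schema)}
        return schema_id

    def add_version(self, schema_id: str, version: str, schema: dict) -> None:
        """Attach a new version to an existing ID; the ID itself never changes."""
        self._versions[schema_id][version] = copy.deepcopy(schema)

    def fetch(self, schema_id: str, version: str = None) -> dict:
        """Return the requested version, or the highest version by default."""
        versions = self._versions[schema_id]
        if version is None:
            version = max(
                versions, key=lambda v: tuple(int(p) for p in v.split("."))
            )
        return copy.deepcopy(versions[version])
```

Storing deep copies means later edits to the caller's dictionary cannot silently alter a published version.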

Schema Management UI

Airtrain provides a user interface for managing schemas:

UI Features

Visual schema editor
Version history viewer
Example data generator
Schema validation tester
Documentation generator

UI Operations

Create new schemas
Edit existing schemas
View version differences
Test schema validation
Generate example data
Manage access permissions

Schema Versioning

Airtrain uses semantic versioning (MAJOR.MINOR.PATCH) for schema version control. Each component of the version number indicates the type of changes made.

Version Number Components

MAJOR version (x.0.0)

Breaking changes to the schema
Examples:
- Adding/removing required fields
- Changing field types
- Renaming attributes
- Restructuring schema hierarchy

MINOR version (1.x.0)

Backwards-compatible functionality changes
Examples:
- Modifying field constraints
- Updating validation rules
- Adding optional fields
- Changing field descriptions

PATCH version (1.0.x)

Backwards-compatible documentation updates
Examples:
- Updating example configurations
- Adding more examples
- Improving documentation
- Fixing typos
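
One way to picture these rules is a function that compares two JSON Schema dictionaries and names the version component to bump. The sketch below is an assumption-laden illustration, not Airtrain's actual logic: `classify_change` is a hypothetical name, only top-level `properties` and `required` are inspected for breaking changes, and the `examples` keyword is treated as the only documentation-only content.

```python
def strip_examples(node):
    """Recursively drop 'examples' keys so example-only edits compare equal.

    Caveat: a property literally named 'examples' would also be dropped.
    """
    if isinstance(node, dict):
        return {k: strip_examples(v) for k, v in node.items() if k != "examples"}
    if isinstance(node, list):
        return [strip_examples(v) for v in node]
    return node


def classify_change(old: dict, new: dict) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for a schema edit."""
    if old == new:
        return "none"
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    # Adding or removing required fields is a breaking change.
    if set(old.get("required", [])) != set(new.get("required", [])):
        return "major"
    for name, rules in old_props.items():
        # A removed or renamed field breaks existing consumers.
        if name not in new_props:
            return "major"
        # So does a changed field type.
        if rules.get("type") != new_props[name].get("type"):
            return "major"
    # Anything else that differs beyond examples (constraints, descriptions,
    # new optional fields) is treated as backwards-compatible.
    if strip_examples(old) != strip_examples(new):
        return "minor"
    return "patch"
```

Checking for breaking changes first means an edit that mixes a type change with a new example is still classified as major.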

Automatic Version Updates

The UI automatically handles version updates based on the type of change:

Example Updates
- Only updates PATCH version
- Example: 1.0.1 → 1.0.2
- No impact on schema behavior

Constraint Changes
- Updates MINOR version
- Example: 1.0.2 → 1.1.0
- Backwards-compatible changes

Structure Changes
- Updates MAJOR version
- Example: 1.1.0 → 2.0.0
- Potentially breaking changes
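
The arithmetic behind these three cases is small enough to sketch directly. The function name `bump_version` and the change labels `"example"`, `"constraint"`, and `"structure"` are assumptions chosen to mirror the headings above; the UI's internal implementation is not documented here.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH string for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "structure":
        # Breaking change: bump MAJOR and reset MINOR and PATCH.
        return f"{major + 1}.0.0"
    if change == "constraint":
        # Backwards-compatible change: bump MINOR and reset PATCH.
        return f"{major}.{minor + 1}.0"
    if change == "example":
        # Documentation-only change: bump PATCH.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


print(bump_version("1.0.1", "example"))     # 1.0.2
print(bump_version("1.0.2", "constraint"))  # 1.1.0
print(bump_version("1.1.0", "structure"))   # 2.0.0
```

Resetting the lower components on a MINOR or MAJOR bump is what produces 1.1.0 and 2.0.0 in the examples, as opposed to 1.1.2 or 2.1.0.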

Version History

All versions are preserved
Previous versions remain accessible
Change logs are automatically generated
Version differences can be visualized
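
Outside the UI, a rough textual equivalent of a version diff can be produced with the standard library. This is a sketch of the general idea only; `schema_diff` is a hypothetical helper, and the UI's visualization is likely richer than a unified diff.

```python
import difflib
import json


def schema_diff(old: dict, new: dict,
                old_label: str = "1.0.0", new_label: str = "1.1.0") -> str:
    """Return a unified diff of two schemas serialized as sorted JSON."""
    # Sorting keys keeps the diff stable regardless of dictionary order.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(
            old_lines, new_lines,
            fromfile=old_label, tofile=new_label, lineterm="",
        )
    )
```

Identical schemas yield an empty string, which makes the function easy to use as a quick "did anything change" check.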

Schema Usage

Published schemas serve multiple purposes in the Airtrain ecosystem:

Documentation

Provides clear interface definitions
Shows expected data structures
Documents validation rules
Helps users understand task requirements

Runtime Optimization

Enables efficient data validation
Allows performance optimizations
Supports type checking
Facilitates error handling

Integration Support

Ensures consistent data formats
Enables automatic SDK generation
Supports multiple programming languages
Facilitates API documentation

Best Practices

When working with schemas:

Version Management
- Use semantic versioning
- Maintain backward compatibility when possible
- Document breaking changes
- Test against dependent tasks

Schema Design
- Keep schemas focused and specific
- Use clear, descriptive names
- Include field descriptions
- Define proper constraints

Publishing
- Test schemas before publishing
- Include comprehensive examples
- Document any special requirements
- Specify dependencies if any
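
As an illustration of the schema design points (descriptive names, field descriptions, and constraints), the sketch below annotates a model with Pydantic's `Field`. The specific bounds and description strings are assumptions made for the example, not values prescribed by Airtrain.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class UserProfile(BaseModel):
    """Profile data for a single user (illustrative model)."""

    user_id: str = Field(
        ..., min_length=1, description="Stable unique identifier for the user"
    )
    age: int = Field(..., ge=0, le=150, description="Age in whole years")
    email: Optional[str] = Field(
        None, description="Contact address; omitted when unknown"
    )


def is_valid_profile(payload: dict) -> bool:
    """Hypothetical helper: True when the payload satisfies every constraint."""
    try:
        UserProfile(**payload)
        return True
    except ValidationError:
        return False
```

Constraints declared this way live next to the field they describe, so the same definition serves as both validation rule and documentation.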

Next Steps

Learn how to create and publish your first schema, or explore the schema registry to see examples of well-designed schemas in action.
