Schema System

The schema system in Airtrain provides a robust foundation for defining data structures, validating inputs/outputs, and ensuring consistency across tasks and AI employees. Each schema must be published and registered in the system before it can be used in any task or AI employee implementation.

Schema Definition

Schemas in Airtrain can be defined using two methods:

Using Pydantic Models

Pydantic provides a Python-native way to define schemas with type hints and validation:

from pydantic import BaseModel
from typing import List, Optional


class UserProfile(BaseModel):
    user_id: str
    name: str
    age: int
    email: Optional[str] = None  # optional; defaults to None when omitted
    interests: List[str]
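
As a minimal sketch of how such a model behaves at runtime, the snippet below validates one well-formed and one malformed payload. It restates the model so that it runs on its own, and it gives `email` an explicit `None` default as an assumption about the intended behavior. The sample values are invented for illustration.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class UserProfile(BaseModel):
    user_id: str
    name: str
    age: int
    email: Optional[str] = None  # assumption: email may be omitted
    interests: List[str]


# A well-formed payload is parsed into a typed object.
profile = UserProfile(user_id="u-1", name="Ada", age=36, interests=["chess"])

# A malformed payload raises ValidationError instead of passing through.
try:
    UserProfile(user_id="u-2", name="Bob", age="not a number", interests=[])
    rejected = False
except ValidationError:
    rejected = True
```

Catching `ValidationError` at the boundary is what keeps a bad payload from reaching downstream code.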

Using JSON Schema

For direct schema definition, you can use JSON Schema format:

{
  "type": "object",
  "properties": {
    "user_id": { "type": "string" },
    "name": { "type": "string" },
    "age": { "type": "integer" },
    "email": { "type": "string", "format": "email" },
    "interests": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["user_id", "name", "age"]
}
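
To illustrate what validating against this schema involves, here is a rough, standard-library-only sketch. The `check` helper is hypothetical and not part of Airtrain. It handles only the `type`, `required`, `properties`, and `items` keywords and ignores `format`, so a full JSON Schema validator library would be the better choice in practice.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "user_id": {"type": "string"},
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string", "format": "email"},
        "interests": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["user_id", "name", "age"],
}

# Map JSON Schema type names to Python types (small subset only).
_TYPES = {"string": str, "integer": int, "array": list, "object": dict}


def check(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value passed."""
    expected = _TYPES.get(schema.get("type"))
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for integers.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            return [f"{path}: expected {schema['type']}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(check(value[key], sub, f"{path}.{key}"))
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(check(item, schema["items"], f"{path}[{index}]"))
    return errors


ok_errors = check({"user_id": "u-1", "name": "Ada", "age": 36}, SCHEMA)
bad_errors = check({"user_id": "u-1", "age": "36", "interests": ["chess", 7]}, SCHEMA)
```

Returning a list of errors, rather than stopping at the first one, lets a caller report every problem in a payload at once.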

Why Use Schemas Instead of Direct Function Arguments

While many LLM providers, such as OpenAI and Anthropic, offer SDKs that accept direct and keyword arguments for function calls, Airtrain deliberately uses Pydantic schemas instead, for two main reasons:

LLM-to-LLM Communication

When one LLM needs to use the output from another LLM, having fixed input and output schemas with clear validations makes this interaction much more reliable:

  • Structured Output Parsing: Fixed schemas allow one LLM to easily generate valid input for the next LLM in a chain.
  • Validation Guarantees: Schema validation ensures that outputs meet the expected structure before being passed to downstream processes.
  • Constrained Generation: LLM providers like Anthropic, OpenAI, and Fireworks support structured output generation through constrained sampling when given explicit schemas.
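
The validation step between two LLM stages might look roughly like the sketch below. The `Summary` schema, both helper functions, and the hard-coded strings standing in for model output are all hypothetical. No real model is called.

```python
import json
from typing import List

from pydantic import BaseModel, ValidationError


class Summary(BaseModel):
    """Output schema of the first stage and input schema of the second."""
    title: str
    key_points: List[str]


def parse_stage_output(raw: str) -> Summary:
    """Validate raw model text before it reaches the next stage."""
    return Summary(**json.loads(raw))


def build_next_prompt(summary: Summary) -> str:
    """Turn validated fields into the next stage's prompt."""
    points = "; ".join(summary.key_points)
    return f"Write a headline for '{summary.title}' covering: {points}"


# Hard-coded stand-ins for text returned by an upstream LLM.
good_raw = '{"title": "Q3 report", "key_points": ["revenue up", "costs flat"]}'
bad_raw = '{"title": "Q3 report", "key_points": "revenue up"}'

prompt = build_next_prompt(parse_stage_output(good_raw))

try:
    parse_stage_output(bad_raw)
    bad_passed = True
except ValidationError:
    bad_passed = False
```

Because the second stage only ever receives a validated `Summary`, it does not need its own defensive checks on the shape of the data.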

Scalability for Multiple Agents

Airtrain aims to support anywhere from hundreds to millions of LLM agents running in parallel:

  • Consistent Interfaces: When output schemas are clearly defined, it becomes much easier to manage and optimize communication between many agents.
  • Predictable Processing: Fixed schemas allow for predictable processing pipelines across large-scale agent deployments.
  • Optimization Opportunities: Known output structures enable advanced optimizations like:
    • Lookahead Techniques: Pre-computing likely responses based on schema knowledge
    • Speculative Decoding: Accelerating generation by drafting tokens that the schema makes predictable, such as field names and delimiters, and then verifying them
    • Parallel Processing: Running multiple agents efficiently with known data exchange formats

Instead of arbitrary, free-form outputs that are difficult to manage at scale, schemas provide a structured approach that makes large-scale LLM agent systems practical and efficient.
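
The parallel-processing point can be sketched with `asyncio`. This is only an illustration: `AgentResult`, `run_agent`, and `run_all` are hypothetical names, the model call is replaced by a stub, and a standard-library dataclass stands in for a published schema.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResult:
    """Shared output format every agent returns (hypothetical fields)."""
    agent_id: int
    answer: str


async def run_agent(agent_id: int, question: str) -> AgentResult:
    # Stand-in for a real model call; sleep(0) just yields to the event loop.
    await asyncio.sleep(0)
    return AgentResult(agent_id=agent_id, answer=f"agent {agent_id}: {question}")


async def run_all(question: str, count: int) -> list:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(run_agent(i, question) for i in range(count)))


results = asyncio.run(run_all("status?", 3))
```

Since every agent returns the same `AgentResult` shape, the caller can collect and process results uniformly regardless of how many agents ran.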

Schema Publishing

Before a schema can be used in any task or AI employee, it must be published to the Airtrain system. This process:

  1. Validates the schema structure
  2. Generates a UUID for the schema
  3. Makes the schema available for reference

Publishing Process

  • Submit schema for validation
  • Receive unique schema ID
  • Schema becomes available in the system registry

Schema UUID

Each published schema receives a unique identifier that:

  • Is immutable
  • Serves as a permanent reference
  • Can be used to fetch the complete schema
  • Links to all versions of the schema
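
The publishing steps and the role of the schema UUID can be pictured with a small in-memory registry. This is a sketch under stated assumptions, not Airtrain's API: `SchemaRegistry`, `publish`, and `fetch` are hypothetical names, and the structural check is deliberately minimal.

```python
import uuid


class SchemaRegistry:
    """In-memory stand-in for a schema registry (hypothetical)."""

    def __init__(self):
        self._schemas = {}

    def publish(self, schema: dict) -> str:
        # Step 1: a deliberately minimal structural check.
        if schema.get("type") != "object" or "properties" not in schema:
            raise ValueError("schema must be an object with properties")
        # Step 2: generate an identifier for the schema.
        schema_id = str(uuid.uuid4())
        # Step 3: make the schema available for lookup by ID.
        self._schemas[schema_id] = schema
        return schema_id

    def fetch(self, schema_id: str) -> dict:
        """Return the complete schema stored under the given ID."""
        return self._schemas[schema_id]


registry = SchemaRegistry()
schema_id = registry.publish(
    {"type": "object", "properties": {"name": {"type": "string"}}}
)
fetched = registry.fetch(schema_id)
```

The ID returned by `publish` is the only handle a caller needs to keep, which mirrors the idea of the UUID as a permanent reference.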

Schema Management UI

Airtrain provides a user interface for managing schemas:

UI Features

  • Visual schema editor
  • Version history viewer
  • Example data generator
  • Schema validation tester
  • Documentation generator

UI Operations

  • Create new schemas
  • Edit existing schemas
  • View version differences
  • Test schema validation
  • Generate example data
  • Manage access permissions

Schema Versioning

Airtrain uses semantic versioning (MAJOR.MINOR.PATCH) for schema version control. Each component of the version number indicates the type of changes made.

Version Number Components

MAJOR version (x.0.0)

  • Breaking changes to the schema
  • Examples:
    • Adding/removing required fields
    • Changing field types
    • Renaming attributes
    • Restructuring schema hierarchy

MINOR version (1.x.0)

  • Backwards-compatible functionality changes
  • Examples:
    • Modifying field constraints
    • Updating validation rules
    • Adding optional fields
    • Changing field descriptions

PATCH version (1.0.x)

  • Backwards-compatible documentation updates
  • Examples:
    • Updating example configurations
    • Adding more examples
    • Improving documentation
    • Fixing typos

Automatic Version Updates

The UI automatically handles version updates based on the type of change:

  1. Example Updates
    • Only updates PATCH version
    • Example: 1.0.1 → 1.0.2
    • No impact on schema behavior

  2. Constraint Changes
    • Updates MINOR version
    • Example: 1.0.2 → 1.1.0
    • Backwards-compatible changes

  3. Structure Changes
    • Updates MAJOR version
    • Example: 1.1.0 → 2.0.0
    • Potentially breaking changes
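
The mapping from change type to version bump can be written out as a short function. This is an illustrative sketch: the change-type labels and the `bump_version` helper are hypothetical, and how the UI actually classifies a change is not shown here.

```python
# Hypothetical change-type labels mapped to the version component they bump.
BUMP_FOR_CHANGE = {"example": "patch", "constraint": "minor", "structure": "major"}


def bump_version(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given change type."""
    major, minor, patch = (int(part) for part in version.split("."))
    level = BUMP_FOR_CHANGE[change]
    if level == "major":
        # A MAJOR bump resets MINOR and PATCH.
        return f"{major + 1}.0.0"
    if level == "minor":
        # A MINOR bump resets PATCH.
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


after_example = bump_version("1.0.1", "example")
after_constraint = bump_version("1.0.2", "constraint")
after_structure = bump_version("1.1.0", "structure")
```

The three calls reproduce the transitions listed above: 1.0.1 → 1.0.2, 1.0.2 → 1.1.0, and 1.1.0 → 2.0.0.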

Version History

  • All versions are preserved
  • Previous versions remain accessible
  • Change logs are automatically generated
  • Version differences can be visualized

Schema Usage

Published schemas serve multiple purposes in the Airtrain ecosystem:

Documentation

  • Provides clear interface definitions
  • Shows expected data structures
  • Documents validation rules
  • Helps users understand task requirements

Runtime Optimization

  • Enables efficient data validation
  • Allows performance optimizations
  • Supports type checking
  • Facilitates error handling

Integration Support

  • Ensures consistent data formats
  • Enables automatic SDK generation
  • Supports multiple programming languages
  • Facilitates API documentation

Best Practices

When working with schemas:

  1. Version Management
    • Use semantic versioning
    • Maintain backward compatibility when possible
    • Document breaking changes
    • Test against dependent tasks

  2. Schema Design
    • Keep schemas focused and specific
    • Use clear, descriptive names
    • Include field descriptions
    • Define proper constraints

  3. Publishing
    • Test schemas before publishing
    • Include comprehensive examples
    • Document any special requirements
    • Specify dependencies if any
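
The schema-design advice on field descriptions and constraints might look like this in a Pydantic model. The `TicketRequest` schema and its fields are invented for illustration, and the specific limits are arbitrary.

```python
from pydantic import BaseModel, Field, ValidationError


class TicketRequest(BaseModel):
    """Hypothetical input schema for a support-ticket task."""

    subject: str = Field(
        ..., min_length=1, max_length=120, description="Short summary of the issue"
    )
    priority: int = Field(
        3, ge=1, le=5, description="1 is most urgent, 5 is least urgent"
    )


# The default priority applies when the caller omits it.
ticket = TicketRequest(subject="Login fails")

# A value outside the declared range is rejected at construction time.
try:
    TicketRequest(subject="Login fails", priority=9)
    out_of_range_rejected = False
except ValidationError:
    out_of_range_rejected = True
```

Descriptions travel with the schema, so they can double as documentation, and constraints such as `ge` and `le` are enforced wherever the model is used.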

Next Steps

Learn how to create and publish your first schema, or explore the schema registry to see examples of well-designed schemas in action.