JSON Schema Migration Guide
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
Overview
Starting with version 0.3.21, the GenAI IDP solution uses JSON Schema format for document class definitions instead of the legacy custom format. This provides:
- ✅ Industry standard format with broad tooling support
- ✅ Better validation using standard JSON Schema validators
- ✅ Improved documentation through self-describing schemas
- ✅ Backward compatibility - automatic migration of legacy configurations
Format Comparison
Legacy Format (Pre-0.3.21)

    classes:
      - name: Payslip
        description: An employee wage statement
        attributes:
          - name: YTDNetPay
            description: Year-to-date net pay amount
            attributeType: simple
            evaluation_method: NUMERIC_EXACT

          - name: CompanyAddress
            description: Complete business address
            attributeType: group
            evaluation_method: LLM
            groupAttributes:
              - name: Street
                description: Street address
              - name: City
                description: City name

          - name: Deductions
            description: List of deductions
            attributeType: list
            listItemTemplate:
              itemDescription: A single deduction
              itemAttributes:
                - name: Type
                  description: Deduction type
                - name: Amount
                  description: Deduction amount

JSON Schema Format (0.3.21+)

    classes:
      - $schema: "https://json-schema.org/draft/2020-12/schema"
        $id: Payslip
        x-aws-idp-document-type: Payslip
        type: object
        description: An employee wage statement
        properties:
          YTDNetPay:
            type: string
            description: Year-to-date net pay amount
            x-aws-idp-evaluation-method: NUMERIC_EXACT

          CompanyAddress:
            type: object
            description: Complete business address
            x-aws-idp-evaluation-method: LLM
            properties:
              Street:
                type: string
                description: Street address
              City:
                type: string
                description: City name

          Deductions:
            type: array
            description: List of deductions
            x-aws-idp-list-item-description: A single deduction
            items:
              type: object
              properties:
                Type:
                  type: string
                  description: Deduction type
                Amount:
                  type: string
                  description: Deduction amount

Migration Mapping
Field Name Mapping
| Legacy Field | JSON Schema Field | Notes |
|---|---|---|
| `name` | `$id` and `x-aws-idp-document-type` | Document class name |
| `description` | `description` | Same field name |
| `attributes` | `properties` | List → Object |
| `attributeType: simple` | `type: string` | Simple values are strings |
| `attributeType: group` | `type: object` with `properties` | Nested object |
| `attributeType: list` | `type: array` with `items` | Array of items |
| `groupAttributes` | `properties` (nested) | Object properties |
| `listItemTemplate` | `items` | Array item schema |
| `itemAttributes` | `items.properties` | Properties of array items |
| `itemDescription` | `x-aws-idp-list-item-description` | AWS IDP extension |
| `evaluation_method` | `x-aws-idp-evaluation-method` | AWS IDP extension |
| `confidence_threshold` | `x-aws-idp-confidence-threshold` | AWS IDP extension |
| `prompt_override` | `x-aws-idp-prompt-override` | AWS IDP extension |
Type Mapping
| Legacy Type | JSON Schema Type |
|---|---|
| `attributeType: simple` | `type: string` |
| `attributeType: group` | `type: object` |
| `attributeType: list` | `type: array` |
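To make the two mapping tables concrete, here is a minimal Python sketch of a converter that applies them to one legacy class. It is an illustration only, not the solution's actual migration code: `convert_class`, `convert_attribute`, and `EXTENSION_FIELDS` are hypothetical names, and treating a missing `attributeType` as `simple` is an assumption.

```python
from typing import Any

# Legacy attribute field -> AWS IDP extension keyword (from the field name mapping).
EXTENSION_FIELDS = {
    "evaluation_method": "x-aws-idp-evaluation-method",
    "confidence_threshold": "x-aws-idp-confidence-threshold",
    "prompt_override": "x-aws-idp-prompt-override",
}


def convert_attribute(attr: dict[str, Any]) -> dict[str, Any]:
    """Convert one legacy attribute into a JSON Schema property (hypothetical helper)."""
    kind = attr.get("attributeType", "simple")  # assumption: default to simple
    if kind == "group":
        children = attr.get("groupAttributes", [])
        prop: dict[str, Any] = {
            "type": "object",
            "properties": {a["name"]: convert_attribute(a) for a in children},
        }
    elif kind == "list":
        template = attr.get("listItemTemplate", {})
        children = template.get("itemAttributes", [])
        prop = {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {a["name"]: convert_attribute(a) for a in children},
            },
        }
        if "itemDescription" in template:
            prop["x-aws-idp-list-item-description"] = template["itemDescription"]
    else:
        prop = {"type": "string"}

    if "description" in attr:
        prop["description"] = attr["description"]
    for legacy_key, schema_key in EXTENSION_FIELDS.items():
        if legacy_key in attr:
            prop[schema_key] = attr[legacy_key]
    return prop


def convert_class(legacy: dict[str, Any]) -> dict[str, Any]:
    """Convert one legacy class definition into a JSON Schema class (hypothetical helper)."""
    schema: dict[str, Any] = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": legacy["name"],
        "x-aws-idp-document-type": legacy["name"],
        "type": "object",
    }
    if "description" in legacy:
        schema["description"] = legacy["description"]
    schema["properties"] = {
        a["name"]: convert_attribute(a) for a in legacy.get("attributes", [])
    }
    return schema


# Usage: a trimmed version of the Payslip example from the format comparison.
legacy_payslip = {
    "name": "Payslip",
    "description": "An employee wage statement",
    "attributes": [
        {"name": "YTDNetPay", "description": "Year-to-date net pay amount",
         "attributeType": "simple", "evaluation_method": "NUMERIC_EXACT"},
        {"name": "CompanyAddress", "attributeType": "group",
         "groupAttributes": [{"name": "Street"}, {"name": "City"}]},
        {"name": "Deductions", "attributeType": "list",
         "listItemTemplate": {"itemDescription": "A single deduction",
                              "itemAttributes": [{"name": "Type"}, {"name": "Amount"}]}},
    ],
}
schema = convert_class(legacy_payslip)
```

The recursion in `convert_attribute` is what turns `groupAttributes` and `itemAttributes` lists into nested `properties` objects keyed by attribute name.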
Automatic Migration
The solution automatically migrates legacy configurations to JSON Schema format:
When Migration Happens
- First read after upgrade - When configuration is loaded from DynamoDB
- Automatic persistence - Migrated format is saved back to DynamoDB
- One-time process - Subsequent reads use JSON Schema format directly
Migration Behavior
- ✅ Non-destructive - Legacy data is preserved during migration
- ✅ Idempotent - Won’t re-migrate already migrated data
- ✅ Transparent - Happens automatically without user intervention
- ✅ Logged - Migration activity logged to CloudWatch
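The migrate-on-first-read flow described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the solution's implementation: a plain dict stands in for the DynamoDB item, `is_legacy_class` uses an assumed heuristic (an `attributes` key without `properties`), and `load_classes` and the `convert` callback are hypothetical names.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("config_migration_sketch")

ClassDef = dict[str, Any]


def is_legacy_class(cls: ClassDef) -> bool:
    """Assumed heuristic: legacy classes carry `attributes` and no `properties`."""
    return "attributes" in cls and "properties" not in cls


def load_classes(store: dict[str, Any],
                 convert: Callable[[ClassDef], ClassDef]) -> list[ClassDef]:
    """Read classes from the store, migrating and persisting them if needed.

    `store` is an in-memory stand-in for the persisted configuration, and
    `convert` is any function that maps one legacy class to JSON Schema.
    """
    classes = store.get("classes", [])
    legacy_count = sum(1 for c in classes if is_legacy_class(c))
    if legacy_count == 0:
        # Idempotent: already-migrated data is returned untouched.
        return classes

    logger.info("Migrating %d legacy classes to JSON Schema format", legacy_count)
    migrated = [convert(c) if is_legacy_class(c) else c for c in classes]
    store["classes"] = migrated  # persist so later reads skip migration
    logger.info("Successfully migrated classes to JSON Schema format")
    return migrated
```

Calling `load_classes` a second time on the same store finds no legacy classes and returns immediately, which is the one-time behavior the list above describes.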
Migration Logs
Check Lambda logs to verify migration:

    aws logs tail /aws/lambda/<STACK>-ConfigurationResolverFunction-<ID> \
      --region <REGION> --follow

Look for:

    Migrating 6 legacy classes to JSON Schema format
    Successfully migrated classes to JSON Schema format

AWS IDP Extensions
JSON Schema is extended with custom AWS IDP fields:
Document-Level Extensions
- `x-aws-idp-document-type` - Marks a schema as a document type (value is the document class name)
Attribute-Level Extensions
- `x-aws-idp-evaluation-method` - Evaluation method for attribute comparison
  - Valid values: `EXACT`, `NUMERIC_EXACT`, `FUZZY`, `SEMANTIC`, `LLM`
- `x-aws-idp-confidence-threshold` - Confidence threshold (0.0 to 1.0)
- `x-aws-idp-prompt-override` - Custom prompt for attribute extraction
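Standard JSON Schema validators ignore unknown `x-` keywords, so the constraints on these extensions need a separate check. The following Python sketch walks a class schema and reports out-of-range values. It is illustrative only: `check_extensions` is a hypothetical helper, and accepting numeric strings for the threshold is an assumption.

```python
from typing import Any

VALID_EVALUATION_METHODS = {"EXACT", "NUMERIC_EXACT", "FUZZY", "SEMANTIC", "LLM"}


def check_extensions(path: str, node: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the AWS IDP extensions under `node`."""
    problems: list[str] = []

    method = node.get("x-aws-idp-evaluation-method")
    if method is not None and method not in VALID_EVALUATION_METHODS:
        problems.append(f"{path}: invalid evaluation method {method!r}")

    threshold = node.get("x-aws-idp-confidence-threshold")
    if threshold is not None:
        try:
            value = float(threshold)  # assumption: numeric strings are acceptable
        except (TypeError, ValueError):
            value = None
        if value is None or not 0.0 <= value <= 1.0:
            problems.append(f"{path}: confidence threshold {threshold!r} not in 0.0 to 1.0")

    # Recurse into nested objects and array item schemas.
    for name, child in node.get("properties", {}).items():
        problems.extend(check_extensions(f"{path}.{name}", child))
    items = node.get("items")
    if isinstance(items, dict):
        problems.extend(check_extensions(f"{path}[]", items))
    return problems
```

For example, `check_extensions("Payslip", schema)` returns an empty list when every extension value in `schema` is within the documented ranges.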
List-Specific Extensions
- `x-aws-idp-list-item-description` - Description for array items
- `x-aws-idp-original-name` - Preserved original attribute name from legacy format
Few-Shot Example Extensions
- `x-aws-idp-class-prompt` - Classification prompt for example
- `x-aws-idp-attributes-prompt` - Extraction prompt for example
- `x-aws-idp-image-path` - Path to example image
Creating New Configurations
Section titled “Creating New Configurations”Using the Web UI (Recommended)
The web UI provides two ways to create/edit document schemas:
- Schema Builder - Visual editor with drag-and-drop interface
  - Navigate to Configuration → Document Schema tab
  - Click “Schema Builder” view
  - Click “Add Class” to choose between:
    - Custom Class - define your own class with custom fields
    - Standard Class - import from 35+ pre-built document types (Invoice, Receipt, W-2, Bank Statement, Payslip, Driver License, Passport, tax forms, insurance cards, certificates, and more) derived from AWS BDA standard blueprints. Imported classes are fully editable.
  - Add/edit document types and properties visually
- JSON View - Direct JSON editing with validation
  - Navigate to Configuration → JSON View
  - Edit the `classes` array directly
  - Validation happens in real-time
Manual YAML Configuration
When creating configurations manually, use JSON Schema format:

    classes:
      - $schema: "https://json-schema.org/draft/2020-12/schema"
        $id: MyDocument
        x-aws-idp-document-type: MyDocument
        type: object
        description: Document description here
        properties:
          FieldName:
            type: string
            description: Field description
            x-aws-idp-evaluation-method: EXACT

Configuration Templates
Find JSON Schema templates in:
- `config_library/unified/` - Pattern 2 examples (Bedrock)
Validation Methods
The solution supports these evaluation methods:
Standard Methods
- `EXACT` - Exact string match
- `NUMERIC_EXACT` - Exact numeric match (handles different number formats)
- `FUZZY` - Fuzzy string matching (Levenshtein distance)
- `SEMANTIC` - Semantic similarity using embeddings
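To illustrate how the first three methods differ, here is a rough Python sketch of each comparison. It is not the solution's evaluation code: the function names, the number-cleaning rule, and the 0.8 similarity threshold are assumptions chosen for the example. `SEMANTIC` is omitted because it depends on an embedding model.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def exact_match(expected: str, actual: str) -> bool:
    """EXACT: plain string equality."""
    return expected == actual


def parse_number(text: str) -> Optional[Decimal]:
    """Strip currency symbols, separators, and units, then parse (assumed rule)."""
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def numeric_exact_match(expected: str, actual: str) -> bool:
    """NUMERIC_EXACT: equal after normalizing number formats."""
    a, b = parse_number(expected), parse_number(actual)
    return a is not None and b is not None and a == b


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programming table."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_match(expected: str, actual: str, threshold: float = 0.8) -> bool:
    """FUZZY: normalized Levenshtein similarity at or above an assumed threshold."""
    longest = max(len(expected), len(actual))
    if longest == 0:
        return True  # two empty strings are identical
    similarity = 1.0 - levenshtein(expected, actual) / longest
    return similarity >= threshold
```

With these definitions, `"$1,234.50"` and `"1234.5"` fail `exact_match` but pass `numeric_exact_match`, which is why amount fields in this guide pair `type: string` with `NUMERIC_EXACT`.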
Advanced Methods
- `LLM` - LLM-based evaluation for complex/contextual comparisons
  - Useful for address blocks, multi-field groups
  - Higher cost but more flexible
  - Requires evaluation configuration with LLM model
Best Practices
1. Use Descriptive Field Names

Good:

    properties:
      InvoiceDate:
        type: string
        description: Date when invoice was issued

Avoid:

    properties:
      Date:  # Too generic
        type: string

2. Leverage Standard JSON Schema Features

    properties:
      TotalAmount:
        type: string  # Store as string for exact extraction
        description: Total invoice amount including taxes
        x-aws-idp-evaluation-method: NUMERIC_EXACT

      Email:
        type: string
        format: email  # Standard JSON Schema format
        description: Customer email address

      Status:
        type: string
        enum: [PAID, PENDING, OVERDUE]  # Constrain values
        description: Payment status

3. Structure Complex Objects Properly

For nested data like addresses:

    properties:
      ShippingAddress:
        type: object
        description: Complete shipping address
        x-aws-idp-evaluation-method: LLM  # Use LLM for complex structures
        properties:
          Street:
            type: string
          City:
            type: string
          State:
            type: string
          ZipCode:
            type: string

4. Use Arrays for Repeating Data

For line items, deductions, etc.:

    properties:
      LineItems:
        type: array
        description: Invoice line items
        x-aws-idp-list-item-description: A single line item
        items:
          type: object
          properties:
            Description:
              type: string
            Quantity:
              type: string
            UnitPrice:
              type: string
            Total:
              type: string
              x-aws-idp-evaluation-method: NUMERIC_EXACT

Troubleshooting
UI Shows Legacy Format
Symptoms:
- UI displays `attributes` array instead of `properties` object
- Configuration tab is blank
Solution:
- Refresh browser cache (hard refresh: Ctrl+Shift+R or Cmd+Shift+R)
- Check Lambda logs for migration errors
- Verify Lambda has latest code with migration support
Validation Errors for LLM Method
Symptoms:
- Error: “Invalid evaluation_method ‘LLM’”
Solution:
- Ensure you are using version 0.3.21 or later
- Check that `lib/idp_common_pkg/idp_common/config/schema_constants.py` includes `EVALUATION_METHOD_LLM`
Migration Not Running
Symptoms:
- Legacy format still in DynamoDB after upgrade
- No migration logs in CloudWatch
Solution:
- Verify Lambda has `requirements.txt` with `./lib/idp_common_pkg`
- Check Lambda includes `ConfigurationManager` code
- Trigger migration manually via UI configuration load