
JSON Schema Migration Guide

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0

Starting with version 0.3.21, the GenAI IDP solution uses JSON Schema format for document class definitions instead of the legacy custom format. This provides:

  • Industry standard format with broad tooling support
  • Better validation using standard JSON Schema validators
  • Improved documentation through self-describing schemas
  • Backward compatibility - automatic migration of legacy configurations

Legacy format (before):

classes:
  - name: Payslip
    description: An employee wage statement
    attributes:
      - name: YTDNetPay
        description: Year-to-date net pay amount
        attributeType: simple
        evaluation_method: NUMERIC_EXACT
      - name: CompanyAddress
        description: Complete business address
        attributeType: group
        evaluation_method: LLM
        groupAttributes:
          - name: Street
            description: Street address
          - name: City
            description: City name
      - name: Deductions
        description: List of deductions
        attributeType: list
        listItemTemplate:
          itemDescription: A single deduction
          itemAttributes:
            - name: Type
              description: Deduction type
            - name: Amount
              description: Deduction amount

JSON Schema format (after):

classes:
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Payslip
    x-aws-idp-document-type: Payslip
    type: object
    description: An employee wage statement
    properties:
      YTDNetPay:
        type: string
        description: Year-to-date net pay amount
        x-aws-idp-evaluation-method: NUMERIC_EXACT
      CompanyAddress:
        type: object
        description: Complete business address
        x-aws-idp-evaluation-method: LLM
        properties:
          Street:
            type: string
            description: Street address
          City:
            type: string
            description: City name
      Deductions:
        type: array
        description: List of deductions
        x-aws-idp-list-item-description: A single deduction
        items:
          type: object
          properties:
            Type:
              type: string
              description: Deduction type
            Amount:
              type: string
              description: Deduction amount

| Legacy Field | JSON Schema Field | Notes |
| --- | --- | --- |
| name | $id and x-aws-idp-document-type | Document class name |
| description | description | Same field name |
| attributes | properties | List → Object |
| attributeType: simple | type: string | Simple values are strings |
| attributeType: group | type: object with properties | Nested object |
| attributeType: list | type: array with items | Array of items |
| groupAttributes | properties (nested) | Object properties |
| listItemTemplate | items | Array item schema |
| itemAttributes | items.properties | Properties of array items |
| itemDescription | x-aws-idp-list-item-description | AWS IDP extension |
| evaluation_method | x-aws-idp-evaluation-method | AWS IDP extension |
| confidence_threshold | x-aws-idp-confidence-threshold | AWS IDP extension |
| prompt_override | x-aws-idp-prompt-override | AWS IDP extension |

| Legacy Type | JSON Schema Type |
| --- | --- |
| attributeType: simple | type: string |
| attributeType: group | type: object |
| attributeType: list | type: array |
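
The field and type mappings above can be read as a small recursive conversion. The following is a minimal sketch of that reading, not the solution's actual migration code: the function names (`convert_class`, `convert_attributes`, `convert_attribute`) are hypothetical, and treating a missing `attributeType` as `simple` is an assumption.

```python
# Illustrative sketch only: the function names are hypothetical and this is
# not the migration code shipped with the solution.

# Legacy attribute keys and the extension fields the mapping table assigns them.
EXTENSION_KEYS = {
    "evaluation_method": "x-aws-idp-evaluation-method",
    "confidence_threshold": "x-aws-idp-confidence-threshold",
    "prompt_override": "x-aws-idp-prompt-override",
}

# Legacy attributeType values and their JSON Schema types.
TYPE_MAP = {"simple": "string", "group": "object", "list": "array"}


def convert_attribute(attr):
    """Convert one legacy attribute into a JSON Schema property."""
    # Assumption: an attribute without attributeType is treated as simple.
    kind = attr.get("attributeType", "simple")
    prop = {"type": TYPE_MAP[kind]}
    if "description" in attr:
        prop["description"] = attr["description"]
    for legacy_key, schema_key in EXTENSION_KEYS.items():
        if legacy_key in attr:
            prop[schema_key] = attr[legacy_key]
    if kind == "group":
        # groupAttributes -> nested properties
        prop["properties"] = convert_attributes(attr.get("groupAttributes", []))
    elif kind == "list":
        # listItemTemplate -> items, itemAttributes -> items.properties
        template = attr.get("listItemTemplate", {})
        if "itemDescription" in template:
            prop["x-aws-idp-list-item-description"] = template["itemDescription"]
        prop["items"] = {
            "type": "object",
            "properties": convert_attributes(template.get("itemAttributes", [])),
        }
    return prop


def convert_attributes(attrs):
    """attributes (a list) -> properties (an object keyed by attribute name)."""
    return {attr["name"]: convert_attribute(attr) for attr in attrs}


def convert_class(legacy):
    """Convert one legacy class definition into a JSON Schema document class."""
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": legacy["name"],
        "x-aws-idp-document-type": legacy["name"],
        "type": "object",
        "description": legacy.get("description", ""),
        "properties": convert_attributes(legacy.get("attributes", [])),
    }
```

Applied to the legacy Payslip class shown earlier, `convert_class` would produce a structure matching the JSON Schema version of that class.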

The solution automatically migrates legacy configurations to JSON Schema format:

  1. First read after upgrade - When configuration is loaded from DynamoDB
  2. Automatic persistence - Migrated format is saved back to DynamoDB
  3. One-time process - Subsequent reads use JSON Schema format directly

  • Non-destructive - Legacy data is preserved during migration
  • Idempotent - Won’t re-migrate already migrated data
  • Transparent - Happens automatically without user intervention
  • Logged - Migration activity logged to CloudWatch
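
To make the idempotent and one-time properties concrete, here is a minimal sketch of how a loader could decide whether a class still needs migrating. The detection rule (a legacy class has `attributes` and lacks `x-aws-idp-document-type`) and the names `is_legacy_class` and `migrate_if_needed` are assumptions for illustration; the solution's own check may differ.

```python
# Illustrative sketch only: names and the detection heuristic are assumptions.

def is_legacy_class(cls):
    """Assume a legacy class has `attributes` and no document-type marker."""
    return "x-aws-idp-document-type" not in cls and "attributes" in cls


def migrate_if_needed(classes, convert):
    """Return (classes, changed) without modifying the input list.

    `convert` is any function that turns one legacy class into JSON Schema.
    `changed` tells the caller whether the result needs to be saved back.
    """
    changed = False
    result = []
    for cls in classes:
        if is_legacy_class(cls):
            result.append(convert(cls))
            changed = True
        else:
            # Already in JSON Schema format: pass through untouched.
            result.append(cls)
    return result, changed
```

Running `migrate_if_needed` on its own output returns the same classes with `changed` set to `False`, which is the behavior the "one-time process" and "idempotent" points describe.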

Check Lambda logs to verify migration:

aws logs tail /aws/lambda/<STACK>-ConfigurationResolverFunction-<ID> \
--region <REGION> --follow

Look for:

Migrating 6 legacy classes to JSON Schema format
Successfully migrated classes to JSON Schema format
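
If the log output has been saved to a file, the two messages above can be checked with a short script. This is a sketch that assumes the messages appear verbatim somewhere in each log line; `summarize_migration` is a hypothetical helper name.

```python
import re

# Message texts taken from the log lines shown above.
MIGRATING = re.compile(r"Migrating (\d+) legacy classes to JSON Schema format")
SUCCESS = "Successfully migrated classes to JSON Schema format"


def summarize_migration(lines):
    """Return (class_count, succeeded) from an iterable of log lines.

    class_count is None if no "Migrating N legacy classes" line was seen.
    """
    count = None
    succeeded = False
    for line in lines:
        match = MIGRATING.search(line)
        if match:
            count = int(match.group(1))
        if SUCCESS in line:
            succeeded = True
    return count, succeeded
```

For example, `summarize_migration(open("migration.log"))` could be run against output redirected from the `aws logs tail` command, where `migration.log` is a placeholder file name.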

JSON Schema is extended with custom AWS IDP fields:

  • x-aws-idp-document-type - Marks a schema as a document type (value is the document class name)
  • x-aws-idp-evaluation-method - Evaluation method for attribute comparison
    • Valid values: EXACT, NUMERIC_EXACT, FUZZY, SEMANTIC, LLM
  • x-aws-idp-confidence-threshold - Confidence threshold (0.0 to 1.0)
  • x-aws-idp-prompt-override - Custom prompt for attribute extraction
  • x-aws-idp-list-item-description - Description for array items
  • x-aws-idp-original-name - Preserved original attribute name from legacy format
  • x-aws-idp-class-prompt - Classification prompt for an example
  • x-aws-idp-attributes-prompt - Extraction prompt for an example
  • x-aws-idp-image-path - Path to example image
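
The value constraints listed above (the five evaluation methods and the 0.0 to 1.0 threshold range) can be checked before a configuration is saved. The sketch below is a hypothetical pre-flight check, not part of the solution; it assumes the threshold is stored as a number.

```python
# Illustrative sketch only: check_extensions is a hypothetical helper.

VALID_EVALUATION_METHODS = {"EXACT", "NUMERIC_EXACT", "FUZZY", "SEMANTIC", "LLM"}


def check_extensions(name, prop):
    """Return a list of problems found in one property's x-aws-idp-* fields."""
    problems = []

    method = prop.get("x-aws-idp-evaluation-method")
    if method is not None and method not in VALID_EVALUATION_METHODS:
        problems.append(f"{name}: unknown evaluation method {method!r}")

    threshold = prop.get("x-aws-idp-confidence-threshold")
    if threshold is not None:
        # Assumption: thresholds are numeric; bool is excluded explicitly
        # because Python treats True/False as integers.
        if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
            problems.append(f"{name}: confidence threshold must be a number")
        elif not 0.0 <= threshold <= 1.0:
            problems.append(f"{name}: confidence threshold must be between 0.0 and 1.0")

    return problems
```

An empty list means the property's extension fields passed these checks.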

The web UI provides two ways to create/edit document schemas:

  1. Schema Builder - Visual editor with drag-and-drop interface

    • Navigate to Configuration → Document Schema tab
    • Click “Schema Builder” view
    • Click “Add Class” to choose between:
      • Custom Class - Define your own class with custom fields
      • Standard Class - Import from 35+ pre-built document types (Invoice, Receipt, W-2, Bank Statement, Payslip, Driver License, Passport, tax forms, insurance cards, certificates, and more) derived from AWS BDA standard blueprints. Imported classes are fully editable.
    • Add/edit document types and properties visually
  2. JSON View - Direct JSON editing with validation

    • Navigate to Configuration → JSON View
    • Edit the classes array directly
    • Validation happens in real-time

When creating configurations manually, use JSON Schema format:

classes:
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: MyDocument
    x-aws-idp-document-type: MyDocument
    type: object
    description: Document description here
    properties:
      FieldName:
        type: string
        description: Field description
        x-aws-idp-evaluation-method: EXACT
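
When writing classes by hand, a quick structural check can catch a missing key before the configuration is uploaded. The sketch below mirrors the template above; the set of required keys and the rule that `$id` matches `x-aws-idp-document-type` are assumptions drawn from that template and the mapping table, and `check_class` is a hypothetical helper.

```python
# Illustrative sketch only: the required-key list is an assumption based on
# the manual configuration template, not a rule published by the solution.

REQUIRED_KEYS = ("$schema", "$id", "x-aws-idp-document-type", "type", "properties")


def check_class(cls):
    """Return a list of structural problems in one document class."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in cls]

    if "type" in cls and cls["type"] != "object":
        problems.append("type must be 'object'")

    # In JSON Schema format, properties is an object, not a list of attributes.
    if "properties" in cls and not isinstance(cls["properties"], dict):
        problems.append("properties must be an object keyed by field name")

    # The legacy class name maps to both $id and x-aws-idp-document-type.
    if "$id" in cls and "x-aws-idp-document-type" in cls:
        if cls["$id"] != cls["x-aws-idp-document-type"]:
            problems.append("$id and x-aws-idp-document-type differ")

    return problems
```

This only checks the outer shape of a class. It does not replace validation with a full JSON Schema validator.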

Find JSON Schema templates in:

  • config_library/unified/ - Pattern 2 examples (Bedrock)

The solution supports these evaluation methods:

  • EXACT - Exact string match
  • NUMERIC_EXACT - Exact numeric match (handles different number formats)
  • FUZZY - Fuzzy string matching (Levenshtein distance)
  • SEMANTIC - Semantic similarity using embeddings
  • LLM - LLM-based evaluation for complex/contextual comparisons
    • Useful for address blocks, multi-field groups
    • Higher cost but more flexible
    • Requires evaluation configuration with LLM model
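
To show how the first three methods differ, here is a minimal sketch of each comparison. These are not the solution's implementations: the function names, the characters stripped during numeric normalization, and the 0.8 similarity threshold are all assumptions. SEMANTIC and LLM are omitted because they depend on external models.

```python
from decimal import Decimal, InvalidOperation
import re


def exact(expected, actual):
    """EXACT: plain string equality."""
    return expected == actual


def to_decimal(text):
    """Normalize a number string (assumed formats: $, commas, parentheses)."""
    cleaned = re.sub(r"[,$\s]", "", text)
    # Accounting-style negatives: (45.00) -> -45.00
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    return Decimal(cleaned)


def numeric_exact(expected, actual):
    """NUMERIC_EXACT: equal after normalizing the number format."""
    try:
        return to_decimal(expected) == to_decimal(actual)
    except InvalidOperation:
        return False


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy(expected, actual, threshold=0.8):
    """FUZZY: similarity derived from Levenshtein distance (threshold assumed)."""
    longest = max(len(expected), len(actual))
    if longest == 0:
        return True
    return 1 - levenshtein(expected, actual) / longest >= threshold
```

Under these assumptions, `numeric_exact("$1,234.50", "1234.5")` is true while `exact` on the same pair is false, which is why amount fields in this guide use NUMERIC_EXACT.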

Good:

properties:
  InvoiceDate:
    type: string
    description: Date when invoice was issued

Avoid:

properties:
  Date: # Too generic
    type: string

Use appropriate types, formats, and constraints:

properties:
  TotalAmount:
    type: string # Store as string for exact extraction
    description: Total invoice amount including taxes
    x-aws-idp-evaluation-method: NUMERIC_EXACT
  Email:
    type: string
    format: email # Standard JSON Schema format
    description: Customer email address
  Status:
    type: string
    enum: [PAID, PENDING, OVERDUE] # Constrain values
    description: Payment status

For nested data like addresses:

properties:
  ShippingAddress:
    type: object
    description: Complete shipping address
    x-aws-idp-evaluation-method: LLM # Use LLM for complex structures
    properties:
      Street:
        type: string
      City:
        type: string
      State:
        type: string
      ZipCode:
        type: string

For line items, deductions, etc.:

properties:
  LineItems:
    type: array
    description: Invoice line items
    x-aws-idp-list-item-description: A single line item
    items:
      type: object
      properties:
        Description:
          type: string
        Quantity:
          type: string
        UnitPrice:
          type: string
        Total:
          type: string
          x-aws-idp-evaluation-method: NUMERIC_EXACT
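
Nested objects and arrays of objects both place their fields under a further `properties` key, so a schema can be walked recursively. The sketch below lists every leaf field of a class as a path; `iter_field_paths` and the `.` and `[]` path notation are hypothetical choices for illustration.

```python
# Illustrative sketch only: the helper name and path notation are assumptions.

def iter_field_paths(schema, prefix=""):
    """Yield a path for each leaf field in a class or nested object schema."""
    for name, prop in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        kind = prop.get("type")
        if kind == "object":
            # Nested object: descend into its own properties.
            yield from iter_field_paths(prop, prefix=f"{path}.")
        elif kind == "array":
            items = prop.get("items", {})
            if items.get("type") == "object":
                # Array of objects: fields live under items.properties.
                yield from iter_field_paths(items, prefix=f"{path}[].")
            else:
                yield f"{path}[]"
        else:
            yield path
```

For the `LineItems` schema above, this yields `LineItems[].Description`, `LineItems[].Quantity`, `LineItems[].UnitPrice`, and `LineItems[].Total`.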

Symptoms:

  • UI displays attributes array instead of properties object
  • Configuration tab is blank

Solution:

  1. Refresh browser cache (hard refresh: Ctrl+Shift+R or Cmd+Shift+R)
  2. Check Lambda logs for migration errors
  3. Verify that the Lambda function is running the latest code with migration support

Symptoms:

  • Error: “Invalid evaluation_method ‘LLM’”

Solution:

  • Ensure you are running version 0.3.21 or later
  • Check that lib/idp_common_pkg/idp_common/config/schema_constants.py includes EVALUATION_METHOD_LLM

Symptoms:

  • Legacy format still in DynamoDB after upgrade
  • No migration logs in CloudWatch

Solution:

  1. Verify that the Lambda function's requirements.txt includes ./lib/idp_common_pkg
  2. Check that the Lambda function includes the ConfigurationManager code
  3. Trigger the migration manually by loading the configuration in the UI