Configuration and Customization
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0
Configuration and Customization
Section titled “Configuration and Customization”The GenAIIDP solution provides multiple configuration approaches to customize document processing behavior to suit your specific needs.
📝 Note: Starting with version 0.3.21, document class definitions use JSON Schema format instead of the legacy custom format. See json-schema-migration.md for migration details and format comparison. Legacy configurations are automatically migrated on first use.
Pattern Configuration via Web UI
Section titled “Pattern Configuration via Web UI”The web interface allows real-time configuration updates without stack redeployment:
- Document Classes: Define and modify document categories and their descriptions (using JSON Schema format). Choose from 35+ pre-built standard classes (Invoice, Receipt, W-2, Bank Statement, etc.) or create custom classes from scratch.
- Extraction Attributes: Configure fields to extract for each document class (defined as JSON Schema properties)
- Few Shot Examples: Upload and configure example documents to improve accuracy (supported in Pattern 2)
- Model Selection: Choose between available Bedrock models for classification and extraction
- Prompt Engineering: Customize system and task prompts for optimal results
- OCR Features: Configure Textract features (TABLES, FORMS, SIGNATURES, LAYOUT) for enhanced data capture
- Evaluation Methods: Set evaluation methods and thresholds for each attribute
- Summarization: Configure model, prompts, parameters, and enable/disable document summarization via the
enabledproperty
Configuration Versions
Section titled “Configuration Versions”The solution supports multiple named configuration versions, enabling you to maintain independent configuration snapshots for A/B testing, environment separation, and iterative prompt tuning — all without redeploying the stack. Each version stores a complete, self-contained configuration. The active version determines which configuration is used for new document processing.
Key capabilities:
- Create, edit, and delete configuration versions with unique names and descriptions
- Activate any version to make it the default for new processing
- Compare versions side-by-side to see differences (exportable as CSV/JSON)
- Track which version was used for each processed document and test run
- Select a specific version when uploading documents, running tests, or using the CLI
Managed Configuration Versions
Section titled “Managed Configuration Versions”The stack automatically deploys managed configuration versions for each pre-deployed test set (fake-w2, docsplit, ocr-benchmark, realkie-fcc-verified). These are marked with managed: true and have the following behavior:
- Overwritten on stack updates — always reflect the latest defaults shipped with the solution
- Save disabled — the Save button is disabled and an info banner explains the config is stack-managed
- Delete disabled — managed versions cannot be deleted in the UI or via the API
- Editable copies — use “Save as Version” to create a custom, editable copy
- Not importable — managed configs are stored separately (
config_library/managed_config/) and do not appear in the configuration import browser - Test Studio integration — when a test set is selected, the matching managed config version is auto-selected
For comprehensive documentation, see configuration-versions.md.
Configuration Management Features
Section titled “Configuration Management Features”- Save Changes: Save your current configuration changes. The button is enabled only when you have unsaved changes (comparing your edits against the last saved configuration). After a successful save, a confirmation banner is displayed.
- Unsaved Changes Indicator: Individual fields with unsaved edits display an orange dot next to the field label, and an info banner with a “Discard changes” button appears when the configuration form has unsaved edits.
- Browser Navigation Guard: The browser warns before leaving the page when unsaved configuration changes exist (both on browser close/refresh and SPA navigation).
- Save as Default: Save your current version’s configuration as the new default baseline. This replaces the existing default configuration. Warning: Default configurations may be overwritten during solution upgrades - export your configuration first for backup.
- Restore Default (All): Reset the current version’s configuration back to the default values, replacing all customizations.
- Refresh: Reload the configuration from the server. Use this to sync your view with the latest saved configuration, discard unsaved local changes, or verify your configuration after external updates.
- Export Configuration: Download your current configuration to local files in JSON or YAML format with customizable filenames. Use this to backup configurations before upgrades or share configurations between environments.
- Import Configuration: Upload configuration files from your local machine OR import from the Configuration Library:
- From Local File: Upload configuration files from your computer in JSON or YAML format with automatic format detection and validation
- From Configuration Library: Browse and import pre-configured document processing workflows from the solution’s built-in configuration library
- Pattern-Filtered: Only shows configurations compatible with your currently deployed pattern (Pattern 1, 2, or 3)
- Dual Format Support: Automatically detects and imports both
config.yamlandconfig.jsonformats - README Preview: View markdown-formatted documentation before importing to understand configuration purpose and features
- Format Indicators: Visual badges show file format (YAML/JSON) and README availability
- Library Contents: Includes sample configurations like lending-package-sample, bank-statement-sample, rvl-cdip, criteria-validation, and more
- Important: Importing a configuration replaces your existing custom configuration entirely. Any prior customizations not included in the imported file will be reset to defaults. Export your current configuration first if you want to preserve it.
Configuration changes are validated and applied immediately, with rollback capability if issues arise. See web-ui.md for details on using the administration interface.
Configuration Management via CLI
Section titled “Configuration Management via CLI”The IDP CLI provides command-line tools for configuration management:
idp-cli config-create: Generate configuration templates from system defaultsidp-cli config-validate: Validate configuration files against schemasidp-cli config-download: Download configuration from deployed stacksidp-cli config-upload: Upload configuration to deployed stacks
See idp-cli.md for complete command documentation.
Custom Configuration Path
Section titled “Custom Configuration Path”The solution now supports specifying a custom configuration file location via the CustomConfigPath CloudFormation parameter. This allows you to use your own configuration files stored in S3 instead of the default configuration library.
When deploying the stack, you can specify a custom configuration file:
CustomConfigPath: "s3://my-bucket/custom-config/config.yaml"Key Features:
- Override Default Configuration: When specified, your custom configuration completely replaces the default pattern configuration
- S3 URI Format: Accepts standard S3 URI format (e.g.,
s3://my-bucket/custom-config/config.yaml) - Least-Privilege Security: IAM permissions are conditionally granted only to the specific S3 bucket and object you specify
- All Patterns Supported: Works with Pattern 1 (BDA), Pattern 2 (Textract + Bedrock), and Pattern 3 (Textract + UDOP + Bedrock)
Security Benefits:
- Eliminates wildcard S3 permissions (
arn:aws:s3:::*/*) - Conditional IAM access only when CustomConfigPath is specified
- Proper S3 URI to ARN conversion for least-privilege compliance
- Passes security scans with minimal required permissions
Configuration File Requirements:
- Must be valid YAML format
- Only needs to include
notes,classes, and any settings that differ from system defaults (see “System Defaults and Configuration Inheritance” below) - Follow the same structure as the configuration files in the
config_librarydirectory
Leave the CustomConfigPath parameter empty (default) to use the standard configuration library included with the solution.
System Defaults and Configuration Inheritance
Section titled “System Defaults and Configuration Inheritance”The GenAI IDP Accelerator uses a system defaults architecture where configurations inherit from pattern-specific default files. This means user configurations only need to specify differences from the defaults, making them simpler and more maintainable.
How It Works
Section titled “How It Works”-
System defaults are loaded first from
lib/idp_common_pkg/idp_common/config/system_defaults/:pattern-1.yaml- BDA mode defaults (used whenuse_bda: true)pattern-2.yaml- Pipeline mode defaults (used whenuse_bda: false)
-
User configurations are merged on top, overriding only the specified values
-
Result: A complete configuration with user customizations applied to system defaults
Minimal Configuration Example
Section titled “Minimal Configuration Example”A user configuration only needs:
notes: "My document processing configuration"
classes: - $schema: https://json-schema.org/draft/2020-12/schema $id: Invoice type: object x-aws-idp-document-type: Invoice description: "A billing document" properties: invoice_number: type: string description: "Unique invoice identifier"All other settings (OCR, classification, extraction, assessment, evaluation, summarization, discovery, agents) are inherited from the pattern’s system defaults.
Override Example
Section titled “Override Example”To override specific settings while keeping others at defaults:
notes: "Configuration with custom classification method"
# Override just the classification methodclassification: classificationMethod: textbasedHolisticClassification
# Override assessment to enable granular modeassessment: granular: enabled: true
classes: # ... your document classesBenefits
Section titled “Benefits”- Simpler configs - Only specify what makes your use case unique
- Maintainable - System default updates automatically apply to all configs
- Focused - Easy to see what customizations are active
- Version-safe - Defaults evolve with the solution while custom overrides remain stable
Configuration Library
Section titled “Configuration Library”The config_library/ directory contains example configurations demonstrating this inheritance pattern. Each config contains:
notes:- Description of the configurationclasses:- Document class definitions (JSON Schema format)- Overrides - Only settings that differ from system defaults
See the config_library README for available configurations and usage examples.
Summarization Configuration
Section titled “Summarization Configuration”Enable/Disable Summarization
Section titled “Enable/Disable Summarization”Summarization can be controlled via the configuration file rather than CloudFormation stack parameters. This provides more flexibility and eliminates the need for stack redeployment when changing summarization behavior.
Configuration-based Control (Recommended):
summarization: enabled: true # Set to false to disable summarization model: us.anthropic.claude-3-7-sonnet-20250219-v1:0 temperature: 0.0 # ... other summarization settingsKey Benefits:
- Runtime Control: Enable/disable without stack redeployment
- Cost Optimization: Zero LLM costs when disabled (
enabled: false) - Simplified Architecture: No conditional logic in state machines
- Backward Compatible: Defaults to
enabled: truewhen property is missing
Behavior When Disabled:
- Summarization lambda is still called (minimal overhead)
- Service immediately returns with logging: “Summarization is disabled in configuration”
- No LLM API calls or S3 operations are performed
- Document processing continues to completion
Note: Prior to v0.4.0, this feature was controlled by the IsSummarizationEnabled CloudFormation parameter. The configuration-based approach provides runtime control without requiring stack redeployment.
Assessment Configuration
Section titled “Assessment Configuration”Enable/Disable Assessment
Section titled “Enable/Disable Assessment”Similar to summarization, assessment can now be controlled via the configuration file rather than CloudFormation stack parameters. This provides more flexibility and eliminates the need for stack redeployment when changing assessment behavior.
Configuration-based Control (Recommended):
assessment: enabled: true # Set to false to disable assessment model: us.amazon.nova-lite-v1:0 temperature: 0.0 # ... other assessment settingsKey Benefits:
- Runtime Control: Enable/disable without stack redeployment
- Cost Optimization: Zero LLM costs when disabled (
enabled: false) - Simplified Architecture: No conditional logic in state machines
- Backward Compatible: Defaults to
enabled: truewhen property is missing
Behavior When Disabled:
- Assessment lambda is still called (minimal overhead)
- Service immediately returns with logging: “Assessment is disabled via configuration”
- No LLM API calls or S3 operations are performed
- Document processing continues to completion
Note: Prior to v0.4.0, this feature was controlled by the IsAssessmentEnabled CloudFormation parameter. The configuration-based approach provides runtime control without requiring stack redeployment.
Advanced Assessment Configuration
Section titled “Advanced Assessment Configuration”For complex documents with many attributes, enable granular assessment for improved accuracy and performance:
assessment: enabled: true model: us.amazon.nova-lite-v1:0 granular_mode: true # Enable granular assessment simple_batch_size: 5 # Group simple attributes (3-5 recommended) list_batch_size: 1 # Process list items individually for accuracy max_workers: 10 # Parallel processing threadsBenefits:
- Better accuracy through focused prompts
- Cost optimization via prompt caching
- Reduced latency through parallel processing
- Scalability for documents with 100+ attributes
Ideal For:
- Bank statements with hundreds of transactions
- Documents with 10+ attributes
- Complex nested structures
- Performance-critical scenarios
For detailed information, see assessment.md.
Stack Parameters
Section titled “Stack Parameters”Key parameters that can be configured during CloudFormation deployment:
General Parameters
Section titled “General Parameters”AdminEmail: Administrator email for web UI accessAllowedSignUpEmailDomain: Optional domain(s) allowed for web UI user signupMaxConcurrentWorkflows: Control concurrent document processing (default: 100)DataRetentionInDays: Set retention period for documents and tracking records (default: 365 days)ErrorThreshold: Number of workflow errors that trigger alerts (default: 1)ExecutionTimeThresholdMs: Maximum acceptable execution time before alerting (default: 30000 ms)LogLevel: Set logging level (DEBUG, INFO, WARN, ERROR)WAFAllowedIPv4Ranges: IP restrictions for web UI access (default: allow all)CloudFrontPriceClass: Set CloudFront price class for UI distribution (CloudFront hosting only)CloudFrontAllowedGeos: Optional geographic restrictions for UI access (CloudFront hosting only)WebUIHosting: Select hosting mode —CloudFront(default) orALBfor VPC-based hosting (see ALB Hosting)CustomConfigPath: Optional S3 URI to a custom configuration file that overrides pattern presets. Leave blank to use selected pattern configuration. Example: s3://my-bucket/custom-config/config.yaml
Integration and Tracing Parameters
Section titled “Integration and Tracing Parameters”EnableXRayTracing: Enable X-Ray tracing for Lambda functions and Step Functions (default: true). Provides distributed tracing capabilities for debugging and performance analysis.EnableMCP: Enable Model Context Protocol (MCP) integration for external application access via AWS Bedrock AgentCore Gateway (default: true). See mcp-server.md for details.EnableECRImageScanning: Enable automatic vulnerability scanning for Lambda container images in ECR for Patterns 1-3 (default: false). Recommended for production deployments but may impact deployment reliability. See troubleshooting.md for guidance.
Pattern Selection
Section titled “Pattern Selection”IDPPattern: Select processing pattern:- Unified: Supports both BDA and Pipeline processing modes via
use_bdaflag
- Unified: Supports both BDA and Pipeline processing modes via
Pattern-Specific Parameters
Section titled “Pattern-Specific Parameters”- Configuration Preset:
ConfigurationPreset— Select from available presets (lending-package-sample, bank-statement-sample, etc.) - Custom Model ARNs: Optional custom fine-tuned classification/extraction model ARNs
Note: The processing mode (BDA vs Pipeline) is controlled by the
use_bdaflag in the configuration, not by deployment parameters. See the architecture docs for details.
- Pattern 3 (Textract + UDOP + Bedrock)
Optional Features
Section titled “Optional Features”EvaluationBaselineBucketName: Optional existing bucket for ground truth dataDocumentKnowledgeBase: Enable document knowledge base functionalityKnowledgeBaseModelId: Bedrock model for knowledge base queriesPostProcessingLambdaHookFunctionArn: Optional Lambda ARN for custom post-processing (see post-processing-lambda-hook.md for detailed implementation guidance)BedrockGuardrailId: Optional Bedrock Guardrail ID to applyBedrockGuardrailVersion: Version of Bedrock Guardrail to use
For details on processing modes, see architecture.md. For legacy pattern-specific references, see pattern-1.md (BDA) and pattern-2.md (Pipeline).
High Volume Processing
Section titled “High Volume Processing”Request Service Quota Limits
Section titled “Request Service Quota Limits”For high-volume document processing, consider requesting increases for these service quotas:
- Lambda Concurrent Executions: Default 1,000 per region
- Step Functions Executions: Default 25,000 per second (Standard workflow)
- Bedrock Model Invocations: Varies by model and region
- Claude models: Typically 5-20 requests per minute by default
- Titan models: 15-30 requests per minute by default
- SQS Message Rate: Default 300 per second for FIFO queues
- TextractLimitPage API: 15 transactions per second by default
- DynamoDB Read/Write Capacity: Uses on-demand capacity by default
Use the AWS Service Quotas console to request increases before deploying for production workloads. See monitoring.md for details on monitoring your resource usage and quotas.
Cost Estimation
Section titled “Cost Estimation”The solution provides built-in cost estimation capabilities:
- Real-time cost tracking for Bedrock model usage
- Per-document processing cost breakdown
- Historical cost analysis and trends
- Budget alerts and threshold monitoring
See COST_CALCULATOR.md for detailed cost analysis across different processing volumes.
Bedrock Guardrail Integration
Section titled “Bedrock Guardrail Integration”The solution supports Amazon Bedrock Guardrails for content safety and compliance across all patterns:
How Guardrails Work
Section titled “How Guardrails Work”Guardrails provide:
- Content Filtering: Block harmful, inappropriate, or sensitive content
- Topic Restrictions: Prevent processing of specific topic areas
- Data Protection: Redact or block personally identifiable information (PII)
- Custom Filters: Define organization-specific content policies
Configuring Guardrails
Section titled “Configuring Guardrails”Guardrails are configured with two CloudFormation parameters:
BedrockGuardrailId: The ID (not name) of an existing Bedrock GuardrailBedrockGuardrailVersion: The version of the guardrail to use (e.g., “DRAFT” or “1”)
This applies guardrails to all Bedrock model interactions, including:
- Document extraction (all patterns)
- Document summarization (all patterns)
- Document classification (Pattern 2 only)
- Knowledge base queries (if enabled)
Best Practices
Section titled “Best Practices”- Test Thoroughly: Validate guardrail behavior with representative documents
- Monitor Impact: Track processing latency and accuracy changes
- Regular Updates: Review and update guardrail policies as requirements evolve
- Compliance Alignment: Ensure guardrails align with organizational compliance requirements
For more information on creating and managing Guardrails, see the Amazon Bedrock documentation.
Concurrency and Throttling Management
Section titled “Concurrency and Throttling Management”The solution implements sophisticated concurrency control and throttling management:
Throttling and Retry (Bedrock, Textract, SageMaker)
Section titled “Throttling and Retry (Bedrock, Textract, SageMaker)”- Exponential Backoff: Automatic retry with increasing delays
- Jitter Addition: Random delay variation to prevent thundering herd
- Circuit Breaker: Temporary halt on repeated failures
- Rate Limiting: Configurable request rate controls
The solution tracks metrics for throttling events and successful retries, viewable in the CloudWatch dashboard.
Step Functions Retry Configuration
Section titled “Step Functions Retry Configuration”The Step Functions state machine includes comprehensive retry policies for API failures:
{ "Retry": [ { "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException"], "IntervalSeconds": 2, "MaxAttempts": 6, "BackoffRate": 2 }, { "ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 1, "MaxAttempts": 3, "BackoffRate": 2 } ]}Concurrency Control
Section titled “Concurrency Control”- Workflow Limits: Maximum concurrent Step Function executions, controlled by
MaxConcurrentWorkflowsparameter - Lambda Concurrency: Per-function concurrent execution limits
- Queue Management: SQS visibility timeout (30 seconds) and message batching
- Dynamic Scaling: Automatic adjustment based on queue depth and in-flight workflows
Document Status Tracking
Section titled “Document Status Tracking”The solution provides multiple ways to track document processing status:
Using the Web UI
Section titled “Using the Web UI”The web UI dashboard provides a real-time view of document processing status, including:
- Document status (queued, processing, completed, failed)
- Processing time
- Classification results
- Extraction results
- Error details (if applicable)
See web-ui.md for details on using the dashboard.
Using the Lookup Script
Section titled “Using the Lookup Script”Use the included script to check document processing status via CLI:
bash scripts/lookup_file_status.sh <DOCUMENT_KEY> <STACK_NAME>Response Format
Section titled “Response Format”Status lookup returns comprehensive information:
{ "document_key": "example.pdf", "status": "COMPLETED", "workflow_arn": "arn:aws:states:...", "start_time": "2024-01-01T12:00:00Z", "end_time": "2024-01-01T12:05:30Z", "processing_time_seconds": 330, "pages_processed": 15, "document_class": "BankStatement", "attributes_found": 12, "output_location": "s3://output-bucket/results/example.json", "error_details": null}Evaluation Extensions in JSON Schema
Section titled “Evaluation Extensions in JSON Schema”Document class schemas support evaluation-specific extensions for fine-grained control over accuracy assessment. These extensions work with the Stickler-based evaluation framework to provide flexible, business-aligned evaluation capabilities.
Available Extensions
Section titled “Available Extensions”x-aws-idp-evaluation-method: Comparison method (EXACT, FUZZY, NUMERIC_EXACT, SEMANTIC, LLM, HUNGARIAN)x-aws-idp-evaluation-threshold: Minimum score to consider a match (0.0-1.0)x-aws-idp-evaluation-weight: Field importance for weighted scoring (default: 1.0, higher values = more important)
Example Configuration
Section titled “Example Configuration”classes: - $schema: "https://json-schema.org/draft/2020-12/schema" x-aws-idp-document-type: "Invoice" x-aws-idp-evaluation-match-threshold: 0.8 # Document-level threshold properties: invoice_number: type: string x-aws-idp-evaluation-method: EXACT x-aws-idp-evaluation-weight: 2.0 # Critical field - double weight invoice_date: type: string x-aws-idp-evaluation-method: FUZZY x-aws-idp-evaluation-threshold: 0.9 x-aws-idp-evaluation-weight: 1.5 # Important field vendor_name: type: string x-aws-idp-evaluation-method: FUZZY x-aws-idp-evaluation-threshold: 0.85 x-aws-idp-evaluation-weight: 1.0 # Normal weight (default) vendor_notes: type: string x-aws-idp-evaluation-method: SEMANTIC x-aws-idp-evaluation-threshold: 0.7 x-aws-idp-evaluation-weight: 0.5 # Less critical - half weightStickler Backend Integration
Section titled “Stickler Backend Integration”The evaluation framework uses Stickler as its evaluation engine. The SticklerConfigMapper automatically translates these IDP extensions to Stickler’s native format, providing:
- Field-level weighting for business-critical attributes
- Optimal list matching using the Hungarian algorithm
- Extensible comparator system with exact, fuzzy, numeric, semantic, and LLM-based comparison
- Native JSON Schema support with $ref resolution
Benefits
Section titled “Benefits”- Business Alignment: Weight critical fields higher to ensure evaluation scores reflect business priorities
- Flexible Comparison: Choose the right evaluation method for each field type
- Tunable Thresholds: Set field-specific thresholds for matching sensitivity
- Dynamic Schema Generation: Auto-generates evaluation schema from baseline data when configuration is missing (for development/prototyping)
For detailed evaluation capabilities and best practices, see evaluation.md.
Section Splitting Strategies
Section titled “Section Splitting Strategies”Pattern-2 and Pattern-3 support configurable strategies for how classified pages are grouped into document sections. This is controlled by the sectionSplitting configuration field:
Available Strategies
Section titled “Available Strategies”-
disabled: Treats the entire document as a single section with the first detected class. Simplest approach for single-document processing. -
page: Creates one section per page, preventing automatic joining of same-type documents. Useful for deterministic processing of documents containing multiple forms of the same type (e.g., multiple W-2s, multiple invoices in one packet). -
llm_determined(default): Uses LLM boundary detection with “Start”/“Continue” indicators to intelligently segment multi-document packets. Best for complex scenarios where document boundaries are not obvious.
Configuration Example
Section titled “Configuration Example”classification: sectionSplitting: page # or "disabled", "llm_determined"Use Cases
Section titled “Use Cases”- Single Document Processing: Use
disabledfor simplicity - Multiple Same-Type Forms: Use
pagefor deterministic splitting (resolves Issue #146) - Complex Multi-Document Packets: Use
llm_determinedfor intelligent boundary detection
For more details on classification methods and section splitting, see classification.md.
Page Limit Configuration
Section titled “Page Limit Configuration”Control how many pages are used during document classification to optimize performance and costs:
classification: maxPagesForClassification: "ALL" # or "1", "2", "3", etc.Behavior:
- “ALL” (default): Uses all pages for classification
- Numeric value: Classifies only the first N pages, then applies that classification to the entire document
Important: When using a numeric limit, the classification result from the first N pages is applied to ALL pages, effectively forcing a single class/section for the entire document.
Use Cases:
- Performance optimization for large documents
- Cost reduction for documents with consistent patterns
- Simplified processing for homogeneous document types
Prompt Preview
Section titled “Prompt Preview”The Configuration page includes a Prompt Preview tab that lets you see the actual prompts sent to the LLM for each processing step (Classification, Extraction, Assessment, Summarization) with your configuration values filled in. This is useful for optimizing document class schemas and prompt templates — you can see exactly how your class names, descriptions, and JSON Schema attributes appear in the prompt that the LLM receives. See web-ui.md for details.
Prompt Optimization
Section titled “Prompt Optimization”Bedrock Prompt Caching
Section titled “Bedrock Prompt Caching”The solution supports Bedrock prompt caching to reduce costs and improve performance by caching static portions of prompts. This feature is available across all patterns for classification, extraction, assessment, and summarization.
How It Works
Section titled “How It Works”Insert a <<CACHEPOINT>> delimiter in your prompt to separate static (cacheable) content from dynamic content:
extraction: task_prompt: | You are an expert document analyst. Follow these rules: - Extract exact values from the document - Preserve formatting as it appears
<<CACHEPOINT>>
Document to process: {DOCUMENT_TEXT}Everything before the <<CACHEPOINT>> delimiter is cached and reused across similar requests, while content after it remains dynamic. This can significantly reduce token costs and improve response times.
Best Practices
Section titled “Best Practices”- Place Static Content First: Instructions, rules, schemas, and examples should come before the cachepoint
- Dynamic Content Last: Document text, images, and variable data should come after the cachepoint
- Cache Hit Optimization: Keep static content consistent across requests for maximum cache utilization
Benefits
Section titled “Benefits”- Cost Savings: Cached tokens cost significantly less than regular input tokens
- Performance: Reduced processing time for cached content
- Token Efficiency: Particularly beneficial for long system prompts or few-shot examples
For pricing details on cached tokens, see cost-calculator.md.
Regex-Based Classification (Pattern-2)
Section titled “Regex-Based Classification (Pattern-2)”Pattern-2 supports optional regex patterns in document class definitions for performance optimization and deterministic classification when patterns are known.
Configuration
Section titled “Configuration”Add regex patterns to your class definitions:
classes: - name: W2 Tax Form description: IRS Form W-2 Wage and Tax Statement document_name_regex: "^w2_.*\\.pdf$" # Matches filenames starting with "w2_" document_page_content_regex: "Form W-2.*Wage and Tax Statement"
- name: Invoice description: Commercial invoice document_name_regex: "^invoice_\\d{6}\\.pdf$" # Matches invoice_123456.pdf document_page_content_regex: "^INVOICE\\s+#\\d+"Classification Logic
Section titled “Classification Logic”- Document Name Matching: If
document_name_regexmatches the document filename, all pages are classified as that type without LLM processing - Page Content Matching: During multimodal page-level classification, if
document_page_content_regexmatches page text, that page is classified without LLM processing - Fallback: If no regex matches, standard LLM classification is used
Benefits
Section titled “Benefits”- Performance: Significant speed improvements by bypassing LLM calls for known patterns
- Cost Savings: Reduced token consumption for documents matching regex patterns
- Deterministic: Consistent classification results for known document patterns
- Backward Compatible: Seamless fallback to LLM classification when patterns don’t match
Monitoring
Section titled “Monitoring”The system logs INFO-level messages when regex patterns match, providing visibility into optimization effectiveness.
For examples and demonstrations, see the step2_classification_with_regex.ipynb notebook.
OCR Backend Configuration (Pattern-2 and Pattern-3)
Section titled “OCR Backend Configuration (Pattern-2 and Pattern-3)”Patterns 2 and 3 support multiple OCR backend engines for flexible document processing:
Available Backends
Section titled “Available Backends”- Textract (default): AWS Textract with advanced feature support (TABLES, FORMS, SIGNATURES, LAYOUT)
- Bedrock: LLM-based OCR using Claude/Nova models with customizable prompts for better handling of complex documents
- None: Image-only processing without OCR (useful for pure visual analysis)
Configuration Example
Section titled “Configuration Example”ocr: backend: textract # or "bedrock", "none"
# For Bedrock backend: bedrock_model: us.anthropic.claude-3-5-sonnet-20241022-v2:0 system_prompt: "You are an OCR expert..." task_prompt: "Extract all text from this document..."Bedrock OCR Benefits
Section titled “Bedrock OCR Benefits”- Better handling of complex layouts and tables
- Customizable extraction logic through prompts
- Layout preservation capabilities
- Support for documents with challenging formatting
For more details on OCR configuration and feature selection, see the pattern-specific documentation.
Custom Prompt Lambda (Pattern-2 and Pattern-3)
Section titled “Custom Prompt Lambda (Pattern-2 and Pattern-3)”Patterns 2 and 3 support injection of custom business logic into the extraction process through a Lambda function.
Configuration
Section titled “Configuration”Add the Lambda ARN to your extraction configuration:
extraction: custom_prompt_lambda_arn: arn:aws:lambda:us-west-2:123456789012:function:GENAIIDP-MyCustomLogicLambda Interface
Section titled “Lambda Interface”Your Lambda receives:
- All template placeholders (DOCUMENT_TEXT, DOCUMENT_CLASS, ATTRIBUTE_NAMES_AND_DESCRIPTIONS, DOCUMENT_IMAGE)
- Complete document context
- Configuration parameters
The Lambda should return modified prompt content or additional context.
Use Cases
Section titled “Use Cases”- Document type-specific processing rules
- Integration with external systems for customer configurations
- Conditional processing based on document content
- Regulatory compliance and industry-specific requirements
Requirements
Section titled “Requirements”- Lambda function name must start with
GENAIIDP-prefix for IAM permissions - Function must handle JSON serialization for image URIs
- Implement comprehensive error handling (fail-fast behavior)
Demo Resources
Section titled “Demo Resources”See notebooks/examples/demo-lambda/ for:
- Interactive demonstration notebook (
step3_extraction_with_custom_lambda.ipynb) - SAM deployment template for example Lambda
- Complete documentation and examples
For more details, see extraction.md.
Review Agent Model (Agentic Extraction)
Section titled “Review Agent Model (Agentic Extraction)”For agentic extraction workflows, you can specify a separate model for reviewing extraction work:
extraction: model: us.amazon.nova-pro-v1:0 review_agent_model: us.anthropic.claude-3-7-sonnet-20250219-v1:0 # OptionalIf not specified, defaults to the main extraction model. This allows using a more powerful model for validation while using a cost-effective model for initial extraction.
Benefits:
- Cost optimization by using different models for different tasks
- Enhanced accuracy with specialized review model
- Flexibility in model selection for extraction vs. validation
Use Cases:
- Use Nova Pro for extraction, Claude Sonnet for review
- Balance between cost and accuracy requirements
- Experimentation with different model combinations
Cost Tracking and Optimization
Section titled “Cost Tracking and Optimization”The solution includes built-in cost tracking capabilities:
- Per-document cost metrics: Track token usage and API calls per document
- Real-time dashboards: Monitor costs in the CloudWatch dashboard
- Cost estimation: Configuration includes pricing estimates for each component
For detailed cost analysis and optimization strategies, see cost-calculator.md.
Image Processing Configuration
Section titled “Image Processing Configuration”The solution supports configurable image dimensions across all processing services (OCR, classification, extraction, and assessment) to optimize performance and accuracy for different document types.
New Default Behavior (Preserves Original Resolution)
Section titled “New Default Behavior (Preserves Original Resolution)”Important Change: As of the latest version, empty strings or unspecified image dimensions now preserve the original document resolution instead of resizing to default dimensions.
# Preserves original image resolution (recommended for high-accuracy processing)classification: image: target_width: "" # Empty string = no resizing target_height: "" # Empty string = no resizing
extraction: image: target_width: "" # Preserves original resolution target_height: "" # Preserves original resolution
assessment: image: target_width: "" # No resizing applied target_height: "" # No resizing appliedCustom Image Dimensions
Section titled “Custom Image Dimensions”You can still specify exact dimensions when needed for performance optimization:
# Custom dimensions for specific requirementsclassification: image: target_width: "1200" # Resize to 1200 pixels wide target_height: "1600" # Resize to 1600 pixels tall
# Performance-optimized dimensionsextraction: image: target_width: "800" # Smaller for faster processing target_height: "1000" # Maintains good qualityImage Resizing Features
Section titled “Image Resizing Features”- Aspect Ratio Preservation: Images are resized proportionally without distortion
- Smart Scaling: Only downsizes images when necessary (scale factor < 1.0)
- High-Quality Resampling: Better visual quality after resizing
- Original Format Preservation: Maintains PNG, JPEG, and other formats when possible
Configuration Benefits
Section titled “Configuration Benefits”- High-Resolution Processing: Empty strings preserve full document resolution for maximum OCR accuracy
- Service-Specific Tuning: Each service can use optimal image dimensions
- Runtime Configuration: No code changes needed to adjust image processing
- Backward Compatibility: Existing numeric values continue to work as before
- Memory Optimization: Configurable dimensions allow resource optimization
Best Practices
Section titled “Best Practices”- Use Empty Strings for High Accuracy: For critical documents requiring maximum OCR accuracy, use empty strings to preserve original resolution
- Specify Dimensions for Performance: For high-volume processing, consider smaller dimensions to improve speed
- Test Different Settings: Evaluate the trade-off between accuracy and performance for your specific document types
- Monitor Resource Usage: Higher resolution images consume more memory and processing time
Migration from Previous Versions
Section titled “Migration from Previous Versions”Previous Behavior: Empty strings defaulted to 951x1268 pixel resizing New Behavior: Empty strings preserve original image resolution
If you were relying on the previous default resizing behavior, explicitly set dimensions:
# To maintain previous default behaviorclassification: image: target_width: "951" target_height: "1268"Additional Configuration Resources
Section titled “Additional Configuration Resources”The solution provides additional configuration options through:
- Configuration files in the
config_librarydirectory - Pattern-specific settings in each pattern’s subdirectory
- Environment variables for Lambda functions
- CloudWatch alarms and notification settings
See the README.md for a high-level overview of the solution architecture and components.