Pattern 2: Bedrock Classification and Extraction
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0
⚠️ DEPRECATED: Pattern 2 has been superseded by the Unified Pattern, which combines both BDA and pipeline processing modes into a single deployment. The `use_bda` configuration flag (set via the UI) controls whether documents are processed via BDA or the step-by-step pipeline. See architecture.md for details on the unified architecture.

This document is retained as a reference for pipeline-specific concepts (OCR, classification, extraction, assessment, rule validation) that still apply when `use_bda: false` is set in the unified pattern (the default mode).
This pattern implements an intelligent document processing workflow that uses Amazon Bedrock with Nova or Claude models for both page classification/grouping and information extraction.
Table of Contents
- Architecture Overview
- Bedrock OCR Feature
- Customizing Classification
- Few Shot Example Feature
- Customizing Extraction
- Assessment Feature
- Testing
- Best Practices
Architecture Overview
The workflow consists of three main processing steps with an optional assessment step:
- OCR processing using Amazon Textract
- Document classification using Claude via Amazon Bedrock (with two available methods):
  - Page-level classification: Classifies individual pages and groups them
  - Holistic packet classification: Analyzes multi-document packets to identify document boundaries
- Field extraction using Claude via Amazon Bedrock
- Assessment (optional): Confidence evaluation of extraction results using LLMs
State Machine Workflow
The Step Functions state machine (workflow.asl.json) orchestrates the following flow:

OCRStep → ClassificationStep → ProcessPageGroups (Map State for Extraction) → ProcessResultsStep

Each step includes comprehensive retry logic for handling transient errors:
- Initial retry after 2 seconds
- Exponential backoff with rate of 2
- Maximum of 8-10 retry attempts depending on the step
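The retry schedule above can be sketched as a small helper. `retry_delays` is a hypothetical name for illustration, not part of the deployed stack:

```python
import random

def retry_delays(interval_seconds=2, backoff_rate=2, max_attempts=8, jitter=False):
    """Yield the wait before each retry attempt: initial interval of 2 seconds,
    doubling each time (backoff rate of 2), up to max_attempts retries."""
    for attempt in range(max_attempts):
        delay = interval_seconds * backoff_rate ** attempt
        # Optional jitter, as recommended under Best Practices below
        yield delay * random.uniform(0.5, 1.0) if jitter else delay

# The first four delays are 2, 4, 8, and 16 seconds.
```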
Lambda Functions
OCR Function
- Purpose: Processes input PDFs using Amazon Textract or Amazon Bedrock
- Key Features:
  - Supports two OCR backends:
    - Amazon Textract (default)
    - Amazon Bedrock LLMs (Claude, Nova)
  - Concurrent page processing with ThreadPoolExecutor
  - Configurable Image Processing: Enhanced image resizing with aspect-ratio preservation
  - Configurable DPI: Adjustable DPI for PDF-to-image conversion
  - Dual Image Strategy: Stores original high-DPI images while using resized images for OCR processing
  - Smart Resizing: Only downsizes images when necessary (scale factor < 1.0)
  - Image preprocessing and optimization
  - Comprehensive error handling and retries
  - Detailed metrics tracking
- Input:
```json
{
  "execution_arn": "<ARN>",
  "output_bucket": "<BUCKET>",
  "input": {
    "detail": {
      "bucket": { "name": "<BUCKET>" },
      "object": { "key": "<KEY>" }
    }
  }
}
```
- Output:
```json
{
  "metadata": {
    "input_bucket": "<BUCKET>",
    "object_key": "<KEY>",
    "output_bucket": "<BUCKET>",
    "output_prefix": "<PREFIX>",
    "num_pages": "<NUMBER OF PAGES>"
  },
  "pages": {
    "<PAGE_NUMBER>": {
      "rawTextUri": "<S3_URI>",
      "parsedTextUri": "<S3_URI>",
      "imageUri": "<S3_URI>"
    }
  }
}
```
Classification Function
- Purpose: Classifies pages or document packets using Claude via Bedrock and segments into sections
- Key Features:
  - Two classification methods:
    - Page-level classification (`multimodalPageLevelClassification`)
    - Holistic packet classification (`textbasedHolisticClassification`)
  - RVL-CDIP dataset categories for classification
  - Concurrent page processing
  - Automatic image resizing and optimization
  - Robust error handling with exponential backoff
  - Few shot example support for improved accuracy
- Input: Output from OCR function
- Output:
```json
{
  "metadata": "<FROM_OCR>",
  "sections": [
    {
      "id": "<GROUP_ID>",
      "class": "<CLASS>",
      "pages": [...]
    }
  ]
}
```
Extraction Function
- Purpose: Extracts fields using Claude via Bedrock
- Key Features:
- Document class-specific attribute extraction
- Configurable extraction attributes
- Comprehensive error handling
- Token usage tracking
- Few shot example support for improved accuracy
- Input: Individual section from Classification output
- Output:
```json
{
  "section": {
    "id": "<ID>",
    "class": "<CLASS>",
    "page_ids": ["<PAGEID>", ...],
    "outputJSONUri": "<S3_URI>"
  },
  "pages": [...]
}
```
ProcessResults Function
- Purpose: Consolidates results from all sections
- Output: Standardized format for GenAIIDP parent stack:
```json
{
  "Sections": [
    {
      "Id": "<ID>",
      "PageIds": ["<PAGEID>", ...],
      "Class": "<CLASS>",
      "OutputJSONUri": "<S3_URI>"
    }
  ],
  "Pages": [
    {
      "Id": "<ID>",
      "Class": "<CLASS>",
      "TextUri": "<S3_URI>",
      "ImageUri": "<S3_URI>"
    }
  ],
  "PageCount": "<TOTAL_PAGES>"
}
```
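As a sketch of the consolidation step, a hypothetical `consolidate` helper can shape section and page results into the standardized format. The output keys come from the documented format; the function itself and its input shapes are illustrative assumptions:

```python
def consolidate(sections, pages):
    """Shape per-section extraction results and per-page results into the
    standardized parent-stack output format (illustrative sketch only)."""
    return {
        "Sections": [
            {"Id": s["id"], "PageIds": s["page_ids"],
             "Class": s["class"], "OutputJSONUri": s["outputJSONUri"]}
            for s in sections
        ],
        "Pages": pages,
        "PageCount": len(pages),
    }
```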
Human-in-the-Loop (HITL)
Pattern-2 supports Human-in-the-Loop (HITL) review capabilities using Amazon SageMaker Augmented AI (A2I). This feature allows human reviewers to validate and correct extracted information when the system’s confidence falls below a specified threshold.
Pattern-2 Specific Configuration:
- `EnableHITL`: Boolean parameter to enable/disable the HITL feature
- `IsPattern2HITLEnabled`: Boolean parameter specific to Pattern-2 HITL enablement
- `Pattern2 - Existing Private Workforce ARN`: Optional parameter to use an existing private workforce
For comprehensive HITL documentation including workflow details, configuration steps, best practices, and troubleshooting, see the Human-in-the-Loop Review Guide.
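The threshold check that drives HITL routing can be illustrated with a short sketch. `needs_human_review` and the attribute/score shapes below are assumptions for illustration, not the actual Pattern-2 A2I integration:

```python
def needs_human_review(assessed_attributes, confidence_threshold=0.8):
    """Return names of attributes whose assessment confidence falls below the
    threshold; a non-empty result is what would trigger an A2I review task.
    (Hypothetical helper, not Pattern-2 source code.)"""
    return [
        name for name, result in assessed_attributes.items()
        if result.get("confidence", 0.0) < confidence_threshold
    ]

scores = {
    "InvoiceNumber": {"confidence": 0.95},
    "TotalAmount": {"confidence": 0.42},
}
# Only TotalAmount falls below the default 0.8 threshold.
```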
Monitoring and Metrics
The pattern includes a comprehensive CloudWatch dashboard with:
Performance Metrics
- Document and page throughput
- Token usage (input/output/total)
- Bedrock request statistics
- Processing latencies
- Throttling and retry metrics
Error Tracking
- Lambda function errors
- Long-running invocations
- Classification/extraction failures
- Throttling events
Lambda Function Metrics
- Duration
- Memory usage
- Error rates
- Concurrent executions
Template Outputs
The pattern exports these outputs to the parent stack:
- `StateMachineName`: Name of Step Functions state machine
- `StateMachineArn`: ARN of Step Functions state machine
- `StateMachineLogGroup`: CloudWatch log group for state machine
- `DashboardName`: Name of pattern-specific dashboard
- `DashboardArn`: ARN of pattern-specific dashboard
Configuration
Stack Deployment Parameters:
- `ClassificationMethod`: Classification methodology to use (options: `multimodalPageLevelClassification` or `textbasedHolisticClassification`)
- Summarization: Control summarization via the configuration file’s `summarization.enabled` property (replaces the `IsSummarizationEnabled` parameter)
- `ConfigurationDefaultS3Uri`: Optional S3 URI to a custom configuration (uses the default configuration if not specified)
- `MaxConcurrentWorkflows`: Workflow concurrency limit
- `LogRetentionDays`: CloudWatch log retention period
- `ExecutionTimeThresholdMs`: Latency threshold for alerts
Configuration Management:
- Model selection is now handled through configuration files rather than CloudFormation parameters
- Configuration supports multiple presets per pattern (e.g., default, checkboxed_attributes_extraction, medical_records_summarization, few_shot_example)
- Configuration can be updated through the Web UI without stack redeployment
- Model choices are constrained through enum constraints in the configuration schema
Bedrock OCR Feature
Section titled “Bedrock OCR Feature”Overview
Pattern 2 now supports Amazon Bedrock LLMs (Claude, Nova) as an alternative OCR backend alongside the traditional Amazon Textract service. This feature enables multimodal document processing where large language models can extract text from document images using their vision capabilities.
Configuration
Bedrock OCR is configured through the pattern’s configuration files. The OCR backend can be selected using the `backend` parameter:
```yaml
ocr:
  backend: "bedrock"  # Options: "textract", "bedrock", "none"
  model_id: "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
  system_prompt: "You are an expert OCR system. Extract all text from the provided image accurately, preserving layout where possible."
  task_prompt: "Extract all text from this document image. Preserve the layout, including paragraphs, tables, and formatting."
```

Enabling Bedrock OCR
To use Bedrock OCR:
1. Set the backend: Configure `backend: "bedrock"` in your OCR configuration

2. Choose a model: Select from supported vision-capable models:
   - `us.amazon.nova-lite-v1:0`
   - `us.amazon.nova-pro-v1:0`
   - `us.amazon.nova-premier-v1:0`
   - `us.amazon.nova-2-lite-v1:0`
   - `us.anthropic.claude-3-haiku-20240307-v1:0`
   - `us.anthropic.claude-haiku-4-5-20251001-v1:0`
   - `us.anthropic.claude-3-5-sonnet-20241022-v2:0`
   - `us.anthropic.claude-3-7-sonnet-20250219-v1:0`
   - `us.anthropic.claude-sonnet-4-20250514-v1:0`
   - `us.anthropic.claude-sonnet-4-20250514-v1:0:1m`
   - `us.anthropic.claude-sonnet-4-5-20250929-v1:0`
   - `us.anthropic.claude-sonnet-4-5-20250929-v1:0:1m`
   - `us.anthropic.claude-sonnet-4-6`
   - `us.anthropic.claude-sonnet-4-6:1m`
   - `us.anthropic.claude-opus-4-20250514-v1:0`
   - `us.anthropic.claude-opus-4-1-20250805-v1:0`
   - `us.anthropic.claude-opus-4-5-20251101-v1:0`
   - `us.anthropic.claude-opus-4-6-v1`
   - `us.anthropic.claude-opus-4-6-v1:1m`
   - `eu.amazon.nova-lite-v1:0`
   - `eu.amazon.nova-pro-v1:0`
   - `eu.amazon.nova-2-lite-v1:0`
   - `eu.anthropic.claude-3-haiku-20240307-v1:0`
   - `eu.anthropic.claude-haiku-4-5-20251001-v1:0`
   - `eu.anthropic.claude-3-5-sonnet-20241022-v2:0`
   - `eu.anthropic.claude-3-7-sonnet-20250219-v1:0`
   - `eu.anthropic.claude-sonnet-4-20250514-v1:0`
   - `eu.anthropic.claude-sonnet-4-5-20250929-v1:0`
   - `eu.anthropic.claude-sonnet-4-5-20250929-v1:0:1m`
   - `eu.anthropic.claude-sonnet-4-6`
   - `eu.anthropic.claude-sonnet-4-6:1m`
   - `eu.anthropic.claude-opus-4-5-20251101-v1:0`
   - `eu.anthropic.claude-opus-4-6-v1`
   - `eu.anthropic.claude-opus-4-6-v1:1m`
   - `qwen.qwen3-vl-235b-a22b`
   - `global.amazon.nova-2-lite-v1:0`
   - `global.anthropic.claude-haiku-4-5-20251001-v1:0`
   - `global.anthropic.claude-sonnet-4-5-20250929-v1:0`
   - `global.anthropic.claude-sonnet-4-5-20250929-v1:0:1m`
   - `global.anthropic.claude-sonnet-4-6`
   - `global.anthropic.claude-sonnet-4-6:1m`
   - `global.anthropic.claude-opus-4-5-20251101-v1:0`
   - `global.anthropic.claude-opus-4-6-v1`
   - `global.anthropic.claude-opus-4-6-v1:1m`

3. Configure prompts: Customize system and task prompts for your specific use case

4. Deploy: The configuration can be updated through the Web UI without stack redeployment
Benefits
Advantages of Bedrock OCR:
- Multimodal Understanding: LLMs can understand both visual layout and textual content
- Context Awareness: Better handling of complex document structures and relationships
- Flexibility: Customizable prompts for domain-specific terminology and formats
- Advanced Reasoning: Can handle challenging cases like handwritten text, poor quality scans, or complex layouts
- Unified Processing: Same models used for OCR, classification, and extraction provide consistency
When to Use Bedrock OCR:
- Documents with complex layouts or mixed content types
- Handwritten or low-quality documents where Textract struggles
- Domain-specific documents requiring contextual understanding
- When you want unified processing across the entire pipeline
- For experimental or specialized use cases requiring prompt customization
Cost Considerations
Bedrock OCR Pricing:
- Charged per input/output token rather than per page
- Typically higher cost per page than Textract for standard documents
- Cost varies significantly by model (Nova Lite < Nova Pro < Claude models)
- Image tokens are more expensive than text tokens
Cost Optimization Tips:
- Model Selection: Use Nova Lite for cost-sensitive applications, Claude for quality-critical use cases
- Image Preprocessing: Enable image resizing and preprocessing to reduce token consumption
- Prompt Optimization: Use concise, focused prompts to minimize token usage
- Hybrid Approach: Use Textract for standard documents, Bedrock for complex cases
- Batch Processing: Process multiple pages efficiently with concurrent processing
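Because Bedrock OCR is billed per token, a rough per-page cost can be estimated from token counts. The helper below is an illustrative sketch, and the per-1K-token prices in the example are placeholders — replace them with current Bedrock pricing for your model and region:

```python
def estimate_page_cost(input_tokens, output_tokens,
                       price_in_per_1k, price_out_per_1k):
    """Rough Bedrock OCR cost for one page from token counts and per-1K-token
    prices. The prices are caller-supplied placeholders, not actual rates."""
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

# A page consuming 1,500 input (image) tokens and 600 output tokens,
# at placeholder prices of $0.001/1K input and $0.004/1K output tokens:
cost = estimate_page_cost(1500, 600, price_in_per_1k=0.001, price_out_per_1k=0.004)
```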
OCR Configuration
The OCR service in Pattern 2 supports enhanced image processing capabilities for optimal text extraction:
DPI Configuration
Configure DPI (Dots Per Inch) for PDF-to-image conversion:
```python
# Example OCR service initialization with custom DPI
ocr_service = OcrService(
    dpi=400,  # Higher DPI for better quality (default: 300)
    resize_config={
        'target_width': 1200,
        'target_height': 1600
    }
)
```

Image Resizing Configuration
The OCR service supports optional image resizing for processing optimization:
```yaml
# OCR configuration example
ocr:
  image:
    resize_config:
      target_width: 951    # Target width for processing
      target_height: 1268  # Target height for processing
    preprocessing: true    # Enable adaptive binarization preprocessing
```

OCR Image Processing Features
- Configurable DPI: Higher DPI (400+) for better quality, standard DPI (300) for balanced performance
- Dual Image Strategy:
  - Stores original high-DPI images in S3 for archival and downstream processing
  - Uses resized images for OCR processing to optimize performance
- Aspect Ratio Preservation: Images are resized proportionally without distortion
- Smart Scaling: Only downsizes images when necessary (scale factor < 1.0)
- Image Preprocessing: Optional adaptive binarization to improve OCR accuracy on challenging documents
- Enhanced Logging: Detailed logging for DPI and resize operations
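The aspect-ratio-preserving, downsize-only behavior described above can be sketched as follows. This is a hypothetical helper mirroring the documented rules, not the OCR service’s actual implementation:

```python
def compute_resize(width, height, target_width, target_height):
    """Resize dimensions toward a target box, preserving aspect ratio and
    only downsizing (scale factor < 1.0), as described above."""
    scale = min(target_width / width, target_height / height)
    if scale >= 1.0:  # already smaller than the target box: leave untouched
        return width, height
    return round(width * scale), round(height * scale)

# A 2550x3300 page (US Letter at 300 DPI) resized toward the 951x1268 default:
dims = compute_resize(2550, 3300, 951, 1268)  # -> (951, 1230-ish, ratio kept)
```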
Image Preprocessing Configuration
The OCR service supports optional adaptive binarization preprocessing to improve OCR accuracy on challenging documents:
```yaml
# Enable preprocessing for improved OCR accuracy
ocr:
  image:
    preprocessing: true  # Enable adaptive binarization
```

Adaptive Binarization Benefits:
- Improved OCR Accuracy: Significantly enhances text extraction on documents with uneven lighting, shadows, or low contrast
- Background Noise Reduction: Removes background gradients and noise that can interfere with OCR
- Enhanced Edge Detection: Sharpens text boundaries for better character recognition
- Robust Processing: Handles challenging document conditions like poor scans or faded text
When to Enable Preprocessing:
- Documents with uneven lighting or shadows
- Low contrast text or faded documents
- Scanned documents with background noise
- Handwritten or mixed content documents
- When standard OCR accuracy is insufficient
Configuration Benefits
- Quality Control: Higher DPI settings improve OCR accuracy for complex documents
- Performance Optimization: Resized images reduce processing time and memory usage
- Storage Efficiency: Dual strategy balances quality preservation with processing efficiency
- Preprocessing Enhancement: Adaptive binarization improves OCR accuracy on challenging documents
- Flexibility: Runtime configuration allows adjustment without code changes
- Backward Compatibility: Default values maintain existing behavior
Best Practices for OCR
1. DPI Selection:
   - Use 300 DPI for standard documents
   - Use 400+ DPI for documents with small text or complex layouts
   - Consider processing costs when using higher DPI settings
2. Image Resizing:
   - Enable resizing for large documents to improve processing speed
   - Maintain aspect ratios to preserve text readability
   - Test different dimensions based on document types
3. Performance Tuning:
   - Monitor processing times and adjust DPI/resize settings accordingly
   - Use concurrent processing for multi-page documents
   - Balance quality requirements with processing costs
Customizing Classification
The pattern supports two different classification methods:
1. Page-Level Classification (`multimodalPageLevelClassification`): The default method. It classifies each page independently based on its visual layout and textual content, and outputs a simple JSON format with a single class label per page.

2. Holistic Packet Classification (`textbasedHolisticClassification`): This method examines the document as a whole to identify boundaries between different document types within a multi-document packet. It detects logical document boundaries and identifies document types in the context of the whole packet, which is especially useful when individual pages are not clearly classifiable on their own. It outputs a JSON format that identifies document segments with start and end page numbers.
You can select which method to use by setting the ClassificationMethod parameter when deploying the stack.
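To illustrate the holistic method’s segment output, the sketch below expands start/end page ranges into per-section page lists. The field names are assumptions based on the description above, not the exact service schema:

```python
def segments_to_sections(segments):
    """Expand holistic-classification segments (document type plus start/end
    page numbers) into per-section page lists (illustrative sketch)."""
    return [
        {"id": str(i), "class": seg["class"],
         "pages": list(range(seg["start_page"], seg["end_page"] + 1))}
        for i, seg in enumerate(segments, start=1)
    ]

# A 3-page packet split into a 2-page letter followed by a 1-page invoice:
segs = [
    {"class": "letter", "start_page": 1, "end_page": 2},
    {"class": "invoice", "start_page": 3, "end_page": 3},
]
```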
The classification system uses RVL-CDIP dataset categories and can be customized through the configuration files. Classification models and prompts are now managed through the configuration library rather than CloudFormation parameters.
Available categories:
- letter
- form
- handwritten
- advertisement
- scientific_report
- scientific_publication
- specification
- file_folder
- news_article
- budget
- invoice
- presentation
- questionnaire
- resume
- memo
Few Shot Example Feature
Pattern 2 supports few shot learning through example-based prompting to significantly improve classification and extraction accuracy. This feature allows you to provide concrete examples of documents with their expected classifications and attribute extractions.
Overview
Few shot examples work by including reference documents with known classifications and expected attribute values in the prompts sent to the AI model. This helps the model understand the expected format and accuracy requirements for your specific use case.
Configuration
Few shot examples are configured using JSON Schema format in the configuration files in the `config_library/unified/` directory. The `few_shot_example` configuration demonstrates how to set up examples:
```yaml
classes:
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Letter
    x-aws-idp-document-type: Letter
    type: object
    description: "A formal written correspondence..."
    properties:
      SenderName:
        type: string
        description: "The name of the person who wrote the letter..."
    x-aws-idp-examples:
      - x-aws-idp-class-prompt: "This is an example of the class 'Letter'"
        name: "Letter1"
        x-aws-idp-attributes-prompt: |
          expected attributes are:
            "SenderName": "Will E. Clark",
            "SenderAddress": "206 Maple Street P.O. Box 1056 Murray Kentucky 42071-1056",
            "RecipientName": "The Honorable Wendell H. Ford"
        x-aws-idp-image-path: "config_library/unified/few_shot_example/example-images/letter1.jpg"
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Email
    x-aws-idp-document-type: Email
    type: object
    description: "A digital message with email headers..."
    x-aws-idp-examples:
      - x-aws-idp-class-prompt: "This is an example of the class 'Email'"
        name: "Email1"
        x-aws-idp-attributes-prompt: |
          expected attributes are:
            "FromAddress": "Kelahan, Ben",
            "ToAddress": "TI New York: 'TI Minnesota",
            "Subject": "FW: Morning Team Notes 4/20"
        x-aws-idp-image-path: "config_library/unified/few_shot_example/example-images/email1.jpg"
```

Benefits
Using few shot examples provides several advantages:
- Improved Accuracy: Models perform better when given concrete examples
- Consistent Formatting: Examples help ensure consistent output structure
- Domain Adaptation: Examples help models understand domain-specific terminology
- Reduced Hallucination: Examples reduce the likelihood of made-up data
- Better Edge Case Handling: Examples can demonstrate how to handle unusual cases
Integration with Template Prompts
The few shot examples are automatically integrated into the classification and extraction prompts using the `{FEW_SHOT_EXAMPLES}` placeholder. You can also use the `{DOCUMENT_IMAGE}` placeholder for precise image positioning:
Standard Template with Text Only:
```yaml
# In classification task_prompt
task_prompt: |
  Classify this document into exactly one of these categories:
  {CLASS_NAMES_AND_DESCRIPTIONS}

  <few_shot_examples>
  {FEW_SHOT_EXAMPLES}
  </few_shot_examples>

  <document_ocr_data>
  {DOCUMENT_TEXT}
  </document_ocr_data>

# In extraction task_prompt
task_prompt: |
  Extract attributes from this document.

  <few_shot_examples>
  {FEW_SHOT_EXAMPLES}
  </few_shot_examples>

  <document_ocr_data>
  {DOCUMENT_TEXT}
  </document_ocr_data>
```

Enhanced Template with Image Placement:

```yaml
# In classification task_prompt with image positioning
task_prompt: |
  Classify this document into exactly one of these categories:
  {CLASS_NAMES_AND_DESCRIPTIONS}

  <few_shot_examples>
  {FEW_SHOT_EXAMPLES}
  </few_shot_examples>

  Now examine this new document:
  {DOCUMENT_IMAGE}

  <document_ocr_data>
  {DOCUMENT_TEXT}
  </document_ocr_data>

  Classification:

# In extraction task_prompt with image positioning
task_prompt: |
  Extract attributes from this {DOCUMENT_CLASS} document:
  {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

  <few_shot_examples>
  {FEW_SHOT_EXAMPLES}
  </few_shot_examples>

  Analyze this document image:
  {DOCUMENT_IMAGE}

  <document_ocr_data>
  {DOCUMENT_TEXT}
  </document_ocr_data>

  Extract as JSON:
```

Available Template Placeholders
Pattern 2 supports several placeholders for building dynamic prompts:
- `{CLASS_NAMES_AND_DESCRIPTIONS}`: List of document classes and their descriptions
- `{FEW_SHOT_EXAMPLES}`: Examples from the configuration (class-specific for extraction, all classes for classification)
- `{DOCUMENT_TEXT}`: OCR-extracted text content from the document
- `{DOCUMENT_IMAGE}`: Document image(s) positioned at specific locations in the prompt
- `{DOCUMENT_CLASS}`: The classified document type (used in extraction prompts)
- `{ATTRIBUTE_NAMES_AND_DESCRIPTIONS}`: List of attributes to extract with their descriptions
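Substitution of the text-only placeholders can be sketched as simple string replacement. This is a minimal illustration — the real service also resolves `{DOCUMENT_IMAGE}` into actual image content, which plain string substitution cannot express:

```python
def render_prompt(template, **values):
    """Substitute text placeholders like {DOCUMENT_CLASS} and {DOCUMENT_TEXT}
    into a task prompt (illustrative sketch only)."""
    for key, value in values.items():
        template = template.replace("{" + key + "}", str(value))
    return template

prompt = render_prompt(
    "Extract attributes from this {DOCUMENT_CLASS} document:\n{DOCUMENT_TEXT}",
    DOCUMENT_CLASS="invoice",
    DOCUMENT_TEXT="Invoice #123 ...",
)
```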
Image Placement Benefits:
- Visual Context: Position images where they provide maximum context for the task
- Multimodal Understanding: Help models correlate visual and textual information effectively
- Flexible Design: Create prompts that flow naturally between different content types
- Enhanced Accuracy: Strategic image placement can improve both classification and extraction performance
Using Few Shot Examples
To use few shot examples in your deployment:
1. Use the example configuration: Deploy with `ConfigurationDefaultS3Uri` pointing to `config_library/unified/few_shot_example/config.yaml`
2. Create custom examples: Copy the example configuration and modify it with your own document examples
3. Provide example images: Place example document images in the appropriate directory and reference them in the `imagePath` field
Best Practices
- Quality over Quantity: Use 1-3 high-quality examples per document class
- Representative Examples: Choose examples that represent typical documents in your use case
- Clear Attribution: Ensure examples clearly show expected attribute extractions
- Diverse Coverage: Include examples that cover different variations and edge cases
Customizing Extraction
The extraction system can be customized through the configuration files rather than CloudFormation parameters:
1. Attribute Definitions:
   - Define attributes per document class in the `classes` section of the configuration
   - Specify descriptions for each attribute
   - Configure the format and structure
2. Extraction Prompts:
   - Customize system behavior through the `system_prompt` in configuration
   - Add domain expertise and guidance in the `task_prompt`
   - Modify output formatting requirements
3. Model Selection:
   - Model selection is handled through enum constraints in the configuration
   - Available models are defined in the configuration schema
   - Changes can be made through the Web UI without redeployment
Example attribute definition from the configuration using JSON Schema format:
```yaml
classes:
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Invoice
    x-aws-idp-document-type: Invoice
    type: object
    description: A commercial document issued by a seller to a buyer relating to a sale
    properties:
      InvoiceNumber:
        type: string
        description: The unique identifier for the invoice. Look for 'invoice no', 'invoice #', or 'bill number', typically near the top of the document.
      InvoiceDate:
        type: string
        description: The date when the invoice was issued. May be labeled as 'date', 'invoice date', or 'billing date'.
      TotalAmount:
        type: string
        description: The final amount to be paid including all charges. Look for 'total', 'grand total', or 'amount due', typically the last figure on the invoice.
```

Assessment Feature
Pattern 2 includes an optional assessment feature that evaluates the confidence of extraction results using LLMs. This feature provides automated quality assurance by analyzing extraction outputs against source documents.
Overview
The assessment feature runs after successful extraction and provides:
- Confidence Scores: Per-attribute confidence ratings (0.0-1.0)
- Explanatory Reasoning: Human-readable explanations for each confidence score
- UI Integration: Automatic display in the web interface visual editor
- Cost Optimization: Optional deployment and efficient token usage
- Granular Assessment: Advanced scalable approach for complex documents with many attributes
Enabling Assessment
Assessment can now be controlled via the configuration file rather than CloudFormation stack parameters. This provides more flexibility and eliminates the need for stack redeployment when changing assessment behavior.
Configuration-based Control (Recommended):
```yaml
assessment:
  enabled: true  # Set to false to disable assessment
  model: us.amazon.nova-lite-v1:0
  temperature: 0.0
  # ... other assessment settings
```

Key Benefits:
- Runtime Control: Enable/disable without stack redeployment
- Cost Optimization: Zero LLM costs when disabled (`enabled: false`)
- Simplified Architecture: No conditional logic in state machines
- Backward Compatible: Defaults to `enabled: true` when the property is missing
Behavior When Disabled:
- Assessment lambda is still called (minimal overhead)
- Service immediately returns with logging: “Assessment is disabled via configuration”
- No LLM API calls or S3 operations are performed
- Document processing continues to completion
Migration Note: The previous `IsAssessmentEnabled` CloudFormation parameter has been removed in favor of this configuration-based approach.
The assessment step is always called in the state machine workflow, but the service itself handles the enablement decision:
OCRStep → ClassificationStep → ProcessPageGroups (Map State): ExtractSection → AssessSection (if enabled)

State Machine Integration
The assessment step integrates seamlessly into Pattern-2’s ProcessSections map state:
```json
{
  "AssessSection": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "${AssessmentFunction}",
      "Payload": {
        "document.$": "$.document",
        "section_id.$": "$.section_id"
      }
    },
    "End": true
  }
}
```

Configuration Options
Pattern 2 supports both standard and granular assessment approaches:
Standard Assessment
For documents with moderate complexity:
```yaml
assessment:
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
  temperature: 0
  system_prompt: "You are an expert document analyst..."
  task_prompt: |
    Assess the confidence of extraction results for this {DOCUMENT_CLASS} document.

    Extraction Results: {EXTRACTION_RESULTS}
    Attributes: {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
    Document Text: {DOCUMENT_TEXT}
    OCR Confidence: {OCR_TEXT_CONFIDENCE}
    {DOCUMENT_IMAGE}
```

Granular Assessment
For complex documents with many attributes or large lists:
```yaml
assessment:
  model: "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
  temperature: 0
  system_prompt: "You are an expert document analyst..."
  task_prompt: |
    Assess the confidence of extraction results for this {DOCUMENT_CLASS} document.

    Attributes to assess: {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
    Extraction results: {EXTRACTION_RESULTS}
    Document context: {DOCUMENT_TEXT}
    {OCR_TEXT_CONFIDENCE}
    {DOCUMENT_IMAGE}

  # Granular assessment configuration
  granular:
    max_workers: 6        # Parallel processing threads
    simple_batch_size: 3  # Attributes per batch
    list_batch_size: 1    # List items per batch
```

When to Use Granular Assessment
Consider granular assessment for:
- Bank statements with hundreds of transactions
- Documents with 10+ attributes requiring individual attention
- Complex nested structures (group and list attributes)
- Performance-critical scenarios where parallel processing helps
- Cost optimization when prompt caching is available
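The granular settings (`max_workers`, `simple_batch_size`) can be illustrated with a small batching-plus-threads sketch. Here `assess_batch` stands in for a per-batch LLM call and is hypothetical; the real service's internals differ:

```python
from concurrent.futures import ThreadPoolExecutor

def batch_attributes(attribute_names, simple_batch_size=3):
    """Group simple attributes into batches of simple_batch_size,
    mirroring the granular configuration above."""
    return [attribute_names[i:i + simple_batch_size]
            for i in range(0, len(attribute_names), simple_batch_size)]

def assess_in_parallel(batches, assess_batch, max_workers=6):
    """Run one assessment call per batch on a thread pool of max_workers.
    assess_batch is a placeholder for a per-batch LLM call."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(assess_batch, batches))
```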
Testing Assessment
Use the provided assessment notebooks:
```bash
# Standard assessment testing
jupyter notebook notebooks/e2e-example-with-assessment.ipynb

# Granular assessment testing
jupyter notebook notebooks/examples/step4_assessment_granular.ipynb
```

Comprehensive Documentation
For detailed information about assessment configuration, output formats, confidence thresholds, UI integration, cost optimization, and troubleshooting, see the Assessment Documentation.
The assessment documentation covers:
- Complete configuration examples and placeholders
- Attribute types and assessment formats (simple, group, list)
- Confidence threshold configuration and UI integration
- Granular assessment architecture and performance tuning
- Cost optimization strategies and token reduction techniques
- Multimodal assessment with image processing
- Testing procedures and best practices
Best Practices
1. Configuration Management:
   - Use the configuration library for different use cases (default, medical_records, few_shot_example)
   - Test configuration changes thoroughly before production deployment
   - Leverage the Web UI for configuration updates without redeployment
2. Throttling Management:
   - Implement exponential backoff with jitter
   - Configure appropriate retry limits
   - Monitor throttling metrics
3. Error Handling:
   - Comprehensive error logging
   - Graceful degradation
   - Clear error messages
4. Performance Optimization:
   - Concurrent processing where appropriate
   - Image optimization
   - Resource pooling
5. Monitoring:
   - Detailed CloudWatch metrics
   - Performance dashboards
   - Error tracking
6. Security:
   - KMS encryption
   - Least privilege IAM roles
   - Secure configuration management
7. Few Shot Examples:
   - Use high-quality, representative examples
   - Include examples for all document classes you expect to process
   - Regularly review and update examples based on real-world performance
   - Test configurations with examples before production deployment
8. Image Processing Optimization:
   - Configure appropriate image dimensions for each service based on document complexity
   - Use higher DPI (400+) for OCR when processing documents with small text or complex layouts
   - Balance image quality with processing performance and costs
   - Test different image configurations with your specific document types
   - Monitor memory usage and processing times when adjusting image settings
   - Leverage the dual image strategy in OCR to preserve quality while optimizing processing