Assessment Feature
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0
Overview
The Assessment feature provides automated confidence evaluation of document extraction results using Large Language Models (LLMs). This feature analyzes extraction outputs against source documents to provide confidence scores and explanations for each extracted attribute, helping users understand the reliability of automated extractions.
Key Features
- Multimodal Analysis: Combines text analysis with document images for comprehensive confidence assessment
- Per-Attribute Scoring: Provides individual confidence scores and explanations for each extracted attribute
- Automatic Bounding Box Processing: Spatial localization of extracted fields with UI-compatible geometry output
- Token-Optimized Processing: Uses condensed text confidence data for 80-90% token reduction compared to full OCR results
- UI Integration: Seamlessly displays assessment results in the web interface with explainability information
- Confidence Threshold Support: Configurable global and per-attribute confidence thresholds with color-coded visual indicators
- Enhanced Visual Feedback: Real-time confidence assessment with green/red/black color coding in all data viewing interfaces
- Optional Deployment: Controlled by the assessment `enabled` configuration property (see Configuration-Based Control below)
- Flexible Image Usage: Images only processed when explicitly requested via the `{DOCUMENT_IMAGE}` placeholder
- Granular Assessment: Advanced scalable approach for complex documents with many attributes or list items
- Parallel Processing: Multi-threaded assessment execution for improved performance
- Prompt Caching: Leverages LLM caching capabilities to reduce costs for repeated assessments
- Visual Document Annotation: Automatic conversion of spatial data for immediate document visualization
Architecture
Assessment Workflow
- Post-Extraction Processing: Assessment runs after successful extraction within the same state machine
- Document Analysis: LLM analyzes extraction results against source document text and optionally images
- Confidence Scoring: Generates confidence scores (0.0-1.0) with explanatory reasoning for each attribute
- Result Integration: Appends assessment data to existing extraction results in `explainability_info` format
- UI Display: Assessment results automatically appear in the web interface visual editor
State Machine Integration
The assessment step is conditionally integrated into Pattern-2's ProcessSections map state:

```json
{
  "AssessSection": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "${AssessmentFunction}",
      "Payload": {
        "document.$": "$.document",
        "section_id.$": "$.section_id"
      }
    },
    "End": true
  }
}
```

Configuration
Configuration-Based Control
Assessment can now be controlled via the configuration file rather than CloudFormation stack parameters. This provides more flexibility and eliminates the need for stack redeployment when changing assessment behavior.
Configuration-based Control (Recommended):
```yaml
assessment:
  enabled: true  # Set to false to disable assessment
  model: us.amazon.nova-lite-v1:0
  temperature: 0.0
  # ... other assessment settings
```

Key Benefits:
- Runtime Control: Enable/disable without stack redeployment
- Cost Optimization: Zero LLM costs when disabled (`enabled: false`)
- Simplified Architecture: No conditional logic in state machines
- Backward Compatible: Defaults to `enabled: true` when the property is missing
Behavior When Disabled:
- Assessment lambda is still called (minimal overhead)
- Service immediately returns with logging: “Assessment is disabled via configuration” (see the sketch after this list)
- No LLM API calls or S3 operations are performed
- Document processing continues to completion
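The early exit described above can be pictured with a minimal sketch; the function and configuration keys here are illustrative assumptions rather than the accelerator's actual code.

```python
# Minimal sketch (assumed names): the kind of early return performed when
# assessment is disabled via configuration.
import logging

logger = logging.getLogger(__name__)

def assess_section(document: dict, config: dict) -> dict:
    assessment_config = config.get("assessment", {})
    # A missing property defaults to enabled (backward-compatible behavior)
    if not assessment_config.get("enabled", True):
        logger.info("Assessment is disabled via configuration")
        return document  # no LLM API calls or S3 operations; processing continues
    # ... full assessment logic would run here
    return document
```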
Migration Note: The previous IsAssessmentEnabled CloudFormation parameter has been removed in favor of this configuration-based approach.
Assessment Configuration Section
Add the assessment section to your configuration YAML:

```yaml
assessment:
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
  temperature: 0
  top_k: 5
  top_p: 0.1
  max_tokens: 4096
  system_prompt: |
    You are an expert document analyst specializing in assessing the confidence and accuracy of document extraction results.
  task_prompt: |
    <background>
    You are an expert document analysis assessment system. Your task is to evaluate the confidence of extraction results for a document of class {DOCUMENT_CLASS} and provide precise spatial localization for each field.
    </background>

    <task>
    Analyze the extraction results against the source document and provide confidence assessments AND bounding box coordinates for each extracted attribute. Consider factors such as:
    1. Text clarity and OCR quality in the source regions
    2. Alignment between extracted values and document content
    3. Presence of clear evidence supporting the extraction
    4. Potential ambiguity or uncertainty in the source material
    5. Completeness and accuracy of the extracted information
    6. Precise spatial location of each field in the document
    </task>

    <assessment-guidelines>
    For each attribute, provide:
    - A confidence score between 0.0 and 1.0 where:
      - 1.0 = Very high confidence, clear and unambiguous evidence
      - 0.8-0.9 = High confidence, strong evidence with minor uncertainty
      - 0.6-0.7 = Medium confidence, reasonable evidence but some ambiguity
      - 0.4-0.5 = Low confidence, weak or unclear evidence
      - 0.0-0.3 = Very low confidence, little to no supporting evidence
    - A clear explanation of the confidence reasoning
    - Precise spatial coordinates where the field appears in the document
    </assessment-guidelines>

    <spatial-localization-guidelines>
    For each field, provide bounding box coordinates:
    - bbox: [x1, y1, x2, y2] coordinates in normalized 0-1000 scale
    - page: Page number where the field appears (starting from 1)

    Coordinate system:
    - Use normalized scale 0-1000 for both x and y axes
    - x1, y1 = top-left corner of bounding box
    - x2, y2 = bottom-right corner of bounding box
    - Ensure x2 > x1 and y2 > y1
    - Make bounding boxes tight around the actual text content
    </spatial-localization-guidelines>

    <<CACHEPOINT>>

    <document-image>
    {DOCUMENT_IMAGE}
    </document-image>

    <ocr-text-confidence-results>
    {OCR_TEXT_CONFIDENCE}
    </ocr-text-confidence-results>

    <<CACHEPOINT>>

    <attributes-definitions>
    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
    </attributes-definitions>

    <extraction-results>
    {EXTRACTION_RESULTS}
    </extraction-results>

    Provide confidence assessments with spatial localization in JSON format:
    {
      "attribute_name": {
        "confidence": 0.85,
        "confidence_reason": "Clear text with high OCR confidence, easily identifiable location",
        "bbox": [100, 200, 300, 250],
        "page": 1
      }
    }
```

Prompt Placeholders
The assessment prompts support the following placeholders:
| Placeholder | Description |
|---|---|
| `{DOCUMENT_CLASS}` | The classified document type |
| `{EXTRACTION_RESULTS}` | JSON string of extraction results to assess |
| `{ATTRIBUTE_NAMES_AND_DESCRIPTIONS}` | Formatted list of attribute names and descriptions |
| `{DOCUMENT_TEXT}` | Full document text (markdown) from OCR |
| `{OCR_TEXT_CONFIDENCE}` | Condensed OCR confidence data (80-90% token reduction) |
| `{DOCUMENT_IMAGE}` | Optional - Inserts document images at the specified position |
Image Processing with DOCUMENT_IMAGE
The `{DOCUMENT_IMAGE}` placeholder enables precise control over image inclusion:
Text-Only Assessment (Default)
```yaml
task_prompt: |
  Assess extraction results based on document text and OCR confidence data:

  Document Text: {DOCUMENT_TEXT}
  OCR Confidence: {OCR_TEXT_CONFIDENCE}
  Extraction Results: {EXTRACTION_RESULTS}
```

Multimodal Assessment

```yaml
task_prompt: |
  Assess extraction results by analyzing both text and visual document content:

  Document Text: {DOCUMENT_TEXT}

  {DOCUMENT_IMAGE}

  Based on the above document image and text, assess these extraction results: {EXTRACTION_RESULTS}
```

Important: Images are only processed when the `{DOCUMENT_IMAGE}` placeholder is explicitly present in the prompt template.
Automatic Bounding Box Processing
The assessment feature includes automatic spatial localization capabilities that extract bounding box coordinates from LLM responses and convert them to a UI-compatible geometry format. This provides visual field localization consistent with Pattern-1 (BDA) without requiring additional configuration.
How It Works
1. Spatial Localization in Task Prompts

Include spatial localization guidelines in your assessment task prompts to request bounding box coordinates from the LLM:

```yaml
assessment:
  task_prompt: |
    <spatial-localization-guidelines>
    For each field, provide bounding box coordinates:
    - bbox: [x1, y1, x2, y2] coordinates in normalized 0-1000 scale
    - page: Page number where the field appears (starting from 1)

    Coordinate system:
    - Use normalized scale 0-1000 for both x and y axes
    - x1, y1 = top-left corner of bounding box
    - x2, y2 = bottom-right corner of bounding box
    - Ensure x2 > x1 and y2 > y1
    - Make bounding boxes tight around the actual text content
    </spatial-localization-guidelines>

    Provide confidence assessments with spatial localization in JSON format:
    {
      "attribute_name": {
        "confidence": 0.85,
        "confidence_reason": "Clear text with high OCR confidence",
        "bbox": [100, 200, 300, 250],
        "page": 1
      }
    }
```

2. Automatic Coordinate Conversion
When the LLM provides bounding box data in the assessment response, the system automatically:
- Detects spatial data: Identifies `bbox` and `page` fields in the LLM response
- Converts coordinates: Transforms from the 0-1000 normalized scale to 0-1 decimal format (see the sketch after this list)
- Calculates dimensions: Converts [x1, y1, x2, y2] to {top, left, width, height} format
- Creates geometry objects: Formats data for Pattern-1/BDA UI compatibility
- Processes recursively: Handles nested group attributes and list items automatically
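A minimal sketch of that conversion, following the 0-1000 to 0-1 mapping and the {top, left, width, height} geometry described here; the function name and types are illustrative assumptions, not the accelerator's actual implementation.

```python
# Sketch only (assumed names): convert an LLM bbox in 0-1000 scale to the
# UI-compatible geometry format used by Pattern-1 (BDA).
from typing import Any, Dict, List

def bbox_to_geometry(bbox: List[float], page: int) -> List[Dict[str, Any]]:
    x1, y1, x2, y2 = bbox
    return [{
        "boundingBox": {
            "top": y1 / 1000.0,            # y1 / 1000
            "left": x1 / 1000.0,           # x1 / 1000
            "width": (x2 - x1) / 1000.0,   # (x2 - x1) / 1000
            "height": (y2 - y1) / 1000.0,  # (y2 - y1) / 1000
        },
        "page": page,  # 1-based page numbering
    }]

# Example: [100, 200, 400, 250] on page 1 -> top=0.2, left=0.1, width=0.3, height=0.05
```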
3. Coordinate System Transformation
The conversion process transforms coordinates from the LLM's 0-1000 scale to the UI's 0-1 decimal format:

```python
# LLM Response Format
{
  "StatementDate": {
    "confidence": 0.95,
    "bbox": [100, 200, 400, 250],  # [x1, y1, x2, y2] in 0-1000 scale
    "page": 1
  }
}

# Automatically Converted to UI Format
{
  "StatementDate": {
    "confidence": 0.95,
    "confidence_threshold": 0.85,
    "geometry": [{
      "boundingBox": {
        "top": 0.2,     # y1 / 1000
        "left": 0.1,    # x1 / 1000
        "width": 0.3,   # (x2 - x1) / 1000
        "height": 0.05  # (y2 - y1) / 1000
      },
      "page": 1
    }]
  }
}
```

4. Pattern-1 Compatibility
The geometry format exactly matches Pattern-1 (BDA) specifications:
- boundingBox object: Contains top, left, width, height as decimal values (0-1)
- page field: 1-based page numbering
- Array structure: geometry as array to support multiple regions per field
- Recursive processing: Handles nested attributes like `CompanyAddress.State`
Configuration-Free Operation
The bounding box feature requires no additional configuration:
- Automatic detection: System detects when LLM provides spatial data
- Fallback handling: Works normally when no bounding boxes are provided
- Backward compatibility: Existing configurations continue to work unchanged
- Optional enhancement: Bounding boxes enhance existing assessment without breaking changes
Output Format
Assessment results are appended to extraction results in the `explainability_info` format expected by the UI. The format varies based on the attribute type defined in your document class configuration.
Attribute Types and Assessment Formats
The assessment service supports three types of attributes, each with a specific assessment response format:
1. Simple Attributes
For basic single-value extractions like dates, amounts, or names:

Configuration:

```yaml
properties:
  StatementDate:
    type: string
    description: "The date of the bank statement"
```

Assessment Response (without spatial data):

```json
{
  "StatementDate": {
    "confidence": 0.85,
    "confidence_reason": "Date clearly visible in statement header"
  }
}
```

Assessment Response (with automatic spatial data):

```json
{
  "StatementDate": {
    "confidence": 0.85,
    "confidence_reason": "Date clearly visible in statement header",
    "confidence_threshold": 0.85,
    "geometry": [{
      "boundingBox": { "top": 0.2, "left": 0.1, "width": 0.15, "height": 0.03 },
      "page": 1
    }]
  }
}
```

2. Group Attributes
For nested object structures with multiple related fields:

Configuration:

```yaml
properties:
  AccountDetails:
    type: object
    description: "Bank account information"
    properties:
      AccountNumber:
        type: string
        description: "The account number"
      RoutingNumber:
        type: string
        description: "The bank routing number"
```

Assessment Response (with automatic spatial data):

```json
{
  "AccountDetails": {
    "AccountNumber": {
      "confidence": 0.90,
      "confidence_reason": "Account number clearly printed in standard location",
      "confidence_threshold": 0.90,
      "geometry": [{
        "boundingBox": { "top": 0.15, "left": 0.2, "width": 0.25, "height": 0.04 },
        "page": 1
      }]
    },
    "RoutingNumber": {
      "confidence": 0.75,
      "confidence_reason": "Routing number visible but slightly blurred",
      "confidence_threshold": 0.90,
      "geometry": [{
        "boundingBox": { "top": 0.2, "left": 0.2, "width": 0.2, "height": 0.03 },
        "page": 1
      }]
    }
  }
}
```

3. List Attributes
For arrays of items, such as transactions in a bank statement:

Configuration:

```yaml
properties:
  Transactions:
    type: array
    description: "List of all transactions on the statement"
    x-aws-idp-list-item-description: "Individual transaction entry"
    items:
      type: object
      properties:
        Date:
          type: string
          description: "Transaction date"
        Description:
          type: string
          description: "Transaction description"
        Amount:
          type: string
          description: "Transaction amount"
```

Assessment Response (with automatic spatial data):

```json
{
  "Transactions": [
    {
      "Date": {
        "confidence": 0.95,
        "confidence_reason": "Date clearly printed in standard format",
        "confidence_threshold": 0.80,
        "geometry": [{ "boundingBox": { "top": 0.3, "left": 0.1, "width": 0.12, "height": 0.025 }, "page": 1 }]
      },
      "Description": {
        "confidence": 0.88,
        "confidence_reason": "Description text is clear and readable",
        "confidence_threshold": 0.75,
        "geometry": [{ "boundingBox": { "top": 0.3, "left": 0.25, "width": 0.35, "height": 0.025 }, "page": 1 }]
      },
      "Amount": {
        "confidence": 0.92,
        "confidence_reason": "Amount aligned in currency column with clear digits",
        "confidence_threshold": 0.85,
        "geometry": [{ "boundingBox": { "top": 0.3, "left": 0.65, "width": 0.15, "height": 0.025 }, "page": 1 }]
      }
    },
    {
      "Date": {
        "confidence": 0.90,
        "confidence_reason": "Date visible but slightly smudged",
        "confidence_threshold": 0.80,
        "geometry": [{ "boundingBox": { "top": 0.33, "left": 0.1, "width": 0.12, "height": 0.025 }, "page": 1 }]
      },
      "Description": {
        "confidence": 0.85,
        "confidence_reason": "Description partially cut off but main text readable",
        "confidence_threshold": 0.75,
        "geometry": [{ "boundingBox": { "top": 0.33, "left": 0.25, "width": 0.3, "height": 0.025 }, "page": 1 }]
      },
      "Amount": {
        "confidence": 0.94,
        "confidence_reason": "Amount clearly printed with proper decimal alignment",
        "confidence_threshold": 0.85,
        "geometry": [{ "boundingBox": { "top": 0.33, "left": 0.65, "width": 0.15, "height": 0.025 }, "page": 1 }]
      }
    }
  ]
}
```

Complete Example
Here's a complete example showing all three attribute types in a single assessment response:

```json
{
  "inference_result": {
    "StatementDate": "2024-01-31",
    "AccountDetails": {
      "AccountNumber": "1234567890",
      "RoutingNumber": "021000021"
    },
    "Transactions": [
      { "Date": "2024-01-15", "Description": "Direct Deposit - Salary", "Amount": "3500.00" },
      { "Date": "2024-01-20", "Description": "ATM Withdrawal", "Amount": "-200.00" }
    ]
  },
  "explainability_info": [
    {
      "StatementDate": {
        "confidence": 0.95,
        "confidence_reason": "Statement date clearly printed in header",
        "confidence_threshold": 0.85,
        "geometry": [{ "boundingBox": {"top": 0.1, "left": 0.1, "width": 0.15, "height": 0.03}, "page": 1 }]
      },
      "AccountDetails": {
        "AccountNumber": {
          "confidence": 0.90,
          "confidence_reason": "Account number clearly visible in account section",
          "confidence_threshold": 0.90,
          "geometry": [{ "boundingBox": {"top": 0.15, "left": 0.2, "width": 0.25, "height": 0.04}, "page": 1 }]
        },
        "RoutingNumber": {
          "confidence": 0.85,
          "confidence_reason": "Routing number printed clearly below account number",
          "confidence_threshold": 0.90,
          "geometry": [{ "boundingBox": {"top": 0.2, "left": 0.2, "width": 0.2, "height": 0.03}, "page": 1 }]
        }
      },
      "Transactions": [
        {
          "Date": {
            "confidence": 0.95,
            "confidence_reason": "Transaction date clearly printed",
            "confidence_threshold": 0.80,
            "geometry": [{ "boundingBox": {"top": 0.3, "left": 0.1, "width": 0.12, "height": 0.025}, "page": 1 }]
          },
          "Description": {
            "confidence": 0.88,
            "confidence_reason": "Description text is clear and complete",
            "confidence_threshold": 0.75,
            "geometry": [{ "boundingBox": {"top": 0.3, "left": 0.25, "width": 0.35, "height": 0.025}, "page": 1 }]
          },
          "Amount": {
            "confidence": 0.92,
            "confidence_reason": "Amount properly aligned in currency format",
            "confidence_threshold": 0.85,
            "geometry": [{ "boundingBox": {"top": 0.3, "left": 0.65, "width": 0.15, "height": 0.025}, "page": 1 }]
          }
        },
        {
          "Date": {
            "confidence": 0.90,
            "confidence_reason": "Date readable with minor print quality issues",
            "confidence_threshold": 0.80,
            "geometry": [{ "boundingBox": {"top": 0.33, "left": 0.1, "width": 0.12, "height": 0.025}, "page": 1 }]
          },
          "Description": {
            "confidence": 0.85,
            "confidence_reason": "Description clear, standard ATM format",
            "confidence_threshold": 0.75,
            "geometry": [{ "boundingBox": {"top": 0.33, "left": 0.25, "width": 0.3, "height": 0.025}, "page": 1 }]
          },
          "Amount": {
            "confidence": 0.94,
            "confidence_reason": "Negative amount clearly indicated with proper formatting",
            "confidence_threshold": 0.85,
            "geometry": [{ "boundingBox": {"top": 0.33, "left": 0.65, "width": 0.15, "height": 0.025}, "page": 1 }]
          }
        }
      ]
    }
  ],
  "metadata": {
    "assessment_time_seconds": 4.12,
    "assessment_parsing_succeeded": true
  }
}
```

Assessment Response Requirements
Important Guidelines:
- Match Extraction Structure: The assessment response must exactly match the structure of the `inference_result`
- List Item Assessment: For list attributes, assess each individual item separately, not as an aggregate
- Nested Confidence: Group attributes should have confidence assessments for each sub-attribute
- Consistent Format: Each confidence assessment should include `confidence` (0.0-1.0) and optionally `confidence_reason`
- Threshold Integration: The system automatically adds `confidence_threshold` values based on configuration (see the sketch after this list)
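A minimal sketch of how a per-attribute threshold can take precedence over a global default when thresholds are attached to the response; the helper name is an illustrative assumption, not the accelerator's actual code.

```python
# Sketch only (assumed names): per-attribute thresholds override the global default.
from typing import Optional

def resolve_threshold(attribute_config: dict, global_threshold: Optional[float]) -> Optional[float]:
    """Return the attribute-specific threshold if configured, else the global default."""
    return attribute_config.get("confidence_threshold", global_threshold)

print(resolve_threshold({"confidence_threshold": 0.95}, 0.80))  # 0.95 (override)
print(resolve_threshold({}, 0.80))                              # 0.8  (global default)
```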
Confidence Thresholds
Section titled “Confidence Thresholds”Overview
The assessment feature supports flexible confidence threshold configuration to help users identify extraction results that may require review. Thresholds can be set globally or per-attribute, with the UI providing immediate visual feedback through color-coded displays.
Configuration Options
Section titled “Configuration Options”Global Thresholds
Set system-wide confidence requirements for all attributes:

```json
{
  "inference_result": {
    "YTDNetPay": "75000",
    "PayPeriodStartDate": "2024-01-01"
  },
  "explainability_info": [
    {
      "global_confidence_threshold": 0.85,
      "YTDNetPay": {
        "confidence": 0.92,
        "confidence_reason": "Clear match found in document"
      },
      "PayPeriodStartDate": {
        "confidence": 0.75,
        "confidence_reason": "Moderate OCR confidence"
      }
    }
  ]
}
```

Per-Attribute Thresholds
Override global settings for specific fields requiring different confidence levels:

```json
{
  "explainability_info": [
    {
      "YTDNetPay": {
        "confidence": 0.92,
        "confidence_threshold": 0.95,
        "confidence_reason": "Financial data requires high confidence"
      },
      "PayPeriodStartDate": {
        "confidence": 0.75,
        "confidence_threshold": 0.70,
        "confidence_reason": "Date fields can accept moderate confidence"
      }
    }
  ]
}
```

Mixed Configuration
Combine global defaults with attribute-specific overrides:

```json
{
  "explainability_info": [
    {
      "global_confidence_threshold": 0.80,
      "CriticalField": {
        "confidence": 0.85,
        "confidence_threshold": 0.95,
        "confidence_reason": "Override: higher threshold for critical data"
      },
      "StandardField": {
        "confidence": 0.82,
        "confidence_reason": "Uses global threshold of 0.80"
      }
    }
  ]
}
```

Assessment Prompt Integration
Include threshold guidance in your assessment prompts to ensure consistent confidence evaluation:

```yaml
assessment:
  task_prompt: |
    Assess extraction confidence using these thresholds as guidance:
    - Financial data (amounts, taxes): 0.90+ confidence required
    - Personal information (names, addresses): 0.85+ confidence required
    - Dates and standard fields: 0.75+ confidence acceptable

    Provide confidence scores between 0.0 and 1.0 with explanatory reasoning:
    {
      "attribute_name": {
        "confidence": 0.85,
        "confidence_threshold": 0.90,
        "confidence_reason": "Explanation of confidence assessment"
      }
    }
```

UI Integration
Assessment results automatically appear in the web interface with enhanced visual indicators:
Visual Feedback System
The UI provides immediate confidence feedback through color-coded displays:
Color Coding
- 🟢 Green: Confidence meets or exceeds threshold (high confidence)
- 🔴 Red: Confidence falls below threshold (requires review)
- ⚫ Black: Confidence available but no threshold for comparison
Display Modes
1. With Threshold (Color-Coded)

```
YTDNetPay: 75000
Confidence: 92.0% / Threshold: 95.0% [RED - Below Threshold]

PayPeriodStartDate: 2024-01-01
Confidence: 85.0% / Threshold: 70.0% [GREEN - Above Threshold]
```

2. Confidence Only (Black Text)

```
EmployeeName: John Smith
Confidence: 88.5% [BLACK - No Threshold Set]
```

3. No Display

When neither confidence nor threshold data is available, no confidence indicator is shown.
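The display rules above reduce to a simple decision; the sketch below restates them in Python for clarity (the actual web UI logic is not shown in this document).

```python
# Sketch only: the color decision implied by the display modes above.
from typing import Optional

def confidence_color(confidence: Optional[float], threshold: Optional[float]) -> Optional[str]:
    if confidence is None:
        return None        # no confidence data -> no indicator shown
    if threshold is None:
        return "black"     # confidence shown without a threshold comparison
    return "green" if confidence >= threshold else "red"

print(confidence_color(0.92, 0.95))   # red   (below threshold, requires review)
print(confidence_color(0.85, 0.70))   # green (meets threshold)
print(confidence_color(0.885, None))  # black (no threshold set)
```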
Interface Coverage
1. Visual Editor Tab
- Split-pane layout with document image (left) and form-based field editing (right)
- Color-coded confidence display (green=meets threshold, red=below threshold, black=no threshold)
- Bounding Box Visualization: Assessment geometry data automatically displayed as overlays on document images
- Visual connection between form fields and document bounding boxes with spatial localization
- Interactive overlay showing precise field locations from assessment spatial data
- Supports nested data structures (arrays, objects) with recursive confidence display
- Inline editing with change tracking (✏️ Edited badges, blue/orange left borders)
2. JSON Editor Tab
- Raw JSON editing for advanced users with full JSON validation
- Section filtering with multiselect dropdown
- Same confidence data available in JSON structure
3. Revision History Tab
- Complete audit trail showing all edits with timestamps
- Reviewer identification and field-level diffs
- Confidence scores preserved in edit history
4. Smart Filtering
- Low Confidence Filter: Toggle to show only fields with confidence below threshold
- Evaluation Mismatches Filter: Show only fields where prediction doesn’t match baseline
- Collapsible Tree Navigation: Expand/Collapse All controls for nested structures
5. Nested Data Support

Confidence indicators work with complex document structures:

```
FederalTaxes[0]:
├── YTD: 2111.2 [Confidence: 67.6% / Threshold: 85.0% - RED]
└── Period: 40.6 [Confidence: 75.8% - BLACK]

StateTaxes[0]:
├── YTD: 438.36 [Confidence: 84.4% / Threshold: 80.0% - GREEN]
└── Period: 8.43 [Confidence: 83.2% / Threshold: 80.0% - GREEN]
```

Image Processing Configuration
The assessment service supports configurable image dimensions for optimal confidence evaluation:
New Default Behavior (Preserves Original Resolution)
Important Change: Empty strings or unspecified image dimensions now preserve the original document resolution for maximum assessment accuracy:

```yaml
assessment:
  model: "us.amazon.nova-lite-v1:0"
  # Image processing settings - preserves original resolution
  image:
    target_width: ""   # Empty string = no resizing (recommended)
    target_height: ""  # Empty string = no resizing (recommended)
```

Custom Image Dimensions
Configure specific dimensions when performance optimization is needed:

```yaml
# For detailed visual assessment with controlled dimensions
assessment:
  image:
    target_width: "1200"   # Resize to 1200 pixels wide
    target_height: "1600"  # Resize to 1600 pixels tall
```

```yaml
# For standard confidence evaluation
assessment:
  image:
    target_width: "800"    # Smaller for faster processing
    target_height: "1000"  # Maintains good quality
```

Image Resizing Features for Assessment
Section titled “Image Resizing Features for Assessment”- Original Resolution Preservation: Empty strings preserve full document resolution for maximum assessment accuracy
- Aspect Ratio Preservation: Images maintain proportions for accurate visual analysis when dimensions are specified
- Smart Scaling: Only downsizes when necessary to preserve visual detail (see the sketch after this list)
- High-Quality Resampling: Better image quality for confidence assessment
- Performance Optimization: Configurable dimensions allow balancing accuracy vs. speed
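A minimal sketch of the resizing behavior described in this list, assuming a Pillow-based helper; the function name and resampling choice are assumptions, not the accelerator's actual image utility.

```python
# Sketch only (assumed names): aspect-ratio-preserving, downsize-only resizing.
from PIL import Image

def resize_for_assessment(img: Image.Image, target_width: str, target_height: str) -> Image.Image:
    # Empty strings mean "no resizing": preserve the original resolution
    if not target_width or not target_height:
        return img
    max_w, max_h = int(target_width), int(target_height)
    scale = min(max_w / img.width, max_h / img.height)
    if scale >= 1.0:
        return img  # only downsize when necessary; never upscale
    new_size = (int(img.width * scale), int(img.height * scale))
    return img.resize(new_size, Image.LANCZOS)  # high-quality resampling
```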
Configuration Benefits for Assessment
- Maximum Assessment Accuracy: Empty strings preserve full document resolution for best confidence evaluation
- Enhanced Visual Analysis: Original resolution improves confidence evaluation accuracy
- Better OCR Verification: Higher quality images help verify extraction results against visual content
- Improved Confidence Scoring: Better image quality leads to more accurate confidence assessments
- Service-Specific Tuning: Optimize image dimensions for different assessment complexity levels
- Resource Optimization: Choose between accuracy (original resolution) and performance (smaller dimensions)
Migration from Previous Versions
Previous Behavior: Empty strings defaulted to 951x1268 pixel resizing
New Behavior: Empty strings preserve original image resolution
If you were relying on the previous default resizing behavior, explicitly set dimensions:
```yaml
# To maintain previous default behavior
assessment:
  image:
    target_width: "951"
    target_height: "1268"
```

Best Practices for Assessment
- Use Empty Strings for High Accuracy: For critical confidence assessment, use empty strings to preserve original resolution
- Consider Assessment Complexity: Complex documents with fine details benefit from higher resolution
- Test Assessment Quality: Evaluate confidence assessment accuracy with your specific document types
- Monitor Resource Usage: Higher resolution images consume more memory and processing time
- Balance Accuracy vs Performance: Choose appropriate settings based on your assessment requirements and processing volume
Granular Assessment
Section titled “Granular Assessment”Overview
For complex documents with many attributes or large lists (such as bank statements with hundreds of transactions), the standard assessment approach can become inefficient and less accurate. The Granular Assessment feature addresses these challenges by breaking down the assessment process into smaller, focused tasks that can be processed in parallel.
When to Use Granular Assessment
Consider granular assessment for:
- Documents with many attributes (10+ simple attributes)
- Large list structures (bank transactions, line items, etc.)
- Complex nested data (multiple group attributes)
- Performance-critical scenarios where parallel processing provides benefits
- Cost optimization when prompt caching is available
Key Benefits
- Improved Accuracy: Smaller, focused prompts lead to better LLM performance
- Cost Optimization: Leverages prompt caching to reduce token usage significantly
- Reduced Latency: Parallel processing of independent assessment tasks
- Better Scalability: Handles documents with hundreds of attributes or list items
Configuration
Enable granular assessment by adding the `granular` section to your assessment configuration:

```yaml
assessment:
  # Standard assessment configuration
  model: "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
  temperature: 0
  system_prompt: "You are an expert document analyst..."
  task_prompt: |
    Assess the confidence of extraction results for this {DOCUMENT_CLASS} document.

    Attributes to assess:
    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    Extraction results:
    {EXTRACTION_RESULTS}

    Document context:
    {DOCUMENT_TEXT}
    {OCR_TEXT_CONFIDENCE}
    {DOCUMENT_IMAGE}

    Provide confidence assessments in JSON format.

  # Granular assessment configuration
  granular:
    max_workers: 6         # Number of parallel threads
    simple_batch_size: 3   # Attributes per simple batch
    list_batch_size: 1     # List items per batch (usually 1)
```

How It Works
The granular assessment service automatically:
- Analyzes attribute structure to determine optimal task breakdown
- Creates focused tasks (see the batching sketch after this list):
  - Simple batches: Groups of 3-5 simple attributes
  - Group tasks: Individual group attributes with their sub-attributes
  - List item tasks: Individual items from list attributes
- Builds cached base content with document context and images
- Processes tasks in parallel using configurable thread pool
- Aggregates results into the same format as standard assessment
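As a rough illustration of the task breakdown (a sketch with assumed names, not the service's actual code), batching ten simple attributes with `simple_batch_size: 3` yields the [3, 3, 3, 1] split shown under Simple Batch Tasks below.

```python
# Sketch only (assumed names): group simple attributes into fixed-size batches.
from typing import List

def batch_simple_attributes(attribute_names: List[str], simple_batch_size: int = 3) -> List[List[str]]:
    return [
        attribute_names[i:i + simple_batch_size]
        for i in range(0, len(attribute_names), simple_batch_size)
    ]

attrs = [f"Attr{i}" for i in range(1, 11)]               # 10 simple attributes
print([len(b) for b in batch_simple_attributes(attrs)])  # [3, 3, 3, 1]
```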
Task Types
Section titled “Task Types”Simple Batch Tasks
Groups simple attributes together for efficient processing:

```yaml
# Configuration with 10 simple attributes
attributes:
  - name: "StatementDate"
  - name: "AccountNumber"
  - name: "RoutingNumber"
  # ... 7 more attributes

# Results in 4 tasks: [3, 3, 3, 1] attributes each
```

Group Tasks
Processes complex nested structures as single units:

```yaml
# Each group becomes one focused task
properties:
  AccountDetails:
    type: object
    properties:
      AccountNumber:
        type: string
      RoutingNumber:
        type: string
      AccountType:
        type: string
```

List Item Tasks
Assesses each list item individually for maximum accuracy:

```yaml
# 100 transactions = 100 individual assessment tasks
properties:
  Transactions:
    type: array
    items:
      type: object
      properties:
        Date:
          type: string
        Description:
          type: string
        Amount:
          type: string
```

Performance Tuning
Batch Size Configuration

```yaml
granular:
  simple_batch_size: 3   # Smaller = more accurate, larger = faster
  list_batch_size: 1     # Usually keep at 1 for best accuracy
  max_workers: 6         # Balance between speed and resource usage
```

Model Selection
Granular assessment works best with models supporting prompt caching:
- `us.anthropic.claude-3-7-sonnet-20250219-v1:0` (recommended)
- `us.anthropic.claude-3-5-haiku-20241022-v1:0` (cost-effective)
- `us.amazon.nova-lite-v1:0` or `us.amazon.nova-pro-v1:0`
Cost Optimization with Caching
The granular approach leverages prompt caching for significant cost savings:

```
First Task:  [Full document context] + [3 attributes] = Full cost
Second Task: [Cached context] + [3 different attributes] = Cache read + new content only
Third Task:  [Cached context] + [3 different attributes] = Cache read + new content only
...
```

Typical savings: 60-80% reduction in token costs for documents with many attributes.
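As a back-of-the-envelope illustration under assumed numbers (a 4,000-token cached context, 500 new tokens per task, 25 tasks, and cache reads billed at roughly 10% of the normal input-token rate; actual model pricing and cache behavior vary):

```python
# Rough illustration only, with assumed numbers; not actual Bedrock pricing.
context_tokens = 4_000        # document context written to the prompt cache once
new_tokens_per_task = 500     # attribute-specific content per task
tasks = 25
cache_read_factor = 0.10      # assumed relative cost of reading cached tokens

without_caching = tasks * (context_tokens + new_tokens_per_task)
with_caching = (context_tokens + new_tokens_per_task) + \
               (tasks - 1) * (context_tokens * cache_read_factor + new_tokens_per_task)

print(f"{1 - with_caching / without_caching:.0%}")  # ~77% lower effective input-token cost
```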
Usage Example
```python
from idp_common.assessment import create_assessment_service

# Load configuration with granular settings
config = {
    "assessment": {
        "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        "granular": {
            "max_workers": 6,
            "simple_batch_size": 3,
            "list_batch_size": 1
        }
        # ... other assessment config
    }
}

# Factory function automatically selects granular service
assessment_service = create_assessment_service(
    region="us-west-2",
    config=config
)

# Same interface as standard assessment
document = assessment_service.assess_document(document)
```

Monitoring Granular Assessment
Granular assessment provides additional metadata:

```json
{
  "metadata": {
    "granular_assessment_used": true,
    "assessment_tasks_total": 25,
    "assessment_tasks_successful": 24,
    "assessment_tasks_failed": 1,
    "assessment_time_seconds": 8.5
  }
}
```

Migration from Standard Assessment
- Add granular configuration to existing assessment config
- Test with small documents first to validate behavior
- Tune batch sizes based on your document complexity
- Monitor performance and cost metrics
- Gradually roll out to production workloads
The granular service maintains full backward compatibility - existing configurations continue to work without changes.
Cost Optimization
Token Reduction Strategy
The assessment feature implements several cost optimization techniques:
- Text Confidence Data: Uses condensed OCR confidence information instead of full raw OCR results (80-90% token reduction)
- Conditional Image Processing: Images only processed when the `{DOCUMENT_IMAGE}` placeholder is present
- Configuration-Based Control: Assessment can be enabled/disabled via the configuration `enabled` property for flexible deployment
- Efficient Prompting: Optimized prompt templates minimize token usage while maintaining accuracy
- Configurable Image Dimensions: Adjust image resolution to balance assessment quality and processing costs
- Granular Assessment with Caching: For complex documents, use granular assessment with prompt caching for 60-80% cost reduction
Testing and Validation
End-to-End Testing
Use the provided notebooks for comprehensive testing:
```bash
# Standard assessment testing
jupyter notebook notebooks/e2e-example-with-assessment.ipynb

# Granular assessment testing
jupyter notebook notebooks/examples/step4_assessment_granular.ipynb
```

The notebooks demonstrate:
- Document processing with assessment enabled
- Confidence score interpretation
- Integration with existing extraction workflows
- Performance and cost analysis
- Granular assessment configuration and usage
Configuration Validation
Assessment enforces strict configuration requirements:

```
# Missing prompt template
ValueError: "Assessment task_prompt is required in configuration but not found"

# Invalid DOCUMENT_IMAGE usage
ValueError: "Invalid DOCUMENT_IMAGE placeholder usage: found 2 occurrences, but exactly 1 is required"

# Template formatting error
ValueError: "Assessment prompt template formatting failed: missing required placeholder"
```

Best Practices
1. Prompt Design
- Be Specific: Clearly define what constitutes high vs. low confidence
- Include Examples: Provide examples of confidence reasoning in system prompts
- Use Structured Output: Request consistent JSON format for programmatic processing
2. Cost Management
- Enable Selectively: Only enable assessment for critical document types
- Text-First: Start with text-only assessment before adding images
- Monitor Usage: Track token consumption and adjust prompts accordingly
3. Model Selection
- Claude 3.5 Sonnet: Recommended for balanced performance and cost
- Claude 3 Haiku: Consider for high-volume, cost-sensitive scenarios
- Temperature 0: Use deterministic output for consistent confidence scoring
4. Confidence Threshold Configuration
- Risk-Based Thresholds: Set higher thresholds (0.90+) for critical financial or personal data
- Field-Specific Requirements: Use per-attribute thresholds for different data types
- Global Defaults: Establish reasonable global thresholds (0.75-0.85) as baselines
- Incremental Tuning: Start with conservative thresholds and adjust based on accuracy analysis
5. Integration Patterns
- Conditional Logic: Implement business rules based on confidence scores and thresholds
- Human Review: Route low-confidence extractions (below threshold) for manual review
- Quality Metrics: Track confidence distributions to identify improvement opportunities
- Visual Feedback: Leverage color-coded UI indicators for immediate quality assessment
Troubleshooting
Common Issues

- Assessment Not Running
  - Verify `assessment.enabled: true` in configuration file
  - Check state machine definition includes assessment step
  - Confirm assessment Lambda function deployed successfully

- Template Errors
  - Ensure `task_prompt` is defined in assessment configuration
  - Validate placeholder syntax and formatting
  - Check for exactly one `{DOCUMENT_IMAGE}` placeholder if using images

- Poor Confidence Scores
  - Review prompt templates for clarity and specificity
  - Consider adding domain-specific guidance in system prompts
  - Validate OCR quality and text confidence data

- High Costs
  - Monitor token usage in CloudWatch logs
  - Consider text-only assessment without images
  - Optimize prompt templates to reduce unnecessary context

- Confidence Threshold Issues
  - Verify `confidence_threshold` values are between 0.0 and 1.0
  - Check explainability_info structure includes threshold data
  - Ensure UI displays match expected color coding (green/red/black)
  - Validate nested data confidence display for complex structures
Monitoring
Key metrics to monitor:
- `InputDocumentsForAssessment`: Number of documents assessed
- `assessment_time_seconds`: Processing time per assessment
- `assessment_parsing_succeeded`: Success rate of JSON parsing
- Token consumption logs in CloudWatch
Related Documentation
- Pattern 2 Documentation - Assessment integration details
- Configuration Guide - Configuration schema details
- Extraction Documentation - Base extraction functionality
- Web UI Documentation - UI integration and display