Customizing Classification
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0
Document classification is a key component of the GenAIIDP solution that categorizes each document or page into predefined classes. This guide explains how to customize classification to best suit your document processing needs.
Classification Methods Across Patterns
The solution supports multiple classification approaches that vary by pattern:
Pattern 1: BDA-Based Classification
- Classification is performed by the BDA (Bedrock Data Automation) project configuration
- Uses BDA blueprints to define classification rules
- Not configurable inside the GenAIIDP solution itself
- Configuration happens at the BDA project level
Pattern 2: Bedrock LLM-Based Classification
Pattern 2 offers two main classification approaches, configured through different templates:
MultiModal Page-Level Classification with Sequence Segmentation (default)
- Classifies each page independently using both text and image data
- Uses sequence segmentation with BIO-like tagging for document boundary detection
- Each page receives both a document type and a boundary indicator (“start” or “continue”)
- Automatically segments multi-document packets where multiple documents may be combined
- Works exceptionally well for complex document packets containing multiple documents of the same or different types
- Supports optional few-shot examples to improve classification accuracy
- Deployed when you select ‘few_shot_example_with_multimodal_page_classification’ during stack deployment
- See the few-shot-examples.md documentation for details on configuring examples
Sequence Segmentation Approach
The multimodal page-level classification implements a sophisticated sequence segmentation approach similar to BIO (Begin-Inside-Outside) tagging commonly used in NLP. This enables accurate segmentation of multi-document packets where a single file may contain multiple distinct documents.
How It Works:
Each page receives two pieces of information during classification:
- Document Type: The classification label (e.g., “invoice”, “letter”, “financial_statement”)
- Document Boundary: A boundary indicator that signals document transitions:
- "start": Indicates the beginning of a new document (similar to “Begin” in BIO)
- "continue": Indicates continuation of the current document (similar to “Inside” in BIO)
Benefits of Sequence Segmentation:
- Multi-Document Packet Support: Accurately segments packets containing multiple documents
- Type-Aware Boundaries: Detects when a new document of the same type begins
- Automatic Section Creation: Pages are grouped into sections based on both type and boundaries
- Improved Accuracy: Context-aware classification that considers document flow
- No Manual Splitting Required: Eliminates the need to manually separate documents before processing
Example Segmentation:
Consider a packet with 6 pages containing two invoices and one letter:
```
Page 1: type="invoice", boundary="start"    → Section 1 (Invoice #1)
Page 2: type="invoice", boundary="continue" → Section 1 (Invoice #1)
Page 3: type="letter",  boundary="start"    → Section 2 (Letter)
Page 4: type="letter",  boundary="continue" → Section 2 (Letter)
Page 5: type="invoice", boundary="start"    → Section 3 (Invoice #2)
Page 6: type="invoice", boundary="continue" → Section 3 (Invoice #2)
```

The system automatically creates three sections, properly separating the two invoices despite them having the same document type.
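The grouping rule implied by these boundary indicators can be sketched in a few lines of Python. This is an illustrative helper, not the solution's actual code:

```python
def group_into_sections(pages):
    """Group (doc_type, boundary) page results into sections.

    A new section starts whenever boundary == "start"; "continue"
    pages are appended to the current section.
    """
    sections = []
    for page_num, (doc_type, boundary) in enumerate(pages, start=1):
        if boundary == "start" or not sections:
            sections.append({"type": doc_type, "pages": [page_num]})
        else:
            sections[-1]["pages"].append(page_num)
    return sections

# The 6-page example above: two invoices separated by a letter.
pages = [
    ("invoice", "start"), ("invoice", "continue"),
    ("letter", "start"), ("letter", "continue"),
    ("invoice", "start"), ("invoice", "continue"),
]
# group_into_sections(pages) yields three sections, keeping the two
# same-type invoices apart because each begins with boundary="start".
```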
Page Context for Classification
The multimodal page-level classification supports including surrounding pages as context to improve classification accuracy. This is particularly useful when a single page doesn’t contain enough information to determine its document type or boundary status.
Configuration:
```yaml
classification:
  classificationMethod: multimodalPageLevelClassification
  contextPagesCount: 1   # Include 1 page before and 1 page after as context
  # contextPagesCount: 0 # Default: no additional context (current behavior)
  # contextPagesCount: 2 # Include 2 pages before and 2 pages after
```

How It Works:
When contextPagesCount is set to a value greater than 0, the classification prompt includes surrounding pages as additional context:
- contextPagesCount: 1: Includes 1 page before and 1 page after the target page
- contextPagesCount: 2: Includes 2 pages before and 2 pages after the target page
- Edge handling: At document boundaries, only available pages are included (e.g., the first page has no “before” pages)
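The boundary clamping can be illustrated with a small helper (hypothetical names, not the solution's implementation):

```python
def context_window(total_pages, target, context_pages_count):
    """Return (before, after) 1-based page numbers around `target`,
    clamped so only pages that actually exist are included."""
    before = list(range(max(1, target - context_pages_count), target))
    after = list(range(target + 1,
                       min(total_pages, target + context_pages_count) + 1))
    return before, after

# For a 6-page document: page 1 has no "before" pages,
# page 6 has no "after" pages.
```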
Enhanced Prompt Structure:
The system replaces the standard {DOCUMENT_TEXT} and {DOCUMENT_IMAGE} placeholders with context-aware versions that clearly separate context pages from the page being classified:
Text Context Structure:
```
For context, here is the OCR text for the page(s) immediately prior to the page you should classify:
<context-pages-before>
[OCR text from all context pages before - combined if multiple pages]
</context-pages-before>

Here is the OCR text for the page to classify:
<current-page>
[OCR text for the page being classified]
</current-page>

For context, here is the OCR text for the page(s) immediately after the page you should classify:
<context-pages-after>
[OCR text from all context pages after - combined if multiple pages]
</context-pages-after>
```

Image Context Structure:
```
For context, here are the image(s) for the page(s) immediately prior to the page you should classify:
[Image 1 - context page before]
[Image 2 - context page before (if contextPagesCount >= 2)]

Here is the image for the page to classify:
[Image - current page being classified]

For context, here are the image(s) for the page(s) immediately after the page you should classify:
[Image 1 - context page after]
[Image 2 - context page after (if contextPagesCount >= 2)]
```

Note: Context pages are combined within their respective sections (before or after). The structure uses descriptive text labels and XML tags (<context-pages-before>, <current-page>, <context-pages-after>) to clearly indicate which content is for context versus which content should be classified.
Benefits:
- Improved Boundary Detection: Context helps the LLM identify document transitions
- Better Classification Accuracy: Surrounding pages provide additional clues
- Handles Ambiguous Pages: Pages that look similar can be distinguished by context
- Flexible Configuration: Adjust context size based on document complexity
Use Cases:
- Documents where headers/footers span multiple pages
- Multi-page forms where individual pages look similar
- Document packages with varying page layouts
- Cases where LLM boundary detection has been unreliable
Considerations:
- Increases token usage proportionally to the number of context pages
- May increase latency due to larger prompts
- Works best when surrounding pages provide meaningful classification hints
Configuration for Boundary Detection:
The boundary detection is automatically included in the classification results. No special configuration is needed - the system will populate the document_boundary field in the metadata for each page:
```json
{
  "page_id": "1",
  "classification": {
    "doc_type": "invoice",
    "confidence": 0.95,
    "metadata": {
      "document_boundary": "start"  // New document begins
    }
  }
}
```

Text-Based Holistic Classification
- Analyzes entire document packets to identify logical boundaries
- Identifies distinct document segments within multi-page documents
- Determines document type for each segment
- Better suited for multi-document packets where context spans multiple pages
- Deployed when you select the default pipeline mode configuration during stack deployment or update
The default configuration in config_library/unified/default/config.yaml implements this approach with a task prompt that instructs the model to:
- Read through the entire document package to understand its contents
- Identify page ranges that form complete, distinct documents
- Match each document segment to one of the defined document types
- Record the start and end pages for each identified segment
Example configuration:
```yaml
classification:
  classificationMethod: textbasedHolisticClassification
  model: us.amazon.nova-pro-v1:0
  task_prompt: >-
    <task-description>
    You are a document classification system. Your task is to analyze a document
    package containing multiple pages and identify distinct document segments,
    classifying each segment according to the predefined document types provided below.
    </task-description>

    <document-types>
    {CLASS_NAMES_AND_DESCRIPTIONS}
    </document-types>

    <document-boundary-rules>
    Rules for determining document boundaries:
    - Content continuity: Pages with continuing paragraphs, numbered sections, or ongoing narratives belong to the same document
    - Visual consistency: Similar layouts, headers, footers, and styling indicate pages belong together
    - Logical structure: Documents typically have clear beginning, middle, and end sections
    - New document indicators: Title pages, cover sheets, or significantly different subject matter signal a new document
    </document-boundary-rules>

    <<CACHEPOINT>>

    <document-text>
    {DOCUMENT_TEXT}
    </document-text>
```

Limitations of Text-Based Holistic Classification
Despite its strengths in handling full-document context, this method has several limitations:
Context & Model Constraints:
- Long documents can exceed the context window of smaller models, resulting in request failure.
- Lengthy inputs may dilute the model’s focus, leading to inaccurate or inconsistent classifications.
- Requires high-context models such as Amazon Nova Premier, which supports up to 1 million tokens. Smaller models are not suitable for this method.
- For more details on supported models and their context limits, refer to the Amazon Bedrock Supported Models documentation.
Scalability Challenges: Not ideal for very large or visually complex document sets. In such cases, the Multi-Modal Page-Level Classification method is more appropriate.
Pattern 3: UDOP-Based Classification
- Classification is performed by a pre-trained UDOP (Universal Document Processing) model
- Model is deployed on Amazon SageMaker
- Performs multi-modal page-level classification (classifies each page based on OCR data and page image)
- Not configurable inside the GenAIIDP solution
Section Splitting Strategies
The sectionSplitting configuration controls how classified pages are grouped into document sections. This setting works with both classification methods and provides three strategies:
Available Strategies
Section titled “Available Strategies”1. disabled - No Splitting (Entire Document = One Section)
Behavior:
- All pages are assigned to a single section
- Uses majority voting to determine the document class (most common classification wins)
- Excludes unclassifiable/blank pages from voting to prevent them from affecting the result
- If there’s a tie, uses the first page’s classification for determinism
- Ignores any page-level classification boundaries
Use Cases:
- Documents known to be single-type with no internal divisions
- Simplified processing where granular section splitting isn’t needed
- When you want to force all pages to be treated as one cohesive document
- Documents with occasional blank or unclassifiable pages (these won’t affect the final classification)
Configuration Example:
```yaml
classification:
  sectionSplitting: disabled
  classificationMethod: multimodalPageLevelClassification
```

Result:
- Document with 10 pages → 1 section containing all 10 pages
- All pages assigned the most common (voted) class
Voting Behavior:
The disabled strategy uses majority voting to determine the document classification, which provides robust handling of edge cases:
- Config-Driven Voting: Only pages whose classification matches a valid document type defined in your configuration are eligible to vote. This automatically excludes:
  - Blank pages (unclassifiable_blank_page, blank, etc.)
  - Error states (error (backoff/retry), unclassified)
  - LLM hallucinations or typos that don’t match any defined class
- Majority Wins: The classification that appears most frequently among votable pages becomes the document classification.
- Tie-Breaking: If multiple classifications have the same count, the classification from the earliest page (by page number) is used for determinism.
- Fallback: If no pages have valid classifications (all are unclassifiable types), the first page’s classification is used.
Example:
6-page document with classifications:

- Page 1: DRILLING_PLAN_GEOLOGIC
- Page 2: DRILLING_PLAN_GEOLOGIC
- Page 3: DRILLING_PLAN_GEOLOGIC
- Page 4: DRILLING_PLAN_GEOLOGIC
- Page 5: DRILLING_PLAN_GEOLOGIC
- Page 6: unclassifiable_blank_page (excluded from voting)

Voting result: DRILLING_PLAN_GEOLOGIC (5 votes) → Entire document classified as DRILLING_PLAN_GEOLOGIC

GitHub Issue Reference: This voting behavior addresses Issue #167, where documents with blank last pages were incorrectly classified as the blank page type.
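The voting rules above can be sketched as follows (an illustrative helper, not the solution's actual code):

```python
from collections import Counter

def vote_document_class(page_classes, valid_classes):
    """page_classes: class labels in page order (index 0 = page 1).
    valid_classes: document types defined in the configuration."""
    # Config-driven voting: only classes defined in the config may vote.
    votable = [c for c in page_classes if c in valid_classes]
    if not votable:
        # Fallback: no page has a valid classification.
        return page_classes[0]
    counts = Counter(votable)
    top = max(counts.values())
    tied = {c for c, n in counts.items() if n == top}
    # Tie-break: earliest page whose class is among the tied leaders.
    for c in page_classes:
        if c in tied:
            return c
```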
2. page - Per-Page Splitting (Each Page = Own Section)
Behavior:
- Every page becomes an independent section
- Each page keeps its individually classified document type
- Prevents automatic joining of same-type documents
Use Cases:
- Critical for long documents with multiple same-type forms (e.g., multiple W-2 forms, multiple invoices)
- When LLM boundary detection is unreliable or fails frequently
- Government form processing where each form must be processed independently
- Scenarios where deterministic splitting is required
Configuration Example:
```yaml
classification:
  sectionSplitting: page
  classificationMethod: multimodalPageLevelClassification
```

Result:
- Document with 10 pages → 10 sections (one per page)
- Each page maintains its individual classification
GitHub Issue Reference: This strategy directly addresses Issue #146 where long documents with multiple same-type forms were being incorrectly joined together.
3. llm_determined - LLM Boundary Detection (Default)
Behavior:
- Uses "start"/"continue" boundary indicators from LLM responses
- Automatically groups related pages into logical sections
- Implements BIO-like tagging for sophisticated document segmentation
Use Cases:
- Complex multi-document packets requiring intelligent boundary detection
- When LLM boundary detection works reliably
- Default behavior that works well for most use cases
Configuration Example:
```yaml
classification:
  sectionSplitting: llm_determined # This is the default
  classificationMethod: multimodalPageLevelClassification
```

Result:
- Document with 10 pages → Variable number of sections based on LLM boundary detection
- Pages grouped according to document boundaries and type changes
Strategy Comparison Table
| Strategy | Sections Created | Boundary Detection | Same-Type Handling | Deterministic | Performance |
|---|---|---|---|---|---|
| disabled | 1 section always | None | All joined | Yes | Fastest |
| page | N sections (N pages) | Per-page | Never joined | Yes | Fast |
| llm_determined | Variable | LLM boundaries | May join | No | Standard |
Configuration Placement
The sectionSplitting setting is placed in the classification configuration section:
```yaml
classification:
  model: us.amazon.nova-pro-v1:0
  classificationMethod: multimodalPageLevelClassification
  sectionSplitting: page # Options: disabled, page, llm_determined
  maxPagesForClassification: "ALL"
  temperature: "0.0"
  # ... other classification settings
```

Interaction with Classification Methods
The sectionSplitting setting works with both classification methods:
With multimodalPageLevelClassification:
- disabled: The voted (majority) class applies to all pages in one section
- page: Each page’s individual classification preserved in separate sections
- llm_determined: Pages grouped by class + boundary metadata
With textbasedHolisticClassification:
- disabled: First segment’s class applies to all pages in one section
- page: Each page gets its own section with the class assigned by the holistic method
- llm_determined: LLM-determined segments used as sections (default behavior)
Real-World Example: Multiple W-2 Forms
Section titled “Real-World Example: Multiple W-2 Forms”Consider a 6-page document containing three W-2 forms (2 pages each):
With sectionSplitting: llm_determined (may work or may fail):

- Result depends on LLM boundary detection accuracy
- Best case: 3 sections (one per W-2)
- Worst case: 1 section (all W-2s incorrectly joined)

With sectionSplitting: page (deterministic solution):

```
Page 1 → Section 1 (W-2)
Page 2 → Section 2 (W-2)
Page 3 → Section 3 (W-2)
Page 4 → Section 4 (W-2)
Page 5 → Section 5 (W-2)
Page 6 → Section 6 (W-2)
```

- Result: 6 independent sections
- Each W-2 page processed separately
- No risk of incorrect joining

With sectionSplitting: disabled (simplest case):

```
All 6 pages → Section 1 (W-2)
```

- Result: Single section
- Entire document treated as one unit

Choosing Between Classification Methods
When deciding between Text-Based Holistic Classification and MultiModal Page-Level Classification with Sequence Segmentation, consider these factors:
Use Text-Based Holistic Classification When:
- Documents have clear logical boundaries based on content
- Text context spans multiple pages and requires understanding the full document
- You have access to high-context models (e.g., Amazon Nova Premier)
- Document packets are relatively small (within model context limits)
- Visual elements are less important than textual continuity
Use MultiModal Page-Level Classification with Sequence Segmentation When:
- Document packets contain multiple documents of the same type (e.g., multiple invoices)
- Visual layout and image content are important for classification
- You need to process very large document packets that might exceed context limits
- Documents have clear visual boundaries (headers, footers, different layouts)
- You want to leverage both text and image information for better accuracy
- Processing speed is important (parallel page processing is possible)
Comparison Table
| Feature | Text-Based Holistic | MultiModal Page-Level with Sequence Segmentation |
|---|---|---|
| Context Awareness | Full document context | Page-level with boundary detection |
| Multi-document Packets | Good | Excellent (handles same-type documents) |
| Visual Processing | Text only | Text + Images |
| Model Requirements | High-context models | Standard models |
| Processing Speed | Sequential | Can be parallelized |
| Boundary Detection | Content-based | BIO-like tagging |
| Large Documents | Limited by context | No practical limit |
Customizing Classification in Pattern 2
Configuration Settings
Page Limit Configuration
Control how many pages are used for classification:
```yaml
classification:
  maxPagesForClassification: "ALL" # Default: use all pages
  # Or: "1", "2", "3", etc. - use only first N pages
```

Important: When set to a number (e.g., "3"), only the first N pages are classified, but the result is applied to ALL pages in the document. This forces the entire document to be assigned a single class with one section.
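The page-limit behavior can be sketched with a hypothetical helper (illustrative only; the solution's real implementation may differ):

```python
def apply_page_limit(page_ids, max_pages_setting):
    """Return the page IDs that should be sent to the classifier.

    "ALL" means classify every page; a numeric string means classify
    only the first N pages (the resulting class is then applied to
    the whole document).
    """
    if max_pages_setting == "ALL":
        return page_ids
    return page_ids[: int(max_pages_setting)]
```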
Prompt Components
In Pattern 2, you can customize classification behavior through various prompt components:
System Prompts
Define overall model behavior and constraints:
```yaml
system_prompt: |
  You are an expert document classifier specializing in financial and business documents.
  Your task is to analyze document images and classify them into predefined categories.
  Focus on visual layout, textual content, and common patterns found in each document type.
  When in doubt, analyze the most prominent features like headers, logos, and form fields.
```

Task Prompts
Specify classification instructions and formatting:
```yaml
task_prompt: |
  Analyze the following document page and classify it into one of these categories:

  {{document_classes}}

  Return ONLY the document class name without additional explanations.
  If the document doesn't fit any of the provided classes, classify it as "other".
```

Class Descriptions
Provide detailed descriptions for each document category:
```yaml
document_classes:
  invoice:
    description: "A commercial document issued by a seller to a buyer, related to a sale transaction and indicating the products, quantities, and agreed prices for products or services."
  receipt:
    description: "A document acknowledging that something of value has been received, often as proof of payment."
  bank_statement:
    description: "A document issued by a bank showing transactions and balances for a specific account over a defined period."
```

Using CachePoint for Classification
The solution integrates with Amazon Bedrock CachePoint for improved performance:
- Caches frequently used prompts and responses
- Reduces latency for similar classification requests
- Optimizes costs through response reuse
- Automatic cache management and expiration
CachePoint is particularly beneficial with few-shot examples, as these can add significant token count to prompts. The <<CACHEPOINT>> delimiter in prompt templates separates:
- Static portion (before CACHEPOINT): Class definitions, few-shot examples, instructions
- Dynamic portion (after CACHEPOINT): The specific document being processed
This approach allows the static portion to be cached and reused across multiple document processing requests, while only the dynamic portion varies per document, significantly reducing costs and improving performance.
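The static/dynamic split around the delimiter can be sketched as follows (hypothetical helper names; how the cached prefix is registered with Bedrock prompt caching is outside this sketch):

```python
CACHEPOINT = "<<CACHEPOINT>>"

def split_at_cachepoint(prompt_template):
    """Split a prompt template into a cacheable static prefix and a
    per-document dynamic suffix at the <<CACHEPOINT>> delimiter."""
    if CACHEPOINT not in prompt_template:
        return prompt_template, ""
    static, dynamic = prompt_template.split(CACHEPOINT, 1)
    return static, dynamic

template = ("Class definitions and few-shot examples go here."
            "<<CACHEPOINT>>"
            "Document: {DOCUMENT_TEXT}")
static, dynamic = split_at_cachepoint(template)
# `static` is reused (and cached) across requests; only `dynamic`
# changes for each document being processed.
```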
Example task prompt with CachePoint for few-shot examples:
```yaml
classification:
  task_prompt: |
    Classify this document into exactly one of these categories:

    {CLASS_NAMES_AND_DESCRIPTIONS}

    <few_shot_examples>
    {FEW_SHOT_EXAMPLES}
    </few_shot_examples>

    <<CACHEPOINT>>

    <document_content>
    {DOCUMENT_TEXT}
    </document_content>
```

Document Classes
Standard Document Classes
The solution includes standard document classes based on the RVL-CDIP dataset:
- letter: Formal written correspondence
- form: Structured documents with fields
- email: Digital messages with headers
- handwritten: Documents with handwritten content
- advertisement: Marketing materials
- scientific_report: Research documents
- scientific_publication: Academic papers
- specification: Technical specifications
- file_folder: Organizational documents
- news_article: Journalistic content
- budget: Financial planning documents
- invoice: Commercial billing documents
- presentation: Slide-based documents
- questionnaire: Survey forms
- resume: Employment documents
- memo: Internal communications
Custom Document Classes
You can define custom document classes through the Web UI configuration:
- Navigate to the Configuration section
- Select the Document Classes tab
- Click “Add New Class”
- Provide:
- Class name (machine-readable identifier)
- Display name (human-readable name)
- Detailed description (to guide the classification model)
- Save changes
Image Placement with {DOCUMENT_IMAGE} Placeholder
Pattern 2 supports precise control over where document images are positioned within your classification prompts using the {DOCUMENT_IMAGE} placeholder. This feature allows you to specify exactly where images should appear in your prompt template, rather than having them automatically appended at the end.
How {DOCUMENT_IMAGE} Works
Without Placeholder (Default Behavior):

```yaml
classification:
  task_prompt: |
    Analyze this document:

    {DOCUMENT_TEXT}

    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
```

Images are automatically appended after the text content.
With Placeholder (Controlled Placement):
```yaml
classification:
  task_prompt: |
    Analyze this document:

    {DOCUMENT_IMAGE}

    Text content: {DOCUMENT_TEXT}

    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
```

Images are inserted exactly where {DOCUMENT_IMAGE} appears in the prompt.
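Conceptually, the placeholder splits the prompt text and interleaves image blocks at that point. The sketch below is an assumption about how this could work; the content-block shape loosely follows Bedrock Converse-style messages and is not the solution's actual code:

```python
def build_content(task_prompt, image_bytes_list):
    """Build an ordered content list, inserting images at the
    {DOCUMENT_IMAGE} placeholder if present, else appending them."""
    content = []
    if "{DOCUMENT_IMAGE}" in task_prompt:
        before, after = task_prompt.split("{DOCUMENT_IMAGE}", 1)
        content.append({"text": before})
        for img in image_bytes_list:
            # Shape is illustrative (Converse-like content block).
            content.append({"image": {"format": "jpeg", "source": {"bytes": img}}})
        content.append({"text": after})
    else:
        # Default behavior: images appended after the text.
        content.append({"text": task_prompt})
        for img in image_bytes_list:
            content.append({"image": {"format": "jpeg", "source": {"bytes": img}}})
    return content
```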
Usage Examples
Image Before Text Analysis:
```yaml
task_prompt: |
  Look at this document image first:

  {DOCUMENT_IMAGE}

  Now read the extracted text: {DOCUMENT_TEXT}

  Based on both the visual layout and text content, classify this document as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
```

Image in the Middle for Context:
```yaml
task_prompt: |
  You are classifying business documents. Here are the possible types:
  {CLASS_NAMES_AND_DESCRIPTIONS}

  Examine this document image: {DOCUMENT_IMAGE}

  Additional text content extracted from the document: {DOCUMENT_TEXT}

  Classification:
```

Integration with Few-Shot Examples
The {DOCUMENT_IMAGE} placeholder works seamlessly with few-shot examples:
```yaml
classification:
  task_prompt: |
    Here are examples of each document type:
    {FEW_SHOT_EXAMPLES}

    Now classify this new document:
    {DOCUMENT_IMAGE}

    Text: {DOCUMENT_TEXT}

    Classification: {CLASS_NAMES_AND_DESCRIPTIONS}
```

Benefits
- 🎯 Contextual Placement: Position images where they provide maximum context
- 📱 Better Multimodal Understanding: Help models correlate visual and textual information
- 🔄 Flexible Prompt Design: Create prompts that flow naturally between different content types
- ⚡ Improved Performance: Strategic image placement can improve classification accuracy
- 🔒 Backward Compatible: Existing prompts without the placeholder continue to work unchanged
Multi-Page Documents
For documents with multiple pages, the system provides comprehensive image support:
- No Image Limits: All document pages are processed, following the Bedrock API’s removal of image count restrictions
- Info Logging: System logs image counts for monitoring and debugging purposes
- Automatic Pagination: Images are processed in page order for all pages
Setting Up Few Shot Examples in Pattern 2
Pattern 2’s multimodal page-level classification supports few-shot example prompting, which can significantly improve classification accuracy by providing concrete document examples. This feature is available when you select the ‘few_shot_example_with_multimodal_page_classification’ configuration.
Benefits of Few-Shot Examples
- 🎯 Improved Accuracy: Models understand document patterns better through concrete examples
- 📏 Consistent Output: Examples establish exact structure and formatting standards
- 🚫 Reduced Hallucination: Examples reduce likelihood of made-up classifications
- 🔧 Domain Adaptation: Examples help models understand domain-specific terminology
- 💰 Cost Effectiveness with Caching: Using prompt caching with few-shot examples significantly reduces costs
Few Shot Example Configuration
In Pattern 2, few-shot examples are configured within document class definitions using JSON Schema format:
```yaml
classes:
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Letter
    x-aws-idp-document-type: Letter
    type: object
    description: "A formal written correspondence..."
    properties:
      SenderName:
        type: string
        description: "The name of the person who wrote the letter..."
    x-aws-idp-examples:
      - x-aws-idp-class-prompt: "This is an example of the class 'Letter'"
        name: "Letter1"
        x-aws-idp-image-path: "config_library/unified/your_config/example-images/letter1.jpg"
      - x-aws-idp-class-prompt: "This is an example of the class 'Letter'"
        name: "Letter2"
        x-aws-idp-image-path: "config_library/unified/your_config/example-images/letter2.png"
```

Example Image Path Support
The x-aws-idp-image-path field supports multiple formats:

- Single Image File: "config_library/unified/examples/letter1.jpg"
- Local Directory with Multiple Images: "config_library/unified/examples/letters/"
- S3 Prefix with Multiple Images: "s3://my-config-bucket/examples/letter/"
- Direct S3 Image URI: "s3://my-config-bucket/examples/letter1.jpg"
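One way to distinguish the four supported formats is by scheme and trailing slash. This is an illustrative helper under that assumption, not the solution's actual resolver:

```python
def classify_image_path(path):
    """Classify an example-image path into one of the four
    supported formats (illustrative heuristic)."""
    if path.startswith("s3://"):
        return "s3-prefix" if path.endswith("/") else "s3-object"
    return "local-directory" if path.endswith("/") else "local-file"
```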
For comprehensive details on configuring few-shot examples, including multimodal vs. text-only approaches, example management, and advanced features, refer to the few-shot-examples.md documentation.
Image Processing Configuration
The classification service supports configurable image dimensions for optimal performance and quality:
New Default Behavior (Preserves Original Resolution)
Important Change: Empty strings or unspecified image dimensions now preserve the original document resolution for maximum classification accuracy:
```yaml
classification:
  model: us.amazon.nova-pro-v1:0
  # Image processing settings - preserves original resolution
  image:
    target_width: ""  # Empty string = no resizing (recommended)
    target_height: "" # Empty string = no resizing (recommended)
```

Custom Image Dimensions
Configure specific dimensions when performance optimization is needed:
```yaml
# For high-accuracy classification with controlled dimensions
classification:
  image:
    target_width: "1200"  # Resize to 1200 pixels wide
    target_height: "1600" # Resize to 1600 pixels tall

# For fast processing with lower resolution
classification:
  image:
    target_width: "600"  # Smaller for faster processing
    target_height: "800" # Maintains reasonable quality
```

Image Resizing Features
- Original Resolution Preservation: Empty strings preserve full document resolution for maximum accuracy
- Aspect Ratio Preservation: Images are resized proportionally without distortion when dimensions are specified
- Smart Scaling: Only downsizes images when necessary (scale factor < 1.0)
- High-Quality Resampling: Better visual quality after resizing
- Performance Optimization: Configurable dimensions allow balancing accuracy vs. speed
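The aspect-ratio-preserving, downscale-only rule described above can be sketched as a dimension calculation (illustrative only; the solution's actual resizing code is not shown here):

```python
def compute_resize(width, height, target_width, target_height):
    """Compute the output size under the documented rules:
    empty strings preserve the original resolution, the aspect
    ratio is kept, and images are only ever downscaled."""
    if not target_width or not target_height:
        return width, height  # empty string = no resizing
    scale = min(int(target_width) / width, int(target_height) / height)
    if scale >= 1.0:
        return width, height  # smart scaling: never upscale
    return int(width * scale), int(height * scale)
```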
Configuration Benefits
- Maximum Classification Accuracy: Empty strings preserve full document resolution for best results
- Service-Specific Tuning: Each service can use optimal image dimensions
- Runtime Configuration: No code changes needed to adjust image processing
- Backward Compatibility: Existing numeric values continue to work as before
- Memory Optimization: Configurable dimensions allow resource optimization
- Better Resource Utilization: Choose between accuracy (original resolution) and performance (smaller dimensions)
Migration from Previous Versions
Previous Behavior: Empty strings defaulted to 951x1268 pixel resizing.
New Behavior: Empty strings preserve the original image resolution.
If you were relying on the previous default resizing behavior, explicitly set dimensions:
```yaml
# To maintain previous default behavior
classification:
  image:
    target_width: "951"
    target_height: "1268"
```

Best Practices for Classification
Section titled “Best Practices for Classification”- Use Empty Strings for High Accuracy: For critical document classification, use empty strings to preserve original resolution
- Consider Document Types: Complex layouts benefit from higher resolution; simple text documents may work well with smaller dimensions
- Test Performance Impact: Higher resolution images provide better accuracy but consume more resources
- Monitor Processing Time: Balance classification accuracy with processing speed based on your requirements
JSON and YAML Output Support
Section titled “JSON and YAML Output Support”
The classification service supports both JSON and YAML output formats from LLM responses, with automatic format detection and parsing:
Automatic Format Detection
Section titled “Automatic Format Detection”
The system automatically detects whether the LLM response is in JSON or YAML format:
```yaml
# JSON response (traditional)
classification:
  task_prompt: |
    Classify this document and respond with JSON:
    {"class": "invoice", "confidence": 0.95}
```
```yaml
# YAML response (more token-efficient)
classification:
  task_prompt: |
    Classify this document and respond with YAML:
    class: invoice
    confidence: 0.95
```

Token Efficiency Benefits
Section titled “Token Efficiency Benefits”
YAML format provides significant token savings:
- 10-30% fewer tokens than equivalent JSON
- No quotes required around keys
- More compact syntax for nested structures
- Natural support for multiline content
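To make the savings concrete, compare raw character counts (a rough proxy for token count) of the same classification result in both formats:

```python
import json

result = {"class": "invoice", "confidence": 0.95}

as_json = json.dumps(result)                  # {"class": "invoice", "confidence": 0.95}
as_yaml = "class: invoice\nconfidence: 0.95"  # equivalent YAML, written by hand

# YAML drops the braces and the quoting around keys and string
# values, so it is noticeably shorter than the JSON form
print(len(as_json), len(as_yaml))
```

The gap widens for nested structures and multiline strings, which is where the 10-30% figure above comes from.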
Example Prompt Configurations
Section titled “Example Prompt Configurations”
JSON-focused prompt:
```yaml
classification:
  system_prompt: |
    You are a document classifier. Respond only with JSON format.
  task_prompt: |
    Classify this document and return a JSON object with the class name and confidence score.
```

YAML-focused prompt:

```yaml
classification:
  system_prompt: |
    You are a document classifier. Respond only with YAML format.
  task_prompt: |
    Classify this document and return YAML with the class name and confidence score.
```

Backward Compatibility
Section titled “Backward Compatibility”
- All existing JSON-based prompts continue to work unchanged
- The system automatically detects and parses both formats
- No configuration changes required for existing deployments
- Intelligent fallback between formats if parsing fails
Implementation Details
Section titled “Implementation Details”
The classification service uses the extract_structured_data_from_text() function, which:
- Automatically detects JSON vs YAML format
- Provides robust parsing with multiple extraction strategies
- Handles malformed content gracefully
- Returns both parsed data and detected format for logging
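The detection logic can be sketched with the standard library alone. This simplified version is for intuition only: it handles a JSON object or flat key/value YAML, whereas the actual extract_structured_data_from_text() function in the solution is considerably more robust:

```python
import json
import re

def extract_structured_data_from_text(text):
    """Detect and parse JSON or simple flat YAML from an LLM response.

    Returns (data, detected_format), or (None, None) if nothing parses.
    """
    # Strategy 1: look for a JSON object anywhere in the response
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0)), "json"
        except json.JSONDecodeError:
            pass  # fall through to the YAML strategy
    # Strategy 2: parse flat "key: value" lines as YAML
    data = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\s*(.+)$", line.strip())
        if m:
            key, value = m.groups()
            try:
                data[key] = json.loads(value)  # numbers, booleans, quoted strings
            except json.JSONDecodeError:
                data[key] = value              # bare string
    return (data, "yaml") if data else (None, None)

print(extract_structured_data_from_text('{"class": "invoice", "confidence": 0.95}'))
print(extract_structured_data_from_text("class: invoice\nconfidence: 0.95"))
```

Returning the detected format alongside the data is what enables the format logging mentioned above.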
Regex-Based Classification for Performance Optimization
Section titled “Regex-Based Classification for Performance Optimization”
Pattern 2 supports optional regex-based classification that can provide significant performance improvements and cost savings by bypassing LLM calls when document patterns are recognized.
Document Name Regex (All Pages Same Class)
Section titled “Document Name Regex (All Pages Same Class)”
When you want all pages of a document to be classified as the same class, you can use document name regex to instantly classify entire documents based on their filename or ID:
```yaml
classes:
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Payslip
    x-aws-idp-document-type: Payslip
    type: object
    description: "Employee wage statement showing earnings and deductions"
    x-aws-idp-document-name-regex: "(?i).*(payslip|paystub|salary|wage).*"
    properties:
      EmployeeName:
        type: string
        description: "Name of the employee"
```

Benefits:
- Instant Classification: Entire document classified without any LLM calls
- Massive Performance Gains: ~100-1000x faster than LLM classification
- Zero Token Usage: Complete elimination of API costs for matched documents
- Deterministic Results: Consistent classification for known patterns
When document ID matches the pattern:
- All pages are immediately classified as the matching class
- Single section is created containing all pages
- No backend service calls are made
- Info logging confirms regex match
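The matching step amounts to checking the document ID against each class's configured pattern. A minimal sketch, using a hypothetical in-memory view of the x-aws-idp-document-name-regex settings (the solution's internal structures may differ):

```python
import re

# Hypothetical mapping of class ID -> configured document name regex
NAME_PATTERNS = {
    "Payslip": r"(?i).*(payslip|paystub|salary|wage).*",
}

def classify_document_by_name(document_id):
    """Return a class for the whole document if its ID matches a pattern.

    None means: no match, continue with normal LLM classification.
    """
    for class_id, pattern in NAME_PATTERNS.items():
        if re.match(pattern, document_id):
            return class_id
    return None

print(classify_document_by_name("2024-03_payslip_jdoe.pdf"))  # Payslip
print(classify_document_by_name("unknown_scan.pdf"))          # None
```

Because this runs before any backend call, a match costs only a regex evaluation, which is where the speedup and zero token usage come from.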
Page Content Regex (Multi-Modal Page-Level Classification)
Section titled “Page Content Regex (Multi-Modal Page-Level Classification)”
For multi-class configurations using page-level classification, you can use page content regex to classify individual pages based on text patterns:
```yaml
classification:
  classificationMethod: multimodalPageLevelClassification

classes:
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Invoice
    x-aws-idp-document-type: Invoice
    type: object
    description: "Business invoice document"
    x-aws-idp-document-page-content-regex: "(?i)(invoice\\s+number|bill\\s+to|amount\\s+due)"
    properties:
      InvoiceNumber:
        type: string
        description: "Invoice number"
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Payslip
    x-aws-idp-document-type: Payslip
    type: object
    description: "Employee wage statement"
    x-aws-idp-document-page-content-regex: "(?i)(gross\\s+pay|net\\s+pay|employee\\s+id)"
    properties:
      EmployeeName:
        type: string
        description: "Employee name"
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Other
    x-aws-idp-document-type: Other
    type: object
    description: "Documents that don't match specific patterns"
    # No regex - will always use LLM
    properties: {}
```

Benefits:
- Selective Performance Gains: Pages matching patterns are classified instantly
- Mixed Processing: Some pages use regex, others fall back to LLM
- Cost Optimization: Reduced token usage proportional to regex matches
- Maintained Accuracy: LLM fallback ensures all pages are properly classified
How it works:
- Each page’s text content is checked against all class regex patterns
- First matching pattern wins and classifies the page instantly
- Pages with no matches use standard LLM classification
- Results are seamlessly integrated into document sections
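The per-page flow above can be sketched as a first-match-wins loop. The class patterns here mirror the example configuration but are otherwise hypothetical:

```python
import re

# Hypothetical class ID -> page content regex, mirroring the YAML example
PAGE_PATTERNS = {
    "Invoice": r"(?i)(invoice\s+number|bill\s+to|amount\s+due)",
    "Payslip": r"(?i)(gross\s+pay|net\s+pay|employee\s+id)",
}

def classify_page(page_text):
    """Check a page's text against all class patterns, in order.

    The first matching pattern wins; None means the page should fall
    back to standard LLM classification.
    """
    for class_id, pattern in PAGE_PATTERNS.items():
        if re.search(pattern, page_text):
            return class_id
    return None

print(classify_page("Invoice Number: 12345\nBill To: ACME Corp"))  # Invoice
print(classify_page("Gross Pay: $4,200.00"))                       # Payslip
print(classify_page("Meeting notes for Tuesday"))                  # None -> LLM
```

Mixing regex hits and LLM fallbacks per page is what makes the cost savings proportional to how often the patterns match.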
Regex Pattern Best Practices
Section titled “Regex Pattern Best Practices”-
Case-Insensitive Matching: Always use
(?i)flag(?i).*(invoice|bill).* # Matches any case variation -
Flexible Whitespace: Use
\\s+for varying spaces/tabs(?i)(gross\\s+pay|net\\s+pay) # Handles "gross pay", "gross pay" -
Multiple Alternatives: Use
|for different terms(?i).*(payslip|paystub|salary|wage).* # Any of these terms -
Balanced Specificity: Specific enough to avoid false matches
# Good: Specific to W2 forms(?i)(form\\s+w-?2|wage\\s+and\\s+tax|employer\\s+identification)# Too broad: Could match many documents(?i)(form|wage|tax)
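A quick way to sanity-check the case and whitespace rules above before deploying a pattern is to run it against a few representative strings:

```python
import re

pattern = r"(?i)(gross\s+pay|net\s+pay)"

# (?i) absorbs any case variation; \s+ absorbs runs of spaces or tabs
print(bool(re.search(pattern, "GROSS PAY")))    # True
print(bool(re.search(pattern, "gross   pay")))  # True
print(bool(re.search(pattern, "grosspay")))     # False - \s+ requires whitespace
```

Note that in Python source the pattern uses single backslashes (`\s+`); the doubled backslashes in the YAML examples are escaping for YAML double-quoted strings.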
Performance Analysis
Section titled “Performance Analysis”
Use `notebooks/examples/step2_classification_with_regex.ipynb` to:
- Test regex patterns against your documents
- Compare processing speeds (regex vs LLM)
- Analyze cost savings through token usage reduction
- Validate classification accuracy
- Debug pattern matching behavior
Error Handling
Section titled “Error Handling”
The regex system includes robust error handling:
- Invalid Patterns: Compilation errors are logged, system falls back to LLM
- Runtime Failures: Pattern matching errors default to LLM classification
- Graceful Degradation: Service continues working with invalid regex
- Comprehensive Logging: Detailed logs for debugging pattern issues
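The "invalid pattern" behavior amounts to compiling each configured regex defensively. A minimal sketch of that idea (an illustration of the described behavior, not the solution's actual code):

```python
import logging
import re

logger = logging.getLogger(__name__)

def safe_compile(pattern):
    """Compile a configured regex defensively.

    On a compilation error, log the problem and return None, which
    callers treat as "no usable pattern - fall back to LLM
    classification" (graceful degradation).
    """
    try:
        return re.compile(pattern)
    except re.error as exc:
        logger.error("Invalid classification regex %r: %s", pattern, exc)
        return None

print(safe_compile(r"(?i)invoice") is not None)  # True - valid pattern
print(safe_compile(r"(unclosed") is None)        # True - invalid, falls back
```

Compiling once at configuration-load time, rather than on every page, also means a bad pattern is reported immediately instead of failing repeatedly at runtime.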
Configuration Examples
Section titled “Configuration Examples”
Common Document Types:
```yaml
classes:
  # W2 Tax Forms
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: W2
    x-aws-idp-document-type: W2
    type: object
    description: "W2 Tax Form"
    x-aws-idp-document-page-content-regex: "(?i)(form\\s+w-?2|wage\\s+and\\s+tax|social\\s+security)"
    properties: {}

  # Bank Statements
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Bank-Statement
    x-aws-idp-document-type: Bank-Statement
    type: object
    description: "Bank Statement"
    x-aws-idp-document-page-content-regex: "(?i)(account\\s+number|statement\\s+period|beginning\\s+balance)"
    properties: {}

  # Driver Licenses
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: US-drivers-licenses
    x-aws-idp-document-type: US-drivers-licenses
    type: object
    description: "US Driver's License"
    x-aws-idp-document-page-content-regex: "(?i)(driver\\s+license|state\\s+id|date\\s+of\\s+birth)"
    properties: {}

  # Invoices
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Invoice
    x-aws-idp-document-type: Invoice
    type: object
    description: "Invoice"
    x-aws-idp-document-page-content-regex: "(?i)(invoice\\s+number|bill\\s+to|remit\\s+payment)"
    properties: {}
```

Best Practices for Classification
Section titled “Best Practices for Classification”
- Provide Clear Class Descriptions: Include distinctive features and common elements
- Use Few Shot Examples: Include 2-3 diverse examples per class
- Choose the Right Method: Use page-level with sequence segmentation for multi-document packets, holistic for context-dependent documents
- Balance Class Coverage: Ensure all expected document types have classes
- Monitor and Refine: Use the evaluation framework to track classification accuracy
- Consider Visual Elements: Describe visual layout and design patterns in class descriptions
- Test with Real Documents: Validate classification against actual document samples
- Optimize Image Dimensions: Configure appropriate image sizes based on document complexity and processing requirements
- Balance Quality vs Performance: Higher resolution images provide better accuracy but consume more resources
- Consider Output Format: Use YAML prompts for token efficiency, especially with complex nested responses
- Leverage Format Flexibility: Take advantage of automatic format detection to optimize prompts for different use cases
- Understand Boundary Indicators: Review the `document_boundary` metadata to understand how documents are being segmented
- Handle Multi-Document Packets: Use sequence segmentation when processing files containing multiple documents of the same type
- Test Segmentation Logic: Verify that documents are correctly separated by reviewing section boundaries in the results
- Consider Document Flow: Ensure your document classes account for typical document structures (headers, body, footers)
- Leverage BIO-like Tagging: Take advantage of the automatic boundary detection to eliminate manual document splitting
- Use Regex for Known Patterns: Add regex patterns for document types with predictable content or naming conventions
- Test Regex Thoroughly: Validate regex patterns against diverse document samples before production use
- Balance Regex Specificity: Make patterns specific enough to avoid false matches but flexible enough to catch variations
- Monitor Regex Performance: Track how often regex patterns match vs fall back to LLM classification