OCR Image Sizing Best Practices Guide
Overview
The OCR service automatically applies sensible default image size limits to optimize the balance between OCR accuracy and resource consumption. This guide explains the sizing strategy and how to customize it for your specific use cases.
Default Behavior (NEW)
Automatic Optimization
- Default limits: 951×1268 pixels when no image sizing is configured
- Why defaults matter: Prevents excessive token consumption, memory issues, and processing delays
- Backward compatibility: Existing explicit configurations continue to work unchanged
When Defaults Are Applied
    ocr:
      image:
        # No target_width or target_height specified
        # → Automatic 951x1268 limits applied
        dpi: 150

When Defaults Are NOT Applied

    ocr:
      image:
        # Explicit configuration provided
        target_width: 1200
        target_height: 900
        # → Your explicit values used

Sizing Recommendations by Use Case

| Use Case | Dimensions | Token Usage/Page | Best For | Configuration |
|---|---|---|---|---|
| High Accuracy | 1600×1200 | 500-800 | Forms, tables, handwriting | `target_width: 1600`, `target_height: 1200` |
| Standard Documents | 1200×900 | 300-500 | Printed text, simple layouts | `target_width: 1200`, `target_height: 900` |
| Token-Conscious | 800×600 | 150-300 | Basic text extraction | `target_width: 800`, `target_height: 600` |
| Minimal Processing | 600×450 | 100-200 | Speed over accuracy | `target_width: 600`, `target_height: 450` |
| No Limits | Original | 1000-4000+ | When quality is critical | `target_width: ""`, `target_height: ""` |
Cost Impact Analysis
Before Default Sizing
- Typical page: 1000-4000+ tokens
- 10-page document: 10,000-40,000+ tokens
- Monthly cost impact: Can be substantial for high-volume processing
With Default Sizing (951×1268)
- Typical page: 400-600 tokens
- 10-page document: ~5,000 tokens
- Cost reduction: 60-85% on vision model costs
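The 60-85% figure can be reproduced from the per-page token ranges quoted above. The following is a small arithmetic sketch, not part of the service; `reduction_percent` is a hypothetical helper, and the ranges are the ones listed in this section.

```python
def reduction_percent(before_tokens: int, after_tokens: int) -> float:
    """Percentage drop in tokens per page when going from before to after."""
    return (before_tokens - after_tokens) / before_tokens * 100


# Per-page token ranges quoted in this guide.
BEFORE_RANGE = (1000, 4000)  # no sizing limits
AFTER_RANGE = (400, 600)     # default 951x1268 limits

low = reduction_percent(BEFORE_RANGE[0], AFTER_RANGE[0])   # 1000 -> 400 tokens
high = reduction_percent(BEFORE_RANGE[1], AFTER_RANGE[1])  # 4000 -> 600 tokens

print(f"Estimated reduction: {low:.0f}% to {high:.0f}%")
# prints "Estimated reduction: 60% to 85%"
```

Actual savings depend on your documents and on how the vision model counts image tokens, so treat these numbers as estimates.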
Resource Benefits
- Memory usage: Reduced OutOfMemory errors during concurrent processing
- Processing speed: Faster uploads, downloads, and processing
- Network efficiency: Lower bandwidth consumption
Configuration Examples
Use Automatic Defaults (Recommended)

    ocr:
      image:
        dpi: 150
        # No sizing specified = automatic 951×1268 defaults applied

High-Volume Text Processing

    ocr:
      image:
        dpi: 150
        target_width: 1200
        target_height: 900
        # Balances quality with token efficiency

Forms and Complex Documents

    ocr:
      image:
        dpi: 150
        target_width: 1600
        target_height: 1200
        # Maximum recommended size for accuracy

Token-Optimized Processing

    ocr:
      image:
        dpi: 150
        target_width: 800
        target_height: 600
        # Minimizes token usage while maintaining readability

Working with Configuration Systems

    # Empty strings are treated the same as no configuration
    # This handles configuration systems that return empty strings for unset values
    ocr:
      image:
        dpi: 150
        target_width: ""
        target_height: ""
        # → Automatic 951x1268 defaults applied (same as if not specified)

Partial Configuration (Disables Defaults)

    ocr:
      image:
        dpi: 150
        target_width: 800
        # target_height missing - disables automatic defaults
        # → No size limits applied (preserves existing behavior)

Migration Guide
For Existing Deployments
- No action required: Existing configurations continue to work
- Opt into defaults: Remove `target_width` and `target_height` from config
- Monitor improvements: Track token usage and processing performance
Performance Monitoring
- Monitor token consumption in LLM processing stages
- Watch for memory usage improvements during concurrent processing
- Track overall document processing times
Troubleshooting
OCR Quality Issues
- Text unclear: Increase image dimensions or check source document quality
- Tables misaligned: Try 1600×1200 or higher for complex layouts
- Handwriting errors: Use maximum recommended sizing (1600×1200)
Performance Issues
- Memory errors: Ensure sizing limits are applied (not disabled)
- Slow processing: Reduce image dimensions if quality permits
- High costs: Monitor and optimize based on use case requirements
Best Practices Summary
- Start with defaults: Let automatic sizing optimize your resource usage
- Measure and adjust: Monitor token usage and accuracy for your specific documents
- Use case specific: Different document types may benefit from different sizing
- Test thoroughly: Validate OCR accuracy with your specific document samples
- Monitor costs: Track token consumption impact of sizing decisions
Technical Details
How Defaults Work
- Applied when both `target_width` and `target_height` are unspecified or `None`
- Fallback to defaults when invalid values are provided
- Partial configurations (only width OR height) disable defaults to preserve existing behavior
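The rules above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not the service's actual code: `resolve_image_size`, `DEFAULT_SIZE`, and `_is_unset` are hypothetical names, the empty-string handling follows the "Working with Configuration Systems" example, and treating non-positive numbers as invalid is an assumption.

```python
from typing import Optional, Tuple

DEFAULT_SIZE = (951, 1268)  # default limits stated in this guide


def _is_unset(value) -> bool:
    """Treat None and empty strings as 'not configured'."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def resolve_image_size(target_width=None, target_height=None) -> Optional[Tuple[int, int]]:
    """Return (width, height) limits, or None when no limits apply."""
    width_unset = _is_unset(target_width)
    height_unset = _is_unset(target_height)

    if width_unset and height_unset:
        return DEFAULT_SIZE  # nothing configured: apply defaults
    if width_unset or height_unset:
        return None          # partial configuration: defaults disabled, no limits

    try:
        width, height = int(target_width), int(target_height)
    except (TypeError, ValueError):
        return DEFAULT_SIZE  # invalid values: fall back to defaults
    if width <= 0 or height <= 0:
        return DEFAULT_SIZE  # assumption: non-positive sizes count as invalid
    return (width, height)


print(resolve_image_size())              # (951, 1268)
print(resolve_image_size("", ""))        # (951, 1268)
print(resolve_image_size(1200, 900))     # (1200, 900)
print(resolve_image_size(800, None))     # None
print(resolve_image_size("abc", "xyz"))  # (951, 1268)
```

Returning `None` for the partial case keeps "no limits" distinct from "defaults", which mirrors the backward-compatibility behavior described above.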
Memory Optimization
- Images are extracted at target size to prevent memory issues
- Concurrent processing optimized to avoid OutOfMemory errors
- Aggressive memory cleanup after each page processing
Aspect Ratio Preservation
- All resizing preserves original aspect ratio
- Never upscales images (quality would not improve)
- Uses intelligent scaling to fit within target dimensions
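One common way to implement "fit within the target, keep the aspect ratio, never upscale" is to apply a single scale factor capped at 1.0. The sketch below illustrates that approach; it is an assumption about how such scaling can work, not the service's actual algorithm, and `fit_within` is a hypothetical name.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_width: int, max_height: int) -> Tuple[int, int]:
    """Scale (width, height) to fit inside the limits, keeping the aspect ratio."""
    # The tighter of the two constraints decides the scale.
    # Capping at 1.0 means images smaller than the limits are left unchanged.
    scale = min(max_width / width, max_height / height, 1.0)
    return (max(1, round(width * scale)), max(1, round(height * scale)))


# A US Letter page rendered at 150 DPI is 1275x1650 pixels.
print(fit_within(1275, 1650, 951, 1268))  # (951, 1231)

# An image already inside the limits is not upscaled.
print(fit_within(640, 480, 951, 1268))    # (640, 480)
```

In the first example the width is the binding constraint, so the result is narrower than the 1268-pixel height limit would allow.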
Logging and Monitoring
Configuration Visibility

    INFO OCR Service initialized - DPI: 150, Image sizing: 1600x1200

Default Application

    INFO No image sizing configured, applying default limits: 1600x1200 to optimize resource usage and token consumption

Explicit Configuration

    INFO Using configured image sizing: 800x600

Error Handling

    WARNING Invalid resize configuration values: width=abc, height=xyz. Falling back to defaults: 1600x1200

This comprehensive logging helps you understand exactly what sizing strategy is being applied and troubleshoot any configuration issues.
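If you want to track which sizing strategy is in effect across many runs, the messages can be classified with a few regular expressions. This is a rough sketch that assumes the message wording matches the sample log lines above; `sizing_from_log` and the labels are hypothetical, and the patterns would need adjusting if the real messages differ.

```python
import re
from typing import Optional, Tuple

# Message fragments taken from the sample log lines in this guide (assumed stable).
_PATTERNS = [
    ("default", re.compile(r"applying default limits: (\d+)x(\d+)")),
    ("explicit", re.compile(r"Using configured image sizing: (\d+)x(\d+)")),
    ("fallback", re.compile(r"Falling back to defaults: (\d+)x(\d+)")),
]


def sizing_from_log(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (strategy, width, height) for a sizing log line, else None."""
    for label, pattern in _PATTERNS:
        match = pattern.search(line)
        if match:
            return (label, int(match.group(1)), int(match.group(2)))
    return None


print(sizing_from_log("INFO Using configured image sizing: 800x600"))
# ('explicit', 800, 600)
```

A rising count of `fallback` entries is a quick signal that some configuration source is supplying invalid values.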