Capacity Planning
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0
Overview
The GenAI IDP accelerator includes capacity planning capabilities to help you optimize document processing performance, predict resource requirements, and ensure your system can handle expected workloads. This system provides real-time capacity analysis, AWS service quota recommendations, and performance optimization guidance.
This feature is designed for Pattern 2 and the Unified pattern due to their well-defined processing steps (OCR, Classification, Extraction, Assessment, Summarization) and predictable Bedrock model quota requirements. The Unified pattern uses the same Bedrock pipeline as Pattern 2 for capacity planning purposes.
Key Benefits
- Predictive Analysis: Calculate processing capacity requirements before scaling production workloads
- Cost Optimization: Right-size AWS resources based on actual processing patterns
- Performance Planning: Identify bottlenecks and optimize processing pipelines
- Quota Management: Automatically calculate required AWS service quotas (TPM and RPM)
- Load Distribution: Plan processing schedules to maximize throughput
- Real-time Monitoring: Track capacity utilization and adjust dynamically
Architecture Overview
The capacity planning system consists of several integrated components that work together to provide capacity analysis:
Core Components
- GraphQL Resolver: CalculateCapacityResolverFunction, which handles capacity calculation requests
- Capacity Calculation Engine: Lambda-based processing engine that analyzes document requirements and generates recommendations
- Web UI Interface: Interactive React-based capacity planning calculator with real-time visualizations
- Token Usage Analysis: Automatic population of token usage from processed documents’ metering data
- Quota Analysis: Automated AWS service quota requirement calculation (TPM and RPM) with direct links to AWS console
- Latency Distribution Modeling: Statistical analysis of processing times with percentile-based predictions from real document data
Data Flow
- Input Configuration: Users define document types, token usage, and processing schedules through the Web UI
- Historical Analysis: System extracts token usage and page counts from processed documents’ metering data
- Capacity Calculation: GraphQL resolver invokes calculation engine to process requirements
- Real Metrics: Processing times and queue delays are calculated from actual processed document data
- Quota Assessment: System calculates required AWS service quotas (TPM and RPM) with direct console links
- Visualization: Results displayed through interactive dashboard with hourly token distribution charts
Capacity Planning Features
1. Interactive Capacity Calculator
The Web UI provides an intuitive interface for capacity planning with real-time token usage population:
Document Configuration:
- Document type selection only from classes defined in View/Edit Configuration
- Average pages per document (automatically extracted from processed documents’ metering data)
- Token usage configuration for each processing step (OCR, Classification, Extraction, Assessment, Summarization)
- Support for automatic token and page count population from processed documents’ metering data
- CSV import/export functionality for bulk configuration
Processing Schedule Configuration:
- Hourly processing schedule with document type and volume specification
- Visual time slot selection (24-hour format)
- Document type filtering based on configured document types only
- CSV import/export for schedule management
Real-time Token Analysis:
- Hourly token distribution visualization with stacked bar charts
- Peak hour analysis with load distribution insights
- Processing time percentile analysis (P50, P75, P90, P95, P99) from real document data
- Quota-based status indicators (green/blue when within quota, red when quota exceeded)
2. Advanced Analytics Engine
The capacity calculation system provides its analysis through GraphQL resolvers:
Token Usage Extraction:
- Automatic extraction from processed documents’ metering data
- Context-aware parsing of OCR, classification, extraction, assessment, and summarization tokens
- Request count estimation based on average requests per document from metering data
- Page count extraction from multiple sources (OCR requests, document fields, sections)
Latency Distribution Modeling:
- Statistical analysis of processing times from actual processed documents (P50, P75, P90, P95, P99)
- Queue delay calculation from real QueuedTime/WorkflowStartTime timestamps
- SLA compliance checking against configured maximum latency (in seconds)
- Performance warning alerts for quota exceedances
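The percentile analysis above can be sketched as follows. This is a minimal illustration, not the accelerator's implementation: it assumes a nearest-rank percentile definition and assumes that SLA compliance is judged against P99. The function names and the sample durations are hypothetical.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct is an integer 1..100)."""
    if not sorted_values:
        raise ValueError("No processing time data found in documents")
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def latency_distribution(processing_seconds, max_latency_seconds):
    """Return P50..P99 plus an SLA flag (assumption: SLA is checked against P99)."""
    ordered = sorted(processing_seconds)
    dist = {f"P{p}": percentile(ordered, p) for p in (50, 75, 90, 95, 99)}
    dist["meets_sla"] = dist["P99"] <= max_latency_seconds
    return dist


# Hypothetical processing times (seconds) taken from processed documents.
samples = [12.0, 15.5, 18.2, 20.1, 22.4, 25.0, 31.7, 40.3, 55.9, 72.6]
result = latency_distribution(samples, max_latency_seconds=60)
print(result)
```

With only ten samples the upper percentiles collapse onto the slowest document, which is one reason to populate from a larger set of processed documents.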
RPM (Requests Per Minute) Calculation:
- Calculates average requests per document from metering data samples
- Multiplies by scheduled documents per hour to get total requests per hour
- Applies the 10% safety buffer (multiplier of 1.1) for peak demand
- Formula: peak_rpm = (avg_requests_per_doc × scheduled_docs_per_hour / 60) × 1.1
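As a rough sketch of the RPM steps listed above: the code below averages the requests field over metering samples for one processing step and applies the 1.1 multiplier that the quota management section describes as a 10% safety buffer. The helper names, the sample cap argument, and the choice to skip documents with no requests for the step are assumptions made for illustration.

```python
def average_requests_per_doc(metering_samples, step, max_samples=100):
    """Mean request count per document for keys prefixed with '<step>/'."""
    per_doc = []
    for metering in metering_samples[:max_samples]:
        count = sum(
            metrics.get("requests", 0)
            for key, metrics in metering.items()
            if key.startswith(step + "/")
        )
        if count:  # assumption: documents without requests for this step are skipped
            per_doc.append(count)
    if not per_doc:
        raise ValueError(f"No request count data found for {step}")
    return sum(per_doc) / len(per_doc)


def peak_rpm(avg_requests_per_doc, scheduled_docs_per_hour, safety_buffer=1.1):
    """Requests per minute with the 10% safety buffer applied."""
    return avg_requests_per_doc * scheduled_docs_per_hour / 60 * safety_buffer


# Hypothetical metering samples from two processed documents.
samples = [
    {"Extraction/bedrock/anthropic.claude-3-haiku": {"requests": 2}},
    {"Extraction/bedrock/anthropic.claude-3-haiku": {"requests": 4}},
]
avg = average_requests_per_doc(samples, "Extraction")
rpm = peak_rpm(avg, scheduled_docs_per_hour=120)
print(avg, round(rpm, 2))
```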
3. Document Token Usage Population
Automatic Token Population from Processed Documents:
- Integration with Documents context to access processed document data
- Extraction of token usage from metering data structure
- Support for multiple document selection and batch population
- Document picker modal with filtering and selection capabilities
- Single-class document validation - multi-class documents are not supported
Metering Data Processing:
- Context-prefixed key parsing (OCR/, Classification/, Extraction/, Assessment/, Summarization/, BDAProject/bda/)
- Token count aggregation from inputTokens, outputTokens, and totalTokens fields
- Request count calculation: average requests per document from metering samples
- Page count extraction from multiple sources:
  - OCR Bedrock requests (metrics.requests as page count)
  - Metering pages, pageCount, PageCount fields
  - Document-level PageCount, pageCount, Pages fields
  - Sections EndPage field as fallback
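A minimal sketch of the context-prefixed key parsing and token aggregation described above. It assumes that totalTokens is preferred when present and that inputTokens plus outputTokens is used otherwise; the source lists the fields but does not state how they are combined. The function name is hypothetical, and the sample values mirror the metering example in this document.

```python
from collections import defaultdict

STEPS = ("OCR", "Classification", "Extraction", "Assessment", "Summarization")


def tokens_by_step(metering):
    """Aggregate token counts per processing step from context-prefixed keys."""
    totals = defaultdict(int)
    for key, metrics in metering.items():
        step = key.split("/", 1)[0]  # "Extraction/bedrock/..." -> "Extraction"
        if step not in STEPS:
            continue
        total = metrics.get("totalTokens")
        if total is None:  # assumption: fall back to input + output tokens
            total = metrics.get("inputTokens", 0) + metrics.get("outputTokens", 0)
        totals[step] += total
    return dict(totals)


metering = {
    "OCR/bedrock/us.amazon.nova-lite-v1:0": {"totalTokens": 1500, "requests": 3, "pages": 3},
    "Classification/bedrock/anthropic.claude-3-haiku": {"inputTokens": 800, "outputTokens": 200, "requests": 1},
    "Extraction/bedrock/anthropic.claude-3-haiku": {"inputTokens": 2000, "outputTokens": 500, "requests": 2},
}
step_tokens = tokens_by_step(metering)
print(step_tokens)
```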
Document Type Validation:
- Document Type dropdown only shows classes from View/Edit Configuration
- Documents must be classified with types defined in configuration
- Multi-class documents are rejected with validation error
- Unclassified documents are not supported
4. AWS Service Quota Analysis
Automated Quota Calculation:
- Dynamic model configuration from deployment settings
- Tokens Per Minute (TPM) quota requirements for each model
- Requests Per Minute (RPM) quota requirements for each model
- Regional quota availability analysis with direct console links
Quota Requirements Display:
- Separate tables for TPM and RPM quotas
- Current vs. required quota comparison with utilization percentages
- Status indicators: ✅ Sufficient, ⚠️ Increase Needed
- Direct “Request Increase” buttons linking to AWS Service Quotas console
Latency Bar Color Coding:
- Green (P50, P75): When all model quotas are within limits
- Blue (P90, P95, P99): When all model quotas are within limits
- Red: Only when any model quota is exceeded (shows “Increase Needed”)
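The status and color rules above can be expressed compactly. The sketch below is an illustration only: the function names and the dictionary shape are hypothetical, and utilization is assumed to be required quota divided by current quota.

```python
def quota_status(required, current):
    """Compare a required quota with the current quota."""
    utilization = required / current * 100 if current else float("inf")
    status = "Sufficient" if required <= current else "Increase Needed"
    return {"utilization_pct": round(utilization, 1), "status": status}


def latency_bar_color(percentile_label, quota_statuses):
    """Red if any model quota is exceeded; otherwise green (P50, P75) or blue (P90+)."""
    if any(s["status"] == "Increase Needed" for s in quota_statuses):
        return "red"
    return "green" if percentile_label in ("P50", "P75") else "blue"


ok = quota_status(required=9166, current=10000)
over = quota_status(required=12000, current=10000)
print(ok, over)
print(latency_bar_color("P50", [ok]), latency_bar_color("P95", [ok]), latency_bar_color("P50", [ok, over]))
```

Note that the color depends only on quota status, not on whether a percentile exceeds the configured maximum latency.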
Environment Configuration Requirements (Lambda):
- BEDROCK_MODEL_QUOTA_CODES: JSON mapping of model IDs to TPM quota codes (required)
- BEDROCK_MODEL_RPM_QUOTA_CODES: JSON mapping of model IDs to RPM quota codes (required)
- METERING_TABLE_NAME: DynamoDB table for metering data (required)
- TRACKING_TABLE: DynamoDB table for document tracking (required)
- LAMBDA_MEMORY_GB: Lambda memory size for gb_seconds to seconds conversion (required)
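A startup check for these required variables might look like the sketch below. The function name, the returned dictionary keys, the table names, and the quota code values (L-EXAMPLE1, L-EXAMPLE2) are placeholders invented for illustration; real quota codes come from your deployment.

```python
import json
import os

REQUIRED_VARS = (
    "BEDROCK_MODEL_QUOTA_CODES",
    "BEDROCK_MODEL_RPM_QUOTA_CODES",
    "METERING_TABLE_NAME",
    "TRACKING_TABLE",
    "LAMBDA_MEMORY_GB",
)


def load_capacity_config(env=os.environ):
    """Fail fast if a required variable is missing, then parse the values."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {
        "tpm_quota_codes": json.loads(env["BEDROCK_MODEL_QUOTA_CODES"]),
        "rpm_quota_codes": json.loads(env["BEDROCK_MODEL_RPM_QUOTA_CODES"]),
        "metering_table": env["METERING_TABLE_NAME"],
        "tracking_table": env["TRACKING_TABLE"],
        "lambda_memory_gb": float(env["LAMBDA_MEMORY_GB"]),
    }


# Placeholder values only; none of these are real quota codes or table names.
example_env = {
    "BEDROCK_MODEL_QUOTA_CODES": '{"anthropic.claude-3-haiku": "L-EXAMPLE1"}',
    "BEDROCK_MODEL_RPM_QUOTA_CODES": '{"anthropic.claude-3-haiku": "L-EXAMPLE2"}',
    "METERING_TABLE_NAME": "example-metering-table",
    "TRACKING_TABLE": "example-tracking-table",
    "LAMBDA_MEMORY_GB": "1.0",
}
config = load_capacity_config(example_env)
print(config["lambda_memory_gb"])
```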
Configuration and Customization
Stack-Level Parameters
Core Capacity Settings:
- MaxConcurrentWorkflows: Maximum parallel executions (default: 100)
- DataRetentionInDays: Data retention period (default: 365)
- ErrorThreshold: Error alerting threshold (default: 1)
- ExecutionTimeThresholdMs: Processing timeout (default: 300000 ms)
Processing Configuration:
- LogLevel: Logging verbosity (DEBUG, INFO, WARN, ERROR)
- LogRetentionDays: CloudWatch log retention (default: 30)
- EnableXRayTracing: Distributed tracing enablement
- EnableMCP: Model Context Protocol integration
Environment Variables (Lambda)
Required Lambda Environment Variables:
- BEDROCK_MODEL_QUOTA_CODES: JSON mapping of Bedrock model IDs to TPM quota codes
- BEDROCK_MODEL_RPM_QUOTA_CODES: JSON mapping of Bedrock model IDs to RPM quota codes
- METERING_TABLE_NAME: DynamoDB table name for metering data
- TRACKING_TABLE: DynamoDB table name for document tracking
- LAMBDA_MEMORY_GB: Lambda memory size (e.g., “1.0” for 1 GB)
- MIN_TOKENS_PER_REQUEST: Minimum tokens per request for safety calculations
Complexity Thresholds (for recommendations):
- MEDIUM_COMPLEXITY_THRESHOLD: Token density threshold for medium complexity
- HIGH_COMPLEXITY_THRESHOLD: Token density threshold for high complexity
- PAGE_COMPLEXITY_FACTOR: Multiplier for page count complexity
- HIGH_COMPLEXITY_MULTIPLIER: Multiplier for high complexity documents
- MEDIUM_COMPLEXITY_MULTIPLIER: Multiplier for medium complexity documents
Recommendation Thresholds:
- RECOMMENDATION_HIGH_COMPLEXITY_THRESHOLD: Complexity factor threshold for high complexity warning
- RECOMMENDATION_MEDIUM_COMPLEXITY_THRESHOLD: Complexity factor threshold for medium complexity warning
- RECOMMENDATION_HIGH_LOAD_THRESHOLD: Load factor threshold for high load warning
- RECOMMENDATION_MEDIUM_LOAD_THRESHOLD: Load factor threshold for medium load warning
- RECOMMENDATION_HIGH_LATENCY_THRESHOLD: P99 latency threshold (seconds) for high latency warning
- RECOMMENDATION_LARGE_DOC_THRESHOLD: Token threshold for large document warning
- RECOMMENDATION_HIGH_PAGE_THRESHOLD: Page count threshold for high page warning
Environment Variables (UI)
UI Configuration Parameters:
- VITE_DEFAULT_MAX_LATENCY: Default maximum latency setting in seconds (default: 60)
- VITE_DEFAULT_TOKENS_BY_STEP: JSON object with default token limits per processing step
- VITE_DEFAULT_MAX_TOKENS_PER_REQUEST: Default maximum tokens per API request (4000)
- VITE_BDA_TOKENS_PER_PAGE: Estimated tokens per page for BDA pattern processing (2000)
- VITE_AWS_REGION: AWS region for console URL generation (required)
- VITE_BEDROCK_MODEL_QUOTA_CODES: Optional JSON mapping for direct quota code links
Using the Capacity Planning System
1. Accessing the Capacity Planner
⚠️ Pattern 2 or Unified Only: This feature is available for Pattern 2 and Unified pattern deployments. The navigation link will not appear for Pattern 1 or Pattern 3.
Navigate to the Web UI and select the “Capacity Planning” section:
- Prerequisites:
  - Must be using Pattern 2 or Unified pattern (Textract + Bedrock pipeline)
  - Process some documents first to populate metering data
  - Documents must be single-class and classified with configured document types
- Configuration: Visit “View/Edit Configuration” tab to load your pattern configuration
- Navigation: Click on “Capacity Planning” in the main navigation (visible for Pattern 2 and Unified)
- Pattern Detection: System automatically detects your deployment pattern. The Unified pattern is treated as Pattern 2 for capacity planning purposes
2. Document Configuration
Step 1: Document Type Setup
- Select document types only from classes configured in View/Edit Configuration
- Use “Populate tokens from Documents” to automatically extract token usage from processed documents
- Average pages per document is automatically extracted from metering data
- Set token usage for each processing step (OCR, Classification, Extraction, Assessment, Summarization)
Step 2: Token Population from Processed Documents
    // Example of automatic token and page extraction from metering data
    {
      "OCR/bedrock/us.amazon.nova-lite-v1:0": {
        "totalTokens": 1500,
        "requests": 3,  // Used as page count
        "pages": 3
      },
      "Classification/bedrock/anthropic.claude-3-haiku": {
        "inputTokens": 800,
        "outputTokens": 200,
        "requests": 1
      },
      "Extraction/bedrock/anthropic.claude-3-haiku": {
        "inputTokens": 2000,
        "outputTokens": 500,
        "requests": 2
      }
    }
Page Count Extraction Priority:
1. OCR Bedrock requests count (metrics.requests)
2. BDA pattern pages (BDAProject/bda/*.pages)
3. Metering fields (pages, pageCount, PageCount)
4. Document-level fields (doc.PageCount, doc.pageCount, doc.Pages)
5. Sections EndPage field as fallback
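The priority order above could be implemented as a chain of fallbacks. The sketch below is illustrative: the function name, the Sections key on the document, the use of the largest EndPage value, and the final error message are all assumptions rather than details confirmed by the source.

```python
def resolve_page_count(metering, document):
    """Return the first page count found, following the documented priority."""
    # 1. OCR Bedrock request count (one request per page).
    for key, metrics in metering.items():
        if key.startswith("OCR/bedrock/") and metrics.get("requests"):
            return metrics["requests"]
    # 2. BDA pattern pages.
    for key, metrics in metering.items():
        if key.startswith("BDAProject/bda/") and metrics.get("pages"):
            return metrics["pages"]
    # 3. Page fields on any metering entry.
    for metrics in metering.values():
        for field in ("pages", "pageCount", "PageCount"):
            if metrics.get(field):
                return metrics[field]
    # 4. Document-level fields.
    for field in ("PageCount", "pageCount", "Pages"):
        if document.get(field):
            return document[field]
    # 5. Fallback: largest EndPage across sections (assumed key: "Sections").
    end_pages = [s["EndPage"] for s in document.get("Sections", []) if s.get("EndPage")]
    if end_pages:
        return max(end_pages)
    raise ValueError("No page count data found")  # hypothetical message


from_ocr = resolve_page_count({"OCR/bedrock/us.amazon.nova-lite-v1:0": {"requests": 3}}, {})
from_doc = resolve_page_count({}, {"PageCount": 7})
from_sections = resolve_page_count({}, {"Sections": [{"EndPage": 2}, {"EndPage": 5}]})
print(from_ocr, from_doc, from_sections)
```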
Step 3: CSV Import/Export
- Import document configurations from CSV files
- Export current configurations for backup or sharing
- Validation for required OCR tokens when Bedrock OCR is configured
3. Processing Schedule Configuration
Hourly Processing Schedule:
- Configure processing volumes by hour using 24-hour time slots
- Select document types only from types configured in Document Processing section
- Specify documents per hour for each time slot and document type
- Visual time slot selection with hour range display (e.g., “09:00 - 10:00”)
Maximum Latency Configuration:
- Enter maximum allowed latency in seconds (1-3600 seconds)
- Quick reference: 60s = 1 min | 120s = 2 min | 300s = 5 min | 600s = 10 min
- Used for SLA compliance checking and performance validation
- Displayed with automatic conversion to minutes for reference
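A small sketch of the range check and minutes conversion described above. The function name and the error text are hypothetical; only the 1-3600 second range and the seconds-to-minutes display come from this document.

```python
def describe_max_latency(seconds):
    """Validate the 1-3600 second range and format a minutes reference."""
    if not 1 <= seconds <= 3600:
        raise ValueError("Maximum latency must be between 1 and 3600 seconds")
    minutes = seconds / 60
    # The ":g" format drops a trailing ".0", so 5.0 is shown as 5.
    return f"{seconds}s = {minutes:g} min"


five_minutes = describe_max_latency(300)
ninety_seconds = describe_max_latency(90)
print(five_minutes, "|", ninety_seconds)
```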
4. Capacity Calculation and Results
Running Capacity Analysis:
- Click “Calculate Capacity Requirements” to perform analysis
- System validates configuration and processes requirements
- Real processing times are extracted from actual processed documents
- Real queue delays are calculated from QueuedTime/WorkflowStartTime timestamps
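Queue delay is the gap between the two timestamps named above. The sketch below assumes both fields are ISO 8601 strings with an explicit UTC offset, which the source does not confirm; the function name and the sample document are hypothetical.

```python
from datetime import datetime


def queue_delay_seconds(document):
    """Seconds between QueuedTime and WorkflowStartTime (never negative)."""
    queued = datetime.fromisoformat(document["QueuedTime"])
    started = datetime.fromisoformat(document["WorkflowStartTime"])
    return max(0.0, (started - queued).total_seconds())


# Hypothetical tracking record; the timestamp format is an assumption.
doc = {
    "QueuedTime": "2025-01-15T09:00:00+00:00",
    "WorkflowStartTime": "2025-01-15T09:00:12.500000+00:00",
}
delay = queue_delay_seconds(doc)
print(delay)
```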
Capacity Metrics Display:
- Total Docs: Aggregate documents per hour across all time slots
- Total Pages: Calculated from document volumes and average pages
- Total Tokens: Aggregated token usage across all processing steps (displayed in millions)
Latency Distribution Analysis:
- Processing time percentiles (P50, P75, P90, P95, P99) from real document data
- Base processing time: Median processing time from actual documents
- Queue delay: Actual queue delays from QueuedTime/WorkflowStartTime timestamps
- Quota Status: Shows “✅ Within Quota” or “⚠️ Quota Exceeded” based on all model quotas
- SLA Target: Configured maximum latency in seconds
Latency Bar Colors:
- Green/Blue: When all model quotas are within limits (regardless of SLA)
- Red: Only when any model quota shows “Increase Needed”
5. AWS Service Quota Management
Quota Requirements Analysis:
- Bedrock Models TPM: Tokens Per Minute requirements by model and processing step
- Bedrock Models RPM: Requests Per Minute requirements by model and processing step
- Current vs. required quota comparison with utilization percentages
- Status indicators: ✅ Sufficient, ⚠️ Increase Needed
RPM Calculation Method:
1. Sample up to 100 documents with metering data
2. Calculate average requests per document for each processing step
3. Multiply by scheduled documents per hour
4. Apply 10% safety buffer for burst traffic
5. Convert to per-minute rate
Formula: peak_rpm = (avg_requests_per_doc × scheduled_docs_per_hour / 60) × 1.1
Where:
- avg_requests_per_doc: Calculated from metering samples
- scheduled_docs_per_hour: Sum across all hourly time slots
- 1.1 = 10% safety buffer (not SLA factor)
Direct AWS Console Integration:
- “Request Increase” buttons that open AWS Service Quotas console
- Region-specific console URLs using configured AWS region
- Direct links to specific model quotas when VITE_BEDROCK_MODEL_QUOTA_CODES is configured
- Fallback to generic Bedrock quotas page when configuration is missing
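The link-building behavior above, with its fallback, can be sketched as follows. The URL pattern is an assumption about how the Service Quotas console is addressed and is not taken from the accelerator's code. The function name and the quota code L-EXAMPLE1 are placeholders.

```python
def quota_console_url(region, model_id, quota_codes):
    """Link to a specific quota when a code is configured, else the Bedrock quotas page."""
    # Assumed console URL pattern; verify against your AWS console before relying on it.
    base = f"https://{region}.console.aws.amazon.com/servicequotas/home/services/bedrock/quotas"
    code = quota_codes.get(model_id)
    return f"{base}/{code}" if code else base


codes = {"anthropic.claude-3-haiku": "L-EXAMPLE1"}  # placeholder, not a real quota code
specific = quota_console_url("us-east-1", "anthropic.claude-3-haiku", codes)
generic = quota_console_url("us-east-1", "us.amazon.nova-lite-v1:0", codes)
print(specific)
print(generic)
```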
OCR Quota Handling:
- OCR quota requirements are automatically skipped when OCR tokens are 0 (OCR not in use)
- This prevents errors when using Textract OCR instead of Bedrock OCR
6. Safety Buffer and Quota Calculation Details
10% Safety Buffer:
All TPM and RPM quota calculations include a 10% safety buffer (multiplier of 1.1) to ensure adequate capacity. This buffer accounts for:
- Burst traffic: Documents arriving faster than scheduled averages
- Token count variations: Actual token usage may vary per document
- Request count variations: Some documents may require more API requests
- System overhead: Retries, error handling, and processing variations
TPM (Tokens Per Minute) Calculation:
Formula: peak_tpm = max(tokens_per_hour_for_each_hour / 60) × 1.1
Where:
- tokens_per_hour = Σ(scheduled_docs × tokens_per_doc)
- max() = peak across all 24 hours (not average)
- 1.1 = 10% safety buffer
- Each processing step calculated separately
Example:
    Hour 9: 100 invoices × 5,000 extraction tokens = 500,000 tokens/hour
    500,000 / 60 = 8,333 tokens/minute
    8,333 × 1.1 = 9,166 TPM required for extraction
Why Peak Hour (not Average):
- AWS Bedrock quotas are enforced per minute
- System must handle peak demand to avoid throttling
- Average load calculations would underestimate requirements
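The peak-hour TPM formula can be sketched for a single processing step as shown below. The schedule tuple layout and the function name are hypothetical; the numbers reuse the invoice example from this section.

```python
def peak_tpm(schedule, tokens_per_doc, safety_buffer=1.1):
    """Peak tokens per minute for one processing step.

    schedule: list of (hour 0-23, doc_type, docs_per_hour)
    tokens_per_doc: {doc_type: tokens this step uses per document}
    """
    tokens_per_hour = [0] * 24
    for hour, doc_type, docs in schedule:
        tokens_per_hour[hour] += docs * tokens_per_doc[doc_type]
    # Use the busiest hour, not the average, then add the 10% buffer.
    return max(tokens_per_hour) / 60 * safety_buffer


schedule = [(9, "invoice", 100), (14, "invoice", 40)]
extraction_tpm = peak_tpm(schedule, {"invoice": 5000})
print(round(extraction_tpm, 1))
```

Without intermediate rounding the result is about 9,166.7; the worked example above reaches 9,166 because it truncates to 8,333 tokens per minute before applying the buffer.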
Understanding Overload States:
When scheduled demand exceeds quota capacity, the system reports:
- Historical queue delays: Actual delays from processed documents
- Overload warning: Indicates demand > capacity
- Action required: Increase quotas or reduce scheduled volume
Important: The system does NOT predict future queue delays during overload because they depend on:
- How long the overload persists
- Whether quota increases are approved
- Changes to the processing schedule
- System backlog at overload start
If overload persists, the queue grows without bound until quotas are increased or demand is reduced.
Advanced Features
1. Token Usage Analysis and Visualization
Hourly Token Distribution Chart:
- Interactive stacked bar chart showing token usage by hour
- Color-coded by processing step:
  - Purple: OCR (only when Bedrock OCR is configured)
  - Orange: Classification
  - Green: Extraction
  - Blue: Assessment
  - Red: Summarization
- Dynamic scaling based on peak token usage
- Hover tooltips with detailed token counts per step
Peak Hour Analysis:
- Automatic identification of peak processing hours
- Peak vs. average load comparison with percentage differences
- Peak inference type identification (which step uses most tokens)
- Load distribution insights across active processing hours
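A possible shape for the peak hour analysis is sketched below. It assumes the average is taken over active hours only (hours with any scheduled tokens) and that the peak step is the one with the most tokens in the peak hour. The function name, the input layout, and the sample numbers are hypothetical.

```python
def peak_hour_summary(tokens_by_hour):
    """tokens_by_hour: {hour: {step: tokens}}; returns peak vs. average figures."""
    totals = {
        hour: sum(steps.values())
        for hour, steps in tokens_by_hour.items()
        if sum(steps.values()) > 0  # keep active hours only
    }
    peak_hour = max(totals, key=totals.get)
    average = sum(totals.values()) / len(totals)
    peak_steps = tokens_by_hour[peak_hour]
    return {
        "peak_hour": peak_hour,
        "peak_tokens": totals[peak_hour],
        "average_tokens": average,
        "peak_vs_average_pct": (totals[peak_hour] - average) / average * 100,
        "peak_step": max(peak_steps, key=peak_steps.get),
    }


summary = peak_hour_summary({
    9: {"Classification": 100_000, "Extraction": 500_000},
    14: {"Classification": 40_000, "Extraction": 200_000},
})
print(summary)
```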
2. Real Data Requirements
No Estimation Mode: The capacity planning system requires real processed documents with metering data. It does not estimate or use default values for:
- Processing times (requires /lambda/duration gb_seconds or WorkflowStartTime/CompletionTime timestamps)
- Queue delays (requires QueuedTime/WorkflowStartTime timestamps)
- Request counts (requires metering data with requests field)
- Page counts (requires metering or document-level page data)
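Because Lambda duration is metered in GB-seconds (memory in GB multiplied by run time in seconds), the conversion that LAMBDA_MEMORY_GB supports is a single division. The sketch below uses a hypothetical function name and sample values.

```python
def processing_seconds(gb_seconds, lambda_memory_gb):
    """Convert metered GB-seconds into wall-clock seconds for a given memory size."""
    if lambda_memory_gb <= 0:
        raise ValueError("LAMBDA_MEMORY_GB must be positive")
    return gb_seconds / lambda_memory_gb


# Hypothetical: 45 GB-seconds recorded by functions configured with 1.5 GB of memory.
seconds = processing_seconds(45.0, 1.5)
print(seconds)
```

This treats every metered function as having the same memory size, which is what a single LAMBDA_MEMORY_GB value implies.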
Error Messages When Data is Missing:
- “No processed documents found with metering data”
- “No processing time data found in documents”
- “No request count data found for [step_name]”
3. Data Import/Export and Integration
CSV Import/Export Functionality:
- Document configuration CSV import with validation
- Processing schedule CSV import/export
- Capacity plan export with comprehensive metrics
- Quota requirements export for documentation and planning
Document Context Integration:
- Integration with Documents context for processed document access
- Document filtering: Only COMPLETED documents with Metering data
- Single-class validation: Multi-class documents are rejected
- Configuration validation: Document type must exist in View/Edit Configuration
Troubleshooting and Best Practices
Common Issues and Solutions
Configuration Not Loaded:
- Symptom: Warning message “Configuration not loaded”
- Solution: Visit “View/Edit Configuration” tab first to load pattern configuration
No Documents Available for Token Population:
- Symptom: Alert “No documents available. Please visit the Documents tab first”
- Solution: Visit Documents tab to load document data, then return to Capacity Planning
Document Type Not in Configuration:
- Symptom: “Document type ‘X’ is not defined in configuration”
- Solution: Add the document type to View/Edit Configuration, or use a document with a configured type
Multi-Class Document Rejected:
- Symptom: “Document has multiple classes. Capacity planning only supports single-class documents”
- Solution: Use single-class documents for capacity planning
No Request Count Data Found:
- Symptom: “No request count data found for [step_name]”
- Solution: Process documents through the full workflow to generate metering data with request counts
No Processing Time Data:
- Symptom: “No processing time data found in documents”
- Solution: Ensure documents have /lambda/duration gb_seconds or WorkflowStartTime/CompletionTime timestamps
OCR Quota Error When Not Using Bedrock OCR:
- Symptom: Error about missing OCR metering data
- Solution: This is fixed - OCR requirements are now skipped when OCR tokens are 0
Best Practices
Capacity Planning Workflow:
- Process Sample Documents: Run real documents through the full workflow first
- Load Configuration: Visit View/Edit Configuration to load pattern settings
- Populate from Documents: Use “Populate tokens from Documents” for accurate data
- Configure Schedule: Define realistic processing schedules
- Calculate: Run capacity calculations to identify quota requirements
- Request Quota Increases: Use direct AWS console links to request needed quotas
Token Usage Management:
- Always use actual processed documents for token population
- Validate that document types match View/Edit Configuration classes
- Use single-class documents for accurate per-type analysis
- Export configurations as CSV for backup
Quota Management:
- Request quota increases proactively based on capacity analysis
- Check both TPM and RPM quotas - both can be limiting factors
- Monitor utilization percentages to avoid service limits
- The Lambda requires BEDROCK_MODEL_QUOTA_CODES and BEDROCK_MODEL_RPM_QUOTA_CODES to be configured
Performance Optimization:
- Analyze peak hour token distribution to optimize schedules
- Monitor latency distribution percentiles against SLA requirements
- Use the “Quota Status” indicator to quickly assess if quotas are sufficient
- Green/blue bars indicate healthy quota status; red indicates quota issues
Integration with Other Features
Evaluation Framework Integration
The capacity planning system integrates with the evaluation framework to provide:
- Accuracy vs. Performance Trade-offs: Balance processing speed with extraction accuracy
- Baseline Performance Metrics: Use evaluation results to establish capacity baselines
- Quality-Adjusted Capacity Planning: Factor accuracy requirements into capacity calculations
Cost Calculator Integration
Capacity planning works with the cost calculator to provide:
- Volume-Based Cost Projections: Calculate costs based on planned processing volumes
- Optimization Cost Analysis: Assess cost impact of performance optimizations
- ROI Analysis: Evaluate return on investment for capacity increases
Monitoring System Integration
The capacity planning system uses monitoring capabilities for:
- Real-time Capacity Tracking: Monitor actual vs. planned capacity utilization
- Performance Trend Analysis: Use historical data for future capacity planning
- Automated Alerting: Trigger alerts when capacity thresholds are exceeded
Testing and Quality Assurance
The capacity planning feature includes unit tests to ensure reliability and correctness.
Running Tests
    # From project root
    make setup                    # Install dependencies
    make test                     # Run all tests (includes capacity planning)
    make test-capacity            # Run only capacity planning tests
    make test-capacity-coverage   # Run with coverage report
Test Coverage
- 80%+ code coverage on critical paths
- 45+ unit tests covering:
  - Environment variable validation
  - Input sanitization and validation
  - Quota calculation logic
  - Latency distribution calculations
  - Decimal conversion utilities
  - Recommendation generation
Test Documentation
For detailed testing information, see:
Version History
Latest Updates
Security & Quality Improvements (2026-02-18):
- ✅ Added comprehensive unit tests (80%+ coverage)
- ✅ Added environment variable validation with startup checks
- ✅ Added input sanitization with size limits (1MB)
- ✅ Added DynamoDB pagination for large datasets
- ✅ Replaced browser alerts with Cloudscape Flashbar notifications
- ✅ Integrated tests into main test suite (make test)
Feature Updates:
- Max Latency Unit Change: Changed from minutes to seconds for more precise SLA configuration
- Quota-Based Bar Colors: Latency bars now show green/blue when quota is sufficient, red only when quota exceeded
- Document Type Filtering: Dropdown now shows only classes from View/Edit Configuration
- RPM Calculation Fix: Fixed calculation to use average requests per document from metering samples
- OCR Quota Skip: Lambda now skips OCR quota requirements when OCR tokens are 0 (OCR not in use)
- avgPages Extraction: Improved extraction from multiple sources (metering, document fields, sections)
- Removed “Exceeds SLA” Badge: Replaced with quota-based color coding for clarity
This comprehensive capacity planning system ensures your GenAI IDP deployment can handle current and future document processing requirements while optimizing for performance, cost, and reliability.