Pattern 1: Bedrock Data Automation (BDA) Workflow
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0
Pattern 1: Bedrock Data Automation (BDA) Workflow
Section titled “Pattern 1: Bedrock Data Automation (BDA) Workflow”⚠️ DEPRECATED: Pattern 1 has been superseded by the Unified Pattern, which combines both BDA and pipeline processing modes into a single deployment. The
use_bdaconfiguration flag (set via the UI) controls whether documents are processed via BDA or the step-by-step pipeline. See architecture.md for details on the unified architecture.This document is retained as a reference for BDA-specific concepts and behavior that still apply when
use_bda: trueis set in the unified pattern.
This pattern implements an intelligent document processing workflow using Amazon Bedrock Data Automation (BDA) for orchestrating ML-powered document processing tasks. It leverages BDA’s ability to extract insights from documents using pre-configured templates and workflows.
Table of Contents
Section titled “Table of Contents”Architecture Overview
Section titled “Architecture Overview”Flow Overview
Section titled “Flow Overview”- Document events from S3 trigger workflow execution
- BDA Invoke Lambda starts BDA job asynchronously with a task token
- BDA Completion Lambda processes job completion events from EventBridge
- Completion Lambda sends task success/failure to Step Functions using the stored token
- Process Results Lambda copies output files to designated location
Components
Section titled “Components”- Main Functions:
- BDA Invoke Function (bda_invoke_function): Initiates BDA jobs and stores task tokens
- BDA Completion Function (bda_completion_function): Handles job completion events
- Process Results Function (processresults_function): Copies and organizes output files
- State Machine: Coordinates workflow execution using waitForTaskToken pattern
- EventBridge: Routes BDA job completion events to the Completion Function
- DynamoDB: Tracks task tokens for asynchronous callback
- S3 Buckets: Input, Working, and Output storage
State Machine Workflow
Section titled “State Machine Workflow”InvokeDataAutomation (with waitForTaskToken) | ├── Success -> ProcessResultsStep | └── Failure -> FailStateDeployment
Section titled “Deployment”Prerequisites
Section titled “Prerequisites”- Bedrock Data Automation project already set up and configured
- Required AWS permissions for Bedrock, Lambda, Step Functions, and S3
- S3 buckets created for input, working, and output storage
Configuration
Section titled “Configuration”Stack Deployment Parameters:
BDAProjectArn: ARN of your Bedrock Data Automation project- Summarization: Control summarization via configuration file
summarization.enabledproperty (replacesIsSummarizationEnabledparameter) ConfigurationDefaultS3Uri: Optional S3 URI to custom configuration (uses default configuration if not specified)InputBucket: S3 bucket for input documentsWorkingBucket: S3 bucket for temporary BDA job outputOutputBucket: S3 bucket for final processed resultsTrackingTable: DynamoDB table for task token trackingCustomerManagedEncryptionKeyArn: KMS key ARN for encryptionLogRetentionDays: CloudWatch log retention periodExecutionTimeThresholdMs: Latency threshold for alerts
Stack Outputs:
SageMakerA2IReviewPortalURL: URL for the SageMaker A2I human review portal (when HITL is enabled)
Configuration Management:
- Configuration now supports multiple presets per pattern (e.g., default, checkboxed_attributes_extraction, medical_records_summarization)
- Configuration can be updated through the Web UI without stack redeployment
- Summarization functionality is controlled through the configuration file
summarization.enabledproperty rather than CloudFormation parameters - BDA-specific configuration is handled within the Bedrock Data Automation project rather than the IDP stack configuration
Note on BDA Configuration: Unlike Patterns 2 and 3, Pattern 1 delegates most document processing configuration to the Bedrock Data Automation (BDA) project itself. Classification and extraction behaviors are configured within the BDA project using BDA Blueprints rather than through the IDP configuration system.
Monitoring and Metrics
Section titled “Monitoring and Metrics”CloudWatch Metrics
Section titled “CloudWatch Metrics”The pattern publishes detailed metrics to CloudWatch:
-
BDA API Metrics:
BDARequestsTotal: Total number of API requestsBDARequestsSucceeded: Successful API requestsBDARequestsFailed: Failed API requestsBDARequestsThrottles: API throttling eventsBDARequestsRetrySuccess: Successful retries after throttlingBDARequestsMaxRetriesExceeded: Cases where max retries were exhaustedBDARequestsLatency: API request duration in millisecondsBDARequestsTotalLatency: Total duration including retries
-
BDA Job Metrics:
BDAJobsTotal: Total number of BDA jobsBDAJobsSucceeded: Successfully completed jobsBDAJobsFailed: Failed job executions
Dashboard Components
Section titled “Dashboard Components”The included CloudWatch dashboard provides visibility into the workflow:
-
API Request Panels:
- API request success/failure rates per minute
- API throttling and retry metrics
- Job execution success/failure trends
-
Lambda Performance:
- Function duration for all Lambda functions
- Long-running invocation tracking
- Memory utilization metrics
-
Error Tracking:
- Log-based panels for API throttling events
- Job execution failures with detailed error messages
- Lambda function errors and timeouts
Concurrency and Throttling
Section titled “Concurrency and Throttling”BDA API Throttling
Section titled “BDA API Throttling”Implements exponential backoff with retry handling for transient errors:
MAX_RETRIES = 7INITIAL_BACKOFF = 2 # secondsMAX_BACKOFF = 300 # 5 minutes
# Retryable error codesretryable_errors = [ 'ThrottlingException', 'ServiceQuotaExceededException', 'RequestLimitExceeded', 'TooManyRequestsException', 'InternalServerException']Error Handling
Section titled “Error Handling”- Retries on transient failures with exponential backoff
- Clear distinction between retryable and non-retryable errors
- Detailed metrics for tracking throttling events and retries
- Dead Letter Queue for EventBridge target to capture unprocessed events
- Comprehensive error logging with cause and stack traces
Workflow Details
Section titled “Workflow Details”Invocation Step
Section titled “Invocation Step”# Example BDA invocation payloadpayload = { "inputConfiguration": { "s3Uri": input_s3_uri }, "outputConfiguration": { "s3Uri": output_s3_uri }, "dataAutomationConfiguration": { "dataAutomationProjectArn": data_project_arn, "stage": "LIVE" }, "dataAutomationProfileArn": f"arn:aws:bedrock:{region}:{account_id}:data-automation-profile/us.data-automation-v1", "notificationConfiguration": { "eventBridgeConfiguration": { "eventBridgeEnabled": True } }}Processing Step
Section titled “Processing Step”- Tracks execution task tokens in DynamoDB with expiration time
- Listens for EventBridge events from BDA job completion
- Retrieves task token from DynamoDB when job completes
- Sends success or failure to Step Functions workflow
- Publishes detailed metrics for monitoring
Results Processing
Section titled “Results Processing”- Copies BDA output files from working bucket to final output location
- Organizes results in the same directory structure as input
- Produces standardized output format for UI consumption
- Updates execution status with job result information
Human-in-the-Loop (HITL)
Section titled “Human-in-the-Loop (HITL)”Pattern-1 supports Human-in-the-Loop (HITL) review capabilities using Amazon SageMaker Augmented AI (A2I). This feature allows human reviewers to validate and correct extracted information when the system’s confidence falls below a specified threshold.
Pattern-1 Specific Configuration:
EnableHITL: Boolean parameter to enable/disable the HITL featurePattern1 - Existing Private Workforce ARN: Optional parameter to use existing private workforce
For comprehensive HITL documentation including workflow details, configuration steps, best practices, and troubleshooting, see the Human-in-the-Loop Review Guide.
Edit Mode (Data-Only)
Section titled “Edit Mode (Data-Only)”Pattern-1 supports a data-only Edit Mode through the Web UI, allowing users to edit extraction data (predictions and ground truth) without re-invoking Bedrock Data Automation.
Capabilities
Section titled “Capabilities”- Edit Extraction Data: Click “Edit Mode” then use “Edit Data” buttons on each section to open the Visual Editor
- Modify Predictions: Update predicted field values and review confidence scores
- Edit Ground Truth: Modify baseline/ground truth data for evaluation comparison
- Reprocess: “Save and Reprocess” triggers evaluation and summarization without BDA re-invocation
Limitations
Section titled “Limitations”Since Pattern-1 uses BDA for document splitting and classification:
- Section Structure: Read-only - cannot add, delete, or modify sections
- Page Assignments: Read-only - BDA controls which pages belong to which sections
- Classification: Read-only - document classes are determined by BDA blueprints
How It Works
Section titled “How It Works”When you click “Save and Reprocess” with existing pages and sections data:
- The workflow detects existing document data (pages > 0 and sections present)
- BDA invocation step is automatically skipped
- Process proceeds directly to evaluation and summarization
- Document status updates to COMPLETED when finished
This is useful for:
- Correcting extraction errors in the Visual Editor
- Adding baseline data for evaluation comparison
- Re-running evaluation after data corrections
- Updating document summaries after data modifications
Best Practices
Section titled “Best Practices”-
BDA Project Configuration:
- Configure classification and extraction within the BDA project using BDA Blueprints
- Use BDA’s built-in capabilities for document type detection and field extraction
- Test BDA configuration thoroughly before integrating with IDP stack
-
Configuration Management:
- Use the configuration library for IDP-specific settings (summarization, evaluation, etc.)
- BDA-specific configuration should be managed within the BDA project
- Leverage the Web UI for IDP configuration updates without redeployment
-
Monitoring and Scaling:
- Monitor BDA service quotas and adjust concurrency as needed
- Implement exponential backoff with jitter for API throttling
- Set up EventBridge rules to capture all job status events
- Include DLQ for EventBridge targets to capture unprocessed events
-
Error Handling:
- Ensure token storage has appropriate TTL to avoid stale tokens
- Handle partial successes appropriately in the results processor
- Maintain comprehensive logging for troubleshooting
- Use CloudWatch dashboards to monitor performance metrics
-
Security and Reliability:
- Enable detailed CloudWatch metrics for API requests and job executions
- Configure alerts for unusual throttling or error patterns
- Use appropriate IAM roles with least privilege principles
- Implement proper error handling for BDA job failures
-
HITL Management:
- Configure confidence thresholds through the Web UI Portal Configuration tab based on business requirements
- Regularly check the Review Portal for pending tasks to avoid processing delays
- Establish consistent correction guidelines if multiple reviewers are involved