GovCloud Operations
Monitoring, troubleshooting, and operational best practices for the GenAI IDP solution in GovCloud.
Monitoring
CloudWatch Dashboards
The solution deploys CloudWatch dashboards automatically. Access them via the CloudWatch console in your GovCloud region.
Key metrics to monitor:
- Step Functions: Execution success/failure rates, duration
- Lambda Functions: Invocation count, error rate, duration, throttles
- SQS Queues: Queue depth, age of oldest message
- DynamoDB: Read/write capacity, throttled requests
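The metrics listed above can also be pulled programmatically. The following is a minimal sketch, not part of the solution itself: it builds the `MetricDataQueries` payload that the CloudWatch `get_metric_data` API accepts. The helper name `build_metric_queries` and the resource names in the usage comment are hypothetical; the namespaces, metric names, and dimension names are the standard AWS ones.

```python
def build_metric_queries(metrics, period_seconds=300):
    """Build a MetricDataQueries list for CloudWatch get_metric_data.

    `metrics` is a list of (namespace, metric_name, statistic, dimensions)
    tuples, where `dimensions` is a dict such as {"FunctionName": "my-fn"}.
    """
    queries = []
    for index, (namespace, metric_name, stat, dimensions) in enumerate(metrics):
        queries.append({
            # Query ids must start with a lowercase letter.
            "Id": f"m{index}",
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": name, "Value": value}
                        for name, value in dimensions.items()
                    ],
                },
                "Period": period_seconds,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


# Usage (assumes boto3 is installed and credentials target a GovCloud region;
# the function, queue, and table names below are placeholders):
#
#   import boto3
#   from datetime import datetime, timedelta, timezone
#   cloudwatch = boto3.client("cloudwatch", region_name="us-gov-west-1")
#   end = datetime.now(timezone.utc)
#   response = cloudwatch.get_metric_data(
#       MetricDataQueries=build_metric_queries([
#           ("AWS/Lambda", "Errors", "Sum", {"FunctionName": "my-ocr-function"}),
#           ("AWS/Lambda", "Throttles", "Sum", {"FunctionName": "my-ocr-function"}),
#           ("AWS/SQS", "ApproximateAgeOfOldestMessage", "Maximum",
#            {"QueueName": "my-document-queue"}),
#           ("AWS/DynamoDB", "ThrottledRequests", "Sum", {"TableName": "my-table"}),
#       ]),
#       StartTime=end - timedelta(hours=1),
#       EndTime=end,
#   )
```

Keeping the payload construction separate from the API call makes it easy to check the query shape without AWS access.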
CloudWatch Alarms
Out of the box, the stack creates alarms for two Step Functions conditions:
- Step Functions execution failures
- Step Functions slow executions
Alarms publish to an SNS topic. Subscribe your team’s email or pager to receive notifications.
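Subscribing an address to the alarm topic can be scripted. This is a minimal sketch under stated assumptions: the helper name `subscribe_email` is hypothetical, the topic ARN must be looked up from your own stack, and the SNS client is created by the caller. The check on the `aws-us-gov` partition is only a guard against pointing the script at a commercial-region topic by mistake.

```python
def subscribe_email(sns_client, topic_arn, email):
    """Subscribe an email address to an SNS topic and return the subscription ARN.

    `sns_client` is expected to be a boto3 SNS client created by the caller.
    """
    parts = topic_arn.split(":")
    # GovCloud ARNs look like arn:aws-us-gov:sns:<region>:<account>:<topic>.
    if len(parts) != 6 or parts[1] != "aws-us-gov" or parts[2] != "sns":
        raise ValueError(f"Not a GovCloud SNS topic ARN: {topic_arn}")
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="email",
        Endpoint=email,
        # Return the ARN even while the subscription awaits confirmation.
        ReturnSubscriptionArn=True,
    )
    return response["SubscriptionArn"]


# Usage (placeholder ARN and address):
#
#   import boto3
#   sns = boto3.client("sns", region_name="us-gov-west-1")
#   subscribe_email(sns, "arn:aws-us-gov:sns:us-gov-west-1:111122223333:my-alarm-topic",
#                   "oncall@example.com")
```

Email subscriptions stay pending until the recipient confirms the message SNS sends, so notifications will not arrive before that step.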
Log Groups
All workflow Lambda functions write to dedicated CloudWatch log groups with the naming convention /{stack-name}-stack-PATTERN2STACK-{cfn-nested-stack-id}/lambda/{function}. The functions for Pattern 2 are:
OCRFunction, ClassificationFunction, ExtractionFunction, AssessmentFunction, ProcessResultsFunction, SummarizationFunction, EvaluationFunction, RuleValidationFunction, RuleValidationOrchestrationFunction.
Use CloudWatch Logs Insights to query across functions:
    fields @timestamp, @message
    | filter @message like /ERROR/
    | sort @timestamp desc
    | limit 50
Headless Lambda Log Groups
The headless deployment of the IDP solution provisions additional Lambda functions to support the headless workflow:
- API Lambda Handler
  - Log Group: {stack-name}-ApiHandlerLogGroup-{cfn-id}
- Batch Pre-Processor
  - Log Group: /aws/lambda/{stack-name}-BatchPreProcessorFunction-{cfn-id}
- Job Tracker
  - Log Group: {stack-name}-JobTrackerLogGroup-{cfn-id}
Note: {cfn-id} is a unique alphanumeric string generated by CloudFormation at stack creation time. It is unique for each resource and is stable across stack updates; however, it changes if the stack is deleted and recreated.
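The Logs Insights error query from the Log Groups section can also be run from a script across several of the log groups described above. The sketch below is illustrative only: `run_error_query` is a hypothetical helper, the caller supplies a boto3 CloudWatch Logs client and the resolved log group names (including the `{cfn-id}` suffixes from your own stack), and the polling limits are arbitrary defaults.

```python
import time

# Same query as in the Log Groups section, written on one line.
ERROR_QUERY = (
    "fields @timestamp, @message"
    " | filter @message like /ERROR/"
    " | sort @timestamp desc"
    " | limit 50"
)


def run_error_query(logs_client, log_group_names, start_epoch, end_epoch,
                    poll_seconds=1.0, max_polls=60):
    """Run ERROR_QUERY over the given log groups and return rows as dicts."""
    query_id = logs_client.start_query(
        logGroupNames=list(log_group_names),
        startTime=int(start_epoch),  # seconds since the Unix epoch
        endTime=int(end_epoch),
        queryString=ERROR_QUERY,
    )["queryId"]

    terminal_states = ("Complete", "Failed", "Cancelled", "Timeout")
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] in terminal_states:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Query {query_id} did not finish in time")

    if response["status"] != "Complete":
        raise RuntimeError(f"Query {query_id} ended as {response['status']}")

    # Each result row is a list of {"field": ..., "value": ...} entries.
    return [
        {cell["field"]: cell["value"] for cell in row}
        for row in response["results"]
    ]


# Usage (placeholder log group name):
#
#   import boto3
#   logs = boto3.client("logs", region_name="us-gov-west-1")
#   now = int(time.time())
#   rows = run_error_query(
#       logs, ["/aws/lambda/my-stack-BatchPreProcessorFunction-ABC123"],
#       now - 3600, now)
```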
Troubleshooting
Document Processing Failures
- Check Step Functions execution history: Open the Step Functions console, find the failed execution, and inspect the failed state’s input/output
- Check Lambda logs: The failed state maps to a specific Lambda function. Check its CloudWatch log group for the error
- Common causes:
  - Textract unable to process document (unsupported format, corrupt file)
  - Bedrock model throttling (check for ThrottlingException)
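The first steps above can be approximated in a script when the console is inconvenient. This is a rough sketch, assuming a boto3 Step Functions client and a state machine ARN taken from your own stack; `summarize_failure` and `latest_failures` are hypothetical helper names. It reads the execution history newest-first and reports the first failure event it finds, flagging whether the error text mentions `ThrottlingException`.

```python
# Failure event types and the history field that carries their details.
FAILURE_DETAIL_KEYS = {
    "ExecutionFailed": "executionFailedEventDetails",
    "TaskFailed": "taskFailedEventDetails",
    "LambdaFunctionFailed": "lambdaFunctionFailedEventDetails",
}


def summarize_failure(events):
    """Return details of the first failure event in `events`, or None."""
    for event in events:
        detail_key = FAILURE_DETAIL_KEYS.get(event.get("type"))
        if detail_key is None:
            continue
        details = event.get(detail_key, {})
        error = details.get("error", "")
        cause = details.get("cause", "")
        return {
            "event_type": event["type"],
            "error": error,
            "cause": cause,
            "throttled": "ThrottlingException" in error
                         or "ThrottlingException" in cause,
        }
    return None


def latest_failures(sfn_client, state_machine_arn, limit=5):
    """Summarize up to `limit` failed executions of a state machine."""
    executions = sfn_client.list_executions(
        stateMachineArn=state_machine_arn,
        statusFilter="FAILED",
        maxResults=limit,
    )["executions"]
    summaries = []
    for execution in executions:
        history = sfn_client.get_execution_history(
            executionArn=execution["executionArn"],
            reverseOrder=True,  # newest events first
            maxResults=100,
        )
        summaries.append(
            (execution["executionArn"], summarize_failure(history["events"]))
        )
    return summaries
```

The summary only narrows down where to look; the Lambda log group for the failed state still holds the full stack trace.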
VPC Connectivity Issues
If Lambda functions time out after being deployed in a VPC:
- Verify all required VPC endpoints exist (see VPC Deployment Guide)
- Check that the VPC interface endpoint security group allows inbound HTTPS traffic (port 443) from the CIDR range or security groups of the Lambda functions
- Check the private subnets that the Lambda functions are deployed into and confirm that VPC gateway endpoint routes exist for S3 and DynamoDB
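The endpoint check in the first step can be sketched as a comparison between the endpoints that exist in the VPC and the services you expect. The helper `missing_endpoints` and the service list in the usage comment are assumptions for illustration; the authoritative list of required endpoints is in the VPC Deployment Guide, not here.

```python
def missing_endpoints(vpc_endpoints, required_services):
    """Return the required service short names with no available endpoint.

    `vpc_endpoints` is the "VpcEndpoints" list returned by the EC2
    describe_vpc_endpoints API. `required_services` holds short names such
    as "s3" or "dynamodb", matched against the end of each ServiceName
    (for example "com.amazonaws.us-gov-west-1.s3").
    """
    available_names = [
        endpoint["ServiceName"]
        for endpoint in vpc_endpoints
        if endpoint.get("State", "").lower() == "available"
    ]
    missing = []
    for service in required_services:
        suffix = "." + service
        if not any(name.endswith(suffix) for name in available_names):
            missing.append(service)
    return sorted(missing)


# Usage (placeholder VPC id and an illustrative, incomplete service list;
# results beyond the first page are not handled in this sketch):
#
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-gov-west-1")
#   endpoints = ec2.describe_vpc_endpoints(
#       Filters=[{"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]}]
#   )["VpcEndpoints"]
#   print(missing_endpoints(endpoints, ["s3", "dynamodb", "sqs", "states", "logs"]))
```

An empty result only shows that the endpoints exist; security group rules and route tables still need the manual checks listed above.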
Queue Backlog
If documents are queuing up and not processing:
- Check SQS queue depth in CloudWatch
- Verify Lambda concurrency limits aren’t being hit
- Check the DynamoDB concurrency table for stuck entries
- Look for throttling errors in the QueueProcessor Lambda logs
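The queue depth and Lambda concurrency checks above can be read directly from the service APIs. The following is a minimal sketch with hypothetical helper names, assuming boto3 SQS and Lambda clients created by the caller and a queue URL taken from your own stack.

```python
def queue_backlog(sqs_client, queue_url):
    """Return approximate counts of waiting and in-flight messages."""
    attributes = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )["Attributes"]
    # SQS returns attribute values as strings.
    return {
        "waiting": int(attributes.get("ApproximateNumberOfMessages", 0)),
        "in_flight": int(attributes.get("ApproximateNumberOfMessagesNotVisible", 0)),
    }


def lambda_concurrency_limits(lambda_client):
    """Return the account-level Lambda concurrency limits for the region."""
    limits = lambda_client.get_account_settings()["AccountLimit"]
    return {
        "account_limit": limits["ConcurrentExecutions"],
        "unreserved": limits["UnreservedConcurrentExecutions"],
    }


# Usage (placeholder queue URL):
#
#   import boto3
#   sqs = boto3.client("sqs", region_name="us-gov-west-1")
#   lam = boto3.client("lambda", region_name="us-gov-west-1")
#   print(queue_backlog(sqs, "https://sqs.us-gov-west-1.amazonaws.com/111122223333/my-queue"))
#   print(lambda_concurrency_limits(lam))
```

A growing `waiting` count with little or nothing `in_flight` suggests the consumer is not picking messages up, which points back to the concurrency and throttling checks in the list.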
Operational Best Practices
- Set up SNS subscriptions for the alarm topic before processing production workloads
- Enable S3 access logging on the input and output buckets for audit trails
- Review CloudWatch dashboards on a regular cadence to catch trends before they become incidents
- Test failover by processing sample documents after any infrastructure changes
- Monitor costs through AWS Cost and Usage Reports. Note: billing data is not available from within a GovCloud account; sign in to the associated standard (commercial) AWS account to view it. See AWS GovCloud (US) Billing and Payment for more information.
Related Documentation
- GovCloud Deployment Guide: prerequisites, deployment packages, and deploy commands
- GovCloud Architecture: services removed vs. retained, limitations, and workarounds
- Batch Jobs REST API: API reference, authentication, and bastion tunnel setup
- VPC Deployment Guide: VPC endpoints, security groups, and network configuration