AWS Services and IAM Role Requirements for GenAI IDP Accelerator

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0

This document outlines the AWS services used by the GenAI Intelligent Document Processing (IDP) Accelerator solution, along with the IAM role scopes needed for deployment and operation.

| Service | Usage |
|---|---|
| Amazon S3 | Stores input documents, processed outputs, and web UI assets |
| Amazon DynamoDB | Tracks document processing, manages configurations and concurrency |
| AWS Lambda | Executes document processing functions and business logic |
| AWS Step Functions | Orchestrates document processing workflows |
| Amazon SQS | Queues documents for processing and handles throttling |
| Amazon EventBridge | Triggers document processing workflows when files are uploaded |
| Amazon CloudFront | Delivers the web UI with global distribution (default hosting mode) |
| Elastic Load Balancing (ALB) | Alternative web UI hosting via Application Load Balancer for VPC-based deployments (see ALB Hosting) |
| AWS CloudFormation | Deploys and manages the solution infrastructure |
| AWS SAM | Simplifies serverless application deployment |
| AWS CodeBuild | Builds and packages the web UI assets |

| Service | Usage |
|---|---|
| Amazon Bedrock | Provides foundation models for document understanding |
| Amazon Bedrock Guardrails | Enforces content safety, information security, and model usage policies |
| Amazon Textract | Extracts text and data from documents (OCR) |
| Amazon SageMaker | Hosts custom ML models for document classification (UDOP) |
| Amazon Bedrock Knowledge Base | Enables semantic document querying (optional) |
| Bedrock Data Automation (BDA) | Automates document processing workflows (Pattern 1) |

| Service | Usage |
|---|---|
| Amazon Cognito | Manages user authentication and authorization |
| AWS AppSync | Provides GraphQL API for the web UI |
| AWS WAF | Protects web applications from web exploits (optional) |

| Service | Usage |
|---|---|
| Amazon CloudWatch | Provides monitoring, logging, and alerting |
| Amazon SNS | Delivers operational alerts and notifications |
| AWS KMS | Manages encryption keys for secure data storage |

For organizations whose Service Control Policies (SCPs) mandate a permissions boundary on every IAM role, the solution provides the optional PermissionsBoundaryArn parameter. When specified during deployment, it attaches the given permissions boundary to all IAM roles the solution creates: both the roles defined explicitly in the template and the implicit roles that AWS SAM generates for functions.

Usage:

aws cloudformation deploy \
  --stack-name <your-stack-name> \
  --template-file template.yaml \
  --parameter-overrides PermissionsBoundaryArn=arn:aws:iam::123456789012:policy/MyPermissionsBoundary \
  --capabilities CAPABILITY_IAM

When the parameter is omitted, roles are created without a permissions boundary, so existing deployments keep working unchanged.
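
After deploying with a boundary, it can be useful to confirm that every role in the stack actually carries it. The following is a minimal sketch, not part of the solution: the function name `roles_missing_boundary` is hypothetical, the two arguments `cfn` and `iam` are assumed to be boto3 CloudFormation and IAM clients, and only the named stack is inspected (roles created in nested stacks would need the same check per nested stack).

```python
def roles_missing_boundary(cfn, iam, stack_name, boundary_arn):
    """Return names of IAM roles in the stack that lack the expected boundary.

    cfn and iam are assumed to be boto3 clients, e.g.
    boto3.client("cloudformation") and boto3.client("iam").
    """
    missing = []
    paginator = cfn.get_paginator("list_stack_resources")
    for page in paginator.paginate(StackName=stack_name):
        for resource in page["StackResourceSummaries"]:
            if resource["ResourceType"] != "AWS::IAM::Role":
                continue
            # For AWS::IAM::Role the physical resource ID is the role name.
            role_name = resource.get("PhysicalResourceId")
            if not role_name:
                continue
            role = iam.get_role(RoleName=role_name)["Role"]
            attached = role.get("PermissionsBoundary", {}).get("PermissionsBoundaryArn")
            if attached != boundary_arn:
                missing.append(role_name)
    return missing


# Example call (requires AWS credentials; stack name is a placeholder):
#   import boto3
#   roles_missing_boundary(
#       boto3.client("cloudformation"), boto3.client("iam"),
#       "<your-stack-name>",
#       "arn:aws:iam::123456789012:policy/MyPermissionsBoundary",
#   )
```

An empty list means every role found in the stack has the expected boundary attached.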

Deploying this solution requires an IAM role or user with the following permissions:

  • cloudformation:* - Create and manage CloudFormation stacks
  • iam:* - Create and manage IAM roles and policies
  • lambda:* - Create and configure Lambda functions
  • states:* - Create and manage Step Functions state machines
  • s3:* - Create buckets and manage S3 resources
  • dynamodb:* - Create and configure DynamoDB tables
  • sqs:* - Create and configure SQS queues
  • events:* - Create and configure EventBridge rules
  • cloudfront:* - Create and configure CloudFront distributions
  • cognito-idp:* - Create and configure Cognito user pools
  • cognito-identity:* - Create and configure Cognito identity pools for AWS service access
  • appsync:* - Create and configure AppSync APIs
  • logs:* - Create and configure CloudWatch log groups
  • cloudwatch:* - Create and configure CloudWatch dashboards and alarms
  • sns:* - Create and configure SNS topics
  • bedrock:* - Create Bedrock resources (all patterns)
  • sagemaker:* - Create SageMaker endpoints (Pattern 3)
  • opensearch:* - Create OpenSearch domains (Knowledge Base feature)
  • kms:* - Create KMS keys for encryption
  • wafv2:* - Configure WAF rules (optional)
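
The list above can be turned into an IAM policy document for a dedicated deployment role. This is a sketch under stated assumptions: the `Sid` value and the function name are hypothetical, and `"Resource": "*"` is an assumption, since the list does not say how resources should be scoped.

```python
import json

# Service prefixes copied from the deployment permission list above.
DEPLOYMENT_SERVICE_PREFIXES = [
    "cloudformation", "iam", "lambda", "states", "s3", "dynamodb", "sqs",
    "events", "cloudfront", "cognito-idp", "cognito-identity", "appsync",
    "logs", "cloudwatch", "sns", "bedrock", "sagemaker", "opensearch",
    "kms", "wafv2",
]


def build_deployment_policy(prefixes=DEPLOYMENT_SERVICE_PREFIXES):
    """Return an IAM policy document granting <prefix>:* for each service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "IdpDeployment",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [f"{prefix}:*" for prefix in prefixes],
                "Resource": "*",  # assumption: no resource scoping is given
            }
        ],
    }


# Serialize for use with the IAM console, CLI, or an infrastructure template.
policy_json = json.dumps(build_deployment_policy(), indent=2)
```

Prefixes tied to optional features (for example `wafv2`, or `sagemaker` when Pattern 3 is not deployed) can likely be left out by passing a shorter list, but verify that against the template you deploy.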

The solution creates various IAM roles to run different components of the system. Key role scopes include:

  • Queue Processing Role:

    • sqs:ReceiveMessage, sqs:DeleteMessage, sqs:GetQueueAttributes
    • dynamodb:GetItem, dynamodb:PutItem, dynamodb:UpdateItem
    • states:StartExecution
    • logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents
  • Step Functions Execution Role:

    • lambda:InvokeFunction
    • states:*
    • events:PutEvents
  • OCR Processing Role:

    • textract:AnalyzeDocument, textract:DetectDocumentText
    • s3:GetObject, s3:PutObject
    • logs:*
  • Classification Role:

    • sagemaker:InvokeEndpoint (Pattern 3)
    • bedrock:InvokeModel (Patterns 2 & 3)
    • bedrock:ApplyGuardrail (when Guardrails configured)
    • s3:GetObject, s3:PutObject
    • logs:*
  • Extraction Role:

    • bedrock:InvokeModel
    • bedrock:ApplyGuardrail (when Guardrails configured)
    • s3:GetObject, s3:PutObject
    • logs:*
  • BDA Integration Role (Pattern 1):

    • bedrock:InvokeDataAutomationAsync
    • s3:GetObject, s3:PutObject
    • dynamodb:GetItem, dynamodb:PutItem, dynamodb:UpdateItem
    • cloudwatch:PutMetricData
    • logs:*
  • AppSync Service Role:

    • dynamodb:GetItem, dynamodb:Query, dynamodb:Scan
    • s3:GetObject, s3:PutObject, s3:ListBucket
    • lambda:InvokeFunction
  • Cognito Authentication Role:

    • appsync:GraphQL
    • s3:GetObject (for UI assets and buckets)
    • ssm:GetParameter (for settings)
  • Knowledge Base Query Role:

    • bedrock:InvokeModel
    • bedrock:Retrieve
    • bedrock:RetrieveAndGenerate
    • bedrock:ApplyGuardrail (when Guardrails configured)
    • aoss:APIAccessAll (for OpenSearch Serverless access)
    • logs:*
  • Knowledge Base Service Role:

    • bedrock:InvokeModel
    • aoss:APIAccessAll
    • s3:ListBucket, s3:GetObject (when using S3 data source)
  • CloudWatch Dashboard Role:

    • cloudwatch:GetDashboard, cloudwatch:PutDashboard
    • logs:DescribeLogGroups
  • Workflow Tracking Role:

    • dynamodb:GetItem, dynamodb:PutItem, dynamodb:UpdateItem
    • cloudwatch:PutMetricData
    • logs:*
  • Evaluation Function Role:

    • s3:GetObject (from baseline bucket)
    • s3:PutObject, s3:GetObject (for output bucket)
    • dynamodb:GetItem, dynamodb:PutItem, dynamodb:UpdateItem
    • bedrock:InvokeModel (for LLM-based evaluations)
    • appsync:GraphQL (for updating evaluation results)
    • cloudwatch:PutMetricData
    • logs:*
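
To illustrate how one of these scopes maps to a policy document, here is a sketch for the Queue Processing Role. The action names come from the list above; the function name, the resource ARNs passed in, and the default log resource of `"*"` are assumptions for illustration, not the policy the solution actually generates.

```python
def queue_processing_policy(queue_arn, table_arn, state_machine_arn, log_resource="*"):
    """Build a policy document with the Queue Processing Role actions,
    each statement scoped to the resource it operates on."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem"],
                "Resource": table_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["states:StartExecution"],
                "Resource": state_machine_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": log_resource,  # assumption: narrow to a log group ARN if known
            },
        ],
    }


# Placeholder ARNs for illustration only.
example = queue_processing_policy(
    "arn:aws:sqs:us-east-1:123456789012:example-queue",
    "arn:aws:dynamodb:us-east-1:123456789012:table/example-table",
    "arn:aws:states:us-east-1:123456789012:stateMachine:example-workflow",
)
```

Scoping each statement to a single resource is one way to apply least privilege; the same pattern extends to the other roles listed above.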

For high-volume document processing, consider requesting quota increases for:

| Service | Quota to Increase | Typical Default |
|---|---|---|
| Amazon Bedrock | On-demand InvokeModel tokens per minute | Varies by model |
| Amazon Bedrock | On-demand InvokeModel requests per minute | Varies by model |
| Amazon Bedrock | ApplyGuardrail requests per minute | Varies by region |
| Amazon Textract | DetectDocumentText / AnalyzeDocument transactions per second | 10-25 TPS |
| Amazon SageMaker | Number of endpoints per region | 2-10 endpoints |
| AWS Lambda | Concurrent executions | 1,000 executions |
| AWS Step Functions | State transitions per second | 2,000 transitions |
| Amazon SQS | API requests per queue | Very high by default |
| Amazon CloudWatch | PutMetricData API requests per second | 150 requests/second |
| Bedrock Data Automation | Concurrent jobs (Pattern 1) | Varies by region |
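
Because defaults vary by account, model, and region, it helps to look up the values currently applied before requesting an increase. The sketch below assumes a boto3 Service Quotas client is passed in; the function name `find_quotas` is hypothetical, and matching on a substring of the quota name is a convenience chosen here, not an API feature.

```python
def find_quotas(sq_client, service_code, name_contains):
    """List (name, value, adjustable) for quotas whose name contains the text.

    sq_client is assumed to be boto3.client("service-quotas").
    """
    needle = name_contains.lower()
    matches = []
    paginator = sq_client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode=service_code):
        for quota in page["Quotas"]:
            if needle in quota["QuotaName"].lower():
                matches.append((quota["QuotaName"], quota["Value"], quota["Adjustable"]))
    return matches


# Example calls (require AWS credentials):
#   import boto3
#   sq = boto3.client("service-quotas")
#   find_quotas(sq, "lambda", "concurrent executions")
#   find_quotas(sq, "textract", "transactions per second")
```

Quotas reported as adjustable can then be raised through the Service Quotas console or API.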

When deploying this solution, consider the following security best practices:

  1. Encryption:

    • Enable SSE-KMS encryption for all S3 buckets
    • Use customer managed KMS keys for sensitive data
    • Enable encryption for DynamoDB tables
  2. Network Security:

    • Use CloudFront security features (geo-restrictions, HTTPS, etc.) or ALB security groups for VPC-based hosting
    • Configure AWS WAF to protect web interfaces
  3. Authentication:

    • Enforce MFA for admin users in Cognito
    • Set strong password policies
    • Limit admin access to necessary personnel
  4. IAM Best Practices:

    • Use least privilege principles for all roles
    • Regularly audit and rotate credentials
    • Enable CloudTrail logging for all API actions
  5. Content Safety & Control:

    • Configure Bedrock Guardrails with appropriate topic filters
    • Set up content blocking for sensitive information
    • Implement trace logging for guardrail activations
    • Use different guardrail configurations for different environments (dev/test/prod)
  6. Data Protection:

    • Implement lifecycle policies for S3 objects
    • Configure appropriate retention policies for logs and data
    • Consider data residency requirements when selecting regions
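
The encryption and data protection items above could be applied to a bucket along these lines. This is a sketch, assuming a boto3 S3 client is passed in; the function names, the rule ID format, and the choice of a single expiration rule are illustrative assumptions, and the solution's own templates may already configure these settings.

```python
def sse_kms_encryption_config(kms_key_arn):
    """Default-encryption configuration that uses SSE-KMS with the given key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                "BucketKeyEnabled": True,  # reduces the number of KMS requests
            }
        ]
    }


def expiration_lifecycle_config(days, prefix=""):
    """Lifecycle configuration that expires objects under prefix after N days."""
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",  # hypothetical rule ID
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_to_bucket(s3_client, bucket, kms_key_arn, days):
    """Apply both settings; s3_client is assumed to be boto3.client("s3").

    Note: put_bucket_lifecycle_configuration replaces any existing
    lifecycle rules on the bucket.
    """
    s3_client.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=sse_kms_encryption_config(kms_key_arn),
    )
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=expiration_lifecycle_config(days),
    )
```

Choose the retention period from your own compliance requirements; the code does not assume any particular value.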