Skip to content

Amazon Bedrock Service Tiers

The GenAI IDP solution supports Amazon Bedrock service tiers through model ID suffixes, allowing you to optimize for performance and cost by selecting different service tiers for model inference operations.

Amazon Bedrock offers three service tiers for on-demand inference:

TierPerformanceCostBest For
PriorityFastest response timesPremium pricing (~25% more)Customer-facing workflows, real-time interactions
StandardConsistent performanceRegular pricingEveryday AI tasks, content generation
FlexVariable latencyDiscounted pricingBatch processing, evaluations, non-urgent workloads

Service tiers are specified using model ID suffixes:

Format: <base-model-id>[:<service-tier>]

Examples:

  • us.amazon.nova-2-lite-v1:0 → Standard tier (default, no suffix)
  • us.amazon.nova-2-lite-v1:0:flex → Flex tier
  • us.amazon.nova-2-lite-v1:0:priority → Priority tier
  • global.amazon.nova-2-lite-v1:0:flex → Global model with Flex tier

The following Nova 2 Lite models are available with service tier suffixes:

US Region Models:

  • us.amazon.nova-2-lite-v1:0 (Standard - default)
  • us.amazon.nova-2-lite-v1:0:flex (Flex tier)
  • us.amazon.nova-2-lite-v1:0:priority (Priority tier)

Global Models:

  • global.amazon.nova-2-lite-v1:0 (Standard - default)
  • global.amazon.nova-2-lite-v1:0:flex (Flex tier)
  • global.amazon.nova-2-lite-v1:0:priority (Priority tier)

Using Model IDs with Service Tier Suffixes

Section titled “Using Model IDs with Service Tier Suffixes”

Simply specify the model ID with the desired tier suffix in your configuration:

classification:
model: "us.amazon.nova-2-lite-v1:0:priority" # Fast classification
# ... other settings
extraction:
model: "us.amazon.nova-2-lite-v1:0:flex" # Cost-effective extraction
# ... other settings
assessment:
model: "us.amazon.nova-2-lite-v1:0" # Standard tier (no suffix)
# ... other settings

When you specify a model ID with a service tier suffix:

  1. The system parses the model ID to extract the base model and tier
  2. The base model ID (without suffix) is passed to the Bedrock API
  3. The extracted tier is passed as the serviceTier parameter
  4. Example: us.amazon.nova-2-lite-v1:0:flex becomes:
    • Model ID: us.amazon.nova-2-lite-v1:0
    • Service Tier: flex
  1. Navigate to the Configuration page
  2. In each operation section (Classification, Extraction, Assessment, Summarization):
    • Find the “Model” dropdown
    • Select a model with the desired tier suffix:
      • Models ending in :flex use Flex tier
      • Models ending in :priority use Priority tier
      • Models without suffix use Standard tier
  3. Changes save automatically

The model dropdown will show all available models including those with tier suffixes.

When to use:

  • Customer-facing document processing
  • Real-time classification needs
  • Interactive applications
  • Time-sensitive workflows

Example configuration:

classification:
model: "us.amazon.nova-2-lite-v1:0:priority"
# Fast classification for real-time user uploads

When to use:

  • General document processing
  • Balanced performance and cost
  • Most production workloads
  • Default choice when unsure

Example configuration:

extraction:
model: "us.amazon.nova-2-lite-v1:0"
# No suffix = standard tier

When to use:

  • Batch document processing
  • Evaluation and testing workflows
  • Non-urgent background jobs
  • Cost optimization scenarios

Example configuration:

extraction:
model: "us.amazon.nova-2-lite-v1:0:flex"
# Cost-effective for batch processing

You can use different tiers for different operations:

classification:
model: "us.amazon.nova-2-lite-v1:0:priority"
# Fast classification for user experience
extraction:
model: "us.amazon.nova-2-lite-v1:0:flex"
# Cost-effective extraction (can tolerate latency)
assessment:
model: "us.amazon.nova-2-lite-v1:0"
# Standard tier for assessment

Based on AWS documentation:

  • Priority: Up to 25% better OTPS (Output Tokens Per Second) latency vs Standard
  • Standard: Consistent baseline performance
  • Flex: Variable latency, suitable for batch workloads where speed is less critical

Actual performance varies by:

  • Model size and complexity
  • Request payload size
  • Current system load
  • AWS region
  • Priority: ~25% premium over Standard pricing
  • Standard: Baseline pricing (reference point)
  • Flex: Discounted pricing compared to Standard

Important: Always refer to the AWS Pricing Calculator for current pricing information.

Service tier information appears in CloudWatch logs:

Using service tier: flex
Extracted service tier 'flex' from model ID. Using base model ID: us.amazon.nova-2-lite-v1:0

CloudWatch metrics include service tier as a dimension, allowing you to:

  • Track usage by tier
  • Compare performance across tiers
  • Analyze cost by tier

Problem: Model with tier suffix not working

Solution: Verify the model ID format:

  • Correct: us.amazon.nova-2-lite-v1:0:flex
  • Incorrect: us.amazon.nova-2-lite-v1:0-flex
  • Incorrect: us.amazon.nova-2-lite-v1:0_flex

Problem: Service tier not being used

Solution: Check CloudWatch logs for:

  • “Extracted service tier” messages
  • “Using service tier” messages
  • Verify model ID includes tier suffix

Problem: Using unsupported tier suffix

Solution: Only flex and priority suffixes are supported. Standard tier is the default (no suffix needed).

  1. Start with Standard: Use standard tier (no suffix) as your baseline
  2. Optimize Selectively: Add tier suffixes only where needed
  3. Monitor Costs: Track spending by tier using CloudWatch metrics
  4. Test Performance: Measure actual latency differences for your workload
  5. Use Flex for Batch: Leverage Flex tier for non-urgent batch processing
  6. Reserve Priority: Use Priority tier only for latency-sensitive operations

Not all Bedrock models support all service tiers. Priority and Flex tiers are supported by:

  • Amazon Nova models
  • OpenAI models
  • Qwen models
  • DeepSeek models

If a model doesn’t support the specified tier, the Bedrock API will return an error.