
Fine-Tuning and Deploying Amazon Nova Models

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0


This guide provides comprehensive step-by-step instructions for fine-tuning Amazon Nova models using Amazon Bedrock, creating provisioned throughput, and running inference for document classification tasks.

Set up AWS CLI and credentials:

aws configure

Your AWS account needs permissions for:

  • Amazon Bedrock (fine-tuning and inference)
  • Amazon S3 (data storage and access)
  • AWS IAM (role creation and management)

Install the required packages:

pip install boto3 pillow python-dotenv datasets tqdm

Fine-tuning currently supports the following Amazon Nova models:

  • Nova Lite (amazon.nova-lite-v1:0)
  • Nova Pro (amazon.nova-pro-v1:0)

Your dataset should be prepared in the Bedrock fine-tuning format. Each training example should be a JSON object with the following structure:

{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [{
    "text": "You are a document classification expert who can analyze and identify document types from images..."
  }],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "Task prompt with document type definitions..."
        },
        {
          "image": {
            "format": "png",
            "source": {
              "s3Location": {
                "uri": "s3://bucket-name/path/to/image.png",
                "bucketOwner": "123456789012"
              }
            }
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [{
        "text": "{\"type\": \"invoice\"}"
      }]
    }
  ]
}
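As an illustration, records in this format can be assembled programmatically before uploading. The helpers below (`build_training_record`, `write_jsonl`) are hypothetical names, not part of the provided scripts, but they produce exactly the structure shown above:

```python
import json

def build_training_record(image_uri: str, bucket_owner: str, label: str,
                          system_prompt: str, task_prompt: str) -> dict:
    """Build one bedrock-conversation-2024 training record for an image."""
    return {
        "schemaVersion": "bedrock-conversation-2024",
        "system": [{"text": system_prompt}],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"text": task_prompt},
                    {"image": {
                        "format": "png",
                        "source": {"s3Location": {"uri": image_uri,
                                                  "bucketOwner": bucket_owner}},
                    }},
                ],
            },
            {
                # The expected answer is itself a small JSON string
                "role": "assistant",
                "content": [{"text": json.dumps({"type": label})}],
            },
        ],
    }

def write_jsonl(records, path):
    """Write records as JSONL: one JSON object per line, as Bedrock expects."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```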

Use the provided script to prepare a dataset from the RVL-CDIP document classification dataset:

python prepare_nova_finetuning_data.py \
--bucket-name my-finetuning-bucket \
--directory rvl-cdip-sampled \
--samples-per-label 100 \
--dataset chainyo/rvl-cdip \
--split train

  • --bucket-name: S3 bucket name for storing prepared data (required)
  • --directory: S3 directory prefix (default: nova-finetuning-data)
  • --samples-per-label: Number of samples per document class (default: 100)
  • --dataset: Hugging Face dataset name (default: chainyo/rvl-cdip)
  • --validation-split: Validation split ratio (default: 0.1)

Basic dataset preparation:

python prepare_nova_finetuning_data.py \
--bucket-name my-bucket \
--samples-per-label 50

Using a custom dataset (it must follow the same structure as RVL-CDIP):

python prepare_nova_finetuning_data.py \
--bucket-name my-bucket \
--local-dataset /path/to/local/dataset \
--samples-per-label 75

With custom prompts:

python prepare_nova_finetuning_data.py \
--bucket-name my-bucket \
--samples-per-label 100 \
--system-prompt-file custom_system.txt \
--task-prompt-file custom_task.txt

After preparation, your S3 bucket will contain:

s3://my-finetuning-bucket/
└── rvl-cdip-sampled/
    ├── images/
    │   ├── advertisement_1_uuid.png
    │   ├── budget_1_uuid.png
    │   └── ...
    ├── train.jsonl            # Training data
    ├── validation.jsonl       # Validation data
    └── updated_dataset.json   # Dataset metadata

The default configuration supports 16 document classes from RVL-CDIP:

Class                     Description
advertisement             Marketing or promotional material
budget                    Financial documents with numerical data
email                     Electronic correspondence
file_folder               Document organization structures
form                      Structured documents with fields
handwritten               Documents with handwritten content
invoice                   Billing documents
letter                    Formal correspondence
memo                      Internal business communications
news_article              Journalistic content
presentation              Slide-based documents
questionnaire             Survey forms
resume                    Employment documents
scientific_publication    Academic papers
scientific_report         Technical research documents
specification             Technical requirement documents

Before creating fine-tuning jobs, set up the required IAM role:

python create_finetuning_job.py \
--training-data-uri s3://my-bucket/data/train.jsonl \
--output-uri s3://my-bucket/output/ \
--job-name my-finetuning-job \
--create-role

This automatically creates an IAM role with necessary permissions for Bedrock fine-tuning and S3 access.

2.2. Create Fine-tuning Job with Separate Validation Data

python create_finetuning_job.py \
--training-data-uri s3://my-bucket/data/train.jsonl \
--validation-data-uri s3://my-bucket/data/validation.jsonl \
--output-uri s3://my-bucket/output/ \
--job-name my-finetuning-job \
--model-name my-finetuned-model \
--role-arn arn:aws:iam::123456789012:role/BedrockFinetuningRole

2.3. Create Fine-tuning Job with Automatic Data Splitting

python create_finetuning_job.py \
--training-data-uri s3://my-bucket/data/train.jsonl \
--output-uri s3://my-bucket/output/ \
--job-name my-auto-split-job \
--validation-split 0.2 \
--create-role

Create a fine-tuning job with custom hyperparameters:

python create_finetuning_job.py \
--training-data-uri s3://my-bucket/data/train.jsonl \
--output-uri s3://my-bucket/output/ \
--job-name custom-job \
--create-role \
--epoch-count 3 \
--learning-rate 0.0001 \
--batch-size 1

  • Epoch Count: 1-5 (default: 2)
  • Learning Rate: 1e-6 to 1e-4 (default: 0.00001)
  • Batch Size: Typically 1 for Nova models
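Under the hood, the script calls the Bedrock `CreateModelCustomizationJob` API via boto3. The sketch below is illustrative, not the script's actual implementation; the hyperparameter key names (`epochCount`, `learningRate`, `batchSize`) are the ones Bedrock model customization commonly uses, but confirm the exact names for Nova in the Bedrock documentation. The pure builder is separated from the AWS call so it can be inspected without credentials:

```python
def build_finetuning_request(job_name, model_name, role_arn,
                             training_uri, output_uri, validation_uri=None,
                             epochs="2", learning_rate="0.00001", batch_size="1"):
    """Assemble keyword arguments for bedrock.create_model_customization_job()."""
    request = {
        "jobName": job_name,
        "customModelName": model_name,
        "roleArn": role_arn,
        "baseModelIdentifier": "amazon.nova-lite-v1:0",
        "trainingDataConfig": {"s3Uri": training_uri},
        "outputDataConfig": {"s3Uri": output_uri},
        # Bedrock expects hyperparameter values as strings
        "hyperParameters": {
            "epochCount": str(epochs),
            "learningRate": str(learning_rate),
            "batchSize": str(batch_size),
        },
    }
    if validation_uri:
        request["validationDataConfig"] = {"validators": [{"s3Uri": validation_uri}]}
    return request

def submit_job(request: dict) -> str:
    """Submit the job to Bedrock and return the job ARN (requires AWS credentials)."""
    import boto3  # imported here so the builder above stays testable offline
    bedrock = boto3.client("bedrock", region_name="us-east-1")
    return bedrock.create_model_customization_job(**request)["jobArn"]
```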

Check job status:

python create_finetuning_job.py \
--status-only \
--job-arn arn:aws:bedrock:us-east-1:123456789012:model-customization-job/job-id

Wait for completion with monitoring:

python create_finetuning_job.py \
--training-data-uri s3://my-bucket/data/train.jsonl \
--output-uri s3://my-bucket/output/ \
--job-name monitored-job \
--create-role \
--polling-interval 60 \
--max-wait-time 3600
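The `--polling-interval` / `--max-wait-time` behavior amounts to a status-polling loop over the Bedrock `GetModelCustomizationJob` API. The `wait_for_job` helper below is an illustrative sketch, not the script's actual code; it takes the boto3 `bedrock` client as a parameter so it can be exercised without AWS access:

```python
import time

def wait_for_job(bedrock, job_arn: str,
                 polling_interval: int = 60, max_wait_time: int = 3600) -> str:
    """Poll a customization job until it reaches a terminal state.

    `bedrock` is a client from boto3.client("bedrock"). Returns the final
    status, or raises TimeoutError once max_wait_time is exceeded.
    """
    waited = 0
    while True:
        status = bedrock.get_model_customization_job(jobIdentifier=job_arn)["status"]
        if status in ("Completed", "Failed", "Stopped"):
            return status
        if waited >= max_wait_time:
            raise TimeoutError(f"{job_arn} still {status} after {max_wait_time}s")
        time.sleep(polling_interval)
        waited += polling_interval
```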

Fine-tuning results are stored at:

s3://<output-bucket>/<job-name>/
├── training_artifacts/
│ └── step_wise_training_metrics.csv
├── validation_artifacts/
│ └── post_fine_tuning_validation/
└── model_artifacts/

Job details are saved locally as JSON:

{
  "job_arn": "arn:aws:bedrock:us-east-1:123456789012:model-customization-job/...",
  "job_name": "my-finetuning-job",
  "status": "Completed",
  "model_id": "arn:aws:bedrock:us-east-1:123456789012:custom-model/...",
  "creation_time": "2024-01-01T12:00:00Z",
  "end_time": "2024-01-01T13:30:00Z"
}

3.1. Create Provisioned Throughput from Job Details

python create_provisioned_throughput.py \
--job-details-file finetuning_job_20241201_120000.json \
--provisioned-model-name my-provisioned-model \
--model-units 1

3.2. Create Provisioned Throughput from Model ID

python create_provisioned_throughput.py \
--model-id arn:aws:bedrock:us-east-1:123456789012:custom-model/... \
--provisioned-model-name my-provisioned-model \
--model-units 2

3.3. Create Provisioned Throughput from Job ARN

python create_provisioned_throughput.py \
--job-arn arn:aws:bedrock:us-east-1:123456789012:model-customization-job/... \
--provisioned-model-name my-provisioned-model \
--model-units 1
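In all three variants, provisioning reduces to one call to the Bedrock `CreateProvisionedModelThroughput` API. The `create_throughput` helper below is an illustrative sketch (not part of the provided scripts) and takes the boto3 `bedrock` client as a parameter:

```python
def create_throughput(bedrock, model_id: str, name: str, units: int = 1) -> str:
    """Reserve capacity for a custom model and return the provisioned model ARN.

    `bedrock` is a client from boto3.client("bedrock"); `model_id` is the
    custom model ARN recorded in the job details file.
    """
    response = bedrock.create_provisioned_model_throughput(
        modelId=model_id,
        provisionedModelName=name,
        modelUnits=units,
    )
    return response["provisionedModelArn"]
```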

Check provisioning status:

python create_provisioned_throughput.py \
--status-only \
--provisioned-model-arn arn:aws:bedrock:us-east-1:123456789012:provisioned-model/...

List all provisioned models:

python create_provisioned_throughput.py --list-models

Use Case               Recommended Units   Notes
Development/Testing    1                   Sufficient for low-volume testing
Production (Low)       1-2                 Up to 100 requests/minute
Production (Medium)    3-5                 Up to 500 requests/minute
Production (High)      5+                  1000+ requests/minute
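The sizing guidance above can be expressed as a small helper. `recommended_units` is purely illustrative; the thresholds mirror the table and are starting points, not hard capacity limits:

```python
def recommended_units(requests_per_minute: int) -> int:
    """Suggest a starting model-unit count from expected request volume."""
    if requests_per_minute <= 100:
        return 1   # development / low-volume production
    if requests_per_minute <= 500:
        return 3   # medium-volume production; scale toward 5 as needed
    return 5       # high-volume production: 5 or more units
```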

With base model:

python inference_example.py \
--model-id us.amazon.nova-lite-v1:0 \
--image-path document.png

With fine-tuned provisioned model:

python inference_example.py \
--provisioned-model-arn arn:aws:bedrock:us-east-1:123456789012:provisioned-model/... \
--image-path document.png
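A single classification request boils down to one Bedrock Converse API call with the image inlined as bytes. The `classify_document` helper below is an illustrative sketch, not the script's actual implementation; it takes the boto3 `bedrock-runtime` client as a parameter:

```python
import pathlib

def classify_document(runtime, model_id: str, image_path: str,
                      system_prompt: str, task_prompt: str,
                      temperature: float = 0.1, max_tokens: int = 500) -> str:
    """Send one document image through the Bedrock Converse API.

    `runtime` is a client from boto3.client("bedrock-runtime"); `model_id`
    is a base model ID or a provisioned model ARN. Returns the model's raw
    text output, e.g. '{"type": "invoice"}'.
    """
    image_bytes = pathlib.Path(image_path).read_bytes()
    response = runtime.converse(
        modelId=model_id,
        system=[{"text": system_prompt}],
        messages=[{
            "role": "user",
            "content": [
                {"text": task_prompt},
                {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            ],
        }],
        inferenceConfig={"temperature": temperature, "maxTokens": max_tokens},
    )
    return response["output"]["message"]["content"][0]["text"]
```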

Process multiple images:

python inference_example.py \
--model-id us.amazon.nova-lite-v1:0 \
--image-directory /path/to/images/ \
--output-file results.json

Evaluate accuracy with known labels:

python inference_example.py \
--model-id us.amazon.nova-lite-v1:0 \
--image-directory /path/to/images/ \
--ground-truth-file labels.json \
--output-file results_with_accuracy.json

Ground truth file format (labels.json):

{
  "/path/to/image1.png": "invoice",
  "/path/to/image2.png": "letter",
  "/path/to/image3.png": "form"
}
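Accuracy is then a straightforward comparison of predictions against this mapping. The `score_predictions` helper is a hypothetical illustration of the computation, scoring only images that have a ground-truth label:

```python
import json

def score_predictions(predictions: dict, labels_path: str) -> float:
    """Compute accuracy for {image_path: predicted_label} against a
    ground-truth JSON file in the format shown above."""
    with open(labels_path) as f:
        truth = json.load(f)
    # Only images present in the ground-truth file are scored
    scored = {path: label for path, label in predictions.items() if path in truth}
    if not scored:
        return 0.0
    correct = sum(1 for path, label in scored.items() if truth[path] == label)
    return correct / len(scored)
```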

Compare base model with fine-tuned model:

python inference_example.py \
--provisioned-model-arn arn:aws:bedrock:us-east-1:123456789012:provisioned-model/... \
--image-directory /path/to/images/ \
--compare-with-base \
--ground-truth-file labels.json \
--output-file comparison.json

Use custom system and task prompts:

python inference_example.py \
--model-id us.amazon.nova-lite-v1:0 \
--image-path document.png \
--system-prompt-file custom_system.txt \
--task-prompt-file custom_task.txt

Fine-tune inference behavior:

python inference_example.py \
--model-id us.amazon.nova-lite-v1:0 \
--image-path document.png \
--temperature 0.1 \
--top-k 10 \
--max-tokens 500 \
--verbose

The inference script automatically calculates performance metrics when ground truth is provided:

Model Results Summary:
======================
Total Images: 100
Successful Inferences: 98
Success Rate: 98.00%
Correct Predictions: 85
Accuracy: 85.00%
Average Inference Time: 2.34s
Total Tokens Used: 12,500
Average Tokens per Image: 125.0

Results are saved in JSON format with detailed metrics:

{
  "model_id": "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/...",
  "model_name": "Fine-tuned Model",
  "results": [
    {
      "image_path": "/path/to/image.png",
      "status": "success",
      "prediction": "invoice",
      "ground_truth": "invoice",
      "correct": true,
      "confidence": 1.0,
      "inference_time_seconds": 2.1,
      "input_tokens": 850,
      "output_tokens": 15,
      "total_tokens": 865
    }
  ],
  "metrics": {
    "total_images": 100,
    "successful_inferences": 98,
    "success_rate": 0.98,
    "correct_predictions": 85,
    "accuracy": 0.85,
    "average_inference_time_seconds": 2.34,
    "total_tokens_used": 12500,
    "average_tokens_per_image": 125.0
  }
}

When comparing models, results include side-by-side metrics:

{
  "comparison_type": "model_comparison",
  "models": {
    "Fine-tuned Model": {
      "model_id": "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/...",
      "metrics": {
        "accuracy": 0.87,
        "average_inference_time_seconds": 2.1
      }
    },
    "Base Model": {
      "model_id": "us.amazon.nova-lite-v1:0",
      "metrics": {
        "accuracy": 0.72,
        "average_inference_time_seconds": 1.8
      }
    }
  }
}

Nova fine-tuning costs include:

  1. Fine-tuning Job Costs: Based on training time and data size
  2. Provisioned Throughput Costs: Hourly charges for reserved capacity
  3. Inference Costs: Per-token charges for on-demand inference

Data Preparation:

  • Start with smaller datasets (50-100 samples per class)
  • Use efficient image formats (PNG recommended)
  • Optimize hyperparameters to reduce training time

Provisioned Throughput:

  • Start with 1 model unit for testing
  • Scale based on actual usage patterns
  • Delete provisioned throughput when not needed

Inference:

  • Use efficient prompting to minimize token usage
  • Batch process multiple images when possible
  • Consider using base models for simple tasks

To avoid ongoing costs, delete provisioned throughput when not needed:

python create_provisioned_throughput.py \
--delete \
--provisioned-model-arn arn:aws:bedrock:us-east-1:123456789012:provisioned-model/...

⚠️ IMPORTANT: Provisioned throughput incurs costs even when not in use. Always delete when no longer needed.
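Deletion maps to a single call to the Bedrock `DeleteProvisionedModelThroughput` API. The `delete_throughput` helper below is an illustrative sketch (not the script's actual code) and takes the boto3 `bedrock` client as a parameter:

```python
def delete_throughput(bedrock, provisioned_model_arn: str) -> None:
    """Release reserved capacity so hourly charges stop.

    `bedrock` is a client from boto3.client("bedrock").
    """
    bedrock.delete_provisioned_model_throughput(
        provisionedModelArn=provisioned_model_arn)
```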

Training Data:

  • Quality over Quantity: 50-100 high-quality examples per class often outperform 1000+ poor examples
  • Balanced Classes: Ensure roughly equal representation across document types
  • Image Quality: Use clear, high-resolution images (300+ DPI for scanned documents)
  • Representative Examples: Include diverse layouts and formats within each class

Hyperparameter Tuning:

  • Start Simple: Begin with default hyperparameters
  • Incremental Tuning: Adjust one parameter at a time
  • Monitor Overfitting: Use validation data to prevent overfitting
  • Early Stopping: Stop training if validation metrics plateau

Production Deployment:

  • Gradual Rollout: Test with a small traffic percentage before full deployment
  • Monitoring: Set up CloudWatch alarms for inference errors and latency
  • Fallback Strategy: Keep the base model as a fallback in case the fine-tuned model fails
  • Cost Monitoring: Track token usage and provisioned throughput costs

Inference Optimization:

  • Image Preprocessing: Resize images to optimal dimensions (typically 1024x1024 max)
  • Prompt Engineering: Use concise, specific prompts
  • Batch Processing: Process multiple documents together when possible
  • Caching: Cache results for repeated classifications
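The image-preprocessing bullet above can be done with Pillow (already in the pip install list). `downscale` is a hypothetical helper name, shown as a minimal sketch:

```python
from PIL import Image

def downscale(src_path: str, dst_path: str, max_side: int = 1024) -> None:
    """Shrink an image so its longest side is at most max_side pixels.

    Image.thumbnail resizes in place and preserves aspect ratio; images
    already within the limit keep their original size.
    """
    with Image.open(src_path) as im:
        im.thumbnail((max_side, max_side))
        im.save(dst_path)
```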

Fine-tuning Job Fails:

  • Check IAM role permissions for Bedrock and S3
  • Verify training data format (JSONL with correct schema)
  • Ensure S3 bucket accessibility from Bedrock service
  • Check hyperparameter ranges (epochs: 1-5, learning rate: 1e-6 to 1e-4)

Provisioned Throughput Creation Fails:

  • Ensure fine-tuning job completed successfully
  • Verify model ID is correct (use job details file)
  • Check account limits for provisioned throughput

Inference Errors:

  • Verify model ID/ARN is correct and accessible
  • Check image format and size (max 20MB)
  • Ensure proper AWS credentials and region configuration
  • Monitor CloudWatch logs for detailed error messages

Low Accuracy:

  • Review training data quality and labeling consistency
  • Increase dataset size or improve class balance
  • Adjust hyperparameters (try lower learning rate)
  • Verify prompt templates match training format

Enable verbose logging:

python inference_example.py \
--model-id us.amazon.nova-lite-v1:0 \
--image-path document.png \
--verbose

Check job logs:

python create_finetuning_job.py \
--status-only \
--job-arn <job-arn>

Monitor provisioning:

python create_provisioned_throughput.py \
--status-only \
--provisioned-model-arn <model-arn>

Slow Inference:

  • Check provisioned throughput status (should be “InService”)
  • Optimize image sizes and formats
  • Consider using multiple model units for higher throughput

High Costs:

  • Monitor token usage per inference
  • Optimize prompts to reduce token count
  • Delete unused provisioned throughput
  • Use batch processing for multiple documents

Complete workflow example:

# 1. Prepare dataset
python prepare_nova_finetuning_data.py --bucket-name my-bucket --samples-per-label 100
# 2. Create fine-tuning job
python create_finetuning_job.py --training-data-uri s3://my-bucket/train.jsonl --job-name my-job --create-role
# 3. Create provisioned throughput
python create_provisioned_throughput.py --job-details-file job.json --provisioned-model-name my-model --model-units 1
# 4. Run inference
python inference_example.py --provisioned-model-arn <arn> --image-directory /path/to/images --output-file results.json
# 5. Clean up
python create_provisioned_throughput.py --delete --provisioned-model-arn <arn>

This documentation provides comprehensive guidance for fine-tuning Amazon Nova models for document classification. For additional support, refer to the AWS documentation and support resources.