
IDP CLI - Command Line Interface for Batch Document Processing


A command-line tool for batch document processing with the GenAI IDP Accelerator.

Batch Processing - Process multiple documents from CSV/JSON manifests
📊 Live Progress Monitoring - Real-time updates with rich terminal UI
🔄 Resume Monitoring - Stop and resume monitoring without affecting processing
📁 Flexible Input - Support for local files and S3 references
🔍 Comprehensive Status - Track queued, running, completed, and failed documents
📈 Batch Analytics - Success rates, durations, and detailed error reporting
🎯 Evaluation Framework - Validate accuracy against baselines with detailed metrics
💬 Agent Chat - Interactive Agent Companion Chat from the terminal with Analytics, Error Analyzer, and more


Prerequisites:

  • Python 3.12 or higher
  • AWS credentials configured (via AWS CLI or environment variables)
  • An active IDP Accelerator CloudFormation stack
Terminal window
make setup-venv
source .venv/bin/activate
Terminal window
cd lib/idp_cli_pkg
pip install -e ".[test]"

The CLI supports an optional --profile parameter to specify which AWS credentials profile to use:

Terminal window
idp-cli --profile my-profile <command> [options]
  • Can be placed anywhere in the command
  • Only affects that specific command execution
  • Automatically applies to all AWS SDK calls
  • If not specified, uses default AWS credentials

Examples:

Terminal window
# Profile before command
idp-cli --profile production deploy --stack-name my-stack ...
# Profile after command
idp-cli deploy --profile production --stack-name my-stack ...
# Profile at the end
idp-cli deploy --stack-name my-stack --profile production ...

Deploy a stack and process documents in 3 commands:

Terminal window
# 1. Deploy stack (10-15 minutes)
idp-cli deploy \
--stack-name my-idp-stack \
--admin-email your.email@example.com \
--wait
# 2. Process documents from a local directory
idp-cli process \
--stack-name my-idp-stack \
--dir ./my-documents/ \
--monitor
# 3. Download results
idp-cli download-results \
--stack-name my-idp-stack \
--batch-id <batch-id-from-step-2> \
--output-dir ./results/

That’s it! Your documents are processed with OCR, classification, extraction, assessment, and summarization.

For evaluation workflows with accuracy metrics, see the Complete Evaluation Workflow section.


Deploy or update an IDP CloudFormation stack.

Usage:

Terminal window
idp-cli deploy [OPTIONS]

Required for New Stacks:

  • --stack-name: CloudFormation stack name
  • --admin-email: Admin user email

Optional Parameters:

  • --from-code: Deploy from local code by building and publishing artifacts (path to project root)
  • --template-url: URL to CloudFormation template in S3 (optional, auto-selected based on region)
  • --custom-config: Path to local config file or S3 URI
  • --max-concurrent: Maximum concurrent workflows (default: 100)
  • --log-level: Logging level (DEBUG, INFO, WARN, ERROR) (default: INFO)
  • --enable-hitl: Enable Human-in-the-Loop (true or false)
  • --parameters: Additional parameters as key=value,key2=value2
  • --wait: Wait for stack operation to complete
  • --no-rollback: Disable rollback on stack creation failure
  • --region: AWS region (optional, auto-detected)
  • --role-arn: CloudFormation service role ARN (optional)

Note: --from-code and --template-url are mutually exclusive. Use --from-code for development/testing from local source, or --template-url for production deployments.

Auto-Monitoring for In-Progress Operations:

If you run deploy on a stack that already has an operation in progress (CREATE, UPDATE, ROLLBACK), the command automatically switches to monitoring mode instead of failing. This is useful if you forgot to use --wait on the initial deploy - simply run the same command again to monitor progress:

Terminal window
# First run without --wait starts the deployment
$ idp-cli deploy --stack-name my-stack --admin-email user@example.com
Stack CREATE initiated successfully!
# Second run - automatically monitors the in-progress operation
$ idp-cli deploy --stack-name my-stack
Stack 'my-stack' has an operation in progress
Current status: CREATE_IN_PROGRESS
Switching to monitoring mode...
[Live progress display...]
Stack CREATE completed successfully!

Supported in-progress states: CREATE_IN_PROGRESS, UPDATE_IN_PROGRESS, DELETE_IN_PROGRESS, ROLLBACK_IN_PROGRESS, UPDATE_ROLLBACK_IN_PROGRESS, and cleanup states.

Examples:

Terminal window
# Create new stack
idp-cli deploy \
--stack-name my-idp \
--admin-email user@example.com \
--wait
# Update with custom config
idp-cli deploy \
--stack-name my-idp \
--custom-config ./updated-config.yaml \
--wait
# Update parameters
idp-cli deploy \
--stack-name my-idp \
--max-concurrent 200 \
--log-level DEBUG \
--wait
# Deploy with custom template URL (for regions not auto-supported)
idp-cli deploy \
--stack-name my-idp \
--admin-email user@example.com \
--template-url https://s3.eu-west-1.amazonaws.com/my-bucket/idp-main.yaml \
--region eu-west-1 \
--wait
# Deploy with CloudFormation service role and permissions boundary
idp-cli deploy \
--stack-name my-idp \
--admin-email user@example.com \
--role-arn arn:aws:iam::123456789012:role/IDP-Cloudformation-Service-Role \
--parameters "PermissionsBoundaryArn=arn:aws:iam::123456789012:policy/MyPermissionsBoundary" \
--wait
# Deploy from local source code (for development/testing)
idp-cli deploy \
--stack-name my-idp-dev \
--from-code . \
--admin-email user@example.com \
--wait
# Update existing stack from local code changes
idp-cli deploy \
--stack-name my-idp-dev \
--from-code . \
--wait
# Deploy with rollback disabled (useful for debugging failed deployments)
idp-cli deploy \
--stack-name my-idp \
--admin-email user@example.com \
--no-rollback \
--wait

Delete an IDP CloudFormation stack.

⚠️ WARNING: This permanently deletes all stack resources.

Usage:

Terminal window
idp-cli delete [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --force: Skip confirmation prompt
  • --empty-buckets: Empty S3 buckets before deletion (required if buckets contain data)
  • --force-delete-all: Force delete ALL remaining resources after CloudFormation deletion (S3 buckets, CloudWatch logs, DynamoDB tables)
  • --wait: Wait for deletion to complete (default: no-wait)
  • --region: AWS region (optional)

S3 Bucket Behavior:

  • LoggingBucket: DeletionPolicy: Retain - Always kept (unless using --force-delete-all)
  • All other buckets: DeletionPolicy: RetainExceptOnCreate - Deleted if empty
  • CloudFormation can ONLY delete S3 buckets if they’re empty
  • Use --empty-buckets to automatically empty buckets before deletion
  • Use --force-delete-all to delete ALL remaining resources after CloudFormation completes
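As an alternative to --empty-buckets, you can empty a bucket yourself with the standard AWS CLI before deleting the stack. A minimal sketch (the bucket name is a placeholder, and the command is echoed rather than executed so the sketch is safe to run as-is):

```shell
# Empty one stack bucket before deletion (placeholder bucket name).
# The command is echoed, not run; drop the echo wrapper to execute it.
bucket="my-stack-inputbucket-abc123"
cmd="aws s3 rm s3://${bucket} --recursive"
echo "$cmd"
```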

Force Delete All Behavior:

The --force-delete-all flag performs a comprehensive cleanup AFTER CloudFormation deletion completes:

  1. CloudFormation Deletion Phase: Standard stack deletion
  2. Additional Resource Cleanup Phase (runs when --wait is used on any deletion, and always with --force-delete-all): Removes stack-specific resources not tracked by CloudFormation:
    • CloudWatch Log Groups (Lambda functions, Glue crawlers)
    • AppSync APIs and their log groups
    • CloudFront distributions (two-phase cleanup - initiates disable, takes 15-20 minutes to propagate globally)
    • CloudFront Response Headers Policies (from previously deleted stacks)
    • IAM custom policies and permissions boundaries
    • CloudWatch Logs resource policies
  3. Retained Resource Cleanup Phase (only with --force-delete-all): Deletes remaining resources in order:
    • DynamoDB tables (disables PITR, then deletes)
    • CloudWatch Log Groups (matching stack name pattern)
    • S3 buckets (regular buckets first, LoggingBucket last)

Resources Always Cleaned Up (with --wait or --force-delete-all):

  • IAM custom policies (containing stack name)
  • IAM permissions boundary policies
  • CloudFront response header policies (custom)
  • CloudWatch Logs resource policies (stack-specific)
  • AppSync log groups
  • Additional log groups containing stack name
  • Gracefully handles missing/already-deleted resources

Resources Deleted Only by --force-delete-all:

  • All DynamoDB tables from stack
  • All CloudWatch Log Groups (retained by CloudFormation)
  • All S3 buckets including LoggingBucket
  • Handles nested stack resources automatically

Examples:

Terminal window
# Interactive deletion with confirmation
idp-cli delete --stack-name test-stack
# Automated deletion (CI/CD)
idp-cli delete --stack-name test-stack --force
# Delete with automatic bucket emptying
idp-cli delete --stack-name test-stack --empty-buckets --force
# Force delete ALL remaining resources (comprehensive cleanup)
idp-cli delete --stack-name test-stack --force-delete-all --force
# Delete without waiting
idp-cli delete --stack-name test-stack --force --no-wait

What you’ll see (standard deletion):

⚠️ WARNING: Stack Deletion
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stack: test-stack
Region: us-east-1
S3 Buckets:
• InputBucket: 20 objects (45.3 MB)
• OutputBucket: 20 objects (123.7 MB)
• WorkingBucket: empty
⚠️ Buckets contain data!
This action cannot be undone.
Are you sure you want to delete this stack? [y/N]: _

What you’ll see (force-delete-all):

⚠️ WARNING: FORCE DELETE ALL RESOURCES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stack: test-stack
Region: us-east-1
S3 Buckets:
• InputBucket: 20 objects (45.3 MB)
• OutputBucket: 20 objects (123.7 MB)
• LoggingBucket: 5000 objects (2.3 GB)
⚠️ FORCE DELETE ALL will remove:
• All S3 buckets (including LoggingBucket)
• All CloudWatch Log Groups
• All DynamoDB Tables
• Any other retained resources
This happens AFTER CloudFormation deletion completes
This action cannot be undone.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Are you ABSOLUTELY sure you want to force delete ALL resources? [y/N]: y
Deleting CloudFormation stack...
✓ Stack deleted successfully!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Starting force cleanup of retained resources...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Analyzing retained resources...
Found 3 retained resources:
• DynamoDB Tables: 0
• CloudWatch Logs: 0
• S3 Buckets: 3
⠋ Deleting S3 buckets... 3/3
✓ Cleanup phase complete!
Resources deleted:
• S3 Buckets: 3
- test-stack-inputbucket-abc123
- test-stack-outputbucket-def456
- test-stack-loggingbucket-ghi789
Stack 'test-stack' and all resources completely removed.

Use Cases:

  • Cleanup test/development environments to avoid charges
  • CI/CD pipelines that provision and teardown stacks
  • Automated testing with temporary stack creation
  • Complete removal of failed stacks with retained resources
  • Cleanup of stacks with LoggingBucket and CloudWatch logs

Important Notes:

  • --force-delete-all automatically includes --empty-buckets behavior
  • Cleanup phase runs even if CloudFormation deletion fails
  • Includes resources from nested stacks automatically
  • Safe to run - only deletes resources that weren’t deleted by CloudFormation
  • Progress bars show real-time deletion status

Auto-Monitoring for In-Progress Deletions:

If you run delete on a stack that already has a DELETE operation in progress, the command automatically switches to monitoring mode instead of failing. This is useful if you started a deletion without --wait - simply run the command again to monitor:

Terminal window
# First run without --wait starts the deletion
$ idp-cli delete --stack-name test-stack --force --no-wait
Stack DELETE initiated successfully!
# Second run - automatically monitors the in-progress deletion
$ idp-cli delete --stack-name test-stack
Stack 'test-stack' is already being deleted
Current status: DELETE_IN_PROGRESS
Switching to monitoring mode...
[Live progress display...]
Stack deleted successfully!

Canceling In-Progress Operations:

If a non-delete operation is in progress (CREATE, UPDATE), the delete command offers options to handle it:

Terminal window
$ idp-cli delete --stack-name test-stack
Stack 'test-stack' has an operation in progress: CREATE_IN_PROGRESS
Options:
1. Wait for CREATE to complete first
2. Cancel the CREATE and proceed with deletion
Do you want to cancel the CREATE and delete the stack? [yes/no/wait]: _
  • yes: Cancel the operation (if possible) and proceed with deletion
  • no: Exit without making changes
  • wait: Wait for the current operation to complete, then delete

With --force flag, the command automatically cancels the operation and proceeds with deletion:

Terminal window
# Force mode - automatically cancels and deletes
$ idp-cli delete --stack-name test-stack --force
Force mode: Canceling operation and proceeding with deletion...
Stack reached stable state: ROLLBACK_COMPLETE
Proceeding with stack deletion...

Note: CREATE operations cannot be canceled directly - they must complete or roll back naturally. UPDATE operations can be canceled immediately.


Process a batch of documents.

Usage:

Terminal window
idp-cli process [OPTIONS]
# or (deprecated alias)
idp-cli run-inference [OPTIONS]

Document Source (choose ONE):

  • --manifest: Path to manifest file (CSV or JSON)
  • --dir: Local directory containing documents
  • --s3-uri: S3 URI in InputBucket
  • --test-set: Test set ID from test set bucket

Options:

  • --stack-name (required): CloudFormation stack name
  • --batch-id: Custom batch ID (auto-generated if omitted, ignored with --test-set)
  • --batch-prefix: Prefix for auto-generated batch ID (default: cli-batch)
  • --file-pattern: File pattern for directory/S3 scanning (default: *.pdf)
  • --recursive/--no-recursive: Include subdirectories (default: recursive)
  • --number-of-files: Limit number of files to process
  • --config: Path to configuration YAML file (optional)
  • --config-version: Configuration version to use for processing (e.g., v1, v2)
  • --context: Context description for test run (used with --test-set, e.g., “Model v2.1”, “Production validation”)
  • --monitor: Monitor progress until completion
  • --refresh-interval: Seconds between status checks (default: 5)
  • --region: AWS region (optional)

Test Set Integration: For test runs to appear properly in the Test Studio UI, use either:

  • --test-set: Process test set directly by ID (recommended for test sets)
  • --manifest: Use manifest file with populated baseline_source column for evaluation tracking

Other options (--dir, --s3-uri) are for general document processing but won’t integrate with Test Studio tracking.

Examples:

Terminal window
# Process from local directory
idp-cli process \
--stack-name my-stack \
--dir ./documents/ \
--monitor
# Process from manifest with baselines (enables evaluation)
idp-cli process \
--stack-name my-stack \
--manifest documents-with-baselines.csv \
--monitor
# Process from manifest with limited files
idp-cli process \
--stack-name my-stack \
--manifest documents-with-baselines.csv \
--number-of-files 10 \
--monitor
# Process test set (integrates with Test Studio UI - use test set ID)
idp-cli process \
--stack-name my-stack \
--test-set fcc-example-test \
--monitor
# Process test set with limited files for quick testing
idp-cli process \
--stack-name my-stack \
--test-set fcc-example-test \
--number-of-files 5 \
--monitor
# Process test set with custom context (for tracking in Test Studio)
idp-cli process \
--stack-name my-stack \
--test-set fcc-example-test \
--context "Model v2.1 - improved prompts" \
--monitor
# Process S3 URI
idp-cli process \
--stack-name my-stack \
--s3-uri archive/2024/ \
--monitor
# Process with specific configuration version
idp-cli process \
--stack-name my-stack \
--dir ./documents/ \
--config-version v2 \
--monitor
# Process test set with configuration version
idp-cli process \
--stack-name my-stack \
--test-set fcc-example-test \
--config-version v1 \
--context "Testing with config v1" \
--monitor

Reprocess existing documents from a specific pipeline step.

Usage:

Terminal window
idp-cli reprocess [OPTIONS]
# or (deprecated alias)
idp-cli rerun-inference [OPTIONS]

Use Cases:

  • Test different classification or extraction configurations without re-running OCR
  • Fix classification errors and reprocess extraction
  • Iterate on prompt engineering rapidly

Options:

  • --stack-name (required): CloudFormation stack name
  • --step (required): Pipeline step to rerun from (classification or extraction)
  • Document Source (choose ONE):
    • --document-ids: Comma-separated document IDs
    • --batch-id: Batch ID to get all documents from
  • --force: Skip confirmation prompt (useful for automation)
  • --monitor: Monitor progress until completion
  • --refresh-interval: Seconds between status checks (default: 5)
  • --region: AWS region (optional)

Step Behavior:

  • classification: Clears page classifications and sections, reruns classification → extraction → assessment
  • extraction: Keeps classifications, clears extraction data, reruns extraction → assessment

Examples:

Terminal window
# Rerun classification for specific documents
idp-cli reprocess \
--stack-name my-stack \
--step classification \
--document-ids "batch-123/doc1.pdf,batch-123/doc2.pdf" \
--monitor
# Rerun extraction for entire batch
idp-cli reprocess \
--stack-name my-stack \
--step extraction \
--batch-id cli-batch-20251015-143000 \
--monitor
# Automated rerun (skip confirmation - perfect for CI/CD)
idp-cli reprocess \
--stack-name my-stack \
--step classification \
--batch-id test-set \
--force \
--monitor

What Gets Cleared:

Step             Clears                                               Keeps
classification   Page classifications, sections, extraction results   OCR data (pages, images, text)
extraction       Section extraction results, attributes               OCR data, page classifications, section structure

Benefits:

  • Leverages existing OCR data (saves time and cost)
  • Rapid iteration on classification/extraction configurations
  • Perfect for prompt engineering experiments



Check status of documents by batch ID, document ID, or search criteria.

Usage:

Terminal window
idp-cli status [OPTIONS]

Document Source (choose ONE):

  • --batch-id: Batch identifier or PK substring to search for (searches tracking table)
  • --document-id: Single document ID (check individual document)

Optional Filters and Display:

  • --object-status: Filter by status (COMPLETED, FAILED, QUEUED, RUNNING, PROCESSING)
  • --show-details: Show detailed document information in table format
  • --get-time: Calculate and display timing statistics (processing time, queue time, total time)
  • --include-metering: Include Lambda metering statistics (GB-seconds by stage) - requires --get-time

Other Options:

  • --stack-name (required): CloudFormation stack name
  • --wait: Wait for all documents to complete
  • --refresh-interval: Seconds between status checks (default: 5)
  • --format: Output format - table (default) or json
  • --region: AWS region (optional)

How --batch-id Works:

The --batch-id option performs a PK substring search in the DynamoDB tracking table. This means:

  • It searches for all documents where the PK (Primary Key) contains your search string
  • You can search for exact batch IDs: cli-batch-20251015-143000
  • You can search for partial matches: batch-123 finds all documents with “batch-123” in their path
  • You can search across multiple batches: invoice finds all documents with “invoice” in their name
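The substring match behaves like a grep over document keys. A quick sketch with sample keys (no real CLI calls) shows which documents a given search string would pick up:

```shell
# Sample document PKs; "batch-123" matches any key containing the string,
# regardless of where it appears in the path.
keys="cli-batch-20251015-143000/invoice.pdf
batch-123/w2.pdf
archive/batch-123/paystub.pdf"
matches=$(printf '%s\n' "$keys" | grep -c "batch-123")
echo "Matched ${matches} documents"
```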

Examples:

Terminal window
# Search for all documents in a batch (PK substring search)
idp-cli status \
--stack-name my-stack \
--batch-id cli-batch-20251015-143000
# Search for documents across batches with partial match
idp-cli status \
--stack-name my-stack \
--batch-id batch-123
# Search for completed documents only
idp-cli status \
--stack-name my-stack \
--batch-id batch-123 \
--object-status COMPLETED
# Search for failed documents with details
idp-cli status \
--stack-name my-stack \
--batch-id batch-123 \
--object-status FAILED \
--show-details
# Search with timing statistics
idp-cli status \
--stack-name my-stack \
--batch-id batch-123 \
--object-status COMPLETED \
--get-time
# Search with timing and Lambda metering data
idp-cli status \
--stack-name my-stack \
--batch-id test \
--object-status COMPLETED \
--get-time \
--include-metering
# Check single document status
idp-cli status \
--stack-name my-stack \
--document-id batch-123/invoice.pdf
# Monitor documents until completion
idp-cli status \
--stack-name my-stack \
--batch-id batch-123 \
--wait
# Get JSON output for scripting
idp-cli status \
--stack-name my-stack \
--batch-id batch-123 \
--format json

Timing Statistics:

When using --get-time, the command calculates:

  • Processing Time: WorkflowStartTime → CompletionTime (actual processing duration)
  • Queue Time: QueuedTime → WorkflowStartTime (time waiting in queue)
  • Total Time: QueuedTime → CompletionTime (end-to-end duration)

For each metric, you’ll see:

  • Average, Median, Min, Max, Standard Deviation, Total
  • ObjectKey for min/max values (helps identify outliers)
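The three metrics are simple differences between the tracking-table timestamps. A sketch with hypothetical sample values (requires GNU date for ISO-8601 parsing):

```shell
# Sample tracking timestamps (hypothetical values)
queued="2025-01-01T10:30:00Z"
started="2025-01-01T10:30:45Z"
completed="2025-01-01T10:32:50Z"

# Convert an ISO-8601 timestamp to epoch seconds (GNU date)
to_epoch() { date -u -d "$1" +%s; }

queue_time=$(( $(to_epoch "$started") - $(to_epoch "$queued") ))         # QueuedTime -> WorkflowStartTime
processing_time=$(( $(to_epoch "$completed") - $(to_epoch "$started") )) # WorkflowStartTime -> CompletionTime
total_time=$(( $(to_epoch "$completed") - $(to_epoch "$queued") ))       # QueuedTime -> CompletionTime
echo "queue=${queue_time}s processing=${processing_time}s total=${total_time}s"
```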

Lambda Metering:

When using --include-metering with --get-time, you’ll see GB-seconds usage by stage:

  • Assessment, OCR, Classification, Extraction, Summarization
  • Statistics: Average, Median, Min, Max, Std Dev, Total
  • Cost estimates based on AWS Lambda pricing ($0.0000166667 per GB-second)
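The cost estimate is just the GB-seconds total multiplied by the published rate. To reproduce it from a metering total (sample value) yourself:

```shell
# Estimate Lambda compute cost from a GB-seconds total (sample value),
# at $0.0000166667 per GB-second as noted above.
gb_seconds=12500
cost=$(awk -v g="$gb_seconds" 'BEGIN { printf "%.4f", g * 0.0000166667 }')
echo "Estimated Lambda cost: \$${cost}"
```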

Example Output with Timing:

Terminal window
$ idp-cli status --stack-name my-stack --batch-id test-batch --object-status COMPLETED --get-time
Searching for documents with PK containing 'test-batch'...
Found 25 matching documents
Timing Statistics:
Valid documents: 25
Processing Time (WorkflowStartTime → CompletionTime):
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric   ┃ Value       ┃ ObjectKey                    ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Average  │ 45.23s      │                              │
│ Median   │ 43.10s      │                              │
│ Minimum  │ 32.45s      │ test-batch/small-doc.pdf     │
│ Maximum  │ 78.90s      │ test-batch/large-doc.pdf     │
│ Std Dev  │ 12.34s      │                              │
│ Total    │ 18m 50.75s  │                              │
└──────────┴─────────────┴──────────────────────────────┘

Programmatic Use:

The command returns exit codes for scripting:

  • 0 - Document(s) completed successfully
  • 1 - Document(s) failed
  • 2 - Document(s) still processing
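These exit codes make status easy to branch on in scripts. A minimal sketch (handle_status is a hypothetical helper, not part of the CLI):

```shell
# Map idp-cli status exit codes to human-readable outcomes.
handle_status() {
  case "$1" in
    0) echo "completed" ;;
    1) echo "failed" ;;
    2) echo "processing" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Real usage would capture the CLI's exit code:
#   idp-cli status --stack-name my-stack --document-id doc.pdf
#   handle_status $?
handle_status 2
```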

JSON Output Format:

Terminal window
# Single document
$ idp-cli status --stack-name my-stack --document-id batch-123/invoice.pdf --format json
{
  "document_id": "batch-123/invoice.pdf",
  "status": "COMPLETED",
  "duration": 125.4,
  "start_time": "2025-01-01T10:30:45Z",
  "end_time": "2025-01-01T10:32:50Z",
  "num_sections": 2,
  "exit_code": 0
}
# Table output includes final status summary
$ idp-cli status --stack-name my-stack --document-id batch-123/invoice.pdf
[status table]
FINAL STATUS: COMPLETED | Duration: 125.4s | Exit Code: 0

Scripting Examples:

#!/bin/bash
# Wait for document completion and check result
idp-cli status --stack-name prod --document-id batch-001/invoice.pdf --wait
exit_code=$?
if [ $exit_code -eq 0 ]; then
  echo "Document processed successfully"
  # Proceed with downstream processing
else
  echo "Document processing failed"
  exit 1
fi

#!/bin/bash
# Poll document status in script
while true; do
  status=$(idp-cli status --stack-name prod --document-id batch-001/invoice.pdf --format json)
  state=$(echo "$status" | jq -r '.status')
  if [ "$state" = "COMPLETED" ]; then
    echo "Processing complete!"
    break
  elif [ "$state" = "FAILED" ]; then
    echo "Processing failed!"
    exit 1
  fi
  sleep 5
done

Download processing results to local directory.

Usage:

Terminal window
idp-cli download-results [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --batch-id (required): Batch identifier
  • --output-dir (required): Local directory to download to
  • --file-types: File types to download (default: all)
    • Options: pages, sections, summary, evaluation, or all
  • --region: AWS region (optional)

Examples:

Terminal window
# Download all results
idp-cli download-results \
--stack-name my-stack \
--batch-id cli-batch-20251015-143000 \
--output-dir ./results/
# Download only extraction results
idp-cli download-results \
--stack-name my-stack \
--batch-id cli-batch-20251015-143000 \
--output-dir ./results/ \
--file-types sections
# Download evaluation results only
idp-cli download-results \
--stack-name my-stack \
--batch-id eval-batch-20251015 \
--output-dir ./eval-results/ \
--file-types evaluation

Output Structure:

./results/
└── cli-batch-20251015-143000/
    └── invoice.pdf/
        ├── pages/
        │   └── 1/
        │       ├── image.jpg
        │       ├── rawText.json
        │       └── result.json
        ├── sections/
        │   └── 1/
        │       ├── result.json   # Extracted structured data
        │       └── summary.json
        ├── summary/
        │   ├── fulltext.txt
        │   └── summary.json
        └── evaluation/           # Only present if baseline provided
            ├── report.json       # Detailed metrics
            └── report.md         # Human-readable report

Delete documents and all associated data from the IDP system.

⚠️ WARNING: This action cannot be undone.

Usage:

Terminal window
idp-cli delete-documents [OPTIONS]

Document Selection (choose ONE):

  • --document-ids: Comma-separated list of document IDs (S3 object keys) to delete
  • --batch-id: Delete all documents in this batch
  • --pattern: Wildcard pattern to match document keys (e.g. "batch-123/*.pdf", "*invoice*")

Options:

  • --stack-name (required): CloudFormation stack name
  • --status-filter: Only delete documents with this status (use with --batch-id or --pattern)
    • Options: FAILED, COMPLETED, PROCESSING, QUEUED
  • --dry-run: Show what would be deleted without actually deleting
  • --force, -y: Skip confirmation prompt
  • --region: AWS region (optional)

What Gets Deleted:

  • Source files from input bucket
  • Processed outputs from output bucket
  • DynamoDB tracking records
  • List entries in tracking table

Examples:

Terminal window
# Delete specific documents by ID
idp-cli delete-documents \
--stack-name my-stack \
--document-ids "batch-123/doc1.pdf,batch-123/doc2.pdf"
# Delete all documents in a batch
idp-cli delete-documents \
--stack-name my-stack \
--batch-id cli-batch-20250123
# Delete only failed documents in a batch
idp-cli delete-documents \
--stack-name my-stack \
--batch-id cli-batch-20250123 \
--status-filter FAILED
# Dry run to see what would be deleted
idp-cli delete-documents \
--stack-name my-stack \
--batch-id cli-batch-20250123 \
--dry-run
# Delete documents matching a wildcard pattern
idp-cli delete-documents \
--stack-name my-stack \
--pattern "batch-123/*.pdf"
# Delete all failed invoice documents across batches
idp-cli delete-documents \
--stack-name my-stack \
--pattern "*invoice*" \
--status-filter FAILED
# Dry run with pattern to preview matches
idp-cli delete-documents \
--stack-name my-stack \
--pattern "*2024*" \
--dry-run
# Force delete without confirmation
idp-cli delete-documents \
--stack-name my-stack \
--document-ids "batch-123/doc1.pdf" \
--force

Output Example:

Connecting to stack: my-stack
Getting documents for batch: cli-batch-20250123
Found 15 document(s) in batch
(filtered by status: FAILED)
⚠️ Documents to be deleted:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• cli-batch-20250123/doc1.pdf
• cli-batch-20250123/doc2.pdf
• cli-batch-20250123/doc3.pdf
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Delete 3 document(s) permanently? [y/N]: y
✓ Successfully deleted 3 document(s)

Use Cases:

  • Clean up failed documents after fixing issues
  • Remove test documents from a batch
  • Free up storage by removing old processed documents
  • Prepare for reprocessing by removing previous results

Generate a manifest file from directory or S3 URI, or create a test set in the test set bucket.

Usage:

Terminal window
idp-cli generate-manifest [OPTIONS]

Options:

  • Source (choose ONE):
    • --dir: Local directory to scan
    • --s3-uri: S3 URI to scan
  • --baseline-dir: Baseline directory for automatic matching (only with --dir)
  • --output: Output manifest file path (CSV) - optional when using --test-set
  • --file-pattern: File pattern (default: *.pdf)
  • --recursive/--no-recursive: Include subdirectories (default: recursive)
  • --region: AWS region (optional)
  • Test Set Creation:
    • --test-set: Test set name - creates folder in test set bucket and uploads files
    • --stack-name: CloudFormation stack name (required with --test-set)

Examples:

Terminal window
# Generate from directory
idp-cli generate-manifest \
--dir ./documents/ \
--output manifest.csv
# Generate with automatic baseline matching
idp-cli generate-manifest \
--dir ./documents/ \
--baseline-dir ./validated-baselines/ \
--output manifest-with-baselines.csv
# Create test set and upload files (no manifest needed - use test set name)
idp-cli generate-manifest \
--dir ./documents/ \
--baseline-dir ./baselines/ \
--test-set "fcc example test" \
--stack-name IDP
# Create test set with manifest output
idp-cli generate-manifest \
--dir ./documents/ \
--baseline-dir ./baselines/ \
--test-set "fcc example test" \
--stack-name IDP \
--output test-manifest.csv

Test Set Creation: When using --test-set, the command:

  1. Requires --stack-name, --baseline-dir, and --dir
  2. Uploads input files to s3://test-set-bucket/{test-set-id}/input/
  3. Uploads baseline files to s3://test-set-bucket/{test-set-id}/baseline/
  4. Creates proper test set structure for evaluation workflows
  5. Test set will be auto-detected by the Test Studio UI

Process the created test set:

Terminal window
# Using test set ID (from UI or after creation)
idp-cli process --stack-name IDP --test-set fcc-example-test --monitor
# Or using S3 URI to process input files directly
idp-cli process --stack-name IDP --s3-uri s3://test-set-bucket/fcc-example-test/input/
# Or using manifest if generated
idp-cli process --stack-name IDP --manifest test-manifest.csv

Validate a manifest file without processing.

Usage:

Terminal window
idp-cli validate-manifest [OPTIONS]

Options:

  • --manifest (required): Path to manifest file to validate (CSV or JSON)

Examples:

Terminal window
# Validate a CSV manifest
idp-cli validate-manifest --manifest documents.csv
# Validate a JSON manifest
idp-cli validate-manifest --manifest documents.json
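A quick way to experiment is to generate a small CSV and run it through validate-manifest. The process command docs reference a populated baseline_source column; the document_path column name below is an assumption, so treat this CSV as a sketch and let validate-manifest confirm the schema your version expects:

```shell
# Build a tiny CSV manifest to test validation against.
# Column names are assumptions; validate-manifest will report the
# schema your version actually expects.
cat > documents.csv <<'EOF'
document_path,baseline_source
./docs/invoice.pdf,s3://my-baselines/invoice/
EOF
head -n1 documents.csv
```

Then run idp-cli validate-manifest --manifest documents.csv to check it.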

List recent batch processing jobs.

Usage:

Terminal window
idp-cli list-batches [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --limit: Maximum number of batches to list (default: 10)
  • --region: AWS region (optional)

Examples:

Terminal window
# List last 10 batches (default)
idp-cli list-batches --stack-name my-stack
# List last 5 batches
idp-cli list-batches --stack-name my-stack --limit 5
# List with specific region
idp-cli list-batches --stack-name my-stack --limit 20 --region us-west-2

This workflow demonstrates how to process documents, manually validate results, and then reprocess with evaluation to measure accuracy.

Step 1: Deploy the Stack

Deploy an IDP stack if you haven’t already:

Terminal window
idp-cli deploy \
--stack-name eval-testing \
--admin-email your.email@example.com \
--max-concurrent 50 \
--wait

What happens: CloudFormation creates ~120 resources including S3 buckets, Lambda functions, Step Functions, and DynamoDB tables. This takes 10-15 minutes.


Step 2: Initial Processing from Local Directory


Process your test documents to generate initial extraction results:

Terminal window
# Prepare test documents
mkdir -p ~/test-documents
cp /path/to/your/invoice.pdf ~/test-documents/
cp /path/to/your/w2.pdf ~/test-documents/
cp /path/to/your/paystub.pdf ~/test-documents/
# Process documents
idp-cli process \
--stack-name eval-testing \
--dir ~/test-documents/ \
--batch-id initial-run \
--monitor

What happens: Documents are uploaded to S3, processed through OCR, classification, extraction, assessment, and summarization. Results are stored in OutputBucket.

Monitor output:

✓ Uploaded 3 documents to InputBucket
✓ Sent 3 messages to processing queue
Monitoring Batch: initial-run
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Status Summary
─────────────────────────────────────
✓ Completed 3 100%
⏸ Queued 0 0%
✗ Failed 0 0%

Download the extraction results (sections) for manual review:

Terminal window
idp-cli download-results \
--stack-name eval-testing \
--batch-id initial-run \
--output-dir ~/initial-results/ \
--file-types sections

Result structure:

~/initial-results/initial-run/
├── invoice.pdf/
│ └── sections/
│ └── 1/
│ └── result.json # Extracted data to validate
├── w2.pdf/
│ └── sections/
│ └── 1/
│ └── result.json
└── paystub.pdf/
└── sections/
└── 1/
└── result.json

Step 4: Manual Validation & Baseline Preparation

Section titled “Step 4: Manual Validation & Baseline Preparation”

Review and correct the extraction results to create validated baselines.

4.1 Review extraction results:

Terminal window
# View extracted data for invoice
cat ~/initial-results/initial-run/invoice.pdf/sections/1/result.json | jq .
# Example output:
{
"attributes": {
"Invoice Number": "INV-2024-001",
"Invoice Date": "2024-01-15",
"Total Amount": "$1,250.00",
"Vendor Name": "Acme Corp"
}
}
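A small diff helper can make validation errors easier to spot. This is a sketch assuming the result.json layout shown above (a flat "attributes" object of key/value pairs); it is not part of the CLI:

```python
import json

def diff_attributes(extracted_path: str, corrected_path: str) -> dict:
    """Return {attribute: (extracted, corrected)} for every value that differs."""
    extracted = json.load(open(extracted_path))["attributes"]
    corrected = json.load(open(corrected_path))["attributes"]
    changed = {}
    # Union of keys catches attributes missing from either side
    for key in sorted(set(extracted) | set(corrected)):
        if extracted.get(key) != corrected.get(key):
            changed[key] = (extracted.get(key), corrected.get(key))
    return changed
```

Run it against the downloaded result and your corrected copy to list only the attributes you changed.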

4.2 Validate and correct:

Compare extracted values against the actual documents. If you find errors, create corrected baseline files:

Terminal window
# Create baseline directory structure
mkdir -p ~/validated-baselines/invoice.pdf/sections/1/
mkdir -p ~/validated-baselines/w2.pdf/sections/1/
mkdir -p ~/validated-baselines/paystub.pdf/sections/1/
# Copy and edit result files
cp ~/initial-results/initial-run/invoice.pdf/sections/1/result.json \
~/validated-baselines/invoice.pdf/sections/1/result.json
# Edit the baseline to correct any errors
vi ~/validated-baselines/invoice.pdf/sections/1/result.json
# Repeat for other documents...

Baseline directory structure:

~/validated-baselines/
├── invoice.pdf/
│ └── sections/
│ └── 1/
│ └── result.json # Corrected/validated data
├── w2.pdf/
│ └── sections/
│ └── 1/
│ └── result.json
└── paystub.pdf/
└── sections/
└── 1/
└── result.json

Step 5: Create Manifest with Baseline References

Section titled “Step 5: Create Manifest with Baseline References”

Create a manifest that links each document to its validated baseline:

Terminal window
cat > ~/evaluation-manifest.csv << EOF
document_path,baseline_source
/home/user/test-documents/invoice.pdf,/home/user/validated-baselines/invoice.pdf/
/home/user/test-documents/w2.pdf,/home/user/validated-baselines/w2.pdf/
/home/user/test-documents/paystub.pdf,/home/user/validated-baselines/paystub.pdf/
EOF

Manifest format:

  • document_path: Path to original document
  • baseline_source: Path to directory containing validated sections
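The same manifest can also be generated programmatically. A minimal sketch (the helper name and the one-baseline-directory-per-document layout are illustrative, not part of the CLI):

```python
import csv
from pathlib import Path

def write_eval_manifest(doc_dir: str, baseline_dir: str, out_path: str) -> int:
    """Write a document_path,baseline_source manifest row for every PDF in doc_dir."""
    docs = sorted(Path(doc_dir).glob("*.pdf"))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["document_path", "baseline_source"])
        for doc in docs:
            # Assumes one baseline directory per document, named after the file
            baseline = Path(baseline_dir) / doc.name
            writer.writerow([str(doc), f"{baseline}/"])
    return len(docs)
```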

Alternative using auto-matching:

Terminal window
# Generate manifest with automatic baseline matching
idp-cli generate-manifest \
--dir ~/test-documents/ \
--baseline-dir ~/validated-baselines/ \
--output ~/evaluation-manifest.csv

Reprocess documents with the baseline-enabled manifest. The accelerator will automatically run evaluation:

Terminal window
idp-cli run-inference \
--stack-name eval-testing \
--manifest ~/evaluation-manifest.csv \
--batch-id eval-run-001 \
--monitor

What happens:

  1. Documents are processed through the pipeline as before
  2. Evaluation step is automatically triggered because baselines are provided
  3. The evaluation module compares extracted values against baseline values
  4. Detailed metrics are calculated per attribute and per document

Processing time: Similar to initial run, plus ~5-10 seconds per document for evaluation.


Step 7: Download and Review Evaluation Results

Section titled “Step 7: Download and Review Evaluation Results”

Download the evaluation results to analyze accuracy:

✓ Synchronous Evaluation: Evaluation runs as the final step in the workflow before completion. When a document shows status “COMPLETE”, all processing including evaluation is finished - results are immediately available for download.

Terminal window
# Download evaluation results (no waiting needed)
idp-cli download-results \
--stack-name eval-testing \
--batch-id eval-run-001 \
--output-dir ~/eval-results/ \
--file-types evaluation
# Verify evaluation data is present
ls -la ~/eval-results/eval-run-001/invoice.pdf/evaluation/
# Should show: report.json and report.md

Review evaluation report:

Terminal window
# View detailed evaluation metrics
cat ~/eval-results/eval-run-001/invoice.pdf/evaluation/report.json | jq .
View human-readable report:

Terminal window
# Markdown report with visual formatting
cat ~/eval-results/eval-run-001/invoice.pdf/evaluation/report.md

Evaluation Analytics

The IDP Accelerator provides multiple ways to analyze evaluation results across batches and at scale.

Query Aggregated Results with Athena

The accelerator automatically stores evaluation metrics in Athena tables for SQL-based analysis.

Available Tables:

  • evaluation_results - Per-document evaluation metrics
  • evaluation_attributes - Per-attribute scores
  • evaluation_summary - Aggregated statistics

Example Queries:
-- Overall accuracy across all batches
SELECT
AVG(overall_accuracy) as avg_accuracy,
COUNT(*) as total_documents,
SUM(CASE WHEN overall_accuracy >= 0.95 THEN 1 ELSE 0 END) as high_accuracy_count
FROM evaluation_results
WHERE batch_id LIKE 'eval-run-%';
-- Attribute-level accuracy
SELECT
attribute_name,
AVG(score) as avg_score,
COUNT(*) as total_occurrences,
SUM(CASE WHEN match = true THEN 1 ELSE 0 END) as correct_count
FROM evaluation_attributes
GROUP BY attribute_name
ORDER BY avg_score DESC;
-- Compare accuracy across different configurations
SELECT
batch_id,
AVG(overall_accuracy) as accuracy,
COUNT(*) as doc_count
FROM evaluation_results
WHERE batch_id IN ('config-v1', 'config-v2', 'config-v3')
GROUP BY batch_id;

Access Athena:

Terminal window
# Get Athena database name from stack outputs
aws cloudformation describe-stacks \
--stack-name eval-testing \
--query 'Stacks[0].Outputs[?OutputKey==`ReportingDatabase`].OutputValue' \
--output text
# Query via AWS Console or CLI
aws athena start-query-execution \
--query-string "SELECT * FROM evaluation_results LIMIT 10" \
--result-configuration OutputLocation=s3://your-results-bucket/

For detailed Athena table schemas and query examples, see:


The IDP web UI provides an Agent Analytics feature for visual analysis of evaluation results.

Access the UI:

  1. Get web UI URL from stack outputs:
Terminal window
aws cloudformation describe-stacks \
--stack-name eval-testing \
--query 'Stacks[0].Outputs[?OutputKey==`ApplicationWebURL`].OutputValue' \
--output text
  2. Login with admin credentials (from deployment email)

  3. Navigate to Analytics → Agent Analytics

Available Analytics:

  • Accuracy Trends - Track accuracy over time across batches
  • Attribute Heatmaps - Visualize which attributes perform best/worst
  • Batch Comparisons - Compare different configurations side-by-side
  • Error Analysis - Identify common error patterns
  • Confidence Correlation - Analyze relationship between assessment confidence and accuracy

Key Features:

  • Interactive charts and visualizations
  • Filter by batch, date range, document type, or attribute
  • Export results to CSV for further analysis
  • Drill-down to individual document details

For complete Agent Analytics documentation, see:


Required Field:

  • document_path: Local file path or full S3 URI (s3://bucket/key)

Optional Field:

  • baseline_source: Path or S3 URI to validated baseline for evaluation

Note: Document IDs are auto-generated from filenames (e.g., invoice.pdf → invoice)

Examples:

document_path
/home/user/docs/invoice.pdf
/home/user/docs/w2.pdf
s3://external-bucket/statement.pdf
document_path,baseline_source
/local/invoice.pdf,s3://baselines/invoice/
/local/w2.pdf,/local/validated-baselines/w2/
s3://docs/statement.pdf,s3://baselines/statement/
[
{
"document_path": "/local/invoice.pdf",
"baseline_source": "s3://baselines/invoice/"
},
{
"document_path": "s3://bucket/w2.pdf",
"baseline_source": "/local/baselines/w2/"
}
]

Document Type (Auto-detected):

  • s3://... → S3 file (copied to InputBucket)
  • Absolute/relative path → Local file (uploaded to InputBucket)

Document ID (Auto-generated):

  • From filename without extension
  • Example: invoice-2024.pdf → invoice-2024
  • Subdirectories preserved: W2s/john.pdf → W2s/john
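The ID rules above can be sketched as follows (an illustrative reimplementation, not the CLI's actual code):

```python
from pathlib import PurePosixPath

def document_id(document_path: str, base_dir: str = "") -> str:
    """Derive a document ID: path relative to base_dir with the extension stripped."""
    path = PurePosixPath(document_path)
    if base_dir:
        # Keep subdirectories relative to the upload root
        path = path.relative_to(base_dir)
    return str(path.with_suffix(""))
```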

Important:

  • ⚠️ Duplicate filenames not allowed
  • ✅ Use directory structure for organization (e.g., clientA/invoice.pdf, clientB/invoice.pdf)
  • ✅ S3 URIs can reference any bucket (automatically copied)

Test different extraction prompts or configurations:

Terminal window
# Test with configuration v1
idp-cli deploy --stack-name my-stack --custom-config ./config-v1.yaml --wait
idp-cli run-inference --stack-name my-stack --dir ./test-set/ --batch-id config-v1 --monitor
# Download and analyze results
idp-cli download-results --stack-name my-stack --batch-id config-v1 --output-dir ./results-v1/
# Test with configuration v2
idp-cli deploy --stack-name my-stack --custom-config ./config-v2.yaml --wait
idp-cli run-inference --stack-name my-stack --dir ./test-set/ --batch-id config-v2 --monitor
# Compare in Athena
# SELECT batch_id, AVG(overall_accuracy) FROM evaluation_results
# WHERE batch_id IN ('config-v1', 'config-v2') GROUP BY batch_id;

Process thousands of documents efficiently:

Terminal window
# Generate manifest for large dataset
idp-cli generate-manifest \
--dir ./production-documents/ \
--output large-batch-manifest.csv
# Validate before processing
idp-cli validate-manifest --manifest large-batch-manifest.csv
# Process in background (no --monitor flag)
idp-cli run-inference \
--stack-name production-stack \
--manifest large-batch-manifest.csv \
--batch-id production-batch-001
# Check status later
idp-cli status \
--stack-name production-stack \
--batch-id production-batch-001

Integrate into automated pipelines:

#!/bin/bash
# ci-test.sh - Automated accuracy testing
# Run processing with evaluation
idp-cli run-inference \
--stack-name ci-stack \
--manifest test-suite-with-baselines.csv \
--batch-id ci-test-$BUILD_ID \
--monitor
# Download evaluation results
idp-cli download-results \
--stack-name ci-stack \
--batch-id ci-test-$BUILD_ID \
--output-dir ./ci-results/ \
--file-types evaluation
# Parse results and fail if accuracy below threshold
python check_accuracy.py ./ci-results/ --min-accuracy 0.90
# Exit code 0 if passed, 1 if failed
exit $?
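check_accuracy.py is not shipped with the CLI; one possible sketch is below. It assumes each downloaded evaluation/report.json exposes an overall accuracy value (the overall_accuracy field name is an assumption; verify it against your actual report.json):

```python
#!/usr/bin/env python3
"""Fail CI when mean evaluation accuracy drops below a threshold (sketch)."""
import argparse
import json
import sys
from pathlib import Path

def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("results_dir")
    parser.add_argument("--min-accuracy", type=float, default=0.90)
    args = parser.parse_args()

    # Assumed layout: <results_dir>/<batch>/<doc>/evaluation/report.json
    scores = []
    for report in Path(args.results_dir).rglob("evaluation/report.json"):
        data = json.loads(report.read_text())
        scores.append(data["overall_accuracy"])  # field name is an assumption

    if not scores:
        print("No evaluation reports found")
        return 1
    avg = sum(scores) / len(scores)
    print(f"Average accuracy: {avg:.3f} over {len(scores)} documents")
    return 0 if avg >= args.min_accuracy else 1

if __name__ == "__main__":
    sys.exit(main())
```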

Stop all running workflows for a stack. Useful for halting processing during development or when issues are detected.

Usage:

Terminal window
idp-cli stop-workflows [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --skip-purge: Skip purging the SQS queue
  • --skip-stop: Skip stopping Step Function executions
  • --region: AWS region (optional)

Examples:

Terminal window
# Stop all workflows (purge queue + stop executions)
idp-cli stop-workflows --stack-name my-stack
# Only purge the queue (don't stop running executions)
idp-cli stop-workflows --stack-name my-stack --skip-stop
# Only stop executions (don't purge queue)
idp-cli stop-workflows --stack-name my-stack --skip-purge

Run load tests by copying files to the input bucket at specified rates.

Usage:

Terminal window
idp-cli load-test [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --source-file (required): Source file to copy (local path or s3://bucket/key)
  • --rate: Files per minute (default: 100)
  • --duration: Duration in minutes (default: 1)
  • --schedule: CSV schedule file (minute,count) - overrides --rate and --duration
  • --dest-prefix: Destination prefix in input bucket (default: load-test)
  • --config-version: Configuration version to use for processing (default: active version)
  • --region: AWS region (optional)

Examples:

Terminal window
# Constant rate: 100 files/minute for 5 minutes
idp-cli load-test --stack-name my-stack --source-file samples/invoice.pdf --rate 100 --duration 5
# High volume: 2500 files/minute for 1 minute
idp-cli load-test --stack-name my-stack --source-file samples/invoice.pdf --rate 2500
# Use schedule file for variable rates
idp-cli load-test --stack-name my-stack --source-file samples/invoice.pdf --schedule schedule.csv
# Use S3 source file
idp-cli load-test --stack-name my-stack --source-file s3://my-bucket/test.pdf --rate 500
# Load test with a specific config version
idp-cli load-test --stack-name my-stack --source-file samples/invoice.pdf --rate 100 --config-version v2

Schedule File Format (CSV):

minute,count
1,100
2,200
3,500
4,1000
5,500

See lib/idp_cli_pkg/examples/load-test-schedule.csv for a sample schedule file.
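Before starting a long run, it can help to sanity-check a schedule file. A small sketch (not part of the CLI) that totals the minute,count rows:

```python
import csv

def summarize_schedule(path: str) -> tuple[int, int]:
    """Return (duration_minutes, total_files) for a minute,count schedule CSV."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    total = sum(int(row["count"]) for row in rows)
    duration = max(int(row["minute"]) for row in rows)
    return duration, total
```

For the sample schedule above, this reports a 5-minute run copying 2,300 files in total.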


Remove residual AWS resources left behind from deleted IDP CloudFormation stacks.

⚠️ CAUTION: This command permanently deletes AWS resources. Always run with --dry-run first.

Intended Use: This command is designed for development and test accounts where IDP stacks are frequently created and deleted, and where the consequences of accidentally deleting resources or data are low. Do not use this command in production accounts where data retention is critical. For production cleanup, manually review and delete resources through the AWS Console.

Usage:

Terminal window
idp-cli remove-deleted-stack-resources [OPTIONS]

How It Works:

This command safely identifies and removes ONLY resources belonging to IDP stacks that have been deleted:

  1. Multi-region Stack Discovery - Scans CloudFormation in multiple regions (us-east-1, us-west-2, eu-central-1 by default)
  2. IDP Stack Identification - Identifies IDP stacks by their Description (“AWS GenAI IDP Accelerator”) or naming patterns (IDP-*, PATTERN1/2/3)
  3. Active Stack Protection - Tracks both ACTIVE and DELETED stacks; resources from active stacks are NEVER touched
  4. Safe Cleanup - Only targets resources belonging to stacks in DELETE_COMPLETE state

Safety Features:

  • Resources from ACTIVE stacks are protected and skipped
  • Resources from UNKNOWN stacks (not verified as IDP) are skipped
  • Interactive confirmation for each resource (unless --yes)
  • Options: y=yes, n=no, a=yes to all of type, s=skip all of type
  • --dry-run mode shows exactly what would be deleted

Resources Cleaned:

  • CloudFront distributions and response header policies
  • CloudWatch log groups
  • AppSync APIs
  • IAM policies
  • CloudWatch Logs resource policy entries
  • S3 buckets (automatically emptied before deletion)
  • DynamoDB tables (PITR disabled before deletion)

Note: This command targets resources that remain in AWS after IDP stacks have already been deleted. These are typically resources with RetainOnDelete policies or non-empty S3 buckets that CloudFormation couldn’t delete. All resources are identified by their naming pattern and verified against the deleted stack registry before deletion.

Options:

  • --region: Primary AWS region for regional resources (default: us-west-2)
  • --profile: AWS profile to use
  • --dry-run: Preview changes without making them (RECOMMENDED first step)
  • --yes, -y: Auto-approve all deletions (skip confirmations)
  • --check-stack-regions: Comma-separated regions to check for stacks (default: us-east-1,us-west-2,eu-central-1)

Examples:

Terminal window
# RECOMMENDED: Always dry-run first to see what would be deleted
idp-cli remove-deleted-stack-resources --dry-run
# Interactive cleanup with confirmations for each resource
idp-cli remove-deleted-stack-resources
# Use specific AWS profile
idp-cli remove-deleted-stack-resources --profile my-profile
# Auto-approve all deletions (USE WITH CAUTION)
idp-cli remove-deleted-stack-resources --yes
# Check additional regions for stacks
idp-cli remove-deleted-stack-resources --check-stack-regions us-east-1,us-west-2,eu-central-1,eu-west-1

CloudFront Two-Phase Cleanup:

CloudFront requires distributions to be disabled before deletion:

  1. First run: Disables orphaned distributions (you confirm each)
  2. Wait 15-20 minutes for CloudFront global propagation
  3. Second run: Deletes the previously disabled distributions

Interactive Confirmation:

Delete orphaned CloudFront distribution?
Resource: E1H6W47Z36CQE2 (exists in AWS)
Originally from stack: IDP-P2-DevTest1
Stack status: DELETE_COMPLETE (stack no longer exists)
Stack was in region: us-west-2
Options: y=yes, n=no, a=yes to all CloudFront distribution, s=skip all CloudFront distribution
Delete? [y/n/a/s]:

Important Limitation - 90-Day Window:

CloudFormation only retains deleted stack information for approximately 90 days. After this period, stacks in DELETE_COMPLETE status are removed from the CloudFormation API.

This means:

  • Resources from stacks deleted within the past 90 days → Identified and offered for cleanup
  • Resources from stacks deleted more than 90 days ago → Not identified (silently skipped)

Best Practice: Run remove-deleted-stack-resources promptly after deleting IDP stacks, and in any case within 90 days of stack deletion, to ensure complete cleanup.


Generate an IDP configuration template from system defaults.

Usage:

Terminal window
idp-cli config-create [OPTIONS]

Options:

  • --features: Feature set (default: min)
    • min: classification, extraction, classes only (simplest)
    • core: min + ocr, assessment
    • all: all sections with full defaults
    • Or comma-separated list: "classification,extraction,summarization"
  • --output, -o: Output file path (default: stdout)
  • --include-prompts: Include full prompt templates (default: stripped for readability)
  • --no-comments: Omit explanatory header comments

Examples:

Terminal window
# Generate minimal config to stdout
idp-cli config-create
# Generate full config with all sections
idp-cli config-create --features all --output full-config.yaml
# Custom section selection
idp-cli config-create --features "classification,extraction,summarization" --output config.yaml

Validate a configuration file against system defaults and Pydantic models.

Usage:

Terminal window
idp-cli config-validate [OPTIONS]

Options:

  • --config-file, -f (required): Path to configuration file to validate
  • --show-merged: Show the full merged configuration
  • --strict: Fail validation if config contains unknown or deprecated fields

Examples:

Terminal window
# Validate a config file
idp-cli config-validate --config-file ./my-config.yaml
# Show full merged config
idp-cli config-validate --config-file ./config.yaml --show-merged
# Strict mode (fails if config has unknown or deprecated fields — useful for CI/CD)
idp-cli config-validate --config-file ./config.yaml --strict

Download configuration from a deployed IDP stack.

Usage:

Terminal window
idp-cli config-download [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --output, -o: Output file path (default: stdout)
  • --format: Output format - full (default) or minimal (only differences from defaults)
  • --config-version: Configuration version to download (e.g., v1, v2). If not specified, downloads active version
  • --region: AWS region (optional)

Examples:

Terminal window
# Download full config from active version
idp-cli config-download --stack-name my-stack --output config.yaml
# Download specific version
idp-cli config-download --stack-name my-stack --config-version v2 --output config.yaml
# Download minimal config (only customizations)
idp-cli config-download --stack-name my-stack --format minimal --output config.yaml
# Print to stdout
idp-cli config-download --stack-name my-stack

Upload a configuration file to a deployed IDP stack.

Usage:

Terminal window
idp-cli config-upload [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --config-file, -f (required): Path to configuration file (YAML or JSON)
  • --validate/--no-validate: Validate config before uploading (default: validate)
  • --config-version (required): Configuration version to update (e.g., default, v1, v2). If the version doesn’t exist, it will be created automatically.
  • --version-description: Description for the configuration version (used when creating new versions)
  • --region: AWS region (optional)

Examples:

Terminal window
# Upload config to active version
idp-cli config-upload --stack-name my-stack --config-file ./config.yaml --config-version default
# Update existing version
idp-cli config-upload --stack-name my-stack --config-file ./config.yaml --config-version Production
# Create new version with description
idp-cli config-upload --stack-name my-stack --config-file ./config.yaml --config-version NewVersion --version-description "Test configuration for new feature"
# Skip validation (use with caution)
idp-cli config-upload --stack-name my-stack --config-file ./config.yaml --config-version default --no-validate

List all configuration versions in a deployed IDP stack.

Usage:

Terminal window
idp-cli config-list [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --region: AWS region (optional)

Examples:

Terminal window
# List all configuration versions
idp-cli config-list --stack-name my-stack

Output: Shows a table with version names, active status, creation/update timestamps, and descriptions.


Activate a configuration version in a deployed IDP stack.

Automatic BDA Sync: If the configuration version has use_bda enabled, this command will automatically sync the configuration to BDA (Bedrock Data Automation) before activation. This ensures BDA blueprints are up-to-date and matches the UI behavior.

Usage:

Terminal window
idp-cli config-activate [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --config-version (required): Configuration version to activate
  • --region: AWS region (optional)

Examples:

Terminal window
# Activate a specific version
idp-cli config-activate --stack-name my-stack --config-version v2
# Activate default version
idp-cli config-activate --stack-name my-stack --config-version default

Behavior:

  1. Validates the configuration version exists
  2. If use_bda is enabled in the configuration:
    • Syncs IDP document classes to BDA blueprints
    • Creates a new BDA project if none exists
    • Updates BDA sync status
  3. Activates the configuration version
  4. All new document processing will use this configuration

Note: If BDA sync fails (when use_bda is enabled), the activation will be aborted to prevent processing errors.

Notes:

  • Sets the specified version as active for all new document processing
  • Version must exist (use config-list to see available versions)


config-delete

Delete a configuration version from a deployed IDP stack.

Usage:

Terminal window
idp-cli config-delete [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --config-version (required): Configuration version to delete
  • --force: Skip confirmation prompt
  • --region: AWS region (optional)

Examples:

Terminal window
# Delete a version with confirmation
idp-cli config-delete --stack-name my-stack --config-version old-version
# Delete without confirmation prompt
idp-cli config-delete --stack-name my-stack --config-version old-version --force

Restrictions:

  • Cannot delete the ‘default’ configuration version
  • Cannot delete currently active versions (activate another version first)
  • Includes confirmation prompt unless --force is used

What Happens:

  1. Loads and parses your YAML or JSON config file
  2. Validates against system defaults (unless --no-validate)
  3. If version exists: Updates the existing version with the uploaded configuration (saved as a complete snapshot)
  4. If version doesn’t exist: Creates a new version with the uploaded configuration
  5. Uploads to the stack’s ConfigurationTable in DynamoDB
  6. Configuration is immediately available for document processing

Configuration Versioning:

  • Existing version: Saves the uploaded configuration as the full version snapshot
  • New version: Creates a new independent version with the uploaded configuration
  • Version descriptions: Can be added to new versions for better organization

For full details on configuration versioning, see configuration-versions.md.

This uses the same mechanism as the Web UI configuration management system.


Discover document class schemas from sample documents using Amazon Bedrock.

Two modes:

  • Stack-connected (--stack-name): Uses stack’s discovery config and saves schema to DynamoDB configuration
  • Local (no --stack-name): Uses system default Bedrock settings, prints schema to stdout without saving

Ground truth matching: Ground truth files (-g) are auto-matched to documents (-d) by filename stem. For example, invoice.pdf matches invoice.json. Unmatched documents run without ground truth.
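The stem-matching behavior can be sketched as (illustrative, not the CLI's actual code):

```python
from pathlib import Path

def match_ground_truth(documents: list[str], ground_truths: list[str]) -> dict:
    """Pair each document with the ground-truth file sharing its filename stem."""
    by_stem = {Path(g).stem: g for g in ground_truths}
    # Documents without a matching stem map to None (discovery runs without GT)
    return {d: by_stem.get(Path(d).stem) for d in documents}
```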

Output behavior:

  • Single document: -o writes the schema to the specified file
  • Batch + -o is a directory (or has no extension): writes one {class_name}.json per schema
  • Batch + -o is a file: writes all schemas as a JSON array
Terminal window
# Single document (local mode — no stack needed)
idp-cli discover -d ./invoice.pdf
# With ground truth (matched by filename stem)
idp-cli discover -d ./invoice.pdf -g ./invoice.json
# Save schema to file
idp-cli discover -d ./form.pdf -o ./form-schema.json
# With class name hint (guides LLM to use specific class name)
idp-cli discover -d ./form.pdf --class-hint "W2 Tax Form"
# Batch with auto-matched ground truth
idp-cli discover -d ./invoice.pdf -d ./w2.pdf -g ./invoice.json -g ./w2.json
# Batch output to directory (one file per schema)
idp-cli discover -d ./invoice.pdf -d ./w2.pdf -o ./schemas/
# Batch output to single file (JSON array)
idp-cli discover -d ./invoice.pdf -d ./w2.pdf -o ./all-schemas.json
# Multi-section: discover specific page ranges from a single PDF
idp-cli discover -d ./lending_package.pdf \
--page-range "1-2" --page-label "Cover Letter" \
--page-range "3-5" --page-label "W2 Form" \
--page-range "6-8" --page-label "Bank Statement" \
-o ./schemas/
# Auto-detect sections then discover each
idp-cli discover -d ./lending_package.pdf --auto-detect -o ./schemas/
# Only detect section boundaries (no discovery)
idp-cli discover -d ./lending_package.pdf --auto-detect --detect-only
# Auto-detect with output to file
idp-cli discover -d ./lending_package.pdf --auto-detect --detect-only -o sections.json
# Stack mode (saves to config)
idp-cli discover --stack-name my-stack -d ./invoice.pdf --config-version v2
Options:

  • --stack-name: CloudFormation stack name (optional — omit for local mode)
  • -d, --document: Path to document file (required, repeatable for batch)
  • -g, --ground-truth: Path to JSON ground truth file(s) (repeatable, auto-matched by filename stem)
  • --config-version: Config version to save to (stack mode only)
  • -o, --output: Output path: file (single/JSON array) or directory (one file per schema)
  • --class-hint: Hint for the document class name (e.g., “W2 Form”). The LLM will use this as $id.
  • --page-range: Page range to discover (e.g., “1-3”). Repeatable for multi-section. Requires PDF.
  • --page-label: Label for the corresponding --page-range (e.g., “W2 Form”). Used as class name hint per range.
  • --auto-detect: Auto-detect document section boundaries using AI, then discover each section.
  • --detect-only: Only detect section boundaries (use with --auto-detect). Prints boundaries without running discovery.
  • --region: AWS region

Discover document classes from a collection of documents using embedding-based clustering and agentic analysis.

Unlike discover (which analyzes one document at a time), discover-multidoc analyzes a directory of mixed documents to automatically identify document types, cluster similar documents, and generate JSON Schemas for each discovered class.

Requires: pip install idp-common[multi_document_discovery] (scikit-learn, scipy, numpy, strands-agents)

Note: Requires at least 2 documents per expected class. Clusters with fewer than 2 documents are filtered as noise. For discovering schemas from individual documents, use discover instead.

Usage:

Terminal window
idp-cli discover-multidoc [OPTIONS]

Options:

  • --dir: Directory containing documents to analyze (recursive scan)
  • -d, --document: Individual document files (repeatable: -d doc1.pdf -d doc2.png)
  • --embedding-model: Bedrock embedding model ID (default: us.cohere.embed-v4:0)
  • --analysis-model: Bedrock LLM for cluster analysis (default: us.anthropic.claude-sonnet-4-6)
  • -o, --output: Output directory for discovered JSON schemas
  • --stack-name: CloudFormation stack name (required for --save-to-config)
  • --config-version: Configuration version to save schemas to
  • --save-to-config: Save discovered schemas to the stack’s configuration
  • --region: AWS region

Examples:

Terminal window
# Discover from a directory of documents
idp-cli discover-multidoc --dir ./samples/
# Discover with explicit files
idp-cli discover-multidoc -d doc1.pdf -d doc2.png -d doc3.jpg
# Save schemas to output directory
idp-cli discover-multidoc --dir ./samples/ -o ./schemas/
# Save to stack configuration
idp-cli discover-multidoc --dir ./samples/ --save-to-config \
--stack-name IDP --config-version v2
# Use custom models
idp-cli discover-multidoc --dir ./samples/ \
--embedding-model us.amazon.titan-embed-image-v1 \
--analysis-model us.anthropic.claude-sonnet-4-6

Pipeline stages (shown in Rich progress output):

  1. Document scan — Finds PDF, PNG, JPG, TIFF files in the directory
  2. Embedding — Generates image embeddings via Bedrock (Cohere Embed v4)
  3. Clustering — KMeans + silhouette analysis to find optimal number of clusters
  4. Analysis — Strands agent analyzes each cluster to identify the document class and generate a JSON Schema
  5. Reflection — Agent generates a summary report of all discovered classes

Output: Results table showing cluster ID, classification, document count, field count, and status. Optionally writes individual JSON schema files and a reflection report.
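The clustering stage relies on the silhouette coefficient to pick the number of clusters. The metric itself can be illustrated with a pure-Python sketch on 1-D values (the accelerator uses scikit-learn on Bedrock image embeddings, not this code):

```python
def silhouette(points: list[float], labels: list[int]) -> float:
    """Mean silhouette coefficient: (b - a) / max(a, b) per point, where
    a = mean distance within own cluster, b = mean distance to nearest other cluster."""
    def mean_dist(p, members):
        return sum(abs(p - q) for q in members) / len(members)

    clusters: dict[int, list[float]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = [q for q in clusters[label] if q != p]
        if not own:  # singleton cluster contributes 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist(p, own)
        b = min(mean_dist(p, clusters[m]) for m in clusters if m != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

A candidate cluster count with a higher mean silhouette separates the documents more cleanly, which is how silhouette analysis selects the optimal K.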


Synchronize IDP document class schemas with BDA (Bedrock Data Automation) blueprints.

Usage:

Terminal window
idp-cli config-sync-bda [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --direction: Sync direction — bidirectional (default), bda-to-idp, or idp-to-bda
  • --mode: Sync mode — replace (default, full alignment) or merge (additive, don’t delete)
  • --config-version: Configuration version to sync (default: active version)
  • --region: AWS region (optional)

Examples:

Terminal window
# Bidirectional sync (default)
idp-cli config-sync-bda --stack-name my-stack
# Import BDA blueprints into IDP config
idp-cli config-sync-bda --stack-name my-stack --direction bda-to-idp
# Push IDP classes to BDA blueprints
idp-cli config-sync-bda --stack-name my-stack --direction idp-to-bda
# Merge mode (additive — don't remove existing items)
idp-cli config-sync-bda --stack-name my-stack --direction bda-to-idp --mode merge
# Sync specific config version
idp-cli config-sync-bda --stack-name my-stack --config-version v2

Interactive Agent Companion Chat from the terminal. Provides access to the full multi-agent orchestrator including Analytics, Error Analyzer, Code Intelligence, and any configured External MCP Agents.

The chat command runs the same orchestrator as the Web UI’s Agent Companion Chat, but locally in your terminal — with real-time streaming and multi-turn conversation support.

Usage:

Terminal window
idp-cli chat [OPTIONS]

Options:

  • --stack-name (required): CloudFormation stack name
  • --region: AWS region (optional)
  • --prompt: Single-shot prompt — sends one message, prints the response, and exits. Useful for scripts and CI/CD.
  • --enable-code-intelligence: Enable the Code Intelligence Agent (disabled by default because it uses external third-party services)

Examples:

Terminal window
# Interactive mode — multi-turn conversation
idp-cli chat --stack-name my-stack
# Single-shot mode — for scripts and automation
idp-cli chat --stack-name my-stack --prompt "What is the avg accuracy for the last test run?"
# With Code Intelligence enabled
idp-cli chat --stack-name my-stack --enable-code-intelligence
# Pipe output in scripts
idp-cli chat --stack-name my-stack --prompt "How many documents failed today?" 2>/dev/null

Interactive session example:

IDP Agent Chat
Stack: my-stack
✓ Ready Agents: Analytics Agent · Error Analyzer Agent · Code Intelligence Agent
Type /quit to exit
You: What is the avg accuracy for test run Fake-W2-Tax-Forms-20260320?
⟶ Analytics Agent
The average accuracy for test run Fake-W2-Tax-Forms-20260320 is 0.867 (86.7%) across 95 documents.
You: Break that down by document type
⟶ Analytics Agent
...
You: /quit
Goodbye.

SDK usage:

from idp_sdk import IDPClient
client = IDPClient(stack_name="my-stack")
# Single message
resp = client.chat.send_message("How many documents were processed today?")
print(resp.response)
# Multi-turn conversation
resp2 = client.chat.send_message("Break down by type", session_id=resp.session_id)
print(resp2.response)
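The session_id threading shown above generalizes to a simple conversation loop. The sketch below substitutes a fake client so it runs standalone; in practice you would pass `IDPClient(stack_name=...).chat` instead (the FakeChat class and its echo behavior are stand-ins, not part of the SDK):

```python
from dataclasses import dataclass

@dataclass
class ChatResponse:
    response: str
    session_id: str

class FakeChat:
    """Stand-in for client.chat; replace with IDPClient(stack_name=...).chat."""
    def send_message(self, text, session_id=None):
        sid = session_id or "sess-1"
        return ChatResponse(response=f"echo: {text}", session_id=sid)

def run_conversation(chat, prompts):
    """Send prompts as one conversation, threading session_id between turns."""
    session_id, replies = None, []
    for prompt in prompts:
        resp = chat.send_message(prompt, session_id=session_id)
        session_id = resp.session_id  # reuse the session for the next turn
        replies.append(resp.response)
    return replies

print(run_conversation(FakeChat(), ["How many docs today?", "Break down by type"]))
```

The key point is that each response carries the session_id to pass into the next `send_message` call, which is what makes follow-up questions like "Break that down by document type" resolve against the earlier turn.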

Prerequisites:

  • Requires idp_common[agents] to be installed: pip install -e 'lib/idp_common_pkg[agents]'
  • Requires Amazon Bedrock model access (Claude or Nova models)
  • Stack must be deployed with Agent Companion Chat resources (DynamoDB tables, Athena database)

Error: Stack 'my-stack' is not in a valid state

Solution:

Terminal window
# Verify the stack exists and is in a healthy state (e.g. CREATE_COMPLETE or UPDATE_COMPLETE)
aws cloudformation describe-stacks --stack-name my-stack --query 'Stacks[0].StackStatus'

Error: Access Denied when uploading files

Solution: Ensure AWS credentials have permissions for:

  • S3: PutObject, GetObject on InputBucket/OutputBucket
  • SQS: SendMessage on DocumentQueue
  • Lambda: InvokeFunction on LookupFunction
  • CloudFormation: DescribeStacks, ListStackResources

Error: Duplicate filenames found

Solution: Ensure filenames are unique, or distinguish duplicates by directory path in the manifest:

document_path
./clientA/invoice.pdf
./clientB/invoice.pdf
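Collisions like this can be caught before submitting a batch. The following pre-flight check is an illustrative stdlib-only sketch (it assumes a CSV manifest with a document_path column, as in the example above):

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath

def duplicate_basenames(manifest_csv: str) -> list[str]:
    """Return filenames that appear more than once in the document_path column."""
    rows = csv.DictReader(io.StringIO(manifest_csv))
    counts = Counter(PurePosixPath(row["document_path"]).name for row in rows)
    return sorted(name for name, n in counts.items() if n > 1)

manifest = """document_path
./clientA/invoice.pdf
./clientB/invoice.pdf
./clientB/receipt.pdf
"""
print(duplicate_basenames(manifest))  # ['invoice.pdf']
```

Running this over the manifest before invoking the CLI turns a mid-batch failure into an immediate, actionable report.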

Issue: Evaluation results missing even with baselines

Checklist:

  1. Verify baseline_source column exists in manifest
  2. Confirm baseline paths are correct and accessible
  3. Check baseline directory has correct structure (sections/1/result.json)
  4. Review CloudWatch logs for EvaluationFunction
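Item 3 of the checklist can be automated. This sketch assumes baselines are staged on local disk before upload (one subdirectory per document) and reports any folder missing the expected sections/1/result.json file:

```python
from pathlib import Path

def check_baseline_dir(baseline_root: str) -> list[str]:
    """Report baseline folders missing the expected sections/1/result.json file."""
    problems = []
    for doc_dir in sorted(p for p in Path(baseline_root).iterdir() if p.is_dir()):
        expected = doc_dir / "sections" / "1" / "result.json"
        if not expected.is_file():
            problems.append(str(expected))
    return problems

if __name__ == "__main__":
    import sys
    missing = check_baseline_dir(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("\n".join(missing) or "all baselines look complete")
```

If baselines live in S3 rather than locally, the same structural check can be done against `aws s3 ls` output instead.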

Issue: Cannot retrieve document status

Solution:

Terminal window
# Verify LookupFunction exists
aws lambda get-function --function-name <LookupFunctionName>
# Check CloudWatch logs
aws logs tail /aws/lambda/<LookupFunctionName> --follow

Run the test suite:

Terminal window
cd lib/idp_cli_pkg
pytest

Run specific tests:

Terminal window
pytest tests/test_manifest_parser.py -v

For issues or questions:

  • Check CloudWatch logs for Lambda functions
  • Review AWS Console for resource status
  • Open an issue on GitHub