IDP CLI - Command Line Interface for Batch Document Processing
A command-line tool for batch document processing with the GenAI IDP Accelerator.
Features
✨ Batch Processing - Process multiple documents from CSV/JSON manifests
📊 Live Progress Monitoring - Real-time updates with rich terminal UI
🔄 Resume Monitoring - Stop and resume monitoring without affecting processing
📁 Flexible Input - Support for local files and S3 references
🔍 Comprehensive Status - Track queued, running, completed, and failed documents
📈 Batch Analytics - Success rates, durations, and detailed error reporting
🎯 Evaluation Framework - Validate accuracy against baselines with detailed metrics
💬 Agent Chat - Interactive Agent Companion Chat from the terminal with Analytics, Error Analyzer, and more
Demo:
Table of Contents
- Installation
- Quick Start
- Commands Reference
- deploy
- delete
- process
- reprocess
- status
- download-results
- delete-documents
- generate-manifest
- validate-manifest
- list-batches
- stop-workflows
- load-test
- discover
- discover-multidoc
- remove-deleted-stack-resources
- config-create
- config-validate
- config-download
- config-upload
- config-list
- config-activate
- config-delete
- chat
- Complete Evaluation Workflow
- Evaluation Analytics
- Manifest Format Reference
- Advanced Usage
- Troubleshooting
Installation
Prerequisites
- Python 3.12 or higher
- AWS credentials configured (via AWS CLI or environment variables)
- An active IDP Accelerator CloudFormation stack
Install from source
```bash
make setup-venv
source .venv/bin/activate
```
Install with test dependencies
```bash
cd lib/idp_cli_pkg
pip install -e ".[test]"
```
Quick Start
Global Options
The CLI supports an optional `--profile` parameter to specify which AWS credentials profile to use:
```bash
idp-cli --profile my-profile <command> [options]
```
- Can be placed anywhere in the command
- Only affects that specific command execution
- Automatically applies to all AWS SDK calls
- If not specified, uses default AWS credentials
Examples:
```bash
# Profile before command
idp-cli --profile production deploy --stack-name my-stack ...

# Profile after command
idp-cli deploy --profile production --stack-name my-stack ...

# Profile at the end
idp-cli deploy --stack-name my-stack --profile production ...
```
Deploy a stack and process documents in 3 commands:
```bash
# 1. Deploy stack (10-15 minutes)
idp-cli deploy \
  --stack-name my-idp-stack \
  --admin-email your.email@example.com \
  --wait

# 2. Process documents from a local directory
idp-cli process \
  --stack-name my-idp-stack \
  --dir ./my-documents/ \
  --monitor

# 3. Download results
idp-cli download-results \
  --stack-name my-idp-stack \
  --batch-id <batch-id-from-step-2> \
  --output-dir ./results/
```
That’s it! Your documents are processed with OCR, classification, extraction, assessment, and summarization.
For evaluation workflows with accuracy metrics, see the Complete Evaluation Workflow section.
Commands Reference
deploy
Deploy or update an IDP CloudFormation stack.
Usage:
```bash
idp-cli deploy [OPTIONS]
```
Required for New Stacks:
- `--stack-name`: CloudFormation stack name
- `--admin-email`: Admin user email
Optional Parameters:
- `--from-code`: Deploy from local code by building and publishing artifacts (path to project root)
- `--template-url`: URL to CloudFormation template in S3 (optional, auto-selected based on region)
- `--custom-config`: Path to local config file or S3 URI
- `--max-concurrent`: Maximum concurrent workflows (default: 100)
- `--log-level`: Logging level (`DEBUG`, `INFO`, `WARN`, `ERROR`) (default: `INFO`)
- `--enable-hitl`: Enable Human-in-the-Loop (`true` or `false`)
- `--parameters`: Additional parameters as `key=value,key2=value2`
- `--wait`: Wait for stack operation to complete
- `--no-rollback`: Disable rollback on stack creation failure
- `--region`: AWS region (optional, auto-detected)
- `--role-arn`: CloudFormation service role ARN (optional)
Note: --from-code and --template-url are mutually exclusive. Use --from-code for development/testing from local source, or --template-url for production deployments.
Auto-Monitoring for In-Progress Operations:
If you run deploy on a stack that already has an operation in progress (CREATE, UPDATE, ROLLBACK), the command automatically switches to monitoring mode instead of failing. This is useful if you forgot to use --wait on the initial deploy - simply run the same command again to monitor progress:
```bash
# First run without --wait starts the deployment
$ idp-cli deploy --stack-name my-stack --admin-email user@example.com
✓ Stack CREATE initiated successfully!

# Second run - automatically monitors the in-progress operation
$ idp-cli deploy --stack-name my-stack
Stack 'my-stack' has an operation in progress
Current status: CREATE_IN_PROGRESS
Switching to monitoring mode...
[Live progress display...]
✓ Stack CREATE completed successfully!
```
Supported in-progress states: CREATE_IN_PROGRESS, UPDATE_IN_PROGRESS, DELETE_IN_PROGRESS, ROLLBACK_IN_PROGRESS, UPDATE_ROLLBACK_IN_PROGRESS, and cleanup states.
Examples:
```bash
# Create new stack
idp-cli deploy \
  --stack-name my-idp \
  --admin-email user@example.com \
  --wait

# Update with custom config
idp-cli deploy \
  --stack-name my-idp \
  --custom-config ./updated-config.yaml \
  --wait

# Update parameters
idp-cli deploy \
  --stack-name my-idp \
  --max-concurrent 200 \
  --log-level DEBUG \
  --wait

# Deploy with custom template URL (for regions not auto-supported)
idp-cli deploy \
  --stack-name my-idp \
  --admin-email user@example.com \
  --template-url https://s3.eu-west-1.amazonaws.com/my-bucket/idp-main.yaml \
  --region eu-west-1 \
  --wait

# Deploy with CloudFormation service role and permissions boundary
idp-cli deploy \
  --stack-name my-idp \
  --admin-email user@example.com \
  --role-arn arn:aws:iam::123456789012:role/IDP-Cloudformation-Service-Role \
  --parameters "PermissionsBoundaryArn=arn:aws:iam::123456789012:policy/MyPermissionsBoundary" \
  --wait

# Deploy from local source code (for development/testing)
idp-cli deploy \
  --stack-name my-idp-dev \
  --from-code . \
  --admin-email user@example.com \
  --wait

# Update existing stack from local code changes
idp-cli deploy \
  --stack-name my-idp-dev \
  --from-code . \
  --wait

# Deploy with rollback disabled (useful for debugging failed deployments)
idp-cli deploy \
  --stack-name my-idp \
  --admin-email user@example.com \
  --no-rollback \
  --wait
```
delete
Delete an IDP CloudFormation stack.
⚠️ WARNING: This permanently deletes all stack resources.
Usage:
```bash
idp-cli delete [OPTIONS]
```
Options:
- `--stack-name` (required): CloudFormation stack name
- `--force`: Skip confirmation prompt
- `--empty-buckets`: Empty S3 buckets before deletion (required if buckets contain data)
- `--force-delete-all`: Force delete ALL remaining resources after CloudFormation deletion (S3 buckets, CloudWatch logs, DynamoDB tables)
- `--wait`: Wait for deletion to complete (default: no-wait)
- `--region`: AWS region (optional)
S3 Bucket Behavior:
- LoggingBucket: `DeletionPolicy: Retain` - always kept (unless using `--force-delete-all`)
- All other buckets: `DeletionPolicy: RetainExceptOnCreate` - deleted if empty
- CloudFormation can ONLY delete S3 buckets if they’re empty
- Use `--empty-buckets` to automatically empty buckets before deletion
- Use `--force-delete-all` to delete ALL remaining resources after CloudFormation completes
Force Delete All Behavior:
The --force-delete-all flag performs a comprehensive cleanup AFTER CloudFormation deletion completes:
1. CloudFormation Deletion Phase: Standard stack deletion
2. Additional Resource Cleanup Phase (happens with `--wait` on all deletions and always with `--force-delete-all`): Removes stack-specific resources not tracked by CloudFormation:
   - CloudWatch Log Groups (Lambda functions, Glue crawlers)
   - AppSync APIs and their log groups
   - CloudFront distributions (two-phase cleanup - initiates disable, takes 15-20 minutes to propagate globally)
   - CloudFront Response Headers Policies (from previously deleted stacks)
   - IAM custom policies and permissions boundaries
   - CloudWatch Logs resource policies
3. Retained Resource Cleanup Phase (only with `--force-delete-all`): Deletes remaining resources in order:
   - DynamoDB tables (disables PITR, then deletes)
   - CloudWatch Log Groups (matching stack name pattern)
   - S3 buckets (regular buckets first, LoggingBucket last)
Resources Always Cleaned Up (with --wait or --force-delete-all):
- IAM custom policies (containing stack name)
- IAM permissions boundary policies
- CloudFront response header policies (custom)
- CloudWatch Logs resource policies (stack-specific)
- AppSync log groups
- Additional log groups containing stack name
- Gracefully handles missing/already-deleted resources
Resources Deleted Only by --force-delete-all:
- All DynamoDB tables from stack
- All CloudWatch Log Groups (retained by CloudFormation)
- All S3 buckets including LoggingBucket
- Handles nested stack resources automatically
Examples:
```bash
# Interactive deletion with confirmation
idp-cli delete --stack-name test-stack

# Automated deletion (CI/CD)
idp-cli delete --stack-name test-stack --force

# Delete with automatic bucket emptying
idp-cli delete --stack-name test-stack --empty-buckets --force

# Force delete ALL remaining resources (comprehensive cleanup)
idp-cli delete --stack-name test-stack --force-delete-all --force

# Delete without waiting
idp-cli delete --stack-name test-stack --force --no-wait
```
What you’ll see (standard deletion):
```
⚠️  WARNING: Stack Deletion
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stack: test-stack
Region: us-east-1

S3 Buckets:
  • InputBucket: 20 objects (45.3 MB)
  • OutputBucket: 20 objects (123.7 MB)
  • WorkingBucket: empty

⚠️  Buckets contain data!
This action cannot be undone.

Are you sure you want to delete this stack? [y/N]: _
```
What you’ll see (force-delete-all):
```
⚠️  WARNING: FORCE DELETE ALL RESOURCES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stack: test-stack
Region: us-east-1

S3 Buckets:
  • InputBucket: 20 objects (45.3 MB)
  • OutputBucket: 20 objects (123.7 MB)
  • LoggingBucket: 5000 objects (2.3 GB)

⚠️  FORCE DELETE ALL will remove:
  • All S3 buckets (including LoggingBucket)
  • All CloudWatch Log Groups
  • All DynamoDB Tables
  • Any other retained resources

This happens AFTER CloudFormation deletion completes
This action cannot be undone.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Are you ABSOLUTELY sure you want to force delete ALL resources? [y/N]: y

Deleting CloudFormation stack...
✓ Stack deleted successfully!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Starting force cleanup of retained resources...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Analyzing retained resources...
Found 4 retained resources:
  • DynamoDB Tables: 0
  • CloudWatch Logs: 0
  • S3 Buckets: 3

⠋ Deleting S3 buckets... 3/3

✓ Cleanup phase complete!

Resources deleted:
  • S3 Buckets: 3
    - test-stack-inputbucket-abc123
    - test-stack-outputbucket-def456
    - test-stack-loggingbucket-ghi789

Stack 'test-stack' and all resources completely removed.
```
Use Cases:
- Cleanup test/development environments to avoid charges
- CI/CD pipelines that provision and teardown stacks
- Automated testing with temporary stack creation
- Complete removal of failed stacks with retained resources
- Cleanup of stacks with LoggingBucket and CloudWatch logs
Important Notes:
- `--force-delete-all` automatically includes `--empty-buckets` behavior
- Cleanup phase runs even if CloudFormation deletion fails
- Includes resources from nested stacks automatically
- Safe to run - only deletes resources that weren’t deleted by CloudFormation
- Progress bars show real-time deletion status
Auto-Monitoring for In-Progress Deletions:
If you run delete on a stack that already has a DELETE operation in progress, the command automatically switches to monitoring mode instead of failing. This is useful if you started a deletion without --wait - simply run the command again to monitor:
```bash
# First run without --wait starts the deletion
$ idp-cli delete --stack-name test-stack --force --no-wait
✓ Stack DELETE initiated successfully!

# Second run - automatically monitors the in-progress deletion
$ idp-cli delete --stack-name test-stack
Stack 'test-stack' is already being deleted
Current status: DELETE_IN_PROGRESS
Switching to monitoring mode...
[Live progress display...]
✓ Stack deleted successfully!
```
Canceling In-Progress Operations:
If a non-delete operation is in progress (CREATE, UPDATE), the delete command offers options to handle it:
```
$ idp-cli delete --stack-name test-stack
Stack 'test-stack' has an operation in progress: CREATE_IN_PROGRESS

Options:
  1. Wait for CREATE to complete first
  2. Cancel the CREATE and proceed with deletion

Do you want to cancel the CREATE and delete the stack? [yes/no/wait]: _
```
- yes: Cancel the operation (if possible) and proceed with deletion
- no: Exit without making changes
- wait: Wait for the current operation to complete, then delete
With the `--force` flag, the command automatically cancels the operation and proceeds with deletion:
```
# Force mode - automatically cancels and deletes
$ idp-cli delete --stack-name test-stack --force
Force mode: Canceling operation and proceeding with deletion...
✓ Stack reached stable state: ROLLBACK_COMPLETE
Proceeding with stack deletion...
```
Note: CREATE operations cannot be cancelled directly - they must complete or roll back naturally. UPDATE operations can be cancelled immediately.
process / run-inference
Process a batch of documents.
Usage:
```bash
idp-cli process [OPTIONS]
# or (deprecated alias)
idp-cli run-inference [OPTIONS]
```
Document Source (choose ONE):
- `--manifest`: Path to manifest file (CSV or JSON)
- `--dir`: Local directory containing documents
- `--s3-uri`: S3 URI in InputBucket
- `--test-set`: Test set ID from test set bucket
Options:
- `--stack-name` (required): CloudFormation stack name
- `--batch-id`: Custom batch ID (auto-generated if omitted; ignored with `--test-set`)
- `--batch-prefix`: Prefix for auto-generated batch ID (default: `cli-batch`)
- `--file-pattern`: File pattern for directory/S3 scanning (default: `*.pdf`)
- `--recursive`/`--no-recursive`: Include subdirectories (default: recursive)
- `--number-of-files`: Limit number of files to process
- `--config`: Path to configuration YAML file (optional)
- `--config-version`: Configuration version to use for processing (e.g., v1, v2)
- `--context`: Context description for test run (used with `--test-set`, e.g., "Model v2.1", "Production validation")
- `--monitor`: Monitor progress until completion
- `--refresh-interval`: Seconds between status checks (default: 5)
- `--region`: AWS region (optional)
Test Set Integration: For test runs to appear properly in the Test Studio UI, use either:
- `--test-set`: Process a test set directly by ID (recommended for test sets)
- `--manifest`: Use a manifest file with a populated `baseline_source` column for evaluation tracking
Other options (`--dir`, `--s3-uri`) are for general document processing but won’t integrate with Test Studio tracking.
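Auto-generated batch IDs combine the `--batch-prefix` with a timestamp. A minimal sketch of that scheme, assuming the timestamp format seen in the examples (e.g. `cli-batch-20251015-143000`); the CLI's actual generation code may differ:

```python
# Illustrative only: build a batch ID matching the pattern shown in
# this document's examples (prefix + YYYYMMDD-HHMMSS timestamp).
from datetime import datetime

def make_batch_id(prefix="cli-batch", now=None):
    """Return an ID like cli-batch-20251015-143000 (assumed format)."""
    now = now or datetime.now()
    return f"{prefix}-{now.strftime('%Y%m%d-%H%M%S')}"

print(make_batch_id(now=datetime(2025, 10, 15, 14, 30, 0)))  # cli-batch-20251015-143000
```

Passing `--batch-id` explicitly (as in the examples below) bypasses this generation entirely.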
Examples:
```bash
# Process from local directory
idp-cli process \
  --stack-name my-stack \
  --dir ./documents/ \
  --monitor

# Process from manifest with baselines (enables evaluation)
idp-cli process \
  --stack-name my-stack \
  --manifest documents-with-baselines.csv \
  --monitor

# Process from manifest with limited files
idp-cli process \
  --stack-name my-stack \
  --manifest documents-with-baselines.csv \
  --number-of-files 10 \
  --monitor

# Process test set (integrates with Test Studio UI - use test set ID)
idp-cli process \
  --stack-name my-stack \
  --test-set fcc-example-test \
  --monitor

# Process test set with limited files for quick testing
idp-cli process \
  --stack-name my-stack \
  --test-set fcc-example-test \
  --number-of-files 5 \
  --monitor

# Process test set with custom context (for tracking in Test Studio)
idp-cli process \
  --stack-name my-stack \
  --test-set fcc-example-test \
  --context "Model v2.1 - improved prompts" \
  --monitor

# Process S3 URI
idp-cli process \
  --stack-name my-stack \
  --s3-uri archive/2024/ \
  --monitor

# Process with specific configuration version
idp-cli process \
  --stack-name my-stack \
  --dir ./documents/ \
  --config-version v2 \
  --monitor

# Process test set with configuration version
idp-cli process \
  --stack-name my-stack \
  --test-set fcc-example-test \
  --config-version v1 \
  --context "Testing with config v1" \
  --monitor
```
reprocess / rerun-inference
Reprocess existing documents from a specific pipeline step.
Usage:
```bash
idp-cli reprocess [OPTIONS]
# or (deprecated alias)
idp-cli rerun-inference [OPTIONS]
```
Use Cases:
- Test different classification or extraction configurations without re-running OCR
- Fix classification errors and reprocess extraction
- Iterate on prompt engineering rapidly
Options:
- `--stack-name` (required): CloudFormation stack name
- `--step` (required): Pipeline step to rerun from (`classification` or `extraction`)
- Document Source (choose ONE):
  - `--document-ids`: Comma-separated document IDs
  - `--batch-id`: Batch ID to get all documents from
- `--force`: Skip confirmation prompt (useful for automation)
- `--monitor`: Monitor progress until completion
- `--refresh-interval`: Seconds between status checks (default: 5)
- `--region`: AWS region (optional)
Step Behavior:
- `classification`: Clears page classifications and sections, reruns classification → extraction → assessment
- `extraction`: Keeps classifications, clears extraction data, reruns extraction → assessment
Examples:
```bash
# Rerun classification for specific documents
idp-cli reprocess \
  --stack-name my-stack \
  --step classification \
  --document-ids "batch-123/doc1.pdf,batch-123/doc2.pdf" \
  --monitor

# Rerun extraction for entire batch
idp-cli reprocess \
  --stack-name my-stack \
  --step extraction \
  --batch-id cli-batch-20251015-143000 \
  --monitor

# Automated rerun (skip confirmation - perfect for CI/CD)
idp-cli reprocess \
  --stack-name my-stack \
  --step classification \
  --batch-id test-set \
  --force \
  --monitor
```
What Gets Cleared:
| Step | Clears | Keeps |
|---|---|---|
| `classification` | Page classifications, sections, extraction results | OCR data (pages, images, text) |
| `extraction` | Section extraction results, attributes | OCR data, page classifications, section structure |
Benefits:
- Leverages existing OCR data (saves time and cost)
- Rapid iteration on classification/extraction configurations
- Perfect for prompt engineering experiments
Demo:
status
Check status of documents by batch ID, document ID, or search criteria.
Usage:
```bash
idp-cli status [OPTIONS]
```
Document Source (choose ONE):
- `--batch-id`: Batch identifier or PK substring to search for (searches tracking table)
- `--document-id`: Single document ID (check individual document)
Optional Filters and Display:
- `--object-status`: Filter by status (COMPLETED, FAILED, QUEUED, RUNNING, PROCESSING)
- `--show-details`: Show detailed document information in table format
- `--get-time`: Calculate and display timing statistics (processing time, queue time, total time)
- `--include-metering`: Include Lambda metering statistics (GB-seconds by stage) - requires `--get-time`
Other Options:
- `--stack-name` (required): CloudFormation stack name
- `--wait`: Wait for all documents to complete
- `--refresh-interval`: Seconds between status checks (default: 5)
- `--format`: Output format - `table` (default) or `json`
- `--region`: AWS region (optional)
How --batch-id Works:
The --batch-id option performs a PK substring search in the DynamoDB tracking table. This means:
- It searches for all documents where the PK (primary key) contains your search string
- You can search for exact batch IDs: `cli-batch-20251015-143000`
- You can search for partial matches: `batch-123` finds all documents with “batch-123” in their path
- You can search across multiple batches: `invoice` finds all documents with “invoice” in their name
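The matching behavior described above amounts to a substring filter over tracking records, optionally combined with a status filter. A hedged sketch, not the CLI's actual implementation; the record shape (`PK`, `ObjectStatus` fields) is an assumption for illustration:

```python
# Illustrative PK substring search over tracking-table records.
# Field names here are assumed, not confirmed CLI internals.
def find_documents(records, batch_id, object_status=None):
    """Return records whose PK contains batch_id, optionally filtered by status."""
    matches = [r for r in records if batch_id in r["PK"]]
    if object_status:
        matches = [r for r in matches if r.get("ObjectStatus") == object_status]
    return matches

records = [
    {"PK": "doc#batch-123/invoice.pdf", "ObjectStatus": "COMPLETED"},
    {"PK": "doc#batch-123/w2.pdf", "ObjectStatus": "FAILED"},
    {"PK": "doc#other/invoice-2.pdf", "ObjectStatus": "COMPLETED"},
]
print(len(find_documents(records, "batch-123")))            # 2
print(len(find_documents(records, "invoice")))              # 2 (matches across batches)
print(len(find_documents(records, "batch-123", "FAILED")))  # 1
```

Because the match is a plain substring test, short search strings like `test` can match more batches than intended.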
Examples:
```bash
# Search for all documents in a batch (PK substring search)
idp-cli status \
  --stack-name my-stack \
  --batch-id cli-batch-20251015-143000

# Search for documents across batches with partial match
idp-cli status \
  --stack-name my-stack \
  --batch-id batch-123

# Search for completed documents only
idp-cli status \
  --stack-name my-stack \
  --batch-id batch-123 \
  --object-status COMPLETED

# Search for failed documents with details
idp-cli status \
  --stack-name my-stack \
  --batch-id batch-123 \
  --object-status FAILED \
  --show-details

# Search with timing statistics
idp-cli status \
  --stack-name my-stack \
  --batch-id batch-123 \
  --object-status COMPLETED \
  --get-time

# Search with timing and Lambda metering data
idp-cli status \
  --stack-name my-stack \
  --batch-id test \
  --object-status COMPLETED \
  --get-time \
  --include-metering

# Check single document status
idp-cli status \
  --stack-name my-stack \
  --document-id batch-123/invoice.pdf

# Monitor documents until completion
idp-cli status \
  --stack-name my-stack \
  --batch-id batch-123 \
  --wait

# Get JSON output for scripting
idp-cli status \
  --stack-name my-stack \
  --batch-id batch-123 \
  --format json
```
Timing Statistics:
When using --get-time, the command calculates:
- Processing Time: WorkflowStartTime → CompletionTime (actual processing duration)
- Queue Time: QueuedTime → WorkflowStartTime (time waiting in queue)
- Total Time: QueuedTime → CompletionTime (end-to-end duration)
For each metric, you’ll see:
- Average, Median, Min, Max, Standard Deviation, Total
- ObjectKey for min/max values (helps identify outliers)
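The three metrics above are simple differences between the same three timestamps. A sketch of the arithmetic, assuming ISO-8601 timestamp fields named as in this document (`QueuedTime`, `WorkflowStartTime`, `CompletionTime`); the computation is illustrative, not the CLI's source:

```python
# Illustrative timing-metric computation for one document record.
from datetime import datetime

def timing_metrics(doc):
    """Compute processing, queue, and total durations in seconds."""
    queued = datetime.fromisoformat(doc["QueuedTime"])
    started = datetime.fromisoformat(doc["WorkflowStartTime"])
    completed = datetime.fromisoformat(doc["CompletionTime"])
    return {
        "processing_s": (completed - started).total_seconds(),  # WorkflowStartTime → CompletionTime
        "queue_s": (started - queued).total_seconds(),          # QueuedTime → WorkflowStartTime
        "total_s": (completed - queued).total_seconds(),        # QueuedTime → CompletionTime
    }

m = timing_metrics({
    "QueuedTime": "2025-01-01T10:30:00",
    "WorkflowStartTime": "2025-01-01T10:30:45",
    "CompletionTime": "2025-01-01T10:32:50",
})
print(m)  # {'processing_s': 125.0, 'queue_s': 45.0, 'total_s': 170.0}
```

Note that queue time plus processing time always equals total time, which is a handy sanity check on the reported statistics.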
Lambda Metering:
When using --include-metering with --get-time, you’ll see GB-seconds usage by stage:
- Assessment, OCR, Classification, Extraction, Summarization
- Statistics: Average, Median, Min, Max, Std Dev, Total
- Cost estimates based on AWS Lambda pricing ($0.0000166667 per GB-second)
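The cost estimate is just total GB-seconds multiplied by the rate quoted above. A quick sanity-check sketch (the per-stage GB-second figures are made-up inputs for illustration):

```python
# Verify the Lambda cost arithmetic: GB-seconds × $0.0000166667/GB-s.
LAMBDA_RATE_PER_GB_SECOND = 0.0000166667

def lambda_compute_cost(gb_seconds_by_stage):
    """Return (total GB-seconds, estimated USD cost) across stages."""
    total = sum(gb_seconds_by_stage.values())
    return total, total * LAMBDA_RATE_PER_GB_SECOND

total, cost = lambda_compute_cost({
    "OCR": 1200.0, "Classification": 300.0, "Extraction": 900.0,
    "Assessment": 400.0, "Summarization": 200.0,
})
print(f"{total:.0f} GB-s ≈ ${cost:.4f}")  # 3000 GB-s ≈ $0.0500
```

This covers Lambda compute only; request charges and other services (Bedrock, Step Functions, S3) are not included in the CLI's metering figure.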
Example Output with Timing:
```
$ idp-cli status --stack-name my-stack --batch-id test-batch --object-status COMPLETED --get-time

Searching for documents with PK containing 'test-batch'...
✓ Found 25 matching documents

Timing Statistics:
  Valid documents: 25

Processing Time (WorkflowStartTime → CompletionTime):
┏━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric   ┃ Value      ┃ ObjectKey                ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Average  │ 45.23s     │                          │
│ Median   │ 43.10s     │                          │
│ Minimum  │ 32.45s     │ test-batch/small-doc.pdf │
│ Maximum  │ 78.90s     │ test-batch/large-doc.pdf │
│ Std Dev  │ 12.34s     │                          │
│ Total    │ 18m 50.75s │                          │
└──────────┴────────────┴──────────────────────────┘
```
Programmatic Use:
The command returns exit codes for scripting:
- `0` - Document(s) completed successfully
- `1` - Document(s) failed
- `2` - Document(s) still processing
JSON Output Format:
```bash
# Single document
$ idp-cli status --stack-name my-stack --document-id batch-123/invoice.pdf --format json
{
  "document_id": "batch-123/invoice.pdf",
  "status": "COMPLETED",
  "duration": 125.4,
  "start_time": "2025-01-01T10:30:45Z",
  "end_time": "2025-01-01T10:32:50Z",
  "num_sections": 2,
  "exit_code": 0
}

# Table output includes final status summary
$ idp-cli status --stack-name my-stack --document-id batch-123/invoice.pdf
[status table]
FINAL STATUS: COMPLETED | Duration: 125.4s | Exit Code: 0
```
Scripting Examples:
```bash
#!/bin/bash
# Wait for document completion and check result
idp-cli status --stack-name prod --document-id batch-001/invoice.pdf --wait
exit_code=$?

if [ $exit_code -eq 0 ]; then
  echo "Document processed successfully"
  # Proceed with downstream processing
else
  echo "Document processing failed"
  exit 1
fi
```
```bash
#!/bin/bash
# Poll document status in script
while true; do
  status=$(idp-cli status --stack-name prod --document-id batch-001/invoice.pdf --format json)
  state=$(echo "$status" | jq -r '.status')

  if [ "$state" = "COMPLETED" ]; then
    echo "Processing complete!"
    break
  elif [ "$state" = "FAILED" ]; then
    echo "Processing failed!"
    exit 1
  fi

  sleep 5
done
```
download-results
Download processing results to a local directory.
Usage:
```bash
idp-cli download-results [OPTIONS]
```
Options:
- `--stack-name` (required): CloudFormation stack name
- `--batch-id` (required): Batch identifier
- `--output-dir` (required): Local directory to download to
- `--file-types`: File types to download (default: `all`); options: `pages`, `sections`, `summary`, `evaluation`, or `all`
- `--region`: AWS region (optional)
Examples:
```bash
# Download all results
idp-cli download-results \
  --stack-name my-stack \
  --batch-id cli-batch-20251015-143000 \
  --output-dir ./results/

# Download only extraction results
idp-cli download-results \
  --stack-name my-stack \
  --batch-id cli-batch-20251015-143000 \
  --output-dir ./results/ \
  --file-types sections

# Download evaluation results only
idp-cli download-results \
  --stack-name my-stack \
  --batch-id eval-batch-20251015 \
  --output-dir ./eval-results/ \
  --file-types evaluation
```
Output Structure:
```
./results/
└── cli-batch-20251015-143000/
    └── invoice.pdf/
        ├── pages/
        │   └── 1/
        │       ├── image.jpg
        │       ├── rawText.json
        │       └── result.json
        ├── sections/
        │   └── 1/
        │       ├── result.json   # Extracted structured data
        │       └── summary.json
        ├── summary/
        │   ├── fulltext.txt
        │   └── summary.json
        └── evaluation/           # Only present if baseline provided
            ├── report.json       # Detailed metrics
            └── report.md         # Human-readable report
```
delete-documents
Delete documents and all associated data from the IDP system.
⚠️ WARNING: This action cannot be undone.
Usage:
```bash
idp-cli delete-documents [OPTIONS]
```
Document Selection (choose ONE):
- `--document-ids`: Comma-separated list of document IDs (S3 object keys) to delete
- `--batch-id`: Delete all documents in this batch
- `--pattern`: Wildcard pattern to match document keys (e.g. `"batch-123/*.pdf"`, `"*invoice*"`)
Options:
- `--stack-name` (required): CloudFormation stack name
- `--status-filter`: Only delete documents with this status (use with `--batch-id` or `--pattern`); options: `FAILED`, `COMPLETED`, `PROCESSING`, `QUEUED`
- `--dry-run`: Show what would be deleted without actually deleting
- `--force`, `-y`: Skip confirmation prompt
- `--region`: AWS region (optional)
What Gets Deleted:
- Source files from input bucket
- Processed outputs from output bucket
- DynamoDB tracking records
- List entries in tracking table
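The `--pattern` wildcards above are shell-style globs over document keys. A hedged sketch of the selection step using `fnmatch` semantics, which matches the examples shown (`"batch-123/*.pdf"`, `"*invoice*"`) but is an assumption about the CLI's internals:

```python
# Illustrative wildcard selection over document keys (fnmatch globbing).
from fnmatch import fnmatch

def match_documents(keys, pattern):
    """Return document keys matching a shell-style wildcard pattern."""
    return [k for k in keys if fnmatch(k, pattern)]

keys = [
    "batch-123/invoice-01.pdf",
    "batch-123/w2.pdf",
    "batch-456/invoice-02.pdf",
]
print(match_documents(keys, "batch-123/*.pdf"))  # both batch-123 keys
print(match_documents(keys, "*invoice*"))        # both invoice keys
```

Pairing a broad pattern with `--dry-run` first, as the examples below do, is the safe way to preview exactly which keys a glob will match.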
Examples:
```bash
# Delete specific documents by ID
idp-cli delete-documents \
  --stack-name my-stack \
  --document-ids "batch-123/doc1.pdf,batch-123/doc2.pdf"

# Delete all documents in a batch
idp-cli delete-documents \
  --stack-name my-stack \
  --batch-id cli-batch-20250123

# Delete only failed documents in a batch
idp-cli delete-documents \
  --stack-name my-stack \
  --batch-id cli-batch-20250123 \
  --status-filter FAILED

# Dry run to see what would be deleted
idp-cli delete-documents \
  --stack-name my-stack \
  --batch-id cli-batch-20250123 \
  --dry-run

# Delete documents matching a wildcard pattern
idp-cli delete-documents \
  --stack-name my-stack \
  --pattern "batch-123/*.pdf"

# Delete all failed invoice documents across batches
idp-cli delete-documents \
  --stack-name my-stack \
  --pattern "*invoice*" \
  --status-filter FAILED

# Dry run with pattern to preview matches
idp-cli delete-documents \
  --stack-name my-stack \
  --pattern "*2024*" \
  --dry-run

# Force delete without confirmation
idp-cli delete-documents \
  --stack-name my-stack \
  --document-ids "batch-123/doc1.pdf" \
  --force
```
Output Example:
```
Connecting to stack: my-stack
Getting documents for batch: cli-batch-20250123
Found 15 document(s) in batch (filtered by status: FAILED)

⚠️  Documents to be deleted:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  • cli-batch-20250123/doc1.pdf
  • cli-batch-20250123/doc2.pdf
  • cli-batch-20250123/doc3.pdf
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Delete 3 document(s) permanently? [y/N]: y

✓ Successfully deleted 3 document(s)
```
Use Cases:
- Clean up failed documents after fixing issues
- Remove test documents from a batch
- Free up storage by removing old processed documents
- Prepare for reprocessing by removing previous results
generate-manifest
Generate a manifest file from a directory or S3 URI, or create a test set in the test set bucket.
Usage:
```bash
idp-cli generate-manifest [OPTIONS]
```
Options:
- Source (choose ONE):
  - `--dir`: Local directory to scan
  - `--s3-uri`: S3 URI to scan
- `--baseline-dir`: Baseline directory for automatic matching (only with `--dir`)
- `--output`: Output manifest file path (CSV) - optional when using `--test-set`
- `--file-pattern`: File pattern (default: `*.pdf`)
- `--recursive`/`--no-recursive`: Include subdirectories (default: recursive)
- `--region`: AWS region (optional)
- Test Set Creation:
  - `--test-set`: Test set name - creates a folder in the test set bucket and uploads files
  - `--stack-name`: CloudFormation stack name (required with `--test-set`)
Examples:
```bash
# Generate from directory
idp-cli generate-manifest \
  --dir ./documents/ \
  --output manifest.csv

# Generate with automatic baseline matching
idp-cli generate-manifest \
  --dir ./documents/ \
  --baseline-dir ./validated-baselines/ \
  --output manifest-with-baselines.csv

# Create test set and upload files (no manifest needed - use test set name)
idp-cli generate-manifest \
  --dir ./documents/ \
  --baseline-dir ./baselines/ \
  --test-set "fcc example test" \
  --stack-name IDP

# Create test set with manifest output
idp-cli generate-manifest \
  --dir ./documents/ \
  --baseline-dir ./baselines/ \
  --test-set "fcc example test" \
  --stack-name IDP \
  --output test-manifest.csv
```
Test Set Creation:
When using --test-set, the command:
- Requires `--stack-name`, `--baseline-dir`, and `--dir`
- Uploads input files to `s3://test-set-bucket/{test-set-id}/input/`
- Uploads baseline files to `s3://test-set-bucket/{test-set-id}/baseline/`
- Creates the proper test set structure for evaluation workflows
- The test set will be auto-detected by the Test Studio UI
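Conceptually, manifest generation with `--baseline-dir` scans for documents and pairs each with its same-named baseline. A minimal sketch under stated assumptions: the column name `baseline_source` comes from this document, while `document_path` and the name-based pairing rule are illustrative, not confirmed CLI behavior:

```python
# Illustrative manifest builder: scan for PDFs, pair each with a
# same-named entry in the baseline directory, write a CSV manifest.
import csv
from pathlib import Path

def build_manifest(doc_dir, baseline_dir, output_csv, pattern="*.pdf"):
    rows = []
    for doc in sorted(Path(doc_dir).rglob(pattern)):
        baseline = Path(baseline_dir) / doc.name  # assumed: baseline named after the file
        rows.append({
            "document_path": str(doc),
            "baseline_source": str(baseline) if baseline.exists() else "",
        })
    with open(output_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["document_path", "baseline_source"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Use `idp-cli validate-manifest` (below) to check any manifest you assemble by hand before processing it.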
Process the created test set:
```bash
# Using test set ID (from UI or after creation)
idp-cli process --stack-name IDP --test-set fcc-example-test --monitor

# Or using S3 URI to process input files directly
idp-cli run-inference --stack-name IDP --s3-uri s3://test-set-bucket/fcc-example-test/input/

# Or using manifest if generated
idp-cli run-inference --stack-name IDP --manifest test-manifest.csv
```
validate-manifest
Validate a manifest file without processing.
Usage:
```bash
idp-cli validate-manifest [OPTIONS]
```
Options:
- `--manifest` (required): Path to manifest file to validate (CSV or JSON)
Examples:
```bash
# Validate a CSV manifest
idp-cli validate-manifest --manifest documents.csv

# Validate a JSON manifest
idp-cli validate-manifest --manifest documents.json
```
list-batches
List recent batch processing jobs.
Usage:
```bash
idp-cli list-batches [OPTIONS]
```
Options:
- `--stack-name` (required): CloudFormation stack name
- `--limit`: Maximum number of batches to list (default: 10)
- `--region`: AWS region (optional)
Examples:
```bash
# List last 10 batches (default)
idp-cli list-batches --stack-name my-stack

# List last 5 batches
idp-cli list-batches --stack-name my-stack --limit 5

# List with specific region
idp-cli list-batches --stack-name my-stack --limit 20 --region us-west-2
```
Complete Evaluation Workflow
This workflow demonstrates how to process documents, manually validate results, and then reprocess with evaluation to measure accuracy.
Step 1: Deploy Your Stack
Deploy an IDP stack if you haven’t already:
```bash
idp-cli deploy \
  --stack-name eval-testing \
  --admin-email your.email@example.com \
  --max-concurrent 50 \
  --wait
```
What happens: CloudFormation creates ~120 resources including S3 buckets, Lambda functions, Step Functions, and DynamoDB tables. This takes 10-15 minutes.
Step 2: Initial Processing from Local Directory
Process your test documents to generate initial extraction results:
```bash
# Prepare test documents
mkdir -p ~/test-documents
cp /path/to/your/invoice.pdf ~/test-documents/
cp /path/to/your/w2.pdf ~/test-documents/
cp /path/to/your/paystub.pdf ~/test-documents/

# Process documents
idp-cli run-inference \
  --stack-name eval-testing \
  --dir ~/test-documents/ \
  --batch-id initial-run \
  --monitor
```
What happens: Documents are uploaded to S3, processed through OCR, classification, extraction, assessment, and summarization. Results are stored in OutputBucket.
Monitor output:
✓ Uploaded 3 documents to InputBucket
✓ Sent 3 messages to processing queue

Monitoring Batch: initial-run
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Status Summary
  ✓ Completed  3  100%
  ⏸ Queued     0    0%
  ✗ Failed     0    0%
Step 3: Download Extraction Results
Section titled “Step 3: Download Extraction Results”Download the extraction results (sections) for manual review:
idp-cli download-results \
  --stack-name eval-testing \
  --batch-id initial-run \
  --output-dir ~/initial-results/ \
  --file-types sections
Result structure:
~/initial-results/initial-run/
├── invoice.pdf/
│   └── sections/
│       └── 1/
│           └── result.json   # Extracted data to validate
├── w2.pdf/
│   └── sections/
│       └── 1/
│           └── result.json
└── paystub.pdf/
    └── sections/
        └── 1/
            └── result.json
Step 4: Manual Validation & Baseline Preparation
Section titled “Step 4: Manual Validation & Baseline Preparation”Review and correct the extraction results to create validated baselines.
4.1 Review extraction results:
# View extracted data for invoice
cat ~/initial-results/initial-run/invoice.pdf/sections/1/result.json | jq .

# Example output:
{
  "attributes": {
    "Invoice Number": "INV-2024-001",
    "Invoice Date": "2024-01-15",
    "Total Amount": "$1,250.00",
    "Vendor Name": "Acme Corp"
  }
}
4.2 Validate and correct:
Compare extracted values against the actual documents. If you find errors, create corrected baseline files:
# Create baseline directory structure
mkdir -p ~/validated-baselines/invoice.pdf/sections/1/
mkdir -p ~/validated-baselines/w2.pdf/sections/1/
mkdir -p ~/validated-baselines/paystub.pdf/sections/1/

# Copy and edit result files
cp ~/initial-results/initial-run/invoice.pdf/sections/1/result.json \
  ~/validated-baselines/invoice.pdf/sections/1/result.json

# Edit the baseline to correct any errors
vi ~/validated-baselines/invoice.pdf/sections/1/result.json

# Repeat for other documents...
Baseline directory structure:
~/validated-baselines/
├── invoice.pdf/
│   └── sections/
│       └── 1/
│           └── result.json   # Corrected/validated data
├── w2.pdf/
│   └── sections/
│       └── 1/
│           └── result.json
└── paystub.pdf/
    └── sections/
        └── 1/
            └── result.json
Step 5: Create Manifest with Baseline References
Section titled “Step 5: Create Manifest with Baseline References”Create a manifest that links each document to its validated baseline:
cat > ~/evaluation-manifest.csv << EOF
document_path,baseline_source
/home/user/test-documents/invoice.pdf,/home/user/validated-baselines/invoice.pdf/
/home/user/test-documents/w2.pdf,/home/user/validated-baselines/w2.pdf/
/home/user/test-documents/paystub.pdf,/home/user/validated-baselines/paystub.pdf/
EOF
Manifest format:
document_path: Path to original document
baseline_source: Path to directory containing validated sections
Alternative using auto-matching:
# Generate manifest with automatic baseline matching
idp-cli generate-manifest \
  --dir ~/test-documents/ \
  --baseline-dir ~/validated-baselines/ \
  --output ~/evaluation-manifest.csv
Step 6: Process with Evaluation Enabled
Section titled “Step 6: Process with Evaluation Enabled”Reprocess documents with the baseline-enabled manifest. The accelerator will automatically run evaluation:
idp-cli run-inference \
  --stack-name eval-testing \
  --manifest ~/evaluation-manifest.csv \
  --batch-id eval-run-001 \
  --monitor
What happens:
- Documents are processed through the pipeline as before
- Evaluation step is automatically triggered because baselines are provided
- The evaluation module compares extracted values against baseline values
- Detailed metrics are calculated per attribute and per document
Processing time: Similar to initial run, plus ~5-10 seconds per document for evaluation.
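The attribute-level comparison described above can be sketched in a few lines. This is illustrative only: the actual evaluation module supports multiple comparison methods beyond exact matching, and the field names in this sketch are assumptions, not the module's real schema.

```python
# Illustrative sketch (NOT the actual evaluation module): compare
# extracted attributes against a baseline and compute per-attribute
# matches plus an overall accuracy score.

def evaluate_attributes(extracted: dict, baseline: dict) -> dict:
    results = {}
    for name, expected in baseline.items():
        actual = extracted.get(name)
        # Exact match after whitespace normalization; the real module
        # also supports numeric and fuzzy comparators.
        match = actual is not None and str(actual).strip() == str(expected).strip()
        results[name] = {"expected": expected, "actual": actual, "match": match}
    matched = sum(1 for r in results.values() if r["match"])
    accuracy = matched / len(results) if results else 0.0
    return {"attributes": results, "overall_accuracy": accuracy}

extracted = {"Invoice Number": "INV-2024-001", "Total Amount": "$1,250.00"}
baseline = {"Invoice Number": "INV-2024-001", "Total Amount": "$1,250.00",
            "Vendor Name": "Acme Corp"}
report = evaluate_attributes(extracted, baseline)
print(report["overall_accuracy"])  # 2 of 3 attributes match
```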
Step 7: Download and Review Evaluation Results
Section titled “Step 7: Download and Review Evaluation Results”Download the evaluation results to analyze accuracy:
✓ Synchronous Evaluation: Evaluation runs as the final step in the workflow before completion. When a document shows status “COMPLETE”, all processing including evaluation is finished - results are immediately available for download.
# Download evaluation results (no waiting needed)
idp-cli download-results \
  --stack-name eval-testing \
  --batch-id eval-run-001 \
  --output-dir ~/eval-results/ \
  --file-types evaluation

# Verify evaluation data is present
ls -la ~/eval-results/eval-run-001/invoice.pdf/evaluation/
# Should show: report.json and report.md
Review evaluation report:
# View detailed evaluation metrics
cat ~/eval-results/eval-run-001/invoice.pdf/evaluation/report.json | jq .
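Downloaded reports can also be aggregated programmatically. A minimal sketch, assuming each report.json exposes an overall_accuracy field; verify the field names against a real report before relying on this:

```python
# Collect (document, overall_accuracy) pairs from every
# <doc>/evaluation/report.json under a batch results directory.
# The report.json schema here is an assumption for illustration.
import json
from pathlib import Path

def summarize_reports(results_dir: Path) -> list[tuple[str, float]]:
    rows = []
    for report_path in sorted(results_dir.glob("*/evaluation/report.json")):
        data = json.loads(report_path.read_text())
        doc = report_path.parent.parent.name
        rows.append((doc, data["overall_accuracy"]))  # assumed field name
    return rows

# Demo with synthetic data in a temp directory:
import tempfile
tmp = Path(tempfile.mkdtemp())
for doc, acc in [("invoice.pdf", 0.95), ("w2.pdf", 0.88)]:
    d = tmp / doc / "evaluation"
    d.mkdir(parents=True)
    (d / "report.json").write_text(json.dumps({"overall_accuracy": acc}))

for doc, acc in summarize_reports(tmp):
    print(f"{doc}: {acc:.0%}")
```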
View human-readable report:
# Markdown report with visual formatting
cat ~/eval-results/eval-run-001/invoice.pdf/evaluation/report.md
Evaluation Analytics
Section titled “Evaluation Analytics”The IDP Accelerator provides multiple ways to analyze evaluation results across batches and at scale.
Query Aggregated Results with Athena
The accelerator automatically stores evaluation metrics in Athena tables for SQL-based analysis.
Available Tables:
- evaluation_results - Per-document evaluation metrics
- evaluation_attributes - Per-attribute scores
- evaluation_summary - Aggregated statistics
Example Queries:
-- Overall accuracy across all batches
SELECT
  AVG(overall_accuracy) as avg_accuracy,
  COUNT(*) as total_documents,
  SUM(CASE WHEN overall_accuracy >= 0.95 THEN 1 ELSE 0 END) as high_accuracy_count
FROM evaluation_results
WHERE batch_id LIKE 'eval-run-%';

-- Attribute-level accuracy
SELECT
  attribute_name,
  AVG(score) as avg_score,
  COUNT(*) as total_occurrences,
  SUM(CASE WHEN match = true THEN 1 ELSE 0 END) as correct_count
FROM evaluation_attributes
GROUP BY attribute_name
ORDER BY avg_score DESC;

-- Compare accuracy across different configurations
SELECT
  batch_id,
  AVG(overall_accuracy) as accuracy,
  COUNT(*) as doc_count
FROM evaluation_results
WHERE batch_id IN ('config-v1', 'config-v2', 'config-v3')
GROUP BY batch_id;
Access Athena:
# Get Athena database name from stack outputs
aws cloudformation describe-stacks \
  --stack-name eval-testing \
  --query 'Stacks[0].Outputs[?OutputKey==`ReportingDatabase`].OutputValue' \
  --output text

# Query via AWS Console or CLI
aws athena start-query-execution \
  --query-string "SELECT * FROM evaluation_results LIMIT 10" \
  --result-configuration OutputLocation=s3://your-results-bucket/
For detailed Athena table schemas and query examples, see:
- ../docs/reporting-database.md - Complete Athena table reference
- ../docs/evaluation.md - Evaluation methodology and metrics
Use Agent Analytics in the Web UI
Section titled “Use Agent Analytics in the Web UI”The IDP web UI provides an Agent Analytics feature for visual analysis of evaluation results.
Access the UI:
- Get the web UI URL from stack outputs:
aws cloudformation describe-stacks \
  --stack-name eval-testing \
  --query 'Stacks[0].Outputs[?OutputKey==`ApplicationWebURL`].OutputValue' \
  --output text
- Log in with admin credentials (from the deployment email)
- Navigate to Analytics → Agent Analytics
Available Analytics:
- Accuracy Trends - Track accuracy over time across batches
- Attribute Heatmaps - Visualize which attributes perform best/worst
- Batch Comparisons - Compare different configurations side-by-side
- Error Analysis - Identify common error patterns
- Confidence Correlation - Analyze relationship between assessment confidence and accuracy
Key Features:
- Interactive charts and visualizations
- Filter by batch, date range, document type, or attribute
- Export results to CSV for further analysis
- Drill-down to individual document details
For complete Agent Analytics documentation, see:
../docs/agent-analysis.md - Agent Analytics user guide
Manifest Format Reference
Section titled “Manifest Format Reference”CSV Format
Section titled “CSV Format”Required Field:
document_path: Local file path or full S3 URI (s3://bucket/key)
Optional Field:
baseline_source: Path or S3 URI to validated baseline for evaluation
Note: Document IDs are auto-generated from filenames (e.g., invoice.pdf → invoice)
Examples:
document_path
/home/user/docs/invoice.pdf
/home/user/docs/w2.pdf
s3://external-bucket/statement.pdf

document_path,baseline_source
/local/invoice.pdf,s3://baselines/invoice/
/local/w2.pdf,/local/validated-baselines/w2/
s3://docs/statement.pdf,s3://baselines/statement/
JSON Format
Section titled “JSON Format”
[
  {
    "document_path": "/local/invoice.pdf",
    "baseline_source": "s3://baselines/invoice/"
  },
  {
    "document_path": "s3://bucket/w2.pdf",
    "baseline_source": "/local/baselines/w2/"
  }
]
Path Rules
Section titled “Path Rules”Document Type (Auto-detected):
- s3://... → S3 file (copied to InputBucket)
- Absolute/relative path → Local file (uploaded to InputBucket)
Document ID (Auto-generated):
- From filename without extension
- Example: invoice-2024.pdf → invoice-2024
- Subdirectories preserved: W2s/john.pdf → W2s/john
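The ID rules above can be sketched directly (illustrative helper mirroring the documented behavior; not part of the CLI):

```python
# Sketch of the documented document-ID rules: drop the file extension,
# keep any subdirectory prefix.
from pathlib import PurePosixPath

def document_id(path: str) -> str:
    return str(PurePosixPath(path).with_suffix(""))

print(document_id("invoice-2024.pdf"))  # invoice-2024
print(document_id("W2s/john.pdf"))      # W2s/john
```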
Important:
- ⚠️ Duplicate filenames not allowed
- ✅ Use directory structure for organization (e.g., clientA/invoice.pdf, clientB/invoice.pdf)
- ✅ S3 URIs can reference any bucket (automatically copied)
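Because duplicate filenames are rejected, a manifest can be pre-checked for ID collisions before submission. A small sketch (the S3 key handling below is an assumption for illustration):

```python
# Pre-flight check for the "duplicate filenames not allowed" rule:
# flag manifest entries that would collide on document ID.
from collections import Counter
from pathlib import PurePosixPath

def find_duplicate_ids(document_paths: list[str]) -> list[str]:
    ids = []
    for p in document_paths:
        # Strip an s3://bucket/ prefix if present (assumed), then drop
        # the extension per the documented document-ID rule.
        key = p.split("/", 3)[-1] if p.startswith("s3://") else p.lstrip("/")
        ids.append(str(PurePosixPath(key).with_suffix("")))
    return sorted(doc_id for doc_id, n in Counter(ids).items() if n > 1)

print(find_duplicate_ids(["clientA/invoice.pdf", "clientB/invoice.pdf"]))   # []
print(find_duplicate_ids(["invoice.pdf", "scans/invoice.pdf", "invoice.pdf"]))  # ['invoice']
```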
Advanced Usage
Section titled “Advanced Usage”Iterative Configuration Testing
Section titled “Iterative Configuration Testing”Test different extraction prompts or configurations:
# Test with configuration v1
idp-cli deploy --stack-name my-stack --custom-config ./config-v1.yaml --wait
idp-cli run-inference --stack-name my-stack --dir ./test-set/ --batch-id config-v1 --monitor

# Download and analyze results
idp-cli download-results --stack-name my-stack --batch-id config-v1 --output-dir ./results-v1/

# Test with configuration v2
idp-cli deploy --stack-name my-stack --custom-config ./config-v2.yaml --wait
idp-cli run-inference --stack-name my-stack --dir ./test-set/ --batch-id config-v2 --monitor

# Compare in Athena
# SELECT batch_id, AVG(overall_accuracy) FROM evaluation_results
# WHERE batch_id IN ('config-v1', 'config-v2') GROUP BY batch_id;
Large-Scale Batch Processing
Section titled “Large-Scale Batch Processing”Process thousands of documents efficiently:
# Generate manifest for large dataset
idp-cli generate-manifest \
  --dir ./production-documents/ \
  --output large-batch-manifest.csv

# Validate before processing
idp-cli validate-manifest --manifest large-batch-manifest.csv

# Process in background (no --monitor flag)
idp-cli run-inference \
  --stack-name production-stack \
  --manifest large-batch-manifest.csv \
  --batch-id production-batch-001

# Check status later
idp-cli status \
  --stack-name production-stack \
  --batch-id production-batch-001
CI/CD Integration
Section titled “CI/CD Integration”Integrate into automated pipelines:
#!/bin/bash
# ci-test.sh - Automated accuracy testing

# Run processing with evaluation
idp-cli run-inference \
  --stack-name ci-stack \
  --manifest test-suite-with-baselines.csv \
  --batch-id ci-test-$BUILD_ID \
  --monitor

# Download evaluation results
idp-cli download-results \
  --stack-name ci-stack \
  --batch-id ci-test-$BUILD_ID \
  --output-dir ./ci-results/ \
  --file-types evaluation

# Parse results and fail if accuracy below threshold
python check_accuracy.py ./ci-results/ --min-accuracy 0.90

# Exit code 0 if passed, 1 if failed
exit $?
stop-workflows
Section titled “stop-workflows”Stop all running workflows for a stack. Useful for halting processing during development or when issues are detected.
Usage:
idp-cli stop-workflows [OPTIONS]
Options:
--stack-name (required): CloudFormation stack name
--skip-purge: Skip purging the SQS queue
--skip-stop: Skip stopping Step Function executions
--region: AWS region (optional)
Examples:
# Stop all workflows (purge queue + stop executions)
idp-cli stop-workflows --stack-name my-stack
# Only purge the queue (don't stop running executions)
idp-cli stop-workflows --stack-name my-stack --skip-stop
# Only stop executions (don't purge queue)
idp-cli stop-workflows --stack-name my-stack --skip-purge
load-test
Section titled “load-test”Run load tests by copying files to the input bucket at specified rates.
Usage:
idp-cli load-test [OPTIONS]
Options:
--stack-name (required): CloudFormation stack name
--source-file (required): Source file to copy (local path or s3://bucket/key)
--rate: Files per minute (default: 100)
--duration: Duration in minutes (default: 1)
--schedule: CSV schedule file (minute,count); overrides --rate and --duration
--dest-prefix: Destination prefix in input bucket (default: load-test)
--config-version: Configuration version to use for processing (default: active version)
--region: AWS region (optional)
Examples:
# Constant rate: 100 files/minute for 5 minutes
idp-cli load-test --stack-name my-stack --source-file samples/invoice.pdf --rate 100 --duration 5
# High volume: 2500 files/minute for 1 minute
idp-cli load-test --stack-name my-stack --source-file samples/invoice.pdf --rate 2500
# Use schedule file for variable rates
idp-cli load-test --stack-name my-stack --source-file samples/invoice.pdf --schedule schedule.csv
# Use S3 source file
idp-cli load-test --stack-name my-stack --source-file s3://my-bucket/test.pdf --rate 500
# Load test with a specific config version
idp-cli load-test --stack-name my-stack --source-file samples/invoice.pdf --rate 100 --config-version v2
Schedule File Format (CSV):
minute,count
1,100
2,200
3,500
4,1000
5,500
See lib/idp_cli_pkg/examples/load-test-schedule.csv for a sample schedule file.
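Schedule files like the one above can also be generated programmatically. A hedged sketch (illustrative helper, not part of the CLI) that writes a ramp-up/peak/ramp-down schedule in the documented minute,count format:

```python
# Generate a load-test schedule CSV: ramp up to a peak, hold for one
# minute, then ramp back down. Illustrative only.
import csv

def write_ramp_schedule(path: str, peak: int = 1000, ramp_minutes: int = 3) -> None:
    step = peak // ramp_minutes
    counts = [step * i for i in range(1, ramp_minutes + 1)]       # ramp up
    counts += [peak]                                              # hold at peak
    counts += [step * i for i in range(ramp_minutes - 1, 0, -1)]  # ramp down
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["minute", "count"])
        for minute, count in enumerate(counts, start=1):
            writer.writerow([minute, count])

write_ramp_schedule("schedule.csv", peak=900, ramp_minutes=3)
print(open("schedule.csv").read())
```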
remove-deleted-stack-resources
Section titled “remove-deleted-stack-resources”Remove residual AWS resources left behind from deleted IDP CloudFormation stacks.
⚠️ CAUTION: This command permanently deletes AWS resources. Always run with --dry-run first.
Intended Use: This command is designed for development and test accounts where IDP stacks are frequently created and deleted, and where the consequences of accidentally deleting resources or data are low. Do not use this command in production accounts where data retention is critical. For production cleanup, manually review and delete resources through the AWS Console.
Usage:
idp-cli remove-deleted-stack-resources [OPTIONS]
How It Works:
This command safely identifies and removes ONLY resources belonging to IDP stacks that have been deleted:
- Multi-region Stack Discovery - Scans CloudFormation in multiple regions (us-east-1, us-west-2, eu-central-1 by default)
- IDP Stack Identification - Identifies IDP stacks by their Description (“AWS GenAI IDP Accelerator”) or naming patterns (IDP-*, PATTERN1/2/3)
- Active Stack Protection - Tracks both ACTIVE and DELETED stacks; resources from active stacks are NEVER touched
- Safe Cleanup - Only targets resources belonging to stacks in DELETE_COMPLETE state
Safety Features:
- Resources from ACTIVE stacks are protected and skipped
- Resources from UNKNOWN stacks (not verified as IDP) are skipped
- Interactive confirmation for each resource (unless --yes)
- Options: y=yes, n=no, a=yes to all of type, s=skip all of type
- --dry-run mode shows exactly what would be deleted
Resources Cleaned:
- CloudFront distributions and response header policies
- CloudWatch log groups
- AppSync APIs
- IAM policies
- CloudWatch Logs resource policy entries
- S3 buckets (automatically emptied before deletion)
- DynamoDB tables (PITR disabled before deletion)
Note: This command targets resources that remain in AWS after IDP stacks have already been deleted. These are typically resources with RetainOnDelete policies or non-empty S3 buckets that CloudFormation couldn’t delete. All resources are identified by their naming pattern and verified against the deleted stack registry before deletion.
Options:
--region: Primary AWS region for regional resources (default: us-west-2)
--profile: AWS profile to use
--dry-run: Preview changes without making them (RECOMMENDED first step)
--yes, -y: Auto-approve all deletions (skip confirmations)
--check-stack-regions: Comma-separated regions to check for stacks (default: us-east-1,us-west-2,eu-central-1)
Examples:
# RECOMMENDED: Always dry-run first to see what would be deleted
idp-cli remove-deleted-stack-resources --dry-run
# Interactive cleanup with confirmations for each resource
idp-cli remove-deleted-stack-resources
# Use specific AWS profile
idp-cli remove-deleted-stack-resources --profile my-profile
# Auto-approve all deletions (USE WITH CAUTION)
idp-cli remove-deleted-stack-resources --yes
# Check additional regions for stacks
idp-cli remove-deleted-stack-resources --check-stack-regions us-east-1,us-west-2,eu-central-1,eu-west-1
CloudFront Two-Phase Cleanup:
CloudFront requires distributions to be disabled before deletion:
- First run: Disables orphaned distributions (you confirm each)
- Wait 15-20 minutes for CloudFront global propagation
- Second run: Deletes the previously disabled distributions
Interactive Confirmation:
Delete orphaned CloudFront distribution?
  Resource: E1H6W47Z36CQE2 (exists in AWS)
  Originally from stack: IDP-P2-DevTest1
  Stack status: DELETE_COMPLETE (stack no longer exists)
  Stack was in region: us-west-2

Options: y=yes, n=no, a=yes to all CloudFront distribution, s=skip all CloudFront distribution
Delete? [y/n/a/s]:
Important Limitation - 90-Day Window:
CloudFormation only retains deleted stack information for approximately 90 days. After this period, stacks in DELETE_COMPLETE status are removed from the CloudFormation API.
This means:
- Resources from stacks deleted within the past 90 days → Identified and offered for cleanup
- Resources from stacks deleted more than 90 days ago → Not identified (silently skipped)
Best Practice: Run remove-deleted-stack-resources promptly after deleting IDP stacks to ensure complete cleanup. For maximum effectiveness, run this command within 90 days of stack deletion.
config-create
Section titled “config-create”Generate an IDP configuration template from system defaults.
Usage:
idp-cli config-create [OPTIONS]
Options:
--features: Feature set (default: min)
  min: classification, extraction, classes only (simplest)
  core: min + ocr, assessment
  all: all sections with full defaults
  Or a comma-separated list: "classification,extraction,summarization"
--output, -o: Output file path (default: stdout)
--include-prompts: Include full prompt templates (default: stripped for readability)
--no-comments: Omit explanatory header comments
Examples:
# Generate minimal config to stdout
idp-cli config-create
# Generate full config with all sections
idp-cli config-create --features all --output full-config.yaml
# Custom section selection
idp-cli config-create --features "classification,extraction,summarization" --output config.yaml
config-validate
Section titled “config-validate”Validate a configuration file against system defaults and Pydantic models.
Usage:
idp-cli config-validate [OPTIONS]
Options:
--config-file, -f (required): Path to configuration file to validate
--show-merged: Show the full merged configuration
--strict: Fail validation if config contains unknown or deprecated fields
Examples:
# Validate a config file
idp-cli config-validate --config-file ./my-config.yaml
# Show full merged config
idp-cli config-validate --config-file ./config.yaml --show-merged
# Strict mode (fails if config has unknown or deprecated fields; useful for CI/CD)
idp-cli config-validate --config-file ./config.yaml --strict
config-download
Section titled “config-download”Download configuration from a deployed IDP stack.
Usage:
idp-cli config-download [OPTIONS]
Options:
--stack-name (required): CloudFormation stack name
--output, -o: Output file path (default: stdout)
--format: Output format: full (default) or minimal (only differences from defaults)
--config-version: Configuration version to download (e.g., v1, v2). If not specified, downloads the active version
--region: AWS region (optional)
Examples:
# Download full config from active version
idp-cli config-download --stack-name my-stack --output config.yaml
# Download specific version
idp-cli config-download --stack-name my-stack --config-version v2 --output config.yaml
# Download minimal config (only customizations)
idp-cli config-download --stack-name my-stack --format minimal --output config.yaml
# Print to stdout
idp-cli config-download --stack-name my-stack
config-upload
Section titled “config-upload”Upload a configuration file to a deployed IDP stack.
Usage:
idp-cli config-upload [OPTIONS]
Options:
--stack-name (required): CloudFormation stack name
--config-file, -f (required): Path to configuration file (YAML or JSON)
--validate/--no-validate: Validate config before uploading (default: validate)
--config-version (required): Configuration version to update (e.g., default, v1, v2). If the version doesn't exist, it will be created automatically.
--version-description: Description for the configuration version (used when creating new versions)
--region: AWS region (optional)
Examples:
# Upload config to active version
idp-cli config-upload --stack-name my-stack --config-file ./config.yaml --config-version default
# Update existing version
idp-cli config-upload --stack-name my-stack --config-file ./config.yaml --config-version Production
# Create new version with description
idp-cli config-upload --stack-name my-stack --config-file ./config.yaml --config-version NewVersion --version-description "Test configuration for new feature"
# Skip validation (use with caution)
idp-cli config-upload --stack-name my-stack --config-file ./config.yaml --no-validate
config-list
Section titled “config-list”List all configuration versions in a deployed IDP stack.
Usage:
idp-cli config-list [OPTIONS]
Options:
--stack-name (required): CloudFormation stack name
--region: AWS region (optional)
Examples:
# List all configuration versions
idp-cli config-list --stack-name my-stack
Output: Shows a table with version names, active status, creation/update timestamps, and descriptions.
config-activate
Section titled “config-activate”Activate a configuration version in a deployed IDP stack.
Automatic BDA Sync: If the configuration version has use_bda enabled, this command will automatically sync the configuration to BDA (Bedrock Data Automation) before activation. This ensures BDA blueprints are up-to-date and matches the UI behavior.
Usage:
idp-cli config-activate [OPTIONS]
Options:
--stack-name (required): CloudFormation stack name
--config-version (required): Configuration version to activate
--region: AWS region (optional)
Examples:
# Activate a specific version
idp-cli config-activate --stack-name my-stack --config-version v2
# Activate default version
idp-cli config-activate --stack-name my-stack --config-version default
Behavior:
- Validates the configuration version exists
- If use_bda is enabled in the configuration:
  - Syncs IDP document classes to BDA blueprints
  - Creates a new BDA project if none exists
  - Updates BDA sync status
- Activates the configuration version
- All new document processing will use this configuration
Note: If BDA sync fails (when use_bda is enabled), the activation will be aborted to prevent processing errors.
Notes:
- Sets the specified version as active for all new document processing
- Version must exist (use config-list to see available versions)
config-delete
Section titled “config-delete”
Delete a configuration version from a deployed IDP stack.
Usage:
idp-cli config-delete [OPTIONS]
Options:
--stack-name (required): CloudFormation stack name
--config-version (required): Configuration version to delete
--force: Skip confirmation prompt
--region: AWS region (optional)
Examples:
# Delete a version with confirmation
idp-cli config-delete --stack-name my-stack --config-version old-version
# Delete without confirmation prompt
idp-cli config-delete --stack-name my-stack --config-version old-version --force
Restrictions:
- Cannot delete the ‘default’ configuration version
- Cannot delete currently active versions (activate another version first)
- Includes confirmation prompt unless --force is used
What Happens (config-upload):
- Loads and parses your YAML or JSON config file
- Validates against system defaults (unless --no-validate)
- If version exists: Updates the existing version with the uploaded configuration (saved as a complete snapshot)
- If version doesn’t exist: Creates a new version with the uploaded configuration
- Uploads to the stack’s ConfigurationTable in DynamoDB
- Configuration is immediately available for document processing
Configuration Versioning:
- Existing version: Saves the uploaded configuration as the full version snapshot
- New version: Creates a new independent version with the uploaded configuration
- Version descriptions: Can be added to new versions for better organization
For full details on configuration versioning, see configuration-versions.md.
This uses the same mechanism as the Web UI configuration management system.
discover
Section titled “discover”Discover document class schemas from sample documents using Amazon Bedrock.
Two modes:
- Stack-connected (--stack-name): Uses the stack's discovery config and saves the schema to the DynamoDB configuration
- Local (no --stack-name): Uses system default Bedrock settings, prints the schema to stdout without saving
Ground truth matching: Ground truth files (-g) are auto-matched to documents (-d) by filename stem. For example, invoice.pdf matches invoice.json. Unmatched documents run without ground truth.
Output behavior:
- Single document: -o writes the schema to the specified file
- Batch + -o is a directory (or has no extension): writes one {class_name}.json per schema
- Batch + -o is a file: writes all schemas as a JSON array
# Single document (local mode, no stack needed)
idp-cli discover -d ./invoice.pdf
# With ground truth (matched by filename stem)
idp-cli discover -d ./invoice.pdf -g ./invoice.json
# Save schema to file
idp-cli discover -d ./form.pdf -o ./form-schema.json
# With class name hint (guides LLM to use specific class name)
idp-cli discover -d ./form.pdf --class-hint "W2 Tax Form"
# Batch with auto-matched ground truth
idp-cli discover -d ./invoice.pdf -d ./w2.pdf -g ./invoice.json -g ./w2.json
# Batch output to directory (one file per schema)
idp-cli discover -d ./invoice.pdf -d ./w2.pdf -o ./schemas/
# Batch output to single file (JSON array)
idp-cli discover -d ./invoice.pdf -d ./w2.pdf -o ./all-schemas.json
# Multi-section: discover specific page ranges from a single PDF
idp-cli discover -d ./lending_package.pdf \
  --page-range "1-2" --page-label "Cover Letter" \
  --page-range "3-5" --page-label "W2 Form" \
  --page-range "6-8" --page-label "Bank Statement" \
  -o ./schemas/
# Auto-detect sections then discover each
idp-cli discover -d ./lending_package.pdf --auto-detect -o ./schemas/
# Only detect section boundaries (no discovery)
idp-cli discover -d ./lending_package.pdf --auto-detect --detect-only
# Auto-detect with output to file
idp-cli discover -d ./lending_package.pdf --auto-detect --detect-only -o sections.json
# Stack mode (saves to config)
idp-cli discover --stack-name my-stack -d ./invoice.pdf --config-version v2
| Option | Description |
|---|---|
--stack-name | CloudFormation stack name (optional — omit for local mode) |
-d, --document | Path to document file (required, repeatable for batch) |
-g, --ground-truth | Path to JSON ground truth file(s) (repeatable, auto-matched by filename stem) |
--config-version | Config version to save to (stack mode only) |
-o, --output | Output path: file (single/JSON array) or directory (one file per schema) |
--class-hint | Hint for the document class name (e.g., “W2 Form”). The LLM will use this as $id. |
--page-range | Page range to discover (e.g., “1-3”). Repeatable for multi-section. Requires PDF. |
--page-label | Label for corresponding --page-range (e.g., “W2 Form”). Used as class name hint per range. |
--auto-detect | Auto-detect document section boundaries using AI, then discover each section. |
--detect-only | Only detect section boundaries (use with --auto-detect). Prints boundaries without running discovery. |
--region | AWS region |
discover-multidoc
Section titled “discover-multidoc”Discover document classes from a collection of documents using embedding-based clustering and agentic analysis.
Unlike discover (which analyzes one document at a time), discover-multidoc analyzes a directory of mixed documents to automatically identify document types, cluster similar documents, and generate JSON Schemas for each discovered class.
Requires: pip install idp-common[multi_document_discovery] (scikit-learn, scipy, numpy, strands-agents)
Note: Requires at least 2 documents per expected class. Clusters with fewer than 2 documents are filtered as noise. For discovering schemas from individual documents, use discover instead.
Usage:
idp-cli discover-multidoc [OPTIONS]
Options:
| Option | Description |
|---|---|
--dir | Directory containing documents to analyze (recursive scan) |
-d, --document | Individual document files (repeatable: -d doc1.pdf -d doc2.png) |
--embedding-model | Bedrock embedding model ID (default: us.cohere.embed-v4:0) |
--analysis-model | Bedrock LLM for cluster analysis (default: us.anthropic.claude-sonnet-4-6) |
-o, --output | Output directory for discovered JSON schemas |
--stack-name | CloudFormation stack name (required for --save-to-config) |
--config-version | Configuration version to save schemas to |
--save-to-config | Save discovered schemas to the stack’s configuration |
--region | AWS region |
Examples:
# Discover from a directory of documents
idp-cli discover-multidoc --dir ./samples/
# Discover with explicit files
idp-cli discover-multidoc -d doc1.pdf -d doc2.png -d doc3.jpg
# Save schemas to output directory
idp-cli discover-multidoc --dir ./samples/ -o ./schemas/
# Save to stack configuration
idp-cli discover-multidoc --dir ./samples/ --save-to-config \
  --stack-name IDP --config-version v2
# Use custom models
idp-cli discover-multidoc --dir ./samples/ \
  --embedding-model us.amazon.titan-embed-image-v1 \
  --analysis-model us.anthropic.claude-sonnet-4-6
Pipeline stages (shown in Rich progress output):
- Document scan — Finds PDF, PNG, JPG, TIFF files in the directory
- Embedding — Generates image embeddings via Bedrock (Cohere Embed v4)
- Clustering — KMeans + silhouette analysis to find optimal number of clusters
- Analysis — Strands agent analyzes each cluster to identify the document class and generate a JSON Schema
- Reflection — Agent generates a summary report of all discovered classes
Output: Results table showing cluster ID, classification, document count, field count, and status. Optionally writes individual JSON schema files and a reflection report.
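The cluster-count selection in the clustering stage can be illustrated with a self-contained toy. This is pure Python on tiny 2-D points standing in for embeddings; the real pipeline runs scikit-learn's KMeans on Bedrock image embeddings, and the farthest-first seeding here is a simplification, not the pipeline's actual initialization:

```python
# Toy sketch of the clustering stage: k-means plus silhouette scoring
# to choose the number of clusters. Illustrative only.
import math

def dist(a, b):
    return math.dist(a, b)

def farthest_first(points, k):
    """Deterministic seeding: repeatedly pick the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=25):
    centroids = farthest_first(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        centroids = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]

def silhouette(points, labels):
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton clusters contribute nothing
        a = sum(dist(p, q) for q in own) / len(own)
        b = min(sum(dist(p, q) for j, q in enumerate(points) if labels[j] == lbl) / labels.count(lbl)
                for lbl in set(labels) if lbl != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Three well-separated groups standing in for three document types.
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]

best_k = max(range(2, 5), key=lambda k: silhouette(points, kmeans(points, k)))
print(best_k)  # the three groups are recovered
```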
config-sync-bda
Section titled “config-sync-bda”Synchronize IDP document class schemas with BDA (Bedrock Data Automation) blueprints.
Usage:
idp-cli config-sync-bda [OPTIONS]
Options:
--stack-name (required): CloudFormation stack name
--direction: Sync direction: bidirectional (default), bda-to-idp, or idp-to-bda
--mode: Sync mode: replace (default, full alignment) or merge (additive, don't delete)
--config-version: Configuration version to sync (default: active version)
--region: AWS region (optional)
Examples:
# Bidirectional sync (default)
idp-cli config-sync-bda --stack-name my-stack
# Import BDA blueprints into IDP config
idp-cli config-sync-bda --stack-name my-stack --direction bda-to-idp
# Push IDP classes to BDA blueprints
idp-cli config-sync-bda --stack-name my-stack --direction idp-to-bda
# Merge mode (additive, don't remove existing items)
idp-cli config-sync-bda --stack-name my-stack --direction bda-to-idp --mode merge
# Sync specific config version
idp-cli config-sync-bda --stack-name my-stack --config-version v2
chat
Section titled “chat”
Interactive Agent Companion Chat from the terminal. Provides access to the full multi-agent orchestrator including Analytics, Error Analyzer, Code Intelligence, and any configured External MCP Agents.
The chat command runs the same orchestrator as the Web UI’s Agent Companion Chat, but locally in your terminal — with real-time streaming and multi-turn conversation support.
Usage:
idp-cli chat [OPTIONS]
Options:
--stack-name (required): CloudFormation stack name
--region: AWS region (optional)
--prompt: Single-shot prompt; sends one message, prints the response, and exits. Useful for scripts and CI/CD.
--enable-code-intelligence: Enable the Code Intelligence Agent (disabled by default because it uses external third-party services)
Examples:
# Interactive mode: multi-turn conversation
idp-cli chat --stack-name my-stack
# Single-shot mode: for scripts and automation
idp-cli chat --stack-name my-stack --prompt "What is the avg accuracy for the last test run?"
# With Code Intelligence enabled
idp-cli chat --stack-name my-stack --enable-code-intelligence
# Pipe output in scripts
idp-cli chat --stack-name my-stack --prompt "How many documents failed today?" 2>/dev/null
Interactive session example:
IDP Agent Chat
Stack: my-stack
✓ Ready
Agents: Analytics Agent · Error Analyzer Agent · Code Intelligence Agent
Type /quit to exit

You: What is the avg accuracy for test run Fake-W2-Tax-Forms-20260320?
⟶ Analytics Agent
The average accuracy for test run Fake-W2-Tax-Forms-20260320 is 0.867 (86.7%) across 95 documents.

You: Break that down by document type
⟶ Analytics Agent
...

You: /quit
Goodbye.
SDK usage:
from idp_sdk import IDPClient
client = IDPClient(stack_name="my-stack")
# Single message
resp = client.chat.send_message("How many documents were processed today?")
print(resp.response)

# Multi-turn conversation
resp2 = client.chat.send_message("Break down by type", session_id=resp.session_id)
print(resp2.response)
Prerequisites:
- Requires idp_common[agents] to be installed: pip install -e 'lib/idp_common_pkg[agents]'
- Requires Amazon Bedrock model access (Claude or Nova models)
- Stack must be deployed with Agent Companion Chat resources (DynamoDB tables, Athena database)
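As a quick preflight before launching chat, you can check that the optional agent dependencies are importable. This is a sketch using only the Python standard library; the only module name taken from the prerequisites above is idp_common — pass in whatever extras your setup relies on.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # idp_common is required for agent chat (see prerequisites above)
    missing = missing_modules(["idp_common"])
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
        print("Install with: pip install -e 'lib/idp_common_pkg[agents]'")
    else:
        print("Agent dependencies look OK")
```

Running this before idp-cli chat surfaces a missing install immediately instead of a stack trace mid-session.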
Troubleshooting
Section titled “Troubleshooting”Stack Not Found
Section titled “Stack Not Found”Error: Stack 'my-stack' is not in a valid state
Solution:
# Verify stack exists
aws cloudformation describe-stacks --stack-name my-stack
Permission Denied
Section titled “Permission Denied”Error: Access Denied when uploading files
Solution: Ensure AWS credentials have permissions for:
- S3: PutObject, GetObject on InputBucket/OutputBucket
- SQS: SendMessage on DocumentQueue
- Lambda: InvokeFunction on LookupFunction
- CloudFormation: DescribeStacks, ListStackResources
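For reference, a minimal illustrative IAM policy covering these actions might look like the following. This is a sketch, not the stack's actual policy: the resource ARNs are placeholders to replace with your stack's real bucket, queue, and function names, and you may prefer to scope the CloudFormation actions to your stack's ARN.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::<input-bucket>/*",
        "arn:aws:s3:::<output-bucket>/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:<region>:<account-id>:<document-queue>"
    },
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:<region>:<account-id>:function:<lookup-function>"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudformation:DescribeStacks",
        "cloudformation:ListStackResources"
      ],
      "Resource": "*"
    }
  ]
}
```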
Manifest Validation Failed
Section titled “Manifest Validation Failed”Error: Duplicate filenames found
Solution: Ensure unique filenames or use directory structure:
document_path
./clientA/invoice.pdf
./clientB/invoice.pdf
Evaluation Not Running
Section titled “Evaluation Not Running”Issue: Evaluation results missing even with baselines
Checklist:
- Verify baseline_source column exists in manifest
- Confirm baseline paths are correct and accessible
- Check baseline directory has correct structure (sections/1/result.json)
- Review CloudWatch logs for EvaluationFunction
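To sanity-check baseline layout locally before re-running a batch, a small script like this can flag baseline directories missing the expected sections/&lt;n&gt;/result.json files. It is a sketch using only the standard library and assumes the directory structure named in the checklist above; adjust it if your baselines differ.

```python
from pathlib import Path


def check_baseline_dir(baseline_root):
    """Return a list of problems found in one baseline directory tree.

    Expects <baseline_root>/sections/<n>/result.json for each section.
    """
    problems = []
    sections = Path(baseline_root) / "sections"
    if not sections.is_dir():
        return [f"missing directory: {sections}"]
    for section in sorted(sections.iterdir()):
        if section.is_dir() and not (section / "result.json").is_file():
            problems.append(f"missing file: {section / 'result.json'}")
    return problems
```

Run it over each baseline_source path from your manifest; an empty list means the tree matches the expected layout.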
Monitoring Shows “UNKNOWN” Status
Section titled “Monitoring Shows “UNKNOWN” Status”Issue: Cannot retrieve document status
Solution:
# Verify LookupFunction exists
aws lambda get-function --function-name <LookupFunctionName>

# Check CloudWatch logs
aws logs tail /aws/lambda/<LookupFunctionName> --follow
Testing
Section titled “Testing”Run the test suite:
cd lib/idp_cli_pkg
pytest
Run specific tests:
pytest tests/test_manifest_parser.py -v
Support
Section titled “Support”For issues or questions:
- Check CloudWatch logs for Lambda functions
- Review AWS Console for resource status
- Open an issue on GitHub