# Enhanced Evaluation Reporting (sticker-eval v0.1.4+)

This document describes the enhanced evaluation reporting features available in IDP v0.4.9+ using sticker-eval v0.1.4.
## Overview

The evaluation module now leverages sticker-eval v0.1.4's fine-grain field comparison feature (from GitHub Issue #48 and PR #51) to provide:
- Detailed nested object match information alongside aggregate scores
- Interactive controls to filter and explore evaluation results
- Field-by-field comparison details for arrays and complex objects
## Key Features

### 1. Nested Field Comparison Details

For complex attributes (nested objects, arrays), the evaluation now captures detailed field-by-field comparison information:
```json
{
  "name": "LineItems",
  "score": 0.88,  // Aggregate score
  "matched": false,
  "field_comparison_details": [
    {
      "expected_key": "LineItems[0].Description",
      "expected_value": "Service A",
      "actual_key": "LineItems[0].Description",
      "actual_value": "Service A",
      "match": true,
      "score": 1.0,
      "weighted_score": 2.0
    },
    {
      "expected_key": "LineItems[1].Description",
      "expected_value": "Service B",
      "actual_key": "LineItems[1].Description",
      "actual_value": "Service C",
      "match": false,
      "score": 0.75,
      "weighted_score": 1.5
    }
    // ... more comparisons
  ]
}
```

### 2. Interactive Markdown Reports

The markdown reports now include interactive HTML controls:
#### 🔍 Show Only Unmatched

Filter the attribute table to show only rows where matches failed, providing a compact view that highlights problematic fields.
```html
<button onclick="toggleUnmatchedOnly()">🔍 Show Only Unmatched</button>
```

#### ➕➖ Expand/Collapse All Details

Expand or collapse all nested field comparison details at once.
```html
<button onclick="expandAllDetails()">➕ Expand All Details</button>
<button onclick="collapseAllDetails()">➖ Collapse All Details</button>
```

#### 📋 Expandable Nested Details

Each attribute with nested comparisons has an expandable section:
```html
<details>
  <summary>🔍 View 6 Nested Field Comparisons</summary>
  <!-- Detailed comparison table -->
</details>
```

### 3. Aggregate Score Annotations

Aggregate scores for complex objects are clearly marked:
- **Visual indicator**: `<span class="aggregate-score">0.88</span>`
- **Text annotation**: `(aggregate)` appears next to the score
- **Color coding**: blue styling distinguishes aggregate scores from simple field scores
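How sticker-eval computes the aggregate is internal to the library; one plausible interpretation, consistent with the `score` and `weighted_score` fields shown in the nested-comparison example earlier, is a weight-normalized average. The `weight` key below is a hypothetical input, not part of the report schema:

```python
def aggregate_score(comparisons):
    """Weight-normalized average of per-field similarity scores.

    Mirrors the weighted_score = score * field_weight relationship
    visible in the report; the explicit "weight" key is assumed here.
    """
    total_weight = sum(c["weight"] for c in comparisons)
    if total_weight == 0:
        return 0.0
    return sum(c["score"] * c["weight"] for c in comparisons) / total_weight


# The two LineItems comparisons from the earlier example, each with a
# weight of 2.0 (recovered as weighted_score / score):
details = [
    {"score": 1.0, "weight": 2.0},   # Service A vs Service A
    {"score": 0.75, "weight": 2.0},  # Service B vs Service C
]
print(aggregate_score(details))  # 0.875, which displays as the 0.88 aggregate
```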
## Report Structure

### JSON Report

The JSON report (`results.json`) includes:
```json
{
  "document_id": "doc-123",
  "overall_metrics": { ... },
  "section_results": [
    {
      "section_id": "section-001",
      "document_class": "Invoice",
      "metrics": { ... },
      "attributes": [
        {
          "name": "AttributeName",
          "expected": "...",
          "actual": "...",
          "matched": true,
          "score": 0.95,
          "field_comparison_details": [  // NEW in v0.1.4
            { /* detailed comparison */ }
          ]
        }
      ]
    }
  ]
}
```

### Markdown Report

The markdown report (`report.md`) includes:
- **Interactive Controls** - filter and navigation buttons
- **Summary Section** - high-level metrics with visual indicators
- **Section Details** - per-section metrics and attributes
- **Attribute Table** - enhanced with:
  - Row classes for filtering (`matched-row`, `unmatched-row`)
  - Aggregate score annotations
  - Expandable nested details for complex fields
- **Evaluation Methods** - documentation of comparison methods
## Usage Example

```python
from idp_common.evaluation.service import EvaluationService

# Initialize service
eval_service = EvaluationService(region="us-east-1", config=config)

# Evaluate document (field_comparisons automatically enabled)
result_doc = eval_service.evaluate_document(
    actual_document=actual_doc,
    expected_document=expected_doc,
    store_results=True  # Generates both JSON and Markdown
)

# Access detailed comparisons programmatically
for section in result_doc.evaluation_result.section_results:
    for attr in section.attributes:
        if attr.field_comparison_details:
            print(f"Attribute: {attr.name}")
            print(f"Aggregate Score: {attr.score}")
            print(f"Nested Comparisons: {len(attr.field_comparison_details)}")

            for detail in attr.field_comparison_details:
                if not detail['match']:
                    print(f"  Mismatch: {detail['expected_key']}")
                    print(f"    Expected: {detail['expected_value']}")
                    print(f"    Actual: {detail['actual_value']}")
                    print(f"    Score: {detail['score']}")
```

## Viewing Interactive Reports

### GitHub

GitHub's markdown renderer supports HTML, so the interactive controls will work when viewing the report in:
- Pull requests
- Issue comments
- Repository files
### VS Code

Install a markdown extension that supports HTML:
- Markdown Preview Enhanced (recommended)
- Markdown All in One
### Web Browser

Open the `.md` file directly in a browser:

```shell
open test_evaluation_report.md
```

### Jupyter Notebooks

Use `IPython.display.Markdown`:
```python
from IPython.display import Markdown, display

with open('evaluation/report.md', 'r') as f:
    display(Markdown(f.read()))
```

## Configuration

No additional configuration is required. The enhancement activates automatically when using sticker-eval v0.1.4+.
The feature is enabled in `lib/idp_common_pkg/idp_common/evaluation/service.py`:

```python
# Compare using Stickler with field_comparisons enabled
stickler_result = expected_instance.compare_with(
    actual_instance,
    document_field_comparisons=True,  # Enables detailed comparison
)
```

## Benefits

### 1. Better Debugging
Section titled “1. Better Debugging”- Quickly identify which specific nested fields are causing mismatches
- See exact values that differ within complex objects
- Understand Hungarian matching results for arrays
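sticker-eval's array matcher is not reproduced here, but the idea behind Hungarian matching can be sketched in plain Python: pair expected and actual array items so that total similarity is maximized. This brute-force version is only feasible for small arrays (the Hungarian algorithm finds the same optimum in polynomial time), and `similarity` is a toy stand-in for the library's real scorer:

```python
from itertools import permutations


def similarity(a: str, b: str) -> float:
    """Toy similarity: fraction of aligned characters that agree."""
    if not a or not b:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def best_assignment(expected, actual):
    """Return (expected_index, actual_index) pairs maximizing total similarity.

    Assumes equal-length lists; brute force over all permutations, which
    the Hungarian algorithm would do in O(n^3) instead of O(n!).
    """
    best_total, best_pairs = -1.0, []
    for perm in permutations(range(len(actual))):
        pairs = list(zip(range(len(expected)), perm))
        total = sum(similarity(expected[i], actual[j]) for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_pairs


# Items appear in a different order; the optimal pairing crosses them over:
print(best_assignment(["Service A", "Service B"], ["Service B", "Service A"]))
# [(0, 1), (1, 0)]
```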
### 2. Compact Problem View
Section titled “2. Compact Problem View”- Filter to show only unmatched rows
- Focus attention on fields requiring investigation
- Reduce cognitive load when reviewing large reports
### 3. Complete Context
Section titled “3. Complete Context”- Aggregate scores provide high-level overview
- Nested details provide granular diagnostics
- Both perspectives available in single report
### 4. Production Ready
Section titled “4. Production Ready”- JSON structure fully captures all comparison data
- Can be consumed by analytics tools
- Markdown provides human-readable interface
## Technical Details

### Data Model Changes

`AttributeEvaluationResult` now includes:
```python
@dataclass
class AttributeEvaluationResult:
    # ... existing fields ...
    field_comparison_details: Optional[List[Dict[str, Any]]] = None
```

### Field Comparison Structure

Each comparison in `field_comparison_details`:
```json
{
  "expected_key": "path.to.field",   // Dot/bracket notation
  "expected_value": "actual value",
  "actual_key": "path.to.field",
  "actual_value": "actual value",
  "match": true,                     // Boolean match result
  "score": 0.95,                     // Similarity score (0.0-1.0)
  "weighted_score": 1.9,             // score * field_weight
  "reason": "explanation"            // Human-readable reason
}
```

### Grouping Logic

Field comparisons are grouped by root field name:

- `LineItems[0].Description` → grouped under `LineItems`
- `Address.City` → grouped under `Address`
- Simple fields have no grouping (single comparison or none)
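The grouping rule above amounts to: take everything before the first `.` or `[` as the root field. A minimal stdlib illustration (the dict shape follows the field-comparison structure above; this is not the library's actual implementation):

```python
import re
from collections import defaultdict


def root_field(key: str) -> str:
    """Everything before the first '.' or '[',
    e.g. 'LineItems[0].Description' -> 'LineItems'."""
    match = re.match(r"[^.\[]+", key)
    return match.group(0) if match else key


def group_comparisons(details):
    """Bucket comparison dicts by the root of their expected_key."""
    groups = defaultdict(list)
    for detail in details:
        groups[root_field(detail["expected_key"])].append(detail)
    return dict(groups)


details = [
    {"expected_key": "LineItems[0].Description"},
    {"expected_key": "LineItems[1].Description"},
    {"expected_key": "Address.City"},
]
print(sorted(group_comparisons(details)))  # ['Address', 'LineItems']
```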
## Backward Compatibility

The enhancement is fully backward compatible:
- ✅ Existing API unchanged
- ✅ JSON reports remain consumable by old code (new field is optional)
- ✅ Markdown reports viewable in any viewer (controls degrade gracefully)
- ✅ No configuration changes required
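In practice, backward compatibility means a consumer can read both old and new JSON reports by treating the new field as optional. A defensive-reader sketch (the report shape follows the JSON structure shown earlier; the function and sample data are illustrative):

```python
def failed_attributes(report: dict) -> list[str]:
    """List failed attribute names, tolerating reports produced
    before field_comparison_details existed."""
    failures = []
    for section in report.get("section_results", []):
        for attr in section.get("attributes", []):
            if not attr.get("matched", True):
                # .get() yields None for pre-v0.1.4 reports lacking the field
                details = attr.get("field_comparison_details") or []
                failures.append(f"{attr['name']} ({len(details)} nested comparisons)")
    return failures


# An old-style report with no field_comparison_details still works:
old_report = {"section_results": [{"attributes": [{"name": "Total", "matched": False}]}]}
print(failed_attributes(old_report))  # ['Total (0 nested comparisons)']
```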
## Examples

See `test_evaluation_enhancements.py` for complete working examples demonstrating:
- Nested object comparisons
- Array item comparisons
- Aggregate score calculations
- Interactive report generation
Run the test:
```shell
python test_evaluation_enhancements.py
```

This generates `test_evaluation_report.md`, demonstrating all features.
## Future Enhancements

Potential future improvements:
- Export to CSV with nested details flattened
- Comparison history tracking across runs
- Threshold recommendations based on field mismatch patterns
- Visual diff viewer for nested structures
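As a sense of what the CSV export mentioned above might look like, each nested comparison could become one row keyed by attribute name. A sketch using the stdlib `csv` module (the column choice is an assumption, not a committed format):

```python
import csv
import io


def details_to_csv(attributes) -> str:
    """Flatten field_comparison_details into one CSV row per nested field."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["attribute", "expected_key", "expected_value",
                     "actual_value", "match", "score"])
    for attr in attributes:
        for d in attr.get("field_comparison_details") or []:
            writer.writerow([attr["name"], d["expected_key"], d["expected_value"],
                             d["actual_value"], d["match"], d["score"]])
    return buf.getvalue()


attrs = [{
    "name": "LineItems",
    "field_comparison_details": [
        {"expected_key": "LineItems[0].Description", "expected_value": "Service A",
         "actual_value": "Service A", "match": True, "score": 1.0},
    ],
}]
print(details_to_csv(attrs))
```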