InsuranceLake Deployment Validation

  1. Transfer the sample claim data to the Collect bucket (Source system: SyntheticGeneralData, Table: ClaimData).
    aws s3 cp resources/syntheticgeneral-claim-data.csv s3://<Collect S3 bucket>/SyntheticGeneralData/ClaimData/
    
  2. Transfer the sample policy data to the Collect bucket (Source system: SyntheticGeneralData, Table: PolicyData).
    aws s3 cp resources/syntheticgeneral-policy-data.csv s3://<Collect S3 bucket>/SyntheticGeneralData/PolicyData/
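
    To confirm that both files landed in the expected folders, list the source system prefix (replace <Collect S3 bucket> with the bucket name from your deployment, as above):
    aws s3 ls s3://<Collect S3 bucket>/SyntheticGeneralData/ --recursive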
    
  3. Upon successful transfer of each file, an event notification from S3 will trigger the state-machine-trigger Lambda function.
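
    If the pipeline does not appear to start, the Lambda function's CloudWatch logs are a useful first check. A minimal sketch, assuming the log group follows the default /aws/lambda/<function name> convention (substitute your deployment's function name):
    aws logs tail /aws/lambda/<state-machine-trigger function name> --since 10m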

  4. The Lambda function will insert a record into the DynamoDB table {environment}-{resource_name_prefix}-etl-job-audit to track job start status.
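
    To spot-check the audit record, scan the table (substitute the environment and resource name prefix values from your deployment):
    aws dynamodb scan --table-name {environment}-{resource_name_prefix}-etl-job-audit --max-items 5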

  5. The Lambda function will also start the Step Functions State Machine. The execution name will be <filename>-<YYYYMMDDHHMMSSxxxxxx>, and the execution will receive the required metadata as input parameters.
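
    The execution can be confirmed from the CLI; the state machine ARN below is a placeholder for the ARN deployed in your account:
    aws stepfunctions list-executions --state-machine-arn <state machine ARN> --max-results 5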

  6. The State Machine will trigger the AWS Glue job for Collect to Cleanse data processing.
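
    To watch the job run's status, query it by name (the job name below is a placeholder; look it up with aws glue list-jobs or in the AWS Glue console):
    aws glue get-job-runs --job-name <Collect to Cleanse job name> --max-results 1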

  7. The Collect to Cleanse AWS Glue job will execute the transformation logic defined in configuration files.

  8. The AWS Glue job will load the data into the Cleanse bucket using the provided metadata. The data will be stored in S3 as s3://{environment}-{resource_name_prefix}-{account}-{region}-cleanse/syntheticgeneraldata/claimdata/year=YYYY/month=MM/day=DD in Apache Parquet format.
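
    Once the job finishes, the partitioned Parquet output can be verified with a listing (substitute your deployment's Cleanse bucket name):
    aws s3 ls s3://<Cleanse S3 bucket>/syntheticgeneraldata/claimdata/ --recursive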

  9. The AWS Glue job will create or update the AWS Glue Data Catalog table, using the table name passed as a parameter and derived from the folder name (PolicyData or ClaimData).
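
    To verify the catalog entry, retrieve the table definition. The database name here is an assumption based on the Cleanse bucket prefix shown above; adjust it to match your deployment:
    aws glue get-table --database-name syntheticgeneraldata --name claimdata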

  10. After the Collect to Cleanse AWS Glue job completes, the State Machine will trigger the Cleanse to Consume AWS Glue job.

  11. The Cleanse to Consume AWS Glue job will execute the SQL logic defined in configuration files.

  12. The Cleanse to Consume AWS Glue job will store the resulting data set in S3 as s3://{environment}-{resource_name_prefix}-{account}-{region}-consume/syntheticgeneraldata/claimdata/year=YYYY/month=MM/day=DD in Apache Parquet format.
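
    As with the Cleanse bucket, the Consume output can be listed directly (substitute your deployment's Consume bucket name):
    aws s3 ls s3://<Consume S3 bucket>/syntheticgeneraldata/claimdata/ --recursive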

  13. The Cleanse to Consume AWS Glue job will create or update the AWS Glue Data Catalog table.

  14. After successful completion of the Cleanse to Consume AWS Glue job, the State Machine will trigger the etl-job-auditor Lambda function to update the DynamoDB table {environment}-{resource_name_prefix}-etl-job-audit with the latest status.

  15. An Amazon Simple Notification Service (Amazon SNS) notification will be sent to all subscribed users.
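
    If no notification arrives, confirm that your endpoint is subscribed to the topic (the topic ARN is a placeholder; find it with aws sns list-topics):
    aws sns list-subscriptions-by-topic --topic-arn <notifications topic ARN>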

  16. To validate the data load, use Amazon Athena to execute the following query:

     select * from syntheticgeneraldata_consume.policydata limit 100
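
     The same query can also be run from the CLI; the results location is a placeholder for any S3 path that your role can write Athena query results to:
     aws athena start-query-execution --query-string "select * from syntheticgeneraldata_consume.policydata limit 100" --result-configuration OutputLocation=s3://<Athena query results bucket>/

     Once the query completes, retrieve the output with aws athena get-query-results --query-execution-id <id>.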
    


Copyright Amazon.com and its affiliates; all rights reserved. This file is Amazon Web Services Content and may not be duplicated or distributed without permission.
