InsuranceLake Quickstart Guide

If you’d like to get started quickly by transforming some sample raw insurance data and running SQL on the resulting dataset, without worrying about CI/CD, follow the steps in this section.

Contents

  • Python/CDK Basics
  • Deploy the Application
  • Try out the ETL Process
  • Next Steps

Python/CDK Basics

  1. Open the AWS Console in the us-east-2 (Ohio) Region.

    InsuranceLake uses us-east-2 by default. To change the Region, refer to the Quickstart with CI/CD.

  2. Select AWS CloudShell at the bottom of the page and wait a few seconds until it is available for use.
  3. Ensure you are using the latest versions of the AWS SDK for Node.js and the AWS CDK.
     sudo npm install -g aws-sdk aws-cdk
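
    To confirm both tools are available, you can optionally check the installed versions:

     node --version
     cdk --version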
    
  4. Clone the repositories.
     git clone https://github.com/aws-solutions-library-samples/aws-insurancelake-infrastructure.git
     git clone https://github.com/aws-solutions-library-samples/aws-insurancelake-etl.git
    
  5. Use a terminal or command prompt and change the working directory to the location of the infrastructure code.
     cd aws-insurancelake-infrastructure
    
  6. Create a Python virtual environment.

    In CloudShell, your home directory is limited to 1 GB of persistent storage. To ensure you have enough space to download and install the required Python packages, use CloudShell’s temporary storage, located in /tmp, which has a larger capacity.

     python3 -m venv /tmp/.venv
    
  7. Activate the virtual environment.
     source /tmp/.venv/bin/activate
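
    To verify the environment is active, you can optionally confirm that python3 now resolves inside the virtual environment (assuming the /tmp/.venv path above):

     which python3
     # Expected: /tmp/.venv/bin/python3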
    
  8. Install required Python libraries.

    You may see a warning stating that a newer version is available; it is safe to ignore this for the Quickstart.

     pip install -r requirements.txt
    
  9. Bootstrap CDK in your AWS account.
     cdk bootstrap
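
    If CDK cannot determine your account and Region automatically, you can bootstrap an explicit environment instead (a sketch, assuming us-east-2):

     cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/us-east-2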
    

Deploy the Application

  1. Confirm you are still in the aws-insurancelake-infrastructure directory.
  2. Deploy infrastructure resources in the development environment (one stack).
     cdk deploy Dev-InsuranceLakeInfrastructurePipeline/Dev/InsuranceLakeInfrastructureS3BucketZones
    
  3. Review and accept AWS Identity and Access Management (IAM) credential creation for the S3 bucket stack.
    • Wait for deployment to finish (approximately 5 minutes).
  4. Copy the S3 bucket name for the Collect bucket to use later.
    • Bucket name will be in the form: dev-insurancelake-<AWS Account ID>-<Region>-collect.
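    • If you prefer the CLI, one way to find the bucket name is to filter on the naming pattern above (a sketch):

     aws s3 ls | grep collect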
  5. Switch the working directory to the location of the ETL code.
     cd ../aws-insurancelake-etl
    
  6. Deploy the ETL resources in the development environment (four stacks).
     cdk deploy Dev-InsuranceLakeEtlPipeline/Dev/InsuranceLakeEtlDynamoDb Dev-InsuranceLakeEtlPipeline/Dev/InsuranceLakeEtlGlue Dev-InsuranceLakeEtlPipeline/Dev/InsuranceLakeEtlStepFunctions Dev-InsuranceLakeEtlPipeline/Dev/InsuranceLakeEtlAthenaHelper
    
    • Wait approximately 1 minute for the DynamoDB stack deployment to finish.
  7. Review and accept IAM credential creation for the AWS Glue jobs stack.
    • Wait approximately 3 minutes for deployment to finish.
  8. Review and accept IAM credential creation for the Step Functions stack.
    • Wait approximately 7 minutes for deployment of Step Functions and Athena Helper stacks to finish.
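
To confirm the stacks deployed successfully, you can optionally check their status from the CLI (a sketch; assumes your default Region is the deployment Region):

     aws cloudformation describe-stacks --query "Stacks[?contains(StackName, 'InsuranceLake')].[StackName,StackStatus]" --output table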

Try out the ETL Process

  1. Populate the DynamoDB lookup table with sample lookup data.
     resources/load_dynamodb_lookup_table.py SyntheticGeneralData dev-insurancelake-etl-value-lookup resources/syntheticgeneral_lookup_data.json
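
    To spot-check that the lookup data loaded, you can optionally scan a few items from the table (assuming the table name above):

     aws dynamodb scan --table-name dev-insurancelake-etl-value-lookup --max-items 5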
    
  2. Transfer the sample claim data to the Collect bucket.
     aws s3 cp resources/syntheticgeneral-claim-data.csv s3://<Collect S3 bucket>/SyntheticGeneralData/ClaimData/
    
  3. Transfer the sample policy data to the Collect bucket.
     aws s3 cp resources/syntheticgeneral-policy-data.csv s3://<Collect S3 bucket>/SyntheticGeneralData/PolicyData/
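
    To confirm both files landed in the Collect bucket, you can optionally list the uploaded objects:

     aws s3 ls s3://<Collect S3 bucket>/SyntheticGeneralData/ --recursive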
    
  4. Open Step Functions in the AWS Console and select dev-insurancelake-etl-state-machine.
  5. Open the state machine execution in progress and monitor the status until complete.
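    • You can also watch the execution from the CLI (a sketch; the ARN below follows the expected naming pattern, so copy the exact ARN from the console if it differs):

     aws stepfunctions list-executions --state-machine-arn arn:aws:states:us-east-2:<AWS Account ID>:stateMachine:dev-insurancelake-etl-state-machine --max-items 1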
  6. Open Athena in the AWS Console.
  7. Select Launch Query Editor, and change the Workgroup to insurancelake.
  8. Run the following query to view a sample of prepared data in the Consume bucket:
     select * from syntheticgeneraldata_consume.policydata limit 100
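
    If you prefer the CLI, the same query can be submitted through the Athena API (a sketch, assuming the insurancelake workgroup above):

     aws athena start-query-execution --work-group insurancelake --query-string "select * from syntheticgeneraldata_consume.policydata limit 100"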
    

Next Steps



Copyright Amazon.com and its affiliates; all rights reserved. This file is Amazon Web Services Content and may not be duplicated or distributed without permission.

Page last modified: Oct 15 2024.