# InsuranceLake Quickstart Guide
If you'd like to get started quickly transforming some sample raw insurance data and running SQL on the resulting dataset, without worrying about CI/CD, follow the steps in this section.
## Contents

- [Python/CDK Basics](#pythoncdk-basics)
- [Deploy the Application](#deploy-the-application)
- [Try out the ETL Process](#try-out-the-etl-process)
- [Next Steps](#next-steps)

## Python/CDK Basics
- Open the AWS Console in the `us-east-2` (Ohio) Region.
  - InsuranceLake uses `us-east-2` by default. To change the Region, refer to steps 5-8 in the Quickstart with CI/CD.
- Select AWS CloudShell at the bottom of the page and wait a few seconds until it is available for use.
- Ensure you are using the latest version of the AWS SDK for Node.js and AWS CDK.
  ```bash
  sudo npm install -g aws-lib aws-cdk
  ```
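  To confirm the toolchain is ready, you can print the installed versions (a quick sanity check; the exact versions shown will vary):

  ```bash
  node --version
  cdk --version
  ```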
- Clone the repositories.
  ```bash
  git clone https://github.com/aws-solutions-library-samples/aws-insurancelake-infrastructure.git
  git clone https://github.com/aws-solutions-library-samples/aws-insurancelake-etl.git
  ```
- Use a terminal or command prompt and change the working directory to the location of the infrastructure code.
  ```bash
  cd aws-insurancelake-infrastructure
  ```
- Create a Python virtual environment.
  - In CloudShell your home directory is limited to 1 GB of persistent storage. To ensure you have enough storage to download and install the required Python packages, use CloudShell's temporary storage, located in `/tmp`, which has a larger capacity.
  ```bash
  python3 -m venv /tmp/.venv
  ```
- Activate the virtual environment.
  ```bash
  source /tmp/.venv/bin/activate
  ```
- Install the required Python libraries.
  - You may see a warning stating that a newer version is available; it is safe to ignore this for the Quickstart.
  ```bash
  pip install -r requirements.txt
  ```
- Bootstrap CDK in your AWS account.
  ```bash
  cdk bootstrap
  ```
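  If your CloudShell session has access to more than one account and you want to be explicit about what gets bootstrapped, CDK also accepts the target environment directly. This is standard CDK syntax, not an InsuranceLake-specific command; adjust the Region if you changed it earlier:

  ```bash
  cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/us-east-2
  ```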
## Deploy the Application
- Confirm you are still in the `aws-insurancelake-infrastructure` directory (for example, use the `pwd` command).
- Deploy infrastructure resources in the development environment (one stack).
  ```bash
  cdk deploy Dev-InsuranceLakeInfrastructurePipeline/Dev/InsuranceLakeInfrastructureS3BucketZones
  ```
- Review and accept AWS Identity and Access Management (IAM) credential creation for the S3 bucket stack.
  - Wait for deployment to finish (approximately 5 minutes).
 
- Copy the S3 bucket name for the Collect bucket to use later.
  - The bucket name will be in the form: `dev-insurancelake-<AWS Account ID>-<Region>-collect`.
 
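  If you prefer the CLI to the Console, you can also find the Collect bucket name with a filter. This sketch assumes the default naming shown above:

  ```bash
  aws s3 ls | grep 'dev-insurancelake-.*-collect'
  ```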
- Switch the working directory to the location of the ETL code.
  ```bash
  cd ../aws-insurancelake-etl
  ```
- Deploy the ETL resources in the development environment (four stacks).
  ```bash
  cdk deploy Dev-InsuranceLakeEtlPipeline/Dev/InsuranceLakeEtlDynamoDb Dev-InsuranceLakeEtlPipeline/Dev/InsuranceLakeEtlGlue Dev-InsuranceLakeEtlPipeline/Dev/InsuranceLakeEtlStepFunctions Dev-InsuranceLakeEtlPipeline/Dev/InsuranceLakeEtlAthenaHelper
  ```
  - Wait approximately 1 minute for the DynamoDB stack deployment to finish.
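  If you want to watch progress outside the terminal running `cdk deploy`, you can poll AWS CloudFormation from a second CloudShell tab. This is a read-only check; the exact stack names you see are generated by CDK:

  ```bash
  aws cloudformation list-stacks \
    --stack-status-filter CREATE_IN_PROGRESS CREATE_COMPLETE UPDATE_COMPLETE \
    --query 'StackSummaries[].[StackName,StackStatus]' --output table
  ```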
 
- Review and accept IAM credential creation for the AWS Glue jobs stack.
  - Wait approximately 3 minutes for deployment to finish.
 
- Review and accept IAM credential creation for the Step Functions stack.
  - Wait approximately 7 minutes for deployment of the Step Functions and Athena Helper stacks to finish.
 
## Try out the ETL Process
- Populate the DynamoDB lookup table with sample lookup data.
  ```bash
  resources/load_dynamodb_lookup_table.py SyntheticGeneralData dev-insurancelake-etl-value-lookup resources/syntheticgeneral_lookup_data.json
  ```
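  To spot-check that the lookup data loaded, you can scan a few items from the table. This is a read-only check using the standard DynamoDB CLI against the table name used above:

  ```bash
  aws dynamodb scan --table-name dev-insurancelake-etl-value-lookup --limit 5
  ```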
- Transfer the sample claim data to the Collect bucket.
  ```bash
  export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
  aws s3 cp resources/syntheticgeneral-claim-data.csv s3://dev-insurancelake-${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}-collect/SyntheticGeneralData/ClaimData/
  ```
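  To confirm the file landed in the expected prefix, you can list the target location, reusing the variables exported above:

  ```bash
  aws s3 ls s3://dev-insurancelake-${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}-collect/SyntheticGeneralData/ClaimData/
  ```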
- Open Step Functions in the AWS Console and select `dev-insurancelake-etl-state-machine`.
- Open the state machine execution in progress and monitor the status until complete.  
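  If you would rather watch from the CLI, you can list recent executions. This sketch assumes the state machine keeps the `dev-insurancelake-etl-state-machine` name used above:

  ```bash
  STATE_MACHINE_ARN=$(aws stepfunctions list-state-machines \
    --query "stateMachines[?name=='dev-insurancelake-etl-state-machine'].stateMachineArn" --output text)
  aws stepfunctions list-executions --state-machine-arn "$STATE_MACHINE_ARN" --max-results 5
  ```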
- Transfer the sample policy data to the Collect bucket.
  ```bash
  aws s3 cp resources/syntheticgeneral-policy-data.csv s3://dev-insurancelake-${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}-collect/SyntheticGeneralData/PolicyData/
  ```
- Use the back button to return to the list of state machine executions. You should see a second execution running.
- Open the state machine execution in progress and monitor the status until complete.
- Open Athena in the AWS Console.
- Select Launch Query Editor, and change the Workgroup to `insurancelake`.
- Run the following query to view a sample of prepared data in the Consume bucket:
  ```sql
  select * from syntheticgeneraldata_consume.policydata limit 100
  ```
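  The same query can also be run from the CLI if you prefer. This is a sketch using the standard Athena API and the `insurancelake` workgroup selected in the previous step:

  ```bash
  QUERY_ID=$(aws athena start-query-execution \
    --query-string 'select * from syntheticgeneraldata_consume.policydata limit 100' \
    --work-group insurancelake \
    --query 'QueryExecutionId' --output text)
  # Give the query a few seconds to finish, then retrieve the result set
  aws athena get-query-results --query-execution-id "$QUERY_ID"
  ```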
## Next Steps
- Take the InsuranceLake Deep Dive Workshop.
  - You may skip ahead to the *Modify and test a transform* step, as the prior steps overlap with the Quickstart instructions.
 
- Try out loading your own data.
- Try the Quickstart with CI/CD.
- Dive deeper with the included user documentation.
- Contact your AWS account team for a solution deep dive, workshops, or AWS Professional Services support.