Skip to content

Configuration Reference

The following settings can be adjusted in ./infrastructure/config.yaml for your use case

Stack Options

WORKLOAD_NAME

  • Description: The name of the workload that will deployed. This name will be used as a prefix for for any component deployed into your AWS Account.

  • Type: String

  • Example: "GameAnalyticsPipeline"

Data Platform Options

The following table shows unsupported configurations when options in this section are enabled

Control Setting Exception
INGEST_MODE DIRECT_BATCH
  • DATA_STACK cannot be set to REDSHIFT
  • REAL_TIME_ANALYTICS cannot be set to true
  • Settings for STREAM_PROVISIONED and STREAM_SHARD_COUNT are ignored since no stream is deployed
DATA_STACK REDSHIFT
  • ENABLE_APACHE_ICEBERG_SUPPORT cannot be set to true
REAL_TIME_ANALYTICS true
  • INGEST_MODE must be set to KINESIS_DATA_STREAMS

INGEST_MODE

  • Description: Controls the ingestion method for events recieved from the API. When set to "KINESIS_DATA_STREAMS" events are ingested into a real-time Kinesis Data Stream for live analytics. When set to "DIRECT_BATCH" events are ingested into an Amazon Data Firehose for near-real-time batch ingestion to a data lake.

  • Type: String

  • Example: "KINESIS_DATA_STREAMS", "DIRECT_BATCH"

REAL_TIME_ANALYTICS

  • Description: Whether or not to enable the Real-Time component/module of the guidance. It is recommended to set this value to true when first deploying this sample code for testing, as this setting will allow you to verify if streaming analytics is required for your use case. This setting can be changed at a later time, and the guidance re-deployed through CI/CD.

  • Type: Boolean

  • Example: true

DATA_STACK

  • Description: Controls the data stack that event data is saved to for analysis. When set to "DATA_LAKE", raw events are saved to a data lake in S3 and cataloged using Glue Data Catalog. When set to "REDSHIFT" events are using the streaming ingestion feature of Redshift.

  • Type: String

  • Example: "DATA_LAKE", "REDSHIFT"

  • Do not change this configuration after the stack is deployed

ENABLE_APACHE_ICEBERG_SUPPORT

  • Description: Whether or not to enable Apache Iceberg support in place of Apache Hive tables. When set to true, the raw events table will be configured as an Apache Iceberg table and the Firehose will be reconfigured to send data as Iceberg transactions. Enabling this option comes with considerations for Firehose.

  • Type: Boolean

  • Example: true

  • Do not change this configuration after the stack is deployed. If you would like to enable Iceberg, we recommend deploying a new stack in parallel and migrating existing data.

Real-Time Analytics Options

These options are used for when INGEST_MODE is set to KINESIS_DATA_STREAMS

STREAM_PROVISIONED

  • Description: The Kinesis stream capacity mode. When set to true, the stream will be created with the number of shards specified in STREAM_SHARD_COUNT. When set to false, the number of shards will be scaled automatically to handle throughput and the STREAM_SHARD_COUNT setting will be ignored. This value can be changed at a later time and re-deployed through CI/CD. For information about determining the capacity mode required for your use case, refer to Choose the data stream capacity mode in the Amazon Kinesis Data Streams Developer Guide.

STREAM_SHARD_COUNT

  • Description: The number of Kinesis shards, or sequence of data records, to use for the data stream. The default value has been set to 1 for initial deployment, and testing purposes. This value can be changed at a later time, and the guidance re-deployed through CI/CD. For information about determining the shards required for your use case, refer to Amazon Kinesis Data Streams Terminology and Concepts in the Amazon Kinesis Data Streams Developer Guide.

  • Type: Integer

  • Example: 1

  • Type: Boolean

  • Example: true

Data Storage Controls

EVENTS_DATABASE

  • Description: Specifies the name of the AWS Glue database that contains the glue tables when DATA_STACK is set to "DATA_LAKE". Specifies the name of the Redshift Serverless database when DATA_STACK is set to "REDSHIFT".

  • Type: String (1-255 characters)

  • Example: "game_analytics"

  • Limitations: For compatibility with tools, the name should consist of lowercase letters, numbers, and underscores and start with a letter.

  • Do not change this configuration after the stack is deployed

RAW_EVENTS_TABLE

  • Description: The name of the of the AWS Glue table within which all new/raw data is cataloged.

  • Type: String (1-255 characters)

  • Example: "raw_events"

  • Limitations: For compatibility with tools, the name should consist of lowercase letters, numbers, and underscores and start with a letter.

  • Do not change this configuration after the stack is deployed

RAW_EVENTS_PREFIX

  • Description: The prefix for new/raw data files stored in S3.

  • Type: String

  • Example: "raw_events"

  • Do not change this configuration after the stack is deployed

PROCESSED_EVENTS_PREFIX

  • Description: The prefix processed data files stored in S3.

  • Type: String

  • Example: "processed_events"

  • Do not change this configuration after the stack is deployed

GLUE_TMP_PREFIX

  • Description: The name of the temporary data store for AWS Glue.

  • Type: String

  • Example: "glueetl-tmp"

Development Options

API_STAGE_NAME

  • Description: The name of the REST API stage for the Amazon API Gateway configuration endpoint for sending telemetry data to the pipeline. This provides an integration option for applications that cannot integrate with Amazon Kinesis directly. The API also provides configuration endpoints for admins to use for registering their game applications with the guidance, and generating API keys for developers to use when sending events to the REST API. The default value is set to live.

  • Type: String

  • Example: "live"

DEV_MODE

  • Description: Whether or not to enable developer mode. This mode will ensure synthetic data, and shorter retention times are enabled. It is recommended that you set the value to true when first deploying the sample code for testing, as this setting will enable S3 versioning, and won't delete S3 buckets on teardown. This setting can be changed at a later time, and the infrastructure re-deployed through CI/CD.

  • Type: Boolean

  • Example: true

S3_BACKUP_MODE

  • Description: Whether or not to enable Kinesis Data Firehose to send a backup of new/raw data to S3. The default value has been set to false for initial deployment, and testing purposes. This value can be changed at a later time, and the guidance re-deployed through CI/CD.

  • Type: Boolean

  • Example: false

Monitoring Options

EMAIL_ADDRESS

  • Description: The email address to receive operational notifications, and delivered by CloudWatch.

  • Type: String

  • Example: "user@example.com"

CLOUDWATCH_RETENTION_DAYS

  • Description: The default number of days in which Amazon CloudWatch stores all the logs. The default value has been set to 30 for initial deployment, and testing purposes. This value can be changed at a later time, and the guidance re-deployed through CI/CD.

  • Type: Integer

  • Example: 30

Version Options

CDK_VERSION

  • Description: The version of the CDK installed in your environment. To see the current version of the CDK, run the cdk --version command. The guidance has been tested using CDK version 2.92.0 of the CDK. If you are using a different version of the CDK, ensure that this version is also reflected in the ./infrastructure/package.json file.

  • Type: String

  • Example: "2.92.0"

NODE_VERSION

  • Description: The version of NodeJS being used. The default value is set to "latest", and should only be changed this if you require a specific version.

  • Type: String

  • Example: "latest"

PYTHON_VESION

  • Description: The version of Python being used. The default value is set to "3.8", and should only be changed if you require a specific version.

  • Type: String

  • Example: "3.8"