Configuration Reference¶
The following settings can be adjusted in ./infrastructure/config.yaml
for your use case
Stack Options¶
WORKLOAD_NAME
-
Description: The name of the workload that will deployed. This name will be used as a prefix for for any component deployed into your AWS Account.
-
Type: String
-
Example:
"GameAnalyticsPipeline"
Data Platform Options¶
The following table shows unsupported configurations when options in this section are enabled
Control | Setting | Exception |
---|---|---|
INGEST_MODE |
DIRECT_BATCH |
|
DATA_STACK |
REDSHIFT |
|
REAL_TIME_ANALYTICS |
true |
|
INGEST_MODE
-
Description: Controls the ingestion method for events recieved from the API. When set to
"KINESIS_DATA_STREAMS"
events are ingested into a real-time Kinesis Data Stream for live analytics. When set to"DIRECT_BATCH"
events are ingested into an Amazon Data Firehose for near-real-time batch ingestion to a data lake. -
Type: String
-
Example:
"KINESIS_DATA_STREAMS"
,"DIRECT_BATCH"
REAL_TIME_ANALYTICS
-
Description: Whether or not to enable the Real-Time component/module of the guidance. It is recommended to set this value to
true
when first deploying this sample code for testing, as this setting will allow you to verify if streaming analytics is required for your use case. This setting can be changed at a later time, and the guidance re-deployed through CI/CD. -
Type: Boolean
-
Example:
true
DATA_STACK
-
Description: Controls the data stack that event data is saved to for analysis. When set to
"DATA_LAKE"
, raw events are saved to a data lake in S3 and cataloged using Glue Data Catalog. When set to"REDSHIFT"
events are using the streaming ingestion feature of Redshift. -
Type: String
-
Example:
"DATA_LAKE"
,"REDSHIFT"
-
Do not change this configuration after the stack is deployed
ENABLE_APACHE_ICEBERG_SUPPORT
-
Description: Whether or not to enable Apache Iceberg support in place of Apache Hive tables. When set to
true
, the raw events table will be configured as an Apache Iceberg table and the Firehose will be reconfigured to send data as Iceberg transactions. Enabling this option comes with considerations for Firehose. -
Type: Boolean
-
Example:
true
-
Do not change this configuration after the stack is deployed. If you would like to enable Iceberg, we recommend deploying a new stack in parallel and migrating existing data.
Real-Time Analytics Options¶
These options are used for when INGEST_MODE
is set to KINESIS_DATA_STREAMS
STREAM_PROVISIONED
- Description: The Kinesis stream capacity mode. When set to
true
, the stream will be created with the number of shards specified inSTREAM_SHARD_COUNT
. When set tofalse
, the number of shards will be scaled automatically to handle throughput and theSTREAM_SHARD_COUNT
setting will be ignored. This value can be changed at a later time and re-deployed through CI/CD. For information about determining the capacity mode required for your use case, refer to Choose the data stream capacity mode in the Amazon Kinesis Data Streams Developer Guide.
STREAM_SHARD_COUNT
-
Description: The number of Kinesis shards, or sequence of data records, to use for the data stream. The default value has been set to
1
for initial deployment, and testing purposes. This value can be changed at a later time, and the guidance re-deployed through CI/CD. For information about determining the shards required for your use case, refer to Amazon Kinesis Data Streams Terminology and Concepts in the Amazon Kinesis Data Streams Developer Guide. -
Type: Integer
-
Example:
1
-
Type: Boolean
-
Example:
true
Data Storage Controls¶
EVENTS_DATABASE
-
Description: Specifies the name of the AWS Glue database that contains the glue tables when
DATA_STACK
is set to"DATA_LAKE"
. Specifies the name of the Redshift Serverless database whenDATA_STACK
is set to"REDSHIFT"
. -
Type: String (1-255 characters)
-
Example:
"game_analytics"
-
Limitations: For compatibility with tools, the name should consist of lowercase letters, numbers, and underscores and start with a letter.
-
Do not change this configuration after the stack is deployed
RAW_EVENTS_TABLE
-
Description: The name of the of the AWS Glue table within which all new/raw data is cataloged.
-
Type: String (1-255 characters)
-
Example:
"raw_events"
-
Limitations: For compatibility with tools, the name should consist of lowercase letters, numbers, and underscores and start with a letter.
-
Do not change this configuration after the stack is deployed
RAW_EVENTS_PREFIX
-
Description: The prefix for new/raw data files stored in S3.
-
Type: String
-
Example:
"raw_events"
-
Do not change this configuration after the stack is deployed
PROCESSED_EVENTS_PREFIX
-
Description: The prefix processed data files stored in S3.
-
Type: String
-
Example:
"processed_events"
-
Do not change this configuration after the stack is deployed
GLUE_TMP_PREFIX
-
Description: The name of the temporary data store for AWS Glue.
-
Type: String
-
Example:
"glueetl-tmp"
Development Options¶
API_STAGE_NAME
-
Description: The name of the REST API stage for the Amazon API Gateway configuration endpoint for sending telemetry data to the pipeline. This provides an integration option for applications that cannot integrate with Amazon Kinesis directly. The API also provides configuration endpoints for admins to use for registering their game applications with the guidance, and generating API keys for developers to use when sending events to the REST API. The default value is set to
live
. -
Type: String
-
Example:
"live"
DEV_MODE
-
Description: Whether or not to enable developer mode. This mode will ensure synthetic data, and shorter retention times are enabled. It is recommended that you set the value to
true
when first deploying the sample code for testing, as this setting will enable S3 versioning, and won't delete S3 buckets on teardown. This setting can be changed at a later time, and the infrastructure re-deployed through CI/CD. -
Type: Boolean
-
Example:
true
S3_BACKUP_MODE
-
Description: Whether or not to enable Kinesis Data Firehose to send a backup of new/raw data to S3. The default value has been set to
false
for initial deployment, and testing purposes. This value can be changed at a later time, and the guidance re-deployed through CI/CD. -
Type: Boolean
-
Example:
false
Monitoring Options¶
EMAIL_ADDRESS
-
Description: The email address to receive operational notifications, and delivered by CloudWatch.
-
Type: String
-
Example:
"user@example.com"
CLOUDWATCH_RETENTION_DAYS
-
Description: The default number of days in which Amazon CloudWatch stores all the logs. The default value has been set to
30
for initial deployment, and testing purposes. This value can be changed at a later time, and the guidance re-deployed through CI/CD. -
Type: Integer
-
Example:
30
Version Options¶
CDK_VERSION
-
Description: The version of the CDK installed in your environment. To see the current version of the CDK, run the
cdk --version
command. The guidance has been tested using CDK version2.92.0
of the CDK. If you are using a different version of the CDK, ensure that this version is also reflected in the./infrastructure/package.json
file. -
Type: String
-
Example:
"2.92.0"
NODE_VERSION
-
Description: The version of NodeJS being used. The default value is set to
"latest"
, and should only be changed this if you require a specific version. -
Type: String
-
Example:
"latest"
PYTHON_VESION
-
Description: The version of Python being used. The default value is set to
"3.8"
, and should only be changed if you require a specific version. -
Type: String
-
Example:
"3.8"