Configuration Reference
The following settings can be adjusted in ./infrastructure/config.yaml for your use case.
Stack Options
WORKLOAD_NAME
- Description: The name of the workload that will be deployed. This name is used as a prefix for every component deployed into your AWS account.
- Type: String
- Example: "GameAnalyticsPipeline"
Data Platform Options
The following table shows unsupported configurations when options in this section are enabled:
| Control | Setting | Exception |
|---|---|---|
| INGEST_MODE | DIRECT_BATCH | |
| DATA_STACK | REDSHIFT | |
| REAL_TIME_ANALYTICS | true | |
INGEST_MODE
- Description: Controls the ingestion method for events received from the API. When set to "KINESIS_DATA_STREAMS", events are ingested into a real-time Kinesis data stream for live analytics. When set to "DIRECT_BATCH", events are ingested into an Amazon Data Firehose stream for near-real-time batch ingestion to a data lake.
- Type: String
- Example: "KINESIS_DATA_STREAMS", "DIRECT_BATCH"
Info
When "DIRECT_BATCH" is configured as the data source, the Firehose stream can support up to the following throughput:
- For US East (N. Virginia), US West (Oregon), and Europe (Ireland): 500,000 records/second, 2,000 requests/second, and 5 MiB/second.
- For other AWS Regions: 100,000 records/second, 1,000 requests/second, and 1 MiB/second.
Choose "DIRECT_BATCH" if you know that your peak event ingestion throughput will stay below these limits, and optimize ingestion by batching multiple events per request. If you need higher throughput, consider "KINESIS_DATA_STREAMS", which lets Firehose scale up and down with no limit. More information can be found in the Quota section of the Amazon Data Firehose documentation.
REAL_TIME_ANALYTICS
- Description: Whether to enable the real-time component of the guidance. We recommend setting this value to true when first deploying this sample code for testing, because it lets you verify whether streaming analytics is required for your use case. This setting can be changed later and the guidance re-deployed through CI/CD.
- Type: Boolean
- Example: true
DATA_STACK
- Description: Controls the data stack that event data is saved to for analysis. When set to "DATA_LAKE", raw events are saved to a data lake in S3 and cataloged using the AWS Glue Data Catalog. When set to "REDSHIFT", events are ingested using the streaming ingestion feature of Amazon Redshift.
- Type: String
- Example: "DATA_LAKE", "REDSHIFT"
- Do not change this configuration after the stack is deployed.
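As a rough illustration of how the three settings above might sit together, the fragment below sketches a streaming-oriented setup. It assumes the settings are written as flat top-level keys in ./infrastructure/config.yaml; that layout is an assumption, so compare it against your own copy of the file before reusing it.

```yaml
# Hypothetical excerpt of ./infrastructure/config.yaml.
# Assumption: each setting is a flat top-level key.
INGEST_MODE: "KINESIS_DATA_STREAMS"   # events enter through a Kinesis data stream
REAL_TIME_ANALYTICS: true             # suggested value for a first test deployment
DATA_STACK: "DATA_LAKE"               # raw events are saved to S3 and cataloged in Glue
```

Because DATA_STACK should not change after deployment, it is worth settling on that value before the first deploy; the other two values here are only a starting point.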
ENABLE_APACHE_ICEBERG_SUPPORT
- Description: Whether to enable Apache Iceberg support in place of Apache Hive tables. When set to true, the raw events table is configured as an Apache Iceberg table and the Firehose stream is reconfigured to send data as Iceberg transactions.
Info
For a general overview on Hive vs Iceberg:
- Apache Iceberg is a metadata layer over the data lake. It brings benefits such as atomic transactions, advanced partitioning, improved query performance, lower management overhead, and potentially lower event ingestion cost. However, the metadata and transaction mechanism may introduce throttling for high event throughput.
- Apache Hive is a data lake format where events are stored as Parquet objects in S3 and partitioned by prefixes. The Hive format can have faster data ingestion because it lacks the metadata and transaction mechanism, but it can lead to unclean reads. Hive data lakes are potentially more expensive during ingestion because Amazon Data Firehose rounds event sizes up to the nearest 5 KB increment.
We generally recommend Iceberg tables for most workloads at this time, along with using Iceberg with Amazon Data Firehose. More details can be found in the AWS documentation for Considerations and limitations with Amazon Data Firehose.
- Type: Boolean
- Example: true
- Do not change this configuration after the stack is deployed. If you would like to enable Iceberg, we recommend deploying a new stack in parallel and migrating existing data.
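For orientation, a data lake deployment with Iceberg enabled might be expressed roughly as follows. This is a sketch only: the flat top-level key layout is assumed rather than taken from the file itself.

```yaml
# Hypothetical excerpt of ./infrastructure/config.yaml.
# Assumption: each setting is a flat top-level key.
DATA_STACK: "DATA_LAKE"                # Iceberg replaces the Hive tables of the data lake
ENABLE_APACHE_ICEBERG_SUPPORT: true    # raw events table is created as an Iceberg table
```

Both values are meant to be fixed before the first deployment, since neither should be changed afterwards.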
Real-Time Analytics Options
These options apply when INGEST_MODE is set to "KINESIS_DATA_STREAMS".
STREAM_PROVISIONED
- Description: The Kinesis stream capacity mode. When set to true, the stream is created with the number of shards specified in STREAM_SHARD_COUNT. When set to false, the number of shards is scaled automatically to handle throughput and the STREAM_SHARD_COUNT setting is ignored. This value can be changed later and re-deployed through CI/CD. For information about determining the capacity mode required for your use case, refer to Choose the data stream capacity mode in the Amazon Kinesis Data Streams Developer Guide.
Info
If you expect total on-demand Kinesis Data Streams throughput of at least 25 MiBps across all on-demand streams in the account and Region where you are deploying the Game Analytics Pipeline, consider enabling Kinesis On-Demand Advantage mode. On-Demand Advantage mode can provide at least 60% savings on on-demand streams compared to on-demand standard by committing to a minimum spend amount per day.
Please refer to the blog post Kinesis On-demand Advantage saves 60%+ on streaming costs for more information.
- Type: Boolean
- Example: true
STREAM_SHARD_COUNT
- Description: The number of Kinesis shards (sequences of data records) to use for the data stream. The default value is 1, for initial deployment and testing purposes. This value can be changed later and the guidance re-deployed through CI/CD. For information about determining the shards required for your use case, refer to Amazon Kinesis Data Streams Terminology and Concepts in the Amazon Kinesis Data Streams Developer Guide.
- Type: Integer
- Example: 1
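The relationship between the two stream settings can be sketched as two alternative fragments. The flat top-level key layout is an assumption about ./infrastructure/config.yaml, and the shard count shown is simply the documented default.

```yaml
# Hypothetical excerpt of ./infrastructure/config.yaml.
# Assumption: each setting is a flat top-level key.

# Option A: provisioned capacity with a fixed shard count.
STREAM_PROVISIONED: true
STREAM_SHARD_COUNT: 1

# Option B: automatically scaled capacity. Use instead of option A;
# STREAM_SHARD_COUNT is ignored in this mode.
# STREAM_PROVISIONED: false
```

Either option can be switched later and re-deployed through CI/CD, so starting with the default and adjusting after observing real traffic is a reasonable approach.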
Data Storage Controls
EVENTS_DATABASE
- Description: The name of the AWS Glue database that contains the AWS Glue tables when DATA_STACK is set to "DATA_LAKE", or the name of the Redshift Serverless database when DATA_STACK is set to "REDSHIFT".
- Type: String (1-255 characters)
- Example: "game_analytics"
- Limitations: For compatibility with tools, the name should consist of lowercase letters, numbers, and underscores and start with a letter.
- Do not change this configuration after the stack is deployed.
RAW_EVENTS_TABLE
- Description: The name of the AWS Glue table within which all new/raw data is cataloged.
- Type: String (1-255 characters)
- Example: "raw_events"
- Limitations: For compatibility with tools, the name should consist of lowercase letters, numbers, and underscores and start with a letter.
- Do not change this configuration after the stack is deployed.
RAW_EVENTS_PREFIX
- Description: The prefix for new/raw data files stored in S3.
- Type: String
- Example: "raw_events"
- Do not change this configuration after the stack is deployed.
PROCESSED_EVENTS_PREFIX
- Description: The prefix for processed data files stored in S3.
- Type: String
- Example: "processed_events"
- Do not change this configuration after the stack is deployed.
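Taken together, the four naming settings above could look something like the fragment below, which reuses the documented example values. It assumes flat top-level keys in ./infrastructure/config.yaml, which is an assumption rather than a confirmed layout.

```yaml
# Hypothetical excerpt of ./infrastructure/config.yaml.
# Assumption: each setting is a flat top-level key.
# Database and table names: lowercase letters, numbers, underscores; start with a letter.
EVENTS_DATABASE: "game_analytics"
RAW_EVENTS_TABLE: "raw_events"
# S3 prefixes for raw and processed data files.
RAW_EVENTS_PREFIX: "raw_events"
PROCESSED_EVENTS_PREFIX: "processed_events"
```

All four values are marked as not changeable after deployment, so they are best chosen up front.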
GLUE_TMP_PREFIX
- Description: The name of the temporary data store for AWS Glue.
- Type: String
- Example: "glueetl-tmp"
Development Options
API_STAGE_NAME
- Description: The name of the REST API stage for the Amazon API Gateway configuration endpoint for sending telemetry data to the pipeline. This provides an integration option for applications that cannot integrate with Amazon Kinesis directly. The API also provides configuration endpoints that admins can use to register their game applications with the guidance and to generate API keys for developers to use when sending events to the REST API. The default value is "live".
- Type: String
- Example: "live"
DEV_MODE
- Description: Whether to enable developer mode. This mode ensures that synthetic data and shorter retention times are enabled. We recommend setting the value to true when first deploying the sample code for testing, as this setting will enable S3 versioning and won't delete S3 buckets on teardown. This setting can be changed later and the infrastructure re-deployed through CI/CD.
- Type: Boolean
- Example: true
S3_BACKUP_MODE
- Description: Whether to enable Amazon Data Firehose to send a backup of new/raw data to S3. The default value is false, for initial deployment and testing purposes. This value can be changed later and the guidance re-deployed through CI/CD.
- Type: Boolean
- Example: false
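A first test deployment might combine the development options along these lines. The values are the documented examples, and the flat top-level key layout is assumed.

```yaml
# Hypothetical excerpt of ./infrastructure/config.yaml.
# Assumption: each setting is a flat top-level key.
API_STAGE_NAME: "live"   # REST API stage name (documented default)
DEV_MODE: true           # suggested when first deploying for testing
S3_BACKUP_MODE: false    # no Firehose backup of raw data to S3
```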
Monitoring Options
EMAIL_ADDRESS
- Description: The email address that receives operational notifications delivered by CloudWatch.
- Type: String
- Example: "user@example.com"
CLOUDWATCH_RETENTION_DAYS
- Description: The default number of days that Amazon CloudWatch retains logs. The default value is 30, for initial deployment and testing purposes. This value can be changed later and the guidance re-deployed through CI/CD.
- Type: Integer
- Example: 30
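The monitoring options could be written roughly as shown below, again assuming flat top-level keys; the address is the placeholder from the example above and should be replaced with a monitored mailbox.

```yaml
# Hypothetical excerpt of ./infrastructure/config.yaml.
# Assumption: each setting is a flat top-level key.
EMAIL_ADDRESS: "user@example.com"   # placeholder recipient for operational notifications
CLOUDWATCH_RETENTION_DAYS: 30       # integer, not a quoted string
```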
Version Options
NODE_VERSION
- Description: The version of Node.js being used. The default value is "latest" and should only be changed if you require a specific version.
- Type: String
- Example: "latest"