Guidance for Automated Provisioning of Application-Ready Amazon EKS Clusters

Summary: This implementation guide provides an overview of the Guidance for Automated Provisioning of Application-Ready Amazon EKS Clusters, its reference architecture and components, considerations for planning the deployment, and configuration steps for deploying the sample code to Amazon Web Services (AWS). This guide is intended for solution architects, business decision makers, DevOps engineers, and cloud professionals who want to implement Guidance for Automated Provisioning of Application-Ready Amazon EKS Clusters in their environment.

Overview

The Guidance for Automated Provisioning of Application-Ready Amazon EKS Clusters is a collection of reference implementations for Amazon Elastic Kubernetes Service (EKS) designed to accelerate time it takes to launch a workload-ready EKS cluster. It includes an “opinionated” set of pre-configured and integrated tools/add-ons, and follows best practices to support core capabilities including Autoscaling, Observability, Networking and Security.

This guidance addresses the following key points:

Provides a simplified process for setting up an application-ready EKS cluster.
Includes pre-configured tools and add-ons to support essential capabilities.
Aims to reduce the learning curve associated with deploying the first application-ready EKS cluster.
Allows users to focus on deploying and testing applications rather than cluster setup.

The motivation behind this project is to accelerate and simplify the process of setting up a cluster that is ready to support applications and workloads. We’ve heard from customers that there can be a learning curve associated with deploying your first application ready EKS cluster. This project aims to simplify the undifferentiated lifting, allowing you to focus on deploying and testing your applications.

Features and Benefits

Single Cluster Deployment: Deploy one Amazon EKS cluster per environment in a single account, simplifying management and reducing complexity for users new to Kubernetes.
Ready-to-Use Configuration: Receive a pre-configured cluster with opinionated yet customizable tooling, accelerating the setup process and reducing the learning curve for containerization beginners.
Environment-Specific Customization: Easily tailor cluster configurations for different environments (dev, staging, prod), enabling flexible and consistent multi-environment setups.
Integrated Tooling: Benefit from pre-integrated tools and add-ons that support core capabilities like Autoscaling, Observability, Networking, and Security, reducing the time and expertise needed to set up a production-ready cluster.
Best Practices Implementation: Automatically apply AWS and Kubernetes best practices, enhancing security, performance, and cost-efficiency without requiring deep expertise.
Terraform-Based Deployment: Utilize Terraform for infrastructure-as-code deployment, ensuring reproducibility and easier management of cluster configurations across environments.

This reference implementation is designed for customers that require a simple ready-to-use cluster, configured with a set of opinionated (yet configurable to some extent) tooling deployed alongside the cluster itself. The ideal customer profile includes:

Organizations in the early stages of containerization/Kubernetes adoption looking for a simplified deployment to run their applications
Teams with limited resources to manage cluster configurations
Projects with applications that can be deployed in a single cluster
Business units within an organization that need to deploy a multi-environment cluster for their specific workloads

Use Cases

Accelerating Initial Kubernetes Adoption Streamlining the transition to containerization for organizations new to Kubernetes/EKS. This use case addresses organizations in the early stages of adopting containerization and Kubernetes. The reference implementation provides a pre-configured, production-ready EKS cluster, reducing the complexity of initial setup. It allows teams to focus on application containerization rather than cluster configuration, accelerating the path to leveraging containerized workloads.
Optimizing DevOps Resources in Early Kubernetes Adoption Stages Enabling efficient cluster management for teams new to Kubernetes operations. For organizations in the early phases of Kubernetes adoption, DevOps teams often need to balance learning new technologies with maintaining existing systems. This implementation offers a pre-configured EKS cluster with best practices in security and scalability. It reduces the initial operational burden, allowing teams to gradually build Kubernetes expertise while maintaining productivity.
Simplified Single-Cluster Deployment for Initial Projects Providing a streamlined infrastructure for initial Kubernetes applications. This use case caters to teams deploying their first applications on Kubernetes. The reference implementation offers a robust, single-cluster environment per development stage. It’s ideal for initial projects that need containerization benefits without multi-cluster complexity, allowing teams to start small and scale as they gain experience.
Consistent Multi-Environment Setup for Kubernetes Newcomers Facilitating uniform cluster deployments across dev, staging, and production for teams new to Kubernetes. For organizations setting up their first Kubernetes environments, maintaining consistency across development stages is crucial. This implementation uses Terraform and environment-specific configurations to ensure identical setups across dev, staging, and production. It helps teams new to Kubernetes establish a solid foundation for their development pipeline from the start.
Adopting Terraform and AWS Best Practices for Amazon EKS Implementing infrastructure-as-code and AWS-recommended configurations for EKS deployments. This use case is tailored for organizations aiming to adopt or improve their DevOps practices using Terraform while leveraging AWS best practices for EKS. The reference implementation provides a Terraform-based deployment that incorporates AWS-recommended configurations for EKS. It allows teams to quickly implement infrastructure-as-code methodologies, ensuring reproducibility and version control of their EKS environments. Simultaneously, it applies AWS best practices, optimizing security, performance, and cost-efficiency. This approach enables companies to accelerate their DevOps transformation while ensuring their EKS deployments align with industry standards and AWS expertise.

Architecture Overview

This section provides a reference implementation architecture diagram for the components deployed with this Guidance.

At its core, the solution deploys an Amazon EKS cluster within a custom Amazon VPC. This VPC is configured with public and private subnets across multiple Availability Zones for high availability and resilience.

Key elements of the architecture include:

Amazon VPC: A custom VPC with public and private subnets, providing network isolation and security.
Amazon EKS Cluster: The central component, managing the Kubernetes control plane and compute nodes
EKS Managed Node Groups: Auto-scaling groups of EC2 instances that serve as EKS compute nodes.
Amazon Identity & Access Management: Integrated with EKS for fine-grained access control.
Amazon Elastic Container Registry (ECR): For storing and managing container images.
AWS Load Balancer Controller: Automatically provisions Application Load Balancers or Network Load Balancers when a Kubernetes service of type LoadBalancer is created.
Observability Tools: Amazon CloudWatch, Amazon Managed Grafana, and Amazon Managed Service for Prometheus for comprehensive monitoring and logging.
Terraform Resources: Representing infrastructure-as-code components that define and provision the guidance architecture.

This architecture is designed to provide a secure, scalable, and easily manageable EKS environment, incorporating AWS best practices and ready for production workloads.

Architecture diagram

EKS Workload ready cluster reference architecture

Figure 1: Application-Ready Amazon EKS Cluster Reference architecture

Architecture Steps

DevOps engineer defines a per-environment Terraform variable file that controls all of the environment-specific configuration. This variable file is used in every step of the process by all IaC configurations.
DevOps engineer applies the environment configuration using Terraform following the deployment process defined in the guidance.
An Amazon Virtual Private Cloud (VPC) is provisioned and configured based on specified configuration. According to best practices for Reliability, 3 Availability Zones (AZs) are configured with VPC Endpoints for access to resources deployed in private VPCs. Provisioned resources for private clusters, including Amazon Elastic Container Registry (Amazon ECR), Amazon EKS, Amazon Elastic Compute Cloud (Amazon EC2), and Amazon Elastic Block Store (Amazon EBS) are available via corresponding VPC endpoints.
User-facing AWS Identity and Access Management (IAM) roles (Cluster Admin, Admin, Editor and Reader) are created for various access levels to EKS cluster resources, as recommended in Kubernetes security best practices.
Amazon Elastic Kubernetes Service (Amazon EKS) cluster is provisioned with Managed Node Groups that host critical cluster add-ons (CoreDNS, AWS Load Balancer Controller and Karpenter) on its compute node instances. Karpenter manages compute capacity for other add-ons, as well as business applications that will be deployed by users while prioritizing provisioning AWS Graviton based compute node instances for the best price-performance.
Other relevant EKS add-ons are provisioned based on the configurations defined in the per-environment Terraform configuration file.
AWS OSS Observability stack is deployed if configured, including Amazon Managed Service for Prometheus (AMP), AWS Managed Collector for Amazon EKS, and Amazon Managed Grafana (AMG). In addition, a Grafana-operator addon is deployed alongside a set of predefined Grafana dashboards to get started.
Amazon EKS cluster(s) with critical add-ons, configured managed Observability stack and RBAC based security mapped to IAM roles is available for workload deployment and its Kubernetes API is accessible via an AWS Network Load Balancer.

AWS Services in this Guidance

AWS Services used in this Guidance

AWS Service	Role	Description
Amazon Elastic Kubernetes Service ( EKS)	Core service	Manages the Kubernetes control plane and worker nodes for container orchestration.
Amazon Elastic Compute Cloud (EC2)	Core service	Provides the compute instances for EKS worker nodes and runs containerized applications.
Amazon Virtual Private Cloud (VPC)	Core Service	Creates an isolated network environment with public and private subnets across multiple Availability Zones.
Amazon Elastic Container Registry (ECR)	Supporting service	Stores and manages Docker container images for EKS deployments.
Elastic Load Balancing (NLB)	Supporting service	Distributes incoming traffic across multiple targets in the EKS cluster.
Amazon Elastic Block Store (EBS)	Supporting service	Provides persistent block storage volumes for EC2 instances in the EKS cluster.
AWS Identity and Access Management (IAM)	Supporting service	Manages access to AWS services and resources securely, including EKS cluster access.
Amazon Managed Grafana (AMG)	Observability service	Provides fully managed service for metrics visualization and monitoring.
Amazon Managed Service for Prometheus (AMP)	Observability service	Offers managed Prometheus-compatible monitoring for container metrics.
AWS Certificate Manager (ACM)	Security service	Manages SSL/TLS certificates for secure communication within the cluster.
Amazon CloudWatch	Monitoring service	Collects and tracks metrics, logs, and events from EKS and other AWS resources provisoned in the guidance
AWS Systems Manager	Management service	Provides operational insights and takes action on AWS resources.
AWS Key Management Service (KMS)	Security service	Manages encryption keys for securing data in EKS and other AWS services.

Plan your deployment

Cost

You are responsible for the cost of the AWS services used while running this guidance. As of August 2024, the cost for running this guidance with the default settings in the US East (N. Virginia) Region is approximately $447.47/month.

We recommend creating a budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this guidance.

Sample cost table

The following table provides a sample cost breakdown for deploying this guidance with the default parameters in the us-east-1 (N. Virginia) Region for one month. This estimate is based on the AWS Pricing Calculator output for the full deployment as per the guidance.

AWS service	Dimensions	Cost, month [USD]
Amazon EKS	1 cluster	$73.00
Amazon VPC	3 NAT Gateways	$98.67
Amazon EC2	2 m6g.large instances	$112.42
Amazon Managed Service for Prometheus (AMP)	Metric samples, storage, and queries	$100.60
Amazon Managed Grafana (AMG)	Metrics visualization - Editor and Viewer users	$14.00
Amazon EBS	gp2 storage volumes and snapshots	$17.97
Application Load Balancer	1 ALB for workloads	$16.66
Amazon VPC	Public IP addresses	$3.65
AWS Key Management Service (KMS)	Keys and requests	$7.00
Amazon CloudWatch	Metrics	$3.00
Amazon ECR	Data storage	$0.50
TOTAL		$447.47/month

For a more accurate estimate based on your specific configuration and usage patterns, we recommend using the AWS Pricing Calculator.

Security

When you build systems on AWS infrastructure, security responsibilities are shared between you and AWS. This shared responsibility model reduces your operational burden because AWS operates, manages, and controls the components including the host operating system, the virtualization layer, and the physical security of the facilities in which the services operate. For more information about AWS security, visit AWS Cloud Security.

This guidance implements several security best practices and AWS services to enhance the security posture of your EKS Workload Ready Cluster. Here are the key security components and considerations:

Identity and Access Management (IAM)

IAM Roles: The architecture uses predefined IAM roles (Cluster Admin, Admin, Edit, Read) to manage access to the EKS cluster resources. This follows the principle of least privilege, ensuring users and services have only the permissions necessary to perform their tasks.
EKS Managed Node Groups: These use IAM roles with specific permissions required for nodes to join the cluster and for pods to access AWS services.

Network Security

Amazon VPC: The EKS cluster is deployed within a custom VPC with public and private subnets across multiple Availability Zones, providing network isolation.
Security Groups: Although not explicitly shown in the diagram, security groups are typically used to control inbound and outbound traffic to EC2 instances and other resources within the VPC.
NAT Gateways: Deployed in public subnets to allow outbound internet access for resources in private subnets while preventing inbound access from the internet.

Data Protection

Amazon EBS Encryption: EBS volumes used by EC2 instances are typically encrypted to protect data at rest.
AWS Key Management Service (KMS): Used for managing encryption keys for various services, including EBS volume encryption.

Kubernetes-specific Security

Kubernetes RBAC: Role-Based Access Control is implemented within the EKS cluster to manage fine-grained access to Kubernetes resources.
AWS Certificate Manager: Integrated to manage SSL/TLS certificates for secure communication within the cluster.

Monitoring and Logging

Amazon CloudWatch: Used for monitoring and logging of AWS resources and applications running on the EKS cluster.
Amazon Managed Grafana and Prometheus: Provide additional monitoring and observability capabilities, helping to detect and respond to security events.

Container Security

Amazon ECR: Stores container images in a secure, encrypted repository. It includes vulnerability scanning to identify security issues in your container images.

Secrets Management

AWS Secrets Manager: While not explicitly shown in the diagram, it’s commonly used to securely store and manage sensitive information such as database credentials, API keys, and other secrets used by applications running on EKS.

Additional Security Considerations

Regularly update and patch EKS clusters, worker nodes, and container images.
Implement network policies to control pod-to-pod communication within the cluster.
Use Pod Security Policies or Pod Security Standards to enforce security best practices for pods.
Implement proper logging and auditing mechanisms for both AWS and Kubernetes resources.
Regularly review and rotate IAM and Kubernetes RBAC permissions.

Supported AWS Regions

The core components of “Guidance for Automated Provisioning of Application-Ready Amazon EKS Clusters” are available in all AWS Regions where Amazon EKS service is supported.

The observability components of this guidance use Amazon Managed Service for Prometheus (AMP) and Amazon Managed Grafana (AMG). These services are available in the following regions (as of September 2024):

Region Name	Region Code
US East (N. Virginia)	us-east-1
US East (Ohio)	us-east-2
US West (Oregon)	us-west-2
Asia Pacific (Mumbai)	ap-south-1
Asia Pacific (Seoul)	ap-northeast-2
Asia Pacific (Singapore)	ap-southeast-1
Asia Pacific (Sydney)	ap-southeast-2
Asia Pacific (Tokyo)	ap-northeast-1
Europe (Frankfurt)	eu-central-1
Europe (Ireland)	eu-west-1
Europe (London)	eu-west-2
Europe (Paris)	eu-west-3
Europe (Stockholm)	eu-north-1
South America (São Paulo)	sa-east-1

Regions Supporting Core Components Only

The core components of this Guidance can be deployed in any AWS Region where Amazon EKS is available. This includes all commercial AWS Regions except for the China Regions and the AWS GovCloud (US) Regions.

For the most current availability of AWS services by Region, refer to the AWS Regional Services List.

Note: If you deploy this guidance into a region where AMP and/or AMG are not available, you can disable the OSS observability tooling during deployment. This allows you to use the core components of the guidance without built-in observability features.

Quotas

Service quotas, also referred to as limits, are the maximum number of service resources or operations for your AWS account.

Quotas for AWS services in this Guidance

Ensure you have sufficient quota for each of the AWS services utilized in this guidance. For more details, refer to AWS service quotas.

If you need to view service quotas across all AWS services within the documentation, you can conveniently access this information in the Service endpoints and quotas page in the PDF.

For specific implementation quotas, consider the following key components and services used in this guidance:

Amazon EKS: Ensure that your account has sufficient quotas for Amazon EKS clusters, node groups, and related resources.
Amazon EC2: Verify your EC2 instance quotas, as EKS node groups rely on these.
Amazon VPC: Check your VPC quotas, including subnets and Elastic IPs, to support the networking setup.
Amazon EBS: Ensure your account has sufficient EBS volume quotas for persistent storage.
IAM Roles: Verify that you have the necessary quota for IAM roles, as these are critical for securing your EKS clusters.
AWS Systems Manager: Review the quota for Systems Manager resources, which are used for operational insights and management.
AWS Secrets Manager: If you’re using Secrets Manager for storing sensitive information, ensure your quota is adequate.

Deploy the Guidance

Prerequisites

Before deploying this guidance, please ensure you have met the following prerequisites:

AWS Account and Permissions: Ensure you have an active AWS account with appropriate permissions to create and manage AWS resources like Amazon EKS, EC2, IAM, and VPC.
AWS CLI and Terraform Installation: Install and configure the AWS CLI and Terraform tools on your local machine.
Makefile Setup: This guidance uses a Makefile to automate various tasks related to managing the Terraform infrastructure. Ensure make utility is installed on your system.
S3 Bucket and DynamoDB Table: Set up an S3 bucket and DynamoDB table for storing Terraform state and providing a locking mechanism for managing deployments across environments.
Environment Variables: Set the required environment variables, including AWS_REGION and AWS_PROFILE, to specify the AWS region and profile for deployment.
VPC and Network Configuration: Ensure that you have a VPC configuration ready or are prepared to deploy a new VPC as part of this guidance, with public and private subnets across multiple Availability Zones.
IAM Roles: Define and configure the necessary IAM roles and policies required for accessing the Amazon EKS cluster and related resources.
Observability Tools (Optional): If using the AWS OSS observability stack, ensure the necessary services like Amazon Managed Service for Prometheus and Amazon Managed Grafana are configured. Since AMG uses IAM Identity Center to provide authentication to its Workspace, you’ll have to enable IAM Identity center before deploying this pattern.

For further examples, refer to Dynamic Object and Rule Extensions for AWS Network Firewall and Scale-Out Computing on AWS.

Deployment process overview

Before launching the guidance, review the cost, architecture, security, and other considerations discussed in this guide. Follow these step-by-step instructions to configure and deploy the guidance into your account.

Time to deploy: Approximately 20-30 minutes

Step-by-Step Deployment Instructions

Step 1 - Deploy the Terraform resources for state management

Initialize and apply the Terraform configuration in the 00.tf-base directory to set up the backend (S3 bucket and DynamoDB table) for state management.

export AWS_REGION=us-east-1
export AWS_PROFILE=default
make bootstrap

Step 2 - Configure environments for deployment

Define the environments you want to deploy (e.g., dev, staging, prod) by creating files for each environment in the 00.global/vars/ directory. This guidance includes a baseline file named base-env.tfvars and files for dev and prod environments.

Configurable Variables:
This pattern uses a global configurable variable per environment to customize environment-specific objects for different environments. The variables are documented in the configuration file under the 00.global/var/base-env.tfvars file.

Instead of configuring flags for provisioning resources, this pattern uses use-case-driven flags that result in a complete configuration for a collection of deployments, services, and configurations. For example, setting observability_configuration.aws_oss_tooling = true will result in provisioning the relevant AWS resources (such as AMP and AMG) as well as the configurations that connect them together.

Step 3 - Deploy the environment

Deploy the environment based on the configuration defined per environment in the previous step by running the apply-all Makefile target.

export AWS_REGION=us-east-1
export AWS_PROFILE=default
make ENVIRONMENT=dev apply-all

Deploying the pattern:
This pattern relies on multiple Terraform configurations residing in multiple folders (such as networking, iam-roles-for-eks, eks, addons, and observability). Each folder that holds a Terraform configuration also has a backend.tf Terraform configuration file used to indicate the backend S3 prefix key.

Before deploying the entire cluster configuration, this pattern uses Amazon S3 and Amazon DynamoDB to store the Terraform state of all resources across environments and provide a locking mechanism. The deployment of the S3 Bucket and the DynamoDB table is configured under the folder 00.tf-base.

Default S3 bucket name: tfstate-<AWS_ACCOUNT_ID>
Default DynamoDB table name: tfstate-lock

Several environment variables must be used before running any target:

AWS_REGION: Specify the AWS region to run on
AWS_PROFILE: The AWS configuration profile to be used

Step 4 - Deploy a specific module

To deploy a specific module, specify a MODULE variable before the Makefile target.

export AWS_REGION=us-east-1
export AWS_PROFILE=default
make ENVIRONMENT=dev MODULE=10.networking apply 

Makefile Targets:
We are using a Makefile utility that is designed to automate various tasks related to managing the Terraform infrastructure. It provides several targets (or commands) to handle different aspects of the Terraform workflow:

bootstrap: Initializes and applies the Terraform configuration in the 00.tf-base directory, which is assumed to be responsible for setting up the backend (S3 bucket and DynamoDB table) for state management.
init-all: Initializes all Terraform modules by calling the init target for each module.
plan-all: Runs the plan target for all Terraform modules.
apply-all: Runs the apply target for all Terraform modules.
plan: Runs the terraform plan command for a specific module.
apply: Runs the terraform apply command for a specific module.

Step 5 (optional) - Deploy Observability OSS tooling for Amazon EKS Cluster

This part of this pattern helps you to monitor your Amazon Elastic Kubernetes Service (Amazon EKS) cluster with the AWS Observability tooling for OSS tooling. This includes the following resources:

Amazon Managed Service for Prometheus for storing metrics (AMP)
Amazon Managed Service for Prometheus Managed Collector
Amazon Managed Grafana - for metrics visualization (AMG)
AWS Identity Store user - this is a prerequisite to authenticate to an Amazon Managed Grafana workspace
grafana-operator - this operator allows admin to manage grafana data-source, dashboards, and other configuration directly from Kubernetes using Custom-Resources
Observability specific addons - Prometheus node-exporter and kube-state-metrics

Using Observability Services Deployed by this Guidance

AMP

To access AMP Workspace, you’ll have to use awscurl as guided in the docs. At the bottom of this page, there’s a troubleshooting section with useful commands.

AMG

Since AMG uses IAM Identity Center to provide authentication to the AMG workspace, and since this automation deploys a default grafana-admin user that is allowed to access the workspace, you’ll need to reset its password through the console. See the image below:

After logging into the AMG workspace, you should expect to see the following dashboards already configured and ready to use:

CloudWatch logs

Although CW logs are not configured in this folder, the log-group convention to access the CW logs ingested by the AWS for fluentbit addon is /aws/eks/<CLUSTER_NAME>/aws-fluentbit-logs

Architecture Decisions

Using AWS Managed Services for OSS Observability tooling

Context

This pattern runs a single Amazon EKS cluster per environment in a single account. One of the reasons for that is to simplify the operational processes for managing the cluster and its infrastructure (including connected services, addons, etc…).

Decision

To reduce the need to manage observability tooling and infrastructure, this pattern uses the Amazon Managed Service for Prometheus (AMP), and the Amazon Managed Grafana (AMG) as the observability metric database and visualization service. Amazon CloudWatch is used as the logging visualization tool alongside a deployed addon of aws-fluent-bit. It also uses an in-cluster Prometheus kube-stack, or manages an ADOT Collector, this pattern uses the recently launched AWS Managed Collector for AMP.

Consequences

Consequences of this decision, true to the date of writing this README, is that there is a 1:1 mapping between an EKS cluster and an AMP workspace, meaning that each cluster will have its own AMP workspace. This shouldn’t be a concern as the design consideration of this pattern is to create isolated per-environment deployments of the cluster, its supporting services, and addons.

A challenging consequence of this decision is that currently the AWS Managed Collector is using the previous way of authenticating to an EKS cluster, using the aws-auth ConfigMap entry, and not the new EKS Cluster Access Management feature of managing access to the EKS Cluster using Amazon EKS APIs. Because of that, the AWS Managed Collector role needs to be added with eksctl which is the best available tool to this date to manage aws-auth configMap entries.

Keep an eye on the output of this Terraform folder which includes the eksctl command to create the entry for the AWS Managed Scraper IAM Role

Grafana-Operator to configure AMG workspace

Context

To provide a complete solution in this pattern, the observability resources need to be configured as part of the deployment of this pattern. This includes connecting the AMP workspace as a data-source for the AMG workspace, deploying a set of predefined dashboards as a starting point, and a process to deploy new dashboards in the same way the “basic”/predefined dashboards were deployed.

Decision

Since the AMG API doesn’t support managing the workspace APIs itself (meaning the grafana APIs), and only the AMG service specific configurations, this pattern uses the grafana-operator to connect to the AMG workspace, and deploy dashboards into it.

Consequences

The grafana operator requires the cluster to have an API key stored as a Kubernetes secret in the cluster so that the grafana-operator can connect to the workspace. That means that the API key is generated and stored in the cluster itself.
The dashboards are managed as assets/files stored in a GitRepo and synced to the AMG workspace continuously.
TBD - use of an API key requires a process for rotating a key and redeploying it to system manager as a secret object, and sync it to the cluster afterwards (using ESO - External Secrets Operator).

Deploy it

To deploy resources contained in this folder to a specific environment, use the following commands:

export AWS_REGION=us-east-1
export AWS_PROFILE=default
make ENVIRONMENT=dev MODULE=40.observability/45.aws-oss-observability apply

Troubleshooting

How can I check that metrics are ingested to the AMP workspace?

Using awscurl allows you to interact with the AMP workspace. Use the following command samples to query collecting jobs running, labels collected, and metrics values:

REGION=<YOUR_REGION>
WSID=<YOUR_AMP_WORKSPACE_ID>

# List jobs collecting metrics
awscurl --service="aps" --region=$REGION "https://aps-workspaces.$REGION.amazonaws.com/workspaces/$WSID/api/v1/label/job/values" | jq

# List labels collected
awscurl --service="aps" --region=$REGION "https://aps-workspaces.$REGION.amazonaws.com/workspaces/$WSID/api/v1/label/__name__/values" | jq

# List values for a specific label
awscurl --service="aps" --region=$REGION "https://aps-workspaces.$REGION.amazonaws.com/workspaces/$WSID/api/v1/query?query=apiserver_request_duration_seconds_bucket" | jq

Variables:

ENVIRONMENT: Specifies the environment (e.g., dev, prod) for which the Terraform configuration will be applied. It defaults to dev if not set.
MODULE: Specifies a specific MODULE to run.

Uninstall the Guidance

To uninstall the guidance and remove all associated resources, follow the cleanup steps provided below:

Destroy Specific Modules: If you only want to destroy specific modules, use the following command. Replace 10.networking with the module you wish to destroy:
```
 export AWS_REGION=us-east-1
 export AWS_PROFILE=default
 make ENVIRONMENT=dev MODULE=10.networking destroy AUTO_APPROVE=true 
```
Destroy All Resources: To remove all resources created by this guidance, use the following command. This will run terraform destroy on all modules in reverse order:
```
 export AWS_REGION=us-east-1
 export AWS_PROFILE=default
 make ENVIRONMENT=dev destroy-all AUTO_APPROVE=true 
```
Remove Resources Provisioned Outside This Guidance

Please note that any resources provisioned outside of this guidance will need to be removed manually by the user. These may include, but are not limited to:

Application Load Balancers (ALBs) created for testing or production use
Elastic IP addresses associated with instances or network interfaces
S3 buckets created for data storage or logging
IAM roles or policies created for specific use cases
EC2 instances launched manually for testing or additional workloads
RDS databases created for application data storage
CloudWatch Logs log groups and metrics
Route 53 hosted zones and records

To remove these resources:

Use the AWS Management Console to identify and delete the resources
Use AWS CLI commands to remove resources programmatically
Check each AWS service used in your project for any lingering resources

Architecture Decisions

Global variable file per environment

Context:
This pattern can deploy the same configuration to multiple environments. There’s a need to customize environment-specific configurations to support gradual updates or per-environment specific configuration.

Decision:
This pattern standardizes on a shared Terraform variable file per environment, which is used in the CLI (see Makefile as it wraps the CLI commands) to use this file throughout multiple folder configurations.

Consequences:
This decision helps share the variables across different folders and standardizes variable naming and values.

Storing Environment-Specific State Using Terraform Workspaces

Context:
To keep the IaC code DRY, the state of the different resources needs to be kept in the context of an environment so that other configurations in different folders can access the outputs of the right environment.

Decision:
This pattern uses Terraform Workspaces to store and retrieve environment-specific state from the S3 Backend being used as the remote state backend. Per Hashicorp recommendations on “Multi-Environment Deployment”, it’s encouraged to use Terraform workspaces to manage multiple environments: “Where possible, it’s recommended to use a single backend configuration for all environments and use the terraform workspace command to switch between workspaces.”

Consequences:
The use of Terraform workspaces allows us to use the same IaC code and backend configuration without changing it per environment. As this project’s Makefile wraps the relevant workspace commands, if users choose to rewrite their own CLI automation, they’ll need to handle workspace switching before applying per-environment configuration.

Link to relevant resources and add a sentence describing how the linked solution is related.

Insert additional resources that can help the customer in a bulleted list.

Contributors

Tsahi Duek, Principal GTM Specialist Solutions Architect, Containers
Erez Zarum, Solutions Architect, Containers
Alexander Pinsker, Sr Solutions Architect, Containers
Daniel Zilberman, Sr Solutions Architect, AWS Tech Solutions Team
Tal Hibner, Technical Account Manager

Notices

Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied…

Guidance for Automated Provisioning of Application-Ready Amazon EKS Clusters

Overview

Features and Benefits

Use Cases

Architecture Overview

Architecture diagram

Architecture Steps

AWS Services in this Guidance

AWS Services used in this Guidance

Plan your deployment

Cost

Sample cost table

Security

Identity and Access Management (IAM)

Network Security

Data Protection

Kubernetes-specific Security

Monitoring and Logging

Container Security

Secrets Management

Additional Security Considerations

Supported AWS Regions

Regions Supporting Core Components Only

Quotas

Quotas for AWS services in this Guidance

Deploy the Guidance

Prerequisites

Deployment process overview

Step-by-Step Deployment Instructions

Step 1 - Deploy the Terraform resources for state management

Step 2 - Configure environments for deployment

Step 3 - Deploy the environment

Step 4 - Deploy a specific module

Step 5 (optional) - Deploy Observability OSS tooling for Amazon EKS Cluster

Using Observability Services Deployed by this Guidance

AMP

AMG

CloudWatch logs

Architecture Decisions

Using AWS Managed Services for OSS Observability tooling

Context

Decision

Consequences

Grafana-Operator to configure AMG workspace

Context

Decision

Consequences

Deploy it

Troubleshooting

How can I check that metrics are ingested to the AMP workspace?

Uninstall the Guidance

Architecture Decisions

Global variable file per environment

Storing Environment-Specific State Using Terraform Workspaces

Related resources

Contributors

Notices