
Guidance for VPC Lattice automated DNS configuration on AWS

Summary: This implementation guide provides an overview of the Guidance for Amazon VPC Lattice Automated DNS Configuration on AWS, its reference architecture and components, considerations for planning the deployment, and configuration steps. This guide is intended for solution architects, business decision makers, DevOps engineers, data scientists, and cloud professionals who want to implement Guidance for Amazon VPC Lattice Automated DNS Configuration on AWS in their environment.


Overview

This guidance automates creation of DNS resolution configuration in Amazon Route 53 when creating new Amazon VPC Lattice services with custom domain names.

Amazon VPC Lattice is an application networking service that simplifies connectivity, monitoring, and security between your services. Its main benefit is simpler configuration and management: developers can focus on building features while networking and security administrators provide guardrails for service-to-service communication. The service simplifies onboarding for developers by removing the need to implement custom application code or run additional proxies next to every workload, while maintaining the tools and controls network admins require to audit and secure their environments. Amazon VPC Lattice uses the Domain Name System (DNS) for service discovery, so each Amazon VPC Lattice service is identifiable through its service-managed or custom domain name (you can find more information in the section DNS configuration in VPC Lattice). However, custom domain names require extra manual configuration to allow DNS resolution for the consumer workloads.

This guidance automates configuration of DNS resolution any time a new Amazon VPC Lattice service (with a custom domain name configured) is created. You can find the sample code in the following GitHub repository.

Features and benefits

This guidance provides the following features:

  1. Seamless service discovery with VPC Lattice when using custom domain names.
    • All the DNS resolution is configured in the Private Hosted Zone you choose.
    • Anytime a VPC Lattice service is created/deleted in any AWS Account, its DNS configuration (custom and service-managed domain names for created resources) is sent to the AWS Account managing the DNS configuration. These messages are processed by creating or deleting the corresponding Alias record.
  2. Automation resources are built using Infrastructure-as-Code.

Use Cases

While Amazon VPC Lattice can be used in a single Account, the most common use case is a multi-account environment. Amazon VPC Lattice has two main resources: the VPC Lattice service network is the logical boundary that connects consumers and producers, and the Amazon VPC Lattice service is the independently deployable unit of software that delivers a task or function (the application). The multi-account model for Amazon VPC Lattice can vary depending on your use case, and this guidance works with any model you choose. You can find more information about the different multi-account architecture models in the following reference architecture.

This guidance supposes a centralized model in terms of the DNS resolution, as follows:

  • A central Networking account owns all the DNS configuration, sharing it with the rest of the AWS accounts.
  • The rest of the Spoke accounts consume this DNS configuration shared by the Networking account, so resources can resolve the services’ custom domain names to the Amazon VPC Lattice-generated domain name. This resolution is what tells consumers that the service they want to reach must be consumed through Amazon VPC Lattice.

The guidance is configured to create the DNS resolution using Route 53 Private Hosted Zones. The automation does not create any private hosted zone, nor its association to the virtual private clouds (VPCs) that need to consume the DNS configuration. For this association, we recommend the use of Route 53 Profiles.

Architecture overview

Below are the architecture diagram and workflow for the Guidance for VPC Lattice automated DNS configuration on AWS.


Figure 1: Amazon VPC Lattice Automated DNS Configuration - Reference Architecture and Workflow.


(1) When a spoke Account creates a new VPC Lattice service, an Amazon EventBridge rule detects that the service has been created with the proper tag. The same rule detects that a VPC Lattice service has been deleted, based on the removal of that tag.

(2) The event is sent to the Get VPC Lattice service information AWS Step Functions state machine. Depending on the action (creation or deletion), the state machine publishes an event to a custom event bus. In addition, for created resources with custom domain names, the state machine obtains the domain name configuration (VPC Lattice-generated domain name, VPC Lattice-managed hosted zone, and custom domain name).

(3) The vpclattice_information custom Event Bus is configured with a target pointing to the cross_account Event Bus in the Networking Account.

(4) Unsuccessfully processed delivery events are stored in the Amazon SQS dead-letter queue (DLQ) in the Spoke Account for monitoring.

(5) The cross_account custom Event Bus in the Networking Account invokes the DNS configuration Step Functions state machine to process the notification sent by the Spoke Account.

(6) Unsuccessfully processed delivery events are stored in the DLQ in the Networking Account for monitoring.

(7) The DNS configuration state machine will create/delete the corresponding Alias record in the Amazon Route 53 Private Hosted Zone.

(8) AWS Systems Manager and AWS Resource Access Manager (AWS RAM) are used for secure parameter storage and cross-account data sharing.

For an in-depth explanation on how the different resources are configured, please review the guidance technical deep dive section.

AWS Services used in this Guidance

| AWS service | Role | Description | Service Availability |
|---|---|---|---|
| Amazon EventBridge | Core service | Rules and custom event buses are used for notifying and detecting new resources. | Documentation |
| AWS Step Functions | Core service | Serverless state machine used for filtering, subscribing and updating information. | Documentation |
| AWS Systems Manager | Support service | Used to store parameters that will later be shared. | Documentation |
| AWS Resource Access Manager (RAM) | Support service | Used to share parameters among accounts. | Documentation |
| Amazon Simple Queue Service (SQS) | Support service | Simple event information queue, used for cross-account subscription. | Documentation |

Cost

You are responsible for the cost of the AWS services used while running this solution guidance. As of November 2024, the cost of running this guidance with default settings lies within the Free Tier, except for the use of AWS Systems Manager advanced parameter storage.

We recommend creating a budget through AWS Cost Explorer to help manage costs. Prices are subject to change. You can also estimate the cost for your architecture using AWS Pricing Calculator. For full details, refer to the pricing webpage for each AWS service used in this guidance or visit Pricing by AWS Service.

Estimated monthly cost breakdown - Networking account

This breakdown of the costs of the Networking account shows the monthly cost of each service used by the automation, including the Systems Manager advanced parameter that the Free Tier does not cover. The costs are estimated for the US East (N. Virginia) Region (us-east-1) for one month.

| AWS service | Dimensions | Cost, month [USD] |
|---|---|---|
| AWS Systems Manager | 1 advanced parameter | $ 0.05 |
| Amazon EventBridge | <= 1 million custom events | $ 1.00 |
| AWS Step Functions | < 4,000 state transitions | $ 0.00 |
| Amazon SQS | < 1 million requests | $ 0.00 |
| TOTAL estimate | | $ 1.05/month |

Please review the price breakdown details in this AWS calculator.
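
As a quick sanity check, the total of the Networking account table can be reproduced by summing its line items. The sketch below only copies the figures listed above; the dictionary name is illustrative and not part of the guidance code.

```python
from decimal import Decimal

# Monthly line items (USD) copied from the Networking account table above.
networking_costs = {
    "AWS Systems Manager (1 advanced parameter)": Decimal("0.05"),
    "Amazon EventBridge (<= 1 million custom events)": Decimal("1.00"),
    "AWS Step Functions (< 4,000 state transitions)": Decimal("0.00"),
    "Amazon SQS (< 1 million requests)": Decimal("0.00"),
}

# Decimal avoids binary floating-point rounding when adding currency amounts.
total = sum(networking_costs.values(), Decimal("0"))
print(f"TOTAL estimate: $ {total}/month")
```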

Estimated monthly cost breakdown - Spoke accounts

The following table provides a sample cost breakdown for deploying this guidance in 1,000 different Spoke accounts which are likely to provide Amazon VPC Lattice services in the future. The costs are estimated for the US East (N. Virginia) Region (us-east-1) for one month.

| AWS service | Dimensions | Cost, month [USD] |
|---|---|---|
| Amazon EventBridge | <= 1 million custom events | $ 1.00 |
| AWS Step Functions | < 4,000 state transitions | $ 0.00 |
| Amazon SQS | < 1 million requests | $ 0.00 |
| TOTAL estimate | | $ 1.00/month |

Please review the price breakdown details in this sample AWS calculator.

Pricing by AWS Service

Below are the pricing references for each AWS service used in this guidance.

Prerequisites

Operating System

This guidance uses AWS Serverless managed services, so there’s no OS patching or management.

Third-party tools

For this solution you can either use AWS CloudFormation or HashiCorp Terraform as an Infrastructure-as-Code provider. For Terraform, check the requirements below.

You will need Terraform installed to deploy. These instructions were tested with Terraform version 1.9.3. You can install Terraform following the HashiCorp documentation. In addition, AWS credentials need to be configured according to the Terraform AWS Provider documentation.

For each Account deployment (under the deployment/terraform folder), you will find the following HCL config files:

  • providers.tf file provides the Terraform and AWS provider version to use.
  • main.tf and iam.tf provide the resources’ configuration. While main.tf contains the configuration of the different AWS services, iam.tf holds the configuration of AWS Identity and Access Management (IAM) roles and policies.
  • variables.tf defines the input variables that each deployment requires. The Deploy the Guidance section details the input variables required in each AWS account.
bash-3.2$ cd guidance-for-vpc-lattice-automated-dns-configuration-on-aws/deployment/networking_account
bash-3.2$ ls
README.md
main.tf
providers.tf
iam.tf
outputs.tf
variables.tf

Sample contents of the variables.tf source file:

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

# ---------- automation/networking_account/variables.tf ----------

variable "aws_region" {
  description = "AWS Region to build the automation in the Networking AWS Account."
  type        = string
}

variable "phz_id" {
  description = "Amazon Route 53 Private Hosted Zone ID."
  type        = string
}

We use local backend configuration to store the state files. We recommend use of another backend configuration that provides you more consistent storage and versioning, for example the use of Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

AWS Account Requirements

The AWS account credentials must have IAM permission to create and update resources in the Account - these permissions will vary depending on the Account type (networking or spoke).

In addition, the guidance assumes your accounts are part of the same AWS Organization, because the IAM and resource policies restrict cross-account actions to accounts within the same AWS Organization. For the AWS RAM share to work, you need to enable resource sharing with the Organization.
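
Enabling resource sharing with the Organization is typically a one-time action performed from the Organization's management account. A minimal sketch with the boto3 AWS RAM client could look as follows; the helper name enable_org_sharing is hypothetical and not part of the guidance code, and the client is passed in so the function can be exercised without AWS credentials.

```python
def enable_org_sharing(ram_client) -> bool:
    """Enable AWS RAM sharing with AWS Organizations; return True on success."""
    response = ram_client.enable_sharing_with_aws_organization()
    return bool(response.get("returnValue", False))


# Typical use (assumes boto3 is installed and management-account credentials
# are configured):
#   import boto3
#   enable_org_sharing(boto3.client("ram"))
```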

Service quotas

Make sure you have sufficient quota for each of the services implemented in this solution. For more information, see AWS service quotas.

To view the service quotas for all AWS services in the documentation without switching pages, view the information in the Service endpoints and quotas page in the PDF instead.

Deploy the Guidance

| Account type | Deployment time (min) - CloudFormation | Deployment time (min) - Terraform |
|---|---|---|
| Networking | 3 | 1 |
| Spoke | 3 | 1 |

AWS CloudFormation

  1. Networking AWS Account.
    • Variables needed: Private Hosted Zone ID to create/delete the Alias records.
    • Configure the AWS credentials of your Networking Account. The template path in the command below (./deployment/cloudformation/...) is relative to the root of the repository, so run the command from there.
     aws cloudformation deploy --stack-name {STACK_NAME} --template-file ./deployment/cloudformation/networking_account.yaml --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM --parameter-overrides PrivateHostedZone={ZONE_ID} --region {REGION}
    
  2. Spoke AWS Account. Follow this process for each spoke Account in which you are creating VPC Lattice services.
     aws cloudformation deploy --stack-name {STACK_NAME} --template-file ./deployment/cloudformation/spoke_account.yaml --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM --region {REGION}
    

Terraform

  1. Networking AWS Account.
    • Variables needed: AWS Region to deploy the resources, and Private Hosted Zone ID to create the Alias records.
    • Locate yourself in the networking_account folder and configure the AWS credentials of your Networking Account.
     cd deployment/networking_account
     (configure AWS credentials)
     ...
     terraform init
     terraform apply
    
  2. Spoke AWS Account. Follow this process for each spoke Account in which you are creating VPC Lattice services.
    • Variables needed: AWS Region to deploy the resources, and Networking Account ID.
    • Locate yourself in the spoke_account folder and configure the AWS credentials of your Spoke Account.
     cd deployment/spoke_account
     (configure AWS credentials)
     ...
     terraform init
     terraform apply
    

Test environment

In the sample code GitHub repository, you will find a test environment if you want to check and test an end-to-end implementation using the solution.

Uninstall the Guidance

AWS CloudFormation

  1. Make sure you have cleaned up the corresponding VPC Lattice services so the automation can remove the Alias records in the Private Hosted Zone.
  2. In each Spoke Account you want to offboard, delete the guidance automation.

     aws cloudformation delete-stack --stack-name {STACK_NAME} --region {REGION}
    
  3. In the Networking AWS Account, delete the guidance automation.

     aws cloudformation delete-stack --stack-name {STACK_NAME} --region {REGION}
    

Terraform

  1. Make sure you have cleaned up the corresponding VPC Lattice services so the automation can remove the Alias records in the Private Hosted Zone.
  2. In each Spoke Account that you want to offboard, delete the guidance automation.

     cd deployment/spoke_account
     (configure AWS credentials)
     terraform destroy
    
  3. In the networking AWS account, delete the guidance automation.

     cd deployment/networking_account
     (configure AWS credentials)
     terraform destroy
    

Security

When you build systems on AWS infrastructure, security responsibilities are shared between you and AWS. This shared responsibility model reduces your operational burden because AWS operates, manages, and controls the components including the host operating system, the virtualization layer, and the physical security of the facilities in which the services operate. For more information about AWS security visit AWS Cloud Security.

This guidance relies on many reasonable default options and “principle of least privilege” access for all resources. Users that deploy it in production should go through all the deployed resources and ensure those defaults comply with their security requirements and policies, have adequate logging levels and alarms enabled, and protect access to publicly exposed APIs. In Amazon SQS and Amazon SNS, the resource policies are defined such that only the specified account, organization, or resource can access that resource. IAM roles are defined for Lambda to only access the corresponding resources such as EventBridge, Amazon SQS, and Amazon SNS. AWS RAM securely shares resource parameters such as the SQS queue ARN and the EventBridge custom event bus ARN. This limits access to the Amazon VPC Lattice DNS resolution automation to the configuration resources and involved accounts only.

NOTE: By cloning and using third-party open-source code, you assume responsibility for its patching, securing, and managing in the context of this project.

Encryption at rest

Encryption at rest is configured in the SNS topic and SQS queues, using AWS-managed keys. Systems Manager parameters are not configured as SecureString because a shared SecureString parameter must be encrypted with a customer managed key, and you must share the key separately through AWS Key Management Service (AWS KMS).

  • Given its sensitivity, we are not creating any KMS resource in this guidance.
  • If you would like to use customer managed keys to encrypt the data of all these services at rest, you will need to change the code to configure this option in the corresponding resources.

Technical Deep Dive

DNS configuration in VPC Lattice

When a new Amazon VPC Lattice service is created, a service-managed domain name is generated. This domain name is publicly resolvable and resolves either to an IPv4 link-local address or an IPv6 unique-local address. So, a consumer application using this service-managed domain name does not require any extra DNS configuration for the service-to-service communication (provided the VPC Lattice configuration allows connectivity). However, it’s more likely that you will use your own custom domain names.

When using custom domain names for Amazon VPC Lattice services, an alias record (for Amazon Route 53 hosted zones) or a CNAME record (if you use another DNS solution) has to be created to map the custom domain name to the service-managed domain name. In multi-account environments, creating this DNS resolution configuration can add heavy operational overhead: for each Amazon VPC Lattice service created (by any development team), a central networking team has to be notified with the information about the new service and the DNS resolution to configure.
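
To make the mapping concrete, the sketch below builds the ChangeBatch structure that the Route 53 ChangeResourceRecordSets API expects for a pair of alias records (A and AAAA). It is an illustration under assumptions, not the guidance's own code: the function name and the example domain names and zone IDs are hypothetical, and the default CREATE action is a choice made here (Route 53 also accepts UPSERT and DELETE).

```python
import json


def alias_change_batch(custom_domain, lattice_domain, lattice_zone_id, action="CREATE"):
    """Build a Route 53 ChangeBatch with A and AAAA alias records."""

    def record(record_type):
        return {
            "Action": action,
            "ResourceRecordSet": {
                "Name": custom_domain,
                "Type": record_type,
                "AliasTarget": {
                    # Hosted zone and domain name generated by VPC Lattice
                    "HostedZoneId": lattice_zone_id,
                    "DNSName": lattice_domain,
                    "EvaluateTargetHealth": False,
                },
            },
        }

    return {"Changes": [record("A"), record("AAAA")]}


# Hypothetical values, for illustration only.
batch = alias_change_batch(
    custom_domain="payments.example.internal",
    lattice_domain="payments-0123456789abcdef0.example.vpc-lattice-svcs.us-east-1.on.aws",
    lattice_zone_id="ZEXAMPLE12345",
)
print(json.dumps(batch, indent=2))

# With boto3 and credentials for the Networking account, the batch could be applied as:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId=private_hosted_zone_id, ChangeBatch=batch)
```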

Networking Account Configuration

Amazon EventBridge

A custom event bus (cross_account_eventbus) is configured to receive events from all the spoke AWS Accounts in the Organization.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowOrgAccess",
    "Effect": "Allow",
    "Principal": {
      "AWS": "*"
    },
    "Action": ["events:PutRule", "events:PutEvents"],
    "Resource": "arn:aws:events:{REGION}:{ACCOUNT_ID}:event-bus/cross_account_eventbus",
    "Condition": {
      "StringEquals": {
        "aws:PrincipalOrgID": "{ORG_ID}"
      }
    }
  }]
}

This event bus has a simple rule: it sends all the events with the vpclattice_information source (Event Pattern configuration below) to the Step Functions state machine that configures the DNS resolution.

{
  "source": ["vpclattice_information"]
}

AWS Step Functions state machine


Figure 2: Networking AWS Account state machine.


Above you can find the Step Functions state machine definition for the Networking AWS Account. The first state checks the type of message sent by the Spoke AWS Account: ServiceCreated or ServiceDeleted.

  • ServiceCreated: A parallel workflow state will handle the following actions:
    • Two states will create the corresponding DNS configuration: A (IPv4) & AAAA (IPv6) Alias records.
    • A key-value tag in the Private Hosted Zone. This tag will map the VPC Lattice service ARN (key) with the custom domain name configured (value).

Figure 3: Networking AWS Account state machine: ServiceCreated actions.


  • ServiceDeleted: From the information received under this type (only the VPC Lattice service ARN will be provided), this branch will first list all the tags of the Private Hosted Zone, and a map state will iterate over the list of tags to find the custom domain name configured for the provided VPC Lattice service ARN. If found, the following actions will be handled (parallel state):
    • That tag will be deleted.
    • Another map state will iterate over the records configured in the Private Hosted Zone, deleting the ones configuring the custom domain name of the VPC Lattice service ARN provided (both A & AAAA).

Figure 4: Networking AWS Account state machine: ServiceDeleted actions.


  • Otherwise (malformed event), the state machine will end without any actions taken.
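
The branching described above can be sketched in plain Python. This is not the state machine definition itself: the function names and the detail field names (service_arn, custom_domain) are assumptions made for illustration, and the actions are returned as a plan rather than executed. The tags are modeled as the list of Key/Value pairs that Route 53 returns for a hosted zone.

```python
def find_custom_domain(phz_tags, service_arn):
    """Return the custom domain name tagged under service_arn, or None."""
    for tag in phz_tags:
        if tag["Key"] == service_arn:
            return tag["Value"]
    return None


def plan_actions(message_type, detail, phz_tags):
    """Return (action, argument) pairs for the branch matching message_type."""
    if message_type == "ServiceCreated":
        return [
            ("create_alias_records", detail["custom_domain"]),
            ("tag_hosted_zone", {detail["service_arn"]: detail["custom_domain"]}),
        ]
    if message_type == "ServiceDeleted":
        domain = find_custom_domain(phz_tags, detail["service_arn"])
        if domain is None:
            return []  # no tag for this ARN: nothing to clean up
        return [
            ("untag_hosted_zone", detail["service_arn"]),
            ("delete_alias_records", domain),
        ]
    return []  # malformed event: end without any action


# Hypothetical ARN and domain, for illustration only.
arn = "arn:aws:vpc-lattice:us-east-1:111122223333:service/svc-0123456789abcdef0"
tags = [{"Key": arn, "Value": "payments.example.internal"}]
print(plan_actions("ServiceDeleted", {"service_arn": arn}, tags))
```

Keeping the ARN-to-domain mapping as a hosted zone tag is what lets the deletion branch work from the ARN alone, since the deleted service can no longer be queried for its custom domain name.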

Spoke Account Configuration

Amazon EventBridge

First, an EventBridge rule (default event bus) is configured to catch the creation/deletion of VPC Lattice services. Given VPC Lattice does not support EventBridge events, we use the aws.tag source.

{
  "detail": {
    "changed-tag-keys": ["NewService"],
    "resource-type": ["service"],
    "service": ["vpc-lattice"]
  },
  "detail-type": ["Tag Change on Resource"],
  "source": ["aws.tag"]
}
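
To see which events this pattern selects, the following sketch implements a simplified matcher and applies it to a sample tag-change event. It is only an approximation of EventBridge matching: it handles exact-value lists and nested objects, not operators such as prefix or anything-but. The matches function and the sample ARN are hypothetical.

```python
PATTERN = {
    "detail": {
        "changed-tag-keys": ["NewService"],
        "resource-type": ["service"],
        "service": ["vpc-lattice"],
    },
    "detail-type": ["Tag Change on Resource"],
    "source": ["aws.tag"],
}


def matches(pattern, event):
    """Return True if every field in the pattern matches the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # A list in the pattern means "any of these values"; an array in
            # the event matches if any of its elements is allowed.
            values = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in values):
                return False
    return True


sample_event = {
    "source": "aws.tag",
    "detail-type": "Tag Change on Resource",
    "resources": ["arn:aws:vpc-lattice:us-east-1:111122223333:service/svc-0123456789abcdef0"],
    "detail": {
        "changed-tag-keys": ["NewService"],
        "service": "vpc-lattice",
        "resource-type": "service",
        "tags": {"NewService": "true"},
    },
}
print(matches(PATTERN, sample_event))
```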

The target of this rule is the Step Functions state machine that obtains the DNS configuration of the VPC Lattice service created/deleted.

In addition, a custom event bus (vpclattice_information) is created to process events after the Step Functions state machine finishes processing the previous event. This event bus has a simple rule: it sends all the events with the vpclattice_information source (Event Pattern configuration below) to the custom event bus created in the Networking AWS Account (and shared using a Systems Manager Advanced Parameter).

{
  "source": ["vpclattice_information"]
}

AWS Step Functions state machine


Figure 5: Spoke AWS Account state machine.


Above you can find the Step Functions state machine definition for the Spoke AWS Account. The first state will check the type of event: either a VPC Lattice service has been created (tag NewService = true), or deleted (tag NewService is not configured).

  • If the NewService tag is equal to true, the state machine determines that a new VPC Lattice service has been created. From the VPC Lattice service ARN passed, it checks the resource configuration:
    • If a custom domain name has been configured, an event is published to the custom event bus vpclattice_information with the VPC Lattice service information (ARN, custom domain name, and VPC Lattice-generated domain name).
    • Otherwise, no action is taken.
  • If the NewService tag is not present, the state machine determines that a VPC Lattice service has been deleted. An event is published to the custom event bus vpclattice_information with the deleted VPC Lattice service ARN.

  • Otherwise (malformed event), the state machine will end without any actions taken.
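
The decision logic above can be sketched as a function that turns a tag-change event into an EventBridge PutEvents entry. Several details are assumptions rather than the guidance's actual payload: the function name, the detail field names, carrying the message type in DetailType, and a service_info dictionary shaped like the response of the VPC Lattice GetService API (customDomainName and dnsEntry).

```python
import json

EVENT_SOURCE = "vpclattice_information"    # source matched by the rules in this guide
EVENT_BUS_NAME = "vpclattice_information"  # Spoke custom event bus


def build_entry(tag_event, service_info=None):
    """Return a PutEvents entry for a tag-change event, or None if nothing is published."""
    resources = tag_event.get("resources") or []
    tags = tag_event.get("detail", {}).get("tags")
    if not resources or not isinstance(tags, dict):
        return None  # malformed event: end without any action

    service_arn = resources[0]
    if tags.get("NewService") == "true":
        info = service_info or {}
        custom_domain = info.get("customDomainName")
        if not custom_domain:
            return None  # no custom domain name: no DNS configuration needed
        detail_type = "ServiceCreated"
        detail = {
            "service_arn": service_arn,
            "custom_domain": custom_domain,
            "lattice_domain": info["dnsEntry"]["domainName"],
            "lattice_hosted_zone": info["dnsEntry"]["hostedZoneId"],
        }
    elif "NewService" not in tags:
        detail_type = "ServiceDeleted"
        detail = {"service_arn": service_arn}
    else:
        return None  # tag present with an unexpected value

    return {
        "Source": EVENT_SOURCE,
        "DetailType": detail_type,
        "Detail": json.dumps(detail),
        "EventBusName": EVENT_BUS_NAME,
    }


# With boto3, a non-empty entry could be published as:
#   boto3.client("events").put_events(Entries=[entry])
```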

AWS CloudFormation Custom Resources

In the AWS CloudFormation deployment code, two custom resources are used to obtain the AWS Organization ID (Networking AWS Account) and retrieve the AWS Systems Manager parameters shared (Spoke AWS Account). These custom resources can be removed if the corresponding variables are passed as parameters.

  • Networking AWS Account - Retrieving AWS Organization ID.
import logging
import boto3
import json
import cfnresponse

log = logging.getLogger("handler")
log.setLevel(logging.INFO)

org = boto3.client('organizations')

def lambda_handler(event, context):
    """Custom resource handler that returns the AWS Organization ID and ARN."""
    try:
        log.info("Received event: %s", json.dumps(event))
        request_type = event['RequestType']
        response = {}

        # The Organization is only looked up on Create. Update and Delete
        # requests report SUCCESS with an empty data payload.
        if request_type == 'Create':
            org_info = org.describe_organization()
            response['Id'] = org_info['Organization']['Id']
            response['Arn'] = org_info['Organization']['Arn']

        cfnresponse.send(event, context, cfnresponse.SUCCESS, response)

    except Exception:
        # Always answer CloudFormation; otherwise the stack operation waits
        # until it times out.
        log.exception("Failed to process the custom resource request")
        cfnresponse.send(
            event,
            context,
            cfnresponse.FAILED,
            {},
            reason="Caught exception, check logs",
        )
  • Spoke AWS Account - Retrieving AWS SSM parameters’ values (shared by the Networking Account).
import logging
import boto3
import json
import cfnresponse

log = logging.getLogger("handler")
log.setLevel(logging.INFO)

ssm = boto3.client('ssm')

def lambda_handler(event, context):
    """Custom resource handler that returns the value of a shared SSM parameter."""
    try:
        log.info("Received event: %s", json.dumps(event))
        request_type = event['RequestType']
        response = {}

        # The parameter is only read on Create. Update and Delete requests
        # report SUCCESS with an empty data payload.
        if request_type == 'Create':
            parameter_name = event["ResourceProperties"]['ParameterName']

            # Shared parameters are read by ARN, so first find the ARN of the
            # parameter shared with this account under the given name.
            parameter_arn = ssm.describe_parameters(
                Filters=[
                    {
                        'Key': 'Name',
                        'Values': [
                            parameter_name,
                        ]
                    },
                ],
                MaxResults=5,
                Shared=True
            )['Parameters'][0]['ARN']

            value = ssm.get_parameter(
                Name=parameter_arn
            )['Parameter']['Value']

            response['Value'] = value

        cfnresponse.send(event, context, cfnresponse.SUCCESS, response)

    except Exception:
        # Also reached when no shared parameter matches the name (IndexError).
        log.exception("Failed to process the custom resource request")
        cfnresponse.send(
            event,
            context,
            cfnresponse.FAILED,
            {},
            reason="Caught exception, check logs",
        )

Contributors

The following individuals contributed to this document:

  • Maialen Loinaz Antón, Networking Solutions Architect Intern
  • Pablo Sánchez Carmona, Sr Networking Specialist Solutions Architect
  • Daniel Zilberman, Sr Tech Solutions Specialist Solutions Architect

Notices

Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.