Guidance for Open Source 3D Reconstruction Toolbox for Gaussian Splats on AWS
Summary: This implementation guide provides an overview of the Guidance for Open Source 3D Reconstruction Toolbox for Gaussian Splats on AWS, its reference architecture and components, considerations for planning the deployment, and configuration steps for deploying the Guidance to Amazon Web Services (AWS). This guide is intended for solution architects, business decision makers, DevOps engineers, data scientists, and cloud professionals who want to implement Guidance for Open Source 3D Reconstruction Toolbox for Gaussian Splats on AWS in their environment.
Overview
Guidance for Open Source 3D Reconstruction Toolbox for Gaussian Splats on AWS provides an end-to-end, pipeline-based solution on AWS to reconstruct 3D scenes or objects from images or video inputs. The infrastructure can be deployed with AWS Cloud Development Kit (AWS CDK) or Terraform, leveraging infrastructure-as-code.
Use cases
Once deployed, the Guidance features a full 3D reconstruction back-end system with the following customizable components or pipelines:
Media Ingestion: Process videos or collections of images as input
Image Processing: Automatic filtering, enhancement, and preparation of source imagery (for example, background removal)
Structure from Motion (SfM): Camera pose estimation and initial 3D point cloud generation
Gaussian Splat Training: Optimization of 3D Gaussian primitives to represent the scene using AI/ML
Export and Delivery: Generation of the final 3D asset in standard formats for easy viewing and notification by email
By deploying this Guidance, users gain access to a flexible infrastructure that handles the entire 3D reconstruction process programmatically, from media upload to final 3D model delivery, while remaining highly modular through its componentized, pipeline-based approach. This Guidance addresses the significant challenges organizations face when trying to create photorealistic 3D content, which has traditionally been a time-consuming, expensive, and technically complex process requiring specialized skills and equipment.
Custom GS pipeline container
A Docker container image contains all of the 3D reconstruction tools for Gaussian Splatting in this project. This container has a Dockerfile, main.py, and helper script files and open source libraries under the source/container directory. The main script processes each request from the Amazon SageMaker Training Job invoke message and saves the result to S3 upon successful completion.
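The actual entry point lives at source/container/main.py; the following is only an illustrative sketch of that general flow, assuming the standard SageMaker training container conventions (hyperparameters mounted at /opt/ml/input/config/hyperparameters.json). The stage helpers and parameter keys shown are hypothetical placeholders, not the Guidance's real names.

import json
import boto3

def run_pipeline():
    # SageMaker mounts the training job hyperparameters at this well-known path
    with open("/opt/ml/input/config/hyperparameters.json") as f:
        params = json.load(f)

    # Hypothetical stage functions; the real pipeline stages live in the helper scripts
    frames = extract_frames(params)               # media ingestion / image processing
    sfm_model = run_sfm(frames, params)           # structure from motion
    splat_path = train_splat(sfm_model, params)   # Gaussian splat training

    # Upload the final artifact back to the job's S3 output location (placeholder keys)
    boto3.client("s3").upload_file(
        splat_path, params["bucket_name"], f"{params['output_prefix']}/{params['uuid']}.ply"
    )

if __name__ == "__main__":
    run_pipeline()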
The list of open source libraries that make this project possible includes:
A Gradio UI is provided in this Guidance to assist in the generation process. This UI enables users to submit Gaussian splatting jobs, as well as view the generated splat in a 3D viewer.
Features and benefits
This 3D reconstruction toolbox on the cloud allows customers to quickly and easily build photorealistic 3D assets from images or video using best-in-class open source libraries with commercial-friendly licensing. This guide provides an automated end-to-end experience, from deployment to submitting requests to viewing the 3D result in a web browser.
Architecture overview
This Guidance will:
create the infrastructure required to generate a Gaussian splat from a video or set of images.
create the mechanism to run the code and perform 3D reconstruction.
enable a user to create a 3D Gaussian splat using open source tools and AWS by uploading a video (.mp4 or .mov) or images (.png or .jpg) and metadata (.json) into Amazon Simple Storage Service (Amazon S3).
provide a 3D viewer for viewing the photorealistic and performant nature of Gaussian splats.
Deployment architecture diagram
Figure 1: 3D Reconstruction Toolbox for Gaussian Splats on AWS Deployment Architecture
Deployment architecture steps
An administrator deploys the Guidance to an AWS account and Region using AWS Cloud Development Kit (AWS CDK) or Terraform.
Deploying the Base AWS CloudFormation stack creates all the AWS resources needed to host the Guidance. This includes an Amazon Simple Storage Service (Amazon S3) bucket, AWS Lambda functions, an Amazon DynamoDB table, the necessary AWS Identity and Access Management (IAM) permissions, and an Amazon Elastic Container Registry (Amazon ECR) image repository. Additionally, it stores the AWS Step Functions state machine resource ID in Parameter Store, a capability of AWS Systems Manager, and creates an Amazon Simple Notification Service (Amazon SNS) topic.
Once the Base CloudFormation stack has been deployed, deploy the Post Deploy CloudFormation stack. That stack builds a Docker container image and pushes it to the Amazon ECR repository. It also pushes the pre-processing models used during training, such as the background removal models, into the S3 bucket.
Architecture diagram
Figure 2: 3D Reconstruction Toolbox for Gaussian Splats on AWS Reference Architecture
Architecture steps
The user authenticates to IAM using AWS Tools and SDKs.
The input is uploaded to a dedicated S3 job bucket location. This can be done using a Gradio interface and AWS Software Development Kit (AWS SDK).
Optionally, the Guidance supports external job submission by uploading a ‘.JSON’ job configuration file and media files into a designated S3 job bucket location.
The job JSON file uploaded to the S3 job bucket will trigger an Amazon SNS message that will invoke the initialization Job Trigger Lambda function.
The Job Trigger Lambda function will perform input validation and set appropriate variables for the Step Functions State Machine.
The workflow job record will be created in the DynamoDB job table.
The Job Trigger Lambda function will invoke the Step Functions state machine to handle the entire workflow job (a minimal sketch of this trigger flow follows these architecture steps).
An Amazon SageMaker AI Training Job will be submitted synchronously using the state machine's built-in wait-for-completion mechanism.
The Amazon ECR container image and S3 job bucket model artifacts will be used to deploy a new container on a graphics processing unit (GPU) based compute node. The compute node instance type is determined by the job JSON configuration.
The container will run the entire pipeline as an Amazon SageMaker AI training job on a GPU compute node.
The Job Completion Lambda function will complete the workflow job by updating the job metadata in DynamoDB and using Amazon SNS to notify the user through email upon completion.
The internal workflow parameters are stored in Parameter Store during deployment to decouple the Job Trigger Lambda function and the Step Function State Machine.
Amazon CloudWatch logs and monitors the training jobs, surfacing possible errors to the user.
Infrastructure resource IDs are securely stored in Parameter Store to aid in deployment and execution.
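As a minimal illustration of the trigger flow described above (validate the job JSON, record it in DynamoDB, then start the state machine whose ARN is read from Parameter Store), the sketch below calls boto3 directly. The table name, parameter name, and field names are hypothetical placeholders, not the Guidance's actual identifiers, and downloading the job .json from S3 is omitted.

import json
import boto3

dynamodb = boto3.resource("dynamodb")
sfn = boto3.client("stepfunctions")
ssm = boto3.client("ssm")

def handler(event, context):
    # The SNS message wraps the S3 event for the uploaded job .json file
    s3_event = json.loads(event["Records"][0]["Sns"]["Message"])
    job = s3_event  # in practice, download and parse the job .json from S3 here (omitted)

    if "uuid" not in job:
        raise ValueError("Job configuration is missing a uuid")

    # Record the workflow job (hypothetical table and attribute names)
    dynamodb.Table("gs-workflow-jobs").put_item(Item={"uuid": job["uuid"], "status": "SUBMITTED"})

    # The state machine ARN is stored in Parameter Store at deployment time (hypothetical key)
    state_machine_arn = ssm.get_parameter(Name="/gs-workflow/state-machine-arn")["Parameter"]["Value"]
    sfn.start_execution(stateMachineArn=state_machine_arn, name=job["uuid"], input=json.dumps(job))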
Security
When you build systems on AWS infrastructure, security responsibilities are shared between you and AWS. This shared responsibility model reduces your operational burden because AWS operates, manages, and controls the components including the host operating system, the virtualization layer, and the physical security of the facilities in which the services operate. For more information about AWS security, visit AWS Cloud Security.
All data is encrypted at rest and in transit within the AWS Cloud services in this Guidance.
An Amazon S3 access logging bucket logs all access to the asset bucket.
Input validation on the job configuration flags any misconfigurations in the JSON file.
Least-privilege access rights are applied to service actions.
Plan your deployment
Service quotas
Service quotas, also referred to as limits, are the maximum number of service resources or operations for your AWS account.
Quotas for AWS services in this Guidance
Service quotas - increases can be requested through the AWS Management Console, AWS command line interface (CLI), or AWS SDKs (see Accessing Service Quotas)
This Guidance runs SageMaker training jobs, which use a Docker container to run the training. This deployment guide walks through building a custom container image for SageMaker.
Depending on which instance type you train on (configured during job submission; ml.g5.4xlarge is the default), you may need to increase the SageMaker training job quota. In Service Quotas, this appears under the SageMaker service as “training job usage”.
(Optional) You can build and test this container locally (not running on SageMaker) on a GPU-enabled EC2 instance. If you plan to do this, increase the EC2 quota named “Running On-Demand G and VT instances” and/or “Running On-Demand P instances”, depending on the instance family you plan to use, to the desired maximum number of vCPUs for running instances of that family. Note that this quota is measured in vCPUs, not in number of instances like the SageMaker training job quota.
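If you prefer to check the relevant SageMaker quotas programmatically instead of through the console, a rough sketch using the Service Quotas API is shown below; it simply lists the SageMaker quotas and filters by name.

import boto3

quotas = boto3.client("service-quotas")
paginator = quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        # Look for the per-instance-type "training job usage" quotas, e.g. for ml.g5 instances
        if "training job usage" in quota["QuotaName"] and "ml.g5" in quota["QuotaName"]:
            print(quota["QuotaName"], quota["Value"])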
Cost
You are responsible for the cost of the AWS services used while running this Guidance. As of May 2025, the cost for running this Guidance with the default settings in the default AWS Region (US East 1(N. Virginia)) is approximately $278.33 per month for processing 100 requests.
We recommend creating a Budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance.
Cost table
The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the US East (N. Virginia) Region for one month.
| AWS Service | Dimensions | Cost [USD] |
| --- | --- | --- |
| Amazon S3 | Standard feature storage (input = 200 MB, output = 2.5 GB) | |
A CloudFormation template is given here to spin up a fresh, full-featured Ubuntu desktop
Prerequisites: Before you build the EC2 workstation stack, ensure the following resources are created in your AWS account and region of choice:
VPC
Follow these instructions if you do not have one. This is where your EC2 instance will live. Ensure a public subnet with internet access is available in order to pull the GitHub repositories.
Keypair
Follow these instructions if you do not have one. This is used to remote into the EC2 desktop.
Security Group
Follow these instructions to create a security group. Enable inbound NICE DCV using TCP/UDP port 8443 and SSH using port 22. Ensure your public IP address is the source for all entries.
For Inbound rules, add:
Custom TCP, Port range=8443, source="My IP"
Custom UDP, Port range=8443, source="My IP"
SSH, Port range=22, source="My IP"
If you plan on using the included Gradio user interface to submit jobs, enable the default port so you can access the app through your local browser.
Custom TCP, Port range=7860, source="My IP"
Record the security group ID for later
Download the deep-learning-ubuntu-desktop.yaml file locally from the repo linked above
Open the AWS Console and navigate to the CloudFormation console
Select 'Create stack' -> 'With new resources'
On `Create Stack` page, select:
Choose an existing template
Choose Upload a template file
Select the deep-learning-ubuntu-desktop.yaml file downloaded earlier
On the Specify stack details page, leave the default values except for the following (a programmatic alternative using these same parameter keys is shown after these steps):
Stack Name: YOUR-CHOICE
AWSUbuntuAMIType: UbuntuPro2204LTS
DesktopAccessCIDR: YOUR-PUBLIC-IP-ADDRESS/32
DesktopInstanceType: g5.2xlarge
DesktopSecurityGroupId: SG-ID-FROM-ABOVE
DesktopVpcId: VPC-ID-FROM-ABOVE
DesktopVpcSubnetId: YOUR-PUBLIC-SUBNET-ID
KeyName: KEYNAME-FROM-ABOVE
(optional) S3Bucket: S3-BUCKET-WITH-CODE
Submit and monitor the stack creation in the CloudFormation console
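If you prefer to create this workstation stack programmatically rather than through the console, the same parameters can be passed with boto3. The sketch below uses the parameter keys listed above with placeholder values; Capabilities=["CAPABILITY_IAM"] is assumed because the template creates an instance role.

import boto3

cfn = boto3.client("cloudformation")
with open("deep-learning-ubuntu-desktop.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="gs-workstation",                       # YOUR-CHOICE
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],                  # assumption: the template creates an instance role
    Parameters=[
        {"ParameterKey": "AWSUbuntuAMIType", "ParameterValue": "UbuntuPro2204LTS"},
        {"ParameterKey": "DesktopAccessCIDR", "ParameterValue": "203.0.113.10/32"},  # your public IP /32
        {"ParameterKey": "DesktopInstanceType", "ParameterValue": "g5.2xlarge"},
        {"ParameterKey": "DesktopSecurityGroupId", "ParameterValue": "sg-0123456789abcdef0"},
        {"ParameterKey": "DesktopVpcId", "ParameterValue": "vpc-0123456789abcdef0"},
        {"ParameterKey": "DesktopVpcSubnetId", "ParameterValue": "subnet-0123456789abcdef0"},
        {"ParameterKey": "KeyName", "ParameterValue": "my-keypair"},
    ],
)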
On successful building of the stack, navigate to the EC2 console in the account and region the deployed stack is in
Locate the instance just created using the `Stack Name` entered above, select the instance, and select Actions->Security->Modify IAM Role
Record the current IAM role name
Navigate to the IAM Console in a separate browser tab or window
Under `Roles`, search for the role using the IAM role name identified above
Select the role by clicking on its name
In the permissions policies table, select Add permissions->Attach policies:
Attach the following AWS managed policies to the role
AmazonEC2ContainerRegistryFullAccess
AmazonS3FullAccess
AmazonSSMManagedInstanceCore
AWSCloudFormationFullAccess
IAMFullAccess
SSH into the workstation using the EC2 public IP (found in the EC2 console), security group, and SSH terminal
Once connected to the EC2 workstation, perform the following commands to update the OS and password
sudo apt update
sudo passwd ubuntu
The EC2 instance will reboot automatically while updates are performed in the background
The EC2 setup is complete once the message 'NICE DCV server is enabled!' appears in the output of the following command
tail /var/log/cloud-init-output.log
Once the NICE DCV enabled message appears, use the NICE DCV client with the EC2 public IP address, the username 'ubuntu', and the Ubuntu password set earlier to connect remotely to the EC2 instance.
Be sure not to upgrade the OS (even when prompted), as doing so will break critical packages. Only choose to enable security updates.
Open Visual Studio Code on the EC2 instance by locating it in the application library
Install and configure the AWS CLI (if not using the recommended EC2 deployment below)
Docker is required to build the container image that is used for training the splat. This requires at least 20 GB of free disk space on your deployment machine.
Note: If building on Windows or macOS and you receive the error below, set the number of logical processors to 1. It is also recommended to use the EC2 Ubuntu deployment method to mitigate this error.
#17 [13/28] RUN pip install --no-cache-dir -r ./requirements_2.txt
#17 2.026 Processing ./diff-gaussian-rasterization
#17 2.029 Preparing metadata (setup.py): started
#17 8.350 Preparing metadata (setup.py): finished with status 'done'
#17 8.365 Processing ./diff-surfel-rasterization
#17 8.367 Preparing metadata (setup.py): started
#17 12.41 Preparing metadata (setup.py): finished with status 'done'
#17 12.42 Building wheels for collected packages: diff-gaussian-rasterization, diff-surfel-rasterization
#17 12.42 Building wheel for diff-gaussian-rasterization (setup.py): started
ERROR: failed to receive status: rpc error: code = Unavailable desc = error reading from server: EOF
Download and install Node Package Manager (NPM) if not already on your system. The LTS version is recommended.
For Ubuntu EC2
sudo apt update
sudo apt install npm nodejs -y
sudo chown -R $USER PATH-TO-THIS-REPO
sudo npm install -g n
sudo n stable
For Windows/MacOS
npm install -g npm@latest
Confirm authenticated as the correct IAM principal and in the correct AWS account (see get-caller-identity for more info).
aws sts get-caller-identity
Set the AWS Region, replacing the Region below with your deployed region
Windows
set AWS_REGION=us-east-1
Linux/Mac
export AWS_REGION=us-east-1
Deployment process overview
Before you launch the Guidance, review the cost, architecture, security, and other considerations discussed in this guide. Follow the step-by-step instructions in this section to configure and deploy the Guidance into your account.
Note: The next few steps will deploy resources. You can use aws sts get-caller-identity to confirm your IAM principal and AWS account before proceeding.
Bootstrap your environment, replacing the AWS account ID and Region below.
cdk bootstrap aws://101010101010/us-east-1
Deploy Infrastructure Stack.
cdk deploy GSWorkflowBaseStack --require-approval never --outputs-file outputs.json --exclusively
NOTE: The created resource names can be viewed in the CloudFormation console under Outputs, or in deployment/cdk/outputs.json
Figure 3: AWS CDK Base Stack Output
Deploy Post Deployment Stack.
cdk deploy GSWorkflowPostDeployStack --require-approval never --exclusively
NOTE: The project takes up to an hour to build the container and deploy. See below for Amazon SNS notification details.
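The resource names needed later (for example, the job bucket name used when submitting jobs) can be pulled out of the CDK outputs file with a few lines of Python. The output key names vary, so this sketch simply prints everything rather than assuming a specific key.

import json

with open("deployment/cdk/outputs.json") as f:
    outputs = json.load(f)

# Print every stack output so you can note the job bucket name and other IDs
for stack, values in outputs.items():
    for key, value in values.items():
        print(f"{stack}.{key} = {value}")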
Install Terraform (Terraform commonly looks for AWS credentials in a shared credentials file like the one created by the AWS CLI configuration or environment variables; see AWS Provider for more info)
From the project root, navigate to the deployment/terraform/ folder.
Open terraform.tfvars and enter values for the following fields:
account_id - target 12-digit AWS account number
region - target AWS region (e.g. us-west-2)
admin_email - valid email address used to create a user for the SNS email notification
maintain_s3_objects_on_stack_deletion - whether to keep the S3 objects upon stack deletion or not
deployment_phase - used to select the deployment phase (“base” or “post”). There is no need to change this directly; set it with the -var deployment_phase flag on the terraform command
Note: The next few steps deploy resources. You can use aws sts get-caller-identity to confirm your IAM principal and AWS account before proceeding.
NOTE: The project takes approximately an hour to build the container and deploy
Figure 6: Terraform Post Deploy Stack Output
Post deployment requirements
After deployment, the admin email provided in the deployment configuration will receive an email subscription notification.
Figure 7: SNS Subscription Email for jobs
Please enable notifications by clicking on the link in the email.
Figure 8: SNS Subscription Confirmation
Submitting a job
Job submission options
To generate a splat, the backend requires a video or .zip of images and a unique metadata (.json) file to be uploaded to Amazon S3. A sample video, located at assets/input/BenchMelb.mov, can be used to test the solution immediately.
For all options below, be sure to fill in the appropriate options and confirm you are authenticated with AWS. The default options will work for videos orbiting an object.
OPTION A. User interface with S3 library browser and 3D viewer
A Gradio interface is included in this repo at source/Gradio/. Please follow the directions below to use it:
Open a console/command window of a machine that has the repo deployed
Change directories to the repo under the source/Gradio/ directory
Configure the Gradio application to use the created bucket:
Open generate_splat_gradio.py in a text editor
Input the S3 bucket name into the self.s3_bucket = "" field
Save the file and exit
Authenticate with AWS
Confirm authenticated as the correct IAM principal and in the correct AWS account (see get-caller-identity for more info).
aws sts get-caller-identity
Start the Gradio interface:
cd source/Gradio
python generate_splat_gradio.py
Open your web browser and navigate to the URL displayed in the terminal (typically http://0.0.0.0:7860 or use the local or public IP of the machine running the above script)
Figure 9: Gradio App: AWS Configuration
Navigate to the Job Settings tab in the app and configure the job accordingly (the default settings will work for a video with the camera orbiting an object)
Figure 10: Gradio App: Job Settings
Navigate to the Job Submit tab in the app, input the media, then submit the job to AWS
Figure 11: Gradio App: Job Submission
OPTION B. Python script and S3 upload
Use the source/generate_splat.py file to submit the artifacts and upload the media and JSON metadata file directly into the S3 bucket.
The metadata file can be created manually following the documented structure, or created automatically and submitted using source/generate_splat.py.
Modify the script contents to output a valid metadata file before uploading your media to Amazon S3.
Using the AWS Management Console or AWS CLI, follow the instructions below:
Choose an S3 prefix {inputPrefix} in {bucketName} for your media files and create a folder {bucketName}/{inputPrefix}. The {bucketName} is obtained from both the Terraform/CDK console output and the outputs.json file within the deployment/terraform or deployment/cdk directory.
Upload a video (.mp4 or .mov) into {bucketName}/{inputPrefix}
Submit a metadata .json file named with a unique UUID
Use the utility to create and submit the job
Open source/generate_splat.py in your favorite text editor
Fill in the top section of the script, entering the bucket name from the deployment configuration output and the media filename, then save it.
Open a shell session and run the script from a machine that has AWS access to PUT into the {bucketName}/{inputPrefix} location.
python generate_splat.py
OR
Manually create and submit metadata file
Create a local file named uuid.json, where uuid is the job's unique UUID
Open the uuid.json file in your favorite editor
Copy the example metadata JSON and change the parameters to suit your use case
Save the metadata file locally
Using AWS credentials, log into the AWS Console in the account and Region specified in CDK/Terraform
Navigate to {bucketName}/{s3TriggerKey} in the Amazon S3 console. {s3TriggerKey} is obtained from the CDK/Terraform configuration
Upload the uuid.json file into S3 at the {bucketName}/upload-workflow location
Note: each metadata file needs to have a unique UUID both in the filename and inside the JSON file
Note: If you do not have the storage bucket name noted down, it can be found in the Terraform outputs.json file or CDK outputs file depending on your deployment method.
An example metadata file is below.
Example json metadata file for a video input and below configuration: 1a51caa6-1f6a-4a73-8f94-475b2ae9b04e.json
Note: a unique UUID is required for each submission. It is used as the filename and populates the “uuid” field in the metadata
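Since the exact schema lives in the repository, the following is only a rough, illustrative sketch of how a submission script might assemble and upload a metadata file. Every field name shown here is a placeholder and must be replaced with the key names from the documented schema and the configuration reference that follows.

import json
import uuid as uuidlib
import boto3

bucket_name = "YOUR-JOB-BUCKET"        # from the CDK/Terraform outputs
input_prefix = "upload-media"          # placeholder input prefix
job_prefix = "upload-workflow"         # placeholder job (trigger) prefix
media_file = "BenchMelb.mov"

job_uuid = str(uuidlib.uuid4())

# Placeholder field names only; use the schema documented in the repository
metadata = {
    "uuid": job_uuid,
    "instanceType": "ml.g5.4xlarge",
    "s3": {"bucketName": bucket_name, "inputPrefix": input_prefix, "inputKey": media_file},
}

s3 = boto3.client("s3")
s3.upload_file(media_file, bucket_name, f"{input_prefix}/{media_file}")
with open(f"{job_uuid}.json", "w") as f:
    json.dump(metadata, f, indent=2)
s3.upload_file(f"{job_uuid}.json", bucket_name, f"{job_prefix}/{job_uuid}.json")
print(f"Submitted job {job_uuid}")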
Understanding the configuration and capabilities
This Guidance is built with a variety of open source tools that can be used for various use cases. Because of this, many options are contained in this Guidance. The following is a high-level overview of each option and its applicability:
Workflow input:
Video (.mov or .mp4)
Archive (.zip) of images (.png or .jpeg)
Archive (.zip) of pose priors and images (transforms.json, colmap model files)
Workflow output: .ply and .spz, archive of project files (images, point cloud, metadata)
UUID: A unique identifier used by the backend system to record individual requests in DynamoDB
Instance type: The EC2 compute resource to use for the workflow. Currently, these instance types are tested and supported:
ml.g5.4xlarge (recommended for <500 4k images)
ml.g5.8xlarge (recommended for <500 4k images)
ml.g6e.4xlarge (for large datasets (e.g. >500 4k images or 3DGRT))
ml.g6e.8xlarge (for large datasets (e.g. >500 4k images or 3DGRT))
S3:
Bucket name: The name of the S3 bucket that was deployed by CDK/Terraform. This is an output from the deployment.
Input prefix: The S3 prefix (initial directory minus the job prefix) for the input media
Input key: The S3 key for the input media
Output prefix: The S3 prefix (initial directory minus the job prefix) for the output files
S3 Job prefix: The S3 prefix (initial directory) for the job json files
Video processing:
Max number of images: (integer), the maximum number of images to use when a video is given as input. If using a .zip file with images or pose priors, this parameter will be ignored.
Image processing:
Filter blurry images: (boolean), whether to remove blurry images from the dataset. If using a .zip file with pose priors, this parameter will be ignored.
Structure from motion (SfM):
Enable: true or false, Whether to enable SfM or not. Future plans will enable input of SfM output
Software name: colmap or glomap, software to use for the triangulation of the mapper
Enable enhanced feature extraction: true or false, whether to enable enhanced feature extraction which uses estimate_affine_shape and domain_size_pooling to enhance the feature matching
Matching method:
sequential (best for videos or images that share overlapping features)
spatial (best to use for pose priors to take spatial orientation into account)
vocab (best for large datasets that are not sequentially ordered, e.g. >1000 images)
exhaustive (only use this method if dataset struggles to converge with other methods)
Pose priors: To speed up reconstruction, camera poses associated with the images can be used as input. This feature accepts a .zip archive that follows the same schema as NerfCapture, or a .zip archive that contains images and sparse directories with COLMAP model text files.
At this time, depth images are not used in the splat process.
Note: All image files must be sequentially named and padded (e.g. 001.png, 002.png, etc.)
Ensure the .zip contains both the transforms.json file and an /images directory with sequentially named RGB images.
Enable: (boolean), whether to enable using transforms.json file for pose priors.
Source coordinate name: (“arkit” or “arcore” or “opengl” or “opencv” or “ros”), the source coordinates used with pose priors
Pose is world-to-camera: (boolean), whether the source coordinates for pose priors are in world-to-camera (True) or camera-to-world (False)
The schema for the input transforms.json is shown below, in case you use an alternate method to extract the images and poses.
Note: Depth images do not need to be present in the .zip file, but “depth_path” still needs to be filled in so the extension is known. For example, if the image is images/3.png, then depth_path=images/3.depth.png and file_path=images/3. The timestamp and depth images are not currently used in the pipeline.
"cx": camera sensor center on x-axis in pixels
"cy": camera sensor center on y-axis in pixels
"fl_x": focal length on x-axis in pixels
"fl_y": focal length on y-axis in pixels
"w": camera resolution width in pixels
"h": camera resolution height in pixels
"file_path": this will look like this: "images/13" if image filename is images/13.png
"depth_path": this will look like this: "images/13.depth.png" if image filename is images/13.depth.png (even if you dont have depth images, fill this in with correct extension)
"transform_matrix": 4x4 pose matrix
"timestamp": seconds (can be absolute or use epoch)
Archive structure (.zip)
archive.zip
├── transforms.json
└── images/
    └── *.{png,jpg,jpeg} # Image files; depth images need "depth" in the file name
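Putting the fields above together, here is a hedged sketch that writes a minimal transforms.json and names the images with zero-padded, sequential filenames. It assumes the NerfCapture-style layout with a top-level "frames" array, so double-check the exact layout against a file produced by NerfCapture before relying on it.

import json
import os
import shutil

def build_archive_folder(image_paths, poses, intrinsics, out_dir="archive"):
    """image_paths: list of source image files; poses: list of 4x4 matrices (nested lists);
    intrinsics: dict with cx, cy, fl_x, fl_y, w, h."""
    os.makedirs(os.path.join(out_dir, "images"), exist_ok=True)
    frames = []
    for i, (src, pose) in enumerate(zip(image_paths, poses), start=1):
        ext = os.path.splitext(src)[1]                 # keep the original extension (.png/.jpg)
        name = f"{i:03d}"                              # sequentially named and zero-padded, e.g. 001
        shutil.copy(src, os.path.join(out_dir, "images", name + ext))
        frames.append({
            "file_path": f"images/{name}",
            "depth_path": f"images/{name}.depth{ext}", # extension must be filled in even without depth images
            "transform_matrix": pose,                  # 4x4 pose matrix (nested lists)
            "timestamp": float(i),                     # not currently used by the pipeline
        })
    transforms = dict(intrinsics)                      # cx, cy, fl_x, fl_y, w, h at the top level (assumed)
    transforms["frames"] = frames
    with open(os.path.join(out_dir, "transforms.json"), "w") as f:
        json.dump(transforms, f, indent=2)

Zip the resulting folder so the archive contains transforms.json and the images/ directory at its root, matching the structure shown above.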
Training:
Enable: true or false, whether to enable 3DGS training or not. Future plans will enable the user to perform only SfM
Maximum steps: (integer), The maximum training steps to use while training the splat
Splat model: splatfacto, splatfacto-big, splatfacto-w-light, nerfacto, the GS model to use for training
Pointers:
splatfacto: a great, generalized model that is perfect to start with if you are unsure what model to choose
splatfacto-big: high quality model that should be used to output feature-rich scenes and objects. This will yield a larger .ply file.
splatfacto-w-light: use this model to achieve superior quality while still pruning unwanted Gaussians. It takes the longest time, but the output file size is much smaller while still upholding quality.
splatfacto-mcmc: the current SotA that balances small training time with high quality output
nerfacto: used for testing and comparisons between NeRF and GS. The output will be a NeRF. Beware, this model will need more images than GS in order to maintain higher quality.
3dgut: used for enabling Distorted Cameras and Secondary Rays in Gaussian Splatting. Great for fisheye camera input.
3dgrt: used for 3D Gaussian Ray Tracing and Fast Tracing of Particle Scenes. Great for highly detailed scenes at the cost of processing power and time
Rotate splat: true or false, whether to rotate the output splat for the Gradio 3D Model viewer coordinate system (set to true will rotate both the .ply and .spz)
Spherical camera:
Enable: true or false, whether to enable 360 camera support or not
Cube faces to remove: “[‘back’, ‘down’, ‘front’, ‘left’, ‘right’, ‘up’]”, a list of cube faces to remove from the spherical image. This is great for cropping out people or objects from the 360 image.
Note: The above configurations were tested with an Insta360 ONE X2, exporting frame(s) in equirectangular format, 9:16 ratio, 5.7k resolution, and 30 frames per second. The captures were taken with the camera display aimed toward the person capturing the frame(s).
Figure 12: Equirectangular cube map views from a spherical camera
Figure 13: Example: Enable Spherical Camera = true, Cube Faces to Remove = "['back', 'down']"
Segmentation:
Remove background: true or false, whether to remove the background when the input is an object (not a scene)
Background removal model: “u2net”, “u2net-human”, or “sam2”, the background removal model to use
Note: The sam2 model can only be used on video at this time
Remove human subject: true or false, whether to remove humans from the scene or not. This can be combined with other removal methods such as background removal and cube face removal.
Figure 14: Background removal using SAM2
Monitoring a job
The training progress can be monitored in a few different ways:
An SNS notification will be sent once the 3D reconstruction is complete (usually 20-80 min depending on the model, quality, or time constraints)
Figure 15: Job successful completion email notification using SNS
Figure 16: Job error notification using SNS
SageMaker
Using the AWS Management Console, navigate to SageMaker -> Training -> Training Jobs.
Figure 17: Amazon SageMaker Training Jobs Selection
Enter the uuid in the search bar, and click on the entry to view status and monitor metrics/logs.
Figure 18: Amazon SageMaker Training Job Status
Figure 19: Amazon SageMaker Training Job Monitoring
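The same status check can be scripted. For example, a short boto3 sketch that finds training jobs whose name contains the job UUID, assuming the job name includes the UUID as the console search above implies:

import boto3

sm = boto3.client("sagemaker")
job_uuid = "1a51caa6-1f6a-4a73-8f94-475b2ae9b04e"  # your job UUID

jobs = sm.list_training_jobs(NameContains=job_uuid, SortBy="CreationTime", SortOrder="Descending")
for summary in jobs["TrainingJobSummaries"]:
    detail = sm.describe_training_job(TrainingJobName=summary["TrainingJobName"])
    print(summary["TrainingJobName"], detail["TrainingJobStatus"], detail.get("SecondaryStatus"))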
Step Functions
Using the AWS Management Console, navigate to Step Functions -> State Machines.
Figure 20: AWS Step Functions State Machine Selection
Enter the identifier found in outputs.json in the search bar, and click the entry to view status and monitor the state of the workflow.
Figure 21: AWS Step Functions State Machine Job Detail
CloudWatch
Using the AWS Management Console, navigate to CloudWatch -> Log Groups.
Figure 22: Amazon CloudWatch Log Group
Enter /aws/sagemaker/TrainingJobs in the search bar, and click on the log group. Enter the uuid in the log stream search bar to inspect the log
Figure 23: Amazon CloudWatch Training Logs
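The same logs can also be pulled from a script. A minimal boto3 sketch that filters the SageMaker training log group by the job UUID (assuming the training job name, and therefore the log stream name, starts with the UUID):

import boto3

logs = boto3.client("logs")
job_uuid = "1a51caa6-1f6a-4a73-8f94-475b2ae9b04e"  # your job UUID

events = logs.filter_log_events(
    logGroupName="/aws/sagemaker/TrainingJobs",
    logStreamNamePrefix=job_uuid,   # assumption: the log stream name starts with the UUID
    limit=50,
)
for event in events["events"]:
    print(event["message"])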
Viewing the job result
After training has completed on submitted media and an SNS email has been received, the splat result can be viewed in the Gradio user interface (see Job submission options, Option A. to launch). Once the Gradio app has been launched:
Navigate to the browser, enter the local IP of the machine you launched the Gradio app on (either the EC2 instance or host machine), and port for the app (e.g. http://01.23.456.789:7860)
Figure 24: Gradio Application
Inside the Gradio app, navigate to the 3D Viewer & Library tab
Figure 25: 3D Viewer & Library
Drag and drop or select the object you would like to view. A sample splat is located at source/Gradio/favorites/wolf.spz and can be viewed by clicking on the favorite button titled “wolf.spz”.
Figure 26: Viewing Local 3D Objects
To select and view items from S3, click “Refresh Input Contents”, and the table should populate with completed items in S3. Select an item (row) you would like to view and press “View Selected”.
Figure 27: Refreshing Table Items
Splats can be either downloaded or added to favorites once the item (row) is selected and the appropriate button is pressed.
Figure 28: Viewing S3 3D Objects
If you would like to edit the splat, first download it following the step above, then navigate to the bottom of the 3D Viewer & Library tab and launch the SuperSplat editor by clicking on the button. This opens the editor in a different browser tab, where you can drag and drop the downloaded splat. At the time of writing, the .spz format was not supported by the SuperSplat editor, but you can use the .ply file instead.
Figure 29: Launching the SuperSplat Editor
Figure 30: Editing Splats with SuperSplat
Tips
General
The default settings will enable you to process a video that has the camera orbiting an object. Use the assets/input/BenchMelb.mov sample as an example of capturing objects using video.
SAM2 segmentation only works on video input; otherwise, use u2net
SfM Convergence
Your input media may not be sufficient to reconstruct the scene, which will cause the camera pose estimation process to fail to converge. This typically occurs when:
Image Quality Issues:
Insufficient overlap between consecutive frames
Motion blur in images
Poor lighting conditions
Low image resolution
Scene Characteristics:
Not enough distinctive features in the scene
Highly reflective or transparent surfaces
Uniform/textureless areas
Dynamic objects or movement in scene
Camera Motion:
Too rapid camera movement
Large gaps in viewpoints
Irregular camera paths
Recommendations:
Image Capture:
Ensure 60-80% overlap between consecutive frames
Move camera slowly and steadily
Maintain consistent lighting
Capture higher resolution images
Avoid motion blur
Scene Setup:
Add more distinctive features to the scene
Ensure adequate and consistent lighting
Avoid highly reflective surfaces
Remove moving objects if possible
Processing:
Try reducing the number of input images
Consider using a different subset of images
Verify image quality before processing
General:
Use video input when possible with sequential matching
If pose data is available from the images, use pose-priors and spatial matching
If your environment is featureless, use pose data to help SfM converge
Colmap is an incremental mapper, while Glomap is a global mapper
Incremental SfM (COLMAP)/Sequential approach:
Summary: Starts with a pair of images, estimates their poses, then incrementally adds one image at a time
Advantages: More robust to outliers, handles challenging scenes better
Disadvantages: Slower, can accumulate drift over long sequences
Global SfM (GLOMAP)/Simultaneous approach:
Summary: Estimates all camera poses at once using global optimization
Process: Extract features → Match all pairs → Solve for all poses simultaneously
Advantages: Faster, no drift accumulation, better for well-connected image sets
Disadvantages: Less robust to outliers, requires good feature matches across many images
Spherical Camera
When scanning outside-in, similar to scanning objects in an orbital path, a monocular 4K camera is all you need.
When capturing spaces inside-out (environments, not objects), we recommend using a spherical camera to gather imagery in 360 degrees.
Using a spherical camera will greatly increase the number of input images without manual work and enable SfM to more effectively converge.
At the time of writing this, Colmap requires the input images to be in perspective. For this, we have implemented a robust algorithm that will automatically transform your equirectangular video/images into perspective images.
It is sometimes handy to remove views from the 360 image, for example when the person holding the camera appears in the capture. The “remove faces” option allows you to mask a view from the cubemap so the feature will not be in the output.
We have added a feature to optimize the cubemap views of the image sequence using connective images and view nodes to help SfM converge. Please see the /source/container/src/pipeline/spherical/equirectangular_to_perspective.py script for more details.
Be careful when enabling cubemap view optimization and masking views other than up or down, as the algorithm relies on the horizontal views for connectivity.
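For intuition only, the sketch below extracts a single 90° perspective (cube face) view from an equirectangular frame with plain NumPy. It is a deliberately simplified illustration, not the optimized algorithm in /source/container/src/pipeline/spherical/equirectangular_to_perspective.py.

import numpy as np
from PIL import Image

def equirect_to_front_face(equirect, out_size=1024, fov_deg=90.0):
    """Sample the forward-facing cube view from an equirectangular image (H x W x 3 array)."""
    h_eq, w_eq = equirect.shape[:2]
    f = (out_size / 2.0) / np.tan(np.radians(fov_deg) / 2.0)   # pinhole focal length in pixels

    # Ray direction for every output pixel (camera looks down +z)
    u, v = np.meshgrid(np.arange(out_size), np.arange(out_size))
    x = u - out_size / 2.0
    y = v - out_size / 2.0
    z = np.full_like(x, f, dtype=np.float64)

    lon = np.arctan2(x, z)                        # longitude in [-pi, pi]
    lat = np.arctan2(y, np.sqrt(x * x + z * z))   # latitude in [-pi/2, pi/2]

    # Map angles to equirectangular pixel coordinates and sample (nearest neighbor)
    src_x = ((lon / (2 * np.pi) + 0.5) * w_eq).astype(int) % w_eq
    src_y = ((lat / np.pi + 0.5) * h_eq).astype(int).clip(0, h_eq - 1)
    return equirect[src_y, src_x]

# Example usage: pano = np.asarray(Image.open("frame.png")); Image.fromarray(equirect_to_front_face(pano)).save("front.png")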
Uninstall the Guidance
Note: This will also delete any generated assets in the S3 bucket, unless you follow the directions below:
If not already done, delete the EC2 instance and associated resources (IAM role, key pair, security group) used to run the container script.
Delete the backend.
For Terraform:
Navigate to deployment/terraform
If you would like to keep your S3 assets, issue this command:
Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.