Guidance for Open Source 3D Reconstruction Toolbox for Gaussian Splats on AWS
Summary: This implementation guide provides an overview of the Guidance for Open Source 3D Reconstruction Toolbox for Gaussian Splats on AWS, its reference architecture and components, considerations for planning the deployment, and configuration steps for deploying the Guidance to Amazon Web Services (AWS). This guide is intended for solution architects, business decision makers, DevOps engineers, data scientists, and cloud professionals who want to implement Guidance for Open Source 3D Reconstruction Toolbox for Gaussian Splats on AWS in their environment.
Overview
Guidance for Open Source 3D Reconstruction Toolbox for Gaussian Splats on AWS provides an end-to-end, pipeline-based solution on AWS to reconstruct 3D scenes or objects from images or video inputs. The infrastructure can be deployed with AWS Cloud Development Kit (AWS CDK) or Terraform, leveraging infrastructure-as-code.
Use cases
Once deployed, the Guidance features a full 3D reconstruction back-end system with the following customizable components or pipelines:
Media Ingestion: Process videos or collections of images as input
Image Processing: Automatic filtering, enhancement, and preparation of source imagery (for example, background removal)
Structure from Motion (SfM): Camera pose estimation and initial 3D point cloud generation
Gaussian Splat Training: Optimization of 3D Gaussian primitives to represent the scene using AI/ML
Export and Delivery: Generation of the final 3D asset in standard formats for easy viewing and notification by email
By deploying this Guidance, users gain access to a flexible infrastructure that handles the entire 3D reconstruction process programmatically, from media upload to final 3D model delivery, while remaining highly modular through its componentized, pipeline-based approach. This Guidance addresses the significant challenges organizations face when trying to create photorealistic 3D content, which has traditionally been a time-consuming, expensive, and technically complex process requiring specialized skills and equipment.
Custom GS pipeline container
A Docker container image contains all of the 3D reconstruction tools for Gaussian Splatting in this project. This container has a Dockerfile, main.py, and helper script files and open source libraries under the source/container directory. The main script processes each request from the Amazon SageMaker Training Job invoke message and saves the result to S3 upon successful completion.
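The actual entry point lives at source/container/main.py; the following is only an illustrative sketch of that general flow, assuming the standard SageMaker training container conventions (hyperparameters mounted at /opt/ml/input/config/hyperparameters.json). The stage helpers and parameter keys shown are hypothetical placeholders, not the Guidance's real names.

import json
import boto3

def run_pipeline():
    # SageMaker mounts the training job hyperparameters at this well-known path
    with open("/opt/ml/input/config/hyperparameters.json") as f:
        params = json.load(f)

    # Hypothetical stage functions; the real pipeline stages live in the helper scripts
    frames = extract_frames(params)               # media ingestion / image processing
    sfm_model = run_sfm(frames, params)           # structure from motion
    splat_path = train_splat(sfm_model, params)   # Gaussian splat training

    # Upload the final artifact back to the job's S3 output location (placeholder keys)
    boto3.client("s3").upload_file(
        splat_path, params["bucket_name"], f"{params['output_prefix']}/{params['uuid']}.ply"
    )

if __name__ == "__main__":
    run_pipeline()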
The list of open source libraries that make this project possible includes:
A Gradio UI is provided in this Guidance to assist in the generation process. This UI enables users to submit Gaussian splatting jobs, as well as view the generated splat in a 3D viewer.
Features and benefits
This 3D reconstruction toolbox on the cloud allows customers to quickly and easily build photorealistic 3D assets from images or video using best-in-class open source libraries with commercial-friendly licensing. This guide provides an automated end-to-end experience, from deployment to submitting requests to viewing the 3D result in a web browser.
Architecture overview
This Guidance will:
create the infrastructure required to generate a Gaussian splat from a video or set of images.
create the mechanism to run the code and perform 3D reconstruction.
enable a user to create a 3D Gaussian splat using open source tools and AWS by uploading a video (.mp4 or .mov) or images (.png or .jpg) and metadata (.json) into Amazon Simple Storage Service (Amazon S3).
provide a 3D viewer for viewing the photorealistic and performant nature of Gaussian splats.
Deployment architecture diagram
Figure 1: 3D Reconstruction Toolbox for Gaussian Splats on AWS Deployment Architecture
Deployment architecture steps
An administrator deploys the Guidance to an AWS account and Region using AWS Cloud Development Kit (AWS CDK) or Terraform.
Deploying the Base AWS CloudFormation stack creates all the AWS resources needed to host the Guidance. This includes an Amazon Simple Storage Service (Amazon S3) bucket, AWS Lambda functions, an Amazon DynamoDB table, the necessary AWS Identity and Access Management (IAM) permissions, and an Amazon Elastic Container Registry (Amazon ECR) image repository. Additionally, it stores the AWS Step Functions state machine resource ID in Parameter Store, a capability of AWS Systems Manager, and creates an Amazon Simple Notification Service (Amazon SNS) topic.
Once the Base CloudFormation stack has been deployed, deploy the Post Deploy CloudFormation stack. That stack builds a Docker container image and pushes it to the Amazon ECR repository. It also pushes the pre-processing models used during training, such as the background removal models, into the S3 bucket.
Architecture diagram
Figure 2: 3D Reconstruction Toolbox for Gaussian Splats on AWS Reference Architecture
Architecture steps
The user authenticates to IAM using AWS Tools and SDKs.
The input is uploaded to a dedicated S3 job bucket location. This can be done using a Gradio interface and AWS Software Development Kit (AWS SDK).
Optionally, the Guidance supports external job submission by uploading a ‘.JSON’ job configuration file and media files into a designated S3 job bucket location.
The job JSON file uploaded to the S3 job bucket will trigger an Amazon SNS message that will invoke the initialization Job Trigger Lambda function.
The Job Trigger Lambda function will perform input validation and set appropriate variables for the Step Functions State Machine.
The workflow job record will be created in the DynamoDB job table.
The Job Trigger Lambda function will invoke the Step Functions state machine to handle the entire workflow job (a minimal sketch of this trigger flow follows these architecture steps).
An Amazon SageMaker AI Training Job will be submitted synchronously using the state machine's built-in wait-for-completion mechanism.
The Amazon ECR container image and S3 job bucket model artifacts will be used to deploy a new container on a graphics processing unit (GPU) based compute node. The compute node instance type is determined by the job JSON configuration.
The container will run the entire pipeline as an Amazon SageMaker AI training job on a GPU compute node.
The Job Completion Lambda function will complete the workflow job by updating the job metadata in DynamoDB and using Amazon SNS to notify the user through email upon completion.
The internal workflow parameters are stored in Parameter Store during deployment to decouple the Job Trigger Lambda function and the Step Function State Machine.
Amazon CloudWatch logs and monitors the training jobs, surfacing possible errors to the user.
Infrastructure resource IDs are securely stored in Parameter Store to aid in deployment and execution.
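As a minimal illustration of the trigger flow described above (validate the job JSON, record it in DynamoDB, then start the state machine whose ARN is read from Parameter Store), the sketch below calls boto3 directly. The table name, parameter name, and field names are hypothetical placeholders, not the Guidance's actual identifiers, and downloading the job .json from S3 is omitted.

import json
import boto3

dynamodb = boto3.resource("dynamodb")
sfn = boto3.client("stepfunctions")
ssm = boto3.client("ssm")

def handler(event, context):
    # The SNS message wraps the S3 event for the uploaded job .json file
    s3_event = json.loads(event["Records"][0]["Sns"]["Message"])
    job = s3_event  # in practice, download and parse the job .json from S3 here (omitted)

    if "uuid" not in job:
        raise ValueError("Job configuration is missing a uuid")

    # Record the workflow job (hypothetical table and attribute names)
    dynamodb.Table("gs-workflow-jobs").put_item(Item={"uuid": job["uuid"], "status": "SUBMITTED"})

    # The state machine ARN is stored in Parameter Store at deployment time (hypothetical key)
    state_machine_arn = ssm.get_parameter(Name="/gs-workflow/state-machine-arn")["Parameter"]["Value"]
    sfn.start_execution(stateMachineArn=state_machine_arn, name=job["uuid"], input=json.dumps(job))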
Security
When you build systems on AWS infrastructure, security responsibilities are shared between you and AWS. This shared responsibility model reduces your operational burden because AWS operates, manages, and controls the components including the host operating system, the virtualization layer, and the physical security of the facilities in which the services operate. For more information about AWS security, visit AWS Cloud Security.
All data is encrypted at rest and in transit within the AWS Cloud services in this Guidance.
An Amazon S3 access logging bucket logs all access to the asset bucket.
Input validation on the job configuration flags any misconfigurations in the JSON file.
Least-privilege access rights are applied to service actions.
Plan your deployment
Service quotas
Service quotas, also referred to as limits, are the maximum number of service resources or operations for your AWS account.
Quotas for AWS services in this Guidance
Service quotas - increases can be requested through the AWS Management Console, AWS command line interface (CLI), or AWS SDKs (see Accessing Service Quotas)
This Guidance runs SageMaker training jobs, which use a Docker container to run the training. This deployment guide walks through building a custom container image for SageMaker.
Depending on which instance type you train on (configured during job submission; ml.g5.4xlarge is the default), you may need to increase the SageMaker training job quota. In Service Quotas, this appears under the SageMaker service as “training job usage”.
(Optional) You can build and test this container locally (not running on SageMaker) on a GPU-enabled EC2 instance. If you plan to do this, increase the EC2 quota named “Running On-Demand G and VT instances” and/or “Running On-Demand P instances”, depending on the instance family you plan to use, to the desired maximum number of vCPUs for running instances of that family. Note that this quota is measured in vCPUs, not in number of instances like the SageMaker training job quota.
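If you prefer to check the relevant SageMaker quotas programmatically instead of through the console, a rough sketch using the Service Quotas API is shown below; it simply lists the SageMaker quotas and filters by name.

import boto3

quotas = boto3.client("service-quotas")
paginator = quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        # Look for the per-instance-type "training job usage" quotas, e.g. for ml.g5 instances
        if "training job usage" in quota["QuotaName"] and "ml.g5" in quota["QuotaName"]:
            print(quota["QuotaName"], quota["Value"])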
Cost
You are responsible for the cost of the AWS services used while running this Guidance. As of May 2025, the cost for running this Guidance with the default settings in the default AWS Region (US East 1(N. Virginia)) is approximately $278.33 per month for processing 100 requests.
We recommend creating a Budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance.
Cost table
The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the US East (N. Virginia) Region for one month.
| AWS Service | Dimensions | Cost [USD] |
| --- | --- | --- |
| Amazon S3 | Standard feature storage (input = 200 MB, output = 2.5 GB) | |
A CloudFormation template is given here to spin up a fresh, full-featured Ubuntu desktop
Prerequisites: Before you build the EC2 workstation stack, ensure the following resources are created in your AWS account and region of choice:
VPC
Follow these instructions if you do not have one. This is where your EC2 instance will live. Ensure a public subnet with internet access is available in order to pull the GitHub repositories.
Keypair
Follow these instructions if you do not have one. This is used to remote into the EC2 desktop.
Security Group
Follow these instructions to create a security group. Enable inbound NICE DCV using TCP/UDP port 8443 and SSH using port 22. Ensure your public IP address is the source for all entries.
For Inbound rules, add:
Custom TCP, Port range=8443, source="My IP"
Custom UDP, Port range=8443, source="My IP"
SSH, Port range=22, source="My IP"
If you plan on using the included Gradio user interface to submit jobs, enable the default port so you can access the app through your local browser.
Custom TCP, Port range=7860, source="My IP"
Record the security group ID for later
Download the deep-learning-ubuntu-desktop.yaml file locally from the repo linked above
Open the AWS Console and navigate to the CloudFormation console
Select 'Create stack' -> 'With new resources'
On `Create Stack` page, select:
Choose an existing template
Choose Upload a template file
Select the deep-learning-ubuntu-desktop.yaml file downloaded earlier
On the Specify stack details page, leave the default values except for the following (a programmatic alternative using these same parameter keys is shown after these steps):
Stack Name: YOUR-CHOICE
AWSUbuntuAMIType: UbuntuPro2204LTS
DesktopAccessCIDR: YOUR-PUBLIC-IP-ADDRESS/32
DesktopInstanceType: g5.2xlarge
DesktopSecurityGroupId: SG-ID-FROM-ABOVE
DesktopVpcId: VPC-ID-FROM-ABOVE
DesktopVpcSubnetId: YOUR-PUBLIC-SUBNET-ID
KeyName: KEYNAME-FROM-ABOVE
(optional) S3Bucket: S3-BUCKET-WITH-CODE
Submit and monitor the stack creation in the CloudFormation console
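If you prefer to create this workstation stack programmatically rather than through the console, the same parameters can be passed with boto3. The sketch below uses the parameter keys listed above with placeholder values; Capabilities=["CAPABILITY_IAM"] is assumed because the template creates an instance role.

import boto3

cfn = boto3.client("cloudformation")
with open("deep-learning-ubuntu-desktop.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="gs-workstation",                       # YOUR-CHOICE
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],                  # assumption: the template creates an instance role
    Parameters=[
        {"ParameterKey": "AWSUbuntuAMIType", "ParameterValue": "UbuntuPro2204LTS"},
        {"ParameterKey": "DesktopAccessCIDR", "ParameterValue": "203.0.113.10/32"},  # your public IP /32
        {"ParameterKey": "DesktopInstanceType", "ParameterValue": "g5.2xlarge"},
        {"ParameterKey": "DesktopSecurityGroupId", "ParameterValue": "sg-0123456789abcdef0"},
        {"ParameterKey": "DesktopVpcId", "ParameterValue": "vpc-0123456789abcdef0"},
        {"ParameterKey": "DesktopVpcSubnetId", "ParameterValue": "subnet-0123456789abcdef0"},
        {"ParameterKey": "KeyName", "ParameterValue": "my-keypair"},
    ],
)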
On successful building of the stack, navigate to the EC2 console in the account and region the deployed stack is in
Locate the instance just created using the `Stack Name` entered above, select the instance, and select Actions->Security->Modify IAM Role
Record the current IAM role name
Navigate to the IAM Console in a separate browser tab or window
Under `Roles`, search for the role using the IAM role name identified above
Select the role by clicking on its name
In the permissions policies table, select Add permissions->Attach policies:
Attach the following AWS managed policies to the role
AmazonEC2ContainerRegistryFullAccess
AmazonS3FullAccess
AmazonSSMManagedInstanceCore
AWSCloudFormationFullAccess
IAMFullAccess
SSH into the workstation using the EC2 public IP (found in the EC2 console), security group, and SSH terminal
Once connected to the EC2 workstation, perform the following commands to update the OS and password
sudo apt update
sudo passwd ubuntu
The EC2 instance will reboot automatically while updates are performed in the background
The EC2 setup is complete once the message 'NICE DCV server is enabled!' appears in the output of the following command
tail /var/log/cloud-init-output.log
Once the NICE DCV enabled message appears, use the NICE DCV client with the EC2 public IP address, the username 'ubuntu', and the Ubuntu password set earlier to connect remotely to the EC2 instance.
Be sure not to upgrade the OS (even when prompted), as doing so will break critical packages. Only choose to enable security updates.
Open Visual Studio Code on the EC2 instance by locating it in the application library
Install and configure the AWS CLI (if not using the recommended EC2 deployment below)
Docker is required to build the container image that is used for training the splat. This requires at least 20 GB of free disk space on your deployment machine.
Note: If building on Windows or macOS and you receive the error below, set the number of logical processors to 1. It is also recommended to use the EC2 Ubuntu deployment method to mitigate this error.
#17 [13/28] RUN pip install --no-cache-dir -r ./requirements_2.txt
#17 2.026 Processing ./diff-gaussian-rasterization
#17 2.029 Preparing metadata (setup.py): started
#17 8.350 Preparing metadata (setup.py): finished with status 'done'
#17 8.365 Processing ./diff-surfel-rasterization
#17 8.367 Preparing metadata (setup.py): started
#17 12.41 Preparing metadata (setup.py): finished with status 'done'
#17 12.42 Building wheels for collected packages: diff-gaussian-rasterization, diff-surfel-rasterization
#17 12.42 Building wheel for diff-gaussian-rasterization (setup.py): started
ERROR: failed to receive status: rpc error: code = Unavailable desc = error reading from server: EOF
Download and install Node Package Manager (NPM) if not already on your system. The LTS version is recommended.
For Ubuntu EC2
sudo apt update
sudo apt install npm nodejs -y
sudo chown -R $USER PATH-TO-THIS-REPO
sudo npm install -g n
sudo n stable
For Windows/MacOS
npm install -g npm@latest
Confirm authenticated as the correct IAM principal and in the correct AWS account (see get-caller-identity for more info).
aws sts get-caller-identity
Set the AWS Region, replacing the Region below with your deployed region
Windows
set AWS_REGION=us-east-1
Linux/Mac
export AWS_REGION=us-east-1
Deployment process overview
Before you launch the Guidance, review the cost, architecture, security, and other considerations discussed in this guide. Follow the step-by-step instructions in this section to configure and deploy the Guidance into your account.
Note: The next few steps will deploy resources. You can use aws sts get-caller-identity to confirm your IAM principal and AWS account before proceeding.
Bootstrap your environment, replacing the AWS account ID and Region below.
cdk bootstrap aws://101010101010/us-east-1
Deploy Infrastructure Stack.
cdk deploy GSWorkflowBaseStack --require-approval never --outputs-file outputs.json --exclusively
NOTE: The created resource names can be viewed in the CloudFormation console under Outputs, or in deployment/cdk/outputs.json
Figure 3: AWS CDK Base Stack Output
Deploy Post Deployment Stack.
cdk deploy GSWorkflowPostDeployStack --require-approval never --exclusively
NOTE: The project takes up to an hour to build the container and deploy. See below for Amazon SNS notification details.
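The resource names needed later (for example, the job bucket name used when submitting jobs) can be pulled out of the CDK outputs file with a few lines of Python. The output key names vary, so this sketch simply prints everything rather than assuming a specific key.

import json

with open("deployment/cdk/outputs.json") as f:
    outputs = json.load(f)

# Print every stack output so you can note the job bucket name and other IDs
for stack, values in outputs.items():
    for key, value in values.items():
        print(f"{stack}.{key} = {value}")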
Install Terraform (Terraform commonly looks for AWS credentials in a shared credentials file like the one created by the AWS CLI configuration or environment variables; see AWS Provider for more info)
From the project root, navigate to the deployment/terraform/ folder.
Open terraform.tfvars and enter values for the following fields:
account_id - target 12-digit AWS account number
region - target AWS region (e.g. us-west-2)
admin_email - valid email address used to create a user for the SNS email notification
maintain_s3_objects_on_stack_deletion - whether to keep the S3 objects upon stack deletion or not
deployment_phase - used to select the deployment phase (“base” or “post”). There is no need to change this directly; set it with the -var deployment_phase flag on the terraform command
Note: The next few steps deploy resources. You can use aws sts get-caller-identity to confirm your IAM principal and AWS account before proceeding.
NOTE: The project takes approximately an hour to build the container and deploy
Figure 6: Terraform Post Deploy Stack Output
Post deployment requirements
After deployment, the admin email provided in the deployment configuration will receive an email subscription notification.
Figure 7: SNS Subscription Email for jobs
Please enable notifications by clicking on the link in the email.
Figure 8: SNS Subscription Confirmation
Submitting a job
Job submission options
To generate a splat, the backend requires a video or .zip of images and a unique metadata (.json) file to be uploaded to Amazon S3. A sample video, located at assets/input/BenchMelb.mov, can be used to test the solution immediately.
For all options below, be sure to fill in the appropriate options and confirm you are authenticated with AWS. The default options will work for videos orbiting an object.
OPTION A. User interface with S3 library browser and 3D viewer
A Gradio interface is included in this repo at source/Gradio/. Please follow the directions below to use it:
Open a console/command window of a machine that has the repo deployed
Change directories to the repo under the source/Gradio/ directory
Configure the Gradio application to use the created bucket:
Open generate_splat_gradio.py in a text editor
Input the S3 bucket name into the self.s3_bucket = "" field
Save the file and exit
Authenticate with AWS
Confirm authenticated as the correct IAM principal and in the correct AWS account (see get-caller-identity for more info).
aws sts get-caller-identity
Start the Gradio interface:
cd source/Gradio
python generate_splat_gradio.py
Open your web browser and navigate to the URL displayed in the terminal (typically http://0.0.0.0:7860 or use the local or public IP of the machine running the above script)
Figure 9: Gradio App: AWS Configuration
Navigate to the Job Settings tab in the app and configure the job accordingly (the default settings will work for a video with the camera orbiting an object)
Figure 10: Gradio App: Job Settings
Navigate to the Job Submit tab in the app, input the media, then submit the job to AWS
Figure 11: Gradio App: Job Submission
OPTION B. Python script and S3 upload
Use the source/generate_splat.py file to submit the artifacts and upload the media and JSON metadata file directly into the S3 bucket.
The metadata file can be created manually following the documented structure, or created automatically and submitted using source/generate_splat.py.
Modify the script contents to output a valid metadata file before uploading your media to Amazon S3.
Using the AWS Management Console or AWS CLI, follow the instructions below:
Choose an S3 prefix {inputPrefix} in {bucketName} for your media files and create a folder {bucketName}/{inputPrefix}. The {bucketName} is obtained from both the Terraform/CDK console output and the outputs.json file within the deployment/terraform or deployment/cdk directory.
Upload a video (.mp4 or .mov) into {bucketName}/{inputPrefix}
Submit a metadata .json file named with a unique UUID
Use the utility to create and submit the job
Open source/generate_splat.py in your favorite text editor
Fill in the top section of the script, entering the bucket name from the deployment configuration output and the media filename, then save it.
Open a shell session and run the script from a machine that has AWS access to PUT into the {bucketName}/{inputPrefix} location.
python generate_splat.py
OR
Manually create and submit metadata file
Create a local file named uuid.json, where uuid is the job's unique UUID
Open the uuid.json file in your favorite editor
Copy the example metadata JSON and change the parameters to suit your use case
Save the metadata file locally
Using AWS credentials, log into the AWS Console in the account and Region specified in CDK/Terraform
Navigate to {bucketName}/{s3TriggerKey} in the Amazon S3 console. {s3TriggerKey} is obtained from the CDK/Terraform configuration
Upload the uuid.json file into S3 at the {bucketName}/upload-workflow location
Note: each metadata file needs to have a unique UUID both in the filename and inside the JSON file
Note: If you do not have the storage bucket name noted down, it can be found in the Terraform outputs.json file or CDK outputs file depending on your deployment method.
An example metadata file is below.
Example json metadata file for a video input and below configuration: 1a51caa6-1f6a-4a73-8f94-475b2ae9b04e.json
Note: a unique UUID is required for each submission. It is used as the filename and populates the “uuid” field in the metadata
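Since the exact schema lives in the repository, the following is only a rough, illustrative sketch of how a submission script might assemble and upload a metadata file. Every field name shown here is a placeholder and must be replaced with the key names from the documented schema and the configuration reference that follows.

import json
import uuid as uuidlib
import boto3

bucket_name = "YOUR-JOB-BUCKET"        # from the CDK/Terraform outputs
input_prefix = "upload-media"          # placeholder input prefix
job_prefix = "upload-workflow"         # placeholder job (trigger) prefix
media_file = "BenchMelb.mov"

job_uuid = str(uuidlib.uuid4())

# Placeholder field names only; use the schema documented in the repository
metadata = {
    "uuid": job_uuid,
    "instanceType": "ml.g5.4xlarge",
    "s3": {"bucketName": bucket_name, "inputPrefix": input_prefix, "inputKey": media_file},
}

s3 = boto3.client("s3")
s3.upload_file(media_file, bucket_name, f"{input_prefix}/{media_file}")
with open(f"{job_uuid}.json", "w") as f:
    json.dump(metadata, f, indent=2)
s3.upload_file(f"{job_uuid}.json", bucket_name, f"{job_prefix}/{job_uuid}.json")
print(f"Submitted job {job_uuid}")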
Understanding the configuration and capabilities
This Guidance is built with a variety of open source tools that can be used for various use cases. Because of this, many options are contained in this Guidance. The following is a high-level overview of each option and its applicability:
Workflow input:
Video (.mov or .mp4)
Archive (.zip) of images (.png or .jpeg)
Archive (.zip) of pose priors and images (transforms.json, colmap model files)
Workflow output: .ply and .spz, archive of project files (images, point cloud, metadata)
UUID: A unique identifier used by the backend system to record individual requests in DynamoDB
Instance type: The EC2 compute resource to use for the workflow. Currently, these instance types are tested and supported:
ml.g5.4xlarge (recommended for <500 4k images)
ml.g5.8xlarge (recommended for <500 4k images)
ml.g6e.4xlarge (for large datasets (e.g. >500 4k images or 3DGRT))
ml.g6e.8xlarge (for large datasets (e.g. >500 4k images or 3DGRT))
S3:
Bucket name: The name of the S3 bucket that was deployed by CDK/Terraform. This is an output from the deployment.
Input prefix: The S3 prefix (initial directory minus the job prefix) for the input media
Input key: The S3 key for the input media
Output prefix: The S3 prefix (initial directory minus the job prefix) for the output files
S3 Job prefix: The S3 prefix (initial directory) for the job json files
Video processing:
Max number of images: (integer), the maximum number of images to use when a video is given as input. If using a .zip file with images or pose priors, this parameter will be ignored.
Image processing:
Filter blurry images: (boolean), whether to remove blurry images from the dataset. If using a .zip file with pose priors, this parameter will be ignored.
Structure from motion (SfM):
Enable: true or false, Whether to enable SfM or not. Future plans will enable input of SfM output
Software name: colmap or glomap, software to use for the triangulation of the mapper
Enable enhanced feature extraction: true or false, whether to enable enhanced feature extraction which uses estimate_affine_shape and domain_size_pooling to enhance the feature matching
Matching method:
sequential (best for videos or images that share overlapping features)
spatial (best to use for pose priors to take spatial orientation into account)
vocab (best for large datasets that are not sequentially ordered, e.g. >1000 images)
exhaustive (only use this method if dataset struggles to converge with other methods)
Pose priors: To speed up reconstruction, camera poses associated with the images can be used as input. This feature accepts a .zip archive that follows the same schema as NerfCapture, or a .zip archive that contains images and sparse directories with COLMAP model text files.
At this time, depth images are not used in the splat process.
Note: All image files must be sequentially named and padded (e.g. 001.png, 002.png, etc.)
Ensure the .zip contains both the transforms.json file and an /images directory with sequentially named RGB images.
Enable: (boolean), whether to enable using transforms.json file for pose priors.
Source coordinate name: (“arkit” or “arcore” or “opengl” or “opencv” or “ros”), the source coordinates used with pose priors
Pose is world-to-camera: (boolean), whether the source coordinates for pose priors are in world-to-camera (True) or camera-to-world (False)
The schema for the input transforms.json is shown below, in case you use an alternate method to extract the images and poses.
Note: Depth images do not need to be present in the .zip file, but “depth_path” still needs to be filled in so the extension is known. For example, if the image is images/3.png, then depth_path=images/3.depth.png and file_path=images/3. The timestamp and depth images are not currently used in the pipeline.
"cx": camera sensor center on x-axis in pixels
"cy": camera sensor center on y-axis in pixels
"fl_x": focal length on x-axis in pixels
"fl_y": focal length on y-axis in pixels
"w": camera resolution width in pixels
"h": camera resolution height in pixels
"file_path": this will look like this: "images/13" if image filename is images/13.png
"depth_path": this will look like this: "images/13.depth.png" if image filename is images/13.depth.png (even if you dont have depth images, fill this in with correct extension)
"transform_matrix": 4x4 pose matrix
"timestamp": seconds (can be absolute or use epoch)
Archive structure (.zip)
archive.zip
├── transforms.json
└── images/
    └── *.{png,jpg,jpeg} # Image files; depth images need "depth" in the file name
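Putting the fields above together, here is a hedged sketch that writes a minimal transforms.json and names the images with zero-padded, sequential filenames. It assumes the NerfCapture-style layout with a top-level "frames" array, so double-check the exact layout against a file produced by NerfCapture before relying on it.

import json
import os
import shutil

def build_archive_folder(image_paths, poses, intrinsics, out_dir="archive"):
    """image_paths: list of source image files; poses: list of 4x4 matrices (nested lists);
    intrinsics: dict with cx, cy, fl_x, fl_y, w, h."""
    os.makedirs(os.path.join(out_dir, "images"), exist_ok=True)
    frames = []
    for i, (src, pose) in enumerate(zip(image_paths, poses), start=1):
        ext = os.path.splitext(src)[1]                 # keep the original extension (.png/.jpg)
        name = f"{i:03d}"                              # sequentially named and zero-padded, e.g. 001
        shutil.copy(src, os.path.join(out_dir, "images", name + ext))
        frames.append({
            "file_path": f"images/{name}",
            "depth_path": f"images/{name}.depth{ext}", # extension must be filled in even without depth images
            "transform_matrix": pose,                  # 4x4 pose matrix (nested lists)
            "timestamp": float(i),                     # not currently used by the pipeline
        })
    transforms = dict(intrinsics)                      # cx, cy, fl_x, fl_y, w, h at the top level (assumed)
    transforms["frames"] = frames
    with open(os.path.join(out_dir, "transforms.json"), "w") as f:
        json.dump(transforms, f, indent=2)

Zip the resulting folder so the archive contains transforms.json and the images/ directory at its root, matching the structure shown above.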
Training:
Enable: true or false, whether to enable 3DGS training or not. Future plans will enable the user to perform only SfM
Maximum steps: (integer), The maximum training steps to use while training the splat
Splat model: splatfacto, splatfacto-big, splatfacto-w-light, nerfacto, the GS model to use for training
Pointers:
splatfacto: a great, generalized model that is perfect to start with if you are unsure what model to choose
splatfacto-big: high quality model that should be used to output feature-rich scenes and objects. This will yield a larger .ply file.
splatfacto-w-light: use this model to achieve superior quality while still pruning unwanted Gaussians. It takes the longest time, but the output file size is much smaller while still upholding quality.
splatfacto-mcmc: the current SotA that balances small training time with high quality output
nerfacto: used for testing and comparisons between NeRF and GS. The output will be a NeRF. Beware, this model will need more images than GS in order to maintain higher quality.
3dgut: used for enabling Distorted Cameras and Secondary Rays in Gaussian Splatting. Great for fisheye camera input.
3dgrt: used for 3D Gaussian Ray Tracing and Fast Tracing of Particle Scenes. Great for highly detailed scenes at the cost of processing power and time
Rotate splat: true or false, whether to rotate the output splat for the Gradio 3D Model viewer coordinate system (set to true will rotate both the .ply and .spz)
Spherical camera:
Enable: true or false, whether to enable 360 camera support or not
Cube faces to remove: “[‘back’, ‘down’, ‘front’, ‘left’, ‘right’, ‘up’]”, a list of cube faces to remove from the spherical image. This is great for cropping out people or objects from the 360 image.
Note: The above configurations were tested with an Insta360 ONE X2, exporting frame(s) in equirectangular format, 9:16 ratio, 5.7k resolution, and 30 frames per second. The captures were taken with the camera display aimed toward the person capturing the frame(s).
Figure 12: Equirectangular cube map views from a spherical camera
Figure 13: Example: Enable Spherical Camera = true, Cube Faces to Remove = "['back', 'down']"
Segmentation:
Remove background: true or false, whether to remove the background when the input is an object (not a scene)
Background removal model: “u2net”, “u2net-human”, or “sam2”, the background removal model to use
Note: The sam2 model can only be used on video at this time
Remove human subject: true or false, whether to remove humans from the scene or not. This can be combined with other removal methods such as background removal and cube face removal.
Figure 14: Background removal using SAM2
Monitoring a job
The training progress can be monitored in a few different ways:
An SNS notification will be sent once the 3D reconstruction is complete (usually 20-80 min depending on the model, quality, or time constraints)
Figure 15: Job successful completion email notification using SNS
Figure 16: Job error notification using SNS
SageMaker
Using the AWS Management Console, navigate to SageMaker -> Training -> Training Jobs.
Figure 17: Amazon SageMaker Training Jobs Selection
Enter the uuid in the search bar, and click on the entry to view status and monitor metrics/logs.
Figure 18: Amazon SageMaker Training Job Status
Figure 19: Amazon SageMaker Training Job Monitoring
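The same status check can be scripted. For example, a short boto3 sketch that finds training jobs whose name contains the job UUID, assuming the job name includes the UUID as the console search above implies:

import boto3

sm = boto3.client("sagemaker")
job_uuid = "1a51caa6-1f6a-4a73-8f94-475b2ae9b04e"  # your job UUID

jobs = sm.list_training_jobs(NameContains=job_uuid, SortBy="CreationTime", SortOrder="Descending")
for summary in jobs["TrainingJobSummaries"]:
    detail = sm.describe_training_job(TrainingJobName=summary["TrainingJobName"])
    print(summary["TrainingJobName"], detail["TrainingJobStatus"], detail.get("SecondaryStatus"))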
Step Functions
Using the AWS Management Console, navigate to Step Functions -> State Machines.
Figure 20: AWS Step Functions State Machine Selection
Enter the identifier found in outputs.json in the search bar, and click the entry to view status and monitor the state of the workflow.
Figure 21: AWS Step Functions State Machine Job Detail
CloudWatch
Using the AWS Management Console, navigate to CloudWatch -> Log Groups.
Figure 22: Amazon CloudWatch Log Group
Enter /aws/sagemaker/TrainingJobs in the search bar, and click on the log group. Enter the uuid in the log stream search bar to inspect the log
Figure 23: Amazon CloudWatch Training Logs
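The same logs can also be pulled from a script. A minimal boto3 sketch that filters the SageMaker training log group by the job UUID (assuming the training job name, and therefore the log stream name, starts with the UUID):

import boto3

logs = boto3.client("logs")
job_uuid = "1a51caa6-1f6a-4a73-8f94-475b2ae9b04e"  # your job UUID

events = logs.filter_log_events(
    logGroupName="/aws/sagemaker/TrainingJobs",
    logStreamNamePrefix=job_uuid,   # assumption: the log stream name starts with the UUID
    limit=50,
)
for event in events["events"]:
    print(event["message"])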
Viewing the job result
After training has completed on submitted media and an SNS email has been received, the splat result can be viewed in the Gradio user interface (see Job submission options, Option A. to launch). Once the Gradio app has been launched:
Navigate to the browser, enter the local IP of the machine you launched the Gradio app on (either the EC2 instance or host machine), and port for the app (e.g. http://01.23.456.789:7860)
Figure 24: Gradio Application
Inside the Gradio app, navigate to the 3D Viewer & Library tab
Figure 25: 3D Viewer & Library
Drag and drop or select the object you would like to view. A sample splat is located at source/Gradio/favorites/wolf.spz and can be viewed by clicking on the favorite button titled “wolf.spz”.
Figure 26: Viewing Local 3D Objects
To select and view items from S3, click “Refresh Input Contents”, and the table should populate with completed items in S3. Select an item (row) you would like to view and press “View Selected”.
Figure 27: Refreshing Table Items
Splats can be either downloaded or added to favorites once the item (row) is selected and the appropriate button is pressed.
Figure 28: Viewing S3 3D Objects
If you would like to edit the splat, first download it following the step above, then navigate to the bottom of the 3D Viewer & Library tab and launch the SuperSplat editor by clicking on the button. This opens the editor in a different browser tab, where you can drag and drop the downloaded splat. At the time of writing, the .spz format was not supported by the SuperSplat editor, but you can use the .ply file instead.
Figure 29: Launching the SuperSplat Editor
Figure 30: Editing Splats with SuperSplat
Tips
General
The default settings will enable you to process a video that has the camera orbiting an object. Use the assets/input/BenchMelb.mov sample as an example of capturing objects using video.
SAM2 segmentation only works on video input; otherwise, use u2net
SfM Convergence
Your input media may not be sufficient to reconstruct the scene, which will cause the camera pose estimation process to fail to converge. This typically occurs when:
Image Quality Issues:
Insufficient overlap between consecutive frames
Motion blur in images
Poor lighting conditions
Low image resolution
Scene Characteristics:
Not enough distinctive features in the scene
Highly reflective or transparent surfaces
Uniform/textureless areas
Dynamic objects or movement in scene
Camera Motion:
Too rapid camera movement
Large gaps in viewpoints
Irregular camera paths
Recommendations:
Image Capture:
Ensure 60-80% overlap between consecutive frames
Move camera slowly and steadily
Maintain consistent lighting
Capture higher resolution images
Avoid motion blur
Scene Setup:
Add more distinctive features to the scene
Ensure adequate and consistent lighting
Avoid highly reflective surfaces
Remove moving objects if possible
Processing:
Try reducing the number of input images
Consider using a different subset of images
Verify image quality before processing
General:
Use video input when possible with sequential matching
If pose data is available from the images, use pose-priors and spatial matching
If your environment is featureless, use pose data to help SfM converge
Colmap is an incremental mapper, while Glomap is a global mapper
Incremental SfM (COLMAP)/Sequential approach:
Summary: Starts with a pair of images, estimates their poses, then incrementally adds one image at a time
Advantages: More robust to outliers, handles challenging scenes better
Disadvantages: Slower, can accumulate drift over long sequences
Global SfM (GLOMAP)/Simultaneous approach:
Summary: Estimates all camera poses at once using global optimization
Process: Extract features → Match all pairs → Solve for all poses simultaneously
Advantages: Faster, no drift accumulation, better for well-connected image sets
Disadvantages: Less robust to outliers, requires good feature matches across many images
Spherical Camera
When scanning outside-in, similar to scanning objects in an orbital path, a monocular 4K camera is all you need.
When capturing spaces inside-out (environments, not objects), we recommend using a spherical camera to gather imagery in 360 degrees.
Using a spherical camera will greatly increase the number of input images without manual work and enable SfM to more effectively converge.
At the time of writing this, Colmap requires the input images to be in perspective. For this, we have implemented a robust algorithm that will automatically transform your equirectangular video/images into perspective images.
It is sometimes handy to remove views from the 360 image, for example when the person holding the camera appears in the capture. The “remove faces” option allows you to mask a view from the cubemap so the feature will not be in the output.
We have added a feature to optimize the cubemap views of the image sequence using connective images and view nodes to help SfM converge. Please see the /source/container/src/pipeline/spherical/equirectangular_to_perspective.py script for more details.
Be careful when enabling cubemap view optimization and masking views other than up or down, as the algorithm relies on the horizontal views for connectivity.
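For intuition only, the sketch below extracts a single 90° perspective (cube face) view from an equirectangular frame with plain NumPy. It is a deliberately simplified illustration, not the optimized algorithm in /source/container/src/pipeline/spherical/equirectangular_to_perspective.py.

import numpy as np
from PIL import Image

def equirect_to_front_face(equirect, out_size=1024, fov_deg=90.0):
    """Sample the forward-facing cube view from an equirectangular image (H x W x 3 array)."""
    h_eq, w_eq = equirect.shape[:2]
    f = (out_size / 2.0) / np.tan(np.radians(fov_deg) / 2.0)   # pinhole focal length in pixels

    # Ray direction for every output pixel (camera looks down +z)
    u, v = np.meshgrid(np.arange(out_size), np.arange(out_size))
    x = u - out_size / 2.0
    y = v - out_size / 2.0
    z = np.full_like(x, f, dtype=np.float64)

    lon = np.arctan2(x, z)                        # longitude in [-pi, pi]
    lat = np.arctan2(y, np.sqrt(x * x + z * z))   # latitude in [-pi/2, pi/2]

    # Map angles to equirectangular pixel coordinates and sample (nearest neighbor)
    src_x = ((lon / (2 * np.pi) + 0.5) * w_eq).astype(int) % w_eq
    src_y = ((lat / np.pi + 0.5) * h_eq).astype(int).clip(0, h_eq - 1)
    return equirect[src_y, src_x]

# Example usage: pano = np.asarray(Image.open("frame.png")); Image.fromarray(equirect_to_front_face(pano)).save("front.png")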
Uninstall the Guidance
Note: This will also delete any generated assets in the S3 bucket, unless you follow the directions below:
If not already done, delete the EC2 instance and associated resources (IAM role, key pair, security group) used to run the container script.
Delete the backend.
For Terraform:
Navigate to deployment/terraform
If you would like to keep your S3 assets, issue this command:
Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.