Cognition and State Modeling

DUSt3R

DUSt3R is a geometric 3D vision tool that reconstructs pointmaps, camera relationships, and aligned 3D structure from image pairs or multi-view image collections.

Tool Introduction

Core parameters, trigger timing, and visual before/after demo references.

Short Explanation

Upload two or more images and DUSt3R predicts dense 3D pointmaps, confidence maps, and camera relationships without a traditional SfM preprocessing pipeline.

Input: Image pair / image set

Output: 3D pointmaps, camera poses, confidence maps

Trigger Timing: On demand, from the source demo or the local example command.

Runtime: Python / PyTorch / Gradio demo

Before: Image pair / image set

Prepare two or more images of the same scene, in the form expected by the original project.

After: 3D pointmaps, camera poses, confidence maps

Read the produced pointmaps, camera poses, and confidence maps, or the visualization built from them.

Preset Example

A quick-run style example for the documentation page.

Input: tools/dust3r/examples/images/

Prompt: Use 512px ViT-Large checkpoint and global alignment

Expected: Aligned 3D point cloud, pair confidence maps, and camera pose estimates.

Parameters And Output

Readable controls and the meaning of each returned artifact.

Parameter Explanation

images (file)

Image pair or image collection used for reconstruction.

model_name (select): DUSt3R_ViTLarge_BaseDecoder_512_dpt

Pretrained checkpoint used for pointmap and confidence prediction.

image_size (select): 512

Input resolution used by the pretrained model.

global_alignment (toggle): true

Optimizes multiple pair predictions into one coherent scene.
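As a rough guide to how much work global alignment has to reconcile, the sketch below counts image pairs for a fully connected scene graph. The function name count_pairs and the assumption that every image is paired with every other image (optionally in both orders) are illustrative; the actual pairing strategy depends on how the tool is configured.

```python
from itertools import combinations


def count_pairs(num_images: int, symmetrize: bool = True) -> int:
    """Count image pairs in a complete scene graph (hypothetical helper).

    Assumption: every image is paired with every other image, and
    symmetrize=True also counts the reversed (j, i) ordering of each pair.
    """
    if num_images < 2:
        raise ValueError("need at least two images to form a pair")
    unordered = sum(1 for _ in combinations(range(num_images), 2))
    return unordered * 2 if symmetrize else unordered


# Pair count grows quadratically with the number of images.
for n in (2, 4, 8, 16):
    print(n, "images ->", count_pairs(n), "ordered pairs")
```

Under this assumption the pair count grows quadratically, which is why larger image sets take noticeably longer to align.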

Output Explanation

pointmaps

Dense 3D points predicted for each image in a shared or alignable coordinate frame.

confidence

Per-pixel confidence values that help filter unreliable geometry.

camera_poses

Estimated camera relationships recovered during pair inference or global alignment.
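The way confidence values filter geometry can be sketched with NumPy. This is a minimal illustration on synthetic arrays, not real DUSt3R output: the helper filter_points, the array shapes (H, W, 3) and (H, W), and the threshold of 3.0 are all assumptions chosen for the example.

```python
import numpy as np


def filter_points(pointmap: np.ndarray, confidence: np.ndarray,
                  min_conf: float = 3.0) -> np.ndarray:
    """Return an (N, 3) array of points whose confidence exceeds min_conf.

    Assumes pointmap has shape (H, W, 3) and confidence has shape (H, W).
    """
    if pointmap.shape[:2] != confidence.shape or pointmap.shape[-1] != 3:
        raise ValueError("expected pointmap (H, W, 3) and confidence (H, W)")
    mask = confidence > min_conf   # boolean (H, W) mask of reliable pixels
    return pointmap[mask]          # boolean indexing flattens to (N, 3)


# Synthetic stand-ins for one image's outputs (not real predictions).
rng = np.random.default_rng(0)
pointmap = rng.normal(size=(4, 5, 3))
confidence = np.full((4, 5), 1.0)
confidence[:2, :] = 5.0            # pretend the top two rows are reliable

kept = filter_points(pointmap, confidence, min_conf=3.0)
print(kept.shape)  # (10, 3)
```

Raising the threshold keeps fewer but more reliable points; the right value depends on the scene and the checkpoint.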

How To Use

Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.

Resources

GitHub: https://github.com/naver/dust3r
Demo: https://huggingface.co/spaces/naver/DUSt3R
Code Download: https://github.com/naver/dust3r/archive/refs/heads/main.zip
Project Page: https://dust3r.europe.naverlabs.com/
arXiv: https://arxiv.org/abs/2312.14132
DUSt3R Checkpoints: https://github.com/naver/dust3r#checkpoints

Deployment Notes

Clone the official DUSt3R repository and install the PyTorch/Gradio dependencies.
Download the official 512px pretrained checkpoint or let the demo resolve the model name.
Run demo.py for the local UI or call the inference and global alignment modules programmatically.
Export point clouds, confidence maps, and visualizations under tools/dust3r/runs/.
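The programmatic route in the steps above can be sketched as follows. This follows the usage pattern shown in the official repository README as best recalled here, so treat the import paths, the Hugging Face model identifier, and the optimizer settings (niter, schedule, lr) as assumptions to verify against your checkout. The wrapper names check_inputs and reconstruct are hypothetical.

```python
from typing import Sequence


def check_inputs(image_paths: Sequence[str]) -> list[str]:
    """Reject inputs that cannot form at least one image pair."""
    paths = list(image_paths)
    if len(paths) < 2:
        raise ValueError("DUSt3R needs at least two images")
    return paths


def reconstruct(image_paths: Sequence[str],
                model_name: str = "naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt",
                device: str = "cuda",
                niter: int = 300) -> dict:
    """Run pair inference, then global alignment.

    Requires the dust3r repository on PYTHONPATH; the calls below are
    assumed from the README and may differ between versions.
    """
    paths = check_inputs(image_paths)
    # Imported lazily so this file can be loaded without dust3r installed.
    from dust3r.inference import inference
    from dust3r.model import AsymmetricCroCo3DStereo
    from dust3r.utils.image import load_images
    from dust3r.image_pairs import make_pairs
    from dust3r.cloud_opt import global_aligner, GlobalAlignerMode

    model = AsymmetricCroCo3DStereo.from_pretrained(model_name).to(device)
    images = load_images(paths, size=512)
    pairs = make_pairs(images, scene_graph="complete",
                       prefilter=None, symmetrize=True)
    output = inference(pairs, model, device, batch_size=1)

    scene = global_aligner(output, device=device,
                           mode=GlobalAlignerMode.PointCloudOptimizer)
    scene.compute_global_alignment(init="mst", niter=niter,
                                   schedule="cosine", lr=0.01)
    return {
        "pointmaps": scene.get_pts3d(),
        "confidence_masks": scene.get_masks(),
        "camera_poses": scene.get_im_poses(),
        "focals": scene.get_focals(),
    }
```

A call such as reconstruct(["a.jpg", "b.jpg"]) would then return the aligned outputs, which can be written under tools/dust3r/runs/ in whatever format the downstream consumer expects.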

Relative Path Example

# Relative-path local entry for the DUSt3R tool folder
python tools/dust3r/demo.py --model_name DUSt3R_ViTLarge_BaseDecoder_512_dpt --local_network

# Local checkpoint example:
python tools/dust3r/demo.py --weights tools/dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth --image_size 512

# Programmatic entry points:
# tools/dust3r/dust3r/inference.py
# tools/dust3r/dust3r/model.py
# tools/dust3r/dust3r/cloud_opt/
# tools/dust3r/visloc.py

# This page documents the path. The static page does not execute DUSt3R.

Expected Result Shape

{
  "tool": "dust3r",
  "status": "ok",
  "scene_state": [
    {
      "label": "Geometric 3D reconstruction",
      "score": 0.87,
      "output": "3D pointmaps, camera poses, confidence maps"
    }
  ],
  "timing": {
    "runtime": "All main results use the same 512px model; multi-view global alignment scales with image count and pair count. The supplement reports training on about 8.5M extracted image pairs.",
    "device": "documented in source benchmark when available"
  },
  "artifacts": {
    "visualization": "tools/dust3r/runs/visualization.png",
    "raw_predictions": "tools/dust3r/runs/predictions.json"
  }
}
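A consumer of this result shape might check the top-level keys and pick out the highest-scoring scene state before reading any artifacts. The sketch below uses only the standard json module; the helper summarize_result and the set of required keys are assumptions derived from the example above, not a published schema.

```python
import json

# Top-level keys taken from the example result shape (assumed, not a schema).
REQUIRED_KEYS = {"tool", "status", "scene_state", "timing", "artifacts"}


def summarize_result(raw: str) -> dict:
    """Parse a result document and return a compact summary."""
    result = json.loads(raw)
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    if result["status"] != "ok":
        raise RuntimeError(f"tool reported status {result['status']!r}")
    # Pick the scene state with the highest score.
    best = max(result["scene_state"], key=lambda state: state["score"])
    return {
        "tool": result["tool"],
        "label": best["label"],
        "score": best["score"],
        "artifacts": sorted(result["artifacts"]),
    }


example = json.dumps({
    "tool": "dust3r",
    "status": "ok",
    "scene_state": [{
        "label": "Geometric 3D reconstruction",
        "score": 0.87,
        "output": "3D pointmaps, camera poses, confidence maps",
    }],
    "timing": {"runtime": "n/a", "device": "n/a"},
    "artifacts": {
        "visualization": "tools/dust3r/runs/visualization.png",
        "raw_predictions": "tools/dust3r/runs/predictions.json",
    },
})

print(summarize_result(example))
```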

Academic Info

Paper identity and contribution summary.

Title: DUSt3R: Geometric 3D Vision Made Easy

Authors: Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, Jerome Revaud

Venue: CVPR 2024 / arXiv:2312.14132

Contribution: Predicts dense 3D pointmaps and confidence directly from images, then supports global alignment for easier stereo, multi-view reconstruction, and visual localization workflows.

Citation

@misc{dust3r2024,
  title={DUSt3R: Geometric 3D Vision Made Easy},
  author={Shuzhe Wang and Vincent Leroy and Yohann Cabon and Boris Chidlovskii and Jerome Revaud},
  year={2024},
  note={CVPR 2024 / arXiv:2312.14132},
  url={https://arxiv.org/abs/2312.14132}
}

Benchmark

Only compact, source-reported numbers are shown here.

Dataset	Metric	Value	Setting	Source
CO3Dv2	RRA@15 / RTA@15 / mAA@30	96.2 / 86.8 / 76.7 with global alignment	512px model	CVPR 2024 paper
DTU zero-shot MVS	Accuracy / completeness / overall	2.677 mm / 0.805 mm / 1.741 mm	Multi-view global alignment	DUSt3R paper
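The DTU row can be sanity-checked with a line of arithmetic. On DTU, the overall score is conventionally the mean of accuracy and completeness, and the reported numbers are consistent with that convention. The helper name dtu_overall is hypothetical.

```python
def dtu_overall(accuracy_mm: float, completeness_mm: float) -> float:
    """Mean of accuracy and completeness, both in millimetres (lower is better)."""
    return (accuracy_mm + completeness_mm) / 2


# Values copied from the DTU row of the table above.
print(round(dtu_overall(2.677, 0.805), 3))  # 1.741
```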

Artifacts

DUSt3R paper, MVS/pose/depth tables, Gradio demo, pretrained checkpoints, pointmaps, confidence maps, and global alignment outputs.
