Cognition and State Modeling

DUSt3R

DUSt3R is a geometric 3D vision tool that reconstructs pointmaps, camera relationships, and aligned 3D structure from image pairs or multi-view image collections.

Tool Introduction

Core parameters, trigger timing, and visual before/after demo references.

Short Explanation

Upload two or more images and DUSt3R predicts dense 3D pointmaps, confidence maps, and camera relationships without a traditional structure-from-motion (SfM) preprocessing pipeline.

Input: Image pair / image set
Output: 3D pointmaps, camera poses, confidence maps
Trigger timing: Triggered on demand from the source demo or a local example command.
Runtime: Python / PyTorch / Gradio demo
Before: Image pair / image set

Prepare two or more images of the same scene, as expected by the original project.

After: 3D pointmaps, camera poses, confidence maps

Read the produced pointmaps, confidence maps, camera pose estimates, and 3D visualization.

Preset Example

A quick-run style example for the documentation page.

Input: tools/dust3r/examples/images/
Prompt: Use the 512px ViT-Large checkpoint and global alignment
Expected: Aligned 3D point cloud, pair confidence maps, and camera pose estimates.

Parameters And Output

Readable controls and the meaning of each returned artifact.

Parameter Explanation

images (file)

Image pair or image collection used for reconstruction.

model_name (select): DUSt3R_ViTLarge_BaseDecoder_512_dpt

Pretrained checkpoint used for pointmap and confidence prediction.

image_size (select): 512

Input resolution used by the pretrained model.
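As a rough illustration of what image_size means for an arbitrary photo, the sketch below estimates the network input resolution for the 512 setting. It assumes the longer side is scaled to image_size and each side is then cropped down to a multiple of a 16-pixel ViT patch. The helper name `assumed_input_size` is hypothetical, and the exact resizing and cropping rules of the DUSt3R loader may differ (for example, for square images).

```python
# Hypothetical helper: estimates the network input size for one image.
# ASSUMPTIONS: the long side is scaled to image_size, then each side is
# cropped down to a multiple of a 16-pixel ViT patch. Special cases such as
# square images are ignored; this is not taken from the DUSt3R code base.


def assumed_input_size(width: int, height: int,
                       image_size: int = 512, patch: int = 16) -> tuple[int, int]:
    scale = image_size / max(width, height)
    resized_w = round(width * scale)
    resized_h = round(height * scale)
    # Cropping to a multiple of the patch size keeps only whole patches.
    return (resized_w // patch) * patch, (resized_h // patch) * patch


if __name__ == "__main__":
    for w, h in [(1920, 1080), (640, 480), (1000, 700)]:
        print((w, h), "->", assumed_input_size(w, h))
```

Under these assumptions, a 1000x700 photo is resized to 512x358 and then cropped to 512x352.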

global_alignment (toggle): true

Optimizes multiple pair predictions into one coherent scene.
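The global alignment step can be written as a confidence-weighted fitting problem. The formulation below follows our reading of the DUSt3R paper, with notation adapted: each edge e of the pair graph E is one image pair, X_i^{v,e} is the predicted 3D point of pixel i of view v in that pair's frame, C_i^{v,e} its confidence, P_e and sigma_e a per-pair rigid transform and scale, and chi^v the pointmap of view v in the shared world frame. Constraining the product of the scales rules out the trivial solution in which every pointmap shrinks to zero.

```latex
\chi^{*} = \arg\min_{\chi,\,P,\,\sigma}
  \sum_{e \in \mathcal{E}} \sum_{v \in e} \sum_{i=1}^{HW}
  C_i^{v,e} \left\lVert \chi_i^{v} - \sigma_e P_e X_i^{v,e} \right\rVert,
\qquad \text{subject to } \prod_{e \in \mathcal{E}} \sigma_e = 1
```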

Output Explanation

pointmaps

Dense 3D points predicted for each image in a shared or alignable coordinate frame.
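A pointmap stores one 3D point per pixel, so it has shape H x W x 3. The sketch below makes that layout concrete by back-projecting a depth map through assumed pinhole intrinsics (fx, fy, cx, cy). DUSt3R itself predicts pointmaps directly and needs no intrinsics, so both helpers are hypothetical illustrations. Reading depth back as the z coordinate only holds for points expressed in the image's own camera frame, such as the first view of a pair.

```python
# Hypothetical helpers illustrating the H x W x 3 pointmap layout.
# ASSUMPTION: a simple pinhole camera with intrinsics fx, fy, cx, cy.

Point = tuple[float, float, float]


def depth_to_pointmap(depth: list[list[float]], fx: float, fy: float,
                      cx: float, cy: float) -> list[list[Point]]:
    """Back-project every pixel (u, v) with depth z into camera coordinates."""
    pointmap = []
    for v, row in enumerate(depth):        # v: pixel row index
        out_row = []
        for u, z in enumerate(row):        # u: pixel column index
            out_row.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
        pointmap.append(out_row)
    return pointmap


def pointmap_to_depth(pointmap: list[list[Point]]) -> list[list[float]]:
    """In the image's own camera frame, depth is the z coordinate of each point."""
    return [[p[2] for p in row] for row in pointmap]


if __name__ == "__main__":
    depth = [[2.0, 2.0], [4.0, 4.0]]
    pm = depth_to_pointmap(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
    print(pm[1][1])                        # (2.0, 2.0, 4.0)
    print(pointmap_to_depth(pm) == depth)  # True
```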

confidence

Per-pixel confidence values that help filter unreliable geometry.
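Confidence filtering is a per-pixel mask: a point is kept only where its confidence passes a threshold. The minimal sketch below assumes the pointmap and confidence map are aligned H x W grids; the helper name and the threshold value are hypothetical examples to tune, not documented defaults.

```python
# Hypothetical helper: keep only points whose per-pixel confidence exceeds
# a threshold. The threshold used below is an assumed example value.

Point = tuple[float, float, float]


def filter_by_confidence(pointmap: list[list[Point]],
                         confidence: list[list[float]],
                         threshold: float) -> list[Point]:
    kept = []
    # Walk the two H x W grids in lockstep, pixel by pixel.
    for pts_row, conf_row in zip(pointmap, confidence):
        for point, conf in zip(pts_row, conf_row):
            if conf > threshold:
                kept.append(point)
    return kept


if __name__ == "__main__":
    pointmap = [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
                [(0.0, 1.0, 1.0), (1.0, 1.0, 9.0)]]
    confidence = [[4.2, 1.1], [3.5, 1.0]]
    print(filter_by_confidence(pointmap, confidence, threshold=3.0))
```

Raising the threshold trades coverage for reliability; the low-confidence point at z = 9.0 in the toy input is dropped.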

camera_poses

Estimated camera relationships recovered during pair inference or global alignment.
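Camera relationships are commonly stored as 4x4 rigid transforms. Assuming each pose is a camera-to-world matrix (verify the convention of the actual output before relying on this), the relative pose between two cameras follows from one rigid inverse and one matrix product. The helpers are hypothetical and use plain lists to stay dependency-free.

```python
# Hypothetical helpers for 4x4 rigid camera poses stored as nested lists.
# ASSUMPTION: each pose maps camera coordinates to world coordinates.

Matrix4 = list[list[float]]


def matmul4(a: Matrix4, b: Matrix4) -> Matrix4:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(pose: Matrix4) -> Matrix4:
    """The inverse of [R | t] is [R^T | -R^T t] when R is a rotation."""
    r_t = [[pose[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(r_t[i][k] * pose[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def relative_pose(cam_a_to_world: Matrix4, cam_b_to_world: Matrix4) -> Matrix4:
    """Transform mapping points from camera b's frame into camera a's frame."""
    return matmul4(invert_rigid(cam_a_to_world), cam_b_to_world)


def transform_point(pose: Matrix4,
                    p: tuple[float, float, float]) -> tuple[float, ...]:
    x, y, z = p
    return tuple(pose[i][0] * x + pose[i][1] * y + pose[i][2] * z + pose[i][3]
                 for i in range(3))


if __name__ == "__main__":
    cam_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 2.0], [0.0, 0.0, 0.0, 1.0]]
    cam_b = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 2.0], [0.0, 0.0, 0.0, 1.0]]
    rel = relative_pose(cam_a, cam_b)
    # Camera b's origin, seen from camera a, lies one unit along x.
    print(transform_point(rel, (0.0, 0.0, 0.0)))
```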

How To Use

Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.

Deployment Notes

  1. Clone the official DUSt3R repository and install the PyTorch/Gradio dependencies.
  2. Download the official 512px pretrained checkpoint or let the demo resolve the model name.
  3. Run demo.py for the local UI or call the inference and global alignment modules programmatically.
  4. Export point clouds, confidence maps, and visualizations under tools/dust3r/runs/.

Relative Path Example

# Relative-path local entry for the DUSt3R tool folder
python tools/dust3r/demo.py \
    --model_name DUSt3R_ViTLarge_BaseDecoder_512_dpt \
    --local_network

# Local checkpoint example:
python tools/dust3r/demo.py \
    --weights tools/dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth \
    --image_size 512

# Programmatic entry points:
# tools/dust3r/dust3r/inference.py
# tools/dust3r/dust3r/model.py
# tools/dust3r/dust3r/cloud_opt/
# tools/dust3r/visloc.py

# These paths are for reference only; this static page does not run DUSt3R.
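When the demo is launched from another script, it can help to build the command as an argument list rather than a shell string. The sketch below rebuilds the two example commands using only the flags that appear in them (--model_name, --weights, --image_size, --local_network). The helper `build_demo_command` is hypothetical, and requiring exactly one of model_name or weights is a design choice of this sketch, not documented demo behavior.

```python
# Hypothetical helper that rebuilds the example demo commands as argv lists.
# It only uses flags that appear in the example commands on this page.
import shlex
from typing import Optional


def build_demo_command(model_name: Optional[str] = None,
                       weights: Optional[str] = None,
                       image_size: Optional[int] = None,
                       local_network: bool = False,
                       script: str = "tools/dust3r/demo.py") -> list[str]:
    # Design choice of this sketch: select the checkpoint one way or the other.
    if (model_name is None) == (weights is None):
        raise ValueError("pass exactly one of model_name or weights")
    cmd = ["python", script]
    if model_name is not None:
        cmd += ["--model_name", model_name]
    else:
        cmd += ["--weights", weights]
    if image_size is not None:
        cmd += ["--image_size", str(image_size)]
    if local_network:
        cmd.append("--local_network")
    return cmd


if __name__ == "__main__":
    cmd = build_demo_command(
        weights="tools/dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth",
        image_size=512)
    print(shlex.join(cmd))
```

To actually launch the demo, the list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with paths.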

Expected Result Shape

{
  "tool": "dust3r",
  "status": "ok",
  "scene_state": [
    {
      "label": "Geometric 3D reconstruction",
      "score": 0.87,
      "output": "3D pointmaps, camera poses, confidence maps"
    }
  ],
  "timing": {
    "runtime": "All main results use the same 512px model; multi-view global alignment scales with image count and pair count. The supplement reports training on about 8.5M extracted image pairs.",
    "device": "documented in source benchmark when available"
  },
  "artifacts": {
    "visualization": "tools/dust3r/runs/visualization.png",
    "raw_predictions": "tools/dust3r/runs/predictions.json"
  }
}
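The runtime note above says multi-view global alignment scales with image count and pair count. The sketch below shows why the pair graph matters: a complete graph grows quadratically with the number of images, while a sliding window grows linearly. It assumes one forward pass per ordered pair and a symmetrized graph in which each unordered pair is also run in reverse order; the sliding-window graph is a simplified version written here for illustration, not taken from the DUSt3R code.

```python
# Illustration of how the number of image pairs can grow with image count.
# ASSUMPTIONS: one forward pass per ordered pair; "symmetrize" runs every
# unordered pair in both orders. window_pairs is a simplified illustration.
from itertools import combinations


def complete_pairs(n: int, symmetrize: bool = True) -> list[tuple[int, int]]:
    pairs = list(combinations(range(n), 2))   # n * (n - 1) / 2 unordered pairs
    if symmetrize:
        pairs += [(j, i) for i, j in pairs]
    return pairs


def window_pairs(n: int, window: int,
                 symmetrize: bool = True) -> list[tuple[int, int]]:
    # Pair each image only with the next `window` images (no wrap-around).
    pairs = [(i, j) for i in range(n) for j in range(i + 1, min(i + window + 1, n))]
    if symmetrize:
        pairs += [(j, i) for i, j in pairs]
    return pairs


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, len(complete_pairs(n)), len(window_pairs(n, window=3)))
```

With 50 images, the complete symmetrized graph has 2,450 ordered pairs, while the three-image window has 288.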

Academic Info

Paper identity and contribution summary.

Title: DUSt3R: Geometric 3D Vision Made Easy
Authors: Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, Jerome Revaud
Venue: CVPR 2024 / arXiv:2312.14132
Contribution: Predicts dense 3D pointmaps and confidence directly from images, then supports global alignment for easier stereo, multi-view reconstruction, and visual localization workflows.

Citation

@misc{dust3r2024,
  title={DUSt3R: Geometric 3D Vision Made Easy},
  author={Shuzhe Wang and Vincent Leroy and Yohann Cabon and Boris Chidlovskii and Jerome Revaud},
  year={2024},
  note={CVPR 2024 / arXiv:2312.14132},
  url={https://arxiv.org/abs/2312.14132}
}

Benchmark

Only compact, source-reported numbers are shown here.

Dataset | Metric | Value | Setting | Source
CO3Dv2 | RRA@15 / RTA@15 / mAA@30 (%, higher is better) | 96.2 / 86.8 / 76.7 | 512px model, with global alignment | DUSt3R paper (CVPR 2024)
DTU zero-shot MVS | Accuracy / completeness / overall (lower is better) | 2.677 mm / 0.805 mm / 1.741 mm | Multi-view global alignment | DUSt3R paper (CVPR 2024)
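The benchmark metrics can be sketched as follows. RRA@15 and RTA@15 are read here as the percentage of image pairs whose relative rotation error, or relative translation direction error, is below 15 degrees, and mAA@30 as the mean accuracy over thresholds from 1 to 30 degrees using the larger of the two errors. For DTU, accuracy and completeness are mean nearest-neighbor distances in each direction, and the overall score is their mean, which matches the reported values: (2.677 + 0.805) / 2 = 1.741. These are simplified readings, not the official evaluation code; the official DTU protocol also applies masking and outlier handling, and the exact mAA binning may differ.

```python
# Simplified metric sketches (not the official evaluation code).
import math


def pose_metrics(rot_err_deg: list[float], trans_err_deg: list[float],
                 tau: float = 15.0,
                 max_threshold: int = 30) -> tuple[float, float, float]:
    """Return RRA@tau, RTA@tau and mAA@max_threshold, all in percent."""
    n = len(rot_err_deg)
    rra = 100.0 * sum(e < tau for e in rot_err_deg) / n
    rta = 100.0 * sum(e < tau for e in trans_err_deg) / n
    # mAA: mean over integer thresholds 1..max_threshold of the share of pairs
    # whose larger angular error is below that threshold.
    worst = [max(r, t) for r, t in zip(rot_err_deg, trans_err_deg)]
    accuracies = [sum(w < th for w in worst) / n
                  for th in range(1, max_threshold + 1)]
    maa = 100.0 * sum(accuracies) / max_threshold
    return rra, rta, maa


def chamfer_metrics(reconstruction: list[tuple[float, float, float]],
                    ground_truth: list[tuple[float, float, float]]
                    ) -> tuple[float, float, float]:
    """Return (accuracy, completeness, overall) as mean nearest-neighbor distances."""
    def mean_nn(src, dst):
        # Brute-force nearest neighbor; fine for small toy point sets.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    accuracy = mean_nn(reconstruction, ground_truth)      # reconstruction -> GT
    completeness = mean_nn(ground_truth, reconstruction)  # GT -> reconstruction
    return accuracy, completeness, (accuracy + completeness) / 2


if __name__ == "__main__":
    print(pose_metrics([5.0, 20.0], [10.0, 40.0]))
    print(chamfer_metrics([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]))
```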

Artifacts

DUSt3R paper, MVS/pose/depth tables, Gradio demo, pretrained checkpoints, pointmaps, confidence maps, and global alignment outputs.

Demo Images

Visual references from the original tool.