Perception and Grounding

MiDaS

Cross-dataset depth estimation model for relative depth prediction from single images.

Tool Introduction

Core parameters, trigger timing, and visual before/after demo references.

Short Explanation

MiDaS predicts relative depth maps for single RGB images across diverse scenes.

Input: RGB image
Output: Depth map
Trigger Timing: Triggered on demand after the required input files and configuration are prepared.
Runtime: Python / PyTorch
Before: RGB image

Prepare the RGB image, or a folder of images, together with the model_type configuration that MiDaS will read.

After: Depth map

Read the produced depth map and, where one is written, the accompanying visualization image.

Preset Example

A quick-run style example for the documentation page.

Input: tools/midas/examples/input.jpg
Prompt: model_type: dpt_beit_large_512
Expected: A depth map file exported to the output directory.

Parameters And Output

Readable controls and the meaning of each returned artifact.

Parameter Explanation

input_path (path)

Input image file or folder.

output_path (path)

Destination for predicted depth outputs.

model_type (select): dpt_beit_large_512

Checkpoint variant controlling quality and speed.

Output Explanation

depth_map

Single-channel relative depth prediction.

sidecar_visualization

Optional visualization image for quick inspection.
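
Because depth_map holds relative values with no fixed range, a quick-look image usually needs rescaling first. The following is a minimal sketch of min-max normalization to 8 bits, assuming the prediction is available as a NumPy array; the function name depth_to_uint8 is illustrative and is not part of the MiDaS repository.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a relative depth map to the 0..255 range."""
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    if hi - lo < 1e-12:
        # A constant map has no contrast to display; return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)


# Synthetic 2x3 array standing in for a real prediction.
demo = np.array([[0.0, 5.0, 10.0], [2.5, 7.5, 10.0]])
vis = depth_to_uint8(demo)
print(vis.shape, vis.dtype, vis.min(), vis.max())
```

The normalization is per image, so brightness in two different visualizations is not comparable.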

How To Use

Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.

Deployment Notes

  1. Install MiDaS requirements and download selected checkpoints.
  2. Choose model_type according to GPU memory and target speed.
  3. Run inference with repository-relative input/output paths.
  4. Archive output depth maps in tools/midas/runs/.
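
Step 2 can be sketched as a small lookup over the throughput figures quoted on this page (README numbers for an RTX 3090). This is only a heuristic under stated assumptions: the identifier dpt_next_vit_large_384 is assumed to be the model_type string for Next-ViT-L-384 and should be checked against the repository's model list, and preferring the larger model whenever it meets the speed target is a simplification, not a rule from the source.

```python
# Source-reported frames per second on an RTX 3090.
# "dpt_next_vit_large_384" is an assumed identifier; verify it in the repo.
REPORTED_FPS = {
    "dpt_beit_large_512": 5.7,
    "dpt_next_vit_large_384": 30.0,
}


def pick_model_type(min_fps: float) -> str:
    """Return the slowest listed model that still reaches min_fps."""
    candidates = [(fps, name) for name, fps in REPORTED_FPS.items() if fps >= min_fps]
    if not candidates:
        raise ValueError(f"no listed model reaches {min_fps} FPS")
    # min() over (fps, name) tuples picks the lowest qualifying throughput,
    # which here corresponds to the larger checkpoint.
    return min(candidates)[1]


print(pick_model_type(5.0))
print(pick_model_type(20.0))
```

Throughput on other GPUs will differ, so the table is best treated as a starting point and re-measured locally.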

Relative Path Example

python run.py --input_path tools/midas/examples --output_path tools/midas/runs --model_type dpt_beit_large_512
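
When the command is launched from a script rather than a shell, it can help to assemble the argument list explicitly. A minimal sketch, assuming run.py sits in the working directory and accepts the three flags shown above; build_midas_command is a hypothetical helper, and the call that would actually start inference is left as a comment.

```python
import sys
from pathlib import Path


def build_midas_command(input_path, output_path, model_type="dpt_beit_large_512"):
    """Assemble the argv list for run.py without executing it."""
    return [
        sys.executable,
        "run.py",
        "--input_path", Path(input_path).as_posix(),
        "--output_path", Path(output_path).as_posix(),
        "--model_type", model_type,
    ]


cmd = build_midas_command("tools/midas/examples", "tools/midas/runs")
print(" ".join(cmd[1:]))

# To run it from the MiDaS checkout (not executed here):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems when paths contain spaces.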

Expected Result Shape

{
  "tool": "midas",
  "status": "ok",
  "depth_map": [
    {
      "label": "Monocular depth estimation",
      "score": 0.87,
      "output": "Depth map"
    }
  ],
  "timing": {
    "runtime": "The official README reports 5.7 FPS for BEiT-L-512 on RTX 3090 with 345M parameters, and 30 FPS for Next-ViT-L-384 with 72M parameters.",
    "device": "documented in source benchmark when available"
  },
  "artifacts": {
    "visualization": "tools/midas/runs/visualization.png",
    "raw_predictions": "tools/midas/runs/predictions.json"
  }
}
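
A consumer of this result shape might check the top-level keys before reading any artifact paths. The sketch below assumes the result arrives as a JSON string with the keys shown above; check_result and REQUIRED_KEYS are illustrative names, not part of MiDaS.

```python
import json

REQUIRED_KEYS = ("tool", "status", "depth_map", "timing", "artifacts")


def check_result(payload: str) -> dict:
    """Parse a result payload and verify the documented top-level keys."""
    result = json.loads(payload)
    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    if result["status"] != "ok":
        raise RuntimeError(f"tool reported status {result['status']!r}")
    return result


# Trimmed-down sample following the shape documented above.
sample = json.dumps({
    "tool": "midas",
    "status": "ok",
    "depth_map": [],
    "timing": {},
    "artifacts": {"visualization": "tools/midas/runs/visualization.png"},
})
checked = check_result(sample)
print(checked["artifacts"]["visualization"])
```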

Academic Info

Paper identity and contribution summary.

Title: Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
Authors: René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun
Venue: TPAMI 2022 / arXiv:1907.01341
Contribution: Shows robust relative depth estimation through mixed-dataset training and transfer-oriented objectives.
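
Relative depth is defined only up to an unknown scale and shift, so a prediction has to be aligned to a reference before the two can be compared numerically. The following is a minimal sketch of such a least-squares alignment, assuming both maps are NumPy arrays of the same shape; align_scale_shift is an illustrative name and this is not the repository's evaluation code.

```python
import numpy as np


def align_scale_shift(pred, target):
    """Find scale s and shift t minimizing ||s * pred + t - target||^2."""
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    # Design matrix with one column for the scale and one for the shift.
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return s, t


# Synthetic example: the target is an exact affine function of the prediction.
pred = np.array([[0.1, 0.4], [0.7, 1.0]])
target = 2.0 * pred + 0.5
s, t = align_scale_shift(pred, target)
aligned = s * pred + t
print(round(float(s), 6), round(float(t), 6))
```

In practice the fit would be restricted to pixels with valid reference values, which this sketch omits.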

Citation

@misc{midas2022,
  title={Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
  author={René Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
  year={2022},
  note={TPAMI 2022 / arXiv:1907.01341},
  url={https://arxiv.org/abs/1907.01341}
}

Benchmark

Only compact, source-reported numbers are shown here.

Model: MiDaS v3.1 BEiT-L-512
Dataset: 6-dataset zero-shot benchmark
Metric: DIW WHDR / ETH3D AbsRel / Sintel AbsRel / TUM / KITTI / NYUv2 zero-shot error
Value: 0.1137 / 0.0659 / 0.2366 / 6.13 / 11.56 / 1.86
Runtime: 345M params, 5.7 FPS on RTX 3090
Source: Official README

Model: MiDaS v3.1 Next-ViT-L-384
Dataset: 6-dataset zero-shot benchmark
Metric: DIW WHDR / ETH3D AbsRel / Sintel AbsRel / TUM / KITTI / NYUv2 zero-shot error
Value: 0.1031 / 0.0954 / 0.2295 / 9.21 / 6.89 / 3.47
Runtime: 72M params, 30 FPS on RTX 3090
Source: Official README
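
The trade-off between the two models can be read off with a little arithmetic on the figures above. The sketch below only restates the source-reported numbers and treats every metric as an error where lower is better; the variable names are illustrative.

```python
DATASETS = ["DIW", "ETH3D", "Sintel", "TUM", "KITTI", "NYUv2"]

# Source-reported zero-shot errors, in the same order as DATASETS.
ERRORS = {
    "BEiT-L-512": [0.1137, 0.0659, 0.2366, 6.13, 11.56, 1.86],
    "Next-ViT-L-384": [0.1031, 0.0954, 0.2295, 9.21, 6.89, 3.47],
}
FPS = {"BEiT-L-512": 5.7, "Next-ViT-L-384": 30.0}
PARAMS_M = {"BEiT-L-512": 345, "Next-ViT-L-384": 72}

# Model with the lower reported error on each dataset.
best = {
    name: min(ERRORS, key=lambda model: ERRORS[model][i])
    for i, name in enumerate(DATASETS)
}

speedup = FPS["Next-ViT-L-384"] / FPS["BEiT-L-512"]
param_ratio = PARAMS_M["BEiT-L-512"] / PARAMS_M["Next-ViT-L-384"]

for name in DATASETS:
    print(f"{name}: {best[name]}")
print(f"speedup: {speedup:.2f}x, parameter ratio: {param_ratio:.2f}x")
```

On these numbers each model has the lower error on three of the six datasets, while Next-ViT-L-384 runs roughly five times faster with about a fifth of the parameters.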

Artifacts

Official model table, checkpoints, accuracy-speed figure, and run.py inference entry.

Demo Images

Visual references from the original tool.