Perception and Grounding

MiDaS

Cross-dataset depth estimation model for relative depth prediction from single images.

Tool Introduction

Core parameters, trigger timing, and visual before/after demo references.

Short Explanation

MiDaS predicts relative depth maps for single RGB images across diverse scenes.

Input: RGB image

Output: Depth map

Trigger Timing: Triggered on demand after the required input files and configuration are prepared.

Runtime: Python / PyTorch

Before: RGB image

Prepare the RGB image, or a folder of images, and choose the model_type to run.

After: Depth map

Inspect the predicted depth map and, if one was produced, the optional visualization image.

Preset Example

A quick-run style example for the documentation page.

Input: tools/midas/examples/input.jpg

Prompt: model_type: dpt_beit_large_512

Expected: A depth map file exported to the output directory.

Parameters And Output

Readable controls and the meaning of each returned artifact.

Parameter Explanation

input_path (path)

Input image file or folder.

output_path (path)

Destination for predicted depth outputs.

model_type (select; default dpt_beit_large_512)

Checkpoint variant; selects the trade-off between prediction quality and inference speed.

Output Explanation

depth_map

Single-channel relative depth prediction.

sidecar_visualization

Optional visualization image for quick inspection.
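Because depth_map holds relative values rather than metric distances, it usually has to be rescaled before it can be viewed as an image. The sketch below shows one common way to do that, min-max scaling to 8-bit grayscale. The helper name depth_to_uint8 and the small demo array are assumptions made for illustration; they are not part of the MiDaS repository, and the project's own visualization may use a different mapping.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a single-channel relative depth map to the range 0..255."""
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    if hi - lo < np.finfo(np.float64).eps:
        # A constant map has no contrast to display; return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)


# Hypothetical 2x2 prediction standing in for a real depth map.
demo = np.array([[0.0, 5.0], [10.0, 2.5]])
vis = depth_to_uint8(demo)
print(vis.dtype, vis.min(), vis.max())
```

Min-max scaling is applied per image, so gray levels from two different images are not directly comparable.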

How To Use

Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.

Resources

GitHub: https://github.com/isl-org/MiDaS
Code Download: https://github.com/isl-org/MiDaS/archive/refs/heads/master.zip
arXiv: https://arxiv.org/abs/1907.01341
MiDaS Model List: https://github.com/isl-org/MiDaS#accuracy

Deployment Notes

Install MiDaS requirements and download selected checkpoints.
Choose model_type according to GPU memory and target speed.
Run inference with repository-relative input/output paths.
Archive output depth maps in tools/midas/runs/.

Relative Path Example

python run.py --input_path tools/midas/examples --output_path tools/midas/runs --model_type dpt_beit_large_512
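When the command above is launched from a script, it can help to assemble the argument list in one place and reject absolute paths early. The following is a minimal sketch under that assumption; build_midas_command is a hypothetical helper, not something shipped with MiDaS, and it only builds the list without running anything.

```python
from pathlib import PurePosixPath


def build_midas_command(input_path: str, output_path: str, model_type: str) -> list[str]:
    """Return the run.py invocation as an argument list, using relative paths only."""
    for path in (input_path, output_path):
        if PurePosixPath(path).is_absolute():
            raise ValueError(f"expected a repository-relative path, got {path!r}")
    return [
        "python", "run.py",
        "--input_path", input_path,
        "--output_path", output_path,
        "--model_type", model_type,
    ]


cmd = build_midas_command(
    "tools/midas/examples", "tools/midas/runs", "dpt_beit_large_512"
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the directory that contains run.py.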

Expected Result Shape

{
  "tool": "midas",
  "status": "ok",
  "depth_map": [
    {
      "label": "Monocular depth estimation",
      "score": 0.87,
      "output": "Depth map"
    }
  ],
  "timing": {
    "runtime": "The official README reports 5.7 FPS for BEiT-L-512 on RTX 3090 with 345M parameters, and 30 FPS for Next-ViT-L-384 with 72M parameters.",
    "device": "documented in source benchmark when available"
  },
  "artifacts": {
    "visualization": "tools/midas/runs/visualization.png",
    "raw_predictions": "tools/midas/runs/predictions.json"
  }
}
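A consumer of this result will typically check the status and pull out the artifact paths before doing anything else. The sketch below does that with the standard json module on an abbreviated copy of the structure above. The function summarize_result and the set of keys treated as required are assumptions for illustration, not a documented schema.

```python
import json

# Abbreviated copy of the result shape shown above.
RESULT_JSON = """
{
  "tool": "midas",
  "status": "ok",
  "depth_map": [
    {"label": "Monocular depth estimation", "score": 0.87, "output": "Depth map"}
  ],
  "artifacts": {
    "visualization": "tools/midas/runs/visualization.png",
    "raw_predictions": "tools/midas/runs/predictions.json"
  }
}
"""


def summarize_result(payload: str) -> dict:
    """Parse a result document and return the fields a caller usually needs."""
    result = json.loads(payload)
    # Assumed required keys; adjust to whatever the real producer guarantees.
    missing = {"tool", "status", "depth_map", "artifacts"} - result.keys()
    if missing:
        raise KeyError(f"result is missing keys: {sorted(missing)}")
    if result["status"] != "ok":
        raise RuntimeError(f"tool reported status {result['status']!r}")
    return {
        "tool": result["tool"],
        "num_predictions": len(result["depth_map"]),
        "visualization": result["artifacts"]["visualization"],
    }


summary = summarize_result(RESULT_JSON)
print(summary)
```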

Paper figure

Academic Info

Paper identity and contribution summary.

Title: Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

Authors: René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun

Venue: TPAMI 2022 / arXiv:1907.01341

Contribution: Shows robust relative depth estimation through mixed-dataset training and transfer-oriented objectives.

Citation

@article{midas2022,
  title={Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
  author={René Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  volume={44},
  number={3},
  year={2022},
  note={arXiv:1907.01341},
  url={https://arxiv.org/abs/1907.01341}
}

Benchmark

Only compact, source-reported numbers are shown here.

6-dataset zero-shot benchmark, reported as error (lower is better). Runtime measured on RTX 3090.

Model	Params	FPS	DIW WHDR	ETH3D AbsRel	Sintel AbsRel	TUM δ1	KITTI δ1	NYUv2 δ1	Source
MiDaS v3.1 BEiT-L-512	345M	5.7	0.1137	0.0659	0.2366	6.13	11.56	1.86	Official README
MiDaS v3.1 Next-ViT-L-384	72M	30	0.1031	0.0954	0.2295	9.21	6.89	3.47	Official README
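The two rows can be read as a speed versus accuracy trade-off. The short sketch below works through the arithmetic using only the figures quoted above; the dictionary layout and key names are assumptions chosen for this example, and only the three metrics named explicitly in the table (DIW WHDR, ETH3D AbsRel, Sintel AbsRel) are compared.

```python
# Figures copied from the benchmark rows above (source: official README).
MODELS = {
    "BEiT-L-512": {
        "params_m": 345, "fps": 5.7,
        "diw_whdr": 0.1137, "eth3d_absrel": 0.0659, "sintel_absrel": 0.2366,
    },
    "Next-ViT-L-384": {
        "params_m": 72, "fps": 30,
        "diw_whdr": 0.1031, "eth3d_absrel": 0.0954, "sintel_absrel": 0.2295,
    },
}

beit = MODELS["BEiT-L-512"]
nextvit = MODELS["Next-ViT-L-384"]

# How much faster and how much smaller the Next-ViT variant is.
speedup = nextvit["fps"] / beit["fps"]
param_ratio = beit["params_m"] / nextvit["params_m"]

# These metrics are errors, so the lower value wins.
winners = {
    metric: min(MODELS, key=lambda name: MODELS[name][metric])
    for metric in ("diw_whdr", "eth3d_absrel", "sintel_absrel")
}

print(f"speedup: {speedup:.2f}x, parameter ratio: {param_ratio:.2f}x")
print(winners)
```

On these numbers Next-ViT-L-384 runs roughly 5.3 times faster with about 4.8 times fewer parameters, while BEiT-L-512 keeps the lower ETH3D AbsRel.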

Artifacts

Official model table, checkpoints, accuracy-speed figure, and run.py inference entry.

Demo Images

Visual references from the original tool.