Short Explanation
Given an image and text phrases, Grounding DINO returns grounded bounding boxes with confidence scores.
It is a text-conditioned, open-set object detector that grounds natural-language prompts to image regions.
Core parameters, trigger timing, and visual before/after demo references.
Prepare an RGB image and a text prompt that lists the target categories as dot-separated words or phrases (for example, "mug . cup . bottle").
Read the saved visualization along with the predicted boxes, matched phrases, and confidence scores.
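The prompt in the preparation step is plain text. As a minimal sketch, the helper below assumes the normalization we believe the official inference utilities apply before tokenization (lowercase, strip surrounding whitespace, ensure a trailing period). `build_prompt` is a hypothetical convenience function, not part of the Grounding DINO API.

```python
from typing import List


def preprocess_caption(caption: str) -> str:
    """Lowercase and strip the prompt, then make sure it ends with a period."""
    result = caption.lower().strip()
    if result.endswith("."):
        return result
    return result + "."


def build_prompt(categories: List[str]) -> str:
    """Hypothetical helper: join category words/phrases with ' . ' separators."""
    cleaned = [c.strip() for c in categories if c.strip()]  # drop blanks and padding
    return preprocess_caption(" . ".join(cleaned))


print(build_prompt(["Mug", "cup", " bottle "]))  # → mug . cup . bottle.
```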
A quick-run example.
Input controls and the meaning of each returned artifact.
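| Control | Type | Default | Description |
|---|---|---|---|
| image | file | | Input RGB image. |
| text_prompt | text | mug . cup . bottle | Dot-separated category words or phrases. |
| box_threshold | slider | 0.35 | Minimum confidence for keeping a predicted box. |
| text_threshold | slider | 0.25 | Minimum text similarity for a prompt word to be included in a box's matched phrase. |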
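The two thresholds act at different stages. This sketch assumes the filtering logic of the official demo: a query is kept when its highest prompt-token similarity exceeds `box_threshold`, its phrase is assembled from the tokens whose similarity exceeds `text_threshold`, and its score is that highest similarity. All inputs are hypothetical toy values, and the function name is illustrative; a real run works on sub-word tokens that the official code decodes with the text encoder's tokenizer rather than joining strings.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def filter_detections(
    token_scores: Sequence[Sequence[float]],  # per query: similarity to each prompt token
    boxes: Sequence[Box],                     # per query: predicted box
    tokens: Sequence[str],                    # prompt tokens aligned with score columns
    box_threshold: float = 0.35,
    text_threshold: float = 0.25,
) -> List[Tuple[Box, str, float]]:
    """Return (box, phrase, score) for queries that pass both thresholds."""
    detections: List[Tuple[Box, str, float]] = []
    for scores, box in zip(token_scores, boxes):
        best = max(scores)  # query confidence = highest token similarity
        if best <= box_threshold:  # box_threshold gate: drop weak queries
            continue
        # text_threshold gate: keep only prompt tokens this query matches strongly
        phrase = " ".join(t for t, s in zip(tokens, scores) if s > text_threshold)
        detections.append((box, phrase, best))
    return detections


# Toy example with hypothetical scores for the prompt "mug . cup . bottle".
tokens = ["mug", ".", "cup", ".", "bottle", "."]
token_scores = [
    [0.82, 0.05, 0.10, 0.02, 0.01, 0.03],  # strong "mug" match
    [0.20, 0.01, 0.15, 0.02, 0.30, 0.04],  # best score 0.30 is below 0.35, dropped
    [0.03, 0.02, 0.05, 0.01, 0.61, 0.02],  # strong "bottle" match
]
boxes = [(0.5, 0.5, 0.2, 0.3), (0.1, 0.1, 0.05, 0.05), (0.7, 0.4, 0.1, 0.4)]

for box, phrase, score in filter_detections(token_scores, boxes, tokens):
    print(phrase, round(score, 2), box)
```

Raising `box_threshold` removes more boxes, while lowering `text_threshold` lets more prompt tokens leak into each phrase.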
| Output | Description |
|---|---|
| boxes | Predicted region coordinates. |
| phrases | Matched text phrases for each box. |
| scores | Confidence values for grounded detections. |
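Boxes may need conversion before drawing. Assuming the raw model boxes follow the official inference utilities (normalized center-x, center-y, width, height in the range [0, 1]), this sketch converts one box to pixel corner coordinates. The function name is hypothetical, and no clamping to the image border is applied.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def cxcywh_norm_to_xyxy_px(box: Box, width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    # Scale normalized values to pixel units.
    cx, w = cx * width, w * width
    cy, h = cy * height, h * height
    # Move from center/size to top-left and bottom-right corners.
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


print(cxcywh_norm_to_xyxy_px((0.5, 0.5, 0.2, 0.3), width=640, height=480))
# → (256.0, 168.0, 384.0, 312.0)
```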
Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.
python demo/inference_on_a_image.py -c tools/grounding-dino/config/GroundingDINO_SwinT_OGC.py -p tools/grounding-dino/weights/groundingdino_swint_ogc.pth -i tools/grounding-dino/examples/input.jpg -t "mug . cup . bottle" -o tools/grounding-dino/runs
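To drive the same quick run from Python, the command can be assembled as an argument list. This sketch reuses the paths above; `build_demo_command` is a hypothetical helper, and the `--box_threshold` and `--text_threshold` flag names are assumed from the official demo script's argument parser (confirm with `python demo/inference_on_a_image.py --help` before relying on them).

```python
import shlex
from typing import List


def build_demo_command(
    image: str,
    text_prompt: str,
    box_threshold: float = 0.35,
    text_threshold: float = 0.25,
    root: str = "tools/grounding-dino",  # hypothetical install location used on this page
) -> List[str]:
    """Hypothetical helper: build argv for the demo script with explicit thresholds."""
    return [
        "python", "demo/inference_on_a_image.py",
        "-c", f"{root}/config/GroundingDINO_SwinT_OGC.py",
        "-p", f"{root}/weights/groundingdino_swint_ogc.pth",
        "-i", image,
        "-t", text_prompt,
        "-o", f"{root}/runs",
        "--box_threshold", str(box_threshold),    # assumed flag name
        "--text_threshold", str(text_threshold),  # assumed flag name
    ]


cmd = build_demo_command("tools/grounding-dino/examples/input.jpg", "mug . cup . bottle")
print(shlex.join(cmd))  # shell-quoted preview; subprocess.run(cmd, check=True) would launch it
```

Passing an argument list instead of a single shell string keeps the dot-separated prompt intact without manual quoting.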
{
"tool": "grounding-dino",
"status": "ok",
"results": [
{
"label": "Open-set object detection",
"score": 0.87,
"output": "Bounding boxes + labels + scores"
}
],
"timing": {
"runtime": "The repository exposes lighter Swin-T and stronger Swin-B checkpoints; the benchmark section highlights explicit AP numbers rather than a single official latency figure.",
"device": "documented in source benchmark when available"
},
"artifacts": {
"visualization": "tools/grounding-dino/runs/visualization.png",
"raw_predictions": "tools/grounding-dino/runs/predictions.json"
}
}
Paper identity and contribution summary.
@misc{groundingdino2023,
title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
author={Shilong Liu and Zhaoyang Zeng and Tianhe Ren and others},
year={2023},
note={arXiv:2303.05499},
url={https://arxiv.org/abs/2303.05499}
}
Only compact, source-reported numbers are shown here.
| Dataset | Metric | Value | Checkpoint | Source |
|---|---|---|---|---|
| COCO | box AP | GroundingDINO-T: 48.4 zero-shot / 57.2 fine-tuned; 48.5 expected from the official evaluation script | Swin-T | Official README and model table |
| COCO | box AP | GroundingDINO-B: 56.7 | Swin-B | Official model table |
Official config files, pretrained weights, benchmark table, and demo outputs.
Visual references from the original tool.