Short Explanation
Provide one scene image and a spatial instruction, and LINGO-Space returns the referred region as a probabilistic distribution plus a target point.
LINGO-Space incrementally grounds relational language into a probabilistic spatial distribution, so a robot can localize placement targets from instructions that use relations such as "between", "left of", or "near".
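The idea of incremental grounding can be illustrated with a toy example. The sketch below is not the paper's model: it assumes each relation can be approximated by an isotropic Gaussian over a small pixel grid, and that each new relation updates the current distribution by cell-wise multiplication and renormalization. All function names, object positions, and the `sigma` value are hypothetical.

```python
import math

def gaussian_map(width, height, center, sigma):
    """Unnormalized isotropic Gaussian score for every cell of a row-major grid."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]

def normalize(grid):
    """Scale the grid so that all cells sum to 1."""
    total = sum(sum(row) for row in grid)
    return [[v / total for v in row] for row in grid]

def combine(prior, likelihood):
    """One incremental update: multiply cell by cell, then renormalize."""
    product = [[p * l for p, l in zip(pr, lr)] for pr, lr in zip(prior, likelihood)]
    return normalize(product)

def argmax(grid):
    """Return the (x, y) cell with the highest probability."""
    best = max((v, x, y) for y, row in enumerate(grid) for x, v in enumerate(row))
    return best[1], best[2]

WIDTH, HEIGHT, SIGMA = 40, 20, 5.0

# Hypothetical object centers in image coordinates (x, y).
obj_a, obj_b, obj_c = (10, 10), (30, 10), (20, 4)

# Start from a uniform distribution over the grid.
belief = normalize([[1.0] * WIDTH for _ in range(HEIGHT)])

# Relation 1: "between A and B", approximated by a Gaussian at the midpoint.
midpoint = ((obj_a[0] + obj_b[0]) // 2, (obj_a[1] + obj_b[1]) // 2)
belief = combine(belief, gaussian_map(WIDTH, HEIGHT, midpoint, SIGMA))
point_after_between = argmax(belief)

# Relation 2: "near C", approximated by a Gaussian centered on C.
belief = combine(belief, gaussian_map(WIDTH, HEIGHT, obj_c, SIGMA))
point_after_near = argmax(belief)

print(point_after_between, point_after_near)
```

Each relation narrows the distribution further, and the target point is simply the most probable cell of the final map.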
Core parameters, trigger timing, and visual before/after demo references.
Prepare a single RGB tabletop image and a natural-language spatial instruction.
Read the produced result JSON, heatmap, and overlay, which report the predicted target point, detections, confidence summary, and scene graph.
A quick-run style example for the documentation page.
Readable controls and the meaning of each returned artifact.
| Parameter | Type | Default | Description |
|---|---|---|---|
| image | file | | Single RGB tabletop image used for spatial grounding. |
| instruction | text | | Natural-language spatial command describing the source object and its intended relation to references. |
| output_dir | path | tools/lingo_space/example | Directory where result JSON, heatmaps, overlays, and logs are written. |
| device | select | cpu | Inference device for the local wrapper. |
| Output | Description |
|---|---|
| predicted_point_pixel | Final target location in image coordinates. |
| detections | Detected source and reference objects with labels, boxes, and scores. |
| confidence_summary | Peak and mass statistics for the grounded spatial distribution. |
| scene_graph | Lightweight relational graph built from the detected objects during local inference. |
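To make the relationship between the heatmap, `predicted_point_pixel`, and `confidence_summary` more concrete, here is a minimal sketch of how peak and mass statistics could be derived from a grid of scores. The exact definitions used by the local wrapper are not documented on this page, so the function `summarize`, the keys `peak` and `mass_near_peak`, and the `radius` parameter are assumptions made for illustration only.

```python
def summarize(heatmap, radius=1):
    """Summarize a grid of non-negative scores (rows of equal length).

    Returns the most probable cell as [x, y], its probability, and the
    probability mass within `radius` cells of that peak.
    """
    total = sum(sum(row) for row in heatmap)
    probs = [[v / total for v in row] for row in heatmap]

    # Most probable cell; x is the column index, y is the row index.
    peak, px, py = max((v, x, y) for y, row in enumerate(probs)
                       for x, v in enumerate(row))

    # Probability mass inside a disc of the given radius around the peak.
    mass = sum(v for y, row in enumerate(probs) for x, v in enumerate(row)
               if (x - px) ** 2 + (y - py) ** 2 <= radius ** 2)

    return {
        "predicted_point_pixel": [px, py],
        "peak": peak,            # hypothetical key
        "mass_near_peak": mass,  # hypothetical key
    }

# Hypothetical 4x3 score grid with one dominant mode and one distant outlier.
example_heatmap = [
    [0, 1, 0, 0],
    [1, 4, 1, 0],
    [0, 1, 0, 2],
]
summary = summarize(example_heatmap, radius=1)
print(summary)
```

A high peak with most of the mass concentrated around it suggests a confident, unimodal grounding; a low peak with scattered mass suggests an ambiguous instruction or scene.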
Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.
# Relative-path local entry for the LINGO-Space deployment
cd tools/lingo_space/LINGO-Space
python run_inference.py --image tools/lingo_space/example/input_image.jpg --instruction "Put the red block between the long blue block and the blue box" --output-dir tools/lingo_space/example --device cpu
# This page documents the path. The static page does not execute LINGO-Space.
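The same invocation can be assembled from the four documented parameters in Python. This is a minimal sketch that only builds and prints the argument list. The flags `--image`, `--instruction`, `--output-dir`, and `--device` are taken from the command above, while the helper `build_command` and the example instruction text are hypothetical.

```python
import shlex

def build_command(image, instruction, output_dir, device="cpu"):
    """Build the argument list for run_inference.py from the documented inputs."""
    return [
        "python", "run_inference.py",
        "--image", image,
        "--instruction", instruction,
        "--output-dir", output_dir,
        "--device", device,
    ]

cmd = build_command(
    image="tools/lingo_space/example/input_image.jpg",
    instruction="Put the red block between the long blue block and the blue box",
    output_dir="tools/lingo_space/example",
)

# Print a shell-safe rendering; quoting of the instruction is handled by shlex.
print(shlex.join(cmd))

# To actually run it (not done here), one could pass the list to
# subprocess.run(cmd, cwd="tools/lingo_space/LINGO-Space", check=True).
```

Passing the arguments as a list avoids shell quoting problems when the instruction contains spaces or non-ASCII text.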
{
"tool": "lingo_space",
"status": "ok",
"results": [
{
"label": "Language-conditioned spatial grounding",
"score": 0.87,
"output": "Grounded target point, detections, heatmap, scene relations"
}
],
"timing": {
"runtime": "The paper reports task success scores rather than a single source-reported latency number; the local wrapper supports CPU inference.",
"device": "documented in source benchmark when available"
},
"artifacts": {
"visualization": "tools/lingo_space/runs/visualization.png",
"raw_predictions": "tools/lingo_space/runs/predictions.json"
}
}
Paper identity and contribution summary.
@misc{lingo_space2024,
title={LINGO-Space: Language-Conditioned Incremental Grounding for Space},
author={Dohyun Kim and Nayoung Oh and Deokmin Hwang and Daehyung Park},
year={2024},
note={AAAI 2024},
url={https://arxiv.org/abs/2402.01183}
}
Only compact, source-reported numbers are shown here.
| Dataset | Metric | Value | Result type | Source |
|---|---|---|---|---|
| Single referring expression, 12 tabletop tasks | Success score range | 80.0 to 100.0 across the 12 reported tasks | Source paper result | Official AAAI 2024 paper, Table 1 |
| New spatial predicates | Seen/unseen success scores | close 86.0 / 81.0; far 95.5 / 95.0 | Source paper result | Official AAAI 2024 paper, Table 2 |
| Multiple referring expressions | Compositional / one-step / composite success scores | 90.5; 97.5 / 96.5; 79.1 | Source paper result | Official AAAI 2024 paper, Table 3 |
Official AAAI 2024 paper tables, project page, pipeline figure, deployment README, local result JSON, heatmap, overlay, debug image, and run log.