Perception and Grounding

LINGO-Space

LINGO-Space incrementally grounds relational language into a probabilistic spatial distribution, letting a robot localize placement targets from instructions that use relations such as "between", "left of", or "near".

Tool Introduction

Core parameters, trigger timing, and visual before/after demo references.

Short Explanation

Provide one scene image and a spatial instruction, and LINGO-Space returns the referred region as a probabilistic distribution plus a target point.
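To make the idea of "a probabilistic distribution plus a target point" concrete, here is a minimal sketch in plain Python. It is not the LINGO-Space model: the Gaussian "between" distribution, the function names `between_heatmap` and `target_point`, and the choice of the highest-probability pixel as the target are all illustrative assumptions.

```python
import math

def between_heatmap(width, height, center_a, center_b, sigma=8.0):
    """Toy 'between' distribution (an assumption, not the learned model):
    an isotropic Gaussian at the midpoint of two reference centers,
    normalized so that all cells sum to 1."""
    mx = (center_a[0] + center_b[0]) / 2.0
    my = (center_a[1] + center_b[1]) / 2.0
    grid = [
        [math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]
    total = sum(sum(row) for row in grid)
    return [[v / total for v in row] for row in grid]

def target_point(heatmap):
    """Return the (x, y) pixel that carries the highest probability."""
    best_y, best_row = max(enumerate(heatmap), key=lambda item: max(item[1]))
    best_x = max(range(len(best_row)), key=best_row.__getitem__)
    return best_x, best_y

# Two hypothetical reference-object centers in pixel coordinates.
heatmap = between_heatmap(64, 48, center_a=(10, 20), center_b=(50, 28))
point = target_point(heatmap)
print(point)  # (30, 24), the midpoint of the two centers
```

The real system predicts the distribution from the image and the instruction; only the final step, reading a point off a normalized heatmap, resembles this sketch.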

Input: RGB image + spatial instruction

Output: Grounded target point, detections, heatmap, scene relations

Trigger Timing: Run on demand once the required input files and configuration are prepared.

Runtime: Python / PyTorch / GroundingDINO

Before: RGB image + spatial instruction

Prepare a single RGB tabletop image and the spatial instruction to ground.

After: Grounded target point, detections, heatmap, scene relations

Inspect the result JSON, heatmap, and overlay written to the output directory.

Preset Example

A quick-run style example for the documentation page.

Input: tools/lingo_space/example/input_image.jpg

Prompt: 把红方块放到蓝色长方块和蓝色箱子之间 ("Put the red block between the long blue block and the blue box")

Expected: A grounded target point, detected reference objects, a probability heatmap, and an overlay showing the inferred placement region.

Parameters And Output

Readable controls and the meaning of each returned artifact.

Parameter Explanation

image (file)

Single RGB tabletop image used for spatial grounding.

instruction (text)

Natural-language spatial command describing the source object and its intended relation to references.

output_dir (path): tools/lingo_space/example

Directory where result JSON, heatmaps, overlays, and logs are written.

device (select): cpu

Inference device for the local wrapper.

Output Explanation

predicted_point_pixel

Final target location in image coordinates.

detections

Detected source and reference objects with labels, boxes, and scores.

confidence_summary

Peak and mass statistics for the grounded spatial distribution.

scene_graph

Lightweight relational graph built from the detected objects during local inference.
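As a rough illustration of what "peak and mass statistics" can mean for a spatial distribution, the sketch below computes the peak probability and the share of probability mass within a small radius of the peak. The function name `confidence_summary`, the key names in the returned dictionary, and the radius-based definition of mass are assumptions made for this example; the actual fields written by the wrapper may differ.

```python
def confidence_summary(heatmap, radius=1):
    """Summarize a 2D heatmap (list of rows) with hypothetical statistics:
    the peak location, the normalized peak value, and the fraction of
    total mass lying within `radius` pixels of the peak."""
    total = sum(sum(row) for row in heatmap)
    if total <= 0:
        raise ValueError("heatmap must contain positive mass")
    # Find the highest-valued cell and its coordinates.
    peak_value, peak_x, peak_y = max(
        (value, x, y)
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
    )
    # Sum the mass inside a disc around the peak.
    near = sum(
        value
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
        if (x - peak_x) ** 2 + (y - peak_y) ** 2 <= radius ** 2
    )
    return {
        "peak_point": (peak_x, peak_y),
        "peak": peak_value / total,
        "mass_near_peak": near / total,
    }

# Small hand-made distribution whose cells sum to 1.
toy = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.5, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.1],
]
summary = confidence_summary(toy, radius=1)
print(summary["peak_point"], round(summary["peak"], 3), round(summary["mass_near_peak"], 3))
```

A sharp peak with most of the mass nearby suggests an unambiguous placement region, while a low peak with spread-out mass suggests the instruction left several regions plausible.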

How To Use

Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.

Resources

GitHub: https://github.com/rirolab/LINGO-Space
Code Download: https://github.com/rirolab/LINGO-Space/archive/refs/heads/main.zip
Project Page: https://lingo-space.github.io
Paper: https://arxiv.org/abs/2402.01183
Composite Checkpoint: https://github.com/rirolab/LINGO-Space/releases/download/v0.1.0-alpha/composite.pt
GroundingDINO Weights: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
Colab Demo: https://colab.research.google.com/drive/14Nl0sozJ3JpfwxkfwGk_0s8k8DOqTEVN?usp=sharing

Deployment Notes

Clone the official repository with submodules and create the `lingo_space` Conda environment with Python 3.8.
Install PyTorch, PyG, CLIP, the listed Python dependencies, and editable GroundingDINO from the bundled submodule.
Download the published composite checkpoint and the GroundingDINO weights into the repository-relative folders described in the deployment README.
Run `run_inference.py` with a local image, instruction string, and output directory to export heatmaps, overlays, and result JSON.

Relative Path Example

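# Relative-path local entry for the LINGO-Space deployment.
# After cd, paths resolve from tools/lingo_space/LINGO-Space.
cd tools/lingo_space/LINGO-Space
python run_inference.py \
  --image ../example/input_image.jpg \
  --instruction "把红方块放到蓝色长方块和蓝色箱子之间" \
  --output-dir ../example \
  --device cpu
# Instruction in English: "Put the red block between the long blue block and the blue box."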

# This page documents the path. The static page does not execute LINGO-Space.
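The same invocation can be assembled from Python, which avoids shell quoting issues with non-ASCII instructions. This is a sketch only: `build_command` is a hypothetical helper, the flags are the ones shown in the command above, and resolving the paths to absolute form is a design choice made here so they stay valid whatever working directory the script runs in.

```python
import shlex
import sys
from pathlib import Path

def build_command(image, instruction, output_dir, device="cpu",
                  script="run_inference.py"):
    """Build the argument list for the documented CLI flags.
    Paths are made absolute so they do not depend on the subprocess cwd."""
    return [
        sys.executable, script,
        "--image", str(Path(image).resolve()),
        "--instruction", instruction,
        "--output-dir", str(Path(output_dir).resolve()),
        "--device", device,
    ]

cmd = build_command(
    image="tools/lingo_space/example/input_image.jpg",
    instruction="把红方块放到蓝色长方块和蓝色箱子之间",
    output_dir="tools/lingo_space/example",
)
print(shlex.join(cmd))

# To actually run it (requires the deployed repository and weights):
# import subprocess
# subprocess.run(cmd, cwd="tools/lingo_space/LINGO-Space", check=True)
```

Passing the arguments as a list means the instruction string reaches the script unchanged, without any shell escaping.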

Expected Result Shape

{
  "tool": "lingo_space",
  "status": "ok",
  "results": [
    {
      "label": "Language-conditioned spatial grounding",
      "score": 0.87,
      "output": "Grounded target point, detections, heatmap, scene relations"
    }
  ],
  "timing": {
    "runtime": "The paper reports task success scores rather than a single source-reported latency number; the local wrapper supports CPU inference.",
    "device": "documented in source benchmark when available"
  },
  "artifacts": {
    "visualization": "tools/lingo_space/runs/visualization.png",
    "raw_predictions": "tools/lingo_space/runs/predictions.json"
  }
}
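A downstream script may want to confirm that a result file has this shape before reading its artifacts. The following sketch checks the top-level keys shown above; the helper name `check_result`, the rule that `status` must be "ok", and the assumption that `score` lies in [0, 1] are illustrative choices, not documented guarantees.

```python
import json

REQUIRED_KEYS = ("tool", "status", "results", "timing", "artifacts")

def check_result(payload):
    """Return a list of human-readable problems; empty means the shape looks fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in payload]
    if payload.get("status") != "ok":
        problems.append(f"unexpected status: {payload.get('status')!r}")
    for index, item in enumerate(payload.get("results", [])):
        score = item.get("score")
        # Assumption: scores are probabilities-like values in [0, 1].
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            problems.append(f"results[{index}].score out of range: {score!r}")
    return problems

# Abbreviated copy of the shape documented above.
sample = json.loads("""
{
  "tool": "lingo_space",
  "status": "ok",
  "results": [{"label": "Language-conditioned spatial grounding", "score": 0.87}],
  "timing": {},
  "artifacts": {"raw_predictions": "tools/lingo_space/runs/predictions.json"}
}
""")
print(check_result(sample))                                   # []
print(check_result({"tool": "lingo_space", "status": "error"}))
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is wrong with a result file in a single pass.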

Paper figure

Academic Info

Paper identity and contribution summary.

Title: LINGO-Space: Language-Conditioned Incremental Grounding for Space

Authors: Dohyun Kim, Nayoung Oh, Deokmin Hwang, Daehyung Park

Venue: AAAI 2024

Contribution: Models spatial language as an incrementally updated probabilistic grounding distribution so robots can resolve compositional placement expressions over tabletop scenes.

Citation

@misc{lingo_space2024,
  title={LINGO-Space: Language-Conditioned Incremental Grounding for Space},
  author={Dohyun Kim and Nayoung Oh and Deokmin Hwang and Daehyung Park},
  year={2024},
  note={AAAI 2024},
  url={https://arxiv.org/abs/2402.01183}
}

Benchmark

Only compact, source-reported numbers are shown here.

Dataset	Metric	Value	Runtime	Source
Single referring expression, 12 tabletop tasks	Success score range	80.0 to 100.0 across the 12 reported tasks	Source paper result	Official AAAI 2024 paper, Table 1
New spatial predicates	Seen/unseen success scores	close 86.0 / 81.0; far 95.5 / 95.0	Source paper result	Official AAAI 2024 paper, Table 2
Multiple referring expressions	Compositional / one-step / composite success scores	90.5; 97.5 / 96.5; 79.1	Source paper result	Official AAAI 2024 paper, Table 3

Artifacts

Official AAAI 2024 paper tables, project page, pipeline figure, deployment README, local result JSON, heatmap, overlay, debug image, and run log.

Demo Images

Visual references from the original tool.