Execution and Control

R3M

Compares post-action visual observations with a goal instruction to decide whether a manipulation step physically succeeded.

Tool Introduction

Core parameters, trigger timing, and visual before/after demo references.

Short Explanation

Use R3M after an action finishes to block false claims of completion and stop errors from accumulating.

Input: Post-action frame + goal instruction

Output: Verification score and completion flag

Trigger Timing: Triggered on demand after the required input files and configuration are prepared.

Runtime: Local GPU

Before: Post-action frame + goal instruction

Prepare the post-action camera image, the goal instruction, and optionally a pre-action image for comparison.

After: Verification score and completion flag

Read the task completion score, the boolean success flag, and the feature extractor mode reported by the wrapper.
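The idea in the short explanation, checking each action before moving on so that a failed step does not contaminate later ones, can be sketched as a small gating loop. This is a minimal illustration, not the wrapper's code: `run_with_verification`, `execute`, and `verify` are hypothetical names, and `verify` stands in for whatever call returns the completion flag.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def run_with_verification(
    steps: Sequence[str],
    execute: Callable[[str], None],
    verify: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first step the verifier rejects.

    `execute` and `verify` are hypothetical callbacks: `verify` is assumed to
    wrap the post-action check and return its boolean completion flag.
    Returns (verified steps, first failed step or None).
    """
    completed: List[str] = []
    for step in steps:
        execute(step)
        if not verify(step):
            # Halt here so later steps do not build on a failed state.
            return completed, step
        completed.append(step)
    return completed, None
```

A caller could retry or replan from the returned failed step instead of continuing with the remaining instructions.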

Preset Example

A quick-run style example for the documentation page.

Input: tools/r3m/examples/end.png

Prompt: pick up the red cup

Expected: A task completion score and boolean success flag.

Parameters And Output

Readable controls and the meaning of each returned artifact.

Parameter Explanation

current_camera_view (file): tools/r3m/examples/end.png

Post-action camera image.

semantic_task_text (text): pick up the red cup

Natural-language goal or step that should be verified.

start_image (file)

Optional pre-action image for comparing state change.
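One way to picture what the optional start image adds is a distance between the embeddings of the pre-action and post-action frames. The sketch below is an assumption for illustration only: the page does not say how the wrapper compares the two images, the embedding vectors are taken as already computed by some frozen encoder, and `state_change` is a hypothetical helper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def state_change(start_embedding: Sequence[float], end_embedding: Sequence[float]) -> float:
    """Hypothetical change measure: 1 - cosine similarity.

    A value near 0 suggests the scene barely changed between the two frames;
    larger values suggest a visible change. This is not the wrapper's
    documented metric.
    """
    return 1.0 - cosine_similarity(start_embedding, end_embedding)
```

A change value near zero would be a hint that the action had no visible effect, regardless of what the instruction asked for.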

Output Explanation

task_completion_score

Score estimating how well the final visual state matches the instruction.

is_successful

Boolean completion decision from the wrapper.

feature_extractor_mode

Backend representation mode used for the score.
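The three output fields can be modeled as a small record, with the boolean derived from the score. The sketch assumes the score lies in [0, 1] and that the wrapper applies a fixed cutoff; the 0.5 threshold, the `decide` helper, and the `"resnet50"` mode string are illustrative assumptions, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationResult:
    task_completion_score: float
    is_successful: bool
    feature_extractor_mode: str


def decide(score: float, mode: str, threshold: float = 0.5) -> VerificationResult:
    """Turn a score into the documented output fields.

    Assumptions: scores are in [0, 1] and success means score >= threshold.
    The default threshold of 0.5 is a placeholder, not the wrapper's value.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is assumed to lie in [0, 1]")
    return VerificationResult(
        task_completion_score=score,
        is_successful=score >= threshold,
        feature_extractor_mode=mode,
    )
```

Keeping the raw score next to the flag lets a caller apply a stricter threshold for safety-critical steps without rerunning the verifier.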

How To Use

Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.

Resources

GitHub: https://github.com/facebookresearch/r3m
Code Download: https://github.com/facebookresearch/r3m/archive/refs/heads/main.zip
Project Page: https://sites.google.com/view/robot-r3m/
Paper: https://arxiv.org/abs/2203.12601

Deployment Notes

Clone or download the official R3M repository.
Install the PyTorch dependencies and prepare the selected visual representation checkpoint.
Prepare start and end images plus a semantic task instruction under tools/r3m/examples/.
Run the verifier and save scores under tools/r3m/runs/.

Relative Path Example

python tools/r3m/run.py --start-image tools/r3m/examples/start.png --end-image tools/r3m/examples/end.png --instruction "pick up the red cup" --output tools/r3m/runs/verification.json
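When the verifier is called from another program, the same command can be assembled as an argument list, which avoids shell quoting problems with instructions that contain spaces. This sketch reuses only the flags shown in the command above; `build_command` is a hypothetical helper, and treating `--start-image` as optional on the command line is an assumption based on the parameter table.

```python
import sys
from typing import List, Optional


def build_command(
    end_image: str,
    instruction: str,
    output: str,
    start_image: Optional[str] = None,
    script: str = "tools/r3m/run.py",
) -> List[str]:
    """Build the argv list for the verifier command shown above.

    Assumption: --start-image may be omitted, mirroring the optional
    start_image parameter; the page does not confirm this for the CLI.
    """
    cmd = [sys.executable, script]
    if start_image is not None:
        cmd += ["--start-image", start_image]
    cmd += ["--end-image", end_image, "--instruction", instruction, "--output", output]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises if the script exits with a non-zero status.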

Expected Result Shape

{
  "tool": "r3m",
  "status": "ok",
  "results": [
    {
      "label": "Post-action success verification",
      "score": 0.87,
      "output": "Verification score and completion flag"
    }
  ],
  "timing": {
    "runtime": "The submitted wrapper describes about 50 ms local verification; the official R3M paper focuses on policy success rather than verifier latency.",
    "device": "documented in source benchmark when available"
  },
  "artifacts": {
    "visualization": "tools/r3m/runs/visualization.png",
    "raw_predictions": "tools/r3m/runs/predictions.json"
  }
}
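A consumer of this file might read it along the following lines. The sketch assumes the result always follows the shape shown above, with `status` and a non-empty `results` list; `read_top_score` is a hypothetical helper, and the error handling is one possible choice rather than documented behavior.

```python
import json
from typing import Tuple


def read_top_score(result_text: str) -> Tuple[str, float]:
    """Return (label, score) of the first entry in a result document.

    Assumes the JSON shape shown above; raises if the run did not report
    status "ok" or if no result entries are present.
    """
    data = json.loads(result_text)
    if data.get("status") != "ok":
        raise RuntimeError("verifier reported status: %r" % data.get("status"))
    results = data.get("results") or []
    if not results:
        raise ValueError("result document contains no entries")
    first = results[0]
    return first["label"], float(first["score"])
```

For a file on disk, the text can be obtained with `pathlib.Path("tools/r3m/runs/verification.json").read_text()` before calling the helper.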

Paper figure

Academic Info

Paper identity and contribution summary.

Title: R3M: A Universal Visual Representation for Robot Manipulation

Authors: Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta

Venue: CoRL 2022

Contribution: Provides a pre-trained visual representation for robot manipulation, which this wrapper uses to score whether an executed step matches the intended semantic goal.

Citation

@misc{r3m2022,
  title={R3M: A Universal Visual Representation for Robot Manipulation},
  author={Nair, Suraj and Rajeswaran, Aravind and Kumar, Vikash and Finn, Chelsea and Gupta, Abhinav},
  year={2022},
  note={CoRL 2022},
  url={https://arxiv.org/abs/2203.12601}
}

Benchmark

Only compact, source-reported numbers are shown here.

Dataset	Metric	Value	Setup	Source
12 simulated manipulation tasks	Average imitation-learning success rate	62% success; over 20% above training from scratch and over 10% above CLIP/MoCo baselines	Frozen R3M representation for downstream policy learning	Official CoRL 2022 paper, Fig. 4
Franka Kitchen / MetaWorld / Adroit	R3M ablation success rate	53.1+/-2.7% / 69.2+/-2.0% / 65.0+/-1.7%; all domains 62.4+/-1.3%	Downstream behavior cloning evaluation	Official CoRL 2022 paper, Table 1
Real-world Franka Emika Panda manipulation	Demonstration count	20 demonstrations for real cluttered-apartment manipulation tasks	Real-robot learning setup	Official CoRL 2022 paper

Artifacts

Official CoRL 2022 paper, project page, repository link, mock start/end image shape, task instruction, and verification output from the submitted spreadsheet.

Demo Images

Visual references from the original tool.
