Short Explanation
Use Action Genome-style state modeling when the planner needs a concrete physical state instead of a static-frame guess.
The tool uses a spatio-temporal scene graph to infer object states and contact relations from short video clips.
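To make the scene graph idea concrete, here is a minimal sketch of one way to hold per-frame relations and collapse them into frame spans. The class names, field names, and the `relation_spans` helper are hypothetical and are not taken from the wrapper or the original project. The `holding` and `touching` predicates are used only as illustrative contact relations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One edge of the graph: subject --predicate--> target (hypothetical schema)."""
    subject: str
    predicate: str
    target: str


@dataclass
class FrameGraph:
    """Relations observed in a single sampled frame (hypothetical schema)."""
    frame: int
    relations: list


def relation_spans(graphs):
    """Collapse per-frame relations into (relation, first_frame, last_frame) spans.

    A span stays open while the same relation appears in consecutive sampled
    frames and is closed as soon as a frame no longer contains it.
    """
    spans = []
    open_spans = {}  # relation -> [first_frame, last_frame]
    for graph in sorted(graphs, key=lambda g: g.frame):
        current = set(graph.relations)
        # Close every relation that is absent from this frame.
        for rel in list(open_spans):
            if rel not in current:
                first, last = open_spans.pop(rel)
                spans.append((rel, first, last))
        # Extend relations that persist, open the ones that are new.
        for rel in current:
            if rel in open_spans:
                open_spans[rel][1] = graph.frame
            else:
                open_spans[rel] = [graph.frame, graph.frame]
    # Close whatever is still open at the end of the clip.
    for rel, (first, last) in open_spans.items():
        spans.append((rel, first, last))
    return sorted(spans, key=lambda s: (s[1], s[2], s[0].predicate))


# Illustrative three-frame clip: the person holds the cup, then only touches it.
example_graphs = [
    FrameGraph(0, [Relation("person", "holding", "cup")]),
    FrameGraph(1, [Relation("person", "holding", "cup")]),
    FrameGraph(2, [Relation("person", "touching", "cup")]),
]
example_spans = relation_spans(example_graphs)
```

Keeping the graph per frame and deriving spans afterwards means the temporal grouping rule can change without touching the per-frame predictions.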
Core parameters, trigger timing, and visual before/after demo references.
Prepare the inputs the tool expects: a short video clip or buffered frames, the list of objects to track, and optionally precomputed feature and bounding-box tensors.
Read the produced artifacts: the visualization, the raw predictions, and the state timeline.
A quick-run style example for the documentation page.
Readable controls and the meaning of each returned artifact.
| Input | Type | Example value | Description |
|---|---|---|---|
| temporal_video_buffer | file | tools/action-genome/examples/mock_video.mp4 | Short clip or buffered frames used for temporal state inference. |
| objects_of_interest | text | | Object list whose states and relations should be tracked. |
| feature_tensor | path | | Optional precomputed features for the scene graph model. |
| bbox_tensor | path | | Optional detected object boxes aligned with the video frames. |

| Output | Description |
|---|---|
| object_states | Frame spans labeled with object states. |
| contact_relations | Temporal relations between objects and actors. |
| state_timeline | Ordered state transitions that downstream planners can audit. |
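The relationship between `object_states` (labeled frame spans) and `state_timeline` (ordered transitions) can be sketched as follows. This is an assumption about how the two outputs relate, not the wrapper's documented behavior. The dictionary keys (`object`, `state`, `start_frame`, `end_frame`) and the state names are hypothetical.

```python
def build_state_timeline(object_states):
    """Turn labeled frame spans into an ordered list of state transitions.

    Each input item is a dict with hypothetical keys:
    object, state, start_frame, end_frame.
    A transition is emitted only when an object's state actually changes.
    """
    ordered = sorted(object_states, key=lambda s: (s["start_frame"], s["object"]))
    last_state = {}  # object name -> most recent state
    timeline = []
    for span in ordered:
        obj = span["object"]
        previous = last_state.get(obj)
        if previous != span["state"]:
            timeline.append({
                "frame": span["start_frame"],
                "object": obj,
                "from": previous,  # None for the first observed state
                "to": span["state"],
            })
            last_state[obj] = span["state"]
    return timeline


# Illustrative spans; the last cup span repeats "held" and adds no transition.
example_states = [
    {"object": "cup", "state": "on_table", "start_frame": 0, "end_frame": 10},
    {"object": "door", "state": "closed", "start_frame": 0, "end_frame": 40},
    {"object": "cup", "state": "held", "start_frame": 11, "end_frame": 30},
    {"object": "cup", "state": "held", "start_frame": 31, "end_frame": 40},
]
example_timeline = build_state_timeline(example_states)
```

Recording both the previous and the new state in each entry lets a planner audit a transition without re-reading the underlying spans.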
Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.
python tools/action-genome/run.py --video tools/action-genome/examples/mock_video.mp4 --objects tools/action-genome/examples/objects.json --output tools/action-genome/runs/state_timeline.json
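The quick-run command above can be assembled from Python, and its result checked, with a sketch like the one below. The `--video`, `--objects`, and `--output` flags come from the command itself. The helper names `build_command` and `summarize_result` are hypothetical. The sketch also assumes the wrapper returns a JSON object with `status`, `scene_state`, and `artifacts` keys, as in the sample response.

```python
import sys


def build_command(video, objects, output, script="tools/action-genome/run.py"):
    """Return the argv list for the quick-run command (flags taken from the docs)."""
    return [
        sys.executable,  # the current interpreter, in place of a bare "python"
        script,
        "--video", video,
        "--objects", objects,
        "--output", output,
    ]


def summarize_result(payload):
    """Check the status and pull out the top-scoring scene state and artifacts.

    Assumes the payload has the same keys as the sample response.
    """
    status = payload.get("status")
    if status != "ok":
        raise RuntimeError(f"wrapper reported status {status!r}")
    best = max(payload.get("scene_state", []), key=lambda s: s["score"], default=None)
    return {
        "label": best["label"] if best else None,
        "score": best["score"] if best else None,
        "artifacts": dict(payload.get("artifacts", {})),
    }


example_command = build_command(
    "tools/action-genome/examples/mock_video.mp4",
    "tools/action-genome/examples/objects.json",
    "tools/action-genome/runs/state_timeline.json",
)
```

The argv list can be passed to `subprocess.run(example_command, check=True)`. The sample response that follows is the payload shape `summarize_result` is written against.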
{
  "tool": "action-genome",
  "status": "ok",
  "scene_state": [
    {
      "label": "Spatio-temporal scene graph state modeling",
      "score": 0.87,
      "output": "State graph and temporal relations"
    }
  ],
  "timing": {
    "runtime": "The submitted wrapper is described as interactive; the official paper focuses on dataset/task metrics rather than wall-clock wrapper latency.",
    "device": "documented in source benchmark when available"
  },
  "artifacts": {
    "visualization": "tools/action-genome/runs/visualization.png",
    "raw_predictions": "tools/action-genome/runs/predictions.json"
  }
}
Paper identity and contribution summary.
@misc{actiongenome2020,
title={Action Genome: Actions as Compositions of Spatio-temporal Scene Graphs},
author={Ji, Jingwei and Krishna, Ranjay and Fei-Fei, Li and Niebles, Juan Carlos},
year={2020},
note={CVPR 2020},
url={https://arxiv.org/abs/1912.06992}
}
Only compact, source-reported numbers are shown here.
| Dataset | Metric | Value | Context | Source |
|---|---|---|---|---|
| Action Genome / Charades | Dataset scale | 10K videos, 0.4M objects, 1.7M visual relationships | Official dataset annotation scale | Official CVPR 2020 paper |
| Action Genome / Charades few-shot action recognition | mAP with 10 examples | 42.7% | Few-shot action recognition experiment | Official CVPR 2020 paper |
| Action Genome detailed annotation | Frame/object/relation coverage | 234K video frames, 476K object bounding boxes, 1.72M relationships, 157 action categories | Official dataset statistics | Official CVPR 2020 paper |
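As a quick sanity check on the dataset statistics in the table, the rounded counts can be turned into annotation densities. The ratios below are back-of-envelope values derived from the rounded figures, not numbers reported by the paper. The function name is hypothetical.

```python
# Rounded counts as listed in the table (234K frames, 476K boxes, 1.72M relationships).
FRAMES = 234_000
BOXES = 476_000
RELATIONSHIPS = 1_720_000


def annotation_density(frames, boxes, relationships):
    """Return simple per-frame and per-box ratios from the rounded dataset counts."""
    return {
        "boxes_per_frame": boxes / frames,
        "relationships_per_frame": relationships / frames,
        "relationships_per_box": relationships / boxes,
    }


density = annotation_density(FRAMES, BOXES, RELATIONSHIPS)
```

With these rounded inputs the dataset averages roughly two object boxes per annotated frame.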
Official CVPR 2020 paper, project page, repository link, mock video input, feature tensor shape, bbox tensor shape, and state timeline output.
Visual references from the original tool.