Cognition and State Modeling

sentence-transformers

Dense semantic embeddings for retrieval, similarity, clustering, and reranking.

Tool Introduction

Core parameters, trigger timing, and visual before/after demo references.

Short Explanation

Convert text into dense vectors for semantic search, matching, and clustering.

Input: Text / sentence list
Output: Vector embeddings / similarity scores
Trigger Timing: Triggered on demand after the required input files and configuration are prepared.
Runtime: Python / PyTorch
Before: Text / sentence list

Prepare the text file or sentence list and the model configuration expected by the original project.

After: Vector embeddings / similarity scores

Read the produced embedding matrix and, if requested, the similarity scores.

Preset Example

A quick-run style example for the documentation page.

Input: tools/sentence-transformers/examples/sentences.txt
Prompt: model: all-MiniLM-L6-v2
Expected: Embedding matrix and optional similarity scores.

Parameters And Output

Readable controls and the meaning of each returned artifact.

Parameter Explanation

model (text): all-MiniLM-L6-v2

Sentence-transformers model identifier.

input (path)

Path to text lines or JSON records.
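
The wrapper's input parser is not shown on this page. As a minimal sketch, assuming plain-text files hold one sentence per line and JSON input is either a JSON array or JSON Lines whose objects carry the sentence under a hypothetical `text` field, a loader could look like this:

```python
import json
from pathlib import Path


def load_texts(path, text_field="text"):
    """Return a list of non-empty strings from a .txt, .json, or .jsonl file.

    Assumption (not confirmed by the tool): JSON records are either plain
    strings or objects that store the sentence under `text_field`.
    """
    raw = Path(path).read_text(encoding="utf-8")
    suffix = Path(path).suffix.lower()

    if suffix == ".json":
        records = json.loads(raw)  # expected: a JSON array
    elif suffix == ".jsonl":
        records = [json.loads(line) for line in raw.splitlines() if line.strip()]
    else:
        records = raw.splitlines()  # plain text: one item per line

    texts = []
    for record in records:
        text = record[text_field] if isinstance(record, dict) else str(record)
        text = text.strip()
        if text:  # skip blank lines and empty fields
            texts.append(text)
    return texts
```

Dropping blank items up front keeps row `i` of the embedding matrix aligned with item `i` of the returned list.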

normalize (toggle): true

Whether to L2-normalize vectors for cosine search.
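
Normalization matters for cosine search because the dot product of two unit-length vectors equals their cosine similarity, so a search index only needs dot products. The sketch below illustrates this with toy 2-D vectors in plain Python; it is not the tool's own code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]  # toy vectors, not model output
ua, ub = l2_normalize(a), l2_normalize(b)

# After normalization the plain dot product matches cosine similarity.
assert math.isclose(dot(ua, ub), cosine(a, b))
```

In the sentence-transformers library, `SentenceTransformer.encode` accepts `normalize_embeddings=True` for the same purpose. Whether this wrapper's `normalize` toggle maps to that argument is an assumption.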

Output Explanation

embeddings

Dense vectors for each input text item.

similarity_matrix

Optional pairwise semantic similarity output.
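
As an illustration of what a pairwise similarity output contains, the sketch below builds an N x N cosine similarity matrix from a list of vectors in plain Python. The function name and the toy vectors are hypothetical; the actual wrapper may compute or store the matrix differently.

```python
import math


def similarity_matrix(embeddings):
    """Return an N x N nested list of cosine similarities between rows."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in embeddings]
    n = len(embeddings)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(x * y for x, y in zip(embeddings[i], embeddings[j]))
            denom = norms[i] * norms[j]
            score = dot / denom if denom else 0.0
            matrix[i][j] = matrix[j][i] = score  # the matrix is symmetric
    return matrix


toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-D vectors, not model output
sim = similarity_matrix(toy)
```

Entry `sim[i][j]` scores item `i` against item `j`, and the diagonal is 1 up to floating-point error. With real embeddings, the library's `sentence_transformers.util.cos_sim` computes the same quantity on tensors.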

How To Use

Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.

Deployment Notes

  1. Install sentence-transformers and a compatible torch version.
  2. Download the model from Hugging Face on first run, or pre-cache it for offline use.
  3. Run the encoding wrapper with repository-relative input paths.
  4. Store the vectors in tools/sentence-transformers/runs/ for retrieval tooling.
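
The retrieval tooling mentioned in the last step is not described here. As a rough sketch of what it might do with stored vectors, the following brute-force search returns the top-k rows by dot product, assuming the query and corpus vectors are already L2-normalized. The function name and toy data are hypothetical.

```python
import heapq


def top_k(query, corpus, k=3):
    """Return (index, score) pairs for the k highest dot products.

    Assumes `query` and every row of `corpus` are unit vectors, so the
    dot product equals the cosine similarity.
    """
    scores = (
        (sum(q * c for q, c in zip(query, row)), idx)
        for idx, row in enumerate(corpus)
    )
    best = heapq.nlargest(k, scores)
    return [(idx, score) for score, idx in best]


corpus = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # toy unit vectors
hits = top_k([0.8, 0.6], corpus, k=2)
```

Brute-force scoring is linear in corpus size, which is usually fine for small collections. Larger ones typically move to an approximate nearest-neighbor index.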

Relative Path Example

python tools/sentence-transformers/run.py --model all-MiniLM-L6-v2 --input tools/sentence-transformers/examples/sentences.txt --out tools/sentence-transformers/runs/embeddings.npy
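
The source of `run.py` is not included on this page. The sketch below shows one way such a wrapper could be written with the public sentence-transformers API (`SentenceTransformer` and `encode`). The `--no-normalize` flag, the one-sentence-per-line input format, and the function names are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def build_parser():
    """CLI mirroring the flags in the command above; --no-normalize is hypothetical."""
    parser = argparse.ArgumentParser(description="Encode sentences to a .npy file")
    parser.add_argument("--model", default="all-MiniLM-L6-v2")
    parser.add_argument("--input", required=True, help="text file, one sentence per line")
    parser.add_argument("--out", required=True, help="destination .npy path")
    parser.add_argument("--no-normalize", dest="normalize", action="store_false",
                        help="hypothetical flag: skip L2 normalization")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)

    # Heavy imports are deferred so that --help works without torch installed.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    with open(args.input, encoding="utf-8") as handle:
        sentences = [line.strip() for line in handle if line.strip()]

    model = SentenceTransformer(args.model)  # downloads on first run unless cached
    embeddings = model.encode(sentences, normalize_embeddings=args.normalize)

    Path(args.out).parent.mkdir(parents=True, exist_ok=True)
    np.save(args.out, embeddings)  # array of shape (len(sentences), embedding_dim)
    print(f"saved {embeddings.shape} to {args.out}")
```

To use this as a script, call `main()` from an `if __name__ == "__main__":` guard. The saved array can later be read back with `numpy.load`.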

Expected Result Shape

{
  "tool": "sentence-transformers",
  "status": "ok",
  "results": [
    {
      "label": "Sentence embedding",
      "score": 0.87,
      "output": "Vector embeddings / similarity scores"
    }
  ],
  "timing": {
    "runtime": "The official SBERT table reports 14,200 sentences/sec on V100, with an 80 MB model, 384-dimensional embeddings, and max sequence length 256.",
    "device": "documented in source benchmark when available"
  },
  "artifacts": {
    "visualization": "tools/sentence-transformers/runs/visualization.png",
    "raw_predictions": "tools/sentence-transformers/runs/predictions.json"
  }
}
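
A consumer of this result record might check the top-level keys before reading anything else. The sketch below does that with the standard library. Treating all five keys as required is an assumption, and the function name is hypothetical.

```python
import json

REQUIRED_KEYS = ("tool", "status", "results", "timing", "artifacts")


def summarize_result(payload):
    """Check the top-level keys and return (status, labels, artifact paths)."""
    record = json.loads(payload)
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"result is missing keys: {missing}")
    labels = [item["label"] for item in record["results"]]
    return record["status"], labels, sorted(record["artifacts"].values())


# Abbreviated copy of the shape shown above, used only to exercise the function.
sample = json.dumps({
    "tool": "sentence-transformers",
    "status": "ok",
    "results": [{"label": "Sentence embedding", "score": 0.87}],
    "timing": {},
    "artifacts": {"raw_predictions": "tools/sentence-transformers/runs/predictions.json"},
})
status, labels, paths = summarize_result(sample)
```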
Paper figure

Academic Info

Paper identity and contribution summary.

Title: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Authors: Nils Reimers, Iryna Gurevych
Venue: EMNLP-IJCNLP 2019 / arXiv:1908.10084
Contribution: Enables efficient semantic similarity and retrieval with reusable sentence embeddings.

Citation

@misc{sentencetransformers2019,
  title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
  author={Nils Reimers and Iryna Gurevych},
  year={2019},
  note={EMNLP-IJCNLP 2019 / arXiv:1908.10084},
  url={https://arxiv.org/abs/1908.10084}
}

Benchmark

Only compact, source-reported numbers are shown here.

Dataset | Metric | Value | Runtime | Source
SBERT model table, 14 sentence-embedding datasets | all-MiniLM-L6-v2 sentence performance | 68.06 | 14,200 sentences/sec on V100 | Official SBERT pretrained model table
SBERT model table, 6 semantic-search datasets | all-MiniLM-L6-v2 semantic-search performance | 49.54 | 80 MB model, 384 dimensions, max sequence length 256 | Official SBERT pretrained model table
SBERT model table | Training scale | 1B+ training pairs | Mean pooling, normalized embeddings | Official SBERT pretrained model table
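
The reported figures support a quick capacity estimate. The sketch below uses the 384-dimensional embedding size and the 14,200 sentences/sec V100 throughput from the table. It assumes float32 storage and a hypothetical corpus of one million sentences, and real throughput will vary with hardware, batch size, and sentence length.

```python
DIMENSIONS = 384        # embedding size reported in the table
THROUGHPUT = 14_200     # sentences/sec on V100, as reported in the table
BYTES_PER_FLOAT32 = 4   # assumes embeddings are stored as float32


def storage_bytes(num_sentences):
    """Raw size of the embedding matrix, without file-format overhead."""
    return num_sentences * DIMENSIONS * BYTES_PER_FLOAT32


def encode_seconds(num_sentences):
    """Idealized encoding time at the reported throughput."""
    return num_sentences / THROUGHPUT


n = 1_000_000  # hypothetical corpus size
print(f"{storage_bytes(n) / 1e9:.2f} GB, {encode_seconds(n):.0f} s")
# prints "1.54 GB, 70 s"
```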

Artifacts

Official SBERT pretrained model table, model card link, pretrained models, evaluation scripts, and embedding outputs.

Demo Images

Visual references from the original tool.