Extract vision features from images

model.extract() accepts any {stim_id: image} mapping and does not require a BaseData object.

Prepare images

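import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducible dummy images

# Values can be: str/Path (local file), np.ndarray (HWC uint8), or PIL.Image
imgs: dict[str, np.ndarray] = {
    # high is exclusive, so 256 covers the full uint8 range 0..255
    f"img_{i:03d}": rng.integers(0, 256, (224, 224, 3), dtype=np.uint8)
    for i in range(20)
}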
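
Since values may also be local file paths, the mapping can be built from a folder on disk. The sketch below is one way to do that, using only the standard library. The helper name build_stim_mapping and the list of accepted extensions are assumptions made for illustration, not part of the library.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your data uses.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_stim_mapping(folder: Path) -> dict[str, Path]:
    """Map each image file's stem (used as stim_id) to its path, sorted by name."""
    files = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    mapping = {p.stem: p for p in files}
    if len(mapping) != len(files):
        # e.g. cat.png and cat.jpg would collapse onto the same stim_id
        raise ValueError("two files share the same stem; stim_ids must be unique")
    return mapping


# Demo on a throwaway folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("img_001.png", "img_000.jpg", "notes.txt"):
        (folder / name).touch()
    demo = build_stim_mapping(folder)
    stim_ids = list(demo)

print(stim_ids)  # ['img_000', 'img_001']
```

Using the file stem as the stim_id keeps the keys stable across runs, and the uniqueness check guards against two files silently overwriting each other in the dict.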

Extract features

vrs = model.extract(imgs, batch_size=16)
# VisualRepresentations(20 stimuli x 13 modules)

vrs.meta  # DataFrame: model, module_type, module_name, shape
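
Because vrs.meta is a DataFrame, ordinary pandas filtering produces the boolean masks used for indexing. The sketch below uses a small stand-in DataFrame with the four columns listed above. The rows, including the module names, are invented for illustration and will differ from what a real model reports.

```python
import pandas as pd

# Stand-in for vrs.meta; every row here is made up for the example.
meta = pd.DataFrame(
    {
        "model": ["dinov2"] * 4,
        "module_type": ["Dinov2Embeddings", "Dinov2Layer", "Dinov2Layer", "LayerNorm"],
        "module_name": ["embeddings", "encoder.layer.0", "encoder.layer.1", "layernorm"],
        "shape": [(257, 768)] * 4,
    }
)

# One boolean per row: True where the module is a transformer block.
mask = meta["module_type"] == "Dinov2Layer"

n_matches = int(mask.sum())
matched_names = meta.loc[mask, "module_name"].tolist()
print(n_matches, matched_names)  # 2 ['encoder.layer.0', 'encoder.layer.1']
```

Checking the number of True entries first tells you whether a mask selects several modules or exactly one.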

Index results

Index               Returns
vrs["layer_name"]   VisualRepresentation (single layer)
vrs[int]            VisualRepresentation (by position)
vrs[bool_mask]      VisualRepresentations (subset), or VisualRepresentation if exactly one row matches

# By layer name
vr = vrs["layernorm"]
arr = vrs.numpy("layernorm")    # → ndarray, shape (20, 257, 768)
t   = vrs.to_tensor("layernorm")  # → torch.Tensor

# Bool mask — multiple matches → VisualRepresentations
subset = vrs[vrs.meta["module_type"] == "Dinov2Layer"]

# Select a stimulus subset (all layers aligned)
vrs_5 = vrs.select(list(vrs.stim_ids[:5]))
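
An array of shape (20, 257, 768) holds one feature vector per token, so downstream analyses usually reduce it to one vector per stimulus. The sketch below shows three common reductions on a random stand-in array. It assumes the token axis is one CLS token followed by 256 patch tokens (a 16 x 16 grid, as for 14-pixel patches on a 224 x 224 input); check this against your model before relying on it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for vrs.numpy("layernorm"): (stimuli, tokens, hidden)
arr = rng.standard_normal((20, 257, 768)).astype(np.float32)

# Assumption: token 0 is the CLS token, tokens 1..256 are image patches.
cls_features = arr[:, 0, :]               # CLS token only
patch_mean = arr[:, 1:, :].mean(axis=1)   # average over patch tokens
flat = arr.reshape(arr.shape[0], -1)      # keep everything, one row per stimulus

print(cls_features.shape, patch_mean.shape, flat.shape)
# (20, 768) (20, 768) (20, 197376)
```

The CLS and mean-pooled variants give compact 768-dimensional vectors. Flattening preserves spatial detail at the cost of a much wider feature matrix.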