Extract vision features from images

model.extract() accepts any {stim_id: image} mapping directly; it does not depend on BaseData.

Prepare images

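import numpy as np

# Seeded generator so the random placeholder images are reproducible
rng = np.random.default_rng(0)

# Values can be: str/Path (local file), np.ndarray (HWC uint8), or PIL.Image
imgs: dict[str, np.ndarray] = {
    # The upper bound is exclusive, so 256 covers the full uint8 range 0..255
    f"img_{i:03d}": rng.integers(0, 256, (224, 224, 3), dtype=np.uint8)
    for i in range(20)
}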
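Because array inputs are expected in HWC uint8 layout, it can help to check a mapping before extraction. The following is a minimal sketch, not part of the library: `check_hwc_uint8` is a hypothetical helper, and it assumes three-channel (RGB) images as in the example above.

```python
import numpy as np


def check_hwc_uint8(images: dict[str, np.ndarray]) -> None:
    """Raise ValueError for any array that is not (H, W, 3) uint8 (hypothetical helper)."""
    for stim_id, img in images.items():
        if img.dtype != np.uint8:
            raise ValueError(f"{stim_id}: expected uint8, got {img.dtype}")
        if img.ndim != 3 or img.shape[-1] != 3:
            raise ValueError(f"{stim_id}: expected (H, W, 3), got {img.shape}")


# Stand-in data with the same layout as the mapping above
rng = np.random.default_rng(0)
demo = {
    f"img_{i:03d}": rng.integers(0, 256, (224, 224, 3), dtype=np.uint8)
    for i in range(3)
}
check_hwc_uint8(demo)  # valid input: returns without raising

# A float, channels-first array is rejected
bad = {"img_bad": np.zeros((3, 224, 224), dtype=np.float32)}
try:
    check_hwc_uint8(bad)
except ValueError as err:
    print(err)
```

A check like this catches channels-first or float arrays early, before they reach the model.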

Extract features

vrs = model.extract(imgs, batch_size=16)
# VisualRepresentations(20 stimuli x 13 modules)

vrs.meta  # DataFrame: model, module_type, module_name, shape
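How extract() batches its inputs internally is not shown here. Purely as an illustration, assuming stimuli are processed in consecutive chunks and the final partial chunk is kept, 20 stimuli with batch_size=16 would be split as sketched below. `iter_batches` is a hypothetical helper, not a library function.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_batches(stim_ids: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most batch_size ids (hypothetical helper)."""
    it = iter(stim_ids)
    # islice returns an empty list once the iterator is exhausted, ending the loop
    while chunk := list(islice(it, batch_size)):
        yield chunk


stim_ids = [f"img_{i:03d}" for i in range(20)]
batch_sizes = [len(batch) for batch in iter_batches(stim_ids, batch_size=16)]
print(batch_sizes)  # [16, 4]
```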

Index results

| Index | Returns |
| --- | --- |
| `vrs["layer_name"]` | `VisualRepresentation` (single layer) |
| `vrs[int]` | `VisualRepresentation` (by position) |
| `vrs[bool_mask]` | `VisualRepresentations` (subset) or `VisualRepresentation` (1 match) |

# By layer name
vr = vrs["layernorm"]
arr = vrs.numpy("layernorm")    # → ndarray, shape (20, 257, 768)
t   = vrs.to_tensor("layernorm")  # → torch.Tensor

# Bool mask — multiple matches → VisualRepresentations
subset = vrs[vrs.meta["module_type"] == "Dinov2Layer"]

# Select a stimulus subset (all layers aligned)
vrs_5 = vrs.select(list(vrs.stim_ids[:5]))
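The (20, 257, 768) shape reported for the "layernorm" array is consistent with a ViT that cuts a 224 x 224 image into 14-pixel patches (16 x 16 = 256 patch tokens) and prepends one class token. Under that assumption, which is inferred from the Dinov2Layer module type rather than stated here, the array could be split as in the sketch below. It uses a random placeholder array, not real features.

```python
import numpy as np

# Placeholder with the shape reported for vrs.numpy("layernorm"); not real features
rng = np.random.default_rng(0)
arr = rng.standard_normal((20, 257, 768)).astype(np.float32)

# Assumption: token 0 is the class (CLS) token, tokens 1..256 are the patch grid
cls_tokens = arr[:, 0, :]                # (20, 768)
patch_tokens = arr[:, 1:, :]             # (20, 256, 768)
patch_mean = patch_tokens.mean(axis=1)   # (20, 768), mean-pooled patch features

# Restore the 16 x 16 spatial layout of the patch tokens
patch_grid = patch_tokens.reshape(20, 16, 16, 768)

print(cls_tokens.shape, patch_mean.shape, patch_grid.shape)
```

The class token or the mean-pooled patches each give one 768-dimensional vector per stimulus, while the grid form keeps spatial structure. If the model variant uses extra register tokens, the slicing offsets would need adjusting.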