SAM Family
==========

This page documents the RankSEG integration path for SAM-family outputs from
Hugging Face ``transformers``. SAM-family outputs use explicit adapter classes
instead of the standard Transformers semantic-segmentation helper.

Why SAM has its own adapters
----------------------------

SAM outputs are not plain semantic-segmentation logits. Before RankSEG is
called, SAM masks must be restored from model space back to image space using
family-specific geometry:

.. code-block:: text

   SAM processor
   -> SAM model outputs
   -> restore mask probabilities
   -> RankSEG
   -> prompt, instance, or semantic masks

The adapters in ``rankseg.integration.sam`` keep this restoration step
explicit. RankSEG only replaces the final binary mask step after masks have
been resized and converted to probabilities.

.. code-block:: python

   from rankseg.integration import sam

Adapter map
-----------

.. list-table::
   :widths: 18 28 28 26
   :header-rows: 1

   * - Adapter
     - Input family
     - Main prediction method
     - Probability-only method
   * - ``sam.Sam1``
     - SAM1 and SAM-HQ prompt masks
     - ``postprocess(...)``
     - ``restore_mask_probs(...)``
   * - ``sam.Sam2``
     - SAM2 prompt masks
     - ``postprocess(...)``
     - ``restore_mask_probs(...)``
   * - ``sam.Sam3``
     - SAM3 instance masks
     - ``postprocess_instance(...)``
     - ``restore_instance_mask_probs(...)``
   * - ``sam.Sam3``
     - SAM3 semantic masks
     - ``postprocess_semantic(...)``
     - ``restore_semantic_mask_probs(...)``

Recommended RankSEG options
---------------------------

SAM prompt and instance masks are naturally represented as per-mask binary
probability maps, so the adapters default to ``output_mode="multilabel"`` when
``rankseg_kwargs`` does not specify an output mode.

.. code-block:: python

   adapter = sam.Sam1(
       rankseg_kwargs={"metric": "dice", "solver": "RMA"}
   )

SAM1 prompt masks
-----------------

.. code-block:: python

   adapter = sam.Sam1(rankseg_kwargs={"metric": "dice"})
   preds = adapter.postprocess(
       outputs,
       original_sizes=inputs["original_sizes"],
       reshaped_input_sizes=inputs["reshaped_input_sizes"],
   )

``original_sizes`` and ``reshaped_input_sizes`` should come from the SAM
processor inputs. The adapter removes padding, resizes masks back to the
original image size, applies ``sigmoid``, and then calls RankSEG.

SAM2 prompt masks
-----------------

.. code-block:: python

   adapter = sam.Sam2(
       rankseg_kwargs={"metric": "dice"},
       apply_non_overlapping_constraints=False,
   )
   preds = adapter.postprocess(
       outputs,
       original_sizes=inputs["original_sizes"],
   )

Set ``apply_non_overlapping_constraints=True`` when you want lower-scoring
overlapping SAM2 masks to be suppressed before converting logits to
probabilities.

SAM3 instance masks
-------------------

.. code-block:: python

   adapter = sam.Sam3(rankseg_kwargs={"metric": "dice"}, threshold=0.3)
   results = adapter.postprocess_instance(
       outputs,
       target_sizes=target_sizes,
   )

The instance method returns one dictionary per image with ``scores``,
``boxes``, and ``masks``. ``threshold`` filters low-confidence instances before
RankSEG is applied to the remaining mask probabilities.

SAM3 semantic masks
-------------------

.. code-block:: python

   adapter = sam.Sam3(rankseg_kwargs={"metric": "dice"})
   preds = adapter.postprocess_semantic(
       outputs,
       target_sizes=target_sizes,
   )

The SAM adapters follow the official Transformers post-processing order through
geometry and score restoration. RankSEG replaces the final binary mask step.
For SAM3, callers choose ``postprocess_instance(...)`` or
``postprocess_semantic(...)`` explicitly.

Shape conventions
-----------------

.. list-table::
   :widths: 28 34 38
   :header-rows: 1

   * - Stage
     - Typical shape
     - Meaning
   * - Restored SAM prompt probabilities
     - ``(num_masks, 1, H, W)`` or compatible per-image tensors
     - One probability map per proposed prompt mask.
   * - Restored SAM3 instance probabilities
     - ``(num_instances, H, W)``
     - One probability map per retained instance.
   * - RankSEG prompt predictions
     - Binary mask tensors matching restored mask geometry
     - Final prompt masks after RankSEG post-processing.
   * - RankSEG semantic predictions
     - One tensor per image
     - Final semantic masks from SAM3 semantic probabilities.

Restored probabilities
----------------------

Use the ``restore_*`` methods when you need the restored mask probabilities
instead of final RankSEG predictions:

.. code-block:: python

   sam1_probs = sam.Sam1().restore_mask_probs(
       outputs,
       original_sizes=inputs["original_sizes"],
       reshaped_input_sizes=inputs["reshaped_input_sizes"],
   )

   sam2_probs = sam.Sam2().restore_mask_probs(
       outputs,
       original_sizes=inputs["original_sizes"],
   )

   sam3_instances = sam.Sam3(threshold=0.3).restore_instance_mask_probs(
       outputs,
       target_sizes=target_sizes,
   )

   sam3_semantic_probs = sam.Sam3().restore_semantic_mask_probs(
       outputs,
       target_sizes=target_sizes,
   )

Explicit adapter imports are also supported when you prefer shorter local
names:

.. code-block:: python

   from rankseg.integration.sam import Sam1, Sam2, Sam3

Current exclusions
------------------

The SAM integration does not currently support SAM video tracker state.

Executable tutorial
-------------------

The SAM-family notebook is the recommended way to learn this integration. It
runs SAM1, SAM2, and SAM3 examples in separate sections, compares the official
post-processing path with the RankSEG path, and keeps the model outputs shared
between the two paths so the replacement point is visible.

- `notebooks/rankseg_with_sam_family.ipynb `_
- `Open in Colab `_
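
Illustrative restoration sketch
-------------------------------

The SAM1 restoration order described on this page (remove padding, resize back
to the original image size, apply ``sigmoid``) can be illustrated with a small
standard-library sketch. This is not the adapter's implementation: the function
name ``restore_mask_probs_sketch`` is hypothetical, it works on nested lists
rather than tensors, and it uses nearest-neighbour resizing as a simplifying
assumption. It only shows why the un-padded size and the original size are both
needed before any probabilities reach RankSEG.

.. code-block:: python

   import math


   def restore_mask_probs_sketch(logits, reshaped_size, original_size):
       """Toy stand-in for the restoration step (hypothetical helper).

       logits: 2D list of floats in padded model space.
       reshaped_size: (height, width) of the un-padded region.
       original_size: (height, width) of the original image.
       """
       valid_h, valid_w = reshaped_size
       out_h, out_w = original_size

       # 1. Remove padding: keep only the top-left region holding image content.
       cropped = [row[:valid_w] for row in logits[:valid_h]]

       # 2. Resize back to image space (nearest neighbour, an assumption made
       #    for brevity in this sketch).
       resized = [
           [cropped[(y * valid_h) // out_h][(x * valid_w) // out_w]
            for x in range(out_w)]
           for y in range(out_h)
       ]

       # 3. Convert logits to probabilities with a sigmoid.
       return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in resized]


   # A 4x4 padded logit map whose valid content is the top-left 2x2 block.
   padded = [
       [4.0, -4.0, 0.0, 0.0],
       [-4.0, 4.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0],
   ]
   probs = restore_mask_probs_sketch(
       padded, reshaped_size=(2, 2), original_size=(4, 4)
   )

In this toy case the padded zeros never reach the output: every restored pixel
comes from the valid 2x2 block, which is the property that makes the restored
probabilities a meaningful input for the final mask step.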