Transformers
This page documents the RankSEG integration path for standard Hugging Face
transformers semantic-segmentation outputs.
Use this path when you already run inference through a standard
processor -> model -> outputs workflow and want RankSEG to replace the
final argmax-style prediction step.
Where RankSEG fits
Hugging Face segmentation models do not all expose the same raw tensor field.
Some return outputs.logits; query-based models return class-query and mask
logits; some processors also own model-specific resizing logic. The RankSEG
Transformers helper keeps the official inference flow intact and handles the
last post-processing step:
processor(images) -> model(**inputs) -> restore semantic probabilities
-> RankSEG -> prediction masks
from rankseg.integration import transformers
The main helper is:
transformers.postprocess(
outputs,
*,
model=None,
target_sizes=None,
rankseg_kwargs=None,
)
Its role is intentionally narrow:
- restore probabilities from supported Hugging Face output families;
- resize them to the original image size when needed;
- apply RankSEG as the final post-processing step.
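To make the three steps concrete, here is a minimal NumPy sketch for the plain outputs.logits family only. It is not the rankseg implementation: every function name is hypothetical, nearest-neighbour resizing stands in for the bilinear interpolation a real pipeline would use, and a per-pixel argmax stands in for the RankSEG step.

```python
import numpy as np


def softmax_over_classes(logits):
    """Turn (C, h, w) logits into per-pixel class probabilities."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def resize_nearest(probs, target_size):
    """Nearest-neighbour resize of (C, h, w) to (C, H, W).

    Stand-in for the bilinear interpolation a real pipeline would use.
    """
    _, h, w = probs.shape
    height, width = target_size  # (height, width), as in target_sizes
    rows = np.arange(height) * h // height
    cols = np.arange(width) * w // width
    return probs[:, rows[:, None], cols[None, :]]


def argmax_decision(probs):
    """Placeholder final step: RankSEG would replace this per-pixel argmax."""
    return probs.argmax(axis=0)


def postprocess_sketch(batch_logits, target_sizes, decide=argmax_decision):
    """Restore probabilities, resize to each target size, then decide per image."""
    preds = []
    for logits, size in zip(batch_logits, target_sizes):
        probs = resize_nearest(softmax_over_classes(logits), size)
        preds.append(decide(probs))
    return preds  # one (H, W) label mask per image


# Fake batch: 2 images, 3 classes, 4x5 low-resolution logits.
rng = np.random.default_rng(0)
batch_logits = rng.normal(size=(2, 3, 4, 5))
preds = postprocess_sketch(batch_logits, target_sizes=[(8, 10), (4, 5)])
print([p.shape for p in preds])  # [(8, 10), (4, 5)]
```

Because softmax is monotonic within each pixel, an argmax over probabilities picks the same class as an argmax over logits. The argmax baseline can therefore skip the restore step, while a probability-based final step such as RankSEG cannot.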
Helper arguments
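| Argument or return | Shape or type | Meaning |
|---|---|---|
| `outputs` | Structured Transformers output | The object returned by `model(**inputs)`. |
| `model` | Optional Transformers model | Required for output families whose semantic reconstruction depends on the model configuration. |
| `target_sizes` | List or tensor of `(height, width)` | One original output size per image. For a PIL image, use `image.size[::-1]`. |
| `rankseg_kwargs` | Optional dict | Options for the RankSEG step. Example: `{"metric": "dice", "solver": "RMA"}` |
| Return value | Per-image list | One predicted mask per input image. |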
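The most common target_sizes mistake is passing PIL's size unchanged: PIL's Image.size is (width, height), while target_sizes expects (height, width). A small sketch of the conversion for a batch follows; the helper name is hypothetical, and plain tuples stand in for image.size so the example runs without PIL.

```python
def target_sizes_from_pil_sizes(pil_sizes):
    """Hypothetical helper: convert PIL (width, height) sizes to (height, width)."""
    return [tuple(size[::-1]) for size in pil_sizes]


# Stand-ins for image.size of two PIL images: 640 wide x 480 tall, 1000 wide x 1500 tall.
sizes = [(640, 480), (1000, 1500)]
target_sizes = target_sizes_from_pil_sizes(sizes)
print(target_sizes)  # [(480, 640), (1500, 1000)]
```

The list keeps one entry per image, in batch order, which matches the per-image list that the helper returns.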
Minimal integration
The standard Hugging Face inference structure stays the same. The only
integration change happens after outputs = model(**inputs).
from transformers import SegformerImageProcessor, AutoModelForSemanticSegmentation
from rankseg.integration import transformers
from PIL import Image
import requests
processor = SegformerImageProcessor.from_pretrained("mattmdjaga/segformer_b2_clothes")
model = AutoModelForSemanticSegmentation.from_pretrained("mattmdjaga/segformer_b2_clothes")
image = Image.open(requests.get("https://plus.unsplash.com/premium_photo-1673210886161-bfcc40f54d1f?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxzZWFyY2h8MXx8cGVyc29uJTIwc3RhbmRpbmd8ZW58MHx8MHx8&w=1000&q=80", stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
preds = transformers.postprocess(
outputs,
target_sizes=[image.size[::-1]],
rankseg_kwargs={"metric": "dice"},
)
For supported output families, transformers.postprocess(...) leaves the
surrounding Hugging Face inference code unchanged and replaces only the final
prediction step. SAM-family outputs are intentionally not handled by this
helper; instead, run from rankseg.integration import sam and use sam.Sam1,
sam.Sam2, or sam.Sam3.
Compare with the usual argmax step
The usual SegFormer-style baseline is:
import torch.nn.functional as F
upsampled_logits = F.interpolate(
outputs.logits,
size=image.size[::-1],
mode="bilinear",
align_corners=False,
)
baseline_pred = upsampled_logits.argmax(dim=1)[0]
The RankSEG version asks the helper to restore probabilities and then produce the final prediction:
rankseg_pred = transformers.postprocess(
outputs,
target_sizes=[image.size[::-1]],
rankseg_kwargs={"metric": "dice", "solver": "RMA"},
)[0]
This is the intended replacement point: the processor, model, checkpoint, and input preparation remain unchanged.
Advanced probability helper
The namespace also exposes:
from rankseg.integration import transformers
transformers.restore_semantic_probs(...) returns restored semantic
probability maps directly as a per-image list of (C, H, W) tensors. Use it
when you need probability tensors instead of final RankSEG predictions.
Pass target_sizes as one (height, width) entry per batch item, for
example target_sizes=[image.size[::-1]] for a single PIL image.
transformers.postprocess(...) follows the same per-image list convention
for prediction outputs.
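As an illustration of working with that per-image list, the sketch below checks each (C, H, W) map against its (height, width) entry and derives an argmax mask and a top-class confidence map. It is hypothetical: NumPy arrays stand in for the torch tensors that restore_semantic_probs(...) returns (with torch you would write probs.argmax(dim=0) and probs.max(dim=0).values), and summarize_prob_maps is not part of rankseg.

```python
import numpy as np


def summarize_prob_maps(prob_maps, target_sizes):
    """Check each (C, H, W) map against its (height, width) entry and summarize it."""
    if len(prob_maps) != len(target_sizes):
        raise ValueError("expected one probability map per target size")
    summaries = []
    for probs, (height, width) in zip(prob_maps, target_sizes):
        num_classes, h, w = probs.shape
        if (h, w) != (height, width):
            raise ValueError(f"map is {h}x{w}, expected {height}x{width}")
        summaries.append({
            "num_classes": num_classes,
            "argmax_mask": probs.argmax(axis=0),  # baseline labels, shape (H, W)
            "confidence": probs.max(axis=0),      # top-class probability, shape (H, W)
        })
    return summaries


# Two fake, normalized probability maps standing in for restore_semantic_probs(...) output.
rng = np.random.default_rng(0)
raw = [rng.random((3, 4, 6)), rng.random((3, 2, 2))]
prob_maps = [r / r.sum(axis=0, keepdims=True) for r in raw]
summaries = summarize_prob_maps(prob_maps, [(4, 6), (2, 2)])
print(summaries[0]["argmax_mask"].shape)  # (4, 6)
```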
Explicit helper imports are also supported when you prefer shorter local names:
from rankseg.integration.transformers import postprocess, restore_semantic_probs
Supported output families
The standard Transformers helper supports the main semantic-segmentation
output families used by transformers:
- outputs.logits
- outputs.class_queries_logits + outputs.masks_queries_logits
- outputs.logits + outputs.pred_masks
- outputs.semantic_seg
When a branch requires model-specific handling, pass model=... so the
helper can follow the corresponding official post-processing behavior.
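One way to picture how the four families differ is by which output fields are populated. The sketch below is a hypothetical classifier, not the helper's actual dispatch logic; SimpleNamespace stands in for real Hugging Face output objects, and a field set to None is treated as absent.

```python
from types import SimpleNamespace


def detect_output_family(outputs):
    """Hypothetical: name the supported family by which output fields are set."""

    def has(name):
        # Hugging Face output classes typically default unused fields to None.
        return getattr(outputs, name, None) is not None

    # Check the two-field families first so the bare `logits` case
    # does not swallow them.
    if has("class_queries_logits") and has("masks_queries_logits"):
        return "class_queries_logits + masks_queries_logits"
    if has("logits") and has("pred_masks"):
        return "logits + pred_masks"
    if has("semantic_seg"):
        return "semantic_seg"
    if has("logits"):
        return "logits"
    raise ValueError("unrecognized segmentation output structure")


print(detect_output_family(SimpleNamespace(logits="...")))                   # logits
print(detect_output_family(SimpleNamespace(logits="...", pred_masks="...")))  # logits + pred_masks
```

The ordering matters: outputs.logits appears in two families, so the more specific combinations have to be tested before the plain logits case.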
Output defaults
If rankseg_kwargs omits output_mode, the helper chooses:
- "multiclass" when the restored semantic probability map has more than one class channel;
- "multilabel" when the restored probability map has one channel.
You can override this explicitly:
preds = transformers.postprocess(
outputs,
target_sizes=[image.size[::-1]],
rankseg_kwargs={
"metric": "dice",
"solver": "RMA",
"output_mode": "multiclass",
},
)
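The default rule and the explicit override can be summarized in a few lines. This is a hypothetical mirror of the documented behavior, not rankseg code: resolve_rankseg_kwargs and its num_channels parameter are illustrative names, and the caller's dict is copied so it is never mutated.

```python
def resolve_rankseg_kwargs(rankseg_kwargs, num_channels):
    """Hypothetical mirror of the documented output_mode default."""
    kwargs = dict(rankseg_kwargs or {})  # copy, so the caller's dict is untouched
    if "output_mode" not in kwargs:
        # More than one class channel -> multiclass; a single channel -> multilabel.
        kwargs["output_mode"] = "multiclass" if num_channels > 1 else "multilabel"
    return kwargs


print(resolve_rankseg_kwargs({"metric": "dice"}, num_channels=5))
# {'metric': 'dice', 'output_mode': 'multiclass'}
print(resolve_rankseg_kwargs(None, num_channels=1))
# {'output_mode': 'multilabel'}
print(resolve_rankseg_kwargs({"output_mode": "multilabel"}, num_channels=5))
# {'output_mode': 'multilabel'}
```

An explicit output_mode always wins; the channel count only matters when the key is missing.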
Current exclusions
The simplified API does not currently support:
- SAM-family outputs, which use the explicit adapters in rankseg.integration.sam;
- outputs with patch_offsets that require official patch-merge logic;
- tuple-style outputs such as return_dict=False returns;
- custom unstructured outputs from trust_remote_code=True models;
- SegGPT-style pred_masks semantic reconstruction.
These cases should fail explicitly rather than silently using an incorrect semantic restoration path.
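A fail-fast guard for some of these exclusions might look like the sketch below. It is hypothetical and deliberately incomplete: the function name is illustrative, treating "pred_masks without logits" as SAM- or SegGPT-style is an assumption made for this example, and custom trust_remote_code=True outputs are not detected at all.

```python
from types import SimpleNamespace


def reject_unsupported(outputs):
    """Hypothetical guard: raise on excluded output shapes instead of guessing."""
    if isinstance(outputs, tuple):
        raise TypeError("tuple outputs (return_dict=False) are not supported")
    if getattr(outputs, "patch_offsets", None) is not None:
        raise ValueError("outputs with patch_offsets need the official patch-merge logic")
    if getattr(outputs, "pred_masks", None) is not None and getattr(outputs, "logits", None) is None:
        # Assumption for this sketch: pred_masks without logits is SAM- or SegGPT-style.
        raise ValueError("pred_masks-only outputs are not supported; use rankseg.integration.sam for SAM models")
    return outputs


ok = reject_unsupported(SimpleNamespace(logits="..."))  # passes through unchanged
```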
Executable tutorial
The notebook below is written as a user-facing tutorial: it first runs the official Hugging Face baseline, then repeats the same inference flow with only the final post-processing step replaced by RankSEG.
The maintained script version is: