MAMBO-G: Magnitude-Aware Mitigation for Boosted Guidance

ICML 2026 · Poster

Lossless conditional generation speedup by damping guidance updates that are too large for the current denoising state.

Training-free.

Plug-and-play with CFG.

3x lossless acceleration on SD3.5, 4x on Lumina, 2x on Wan2.1-14B.

Presented by Matrix Team — Efficient Generative Inference

Status: ICML 2026 poster · arXiv: 2508.03442 v4 · Method: training-free guidance · Diffusers: MagnitudeAwareGuidance

Authors & Affiliations

Shangwen Zhu¹*, Qianyu Peng²*, Zhilei Shu³*, Yuting Hu¹, Zhantao Yang¹, Han Zhang¹, Zhao Pu¹, Andy Zheng⁴, Xinyu Cui⁵, Jian Zhao⁶, Ruili Feng⁴†, Fan Cheng¹†

¹Shanghai Jiao Tong University · ²The University of Hong Kong · ³University of Science and Technology of China · ⁴University of Waterloo · ⁵Chinese Academy of Sciences · ⁶Zhongguancun Academy

* Equal contribution. † Corresponding authors.

Diffusers quickstart

Use MAMBO-G by swapping the guider

0 quality loss: no drop in qualitative examples or quantitative metrics.

1 guider swap · 2 API calls · up to 4x lossless speedup · 0 training required

Validated on SD3.5, Lumina, and Wan2.1-14B; Qwen-Image workflow included below. Broad Diffusers CFG coverage: Stable Diffusion/SDXL/SD3, FLUX, Qwen-Image, Lumina, Hunyuan-DiT, PixArt, Sana, Kolors, Wan, CogVideoX, and related pipelines.

Keep your pipeline unchanged. Replace CFG with MagnitudeAwareGuidance.

Minimal change

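from diffusers.guiders import MagnitudeAwareGuidance

# Assumes `pipeline` is an already-initialized Diffusers modular pipeline
# (the full Qwen-Image example below builds one).
pipeline.update_components(
    guider=MagnitudeAwareGuidance(
        guidance_scale=10.0,
        alpha=8.0,
        guidance_rescale=1.0,
    )
)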

Full Qwen-Image modular pipeline example

import torch
from diffusers.guiders import ClassifierFreeGuidance, MagnitudeAwareGuidance
from diffusers.modular_pipelines import SequentialPipelineBlocks
from diffusers.modular_pipelines.qwenimage import TEXT2IMAGE_BLOCKS

# 1. Build the Qwen-Image modular pipeline.
blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
pipeline = blocks.init_pipeline("YiYiXu/QwenImage-modular")
pipeline.load_components(torch_dtype=torch.bfloat16)
pipeline.to("cuda")

# 2. Keep the same prompt, size, steps, and seed for a fair comparison.
prompt = "a comic portrait of a female necromancer with big cute eyes, fine face, realistic shaded lighting, anime style"
width, height = 1328, 1328
num_inference_steps = 10
seed = 1
generator = torch.Generator("cuda").manual_seed(seed)

# 3. Baseline: standard Classifier-Free Guidance.
pipeline.update_components(
    guider=ClassifierFreeGuidance(guidance_scale=4.0)
)
image = pipeline(
    prompt=prompt,
    width=width,
    height=height,
    output="images",
    num_inference_steps=num_inference_steps,
    generator=generator,
)[0]
image.save(f"t2i_cfg_{num_inference_steps}_steps.png")

# 4. MAMBO-G: swap only the guider, then run the same workflow.
generator = torch.Generator("cuda").manual_seed(seed)
pipeline.update_components(
    guider=MagnitudeAwareGuidance(
        guidance_scale=10.0,
        alpha=8.0,
        guidance_rescale=1.0,
    )
)
image = pipeline(
    prompt=prompt,
    width=width,
    height=height,
    output="images",
    num_inference_steps=num_inference_steps,
    generator=generator,
)[0]
image.save(f"t2i_mambo_g_{num_inference_steps}_steps.png")

No retraining. Use the same model weights.

No sampler rewrite. Swap the guider only.

No extra branch. The ratio comes from the CFG predictions.

Teaser panels: CFG · 20 NFE, CFG · 60 NFE, MAMBO-G · 20 NFE.

MAMBO-G narrows the speed-quality gap. In the paper teaser, MAMBO-G reaches the quality of longer CFG image sampling using fewer function evaluations.

News

Latest updates on MAMBO-G.

ICML 2026: MAMBO-G accepted as an ICML 2026 poster.
arXiv v4: The latest HTML version is available at arXiv:2508.03442v4.
Diffusers: MagnitudeAwareGuidance is available through the Diffusers API path. The quickstart above shows the guider swap.

Idea

Guidance should be strong only when the state can absorb it

Classifier-Free Guidance improves prompt alignment, but early sampling states are dominated by noise. MAMBO-G watches the relative guidance magnitude and automatically damps high-risk updates.

The failure mode

At initialization, guidance directions for different seeds are almost identical for the same prompt. Pushing every random noise sample along this generic direction with a large scale can overshoot the data manifold.

The signal

r_t = ||v_cond - v_uncond|| / ||v_uncond||

A high ratio means the conditional update is large compared with the model's own denoising velocity.
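As a rough illustration of the signal, the ratio can be computed per sample from the two predictions CFG already produces. The sketch below is pure Python on flattened toy vectors; the function names and the `eps` safeguard are assumptions for illustration and are not taken from the paper or from Diffusers.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def guidance_ratio(v_cond, v_uncond, eps=1e-8):
    """r_t = ||v_cond - v_uncond|| / ||v_uncond|| for one sample.

    eps is a hypothetical safeguard against a zero denominator.
    """
    diff = [c - u for c, u in zip(v_cond, v_uncond)]
    return l2_norm(diff) / (l2_norm(v_uncond) + eps)


# Toy flattened predictions for a single sample.
v_uncond = [3.0, 4.0]  # norm 5
v_cond = [3.0, 7.0]    # differs from v_uncond by (0, 3), norm 3
ratio = guidance_ratio(v_cond, v_uncond)
print(f"r_t = {ratio:.3f}")
```

In a real pipeline the same computation would run on each sample's full prediction tensor, so every item in a batch gets its own ratio.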

The fix

MAMBO-G maps that ratio to a sample-wise guidance scale: aggressive when the update is safe, exponentially damped when the update is disproportionate.

Method

A small ratio gate in the sampling loop

No retraining, no new model branch, no architectural change. MAMBO-G only changes how strongly CFG is applied at each step and for each sample.

Measure relative guidance strength

Compute the magnitude ratio between the CFG update and the unconditional velocity at the current denoising step.

Damp risky outliers

Use an exponential schedule so large ratios receive stronger suppression, while normal updates retain boosted guidance.

Keep the pipeline unchanged

The method remains compatible with existing flow and diffusion samplers and can stack with other guidance stabilization methods.
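The three steps above can be sketched per sample as follows. This is a minimal pure-Python illustration, not the Diffusers implementation: the function names, the `eps` safeguard, and the default `w_max` and `alpha` values (borrowed from the quickstart's `guidance_scale` and `alpha`) are assumptions, and the final combination assumes the standard CFG form v_uncond + w · (v_cond - v_uncond) with the exponential gate w(r_t) = 1 + (w_max - 1) · exp(-α r_t).

```python
import math


def _norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def magnitude_aware_cfg(v_cond, v_uncond, w_max=10.0, alpha=8.0, eps=1e-8):
    """Combine one sample's predictions with a ratio-gated guidance scale.

    Returns (guided_prediction, scale_used).
    """
    # Step 1: measure relative guidance strength.
    delta = [c - u for c, u in zip(v_cond, v_uncond)]
    ratio = _norm(delta) / (_norm(v_uncond) + eps)
    # Step 2: exponential gate; large ratios pull the scale toward 1.
    scale = 1.0 + (w_max - 1.0) * math.exp(-alpha * ratio)
    # Step 3: standard CFG combination (assumed), using the adaptive scale.
    guided = [u + scale * d for u, d in zip(v_uncond, delta)]
    return guided, scale


def magnitude_aware_cfg_batch(batch_cond, batch_uncond, **kwargs):
    """Apply the gate independently to every sample in a batch."""
    return [
        magnitude_aware_cfg(c, u, **kwargs)
        for c, u in zip(batch_cond, batch_uncond)
    ]


# Two toy samples: a small update and a disproportionately large one.
batch_uncond = [[3.0, 4.0], [3.0, 4.0]]
batch_cond = [[3.0, 4.05], [3.0, 9.0]]
results = magnitude_aware_cfg_batch(batch_cond, batch_uncond)
for guided, scale in results:
    print(f"scale={scale:.3f} guided={[round(g, 3) for g in guided]}")
```

The first sample keeps a scale close to `w_max`, while the second is damped to a scale close to 1, which reduces the combination to roughly the conditional prediction.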

Adaptive guidance scale: w(r_t) = 1 + (w_max - 1) · exp(-α · r_t)

When r_t is small, the scale approaches w_max. When r_t is large, the scale relaxes toward 1 to prevent unstable early amplification.
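Both limits follow directly from the formula, and the derivative shows the scale decreases monotonically in the ratio. The worked values assume w_max = 10 and α = 8, borrowed from the quickstart's `guidance_scale` and `alpha`; reading `guidance_scale` as w_max is an assumption, not something this page states.

```latex
\begin{align*}
w(0) &= 1 + (w_{\max} - 1)\, e^{0} = w_{\max}, \\
\lim_{r_t \to \infty} w(r_t) &= 1 + (w_{\max} - 1) \cdot 0 = 1, \\
\frac{dw}{dr_t} &= -\alpha\,(w_{\max} - 1)\, e^{-\alpha r_t} < 0
  \quad (\alpha > 0,\ w_{\max} > 1).
\end{align*}
% Worked values, assuming w_max = 10 and alpha = 8:
\begin{align*}
w(0.1) &= 1 + 9\, e^{-0.8} \approx 5.04, \\
w(0.5) &= 1 + 9\, e^{-4} \approx 1.16.
\end{align*}
```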

Diagnosis

The ratio identifies when guidance is unsafe

The paper backs the schedule with a compact chain of measurements: generic initial directions, a large early ratio, lower quality for high-ratio samples, and an exponential relation between ratio and optimal guidance scale.

Cosine similarity of guidance updates across seeds — Initial guidance updates collapse to nearly the same direction across random seeds.

Ratio dynamics over sampling steps — The relative guidance ratio peaks early, exactly where zero-SNR sampling is most fragile.

ImageReward distributions for low-ratio and high-ratio groups — Low-ratio samples score higher, validating the ratio as a stability indicator.

Optimal guidance scale versus ratio — The best guidance scale decays as the ratio grows, motivating the exponential gate.
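The first measurement in this chain, direction agreement across seeds, can be approximated with a mean pairwise cosine similarity. The sketch below uses made-up two-dimensional vectors in place of real guidance updates (v_cond - v_uncond); the helper names and the toy data are hypothetical and only illustrate the statistic, not the paper's exact protocol.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two flat, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pairwise_cosine(updates):
    """Average cosine similarity over all unordered pairs of updates."""
    pairs = list(combinations(updates, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical guidance updates for three seeds of the same prompt.
aligned = [[1.0, 0.0], [2.0, 0.1], [0.5, -0.02]]  # nearly the same direction
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # unrelated directions

print(f"aligned: {mean_pairwise_cosine(aligned):.3f}")
print(f"spread:  {mean_pairwise_cosine(spread):.3f}")
```

A value near 1 at the first step would correspond to the collapse described above, where every seed is pushed along almost the same generic direction.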

Results

Lossless acceleration with fewer sampling steps

MAMBO-G is evaluated on text-to-image models, text-to-video models, high-resolution generation, scheduler variants, and combinations with other guidance methods.

3x lossless speedup on SD3.5

4x lossless speedup on Lumina

2x lossless speedup on Wan2.1-14B

1024²: stable high-resolution Qwen-Image samples

Text-to-image

SD3.5 and Lumina improve in low-step regimes

Figures: SD3.5 · ImageReward, SD3.5 · CLIPScore, Lumina · ImageReward, Lumina · CLIPScore (comparison plots).

Text-to-video

Video quality benefits from the same magnitude-aware damping

Figures: vBench · Aesthetic quality, vBench · Imaging quality (comparison plots); MAMBO-G video sample strip.

Resolution stress test

Baseline CFG degrades with dimensionality; MAMBO-G stays coherent

Figure: sample grid comparing CFG and MAMBO-G at resolutions 256, 512, 768, and 1024.

Resolution	CFG ImageReward	MAMBO-G ImageReward
256 x 256	0.53	0.83
512 x 512	0.63	1.10
768 x 768	0.30	1.07
1024 x 1024	0.20	1.02
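The trend in the table is easier to read as a per-resolution gap. The short script below only re-expresses the numbers reported above; the variable names are illustrative.

```python
# (resolution, CFG ImageReward, MAMBO-G ImageReward), copied from the table.
rows = [
    (256, 0.53, 0.83),
    (512, 0.63, 1.10),
    (768, 0.30, 1.07),
    (1024, 0.20, 1.02),
]

# Gap between MAMBO-G and CFG at each resolution, rounded to two decimals.
gaps = {res: round(mambo - cfg, 2) for res, cfg, mambo in rows}

for res, gap in gaps.items():
    print(f"{res} x {res}: MAMBO-G leads by {gap:.2f}")
```

The gap grows with resolution in these numbers: the CFG score falls off above 512 x 512, while the MAMBO-G score stays above 1.0 from 512 x 512 upward.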

Release

Designed for practical adoption

The method is intentionally small: it only needs the conditional and unconditional predictions already computed by CFG.

Plug-and-play API path

The paper notes that the implementation follows mainstream open-source standards and has been merged into the Hugging Face Diffusers workflow. The quickstart at the top of this page shows the guider swap.

Orthogonal to guidance rescaling

MAMBO-G stacks with Guidance Rescale and Adaptive Projection Guidance because it controls scale rather than redesigning the guidance direction.

Baseline CFG	0.12
Rescale	0.73
Rescale + MAMBO-G	1.12
APG	0.85
APG + MAMBO-G	0.96

Robust defaults

Ablations show that the exponential mapping works best among tested schedules, while performance stays stable over broad hyperparameter regions.

Citation

BibTeX

@misc{zhu2025mambog,
  title        = {MAMBO-G: Magnitude-Aware Mitigation for Boosted Guidance},
  author       = {Shangwen Zhu and Qianyu Peng and Zhilei Shu and Yuting Hu and Zhantao Yang and Han Zhang and Zhao Pu and Andy Zheng and Xinyu Cui and Jian Zhao and Ruili Feng and Fan Cheng},
  year         = {2025},
  eprint       = {2508.03442},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url          = {https://arxiv.org/abs/2508.03442}
}