DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

The Hebrew University of Jerusalem

*Indicates Equal Contribution

TL;DR

DyPE (Dynamic Position Extrapolation) enables pre-trained diffusion transformers to generate ultra-high-resolution images far beyond their training resolution. It dynamically adjusts positional encodings during denoising to match the evolving frequency content, achieving faithful 4K × 4K results without retraining or extra sampling cost.

Abstract

Diffusion Transformer models can generate images with remarkable fidelity and detail, yet training them at ultra-high resolutions remains extremely costly due to the self-attention mechanism's quadratic scaling with the number of image tokens. In this paper, we introduce Dynamic Position Extrapolation (DyPE), a novel, training-free method that enables pre-trained diffusion transformers to synthesize images at resolutions far beyond those of their training data, with no additional sampling cost. DyPE takes advantage of the spectral progression inherent to the diffusion process, where low-frequency structures converge early, while high frequencies take more steps to resolve. Specifically, DyPE dynamically adjusts the model's positional encoding at each diffusion step, matching its frequency spectrum with the current stage of the generative process. This approach allows us to generate images at resolutions that dramatically exceed the training resolution, e.g., 16 million pixels using FLUX. On multiple benchmarks, DyPE consistently improves performance and achieves state-of-the-art fidelity in ultra-high-resolution image generation, with gains becoming even more pronounced at higher resolutions.

FLUX vs. FLUX + DyPE Comparisons

Each pair shows the original FLUX result (left) and our FLUX + DyPE version (right).


“A white church with a golden dome stands against a bright blue sky, featuring arched windows and a cross on top, while a solitary figure walks nearby.”
“Neon signs illuminate a rainy street filled with people, a person in a white raincoat cycling, others walk with umbrellas and a man sits by a food stand.”
“A muscular, bald man holds a flower above his head with both arms, set against a soft circular background in black and white.”
“A gray pitcher with a handle filled with white flowers sits on a wooden surface beside three lemons and a small petal.”

How DyPE Works

Training diffusion transformers at ultra-high resolutions is costly, and static positional extrapolation ignores the low-to-high frequency progression of diffusion. DyPE makes extrapolation time-aware, matching how content emerges during denoising.

Key Observation

Early steps stabilize low-frequency structure; later steps refine high-frequency detail. DyPE follows this progression instead of using a fixed spectrum allocation throughout sampling.
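This progression can be made concrete with a toy calculation. The sketch below assumes a rectified-flow forward process x_t = (1 − t)·x_0 + t·ε (the kind of interpolation used by flow-matching models such as FLUX), a natural-image power spectrum that falls off as A/f², and unit-power white noise. Under these assumptions a frequency f is signal-dominated once (1 − t)²·A/f² > t², i.e., below the cutoff f_c(t) = √A·(1 − t)/t, which grows as t moves from 1 toward 0. The 1/f² spectrum and the function name `snr_cutoff` are illustrative assumptions, not part of DyPE.

```python
import math


def snr_cutoff(t: float, amplitude: float = 1.0) -> float:
    """Highest frequency whose signal power exceeds the noise power at time t.

    Toy model (assumption): x_t = (1 - t) * x0 + t * eps, signal power
    spectrum amplitude / f**2, unit-power white noise. Then
    SNR(f) = (1 - t)**2 * amplitude / (t**2 * f**2) > 1
    exactly when f < sqrt(amplitude) * (1 - t) / t.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in (0, 1]")
    return math.sqrt(amplitude) * (1.0 - t) / t


# Walk the sampler from pure noise (t = 1) toward the clean image (t -> 0):
# the band of resolved frequencies widens from coarse to fine.
for t in (1.0, 0.8, 0.6, 0.4, 0.2, 0.05):
    print(f"t = {t:.2f}  resolved frequencies: f < {snr_cutoff(t):.2f}")
```

At t = 1 nothing is resolved, and the cutoff rises steeply only in the final steps, which is the regime where the high-frequency detail DyPE targets emerges.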

Step-Aware Scaling κ(t)

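DyPE introduces a scheduler $\kappa(t) = \lambda_s \cdot t^{\lambda_t}$ that decays from strong scaling at the start of sampling to no scaling at the end, with t running from 1 (pure noise) to 0 (clean image). Intuition: early steps emphasize global structure, while late steps restore high-frequency capacity.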
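A minimal Python sketch of the schedule follows. It assumes t runs from 1 at the first (noisiest) step to 0 at the last, which is what makes κ(t) decay to zero. The function name `kappa`, the 10-step grid, and the default values λ_s = λ_t = 2 are hypothetical placeholders, not the settings used in the paper.

```python
def kappa(t: float, lambda_s: float = 2.0, lambda_t: float = 2.0) -> float:
    """Step-aware scaling kappa(t) = lambda_s * t ** lambda_t.

    t is the diffusion time, assumed to run from 1 (pure noise) to 0
    (clean image). Default hyperparameters are illustrative placeholders.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return lambda_s * t ** lambda_t


# Evaluate the schedule on a uniform 10-step grid from t = 1 down to t = 0.
num_steps = 10
for step in range(num_steps + 1):
    t = 1.0 - step / num_steps
    print(f"step {step:2d}  t = {t:.1f}  kappa = {kappa(t):.3f}")
```

λ_s sets the scaling strength at the first step, and a larger λ_t makes κ(t) fall off faster, so sampling returns to the training positional encoding earlier.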

Plug-and-Play Variants

  • Dy-NTK: frequency-aware NTK scaling in which the NTK exponent is multiplied by κ(t), relaxing toward the original training positional encoding (PE) late in sampling.
  • Dy-YaRN: YaRN’s ramps are shifted toward no scaling by the scheduler κ(t), while YaRN’s attention-temperature adjustment is retained.
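To illustrate how a Dy-NTK-style rule could plug into rotary positional encoding (RoPE), the sketch below starts from the common NTK-aware rule, which enlarges the RoPE base from b to b · s^{d/(d−2)} for an extrapolation factor s and per-head dimension d, and multiplies that exponent by κ(t) as described in the Dy-NTK bullet. It is a single-axis simplification: the base 10000, dimension 64, factor s = 4, the λ values, and all function names are assumptions for illustration, and the code does not reproduce FLUX's multi-axis RoPE or the authors' implementation. The frequencies come from a closed-form expression, so recomputing them at each step costs only a few elementwise operations.

```python
import numpy as np


def kappa(t: float, lambda_s: float = 1.0, lambda_t: float = 2.0) -> float:
    # Step-aware scheduler kappa(t) = lambda_s * t**lambda_t (placeholder values).
    return lambda_s * t ** lambda_t


def rope_inv_freq(dim: int, base: float = 10000.0) -> np.ndarray:
    # Standard RoPE frequencies theta_i = base**(-2i/dim), i = 0 .. dim/2 - 1.
    return base ** (-np.arange(0, dim, 2, dtype=np.float64) / dim)


def dy_ntk_inv_freq(dim: int, scale: float, t: float, base: float = 10000.0,
                    lambda_s: float = 1.0, lambda_t: float = 2.0) -> np.ndarray:
    # Static NTK-aware scaling would use base * scale**(dim / (dim - 2)).
    # Hypothetical Dy-NTK sketch: multiply that exponent by kappa(t), so
    # kappa = 0 recovers the training-resolution frequencies exactly.
    exponent = kappa(t, lambda_s, lambda_t) * dim / (dim - 2)
    return rope_inv_freq(dim, base * scale ** exponent)


if __name__ == "__main__":
    dim, scale = 64, 4.0  # e.g. 4x more positions per axis than in training
    train = rope_inv_freq(dim)
    for t in (1.0, 0.5, 0.0):
        freqs = dy_ntk_inv_freq(dim, scale, t)
        # A ratio below 1 means the lowest frequency is compressed
        # (interpolated); the highest frequency (index 0) always stays 1.0.
        print(f"t = {t:.1f}  kappa = {kappa(t):.2f}  "
              f"lowest-freq ratio = {freqs[-1] / train[-1]:.3f}")
```

With these placeholder values, the lowest frequency is divided by s = 4 at t = 1 (full NTK scaling) and returns to its training value at t = 0, while the highest frequency is never changed.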

Method Comparison (4K × 4K)

Each row shows the same prompt rendered by five methods: FLUX, NTK, Dy-NTK (ours), YaRN, and Dy-YaRN (ours).


“Woman with short hair and black dress stands in a forest, holding an owl with large, outstretched wings. The sunlight filters through the trees, highlighting her tattoo and the owl's detailed feathers.”
“A man with an afro hairstyle wears futuristic reflective sunglasses and a coat with fur lining, standing in front of a vibrant pink and blue neon sign.”
“A stone archway covered in ivy leads into misty forest, lined with lush greenery and ferns, mushrooms grow at the base of trees. Sunlight filters through the canopy, creating a serene atmosphere.”

This work is patent pending. For commercial use or licensing inquiries, please contact the authors.

BibTeX

@misc{issachar2025dypedynamicpositionextrapolation,
      title={DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion}, 
      author={Noam Issachar and Guy Yariv and Sagie Benaim and Yossi Adi and Dani Lischinski and Raanan Fattal},
      year={2025},
      eprint={2510.20766},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.20766},
}