Uniform time-splitting creates localized capacity bottlenecks. Standard diffusion models deploy a single network uniformly across all timesteps, even though no individual denoising regime requires such a large capacity on its own. As demonstrated above, naively splitting the diffusion timeline into equal time intervals (boundaries at $t=0.33$ and $t=0.67$) can force one sub-network to bear a disproportionate share of the generative workload while the others remain severely underutilized.
Complexity-Balanced Splitting (CBS) addresses this by applying de Boor's equidistribution principle to partition the diffusion timeline into segments of equal approximation burden. By shifting the boundaries to follow the local modeling complexity, CBS ensures that each sub-network takes on an equal share of the representational burden. This minimizes the maximal local approximation error, yielding a more uniformly accurate flow field and better sample quality.
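The equidistribution step itself takes only a few lines. Given a monitor $m(t)$ sampled on a time grid, de Boor's principle places the $K-1$ interior boundaries where the cumulative integral $\int_0^{t} m(s)\,ds$ reaches $k/K$ of its total. The following is a minimal NumPy sketch, assuming $m(t) > 0$ on the open interval so that the cumulative integral is strictly increasing; the function names `equidistribute` and `segment_burden` and the toy monitor $m(t)=2t$ used below are hypothetical and are not the paper's estimated complexity profile.

```python
import numpy as np


def _cumulative(t, m):
    """Trapezoidal cumulative integral of m over the grid t, starting at 0."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (m[1:] + m[:-1]) * np.diff(t))])


def equidistribute(t, m, num_segments):
    """Interior boundaries splitting [t[0], t[-1]] into num_segments pieces
    of equal integrated monitor mass (de Boor's equidistribution).

    t: increasing 1-D time grid; m: monitor values m(t) on that grid,
    assumed positive between grid points so the cumulative curve is monotone.
    """
    t = np.asarray(t, dtype=float)
    m = np.asarray(m, dtype=float)
    cum = _cumulative(t, m)
    # Target mass levels k/K of the total, for k = 1..K-1.
    targets = cum[-1] * np.arange(1, num_segments) / num_segments
    # Invert the monotone cumulative curve by linear interpolation.
    return np.interp(targets, cum, t)


def segment_burden(t, m, boundaries):
    """Integrated monitor mass carried by each segment between the boundaries."""
    t = np.asarray(t, dtype=float)
    m = np.asarray(m, dtype=float)
    cum = _cumulative(t, m)
    edges = np.concatenate([[t[0]], np.asarray(boundaries, dtype=float), [t[-1]]])
    return np.diff(np.interp(edges, t, cum))
```

With $m(t)=2t$ on $[0,1]$, the uniform split at $1/3$ and $2/3$ gives segment burdens of $1/9$, $3/9$ and $5/9$, so the last sub-network carries five times the load of the first. Equidistribution instead moves the boundaries to $\sqrt{1/3}\approx 0.577$ and $\sqrt{2/3}\approx 0.816$, giving every segment a burden of $1/3$. Because the cumulative curve is inverted directly, no search over candidate splits is needed.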
TL;DR
Complexity-Balanced Splitting (CBS) is a principled framework for temporal capacity allocation in diffusion models. By mathematically partitioning the diffusion timeline based on local approximation burden (e.g., path acceleration), CBS significantly improves image synthesis quality, reducing FID by ~35% on SiT-XL with CFG relative to naive temporal partitioning, without increasing per-step inference cost.
Abstract
Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative timeline is inherently inefficient. In this work, we propose Complexity-Balanced Splitting (CBS), a principled framework for temporal capacity allocation that distributes the generative workload across multiple specialized sub-networks. Grounded in function approximation theory and de Boor's equidistribution principle, CBS partitions the diffusion timeline into segments of equal approximation burden, allocating more representational capacity to regions where the generative dynamics are more difficult to model. To estimate this local complexity, we introduce two complementary and tractable monitor functions: a spatial measure based on the flow's Dirichlet energy, and a geometric measure based on the acceleration of the sampling trajectories. Using a lightweight auxiliary model to estimate these complexity profiles, our approach eliminates the need for heuristic temporal splits or computationally expensive search procedures. Extensive evaluation across multiple architectures (SiT, JiT, and UNet) and datasets demonstrates that CBS consistently improves synthesis quality without increasing per-step inference cost. In particular, CBS improves FID by ~35% on SiT-XL with CFG relative to naive temporal partitioning.
Maximal Local Errors Govern ODE Approximation
Figure 1: Illustration of how maximal local errors govern ODE approximation. Once a large error is encountered, the resulting path diverges from its true course and the misalignment is maintained forward in time, even if the flow is fairly accurate in the subsequent steps.
When a generative model integrates a learned flow field along a path during sampling, local approximation errors can compound severely. Standard training objectives minimize the expected error averaged across the entire denoising interval, which is fundamentally different from minimizing the maximum instantaneous error.
By contrast, CBS explicitly splits the generation timeline to bound these peak errors, since minimizing the maximal error bound targets sampling quality directly.
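A standard Grönwall argument gives one way to see why peak local errors matter. Assume (these are our assumptions, not the paper's stated setting) that the true flow $v(x,t)$ is $L$-Lipschitz in $x$, that the learned flow $\hat v$ satisfies $\sup_x \|\hat v(x,t) - v(x,t)\| \le \varepsilon(t)$, that both ODEs are integrated exactly from the same starting point over $t\in[0,1]$, and let $e(t) = \hat x_t - x_t$ be the gap between the learned and true trajectories:

```latex
\begin{aligned}
\frac{d}{dt}\,\|e(t)\|
  &\le \|\hat v(\hat x_t, t) - v(\hat x_t, t)\| + \|v(\hat x_t, t) - v(x_t, t)\|
   \le \varepsilon(t) + L\,\|e(t)\|, \qquad e(0) = 0, \\
\|e(1)\|
  &\le \int_0^1 e^{L(1-s)}\,\varepsilon(s)\,ds
   \;\le\; \frac{e^{L} - 1}{L}\,\max_{s \in [0,1]} \varepsilon(s).
\end{aligned}
```

The weight $e^{L(1-s)} \ge 1$ means an error made at time $s$ is carried to the endpoint and possibly amplified, as in Figure 1, and the final bound depends on the local errors only through their maximum, which is the quantity that balancing the per-segment burden aims to control.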
Tractable Complexity Monitors
To evenly distribute the representational workload, we need to quantify the local approximation burden of the target flow. We introduce two computable, mathematically grounded monitor functions $m(t)$ to measure this complexity.
1. Dirichlet Spectral Energy
Barron's theorem bounds a network's approximation error in terms of the spectral complexity of the target function (its Barron constant). Because the instantaneous vector field $v_t$ is too high-dimensional for its Barron constant $C_{v_t}$ to be evaluated directly, we derive a formal bound on $C_{v_t}$ from the vector field's global spatial variation, as measured by its Dirichlet energy.
This establishes a direct, computable monitor function based on the global spatial roughness of the flow and the expected approximation error for a parameter budget $n$:
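The Dirichlet energy can be estimated without ever forming Jacobians explicitly. The sketch below assumes the energy is defined as $\mathbb{E}_{x\sim p_t}\|\nabla_x v_t(x)\|_F^2$ over samples from the marginal $p_t$; this definition, the name `dirichlet_energy`, and the finite-difference step `delta` are illustrative assumptions, and the mapping from this energy to $m(t)$ for a budget $n$ is left unspecified here. It relies on Hutchinson's identity $\|J\|_F^2 = \mathbb{E}_{\epsilon\sim\mathcal N(0,I)}\|J\epsilon\|^2$, approximating each Jacobian-vector product by a central difference, so each probe costs two forward evaluations of the velocity model.

```python
import numpy as np


def dirichlet_energy(v, x, t, num_probes=8, delta=1e-3, rng=None):
    """Monte Carlo estimate of E_x ||grad_x v(x, t)||_F^2 over the batch x.

    v: callable (x of shape (B, d), scalar t) -> velocities of shape (B, d).
    x: samples assumed to be drawn from the marginal p_t, shape (B, d).
    Uses Hutchinson's identity ||J||_F^2 = E_eps ||J eps||^2 with
    eps ~ N(0, I), approximating each Jacobian-vector product J eps by a
    central finite difference of v along eps.
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(num_probes):
        eps = rng.standard_normal(x.shape)
        jvp = (v(x + delta * eps, t) - v(x - delta * eps, t)) / (2.0 * delta)
        # Squared norm of J eps per sample, averaged over the batch.
        total += np.mean(np.sum(jvp ** 2, axis=1))
    return total / num_probes
```

Evaluating this on a grid of timesteps, with $x_t$ drawn from the interpolant at each $t$, yields a spatial complexity profile. For a linear field $v(x)=Ax$ the estimate converges to $\|A\|_F^2$, a convenient sanity check; with an autodiff framework, an exact JVP could replace the finite difference.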
2. Path Acceleration
Alternatively, we can evaluate the geometric complexity of the sampling trajectories induced by the flow field through their second-order time derivative. Using the first derivative (velocity) confounds geometric complexity with traversal speed.
Instead, we bound the error using the path's acceleration, which filters out constant-velocity displacement and isolates the curvature of the path:
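The acceleration monitor can likewise be sketched by simulating trajectories and differencing them twice in time. This NumPy sketch assumes explicit Euler integration on a uniform grid and takes $m(t)=\mathbb{E}\|\ddot x_t\|$, the mean acceleration norm over a batch; the integrator, the choice of norm, and the name `path_acceleration_profile` are illustrative assumptions rather than the paper's exact estimator, and `v` stands in for the lightweight auxiliary flow model.

```python
import numpy as np


def path_acceleration_profile(v, x0, num_steps=100, t0=0.0, t1=1.0):
    """Integrate dx/dt = v(x, t) with explicit Euler and return (t_interior, m),
    where m[k] is the batch-mean acceleration norm at interior grid time t_k,
    estimated with a second-order central difference.

    v: callable (x of shape (B, d), scalar t) -> velocities of shape (B, d).
    x0: starting samples of shape (B, d).
    """
    ts = np.linspace(t0, t1, num_steps + 1)
    h = ts[1] - ts[0]
    xs = [np.asarray(x0, dtype=float)]
    for k in range(num_steps):
        xs.append(xs[-1] + h * v(xs[-1], ts[k]))
    xs = np.stack(xs)  # shape (num_steps + 1, B, d)
    # x''(t_k) ~ (x_{k+1} - 2 x_k + x_{k-1}) / h^2 at interior points.
    acc = (xs[2:] - 2.0 * xs[1:-1] + xs[:-2]) / h ** 2
    m = np.linalg.norm(acc, axis=2).mean(axis=1)
    return ts[1:-1], m
```

A trajectory moving at constant velocity (for example `v = lambda x, t: np.ones_like(x)`) yields a profile that is zero up to round-off, illustrating how the second derivative discards uniform displacement that a velocity-based measure would still count. Second differences amplify noise by a factor of $1/h^2$, so with a noisy or coarsely integrated model some temporal smoothing of the profile may be needed.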
Quantitative Results (ImageNet-256)
Evaluation of Scalable Interpolant Transformers (SiT) on latent ImageNet-256. CBS yields consistent improvements across all model sizes, achieving lower FID while keeping the active parameter count and per-step inference cost identical to the monolithic baseline.
| Model | # Act. Par. (M) | GFLOPS | FID ↓ (w/o CFG) | IS ↑ (w/o CFG) | Prec. ↑ (w/o CFG) | Rec. ↑ (w/o CFG) | FID ↓ (CFG) | IS ↑ (CFG) | Prec. ↑ (CFG) | Rec. ↑ (CFG) |
|---|---|---|---|---|---|---|---|---|---|---|
| SiT-S/2 (Baseline) | 33 | 6.06 | 58.97 | 23.34 | 0.40 | 0.59 | 30.10 | 49.82 | 0.57 | 0.50 |
| SiT-S/2 (Uniform: 0.33, 0.66) | 33 | 6.06 | 58.82 | 23.23 | 0.40 | 0.59 | 29.83 | 52.17 | 0.57 | 0.51 |
| SiT-S/2 Ours (0.4, 0.77) | 33 | 6.06 | 50.87 | 29.57 | 0.44 | 0.62 | 18.61 | 72.73 | 0.62 | 0.51 |
| SiT-B/2 (Baseline) | 130 | 23.01 | 34.84 | 41.53 | 0.52 | 0.64 | 16.79 | 84.75 | 0.66 | 0.55 |
| SiT-B/2 (Uniform: 0.33, 0.66) | 130 | 23.01 | 34.52 | 41.45 | 0.52 | 0.64 | 16.51 | 87.21 | 0.66 | 0.56 |
| SiT-B/2 Ours (0.4, 0.77) | 130 | 23.01 | 30.51 | 52.20 | 0.55 | 0.65 | 10.72 | 121.09 | 0.72 | 0.56 |
| SiT-XL/2 (Baseline) | 675 | 118.64 | 18.04 | 73.90 | 0.63 | 0.64 | 6.53 | 162.23 | 0.73 | 0.56 |
| SiT-XL/2 (Uniform: 0.33, 0.66) | 675 | 118.64 | 17.97 | 73.85 | 0.63 | 0.64 | 6.24 | 165.29 | 0.74 | 0.56 |
| SiT-XL/2 Ours (0.4, 0.77) | 675 | 118.64 | 15.81 | 86.43 | 0.64 | 0.66 | 4.03 | 195.72 | 0.80 | 0.56 |
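The ~35% figure quoted for SiT-XL/2 with CFG follows from the table as the relative FID reduction of CBS over the uniform split, $(\mathrm{FID}_{\text{uniform}} - \mathrm{FID}_{\text{CBS}})/\mathrm{FID}_{\text{uniform}}$. The short snippet below recomputes it for all three model sizes from the with-CFG FID column; the dictionary layout is only for illustration.

```python
# With-CFG FID values copied from the table (uniform split vs. CBS).
fid_cfg = {
    "SiT-S/2": {"uniform": 29.83, "cbs": 18.61},
    "SiT-B/2": {"uniform": 16.51, "cbs": 10.72},
    "SiT-XL/2": {"uniform": 6.24, "cbs": 4.03},
}


def relative_reduction(before, after):
    """Fractional FID reduction; positive means `after` is better (lower)."""
    return (before - after) / before


reductions = {
    name: relative_reduction(vals["uniform"], vals["cbs"])
    for name, vals in fid_cfg.items()
}
for name, r in reductions.items():
    print(f"{name}: {100 * r:.1f}% lower FID with CBS")
```

For SiT-XL/2 this gives $(6.24 - 4.03)/6.24 \approx 35.4\%$, while the S/2 and B/2 models land at roughly 37.6% and 35.1%. Measured against the monolithic baseline instead (6.53 to 4.03), the XL reduction is about 38%.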
BibTeX
@misc{issachar2026complexitybalanceddiffusionsplitting,
title={Complexity-Balanced Diffusion Splitting},
author={Noam Issachar and Dani Lischinski and Raanan Fattal},
year={2026},
eprint={2606.06477},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2606.06477},
}