ViQ quantizes visual inputs at arbitrary resolutions into discrete representations, accelerating training by 20–70% compared with continuous image encodings.
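As a rough illustration of what quantizing an image of arbitrary resolution into discrete tokens can look like, the sketch below uses plain nearest-neighbour codebook lookup over fixed-size patches. This is not ViQ's actual method, which is not described here. The function names, the zero-padding of edge patches, the patch size, and the toy two-entry codebook are all assumptions made for the example.

```python
import math

def split_into_patches(image, patch):
    """Split a 2D grayscale image (list of rows) into flattened patch x patch tiles.

    Edge tiles are zero-padded, so any height and width are accepted
    (an assumption of this sketch, not a documented ViQ behaviour).
    """
    h, w = len(image), len(image[0])
    rows, cols = math.ceil(h / patch), math.ceil(w / patch)
    patches = []
    for r in range(rows):
        for c in range(cols):
            tile = []
            for y in range(r * patch, (r + 1) * patch):
                for x in range(c * patch, (c + 1) * patch):
                    tile.append(image[y][x] if y < h and x < w else 0.0)
            patches.append(tile)
    return patches, (rows, cols)

def quantize(patches, codebook):
    """Map each patch to the index of its nearest codebook vector (squared L2)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist(p, codebook[k]))
            for p in patches]

# Hypothetical two-entry codebook for 2x2 patches: "dark" and "bright".
codebook = [[0.0] * 4, [1.0] * 4]

# A 4x5 image: the width is not a multiple of the patch size.
image = [
    [0.0, 0.1, 0.9, 1.0, 0.9],
    [0.1, 0.0, 1.0, 0.9, 1.0],
    [0.9, 1.0, 0.0, 0.1, 0.0],
    [1.0, 0.9, 0.1, 0.0, 0.1],
]

patches, grid = split_into_patches(image, 2)
tokens = quantize(patches, codebook)  # one integer token per patch
```

The number of tokens follows the input resolution (here a 2 by 3 grid), while each token is drawn from a fixed, finite vocabulary. A real system would typically learn the codebook rather than fix it by hand.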
Because different layers perform different roles, parameters and computational resources could be distributed non-uniformly across layers, as an alternative to a constant architectural width.
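To make the idea of a non-uniform parameter distribution concrete, the following minimal sketch counts the parameters of a stack of dense layers under two width profiles. The dense-layer setting and the specific widths are hypothetical, chosen only so that both profiles have the same total budget. The source does not prescribe any particular allocation.

```python
def mlp_param_count(widths):
    """Weights plus biases for dense layers mapping widths[i] -> widths[i + 1]."""
    return sum(a * b + b for a, b in zip(widths, widths[1:]))

# Constant width: every layer gets the same share of the budget.
uniform = [256, 256, 256, 256, 256]

# Hypothetical non-uniform profile: wider early layers, narrower later ones.
tapered = [256, 384, 256, 128, 256]

uniform_total = mlp_param_count(uniform)
tapered_total = mlp_param_count(tapered)

# Per-layer parameter counts show where each profile spends its budget.
uniform_per_layer = [a * b + b for a, b in zip(uniform, uniform[1:])]
tapered_per_layer = [a * b + b for a, b in zip(tapered, tapered[1:])]
```

Both profiles total 263,168 parameters, but the tapered one places roughly three quarters of them in the first two layers. Whether such a shift helps depends on which layers benefit from the extra capacity.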