Nvidia released a paper on a text-to-image personalization method whose learned model is only about 100KB per concept and trains in roughly 4 minutes, yet it claims to beat much larger models
Key-Locked Rank One Editing for Text-to-Image Personalization
https://research.nvidia.com/labs/par/Perfusion/
They also claim that generating a variety of good images takes only about 8 seconds.