The Checkpoint Issue #8 – StyleGAN sliders and lions with hands

Welcome to this week’s Checkpoint, featuring real-time StyleGAN manipulation, new developments in AI-generated video and a contender to beat DALL-E 2.

The Clearing, VQGAN+CLIP / Pytti
News and articles
Google Imagen – Google recently announced Imagen, a text-to-image AI model that creates more realistic images than DALL-E 2, the previous state of the art.
Open source version of Imagen started – Sadly, Google opted not to release the code or model, citing concerns about bias. Happily, Lucidrains (who was also behind the open source DALL-E 2 implementation) has begun an open source version. Follow the progress at the link.
LAION Aesthetic – A subset of the LAION-5B dataset that contains only images rated as aesthetic (as determined by AI, of course). Instructions for using it are on the GitHub page, and it’s also available in Majesty Diffusion.
CogVideo – An impressive-looking new model for text-to-video generation. No code as yet, but you can try out CogView, the still image version, here. Plus the lion drinking water is hilarious.
Flexible diffusion modelling of long videos – Another video creation model, but this one can generate photorealistic, coherent videos that last over an hour after being given just a few starting frames. Paper.
Greg Brockman
Any sufficiently advanced magic is indistinguishable from AI.
Featured notebook – Pixel Alchemist
PixelAlchemist is a fun interface to StyleCLIP, providing the ability to edit images in real time with custom prompts and handy controls. The notebook comes with a bunch of different models, including FFHQ, churches, cars, animals and posters, but it’s also possible to load your own.
Dragging a slider and seeing the effect immediately really brings home how powerful these models are, and it’s a lot of fun to experiment with.
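For the curious, here is a rough idea of what a slider like this does under the hood. StyleCLIP-style edits are usually described as nudging an image's latent code along an edit direction, with the slider setting how far to move. The snippet below is only a toy sketch of that idea in plain Python, not PixelAlchemist's actual code: the function name apply_slider, the four-number latent and the direction values are all made up for illustration, and a real notebook would feed the shifted latent to the StyleGAN generator to get an image.

```python
def apply_slider(latent, direction, strength):
    """Shift a latent vector along an edit direction by `strength`.

    Hypothetical helper for illustration: real tools work on much larger
    StyleGAN latents and then render the result with the generator.
    """
    if len(latent) != len(direction):
        raise ValueError("latent and direction must have the same length")
    return [w + strength * d for w, d in zip(latent, direction)]


# Made-up values: a tiny "latent code" and an "edit direction".
latent = [0.5, -1.0, 0.25, 2.0]
direction = [1.0, 0.0, -1.0, 0.5]

# Sweep the slider from negative to positive strength.
for strength in (-1.0, 0.0, 1.0):
    print(strength, apply_slider(latent, direction, strength))
```

A strength of zero leaves the latent untouched, and the edit grows smoothly as the slider moves away from zero in either direction, which would explain why the on-screen changes feel so immediate.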
