What is GLIDE?
GLIDE (Guided Language to Image Diffusion for Generation and Editing) combines CLIP-based guidance with a diffusion model: it generates images by iteratively denoising what is essentially pure noise, steering each step toward the text prompt.
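To make that denoising idea concrete, here is a minimal toy sketch of a reverse-diffusion loop in numpy. This is not GLIDE's actual sampler; the `denoise_fn` argument is a stand-in for the learned, text-conditioned noise predictor, and the update rule is deliberately simplified to show the shape of the process.

```python
import numpy as np

def toy_reverse_diffusion(denoise_fn, shape, steps=50, seed=0):
    """Illustrative reverse-diffusion loop: start from pure Gaussian
    noise and repeatedly subtract a fraction of the model's noise
    estimate. `denoise_fn(x, t)` stands in for the learned,
    text-conditioned noise predictor (hypothetical placeholder)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)           # start from pure noise
    for t in reversed(range(steps)):
        noise_estimate = denoise_fn(x, t)    # model predicts the noise
        x = x - noise_estimate / steps       # remove a fraction of it
    return x

# A fake "model" that simply predicts the current sample as noise,
# so each step shrinks the image toward zero.
sample = toy_reverse_diffusion(lambda x, t: x, shape=(64, 64, 3))
```

In the real model the predictor is a large U-Net conditioned on the prompt, and the per-step update follows a learned noise schedule rather than a fixed fraction.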
CLIP (Contrastive Language–Image Pre-training) was released alongside DALL-E 1 in 2021 and is a neural network that efficiently learns visual concepts from natural language supervision. It can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the "zero-shot" capabilities of GPT-2 and GPT-3.
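The zero-shot trick boils down to comparing an image embedding against a text embedding for each candidate class name and taking a softmax over the similarities. The numpy sketch below shows that math with toy embedding vectors; it is not the real CLIP API, and the example class names and logit scale are illustrative assumptions.

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs):
    """Zero-shot classification, CLIP-style: cosine similarity between
    one image embedding and each class-name embedding, then softmax.
    The factor of 100 mimics CLIP's learned logit scale (assumption)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = 100.0 * (txt @ img)
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

# Toy embeddings: the image is closest to the first class prompt.
image = np.array([1.0, 0.0, 0.0])
classes = np.array([[0.9, 0.1, 0.0],    # "a photo of a dog"
                    [0.0, 1.0, 0.0],    # "a photo of a cat"
                    [0.0, 0.0, 1.0]])   # "a photo of a car"
probs = zero_shot_probs(image, classes)
```

In practice the embeddings come from CLIP's image and text encoders, and the class prompts are usually templated (e.g. "a photo of a {label}") rather than bare category names.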
While the GLIDE model produced an impressive set of images from text prompts, it generates at a base resolution of only 64x64px, which limits how sharp the final upscaled output can be. The newer DALL-E 2 model increases the base resolution and allows effective upscaling to 1024x1024px.
The GLIDE model Colab (which features a reduced set of hyperparameters and limited output resolution compared to DALL-E 2) produced some interesting results when I asked it to create landscapes on Mars. While a couple returned rather thick clouds (not something I suspect is common on Mars), the other attributes seemed correct. Disclaimer: I am not an astrophysicist!
GLIDE seemed to excel at creating nebulas, returning some of the most visually pleasing and beautiful images of the collection.
When asked to make an abstract impressionist-style painting, it came up with these rather beautiful collections. The intention was to generate a composition whose content showed how the computer interpreted the concept of "abstract ideas". It produced no organized content, instead repeatedly delivering chaotic yet coherent images with beautiful color compositions, pieces that could fit easily in an art gallery here in Miami.