kaliyuga cross-posted this post in The Latent Space 4 years ago


ruDALLE: A Fully-Trained, Accessible, Multi-Billion-Parameter Answer to OpenAI's DALL-E Model (in Russian)

in Alien Art Hive 4 years ago (edited)

A few months back, I wrote a quick introduction to VQGAN+CLIP, a text-to-image AI tool that was sort of an attempt to recreate some aspects of OpenAI's DALL-E model. DALL-E is, in brief, a model that generates absolutely stunning images from nothing but a text prompt.

Unfortunately, DALL-E is not publicly available. Although there are now a number of really good options for text-to-image generation using other tools, nothing so far has really come close to the quality seen with DALL-E.

[Image: examples of DALL-E's output, taken from OpenAI's blog]

Today, however, Sber AI and Sber Devices have launched ruDALL-E, and it looks like it might be a contender. There are two sizes: ruDALL-E Malevich (XL), a 1.3-billion-parameter model, and ruDALL-E Kandinsky (XXL), a 12-billion-parameter model that the creators claim is comparable to OpenAI's DALL-E.
Although text prompts must be in Russian, it's easy to translate between English and Russian using free online tools, and the results look amazing.

From the Malevich (XL) model:
[Image: sample outputs from ruDALL-E Malevich (XL)]

From the Kandinsky (XXL) model:
[Image: sample outputs from ruDALL-E Kandinsky (XXL)]

If you want to try ruDALL-E yourself, you have a few options (as of November 3, 2021):