ruDALLE: A Fully-Trained, Accessible, Multi-Billion-Parameter Answer to OpenAI's DALL-E Model (in Russian) [EDIT November 3, 2021]

in Alien Art Hive · 4 years ago (edited)

A few months back, I wrote a quick introduction to VQGAN+CLIP, a text-to-image AI tool that attempts to recreate some aspects of OpenAI's DALL-E model. DALL-E is, in brief, a model that generates absolutely stunning images using just a text prompt.

Unfortunately, though, DALL-E is not publicly available. Although there are now a number of good options for doing this sort of thing with other tools, nothing so far has come close to the quality seen with DALL-E.

[Image: examples of DALL-E's output, taken from OpenAI's blog]

Today, however, Sber AI and Sber Devices have launched ruDALLE, and it looks like it might be a contender. There are two sizes: ruDALLE Malevich (XL), a 1.3-billion-parameter model, and ruDALLE Kandinsky (XXL), a 12-billion-parameter model that the creators claim is comparable to OpenAI's DALL-E.
Although text inputs must be in Russian, it's pretty easy to translate between English and Russian using free online tools, and the results look amazing.

From the Malevich (XL) Model:
[Image: sample output from the Malevich (XL) model]

From the Kandinsky (XXL) Model:
[Image: sample output from the Kandinsky (XXL) model]

If you want to try ruDALLE yourself, you have a few options (as of November 3, 2021):


Looks like we all have some more fun to go try out! Thanks for sharing and for giving us the inside scoop. =)

Great. Thanks for sharing this. I will give it a try. So fun

@dbddv01 @castleberry let me know how it works for you--it's super duper slow on free Colab, and I think it's trying to generate a whole bunch of images. Not sure how to get it to only generate one at a time :(

Hi kaliyuga,
To get one image at a time, you have to modify the loop in the Colab so that its list holds a single entry with images_num set to 1:

for top_k, top_p, images_num in [
    (128, 0.95, 1),
]:
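For anyone wondering why this change cuts the output down to one picture, here is a minimal sketch of how a loop of this shape behaves. It assumes the notebook generates `images_num` pictures for each `(top_k, top_p, images_num)` entry and collects them all. `fake_generate_images`, `run_sampling_schedule`, and the values in `several_images` are hypothetical stand-ins for illustration, not the notebook's actual code.

```python
# Sketch only: fake_generate_images is a hypothetical stand-in for the
# notebook's real generation call, so this runs without a GPU or ruDALLE.
def fake_generate_images(prompt, top_k, top_p, images_num):
    """Return one placeholder string per requested image."""
    return [
        f"{prompt} (top_k={top_k}, top_p={top_p}) #{i}"
        for i in range(images_num)
    ]


def run_sampling_schedule(prompt, schedule):
    """Collect the images produced by every entry in the schedule."""
    images = []
    for top_k, top_p, images_num in schedule:
        images += fake_generate_images(prompt, top_k, top_p, images_num)
    return images


# The single-entry list from the comment above: one image in total.
one_image = [(128, 0.95, 1)]

# Illustrative multi-entry list (values are assumptions): 3 + 3 images.
several_images = [(1024, 0.99, 3), (128, 0.95, 3)]

print(len(run_sampling_schedule("кот", one_image)))       # 1
print(len(run_sampling_schedule("кот", several_images)))  # 6
```

The total number of pictures is the sum of the `images_num` values across the list, so trimming the list or lowering `images_num` are the two ways to shorten a run.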


I just tried it, and it produced a picture after 21 minutes (free version) on a 12 GB GPU.
I have to run some more tests, but I think it produces a 128x128 picture here and then upscales it.
Here is the result:
[Image: rudalle.png]

Note: the Colab uses the Malevich (XL) model, which is a 2.6 GB model. I don't think the XXL model can run on free Colab; presumably it would run out of memory.
I saw another notebook on the GitHub repo that allows using init images as prompts... I will explore further when I have time.
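As a rough sanity check on those sizes, here is a back-of-the-envelope estimate. It assumes the weights are stored as 16-bit floats (2 bytes per parameter), which is an assumption on my part and not something the post confirms. The parameter counts come from the post and the 12 GB figure from the comment above; `weights_size_gb` is a hypothetical helper.

```python
# Assumption: each parameter is stored as a 16-bit float (2 bytes).
BYTES_PER_PARAM_FP16 = 2


def weights_size_gb(num_params, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Estimate the size of the model weights alone, in gigabytes."""
    return num_params * bytes_per_param / 1e9


GPU_MEMORY_GB = 12  # the GPU size mentioned in the comment above

malevich_gb = weights_size_gb(1_300_000_000)    # XL: 1.3 billion parameters
kandinsky_gb = weights_size_gb(12_000_000_000)  # XXL: 12 billion parameters

print(f"Malevich (XL) weights:   ~{malevich_gb:.1f} GB")
print(f"Kandinsky (XXL) weights: ~{kandinsky_gb:.1f} GB")
print("XL weights fit in GPU memory: ", malevich_gb < GPU_MEMORY_GB)
print("XXL weights fit in GPU memory:", kandinsky_gb < GPU_MEMORY_GB)
```

Under that assumption the XL estimate matches the 2.6 GB figure, while the XXL weights alone would come to roughly 24 GB, about twice the memory of a 12 GB GPU. Real usage would be higher still, since generation also needs room for activations and the image decoder.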

This worked super well for me!! Thank you so much :)

I will try and give it a go tomorrow and let you know what i come up with!

Actually, I found a workaround to get three images instead of the huge number you can get with the paid version of Colab--just stop the execution after the first 100% progress bar fills up and then run the visualizer

Awesome, just trying the ruDALLE site! After I gave a prompt, it said it'll take 30 minutes.