Running a Diffusion Model Locally on My PC

Hello and welcome back to my blog!

I remember how impressed I was a few years ago when text-to-image models like Midjourney came out. Along with some other models, they provided the spark for generative AI and introduced a lot of new users to the power of this technology.

These models have improved a lot lately, and I keep testing them online to see how image generation has progressed over the years. Most of them are not free to use, or they come with limits. That is why I wanted to run one locally, so I could bypass the limitations of the online services. Fortunately, my NVIDIA RTX 3060 has 12 GB of VRAM, which is just enough to run the latest Stable Diffusion models locally.

I looked up an installation guide. The easiest method I found is to use a WebUI like AUTOMATIC1111.

Installation was smooth

The installation took some time as it downloaded the various packages and large models to my system. The model I chose was Juggernaut XL, a community-trained model built on top of the latest Stable Diffusion XL architecture. I did some research and found that it works well for beginners like me, without too much fiddling with post-processing steps. If you are on an AMD GPU, it might take a few more steps to set everything up properly. But for me, it was a really simple process.

I tested this model with a lot of prompts, and I want to showcase some of those here. All of them were generated using this model.

This was an attempt at macro photography. Not too bad. It was just a 512x512 pixel image upscaled to 1024x1024, which is why you can see a lot of random blur. You can generate a 1024x1024 image directly, but generation will be slower and will take a lot more VRAM.
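To get a feel for why the bigger image costs more, here is a rough bit of arithmetic in Python. It assumes the usual Stable Diffusion setup, where the model works on a compressed "latent" image that is 8 times smaller on each side and has 4 channels. The function names are my own, and this is only a sketch of the scaling, not a measurement of real VRAM usage.

```python
# Rough scaling sketch: how much bigger is the working "latent" image
# at 1024x1024 compared to 512x512?
# Assumption: the autoencoder shrinks each side by 8 and uses 4 channels.
VAE_DOWNSCALE = 8
LATENT_CHANNELS = 4


def latent_shape(width, height):
    """Return (channels, height, width) of the latent for a given image size."""
    return (LATENT_CHANNELS, height // VAE_DOWNSCALE, width // VAE_DOWNSCALE)


def latent_elements(width, height):
    """Total number of values the model has to process at each step."""
    channels, h, w = latent_shape(width, height)
    return channels * h * w


small = latent_elements(512, 512)
large = latent_elements(1024, 1024)

print(latent_shape(512, 512))    # (4, 64, 64)
print(latent_shape(1024, 1024))  # (4, 128, 128)
print(large / small)             # 4.0
```

So doubling each side means four times as many values to process at every step. Actual memory use and speed depend on the model and the settings, so treat the 4x figure as a lower bound rather than a prediction.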

How about faces? We have all seen these models getting better at generating faces. They still struggle with smaller details, especially fingers. But for faces, I should say, my results were better than I was expecting.

I tried to generate some portraits with a pencil artwork style. I don't know how to write good prompts, but even then, these results are not bad.

Line art is cleaner, with very few visible issues in the generated artwork. There are always a couple of mistakes here and there; it is not perfect. But with some post-processing, you can fix even those.

I love how good these text-to-image models are at making artwork in a specific style. Here is an old wooden house in a forest during autumn, in the style of an oil painting. It turned out well.

That roof is a bit weird. But the fact that all of it is generated in a few seconds locally, without any creative or technical limits that you find online, is amazing.
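If you would rather script your prompts than type them into the browser, the AUTOMATIC1111 WebUI also exposes a local HTTP API when it is started with the `--api` flag. Below is a minimal sketch using only the Python standard library. The address is the usual local default and may differ on your setup, and the helper names (`build_payload`, `generate`) and the settings are my own illustrative choices, not the ones I used for the images in this post.

```python
import base64
import json
import urllib.request

# Assumption: the WebUI is running locally with --api on the default port.
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"


def build_payload(prompt, width=1024, height=1024, steps=30):
    """Build the JSON body for a text-to-image request (illustrative values)."""
    return {
        "prompt": prompt,
        "negative_prompt": "",
        "steps": steps,
        "width": width,
        "height": height,
        "cfg_scale": 7,
    }


def generate(prompt, out_path="output.png"):
    """Send the prompt to the local WebUI and save the first returned image."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # Images come back base64-encoded; drop a "data:...," prefix if one is present.
    image_b64 = result["images"][0].split(",", 1)[-1]
    with open(out_path, "wb") as image_file:
        image_file.write(base64.b64decode(image_b64))
    return out_path


# Example call (needs the WebUI running):
# generate("an old wooden house in a forest during autumn, oil painting")
```

The call at the bottom is commented out because it only works while the WebUI is running. Everything stays on your own machine, so there are no usage limits to worry about.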

It still struggles with text!

One of the things that hasn't improved much is text in pictures. I know why it is so hard for these models to spell words correctly: it comes down to how they form images. They start from random noise and iterate over it, and the patterns and details only emerge later. This makes text generation super hard. There are some specialised models that you can use for this.

I tried to spell Hive, and look what it generated.
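Here is a tiny toy in Python to show the shape of that "start from noise and iterate" loop. To be clear, this is not a real diffusion model: a real one uses a neural network to guess what noise to remove at each step, while this toy cheats by knowing the final pattern in advance. The function name and the numbers are made up purely for illustration.

```python
import random


def toy_denoise(target, steps=20, seed=0):
    """Start from random noise and nudge it toward `target` a little each step.

    Toy only: a real model has to *predict* what to remove; here we cheat
    and use the known target directly.
    """
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # pure random noise
    history = []
    for step in range(steps):
        remaining = steps - step
        # Close a fraction of the remaining gap; the final step closes it fully.
        image = [x + (t - x) / remaining for x, t in zip(image, target)]
        error = sum(abs(t - x) for x, t in zip(image, target)) / len(target)
        history.append(error)
    return image, history


# A made-up "pattern" of 8 pixel values between 0 and 1.
pattern = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
result, errors = toy_denoise(pattern)
print([round(e, 3) for e in errors])  # the error shrinks step by step
```

The average error shrinks gradually, so the picture only settles near the end of the loop. That fits the way the overall look of an image comes out fine while small, exact shapes like letters are the first thing to go wrong.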

Look at this neon city scene that I generated. The signboards are all nonsense words.

Overall, it is such a good tool for artists: for quick experiments, for checking whether a concept can work, or just for references. I would still not consider it a replacement for human artists; it is a long way from that. For fun and experimentation, though, it is a good tool to have. I might start using it to create a few cover images when I can't find any on free image sites.

I will leave you with a couple of creative shots.

Let me know your thoughts about generative artwork and whether you would love to have a model like this running locally on your system, without internet, policy restrictions, or other limitations.

Here is a guide for installing Stable Diffusion on your system locally. Credits to the original creator.

Until next time...

All the content is mine unless otherwise stated.
Images were generated using the Juggernaut XL AI model via Stable Diffusion (AUTOMATIC1111 WebUI) and are for illustrative purposes.
Banner created in Canva.



zoe01 (65) 4 days ago

Well, this post is very informative. The part about text generation made me laugh x'D , it’s always funny how AI struggles with spelling. This is gonna be helpful for beginners.


pravesh0 (75) 2 days ago

There are some models that are good at spelling now. But fundamentally, all of them struggle with it.
