Tutorial: Creating Cute Cat Images Using Stable Diffusion
#How to Create Awesome Images with Stable Diffusion
I published this tutorial on my blog in early February 2023. It was well received, so I decided to add it as a chapter in this book. I hope this simple tutorial will benefit many novice users.
In this post, I will introduce you to some commands (prompts) that will help you create images using Stable Diffusion, an Artificial Intelligence based image generator. Extra points for picking your favourite cat picture from this post!
The arrival of Artificial Intelligence (AI) based image generation tools, or AI imaging tools, over the past several months has taken the world of AI art by storm. DALL-E 2, Midjourney and Stable Diffusion have become well-known names (and tools) in this space in a short span of time. I began experimenting with AI imaging tools in early August 2022. Since then, I have created thousands of images, the majority of them with Stable Diffusion. Several resources, such as lexica.ai, PromptHero and the prompt database at Stability.ai, can help beginners find their way around AI image generation.
At the time of publishing this post, the most recent version of Stable Diffusion is version 2.1. We will use the image generator from Stable Diffusion for this post.
This Cat Does Not Exist (in the real world, that is!)
The post below is based on my own experiments with AI image generation techniques while I was working on my upcoming book, titled "An Eye for AI: How to create amazing images using AI Imaging".
You can learn more about this book by visiting artwithai.in.
What is Stable Diffusion, and where can you get it?
Stable Diffusion is a powerful AI imaging tool that lets users generate stunning images quickly and with minimal effort. You can experiment with it to generate images in different materials, art styles and lighting. There are many tools for generating AI based images. Because Stable Diffusion is open source, many developers have built tools that use it at the back end. These include free, freemium and premium tools.
PlaygroundAI and the Stable Diffusion space on Hugging Face are examples of free sites. You can generate images on these sites at no cost.
Thumbsnap, Starryai and Nightcafe are examples of freemium sites. Each of them offers a certain number of free credits that can be used to generate images.
Supermachine is an example of a premium or paid tool that offers Stable Diffusion and many other imaging algorithms or apps.
For the sake of simplicity, we will use an existing web-based tool that is either free or freemium. For generating images for gaathastory and my blog, I typically use Supermachine. I have a paid subscription for this awesome AI Image generation service. In the free or freemium space, my personal choices, in no particular order, are Thumbsnap, Nightcafe, and PlaygroundAI.
Oil Painting of a cat. Image generated using art.elbo.ai
For first timers looking to step into the world of AI Imaging tools, the Stable Diffusion web version could be a good starting point.
Advanced users might want to try more complex prompts on Midjourney, another awesome AI imaging site, or explore DreamBooth, a technique for fine-tuning Stable Diffusion on your own images.
A grungy yet cool cat!
How to Generate AI Images Using Stable Diffusion?
An introduction to Prompt Engineering
In recent weeks, there have been many posts, tutorials and other resources on the importance of something called "prompt engineering". Let us quickly understand what this term means.
In plain and simple terms, prompt engineering refers to the art and science of crafting the right combination of commands that will help you generate great images.
Some folks suggest providing as detailed and clear a set of instructions as possible, while others suggest starting with very simple commands and slowly adding complexity or details, one or two modifications at a time.
I prefer the latter approach. If we start with a very complex set of commands, not everyone will understand what each command stands for, or how it can enhance an image. This is especially true when we are dealing with many moving parts such as medium, material, lighting and art style.
Confused? Let us look at the above statement through a simple example. Below is an oil painting of a cat. We will use this image as a template for generating images using Stable Diffusion.
AI generated image of a cat
#Example of Generative Art: Painting of a Cat
To begin, let us try something different: we start with a simple command that creates an oil painting of a cat. The prompt reads something like this:
Oil Painting of a Tabby cat (or a Persian Cat)
Persian Cats? or...
A Tabby cat?
I liked the Tabby cat images more, so we will stick with them for the rest of this post. You will notice that the Stable Diffusion web tool can generate up to 4 images at a time. For the rest of our example, we will use only one of the four options.
Note: Not to be confused with 'Smelly Cat'.
Smelly Cats? That is a whole new blog post altogether
#Adding more elements to the Prompt
Remember what I mentioned earlier about starting simple and adding one or two elements at a time? Let us give it a try!
We refine the above command by defining the background, lighting, and finish.
The modified command is as follows:
Oil painting of tabby cat, dark background, matte finish, volumetric lighting
The result? Below is the best of the four options that the AI tool gave us. Not too bad, but not stunning either.
Tabby Cat with a dark background
Now, we refine our command further, as below:
Oil painting of a tabby cat, dark background, highly detailed, volumetric lighting, 8k
A stunning oil colour painting of a tabby cat, dark background, highly detailed, 4k
#Do you prefer materials other than oil paint?
What if we change the material from oil to ink?
A stunning ink painting of a tabby cat, dark background, highly detailed, volumetric lighting, 8k
Ink Painting of a Tabby cat, generated using Stable Diffusion 2.1
Another attempt, this time back to oil painting:
A stunning oil colour painting of a tabby cat, dark background, highly detailed, 4k, digital art, concept art
Painting of a cat, generated using Stable Diffusion 2.1
The cat in the above picture looks a little lost or sad. We can do better, so we now add a painting style or an artist's name to the prompt. You will find the names Greg Rutkowski, WLOP, Alphonse Mucha and Artgerm in many popular prompts. For our painting, we will try a couple of artists, namely Raja Ravi Varma and Roberto Ferri.
For the final iteration, I decided to try another tool, Thumbsnap. I also changed the aspect ratio of the image to portrait instead of square. The result was less than delightful, so we went back to Stable Diffusion. With the above command, I could not choose between the two images that were generated, so I am presenting both of them below.
Portrait of a cat generated using Stable Diffusion 2.1
Another portrait of a cat generated using Stable Diffusion 2.1
#Wrapping it up: AI image generation is an Art and a Science
In the above example, you might have noticed that generating an awesome image requires a bit of trial and error. In our case, it took about 4 or 5 attempts to get the right look and style for the cat. It gets more complicated when we try to render humans or more complex subjects such as a spaceship or a building. With AI generated images of humans, the common gripes are that the hands or faces are deformed, or that there are simply too many limbs. But remember, AI image generation is a new, albeit rapidly evolving, field. I am sure that in a few more versions, AI image generation tools will get these things right.
Tip: In the above example, you can try adding options such as biopunk or synthwave style to get some lovely variations in the images. Below is an example of output using the synthwave style:
Portrait of a cat generated using Stable Diffusion 2.1
Once you are satisfied with the results, you can download the image in PNG or JPG format. Some image rendering sites, such as Nightcafe and StarryAI, also let you share the images via social media or email.
Did you find this introduction to AI image generation using Stable Diffusion useful? Would you like me to write similar short intros to Midjourney, DALL-E 2 and other AI imaging tools? I would love to hear from you!
#Bonus: More Image prompts for AI Image generation
I mentioned earlier in this post that you can experiment with materials, lighting, art forms and artists, and image quality when you are writing prompts for AI imaging. I have generated several images that have greatly enhanced the 'look and feel' of our podcasts at gaathastory. Below is an image from a recent episode of Baalgatha, a podcast of children's bedtime stories.
We can create images using different materials (such as watercolour, ink, acrylic) and art styles (such as baroque, renaissance, steampunk, synthwave), and experiment with image quality (such as 4k, matte finish) and lighting and rendering (such as octane render, volumetric lighting).
Below is my template for generating simple yet delightful images using AI image generators. It should help most users learn the ropes of creating images.
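Because these categories are independent of one another, you can enumerate combinations systematically instead of typing each one by hand. The Python sketch below uses the standard library's `itertools.product` to build one prompt for each pairing of material and art style. The keyword lists come from the paragraph above. The subject, the shared extras and the `variations` function name are illustrative assumptions.

```python
from itertools import product


def variations(subject, materials, styles, extras):
    """Yield one prompt per (material, style) pair, with shared extras appended."""
    for material, style in product(materials, styles):
        yield ", ".join([f"{material} painting of {subject}", f"{style} style", *extras])


materials = ["watercolour", "ink", "acrylic"]
styles = ["baroque", "renaissance", "steampunk", "synthwave"]
extras = ["4k", "volumetric lighting"]

prompts = list(variations("a tabby cat", materials, styles, extras))
print(len(prompts))  # 3 materials x 4 styles = 12 prompts
print(prompts[0])
```

Twelve prompts already use up a lot of generations on a credit-based site. In practice, you may want to shortlist a few combinations before generating.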
A beautiful picture of a cute cat, oil painting by Paul Gauguin, biopunk style, 4k, matte finish, digital art, highly detailed
A beautiful painting of a cute cat, watercolour by Edvard Munch, synthwave style, 4k, volumetric lighting, digital art
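The template above can also be written as a small data structure. This makes it easy to change one element, such as the artist or the style, while keeping everything else fixed. Below is a minimal Python sketch. The `CatPrompt` class and its field names are hypothetical and only mirror the structure of the two example prompts.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CatPrompt:
    """Hypothetical structured form of the prompt template above."""
    form: str                    # e.g. "picture" or "painting"
    medium: str                  # e.g. "oil painting" or "watercolour"
    artist: str
    style: str                   # e.g. "biopunk" or "synthwave"
    extras: tuple[str, ...] = () # quality, lighting and other keywords

    def render(self) -> str:
        head = (f"A beautiful {self.form} of a cute cat, "
                f"{self.medium} by {self.artist}, {self.style} style")
        return ", ".join([head, *self.extras])


first = CatPrompt("picture", "oil painting", "Paul Gauguin", "biopunk",
                  ("4k", "matte finish", "digital art", "highly detailed"))
second = CatPrompt("painting", "watercolour", "Edvard Munch", "synthwave",
                   ("4k", "volumetric lighting", "digital art"))

# Vary a single element: same prompt as `first`, but in synthwave style.
variant = replace(first, style="synthwave")

for p in (first, second, variant):
    print(p.render())
```

The `dataclasses.replace` call returns a copy with only the named field changed. This matches the "one modification at a time" approach used throughout this post.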
Have fun trying out AI imaging! I would love to see what images you generate!