The hot news in the data science and Machine Learning space stirred the entire global AI community.
“Stable Diffusion is now available as a Keras implementation, thanks to @divamgupta!” announced the famous François Chollet. I could not stop thinking about it, not even for a second.
Technology journalists wrote articles, enthusiasts shared and reshared tweets and blog posts about it, and AI-generated images flooded the Internet.
What is it? It’s a very efficient implementation of the Stable Diffusion algorithm. What it does is generate images based solely on a text prompt provided by the user.
Image 1. An AI-generated image from the prompt “Avenga Labs article cover cyberpunk style”
You can think of it as an AI painter who is told to paint something unique using English words. It is amazing that it can run locally on laptops and even mobile devices. This is how efficient the implementation is.
How it works internally is a very complex piece of advanced tech. This time, I’m more focused on how well it performs, how to make it work on a local computer, and what images it can generate.
Image 2. An AI-generated image from the prompt “Avenga Labs article cover cyberpunk style”
Avenga Labs could not ignore this hype, just as we could not ignore GPT-3 in the past, so I decided to try it myself. I wanted to see how it works in practice, get it working first, and then share the results with you.
Let’s start with the installation adventures.
The common first step is to download the repository from GitHub.
git clone https://github.com/divamgupta/stable-diffusion-tensorflow
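Before diving into the platform-specific setup, it helps to see roughly what driving the model from Python looks like. The sketch below follows the usage shown in the repository’s README at the time of writing; the class and argument names are assumptions that may differ between revisions of the repo.

from stable_diffusion_tf.stable_diffusion import StableDiffusion
from PIL import Image

# Build the generator once; the model weights are fetched on first use.
generator = StableDiffusion(img_height=512, img_width=512, jit_compile=False)

# Generate a single image from a text prompt.
img = generator.generate(
    "Sunset at sea",
    num_steps=50,                      # diffusion (denoising) steps
    unconditional_guidance_scale=7.5,  # how strongly to follow the prompt
    temperature=1,
    batch_size=1,
)
Image.fromarray(img[0]).save("sunset_at_sea.png")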
After that, I wanted to try launching it on a Windows laptop with a GPU, using CUDA for ML acceleration, and then on a Mac Mini M1, which uses Apple’s post-Intel architecture and has a built-in GPU.
Let’s start with the NVIDIA GPU that was readily available at home, which is why I switched to a Windows 11-based gaming laptop (NVIDIA RTX 3070).
At the time of writing this, it was recommended that we install an older version of TensorFlow.
pip install tensorflow==2.9.2
CUDA was installed without any issues.
However, I encountered a myriad of issues with the DNN part of the CUDA framework.
I fixed them by downloading the cuDNN package using my NVIDIA Developer account, extracting the cudnn*.dll files, and then copying them to:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin
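With the DLLs in place, a quick sanity check (a standard TensorFlow call, not anything specific to this repo) shows whether the GPU is finally visible:

import tensorflow as tf

# Should list the RTX card; an empty list means CUDA/cuDNN is still misconfigured.
print(tf.config.list_physical_devices("GPU"))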
Unfortunately, it still did not work, and I had to fix a ZLIB DLL issue as well.
I had to download ZLIB DLL for Windows 64-bit from the official source.
It still refused to work, and this time it crashed because of insufficient VRAM on the GPU. The card had 6 GB of VRAM, which should have been sufficient, but for some reason the script was unable to allocate even 3 GB of it.
I had to close all the apps using the GPU which included all the web browsers and Visual Studio Code (!). I had to work with just a plain console window.
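One general TensorFlow setting worth trying in this situation (independent of this repo, and not something I can promise will help) is memory growth, which makes TensorFlow allocate VRAM incrementally instead of reserving one large block up front:

import tensorflow as tf

# Must run before any TensorFlow ops are executed.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)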
This set me back a little, but it still refused to work. Another idea? I tried to make the generated images smaller than the default size (512 x 512 pixels). By trial and error, I figured out the maximum working size of 512 pixels wide by 256 pixels high:
python text2image.py --W 512 --H 256 --prompt="Sunset at sea" --output="sunset_at_sea1.png"
Next, I tried to make it work on my favorite home computer, which is a Mac Mini M1 with 16 GB of RAM.
Data science-related workloads used to be a compatibility nightmare, but now almost everything works and even uses built-in Metal APIs for GPU acceleration.
Let’s start this variant of the experiment with a TensorFlow installation.
I needed to install the M1 edition of Miniconda. I downloaded the .sh version and it worked perfectly for me.
Miniconda — conda documentation
I started with upgrading the ML libraries.
pip install numpy --upgrade
pip install pandas --upgrade
pip install matplotlib --upgrade
pip install scikit-learn --upgrade
pip install scipy --upgrade
pip install plotly --upgrade
conda install -c apple tensorflow-deps
pip install tensorflow-macos
pip install tensorflow-metal
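At this point, it is worth checking that the Metal plugin actually registered the built-in GPU; again, this is a standard TensorFlow check rather than anything Stable Diffusion-specific:

import tensorflow as tf

print(tf.__version__)
# On a working tensorflow-metal setup, the M1's GPU appears as a GPU device.
print(tf.config.list_physical_devices("GPU"))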
The Stable Diffusion Git repository contains a version of the requirements file that is specific to the M1 architecture.
Let’s change the directory to the root of the Stable Diffusion repository and install it:
pip install -r requirements_m1.txt
It… crashed, again and again… The last thing that came to my mind was to upgrade the Python version.
conda install python
And, it finally worked! I could stick with the 512×512 image size without resorting to reducing the resolution. In other words, this was a much faster and easier installation process compared to the Windows laptop with CUDA support.
The M1 generated the images in between two and three minutes, compared to around thirty seconds on the NVIDIA GPU, which is still acceptable.

Finally, I could not wait to see the first results! Let me share them with you.
Let’s start with something I know.
People working at the office in front of their computers. OK, I can live with that as the head of Avenga Labs. The faces are distorted in the picture, unfortunately.
They are happy indeed, very ‘body’ positive, and not bothered at all with how their faces look.
The AI painter fully understood the intent and this does look like a lesson at school. Regrettably, the faces are distorted (again).
Let’s switch to paintings and see how the engine performs as an AI artist.
Perfect! Let’s try it again.
Is that an elephant on another moon? We did not exclude that possibility.
Excellent!
Excellent!
I don’t know what is going to happen with our planet, and neither does the AI. However, I consider this generated image a success.
The humans are represented by two women and the alien friend is single, which is fully in line with the prompt.
The paintings were of much better quality and more pleasing to the eye than the engine’s attempts at generating pixel-perfect photographic images.
It is a map of a city, but I’ll let you decide if it was a success.
Perfect!
I could go on and on, but let’s wrap it up.
Soon we’ll be able to ask our phones to generate compelling photos of vacations at places we have never visited, just to share them on social media. Whether that is good or bad, I don’t want to be the judge.
But what’s in it for business? I can think of generating synthetic images for Machine Learning purposes (medical CT images, for instance). The other application I can think of is generating custom images for marketing purposes. They would be original and unique, unlike ones downloaded from the internet.
As usual with new technological capabilities, only time will tell which business applications will stay with us and grow.
The problem I see at the moment is its unpredictable behavior: I had to generate multiple images and pick the best one to share.
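One practical workaround, sketched below under the same API assumptions as the earlier snippet, is to script a handful of candidates for a prompt and then pick the best one by eye:

from stable_diffusion_tf.stable_diffusion import StableDiffusion
from PIL import Image

generator = StableDiffusion(img_height=512, img_width=512)

prompt = "Avenga Labs article cover cyberpunk style"
# Each call starts from fresh random noise, so every candidate is different.
for i in range(4):
    img = generator.generate(prompt, num_steps=50,
                             unconditional_guidance_scale=7.5,
                             temperature=1, batch_size=1)
    Image.fromarray(img[0]).save(f"candidate_{i}.png")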
It’s amazing how AI image generation techniques have progressed. Things that were totally impossible only a few years ago, and which required supercomputers with tons of TPUs, can now run on relatively small devices: laptops and even an M1-based Mac Mini.
Everything uses local models: it does not access the Internet and requires no cloud services. We could even use entirely CPU-based generation, but it simply took too long (around 20 minutes per image).
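For reference, forcing that CPU-only path is a one-liner: hiding the GPUs before any TensorFlow ops run makes everything fall back to the CPU.

import tensorflow as tf

# Hide all GPUs so TensorFlow falls back to the (much slower) CPU.
tf.config.set_visible_devices([], "GPU")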
If you have half an hour or two to spare, please give this fantastic model a try. It’s not quite there yet, at least when it comes to human faces, but it’s closer than ever to computers painting exactly what we ask them to paint, using only local models.