Get started with making AI art in 2022
If you are interested in the current AI graphic art scene, this is meant for you. I wrote a short post aimed at newcomers as the very first thing in this blog many moons ago – when I was a rank beginner myself. Since then, things have advanced so rapidly that I felt a completely new “guide to the perplexed” was needed to help people start navigating this wonderful new space.
I’ll start with an introduction, and then we’ll go over information and resources on three levels of engagement that correspond roughly to hobbyist, artist and finally artist-developer. This post focuses heavily on the most recent development in the scene, CLIP (Contrastive Language-Image Pretraining), which is now so ubiquitous on social media as to be almost synonymous with AI-generated art.
But let’s begin with some definitions and context. If you are in a hurry, feel free to skip the introduction and go straight to the links and resources below; you can always come back later.
What do I mean by AI Graphical Art
AI Art here is shorthand for visual art generated by neural networks. Neural networks are data processing structures, loosely inspired by the way animal nervous systems work, that can be trained to perform various tasks such as image classification, speech synthesis, or separating the instruments in an audio file. The study and discipline of designing and training neural networks is called deep learning.
AI Art should not be confused with deepfakes – videos showing real people engaged in actions that they never actually performed. AI Art may employ, but is separate from, style transfer – transferring the visual look of one image onto another, making them stylistically similar. These approaches all use neural networks to generate artistic or naturalistic results and, taken together, they may be seen to belong to the larger field of synthetic media. In my opinion, what distinguishes AI Visual Art from the rest is the artistic intent of the person who operates the deep learning system (the “AI”): the aim is not primarily to simulate something that already exists in the real world or to mix up existing artistic styles, but to bring out something truly new and unique, to enhance and combine one’s own creative instinct with the machine-derived “synthetic” expression. In a sense, the AI artist gives the deep learning system “a voice”, a means of expression, but in actuality generative neural networks are simply new artistic tools, somewhat analogous to synthesizers in music production.
(Speaking of music production, for the purposes of this guide, I’m going to handle only graphic art, since that is where most of the action is concentrated now, so I’ll be disregarding for now all the marvelous work that has been done with AI audio/music in the past years. But if you are interested you can pick up info from earlier posts in this blog, especially those dealing with OpenAI Jukebox and Lucid Sonic Dreams.)
Many would agree that the Deep Dream program by Alexander Mordvintsev (2015) was the first true AI art system. The central feature of this system was to take a network trained to recognize different visual patterns and turn it inside out, so that the individual neurons of the network are stimulated and made to “dream up” visual elements that resemble exactly those things that they have learnt to recognize. It is this “hallucinatory” and creative aspect of neural networks that encapsulates what is meant by AI Art here.
AI Art and GAN art
Chances are that you have come across the term “GAN art”, which is also sometimes used to cover all post-Deep Dream AI art. GANs (generative adversarial networks) are special types of neural networks, consisting of a “generator” network and a “discriminator” network. In the case of image-generating GANs, the generator acts as an artist, and the discriminator acts as a critic during training. When the generator becomes so good that the discriminator can no longer tell whether an image was produced by the generator or came from the original, human-made source material, we can then use the generator to produce art.
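For the curious, the artist-and-critic loop can be sketched in a few lines of plain Python. This is only a toy under loud assumptions: the "images" are single numbers, the generator is a straight line with two parameters, the critic is a single logistic unit, and all names (train_toy_gan, the learning rate, the target value 4.0) are made up for illustration. Real GANs use deep networks and a framework such as PyTorch, and this toy is not guaranteed to settle down nicely; it only shows the alternating structure of the training.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator ("artist"): fake = a * z + b
    w, c = 0.1, 0.0   # discriminator ("critic"): D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)   # stand-in for a real image
        z = rng.gauss(0.0, 1.0)      # random latent input
        fake = a * z + b

        # Critic step: push D(real) toward 1 and D(fake) toward 0.
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        grad_w = (d_real - 1.0) * real + d_fake * fake
        grad_c = (d_real - 1.0) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Artist step: adjust the generator so that D(fake) moves toward 1.
        d_fake = sigmoid(w * fake + c)
        grad_fake = (d_fake - 1.0) * w   # derivative of -log D(fake) w.r.t. fake
        a -= lr * grad_fake * z
        b -= lr * grad_fake
    return (a, b), (w, c)

generator_params, critic_params = train_toy_gan()
print("generator (a, b):", generator_params)
print("critic (w, c):", critic_params)
```

The point to notice is that neither network is ever told what a good image looks like; the generator only learns from how the critic reacts to its output.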
Among the well-known and still widely used GANs that were introduced around 2017-19 are StyleGAN and BigGAN. StyleGAN is skillful at making very realistic human faces – if you haven’t already, go ahead and try it out.
BigGAN is trained on a large dataset of images in 1000 categories. The typical use of GANs is to find objects that “do not exist”, meaning we can make new images that are “between” different images in the original dataset, often with bizarre and hilarious results. Artists may train GANs with datasets they have prepared themselves using original material, and then produce new images with this technique, known as latent vector interpolation or, more colloquially, a “latent walk”.
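To make the “latent walk” idea concrete, here is a minimal sketch in plain Python, assuming simple linear interpolation between two random latent vectors. The function names and the tiny 8-dimensional latent are my own illustration (real GAN latents are much larger), and the trained generator that would turn each vector into an image is left out.

```python
import random

def lerp(z_start, z_end, t):
    """Linear interpolation between two latent vectors at fraction t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)]

def latent_walk(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced from z_start to z_end."""
    if steps < 2:
        raise ValueError("need at least 2 steps")
    return [lerp(z_start, z_end, i / (steps - 1)) for i in range(steps)]

rng = random.Random(0)
dim = 8  # toy size, chosen only to keep the example readable
z_a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
z_b = [rng.gauss(0.0, 1.0) for _ in range(dim)]

frames = latent_walk(z_a, z_b, steps=5)
# With a trained GAN, each frame would be fed to the generator,
# e.g. image = generator(frame), and the images shown in sequence.
print(len(frames), "latent vectors generated")
```

Rendering the in-between vectors one after another is what produces the smooth morphing animations you often see. Some artists prefer spherical rather than linear interpolation; the linear version is used here only because it is the simplest.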
CLIP-steered AI art
In January 2021 OpenAI introduced, in connection with their DALL-E image generation system, a very novel type of network called CLIP, which has caused a revolution and an ensuing explosion of AI art on the internet. I think it is probable that you are reading this now precisely because you have seen results of CLIP and wish to know more. CLIP has been trained using a massive dataset of images and their captions to figure out how well a given text corresponds to an image – or vice versa, or image to image and so on, since once images and texts are read in by the network they are on an equal footing. What is truly magical about it is that after being trained, it can evaluate and score sentences and images that it has never encountered before, and this means that we can use it to gradually transform any image to agree with a prompt – a text or another image that serves as the target. (There is no agreed-upon word for this new AI art trend, which is rapidly escalating in popularity – I have suggested “Promptism”, but I encourage you to come up with your own term!)
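The scoring-and-steering idea can be illustrated with a toy in plain Python. Big assumptions apply: the two short vectors below are made-up stand-ins for the embeddings that CLIP would compute from a prompt and an image, the score is plain cosine similarity, and the “image” is nudged directly in embedding space. In the real notebooks the gradient flows back through CLIP into the pixels or into a generator’s latent vector, which this sketch does not attempt.

```python
import math

def cosine_similarity(u, v):
    """Score in [-1, 1]: how closely two embedding vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_gradient(x, target):
    """Gradient of cosine_similarity(x, target) with respect to x."""
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_t = math.sqrt(sum(b * b for b in target))
    cos = cosine_similarity(x, target)
    return [t / (norm_x * norm_t) - cos * a / (norm_x ** 2)
            for a, t in zip(x, target)]

# Hypothetical embeddings; real CLIP vectors have hundreds of dimensions.
prompt_embedding = [0.1, 0.9, 0.4, -0.2]
image_embedding = [1.0, 0.2, -0.3, 0.5]

before = cosine_similarity(image_embedding, prompt_embedding)
lr = 0.1  # step size, an arbitrary choice for this toy
for _ in range(200):
    grad = similarity_gradient(image_embedding, prompt_embedding)
    # Gradient ascent: move the "image" a little toward a higher score.
    image_embedding = [a + lr * g for a, g in zip(image_embedding, grad)]
after = cosine_similarity(image_embedding, prompt_embedding)

print(f"score before: {before:.3f}, after: {after:.3f}")
```

Each pass through the loop corresponds to one “iteration” in a CLIP notebook: score the current image against the prompt, then change the image slightly in the direction that raises the score.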
To put CLIP to work, we need to select an image generator. This can be pretty much anything, but neural networks are the most efficient to work with. At the time of this writing, the most used generator is VQGAN, developed at the University of Heidelberg, another type of GAN, which is particularly good at rendering realistic textures. Pixel-art-like generators are also rather popular, while others prefer to take a more direct approach and work with image frequencies (Vadim Epstein’s Aphantasia pioneered this kind of thing) or color spaces etc.
It is not a simple thing to offer CLIP-generated content on the web, but NightCafe is perhaps the most prominent web service that does just that, so that is the easiest place to start out. I must warn you that it can be very addictive!
You don’t need to know anything about the underlying technology to produce quality images. A very big part of making graphics with CLIP is thinking up suitable prompts – putting some thought into phrasing what you want to see and supplying some stylistic direction as to how you want it to be executed can bring up an endless array of pleasing results. This sort of play with words is called prompt engineering. Have a look at this guide by @remi_durant, which shows how just adding the names of different art periods causes significant variation in the depiction of the same subject.
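If you like to experiment systematically, prompt variations are easy to produce with a few lines of Python. This is just a sketch: the helper name build_prompts, the subject and the style modifiers are all my own made-up examples, not taken from any particular guide or notebook.

```python
def build_prompts(subject, modifiers):
    """Combine one subject with each style modifier into a full prompt."""
    return [f"{subject}, {modifier}" for modifier in modifiers]

subject = "a lighthouse on a cliff"
modifiers = [
    "Art Nouveau poster",
    "Baroque oil painting",
    "ukiyo-e woodblock print",
    "1980s pixel art",
]

prompts = build_prompts(subject, modifiers)
for p in prompts:
    print(p)
```

Running the same subject through a list of modifiers like this, with everything else kept fixed, makes it much easier to see what each style keyword actually contributes.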
This very extensive investigation into different style prompts is already a popular source of ideas:
CLIP + VQGAN keyword comparison by @kingdomakrillic
Another great site that allows you to generate AI art on the spot is Artbreeder. It does not use CLIP, but is based on GANs. Users take the earlier creations of other users as a starting point and can interpolate between them to make new creations such as images of faces, landscapes and album covers. Beautiful animations are also possible. Having been around since 2018, Artbreeder occupies a venerable position and is a great place to start your own forays into the realm of GAN art.
Getting more involved: notebooks
If you have played with the above services and feel hungry for more, a good next step is to get acquainted with Google Colaboratory.
That is the mainstream way of doing AI art with flexibility. To do any serious generation with neural networks, you need access to a high-grade graphics card, and this is what Google provides. I recommend you check out their free option first, and then, if you want to go deeper, subscribe to the paid tier – the $10 per month that Colab Pro costs gets you a long way. Of course, if you have a suitably beefy GPU already at hand, then you can run the notebooks locally.
Colab (actually Jupyter) notebooks are basically small, interactive and annotated development environments running the Python programming language. You are expected to input values into a form and then push buttons to make things happen. It is totally possible to get by without any coding skills, but be aware that as the software repositories and libraries keep changing, notebooks can break, and even a little working knowledge of Python can get you past problems faster than asking questions on Discord or googling for errors.
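To demystify those forms a little: in Colab, a form field is just an ordinary Python variable with a special comment after it. The cell below is a made-up example (the variable names and values are mine, not from any specific notebook); Colab renders the #@param comments as input boxes, while outside Colab they are ignored and the code runs as plain Python.

```python
#@title Generation settings (hypothetical example cell)
prompt = "a lighthouse on a cliff, ukiyo-e woodblock print"  #@param {type:"string"}
steps = 300  #@param {type:"integer"}
seed = 42  #@param {type:"integer"}

# A small sanity check like this can catch a bad form value early.
if steps <= 0:
    raise ValueError("steps must be a positive integer")

print(f"Would generate '{prompt}' for {steps} steps with seed {seed}")
```

So when a notebook breaks, double-clicking the form to reveal the code underneath is often the first step toward seeing what went wrong.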
There are lots of AI art-related notebooks, many of them used for specialized tasks like image upscaling. These days, the most celebrated of them use CLIP. Among the very first CLIP image-generation notebooks was Big Sleep, released in January 2021 by Ryan Murdock (the link is to a simplified version of the notebook by lucidrain). It used BigGAN as its image generator and was for many their first introduction to CLIP. Big Sleep served as inspiration for the CLIP+VQGAN (z+quantize) notebook by Katherine Crowson, the progenitor of the vast majority of CLIP notebooks now available. Released in April 2021, it exploded in popularity the following summer when a Spanish version of the notebook was featured on YouTube. There are now numerous spinoffs and variations of the original z+quantize notebook.
Crowson is also the creator of another approach to CLIP-steered generation, the Guided Diffusion notebook, which uses a totally different way of generating images. The diffusion images tend to be very elegant (when successful) and clear, with crisp outlines. This notebook too has a number of successors, a notable one being the Quick Diffusion notebook by Dan Russell.
To find more, this blog does a thorough job of evaluating CLIP notebooks, with example images and descriptions.
Then there are two extensive lists maintained by Reddit user u/Wiskkey
List of sites and programs that use CLIP
Situated somewhere in between web services and Colab notebooks in terms of ease of use is RunwayML, a learning platform with a nice web interface for trying out different AI generators and models.
How does it all work?
To break into new domains of expression and productivity and to boldly go where no one has gone, you need to be able to modify existing notebooks, which means you need to have intermediate-ish knowledge of Python and some of its widely used libraries such as NumPy. In almost all the notebooks discussed here, PyTorch is the deep learning framework being used, so mastering the technical basis of AI art today means learning PyTorch.
At the time of this writing, in my opinion the number one way to dive into the topic is to join the AiAiArt Discord.
There you will find an extremely well done set of Colab notebooks written by Jonathan Whitaker, together with companion videos, that can teach you all you need to compose your own notebooks, and the channels are a treasure trove of further information.
As an alternative, there is the Udemy course Generative A.I., from GANs to CLIP, with Python and Pytorch by Javier Ideami. It is a very fun but not at all shallow course that ranges from visionary futurism to the nuts and bolts of the mathematical equations. It concentrates on coding GANs, but deals with CLIP too.
Although both AiAiArt and Ideami’s course are geared for beginners, there might be parts where you feel that a more simplified explanation is needed. For that purpose, a book that I recommend is Make your first GAN with PyTorch by Tariq Rashid, which is about the gentlest introduction to GANs and PyTorch that you can find. The same author has also written an even more comprehensive book that is an introduction to coding neural networks. I haven’t read it, but I have no doubt it is on the same level of accessibility as the GAN book.
As for learning PyTorch and deep learning in general, Deep Learning with PyTorch Step-by-Step by Daniel Voigt Godoy is easily the best guide that I’ve found. I love how this huge hands-on tutorial is structured: it starts from the ground level, then after showing the basic things, it goes straight into computer vision topics, and in the end you get to know transformers and word embeddings, all of which play an important part in the inner workings of CLIP.
What other Discords are worth joining?
First mention belongs to EleutherAI and its art section: new developments can usually be found here. I recommend you become a listening pupil here – taking part in the discussion presupposes an in-depth understanding of computer vision and machine learning.
For a much more newbie-friendly server, I recommend Latent Space. There are people with good coding knowledge here, but in general the atmosphere here is more artistic and jocular. Also features a number of notebooks that are hard to locate elsewhere.
The VQLIPSE Discord is maintained by Henry (Я) aka @sportsracer48, the author of the awesome Pytti notebook, which is absolutely a must if you’re into animation (to get access to the beta versions, here is a link to Patreon). Many enlightening discussions and loads of eye-popping renders.
So, I hope you have found these links, along with the history lesson I gave, helpful and my descriptions informative. Lots of tutorials and other introductions can be found by googling “CLIP VQGAN”. I intend to update this post every now and then, so if you feel I have missed something essential or useful, feel free to ping me, either on Twitter or by sending me an email (johannezz.music at gmail). Now go forth and make some neural art!