The Faking Of “The Woman In The White Dress”
It is customary these days for even small productions to put out a “The Making of” video to increase public interest. I’m going to skip that part, since visually the making of “The Woman In The White Dress” is just not that interesting (as pictured above). Instead, in this blog post I’m going to elucidate the creative processes that went into making the video.
But what’s with “Faking”, you ask. Well, first of all, I adore bad puns, dad jokes and the like. But secondly and thirdly, isn’t the word apt when talking about art generated by neural networks (henceforth NN’s)? The topic is entangled with the concept of the “deep fake”, which raises the question of what is genuine and what is fake, and what does it matter anyway? The holy grail of the synthetic media movement is nothing less than a perfect fake. By the time we have attained that goal, anybody will be able not only to make blockbuster Hollywood movies in their bedroom, but also to broadcast totally convincing deep fake news. Reality might become a scarce commodity like oil. Maybe there will be wars over reality/realities. (Maybe we are already having them?)
Even though the tools are still in their infancy, I believe “The Woman In The White Dress” could very well fool someone who has not kept up with the evolution of machine learning technologies into believing that it is a “genuine”, “experimental” animated film, made perhaps by some art student, or maybe even by some accomplished avant-garde artist, albeit rather shoddily (unless that were a deliberate aesthetic choice). What interests me is how that would change anyone’s perception of the video. Would it be stronger or weaker? As a thought experiment, consider the possibility that I’m lying when I say that the video is made by NN’s – that the story and the music are actually written by me, and that I’m just using the buzzword “AI” to attract people’s attention, while also covering up my blunders by passing the responsibility to the still immature state of today’s AI. Would that change anything?
Indeed, what is real in “AI art” versus conventional art, what is fake, what is deep fake and what is superficial fake, and how does virtual reality sit with all this?
A little background. I see my current Promptist efforts as a continuation of a leisurely pursuit that goes back a long way – doodling. Some years ago I would sometimes sit down with a sheet of paper and a felt-tip pen, and do this: First I scribbled a little doodle somewhere near the center of the paper:
Next I turned the paper upside down and tried to see some kind of resemblance there. Maybe it looked like a bug, so I’d make some additions to make it vaguely bug-like.
Then I rotated the paper back to its original orientation, and inspected the doodle. If it did not remind me of anything special, I’d just elaborate the simpler forms I saw emerging and decorate what was already there.
Then again 180 degrees, and so on, and so forth, and sometimes I’d iterate for so long that the paper became filled with a fractal-like mass of detail.
It was like an adventure into a galaxy of strange exotic forms, without any idea of where it would advance, without any use of any cultivated artistic skill – the only things I needed were a kind of imagination and some patience to fill in the most minuscule areas. And a pretty obsessive character, I guess. But what fascinated me the most was that it didn’t feel like it was “I” who was drawing. Whenever I rotated the paper, I was presented with a fresh new image and I could see forms that were not the result of a conscious intention.
It was a sort of game where I was able to detach myself, to brush myself aside. It felt as if I was summoning up imagery that was hidden somewhere inside the fabric of the paper, deep within its microscopic folds.
The resulting drawing appealed to me, not despite, but because of its strangeness, as I’ve always liked psychedelic art and surrealism. And in fact the earliest form of surrealism developed from exercises in “automatic writing” in the context of psychoanalysis – or channeling the subconscious, which was what I was doing then, and what I believe I am doing now, only instead of using a plain white A4 sheet I’m now using statistical models that encode probability distributions based on huge datasets of images and text.
In place of a felt-tip pen, I use “Halluzinator”, my customized CLIP+VQGAN notebook. The nice thing about it is that I can interactively make movies that morph from prompt to prompt. So I start with a prompt, iterate it for some time, then inspect the result, and if it starts to look like something else, I replace the prompt and then it changes into something else. It all happens within one notebook cell with a richer UI. It’s fun in exactly the same way that rotating the paper, projecting forms onto it and drawing them was. I also think the process has therapeutic qualities, like Jungian sandplay.
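The prompt-to-prompt morphing can be pictured as a schedule that tells the generator, frame by frame, which prompts are active and how strongly. The sketch below is only an illustration of that idea, not Halluzinator’s actual code: the names `PromptSegment` and `prompt_weights`, the `fade` parameter, the linear cross-fade and the example prompts are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class PromptSegment:
    text: str    # the text prompt (hypothetical example values below)
    frames: int  # how many frames this prompt is held for


def prompt_weights(segments, frame, fade=10):
    """Return a list of (prompt, weight) pairs active at `frame`.

    Assumption: during the last `fade` frames of a segment the next
    prompt is blended in linearly, so the image can morph gradually.
    `segments` must be non-empty.
    """
    start = 0
    for i, seg in enumerate(segments):
        end = start + seg.frames
        if frame < end:
            has_next = i + 1 < len(segments)
            if has_next and fade > 0 and frame >= end - fade:
                # t runs from 0.0 up towards 1.0 across the fade window
                t = (frame - (end - fade)) / fade
                return [(seg.text, 1.0 - t), (segments[i + 1].text, t)]
            return [(seg.text, 1.0)]
        start = end
    # Past the end of the schedule: keep holding the last prompt
    return [(segments[-1].text, 1.0)]


if __name__ == "__main__":
    schedule = [
        PromptSegment("a woman in a white dress", 100),
        PromptSegment("a red-rimmed cup in a church", 100),
    ]
    for frame in (0, 95, 100):
        print(frame, prompt_weights(schedule, frame))
```

In an interactive session the schedule would not be fixed in advance, since the next prompt is chosen only after looking at what the current one has produced; a fixed list is used here purely to keep the sketch short.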
On to the genesis of “The Woman In The White Dress”. I had much fun making the “AI music videos” that I started to make about a year ago. Apart from the surreal imagery, they usually also incorporated technical demos of some new experimental notebook features. The last one that I did was especially satisfactory graphically, even though it was just one long zoom using just one prompt. It is now the most watched music video on my YT channel. I felt that it signaled the end of an era. I did not want to keep repeating myself; I needed to do something more challenging. How about I try a longer piece? With a machine-generated synopsis? I made GPT-3 generate some stories for me. I forget what the prompts were, but the first one was about the end of the world.
In that story, the scientific community predicted that the world would be destroyed on a certain day. People exhibited varying reactions to this information – some prayed fervently, some did heavy partying, still others just carried on their usual daily routines. But then the day of doom arrives and nothing special happens. The story concludes with a lengthy altercation between the people and the scientists. “You told us the world was going to end.” “That’s right, we told you it was going to end.” “But everything is the same.” “No, everything is not the same.” “The world did not end.” “That’s right.” And this goes on and on; it got stuck in the groove.
The second one was a bizarre fable involving dogs:
The conversation shifted quickly to the subject of the royal dogs, as the two men tried to recall the exact number of them. “There are two, one male and one female,” one of the king’s aides explained, referring to the king’s hunting dogs. “This is the male,” he said, pointing to the large dog that had suddenly arrived. “And that is the female,” he said, pointing to the smaller dog. “No, you’re wrong,” the king replied, “that is the male and that one is the female.” And so it went for a while. Finally, the king turned to the aide and said, “Why don’t you take these two dogs and drown them?” The aide walked away in horror. “Ah, you don’t believe me,” the king said. “Go, call the prime minister and ask him. I’m serious,” he said. So the aide went to the prime minister’s house and told him of the king’s instructions. “I’m sure the king is only joking,” the prime minister said. “He can never do such a thing to his dogs.” The aide returned to the king and told him that the prime minister had said he was only joking. “Very well,” the king said. “Go and ask the prime minister to come over here.”
The prime minister was woken up and told to get dressed quickly. He rushed over to the king’s house and met him at the gate. “You said I would never kill my dogs,” the king said, “but then you said I was only joking. I wasn’t joking and the dogs are dead.” The prime minister looked in amazement at the king’s large hunting dog who was lying on the ground, dead. He was about to open his mouth and speak but the king hit him on the head with a cricket bat. The prime minister fell to the ground, dazed. “Now, you tell me the truth,” the king said, still holding the cricket bat over the prime minister. “I’m sorry, Your Majesty,” the prime minister groaned. “I was wrong about the dogs. Please forgive me.” The prime minister was helped to his feet and asked if he would like some medicine. “A cup of hot tea would be fine,” he said. The king turned to his aide and said, “Go and tell my wife to make some tea and bring it out here. And if you see those dogs again, tell them to go and bury their dead.”
The story of the woman in the white dress came next. I mention these two prior stories not only because they are rather interesting (well, at least I think so) but also because, as they were generated during the same session, GPT-3 may have reused some of their themes in “The Woman In The White Dress”. Dogs certainly reappear there, but possibly also the apocalyptic atmosphere of the first story. The setting of “The Woman In The White Dress” reminds me of a lone man (the fact that he is male is stated only at the very end) in a world where everyone else has died, walking around empty places and talking with a woman in his imagination. But of course, GPT-3 does not know, does not mean any of this stuff. It is just a brainless, heartless autocompleter without any dramatic or artistic message. I did not write the script, but any interpretation I give to it is absolutely my own.
Like many GPT-3 outputs, it had a superficial coherence, brought in by the repetition of certain features and a rudimentary narrative structure, but on closer look, it all dissolved into dream logic, with qualities and persons exchanging places – GPT-3 retains certain words and themes but seems to keep no track of who was who and who did what.
The essence of Promptism is the personal selection of which results of neural imaging to present to the world. You are the proud parent of the baby and also responsible for what it does. On Reddit and on Twitter we see those CLIP+VQGAN static images, funny, monstrous, strange and novel ones. It is a bit like being handed a picture painted by a precocious, gifted child; one wonders what the child will accomplish when grown up, but presently so many of the details of the painting are odd and out of place, childish but original. But a longer piece like this, not just one image made with a NN but tens of thousands of them, coming with music and words executed by other “gifted kids” too – I mean really? Isn’t that taking it too far? Well, I already elevated absent-minded doodling into baroque monumental art. I enjoy a prolonged stay in multiple uncanny valleys superimposed on top of each other, at times cancelling, at times reinforcing each other.
But if I want to capture a snapshot of the state of artistic generator NN’s, I must be sincere and resist the temptation to start tinkering with the GPT-3 output by attempting to make it somehow more palatable to the audience, or more in tune with all the current fads. Here, before beginning the work where I could detach my ego, I worried about all the references to death and blood and the explicitly suicidal lines. Someone might ask why I feel compelled to animate such morbid statements. What about those dreams of being a woman on the beach? Does that tread into a sensitive area? It’s as if I were trying to bring up these really deep questions, but they are not leading to any grand philosophical conclusions. Instead what we get is a totally banal “romantic” ending – a couple holding hands in the rain, and they dash off to a hotel room, presumably to make love. GPT-3 seems to be programmed to insert some type of stock ending into its otherwise uncontrolled narratives, just as Jukebox often finishes its free-form sessions with fadeouts or applause. But maybe, if I were a genius filmmaker, I would unearth pure gold in the ending and make it so that everything clicks together and blows everyone’s minds.
Despite or because of my not being a genius filmmaker, I decided I would try to pull it off. After all, these potential points of discomfort are just samples from mashed-up data that comes from the internet – they merely provide a reflection of our own chief preoccupations, sex and death, and the poor AI does not (yet) know any sophisticated literary methods, etiquette, or principles of good taste.
I divided the story up into chapters, and I did them in linear order from start to finish. The opening sequence turned out pretty good, mainly because the music turned out well. “The Woman In The White Dress” is essentially a long music video, and in general, it is the soundtrack that makes any animation come alive and not the graphics. I was looking for something that might work as film music and found an artist totally unknown to me, “Clint Mansell”. That gave me just the lush, expansive, melodic soundscapes that really contributed to creating the atmosphere, and so I ended up using Mr. Mansell as the Jukebox setting for the entire video, except for the “underwater” sequence in the middle. There I switched to “Brian Eno” so as to underline that the sequence (another long immersive zoom) is a kind of separate video within the video, providing some more variety in the flow.
Then came the second scene, the one with the red-rimmed cup in the church and the rain on the stone steps. It came together as if by magic: with minimal shuffling on the timeline, the music and images seemed to be destined for each other and just worked. That’s what I aspire towards, when the movie makes itself up without my conscious intervention. Only then was the deal closed, and I knew I was going to do the entire thing.
The thing progressed very breezily. The endings of the scenes gave me some trouble and I redid most of them a number of times, because, well, endings require special care; they seal the mood and meaning of a piece. Some of the spoken lines turned out too difficult for the virtual voices to articulate, so I had to think up alternative ways of expressing the same thing. Any alterations to the original script are solely because of this (aside from some repetitive and uninformative stuff that I also omitted).
Working mostly during evenings and weekends, I had the video substantially finished in about two and a half weeks, music and all, which gives some idea of how this technology can at some point improve efficiency and cause disruption in the film industry. But now the fun part was over, and lots of more tedious work awaited. I had to upscale the frames to a higher resolution, which I never did with earlier videos, and I had to arrive at a decent workflow by trial and error. The audio had to be remastered. The music comes out of Jukebox muffled and monophonic, and it is very hard to make it sound better. I’m not a qualified mixing engineer, so I often cheated: I swept the imperfections under the rug by drenching one or more sound stems in large reverb.
To emphasize the contrast: making the actual movie is tremendous fun. I can pretend that I’m a film mogul running a studio of robots: my scriptwriter GPT-3, my photographer Halluzinator, and my synthesized actors. At this stage, it’s a solipsistic effort, just picking and choosing what I think is interesting without a thought for other people. The Nobel Prize-winning author Doris Lessing said that she wrote to know what she was really thinking; I could say that I do these animations to know what I really like.
Postprocessing, on the other hand, is all about making the video presentable to a larger audience, and that’s where the initial doubts return in full force. Maybe this stuff is no good and they will hate it… Should I re-do the whole mess now that we’ve got ViT-B/16? Or maybe it’s too good and they don’t deserve it! I procrastinated and avoided getting involved. I concentrated on preparing a new version of my tool, Halluzinator, since I planned to release it at the same time as the video. Halluzinator is a bit of a mess and is not very intuitive for a newcomer, and I’m still working on it. Be that as it may, to reward you, my reader, for your persistence in deciphering this rather long post, I will give away an invite to the previously private Latent Space Discord, the place where the CLIP elite artists congregate. You will find there links to all previous versions of Halluzinator and many other wonderful Colab notebooks too, and you will be kept updated on all the exciting trends happening in this space. https://discord.com/invite/Hb5g7B855Q
Whenever I had brought a video to completion, I would then relax and enjoy some refreshments and watch and re-watch it a number of times. Issued from my hands, now flying on its own, it let me once again detach my ego and watch it with fresh eyes, from various angles. I would also start to see all kinds of hints and allusions there, oracles and intimations. Above I told you what I think the general setting is in “The Woman In The White Dress”. I didn’t tell you how I interpret the plot and ending, but it is in plain sight and you can figure it out from my choice of imagery, and if you don’t, that’s perfectly all right. In fact I encourage you to shut your eyes and just listen; you might come up with your own meanings, which of course are the best meanings of all. So where does my part begin and end, what is real and what is generated? In truth it’s all very real. What the machines spout out comes ultimately from the collective work of everyone, the data that we have produced. It is just my personal unconscious casting a shadow on the statistical, collective dreaming, from which I haven’t yet quite woken up.
If you haven’t seen it yet, and don’t know what on earth that was, sorry about that. https://youtu.be/ZeQkHd3VJeQ