You will remember the iconic opening of the film ‘The Matrix’ (Wachowski, 1999), where a curtain of symbols drew us into a world of synthetic images and voices: an environment so credible and realistic that nothing could reveal to its inhabitants that it was unreal, except, perhaps, for some glitches that manifested themselves in the form of ‘déjà vu’.
‘I know kung fu’ is the name given to the scene in which Neo, the protagonist of ‘The Matrix’, suddenly learns the rules and movements of this martial art thanks to a computational model preloaded into his system, something like a kung-fu AI.
Like Neo, you too can now generate an image in the style of Van Gogh in a matter of seconds. Anyone can do it; that is, anyone with a computer and internet access. The AI does it for you.
The symbols that encoded the world of the Matrix, and that in our computers can ultimately be represented in the binary language of zeros and ones, now constitute a meeting place for voices, proteins, images and newspaper articles… In these times, artistic and scientific production come together like never before, and without complexes, in a tsunami of numerical matrices where data relate to, affect and generate one another.
Since the first computer systems, the idea of designing machines that could create on their own has always been present. The field of computational creativity has specifically dedicated itself to studying the relationship between creativity and artificial systems, at the confluence of disciplines as interesting as cognitive psychology.
Currently, the star algorithms in art generation are based on deep learning, an artificial intelligence capable of creating new data from the patterns and structures found in pre-existing data. What many did not suspect is that all these years in which we have stored our photos for free, labeling them along the way, have served not only to teach algorithms to identify a cat in an image but also, thanks to the number and variability of the cats they have ‘seen’, to train them to generate versions of cats with astonishing attributes.
But is this ability to generate variations enough to call ‘it’ art? Can the AI-generated image of a cat in the style of Johannes Vermeer be considered art, or should more be demanded of it? Every incursion of a new technology shakes the foundations around the figure of the work, the artist and the creative process. In 1935, Walter Benjamin reflected on the work of art and the concept of aura in the age of mechanical reproduction. Today, the debate is no longer so much about the mechanical reproducibility of an original as about whether that original has actually been generated by a machine or not.
Generative algorithms work from patterns they have extracted from other artistic works, and the boundary between what is genuinely generated and what is partially copied is a constant source of discussion and debate. This text does not aim to deliver a verdict, but to address some points that may provide arguments for reflection.
Within the generation of digital art we can distinguish two approaches according to the degree of intervention of the artist. On the one hand, a work can be generated by programming the parameters that configure the objects in a scene, intervening at the pixel or polygon level, or by directly incorporating mathematical equations that define, for example, geometric structures or physical behavior. On the other hand, we have digital art generated by learning from other works of art. Here the artist does not have to worry about mathematically parameterizing the object and/or its behavior, since an almost immediate result can be obtained by entering a text or a reference image. Between the two generative models, the greatest difference is undoubtedly the mode of learning.
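To make the first approach tangible, here is a minimal sketch in Python of a work defined entirely by mathematical parameters; the curve, the coefficients and the file name are illustrative choices, not any particular artist’s method.

```python
import numpy as np
import matplotlib.pyplot as plt

# The "artwork" is fully determined by a handful of parameters chosen by
# the artist: two frequencies and a phase of a Lissajous-style curve.
t = np.linspace(0, 2 * np.pi, 2000)
a, b, delta = 3, 4, np.pi / 2

plt.plot(np.sin(a * t + delta), np.sin(b * t), linewidth=0.7, color="black")
plt.axis("off")
plt.savefig("generative_curve.png", dpi=200)
```

Changing a single number changes the whole piece, which is precisely the sense in which this kind of artist intervenes at the level of equations rather than brushstrokes.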
If we think of the algorithm as a student, the important thing, as teachers, would be to offer it the ideal learning conditions. In the field of AI, this consists, on the one hand, of providing representative data in terms of quantity and variability and, on the other, of making sure that the student does not memorize but actually understands.
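The difference between memorizing and understanding can be illustrated with a deliberately naive experiment: fitting the same noisy points with a modest polynomial and with one flexible enough to memorize every point, then checking both on data they have not seen. The degrees and data below are made up for the example.

```python
import numpy as np

# Ten noisy samples of a sine wave for training, ten clean ones for testing.
rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test)

# Degree 3 must find structure; degree 9 can pass through every training point.
for degree in (3, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    mse = float(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))
    print(f"degree {degree}: test error {mse:.4f}")
```

The memorizing student typically does worse on the unseen points: in machine learning jargon, it overfits.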
If you had to remember one of the following sequences, which do you think you would remember best?
–The first sequence: abc,abc,abc,abc,abc,abc?
–Or the second sequence: baa,caa,abc,aba,bab,cba?
It is very likely that you answered the first one, right? This is because you found a pattern: ‘abc’ repeated six times. This way of compressing the information has a double advantage: on the one hand, the most obvious, it requires less storage space; on the other, it forces the extraction of non-obvious structures and characteristics from our data.
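This intuition can even be checked mechanically: a general-purpose compressor finds the repeated pattern in the first sequence and nothing to exploit in the second. A quick sketch in Python (the exact byte counts will vary, and such tiny inputs carry a fixed overhead):

```python
import zlib

regular = b"abc,abc,abc,abc,abc,abc"
irregular = b"baa,caa,abc,aba,bab,cba"

# Same length in both cases, but only the first contains a pattern the
# compressor can exploit, so its compressed form ends up smaller.
print(len(regular), len(zlib.compress(regular, 9)))
print(len(irregular), len(zlib.compress(irregular, 9)))
```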
In the same way that there are schools with different learning methodologies, there are AI training designs for this purpose: those based on reconstructing the original (VAE, variational autoencoders), and those in which two networks compete, one generating content and the other evaluating it (GAN, generative adversarial networks), the approach behind the famous ‘deepfakes’.
There are also other strategies, such as those based on diffusion models, where the original content is corrupted with noise and a network is forced to rebuild it, guided by some other type of content such as text or an image. This approach is the basis of AIs such as DALL·E 2, Stable Diffusion and Midjourney, all three capable of generating content with high quality and variability.
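To give an idea of the mechanism, the following toy sketch reproduces only the ‘corruption’ half of a diffusion model on a one-dimensional signal; the noise schedule uses typical but illustrative values, and the denoising network itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0, 2 * np.pi, 64))  # stand-in for a clean image

betas = np.linspace(1e-4, 0.02, 1000)   # noise schedule over 1000 steps
alphas_cum = np.cumprod(1.0 - betas)    # fraction of signal surviving at step t

def noisy_at(t):
    """Sample x_t directly from x_0 (closed form of the forward process)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_cum[t]) * x0 + np.sqrt(1.0 - alphas_cum[t]) * eps

# The further we go, the more the signal dissolves into noise; training a
# diffusion model means teaching a network to reverse exactly these steps.
for t in (0, 250, 500, 999):
    corr = float(np.corrcoef(x0, noisy_at(t))[0, 1])
    print(f"t={t}: correlation with the original {corr:.2f}")
```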
‘Latent space’ is one of the key terms in this revolution and constitutes something like a Platonic world of ideas contained in numerical matrices. After training, the network has crystallized the most relevant features of the input data, and a certain semantic level is translated into distances within the distribution of that space.
When we say that two objects are similar for a generative model, what it really means is that, within that latent space, they lie closer to each other than two objects with different meanings.
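A toy sketch of that idea: three hand-made vectors stand in for latent embeddings (a real system would obtain them from a trained encoder), and similarity is read off as the cosine between them.

```python
import numpy as np

# Hypothetical latent vectors; only their relative geometry matters here.
cat = np.array([0.9, 0.2, 0.1])
kitten = np.array([0.8, 0.3, 0.1])
airplane = np.array([0.1, 0.1, 0.9])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(cat, kitten))    # close to 1: nearby in latent space
print(cosine(cat, airplane))  # much lower: semantically distant
```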
As for generating content, we can do it from Colab notebooks, from integrations in applications such as Photoshop or directly from the company’s website, as is the case with DALL·E 2. Most of the time we will generate the image from text, which works like a key between a noise image and a recognizable image. That is why these text prompts, in the form of secret codes, can even be commercialized. There is also a community, and websites such as Lexica, where it is possible to find structured references: the key object you want to generate (cat), the style (conceptual art) and a reference artist (Marcel Duchamp), as well as other words that improve the result, such as references to artistic portals like Artstation.
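Assembling such a structured prompt can be as simple as concatenating its parts; the variable names below are illustrative, not part of any tool’s API.

```python
# Building a prompt from the structured references mentioned above.
subject = "a cat"
style = "conceptual art"
artist = "in the style of Marcel Duchamp"
boosters = "trending on Artstation, highly detailed"

prompt = ", ".join([subject, style, artist, boosters])
print(prompt)
```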
Google, Meta, Microsoft, OpenAI and StabilityAI are key industry players in AI-powered image generation tools, but of all of them it was StabilityAI that dared to break away from the rest, participating alongside Runwayml, LAION and the CompVis research group in the creation of Stable Diffusion, an open-source model released to the community, from which improvements and new features over the original model began to be developed.
Another aspect in which access to the models differs is the level of customization of the generative process. We can control, for example, how literal the generation is with respect to the input text or image (creative margin), as well as the number of iterations in the process, in order to add more precision to the result (perfection margin).
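Both margins appear as explicit parameters in open implementations. A minimal sketch with the open-source diffusers library, assuming it is installed and a GPU is available; the model identifier and the values are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a cat, conceptual art, in the style of Marcel Duchamp",
    guidance_scale=7.5,      # creative margin: how literally to follow the text
    num_inference_steps=50,  # perfection margin: more steps, more refinement
).images[0]
image.save("cat.png")
```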
Despite the fact that we use natural language in the generation, it is often difficult to achieve the result we expected, for example, that the figures adopt a certain pose or that there is consistency between two generated images. That is why improvements are constantly being added, such as the recent ControlNet, which allows you to guide the generation with text inputs, sketches or body poses, among others. Hugging Face, another of the agents in this revolution, lets you try all these features through friendly interfaces on its website.
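As a hedged sketch of how such guidance is wired in, again with diffusers; the checkpoint names are commonly published ones, but treat them as assumptions, and the conditioning image should already be an edge map or pose skeleton:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

control = load_image("my_edges.png")  # placeholder conditioning image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The edge map constrains the composition; the text decides the content.
image = pipe("a cat, conceptual art", image=control).images[0]
image.save("guided_cat.png")
```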
The text-to-image generation sequence is the most common and the one we have focused on here, but we can also generate text-to-audio or text-to-text.
So far we have talked about data, learning strategies and the way to navigate through this latent space, but what other comparisons could we establish? The time has come to talk about the process.
The artist is presumed to be motivated to project part of his personality into the work, while the algorithm must be guided. A human being has life experience, while what an algorithm has at its disposal is data that, moreover, has mostly been scraped haphazardly from the Internet; something that, on the other hand, could change when most data comes from machines present in the world, such as robots.
Another aspect from which we can assess the work directly is its final result or product. Here it is more difficult to discern. The reason is that, although the algorithms require guidance, we cannot guarantee that the product is a copy, because the task of the models is precisely to learn from the data, not to fit it exactly. From a philosophical point of view, wouldn’t there be something in this process similar to what some humans do when they create? That is, seeing many works and then producing variations from that experience.
The production of artificial intelligence is not limited to still images and expands every day towards music, video, 3D… And generation waiting times are clearly getting shorter and shorter.
Thanks to these advances, can we imagine an on-demand movie generation service? Will it be possible to generate infinite variations of the ‘Star Wars’ series, even with a character who resembles you? Is it dangerous to create a bubble of customized art that limits our experience and experimentation towards the discovery of new artistic currents?
It is up to us to decide whether we co-create with AI or settle for its results. The question of whether what AI generates is art will continue to be asked, but what does seem almost indisputable is that the algorithm itself already is.