Grégory Chatonsky
Our guest author Grégory Chatonsky is an artist whose work has explored the possibilities of artistic expression with digital media since the mid-1990s. An ongoing subject in his practice is Artificial Intelligence, and in particular the concept of “artificial imagination”: the machine’s ability to produce content beyond human capabilities and to push the limits of art.
In this text, he presents a critique of the latest advances in artificial intelligence aimed at producing ever more realistic images, advances that may in turn lead to the banality of all images.
Every week, new code for text-to-video generation and translation becomes available on Colab [1]. We keep experimenting, eager to produce new images and explore these new possibilities. We try to make them our own, to avoid some of the visual naïveties spread daily on Twitter and Discord. But with Dall-E 2 and its affiliates [2], the field of visual possibilities seems gradually to be narrowing. As the images become more “credible,” they also become more boring. Technological progress and aesthetic motivation seem to pull in opposite directions, as if each had its own goals.
Undoubtedly, the code developed by creative computer scientists, who most often know little about the history of art, answers to requirements that are antagonistic to those of art. Computing practice consists in taking up challenges (feats), in achieving objectives, and in not questioning their presuppositions, so that more often than not one inherits an underlying ideological structure that tends to naturalize what is in fact a social and cultural construction.
Thus, a major objective of image generation with neural networks seems to be the capacity to produce “natural” images from texts, that is, images that appear to have been made by human operators through some technical mediation (painting, drawing, photography, etc.) rather than generated by solitary machines. This goal, inspired by the Turing test, conceals the fact that the test, in both of its versions, took its own performative effects into account. Indeed, Alan Turing did not ask that the machine be intelligent in the way a human being is (a faculty that is, moreover, uncertain in the latter), but that the human grant, affect, attribute intelligence to the machine when they do not know that it is a machine. Recognizing the arbitrariness of this attribution is fundamental here, because it defines the conditions of possibility that must be constructed and deconstructed.
Thus, the images produced by neural networks become more and more coherent, more banal, until they strangely come to bear a family resemblance to those of Beeple: an average aesthetic, the fruit of the thoughtless juxtaposition of our culture in a latent space that may be statistical (technical) or cognitive (human). They seem to lose the strangeness of pixels and of Surrealism, to repress the psychedelic or hallucinatory character of a Deep Dream [3], since the point is to overcome what appear to be defects and oddities, so that we no longer notice the difference between the supposed original and the supposed copy. We are completely taken in. In fact, there is nothing left to see, except a symptom of our time and its hypermnesia.
Behind the computational feat lies a generalized instrumentality, a deterministic construction of the world, which affects aesthetics itself. It presupposes a linear conception of representation, of mimesis, of Vorstellung: as if images had no effect on themselves. The images of Dall-E 2 seem less disturbing than those of Disco Diffusion or VQGAN+CLIP, so mastered and normal have they become. One grows nostalgic for a technology that is only a few weeks old. Technological evolution is an instant ruin; at the very moment of its appearance it is already a disaster. Gone are the germinations and metamorphoses, the imperfections and monstrosities. Silhouettes and objects are cut out against a background, each thing is distinguished from the others, the image becomes clearer and more “credible,” but we know very well that this credibility is not natural and does not go without saying: it is a cultural construction, historically and geographically situated.
But it is precisely this contingency of construction that the true work of art underlines, whereas the technological development of image generation rests on the belief in its essentiality. Coders therefore often pursue a decontextualized and essentialized visual aim. The original images are treated as data to be translated. That the perception of these “original” images can be retroactively influenced by automated productions remains unthought; that the translation of a text into images belongs to a long Western theological tradition of making images express a sacred text is obscured. This is why “prompts” are often more interesting than the visual results. If we were to catalog all the prompts that flood Twitter [4], we would probably obtain a good representation of the visual imagination of our time: what words do people think of when they want to make an image? What goes unseen is that the defects, the metamorphoses, the amorphous are so many aesthetic potentialities, and that the strange familiarity between human and technical productions is also made of distances and differences forming an anthropo-technological gray zone: the human and the technical have always influenced each other, and imagination will have been the name of their meeting through a material support.
When neural networks become able to generate an image that cannot be distinguished from a human creation, it will be because the images created by humans, in their greatest banality and instrumentality, will have been turned into a default aesthetic. While we believe we are producing new images, we will in fact be modifying the perception of all the past images to which we refer. Our technical present will influence our cultural past. We will also have forgotten that there is no human production that is not technical, and no technical production that is not human. We will then be able to produce images as stereotyped as those of influencers, of Beeple, of those Instagrammable painters of whom we cannot tell whether it is their paintings or their faces that make their fleeting success. We will then be able to let ourselves be submerged by the flow of images, to create images of images, to take up the thread of our entire visual culture through the latent space of statistics. We will then find something to do, and we will invent enough errors and shifts to keep experimenting.
Notes by the editor:
[1] Colaboratory (Colab) is a tool that allows users to write and run Python code using Google’s cloud services. It makes it easier to run demanding tasks that would otherwise be difficult to process on a personal computer, and to share the code.
[2] Dall-E is an AI system developed by OpenAI that creates realistic images from text descriptions. Its first version was announced on January 5, 2021; the second version was announced in April 2022, presenting spectacular results.
[3] Deep Dream is a computer vision program created by Google engineer Alexander Mordvintsev that was released in 2015 and became popular for its ability to create dream-like images based on algorithmic pareidolia.
[4] Some text-to-image AI projects invite Twitter users to send “prompts”: descriptions of the images they would like to see generated by the AI.
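As an illustration of the prompt-to-image workflow mentioned in notes [1] and [4], here is a minimal sketch of how such a generation might run in a Colab notebook. The library (Hugging Face’s diffusers), the model checkpoint, and the parameters are illustrative assumptions by the editor, not the specific notebooks discussed by the author.

```python
# Minimal sketch of a text-to-image generation, as it might run in a Colab
# notebook with a GPU runtime. Library, checkpoint, and parameters are
# illustrative assumptions, not the specific code referred to in the text.
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image diffusion model (checkpoint name is illustrative).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # move the model to the Colab GPU

# The "prompt": the short text whose translation into an image is at stake.
prompt = "a ruin appearing at the very moment of its construction, 35mm photograph"

# Run the diffusion process and save the resulting image.
result = pipe(prompt, num_inference_steps=50, guidance_scale=7.5)
result.images[0].save("generation.png")
```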