In the world of artificial intelligence, the quest to take plain text and spin it into a work of art stuns as much as it scares. The recent global release of Stability AI's image-generating model Stable Diffusion 3 Medium shows, more clearly than ever, how capable the technology is of producing images that mystify as much as they thrill. It is a story of technological wonder, public clamour, and debate over what the future holds for AI art.
It is a technology near and dear to the hearts of technologists and artists alike: AI systems such as Stable Diffusion turn a written description of a scene into a work of art. It is a portal between text and pictures, and it works so well that a generative model can seem like a limitless tool for creativity.
But it is a peculiar journey nonetheless, and media attention surrounding last month's release of Stable Diffusion 3 Medium has largely focused on the strangeness of the human bodies it renders. The model can produce images from text with a level of detail and specificity that speaks to real technological progress, while also showing just how easily it warps human bodies in ways that depart considerably from naturalism.
From Reddit to Twitter, the reaction has swung between amusement and criticism. The ease with which Stable Diffusion 3 handles complex figurative imagery, only to flounder over basic human anatomy, has not gone unnoticed. Comment threads on Reddit and elsewhere that parse the model's output, especially its rendering of human limbs, reveal a collective bafflement at what looks like a regression in the evolution of artistic AI.
Critics were quick to invoke the model's older, less restricted predecessors and its competitors, asking why a focus on ethically curated datasets and the total exclusion of NSFW content had limited the model's ability to render humans with fidelity. The Sims 3's hyperreal faces and the anime-inflected output of Stable Diffusion 3's latest iteration could hardly be more dissimilar.
The mechanics of these AI models run deep: vast datasets and layered algorithms. The reasons human portraits are hard to get right are equally broad, ranging from low-quality training data to filtering that prunes NSFW content so the model does not retain material it should not. Overzealous NSFW filters may also remove too many images of ordinary, healthy human anatomy, starving the model of the very examples it needs.
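To make the over-filtering concern concrete, here is a minimal, hypothetical sketch in Python of the kind of threshold-based dataset pruning being described. The NSFW scores, captions, and the 0.3 cutoff are illustrative assumptions, not Stability AI's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class TrainingImage:
    caption: str
    nsfw_score: float  # 0.0 = clearly safe, 1.0 = clearly NSFW (from some upstream classifier)

def filter_dataset(images, threshold=0.3):
    """Drop any image whose NSFW score exceeds the threshold.

    A conservative (low) threshold removes unsafe content, but it can also
    discard benign images of people (swimmers, gymnasts, anatomy diagrams)
    that happen to score above the cutoff, starving the model of examples
    of ordinary human anatomy.
    """
    kept, pruned = [], []
    for img in images:
        (kept if img.nsfw_score <= threshold else pruned).append(img)
    return kept, pruned

if __name__ == "__main__":
    # Purely illustrative data: the scores below are made up for demonstration.
    dataset = [
        TrainingImage("a mountain landscape at dawn", 0.01),
        TrainingImage("a gymnast mid-routine on the balance beam", 0.42),
        TrainingImage("an anatomical sketch of a human hand", 0.35),
        TrainingImage("explicit content", 0.97),
    ]
    kept, pruned = filter_dataset(dataset, threshold=0.3)
    print("kept:", [img.caption for img in kept])
    print("pruned:", [img.caption for img in pruned])
    # The gymnast and the anatomy sketch are pruned along with the genuinely
    # unsafe image: the over-filtering problem described above.
```

In this toy run, two harmless images of human figures are discarded alongside the one genuinely unsafe image, which is the kind of unintended loss of anatomical training signal critics have speculated about.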
Regardless of the controversy, it is impossible to deny the practical fact that AI can now give visual form to our whimsical imaginings in a matter of seconds, and, given the prospects for improvement, it will only get better in the coming years. The debates over Stable Diffusion 3 will stand as a key milestone on that road, a reminder to 'humanise' AI development in ways that serve, rather than diminish, our potential.
We should pay particular attention to the word 'ease', which recurs throughout this discussion and denotes both the simplicity and the efficiency with which such complex systems generate content from textual prompts. But, as Stable Diffusion 3 shows, ease does not guarantee fidelity to human expectations of verisimilitude or artistry. This mismatch between the ease of development and the difficulty of attaining verisimilitude seems emblematic of AI development more generally.
Now, as we stand at the threshold of these new artistic worlds opened up by AI, the way forward is both exhilarating and challenging. The ease with which AI can give form to a dream or a nightmare invites us to reimagine the limits of imagination, to define the contours of creativity, and to ask what art can be in the age of AI.