Today, we want to dive into the messy world of AI art and image generation. It's a hotbed of moral quandaries, value systems, and copyright drama. While these are topics big enough for many posts and episodes, we'll focus instead on how AIs are used for art and creation today. What do they do, how do they work, and how can we use them in a way that feels like we still have ownership and control of our work?
Oh, and also we'll talk a little bit about sandwiches...
What is Image Generation?
Image Generation is basically a computer taking in a huge pile of pictures and creating new ones based on what it learned from them. "Prompts" are the requests a human gives this computer to make the image we want. Think of Google Image Search, but instead of showing you pictures that already exist, a computer makes new pictures based on the ones your search would have returned.
For example, the image above was made using the prompt `a messy open-faced sandwich --ar 3:2 --q 2` in Midjourney. Most of that line is readable, but `--ar` sets the aspect ratio to 3:2, and `--q` asks for higher quality. These are similar to the advanced operators you can use in Google Search.
The most well-known image generators are DALL·E 2 and Midjourney, which boast impressive images and offer an easy interface for anyone to create with. The other notable contender in the space is Stable Diffusion, the namesake of this publication, which was released as open-source software for developers and researchers to experiment with freely. These AI tools are now available to anyone with a computer and an interest in trying them out.
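To make "experiment with freely" concrete: a developer can run Stable Diffusion locally with a few lines of Python. Here's a minimal sketch using Hugging Face's diffusers library; the model ID and output file name are my own illustrative choices, and it assumes a CUDA-capable GPU:

```python
# A minimal sketch of running Stable Diffusion locally with the
# diffusers library. Assumes a CUDA-capable GPU; the model ID and
# file name are illustrative choices, not from the original post.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # move the model onto the GPU

# The text prompt plays the same role here as in Midjourney or DALL·E 2.
image = pipe("a messy open-faced sandwich").images[0]
image.save("sandwich.png")
```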
Image generation has gotten impressively good at responding to our requests, which challenges the belief that creativity is a uniquely human trait incapable of being reproduced by a machine. Where computers have traditionally needed a lot of guidance to do even basic tasks, these AIs seem to "create" from our loose inputs. That's scary for artists who worry about being replaced by a machine.
How does it work?
Given a prompt, an image AI uses what it learned from the images it has seen to create something that best fits that prompt. Researchers feed images to the AIs in many different ways, which determines what kinds of images each AI creates. Many AIs do things well that others are bad at, and you might get very different images depending on which AI you use.
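As a quick illustration of that model-to-model variety, here's a hedged sketch that feeds the same prompt to two different public checkpoints via diffusers; the specific model IDs are my own illustrative picks:

```python
# Sketch: the same prompt through two different models usually yields
# noticeably different images, because each model learned from
# different training data. Model IDs are illustrative picks.
import torch
from diffusers import StableDiffusionPipeline

prompt = "a messy open-faced sandwich"

for model_id in ("runwayml/stable-diffusion-v1-5",
                 "stabilityai/stable-diffusion-2-1"):
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(prompt).images[0]
    image.save(model_id.split("/")[-1] + "-sandwich.png")
```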
Prompts are specific to the AI you're using: certain keywords, styles, or descriptions can be more effective than others. As a result, an entire area of study has sprung up around prompting, along with books that catalog good prompts. In other words, to get something closer to what you have in mind, you'll need to spend some serious time working on your ability to prompt these systems.
My prompt for Midjourney here was `a messy open faced sandwich on a park table, sun setting, picnic cover, near a beach, cool breeze, photograph --ar 3:2 --q 2`. I was trying to put the sandwich above on a nice table near the beach, but as I added more details, the images lost the concept of an open-faced sandwich. Add more emphasis to the first part of the prompt, as in `a messy open faced sandwich::2 on a park table, sun setting, picnic cover, near a beach, cool breeze, photograph --ar 3:2 --q 2` (the `::2` weights that phrase more heavily), and we lose the beach! It's an iterative process to arrive at a good prompt.
Even with a good prompt, you might need to make several requests before the AI produces an image that fits what you had in mind. Midjourney, for example, suffers from constantly rendering people with six fingers. We can also provide images in our prompts to show the AI what we had in mind, but it may still not quite understand what we want.
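In the open-source world, "providing images in our prompts" maps to image-to-image generation. Here's a minimal sketch using diffusers' img2img pipeline; the input file, prompt, and strength value are illustrative assumptions:

```python
# Sketch of image-to-image generation: we hand the model a starting
# image alongside the text prompt. File names and the strength value
# are illustrative assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("rough_sketch.png").convert("RGB")

# strength controls how far the output may drift from the input image:
# near 0.0 it stays close to the sketch, near 1.0 it mostly ignores it.
image = pipe(
    prompt="a messy open faced sandwich on a park table near a beach",
    image=init_image,
    strength=0.6,
).images[0]
image.save("sandwich_on_table.png")
```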
Essentially, AIs try to produce images similar to ones they know based on what Prompts you give them. Some AIs are good at getting things close to what you asked for whereas others might be better at creating beautiful and artistic images that don't follow the Prompt as closely. What's possible depends on how the AI was trained and what prompts you can find that fit your vision.
What is a good way to use Image Generation?
While it's been awe-inspiring to see how far Image Generation has come, there's still a significant gap between what is generated and what is usable. Artists and designers will find the results lacking, and their clients will notice strange artifacts in the images AIs create. I consider these raw AI creations "open-faced sandwiches".
What I mean by that comes from the Sandwich Workflow described by Noahpinion in their post about Generative AI:
[...] the “sandwich” workflow. This is a three-step process. First, a human has a creative impulse, and gives the AI a prompt. The AI then generates a menu of options. The human then chooses an option, edits it, and adds any touches they like.
If a human isn't involved in editing and adding touches that make things useful, then it's ultimately an unfinished, open-faced sandwich. We can see the gross inner workings that don't quite map to the real chaos of the outside world. That chaos is where human creativity truly shines.
Thinking of our resulting artwork as a sandwich, the middle of the sandwich doesn't have to be the output of the AI. It could be that we ideate with these AIs. We teach them our style and have them help us iterate on what we could create. This helps save us time and generates a lot of ideas, and the final product doesn't even need to be from the AI. It's as if we have an apprentice helping us brainstorm ideas.
While there are many moral, legal, and copyright quandaries still to sort out, I believe this is a great way to start thinking about these AIs. We can choose for them not to replace our jobs, but instead to help us get something down on the canvas so the emptiness isn't so intimidating. Now, where is that sandwich...