Fiddling with generative art models

Categories: personal, professional, ai, generative modeling, deep learning, raspberry pi, legal disclaimer, zelda, family

Author: Shannon Quinn

Posted on the 3rd of August in the year 2022, at 11:54am. It was Wednesday.

Generated with the text: `4K, wide angle, hyper detailed, starfield, nighttime sky, crescent moons, stars, nebulae, cosmic` by MidjourneyAI.

Like a lot of the internet over the past couple of months, I’ve been tinkering with some AI art generators, mainly Midjourney (if only because I’m waitlisted on DALL-E 2).

It’s been surprisingly fun, if challenging. This isn’t really my area of research, but I understand the basic underpinnings:

These models are trained on vast swaths of image-and-text pairings, so the model can develop associations between features in images and the corresponding text that describes them. That training process is very complicated, and requires not only a great deal of data but also a considerable amount of computing power, so it’s still a very active area of research and largely out of reach of the everyday researcher or individual training their own model. But once the model is trained, users can provide snippets of text, which the model then translates into an image.
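(If you want to poke at that text-to-image workflow yourself, the open-source Stable Diffusion models expose the same basic idea through Hugging Face’s `diffusers` library. To be clear, this is just a minimal sketch of that workflow, not what Midjourney actually runs under the hood; the checkpoint name, prompt, and parameters below are illustrative stand-ins.)

```python
# A minimal text-to-image sketch using Hugging Face's diffusers library and an
# open Stable Diffusion checkpoint -- an illustration of the general idea, NOT
# how Midjourney works internally (Midjourney is driven through its Discord bot).
import torch
from diffusers import StableDiffusionPipeline

# The checkpoint name is just an example; any Stable Diffusion weights will do.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # needs a GPU with enough VRAM; drop this (and float16) for CPU

prompt = ("4K, wide angle, hyper detailed, starfield, nighttime sky, "
          "crescent moons, stars, nebulae, cosmic")
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("starfield.png")
```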

Therein lies the magic. There’s a lot of randomness involved in the image generation process, since even a text input as simple as “cat” could mean a LOT of different things, image-wise. That’s partly a feature: two people who give exactly the same text input will likely get slight variations on the same concept, and possibly wildly divergent results.
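(That randomness is easy to see for yourself in the open-source tooling: the same prompt with different random seeds produces different images, and pinning the seed makes a result reproducible. Another rough sketch, again using Stable Diffusion via `diffusers` as a stand-in for Midjourney:)

```python
# Same prompt, different seeds: each seed drives different starting noise, so
# the outputs range from near-duplicates to wildly different takes on "cat".
# Sketch only; uses open Stable Diffusion weights, not Midjourney.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for seed in (0, 1, 2):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe("cat", generator=generator).images[0]
    image.save(f"cat_seed_{seed}.png")  # re-running with the same seed reproduces the image
```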

But it also makes for a challenging user experience: there’s a sort of “lingo” you have to wrap your head around, a “way of talking to the model” that goes beyond simple word-concept mappings and only comes with extended use of the models. It gets harder the more specific your idea is: your conceptual understanding of the idea may bear no resemblance to the concept the model learned!

(man this is getting philosophical! let’s see if I can make this a bit more concrete with some examples)

Generated with the text: `empty beach, dark storm in distance, one flower blooms from the sand to the side, high detail, atmosphere, Arnold render, ultra-realistic, dramatic lighting, glow, cinematic lighting, epic, 8k` by MidjourneyAI.


This was actually an image I generated based on an image someone else had already generated (yeah, we can take public images and run them back through the model with new or additional text).
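(This image-to-image trick has an open-source equivalent too: `diffusers` ships a Stable Diffusion img2img pipeline that takes an existing picture plus new text and reinterprets it. Again a hedged sketch with placeholder file names, since Midjourney’s own remixing happens through its Discord interface rather than code:)

```python
# Image-to-image sketch: feed an existing picture back in with new text.
# Illustrative only (open Stable Diffusion weights, placeholder file names).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("someone_elses_beach.png").convert("RGB").resize((768, 512))

prompt = "empty beach, dark storm in distance, one flower blooms from the sand"
# strength controls how far the model may wander from the input image
# (near 0 = barely changed, near 1 = mostly ignored). Older diffusers
# releases name the image argument `init_image` instead of `image`.
result = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
result.save("beach_remixed.png")
```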

From here, I ran a couple more of my own modifications:

Generated by MidjourneyAI.

Generated by MidjourneyAI.

I really liked the second one, though with the third I love the ambiguity in the landscape: is it water, or is it snow? Hard to say in that lighting.

From here, I started focusing a bit more on landscapes under a starlit sky (like the cover photo). I eventually settled on this one, which I really kinda love (so I made the highest-res version of it that I could):

Generated with the text: `wooded mountainous winter landscape under cosmic night sky with distant glowing small town in a valley, nebulae, crescent moon, planetary rings, cinematic, atmospheric, 8k, dramatic lighting, ultra realistic` by MidjourneyAI.


I will say, it definitely takes some doing to get a feel for how it works: what I had in mind was nowhere near what the model spat out, even though I love what it created. There weren’t really any planetary rings (I was thinking of Gaal’s homeworld in the AppleTV+ Foundation series), and the “glowing small town in a valley” seems to have been spread diffusely over the entire valley, almost like fireflies, rather than being a distinct town. Beautiful, no question, but not exactly what I’d had in mind.

One final bit I tried before my trial ran out was a completely different direction:

Generated with the text: `raspberry pi computer made out of raspberries, modern, futuristic` by MidjourneyAI.


I see a lot of super awesome art being generated from very terse prompts, so I was trying to capture that a little bit here (as opposed to the downright loquacious landscape prompts above).

Some of the earlier images I created had more to do with my family. This first one was created shortly after my wife left her previous job to pursue her longtime dream of being a full-time fiction writer (it was seriously one of the first things I ever learned about her all those years ago): I was inspired to see if the model could capture something about what that looked like, in my head at least.

Generated with the text: `a writer working at her computer surrounded by a world of her own creation` by MidjourneyAI.


And I really love this next one, as it evokes less the specific act of writing and more the sense that writing is about creating.

Generated with the text: `writer standing on the boundary between worlds, the world she lives in and the world she creates` by MidjourneyAI.


We’ve also been huge fans of Zelda for quite a while (which longtime readers already know), and not too long ago I commissioned a fan account to remake a family photo as if it were in the Zelda universe.

Commissioned artwork of our own family, redrawn to look like we live in the Zelda universe.

Quite the handsome family, doncha think?

So I played with that idea a bit. The results weren’t great (I get the feeling the generator has a harder time with very specific real-world concepts than with more general categories of things), but I did get this pretty cute rendering of what is most definitely a Hyrulean family.

Generated with the text: `zelda champions from breath of the wild, family, 1985, highres` by MidjourneyAI.


It’s fun to tinker with. In all honesty I’ll probably keep playing with it here and there as the mood strikes; if you want to follow along, there’s a public feed of all the pictures I generate here (note: it’s “public” in that you have to create a free account to see it), since there’s no way I’m going to shell out the $ required to make my images private.

Citation

BibTeX citation:
@online{quinn2022,
  author = {Quinn, Shannon},
  title = {Fiddling with Generative Art Models},
  date = {2022-08-03},
  url = {https://magsol.github.io/2022-08-03-fiddling-with-generative-art},
  langid = {en}
}
For attribution, please cite this work as:
Quinn, Shannon. 2022. “Fiddling with Generative Art Models.” August 3, 2022. https://magsol.github.io/2022-08-03-fiddling-with-generative-art.