Bing does not understand the word "subtle"

Usernameblankface@lemmy.world · edit-2 11 months ago


j4k3@lemmy.world · 11 months ago

I never play with proprietary AI like this, so I don’t know this model, but I have many image diffusion models I run offline.

I don’t know how experienced you are with prompting, but making a few assumptions…

Shift how you think about prompting for an image. Think of the prompt as addressing an entity, the way you would when roleplaying with an LLM. If you really get to know an LLM through roleplaying, you’ll learn that the model is trying to satisfy the fundamental needs of every character involved, including the one you play. It does all of this within the limits it has assumed (or been given) for each character.

Image diffusion works in much the same way. The prompt is talking to something akin to a roleplaying entity that can only respond by generating an image, but it is still a dynamic and emotional entity. When you say it “does not understand the word subtle,” that is likely not the case. There is a configuration setting (that may or may not be available to you) that tells the model how strongly to follow the prompt. If you set it too high, you’ll get terrible results. If you explore this in detail, you may notice these responses are like a vindictive little child retaliating for being punished unfairly. You must allow the entity its own sense of creative collaboration for its own satisfaction.
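
A rough sketch of what that setting does, assuming it is what open diffusion tools usually call the guidance scale (or CFG scale); the comment above does not name it, so treat that as an assumption. In classifier-free guidance the model makes one prediction without the prompt and one with it, and the scale decides how far to push toward the prompted one. The function name and the toy numbers below are made up for illustration and are not real model outputs.

```python
def guided_prediction(uncond, cond, guidance_scale):
    """Combine two noise predictions the way classifier-free guidance does.

    uncond: prediction made without the prompt
    cond:   prediction made with the prompt
    guidance_scale: 0 ignores the prompt, 1 uses the prompted prediction
                    as-is, larger values extrapolate past it.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy values standing in for model outputs (hypothetical).
uncond = [0.0, 1.0, -1.0]
cond = [1.0, 1.0, 0.0]

for scale in (0.0, 1.0, 7.5, 20.0):
    print(scale, guided_prediction(uncond, cond, scale))
```

At large scales the result overshoots the prompted prediction by a wide margin, which is one plausible reason very high settings tend to produce harsh, overcooked images.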

If you really want subtlety, the key is to describe what you really want with more passion and flair. There is a major emotional element to this: the user has to explore their own emotions at a depth they may never have tried before in order to communicate their ideas with more verbosity.

I only learned this because I connected a text roleplaying model to an image diffusion model, using software someone else wrote and I modified. I monitored how the images were generated and noticed the prompt was simply long text. I started observing the effect in detail and that led me here.

You can write a few keywords into an image prompt and it will try and create an emotional story to fill in the gaps, but you need to describe how the image makes you feel and why if you really want specificity in detail. This is hard to do IMO and it takes a lot of practice along with a willingness to explore things like why you like a “subtle Hawaiian shirt” or what subtle really means in less subjective terms.

Usernameblankface@lemmy.world · edit-2 11 months ago

Hmm. Using Bing, I definitely do not have access to settings; I can only change my input to be longer and more descriptive of my idea.

In Bing Image Creator, DALL-E 3

Prompt: a Hawaiian shirt with a normal palm trees, bright colored flower blooms, and white background design. Artfully and playfully hidden in the white spaces and among the loud colors are many subtle, small, hidden hotwheels logos, designed to catch the eye on closer observation, but hidden at first glance.

It seems to have taken Hot Wheels as meaning the cars rather than the logos. But it’s a lot better!