In Bing Image Creator, DALL-E 3

Prompt: a Hawaiian shirt with subtle designs celebrating Hot Wheels 50th anniversary

Changing “subtle” to “hidden” didn’t give much better results.

  • j4k3@lemmy.world · 9 months ago

    I never play with proprietary AI like this, so I don’t know this model, but I run many image diffusion models offline.

    I don’t know how experienced you are with prompting, but making a few assumptions…

    Shift how you think about prompting for an image. Think of the prompt as addressing an entity, much like roleplaying with an LLM. If you really get to know an LLM through roleplaying, you’ll learn that the model is trying to satisfy the fundamental needs of every character involved, including the one you play. It does all of this within the limits it has assumed (or that have been described) for each character.

    Image diffusion works in much the same way. The prompt is talking to something akin to a roleplaying entity that can only respond by generating an image, but it is still a dynamic and emotional entity. When you say it “does not understand the word subtle”, that is likely not the case. There is a configuration setting (which may or may not be available to you) that tells the model how strongly to follow the prompt. If you set it too high, you’ll get terrible results. If you explore this in detail, you may notice these responses are like a vindictive little child retaliating for being punished unfairly. You must allow the entity its own sense of creative collaboration, for its own satisfaction.
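    The comment doesn’t name the setting. In open diffusion models such as Stable Diffusion, the knob that controls how strongly the output follows the prompt is usually the classifier-free guidance (CFG) scale. Assuming that is what is meant, here is a minimal sketch of how the scale blends the model’s two noise predictions (one with the prompt, one without) at each denoising step. The function name and the toy numbers are hypothetical; real pipelines apply this to whole tensors, not short lists.

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: start from the unconditional (empty-prompt)
    prediction and move toward the prompt-conditioned one by `scale`.
    scale = 0 ignores the prompt, scale = 1 gives the plain conditional
    prediction, and larger values exaggerate the prompt's influence."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Hypothetical toy values standing in for per-pixel noise predictions.
uncond = [0.10, -0.20, 0.05]
cond = [0.30, -0.10, 0.00]

for scale in (1.0, 7.5, 30.0):
    print(scale, cfg_combine(uncond, cond, scale))
```

    At a very high scale, the difference term is amplified well beyond anything the model saw in training, which is one common explanation for the oversaturated, distorted images people get when they crank this setting up. Bing Image Creator does not appear to expose it.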

    If you really want subtlety, the key is to describe what you want with more passion and flair. There is a major emotional element to this: it requires exploring your own emotions at levels of thought you may never have explored before, so that you can communicate your ideas in more detail.

    I only learned this because I connected a text roleplaying model to an image diffusion model, using software someone else wrote that I modified. I monitored how the images were generated and noticed that the prompt being sent was simply long text. I started observing the effect in detail, and that led me here.

    You can write a few keywords into an image prompt and it will try to create an emotional story to fill in the gaps, but if you want specific details, you need to describe how the image makes you feel and why. This is hard to do, IMO, and it takes a lot of practice, along with a willingness to explore things like why you like a “subtle Hawaiian shirt” or what “subtle” really means in less subjective terms.

    • Usernameblankface@lemmy.world (OP) · 9 months ago

      Hmm. Using Bing, I definitely don’t have access to settings; I can only make my input longer and more descriptive of my idea.

      In Bing Image Creator, DALL-E 3

      Prompt: a Hawaiian shirt with a normal palm trees, bright colored flower blooms, and white background design. Artfully and playfully hidden in the white spaces and among the loud colors are many subtle, small, hidden hotwheels logos, designed to catch the eye on closer observation, but hidden at first glance.

      It seems to have taken Hot Wheels as meaning the cars rather than the logos. But it’s a lot better!