I asked Mistral to “generate an image with no dog” and it did.
The fact that it chose something else to generate instead makes me wonder if this is some sort of free will?
Mistral likely does “prompt enhancement,” aka feeding your prompt to an LLM first and asking it to expand it with more words.
So internally, a Mistral text LLM is probably writing out “sure! Here’s a long prompt with no dog: …” and then that part is fed to the image generator.
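To make that two-stage idea concrete, here is a minimal sketch. It is a guess at the general shape, not Mistral's actual pipeline: `enhance_prompt`, `generate_image`, and `pipeline` are hypothetical stand-ins, and the "LLM" is a stub that returns a fixed scene.

```python
def enhance_prompt(user_prompt: str) -> str:
    """Hypothetical stand-in for the text LLM stage.

    A real system would call a language model here. This stub just
    returns a fixed scene description that never mentions the
    excluded subject.
    """
    return "A quiet garden path lined with green bushes, soft morning light"


def generate_image(image_prompt: str) -> dict:
    """Hypothetical stand-in for the image model.

    It only records the prompt it was handed, so we can inspect what
    the image stage actually gets to see.
    """
    return {"prompt_seen_by_image_model": image_prompt}


def pipeline(user_prompt: str) -> dict:
    # Stage 1: the text model rewrites the request.
    # Stage 2: only the rewritten text reaches the image model.
    return generate_image(enhance_prompt(user_prompt))


result = pipeline("generate an image with no dog")
print(result["prompt_seen_by_image_model"])
```

The point of the sketch is that the image stage never sees the word “dog” at all, so there is nothing to accidentally draw.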
Other “LLMs” are truly multimodal and generate the image output themselves, so the word “dog” still reaches the part that produces the image.
I think all the big image generators support negative prompts by now, so if it interpreted “no dog” as a negative prompt for “dog”, the generator would be steered away from anything resembling a dog while it produces the image. No free will, just a much more useful system than whatever OP is using.
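For what it's worth, in Stable Diffusion style pipelines a negative prompt usually acts during generation, through classifier-free guidance, and not by filtering finished images. Below is a toy sketch of that arithmetic only. The function name `guided_noise` and the two-element vectors are made up for illustration, and whether Mistral's image backend works this way is an assumption.

```python
def guided_noise(cond: list[float], neg: list[float], scale: float) -> list[float]:
    """Classifier-free guidance with a negative prompt (toy version).

    cond:  noise prediction conditioned on the positive prompt
    neg:   noise prediction conditioned on the negative prompt
    scale: guidance scale; larger values push further away from `neg`
    """
    # Start from the negative prediction and move toward the positive
    # one, overshooting when scale > 1.
    return [n + scale * (c - n) for c, n in zip(cond, neg)]


# Illustrative numbers only, not real model outputs.
cond_pred = [1.0, 0.0]  # "a garden"
neg_pred = [0.0, 1.0]   # "dog"

steered = guided_noise(cond_pred, neg_pred, scale=2.0)
print(steered)
```

With `scale=1.0` the result is just the positive prediction; anything above that pushes each denoising step further from whatever the negative prompt describes.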
Hmmm
That’s a land shrimp.
There could be a dog behind any one of those bushes though.
It just did what you wanted, since you asked for an image. Free will would be if you asked it not to generate an image but it still did, if it generated an image without you prompting it to, or if you asked for an image and it just didn’t respond.
free will is when it generates an image of a billboard saying “suck my dongle, fleshbag”
fair enough