Llama 3-V: Matching GPT4-V with a 100x smaller model and 500 dollars

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 6 months ago


Em Adespoton@lemmy.ca · 6 months ago

Wouldn’t their patch embeddings return different results depending on where the patch boundaries fall in the image? They don’t appear to use overlapping patches for redundancy. That makes the model significantly less resource intensive, but surely the chance of losing significant signals in the image-to-text translation is correspondingly high?

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 6 months ago

Good question, not sure how they account for that. Maybe there’s a higher-level layer responsible for dealing with the boundaries?
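
To make the boundary concern concrete, here is a minimal sketch of the trade-off along one image axis. It is not taken from the Llama 3-V code: the image size (336), patch size (14), and the example feature position are all assumed numbers, and the helper names are made up for illustration. It only shows how non-overlapping patches can split a small detail that a half-overlapping grid would capture whole, at the cost of many more patches.

```python
def patch_starts(length, patch, stride):
    """Start offsets of patches of width `patch` along one axis of size `length`."""
    return list(range(0, length - patch + 1, stride))


def patches_containing(span, length, patch, stride):
    """Start offsets of patches that fully contain the half-open interval `span`."""
    lo, hi = span
    return [s for s in patch_starts(length, patch, stride)
            if s <= lo and hi <= s + patch]


# Assumed, illustrative numbers: 336-pixel axis, 14-pixel patches.
SIZE, PATCH = 336, 14
feature = (12, 18)  # a 6-pixel detail straddling the patch boundary at x = 14

no_overlap = patch_starts(SIZE, PATCH, stride=PATCH)
half_overlap = patch_starts(SIZE, PATCH, stride=PATCH // 2)

# Patches per square image: the overlap costs roughly 4x as many patches.
print(len(no_overlap) ** 2)    # 576
print(len(half_overlap) ** 2)  # 2209

# Without overlap no single patch sees the whole detail; with overlap one does.
print(patches_containing(feature, SIZE, PATCH, PATCH))       # []
print(patches_containing(feature, SIZE, PATCH, PATCH // 2))  # [7]
```

In a vision transformer the split detail is not necessarily lost, since attention across neighbouring patch tokens can in principle recombine it, which would fit the "higher-level layer" guess above. Whether that happens in this particular model is not something the sketch can show.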
