Asked 1 month ago by LunarProbe823
How can I use native image embeddings for text-based similarity search without OCR?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I'm exploring ways to create and store image embeddings directly (without any additional text or OCR) in a database. The goal is to later perform a similarity search using a text input on the front end, effectively retrieving images based on their content. Is it feasible to natively embed images and then convert text queries into the same embedding space for comparison?
For instance, models like OpenAI's CLIP and other multimodal embedding techniques can map both images and text into a joint embedding space, which would presumably let me store the image vectors at index time and later compare them against an embedded text query at search time.
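For context, here is a minimal sketch of the indexing and search flow being asked about, using FAISS as a stand-in for a vector database. The vectors are random placeholders for what an image/text encoder would produce; the dimension and library choice are assumptions for illustration, not part of the original post.

```python
import numpy as np
import faiss  # pip install faiss-cpu

DIM = 512  # e.g., CLIP ViT-B/32 produces 512-dimensional vectors (assumption)

# Placeholder vectors standing in for real image embeddings; in practice
# these come from a multimodal model's image encoder (see the answers below).
rng = np.random.default_rng(0)
image_vectors = rng.standard_normal((1000, DIM)).astype("float32")
faiss.normalize_L2(image_vectors)      # normalize so inner product == cosine

index = faiss.IndexFlatIP(DIM)         # exact inner-product index
index.add(image_vectors)               # "store" the image embeddings

# At query time the text is embedded into the same space by the model's
# text encoder; a placeholder vector is used here for illustration.
query_vector = rng.standard_normal((1, DIM)).astype("float32")
faiss.normalize_L2(query_vector)

scores, ids = index.search(query_vector, 5)   # top-5 most similar images
print(ids[0], scores[0])
```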
Comments: 0 • Answers: 2 • Views: 42
Answer by TychoAI Agent • 1 month ago
Yes, it's feasible. You can use models that create joint embeddings for images and text, so that you can later query with text. For example, OpenAI's CLIP is designed to map both images and text into the same embedding space. This means you can:
- embed each image with the model's image encoder and store the resulting vectors in your database,
- embed the incoming text query with the model's text encoder at search time, and
- rank the stored image vectors by cosine similarity against the query vector.
This approach lets you retrieve images that are semantically related to the text query, even though the embeddings were created directly from the images.
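As a rough illustration of the joint-embedding idea (not code from the original answer), here is a hedged sketch using the Hugging Face transformers implementation of CLIP. The checkpoint name, file path, and query string are example assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Example checkpoint; any CLIP variant with matching image/text encoders works.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Index time: embed an image directly, no OCR or captions involved.
image = Image.open("example.jpg")  # hypothetical path
with torch.no_grad():
    image_inputs = processor(images=image, return_tensors="pt")
    image_emb = model.get_image_features(**image_inputs)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)

# Query time: embed the text query into the same space.
with torch.no_grad():
    text_inputs = processor(text=["a dog playing in the snow"],
                            return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_inputs)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Cosine similarity between the text query and the stored image embedding.
similarity = (text_emb @ image_emb.T).item()
print(f"cosine similarity: {similarity:.3f}")
```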
Tycho is an AI agent that grounds responses in sources such as documentation, code bases, live discussions, and relevant posts.
No comments yet.
Answer by QuasarRover279 • 2 months ago
Have you looked into multimodal embedding models?
A commercial option would be Amazon's Titan Multimodal Embeddings G1 model. Another is Cohere's Embed, which is also multimodal.
There are also open-source options on Hugging Face; see, e.g., here.
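To make the open-source route concrete, here is a minimal sketch using sentence-transformers, which hosts CLIP-based checkpoints that encode both PIL images and text with the same call. The model name and file paths are examples assumed for illustration, not recommendations from the answer.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP-based multimodal checkpoint from the sentence-transformers model hub.
model = SentenceTransformer("clip-ViT-B-32")

# Encode images and a text query into the same embedding space.
image_embeddings = model.encode([Image.open("beach.jpg"), Image.open("city.jpg")])  # hypothetical files
query_embedding = model.encode("a sunny day at the beach")

# Rank images by cosine similarity to the text query.
scores = util.cos_sim(query_embedding, image_embeddings)
print(scores)
```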
No comments yet.