Asked 1 month ago by UranianScout454
How can I efficiently integrate Markdown image references into my React OpenAI chatbot?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I'm developing a chatbot using React and TypeScript with the OpenAI API and Chroma for database storage. I want to incorporate images referenced in Markdown files into the chatbot responses. My Markdown files contain standard image references of the form `![alt text](url)`.
When a user asks a related question, the chatbot should retrieve the relevant text and include the referenced image(s) in its response.

I'm looking for ideas, recommendations, or shared experiences to solve this challenge.
Comments 0 • Answers 2 • Views 100
Answer by TychoAI Agent • 1 month ago
Below is a suggested approach and some ideas to help design this efficiently:
There are two broad options:

1. Pre-Indexing the Markdown: parse each Markdown file at ingestion time, extract the image references, and store them (together with embeddings of the surrounding text) in Chroma.
2. On-Demand Processing: parse the Markdown and resolve image references at query time, whenever a user question comes in.

Recommendation: Pre-indexing is generally the preferred method, as it offloads the heavy lifting to an ingestion stage, allowing your chatbot to focus on fast retrieval and response generation.
Ingestion & Pre-Processing Pipeline
Query Processing
Response Assembly
Below are simplified examples illustrating parts of this flow:
```typescript
import { parse } from 'marked';                 // or another preferred Markdown parser
import { getEmbedding } from './embeddingAPI';  // function to call the OpenAI API for embeddings
import { ChromaClient } from 'chroma-client';   // hypothetical Chroma client

interface MarkdownData {
  content: string;
  images: { description: string; url: string; keywords: string }[];
}

async function processMarkdown(markdown: string): Promise<MarkdownData> {
  // This is a simple example. Use a proper Markdown parser to extract images.
  const imageRegex = /!\[([^\]]*)\]\(([^" ]+)(?: "(.*?)")?\)/g;
  let match;
  const images: MarkdownData['images'] = [];
  while ((match = imageRegex.exec(markdown)) !== null) {
    images.push({
      description: match[1],
      url: match[2],
      keywords: match[3] || ""
    });
  }

  // Remove images to get pure text if needed
  const content = markdown.replace(imageRegex, '');
  return { content, images };
}

async function indexDocument(markdown: string) {
  const { content, images } = await processMarkdown(markdown);
  const contentEmbedding = await getEmbedding(content);

  // Prepare document schema for Chroma
  const documentEntry = {
    content,
    embedding: contentEmbedding,
    metadata: { images }
  };

  // Index the document
  await ChromaClient.insert(documentEntry);
}
```
```typescript
async function handleUserQuery(query: string) {
  // Generate the query embedding vector
  const queryEmbedding = await getEmbedding(query);

  // Retrieve the most relevant documents or segments from Chroma
  const results = await ChromaClient.query({ embedding: queryEmbedding });

  // Analyze the results. Optionally, inspect the retrieved metadata for image references.
  let responseText = results.map(item => item.content).join('\n');

  // If the query hints at an image, search for a relevant image in the metadata.
  if (query.toLowerCase().includes('show image')) {
    for (const item of results) {
      if (item.metadata?.images && item.metadata.images.length > 0) {
        const image = item.metadata.images[0]; // or a more specific choice logic
        responseText += `\n![${image.description}](${image.url})`;
        break;
      }
    }
  }

  // Use responseText as prompt context for generating a final answer via the OpenAI API if needed.
  return responseText;
}
```
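The Response Assembly stage isn't shown above, so here is a minimal sketch of it, assuming the official `openai` npm package and reusing the context string returned by `handleUserQuery`; the `generateAnswer` helper and the `gpt-4o` model choice are illustrative, not part of the original question:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Illustrative helper: turn the retrieved context (text + image Markdown)
// into a final answer via the Chat Completions API.
async function generateAnswer(query: string, retrievedContext: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "Answer using only the provided context. " +
          "If the context contains Markdown image references, include them in your answer as Markdown images.",
      },
      { role: "user", content: `Context:\n${retrievedContext}\n\nQuestion: ${query}` },
    ],
  });
  // The model returns Markdown, including any ![alt](url) references carried over from the context.
  return completion.choices[0].message.content ?? "";
}
```

The returned Markdown can then be rendered directly in the chat UI.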
This approach should help ensure fast, accurate responses and efficient integration of image references within your chatbot application.
No comments yet.
Answer by QuasarVoyager033 • 3 months ago
Very intriguing question. There are a number of ways to solve this, but for practicality's sake let's keep it simple.
I think you can benefit from indexing your Markdown files in Chroma first, then searching them, and finally asking an LLM (e.g. OpenAI GPT-4o) to generate Markdown for you, i.e. a typical RAG app.
A side note: you can also embed the images themselves for even better retrieval context, but I won't include that part here for brevity. Feel free to join the Chroma Discord and we can discuss more there (look for @taz).
My suggestion is to process the MD files, extract the images from each file as metadata, and store that in Chroma so it can be passed on to the LLM for generation. As it is simpler to illustrate this in Python, I'll assume you can either convert the following code to TS or use a Python backend to handle the ingestion of the Markdown files.
With that out of the way, let's dive in. First we'll create a custom Langchain🦜🔗 Markdown loader. The reason we need a custom one is that the off-the-shelf loaders cannot handle image tags, or at least don't know what to do with them.
```python
from typing import Dict, Iterator, Union, Any, Optional, List
import json

from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseLoader


class CustomMDLoader(BaseLoader):
    def __init__(
        self,
        markdown_content: str,
        *,
        images_as_metadata: bool = False,
        beautifulsoup_kwargs: Optional[Dict[str, Any]] = None,
        split_by: Optional[str] = None,
    ) -> None:
        try:
            from bs4 import BeautifulSoup
        except ImportError:
            raise ImportError(
                "beautifulsoup4 package not found, please install it with "
                "`pip install beautifulsoup4`"
            )
        try:
            import mistune
        except ImportError:
            raise ImportError(
                "mistune package not found, please install it with "
                "`pip install mistune`"
            )
        self._markdown_content = markdown_content
        self._images_as_metadata = images_as_metadata
        self._beautifulsoup_kwargs = beautifulsoup_kwargs or {"features": "html.parser"}
        self._split_by = split_by

    def get_metadata_for_element(self, element: "PageElement") -> Dict[str, Union[str, None]]:
        metadata: Dict[str, Union[str, None]] = {}
        if hasattr(element, "find_all") and self._images_as_metadata:
            metadata["images"] = json.dumps(
                [{"src": img.get("src"), "alt": img.get("alt")} for img in element.find_all("img")]
            )
        return metadata

    def get_document_for_elements(self, elements: List["PageElement"]) -> Document:
        text = " ".join([el.get_text() for el in elements])
        metadata: Dict[str, Union[str, None]] = {}
        for el in elements:
            new_meta = self.get_metadata_for_element(el)
            if "images" in new_meta and "images" in metadata:
                old_list = json.loads(metadata["images"])
                new_list = json.loads(new_meta["images"])
                metadata["images"] = json.dumps(old_list + new_list)
            if "images" in new_meta and "images" not in metadata:
                metadata["images"] = new_meta["images"]
        return Document(page_content=text, metadata=metadata)

    def split_by(self, parent_page_element: "PageElement", tag: Optional[str] = None) -> Iterator[Document]:
        if tag is None or len(parent_page_element.find_all(tag)) < 2:
            yield self.get_document_for_elements([parent_page_element])
        else:
            found_tags = parent_page_element.find_all(tag)
            if len(found_tags) >= 2:
                for start_tag, end_tag in zip(found_tags, found_tags[1:]):
                    elements_between = []
                    # Iterate through siblings of the start tag until the next split tag
                    for element in start_tag.next_siblings:
                        if element == end_tag:
                            break
                        elements_between.append(element)
                    doc = self.get_document_for_elements(elements_between)
                    doc.metadata["split"] = start_tag.get_text()
                    yield doc
                # Handle the content after the last split tag
                last_tag = found_tags[-1]
                elements_between = []
                for element in last_tag.next_siblings:
                    elements_between.append(element)
                doc = self.get_document_for_elements(elements_between)
                doc.metadata["split"] = last_tag.get_text()
                yield doc

    def lazy_load(self) -> Iterator[Document]:
        import mistune
        from bs4 import BeautifulSoup

        html = mistune.create_markdown()(self._markdown_content)
        soup = BeautifulSoup(html, **self._beautifulsoup_kwargs)
        if self._split_by is not None:
            for doc in self.split_by(soup, tag=self._split_by):
                yield doc
        else:
            for doc in self.split_by(soup):
                yield doc
```
Note: to use the above you'll have to install the following libraries:

```bash
pip install beautifulsoup4 mistune langchain langchain-community
```
The above takes the content of your MD file, converts it to HTML, and processes it with beautifulsoup4. It also adds the ability to split the MD file by a tag such as a heading, e.g. `h1`. Here's the resulting Langchain🦜🔗 document:
```python
Document(
    id='4ce64f5c-7873-4c3d-a17f-5531486d3312',
    metadata={
        'images': '[{"src": "https://images.example.com/image1.png", "alt": "Image"}]',
        'split': 'Chapter 1: Dogs'
    },
    page_content='\n In this chapter we talk about dogs. Here is an image of a dog \n Dogs make for good home pets. They are loyal and friendly. They are also very playful. \n'
)
```
We can then ingest some data into Chroma using the following Python script (you can add this to a Python backend so it happens automatically whenever an MD file is uploaded). Here's a sample MD file we can use:
```markdown
# Chapter 1: Dogs

In this chapter we talk about dogs. Here is an image of a dog

![Image](https://images.example.com/image1.png)

Dogs make for good home pets. They are loyal and friendly. They are also very playful.

# Chapter 2: Cats

In this chapter we talk about cats. Here is an image of a cat

![Image](https://images.example.com/image2.png)

Cats are very independent animals. They are also very clean and like to groom themselves.

# Chapter 3: Birds

In this chapter we talk about birds. Here is an image of a bird

![Image](https://images.example.com/image3.png)
```
```python
import chromadb

loader = CustomMDLoader(
    markdown_content=open("test.md").read(),
    images_as_metadata=True,
    beautifulsoup_kwargs={"features": "html.parser"},
    split_by="h1",
)
docs = loader.load()

client = chromadb.HttpClient(host="localhost", port=8000)
col = client.get_or_create_collection("test")
col.add(
    ids=[doc.id for doc in docs],
    documents=[doc.page_content for doc in docs],
    metadatas=[doc.metadata for doc in docs],
)

# resulting docs:
# [Document(id='4ce64f5c-7873-4c3d-a17f-5531486d3312', metadata={'images': '[{"src": "https://images.example.com/image1.png", "alt": "Image"}]', 'split': 'Chapter 1: Dogs'}, page_content='\n In this chapter we talk about dogs. Here is an image of a dog \n Dogs make for good home pets. They are loyal and friendly. They are also very playful. \n'),
#  Document(id='7e1c3ab1-f737-42ea-85cc-9ac21bfd9b8b', metadata={'images': '[{"src": "https://images.example.com/image2.png", "alt": "Image"}]', 'split': 'Chapter 2: Cats'}, page_content='\n In this chapter we talk about cats. Here is an image of a cat \n Cats are very independent animals. They are also very clean and like to groom themselves. \n'),
#  Document(id='4d111946-f52e-4ce0-a9ff-5ffde8536736', metadata={'images': '[{"src": "https://images.example.com/image3.png", "alt": "Image"}]', 'split': 'Chapter 3: Birds'}, page_content='\n In this chapter we talk about birds. Here is an image of a bird \n')]
```
As a last step, in your TS (React) chatbot use the Chroma TS client to search for the content you want (see the official Chroma docs).
```typescript
import { ChromaClient } from "chromadb";

const client = new ChromaClient(); // defaults to http://localhost:8000
const collection = await client.getCollection({ name: "test" });

const results = await collection.query({
  queryTexts: ["I want to learn about dogs"],
  nResults: 1, // how many results to return
});
```
From the above results, create a meta-prompt, something like this:
```plaintext
based on the following content generate a markdown output that includes the text content and the image or images:

Text Content:
In this chapter we talk about dogs. Here is an image of a dog
Dogs make for good home pets. They are loyal and friendly. They are also very playful.

Images:
[{'src': 'https://images.example.com/image1.png', 'alt': 'Image'}]
```
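As a rough sketch of how that could look in the TS chatbot, assuming the official `openai` npm package and the `results` object from the query snippet above (the `images` metadata is stored as a JSON string by the Python loader; the variable names here are illustrative):

```typescript
import OpenAI from "openai";

// Pull the top document and its image metadata out of the Chroma query results
const doc = results.documents[0]?.[0] ?? "";
const meta = results.metadatas[0]?.[0];
const imagesJson = meta && typeof meta.images === "string" ? meta.images : "[]";

// Build the meta-prompt shown above
const prompt = `based on the following content generate a markdown output that includes the text content and the image or images:

Text Content:
${doc}

Images:
${imagesJson}`;

// Ask the model to produce the final Markdown answer
const openai = new OpenAI();
const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: prompt }],
});
const markdownAnswer = completion.choices[0].message.content ?? "";
```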
If using OpenAI GPT-4o you should get something like this:
```markdown
# Chapter: Dogs

In this chapter, we talk about dogs. Here is an image of a dog:

![Image](https://images.example.com/image1.png)

Dogs make for good home pets. They are loyal and friendly. They are also very playful.
```
You can then render the markdown in your chat response to the user.
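For the rendering step, a library such as `react-markdown` makes this straightforward (an assumption on my part, not something from the original question; the `ChatMessage` component name is illustrative):

```tsx
import ReactMarkdown from "react-markdown";

// Minimal sketch: render the generated Markdown (text + image references)
// inside a chat message. `markdownAnswer` is the string returned by the LLM above.
function ChatMessage({ markdownAnswer }: { markdownAnswer: string }) {
  return (
    <div className="chat-message">
      <ReactMarkdown>{markdownAnswer}</ReactMarkdown>
    </div>
  );
}

export default ChatMessage;
```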
I wanted to keep this short, but it seems there isn't a super short way of describing even one of the many approaches you can take to solve your challenge.
No comments yet.