Below is an example of how you can modify your code to add and later query different namespaces for the backstories. Essentially, when adding your documents you supply a namespace (one per employee/backstory), and when retrieving, you pass that same namespace via the retriever’s search_kwargs.
Updating Document Insertion
If you’re adding documents directly via texts (or chunks), you can pass the namespace as follows:
```python
# For each employee, add their backstory texts with a unique namespace.
pineconeVectorStore.add_texts(["Backstory for Employee 1"], namespace="employee1")
pineconeVectorStore.add_texts(["Backstory for Employee 2"], namespace="employee2")
pineconeVectorStore.add_texts(["Backstory for Employee 3"], namespace="employee3")
```
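To see why this keeps the backstories separate, here is a toy in-memory model of one index partitioned by namespace (illustration only, not the real Pinecone client — plain strings stand in for embedded vectors):

```python
class InMemoryNamespacedIndex:
    """Toy stand-in for a Pinecone index: one index, many namespaces."""

    def __init__(self):
        self._namespaces = {}  # namespace -> list of stored texts

    def add_texts(self, texts, namespace="__default__"):
        self._namespaces.setdefault(namespace, []).extend(texts)

    def search(self, namespace="__default__"):
        # A real index ranks by vector similarity; this sketch just returns
        # everything stored under the requested namespace.
        return list(self._namespaces.get(namespace, []))


index = InMemoryNamespacedIndex()
index.add_texts(["Backstory for Employee 1"], namespace="employee1")
index.add_texts(["Backstory for Employee 2"], namespace="employee2")

index.search(namespace="employee1")  # only Employee 1's documents come back
```

Each namespace behaves like its own bucket inside the single index, which is exactly the isolation the `add_texts` calls above rely on.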
If you’re using a helper function like your get_pineconeVectorStore, you can add an optional namespace parameter and forward it to the underlying call. For instance:
```python
def get_pineconeVectorStore(text_chunks, indexName, namespace=None):
    embeddings = OpenAIEmbeddings(
        deployment="text-embedding-ada-002",
        model="text-embedding-ada-002",
        chunk_size=10
    )
    pineconeVectorStore = Pinecone.from_texts(
        text_chunks,
        embeddings,
        index_name=indexName,
        namespace=namespace  # Pass the desired namespace here.
    )
    return pineconeVectorStore
```
Now, when you call the function, you can do:
```python
# Example: Creating a vector store for Employee 1's backstory only.
employee1_vectorstore = get_pineconeVectorStore(employee1_text_chunks, pineconeIndexName, namespace="employee1")
```
Updating Retrieval
When building the ConversationalRetrievalChain (or any query that uses the retriever), you can restrict the search to a specific namespace by updating the retriever creation. For example, update your get_conversation_chain to accept a namespace parameter and use it when calling as_retriever:
```python
def get_conversation_chain(vectorstore, namespace=None):
    llm = ChatOpenAI(model="gpt-3.5-turbo", max_tokens=50)
    general_system_template = r"""
You are John, a Customer Service Officer at the company. You are answering questions regarding your job of helping with the rental application process.
----
{context}
----
"""
    general_user_template = "Question:{question}"
    messages = [
        SystemMessagePromptTemplate.from_template(general_system_template),
        HumanMessagePromptTemplate.from_template(general_user_template)
    ]
    qa_prompt = ChatPromptTemplate.from_messages(messages)
    memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
    # Pass search_kwargs with the desired namespace to filter the documents.
    retriever = (
        vectorstore.as_retriever(search_kwargs={"namespace": namespace})
        if namespace
        else vectorstore.as_retriever()
    )
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        combine_docs_chain_kwargs={"prompt": qa_prompt}
    )
    return conversation_chain
```
Then, you can set up a conversation chain for each backstory by providing the corresponding namespace:
```python
# Create conversation chain for Employee 2's backstory:
st.session_state.conversation = get_conversation_chain(employee2_vectorstore, namespace="employee2")
```
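The search_kwargs forwarding that makes this work can be sketched with a fake vector store (a minimal illustration — `FakeVectorStore` and its methods are stand-ins, not the real LangChain API):

```python
class FakeVectorStore:
    """Toy model of how a retriever replays search_kwargs on every query."""

    def __init__(self, docs_by_namespace):
        self.docs_by_namespace = docs_by_namespace

    def similarity_search(self, query, namespace=None):
        # A real store ranks by embedding similarity; this toy just returns
        # whatever is stored under the requested namespace.
        return self.docs_by_namespace.get(namespace, [])

    def as_retriever(self, search_kwargs=None):
        kwargs = search_kwargs or {}
        # The stored search_kwargs (including the namespace) are applied
        # to every search the chain makes through this retriever.
        return lambda query: self.similarity_search(query, **kwargs)


store = FakeVectorStore({"employee2": ["Backstory for Employee 2"]})
scoped = store.as_retriever(search_kwargs={"namespace": "employee2"})
unscoped = store.as_retriever()

scoped("rental process")    # only Employee 2's documents
unscoped("rental process")  # nothing: no namespace was requested
```

Because the namespace is baked into the retriever at creation time, every question routed through that chain stays confined to one employee's backstory.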
Overall Workflow
- Initialize Pinecone as you currently do.
- Save backstories with namespaces. For example, if you have different text chunks for each employee:

  ```python
  employee1_vectorstore = get_pineconeVectorStore(employee1_text_chunks, pineconeIndexName, namespace="employee1")
  employee2_vectorstore = get_pineconeVectorStore(employee2_text_chunks, pineconeIndexName, namespace="employee2")
  employee3_vectorstore = get_pineconeVectorStore(employee3_text_chunks, pineconeIndexName, namespace="employee3")
  ```

- Retrieve documents using namespaces. When constructing the conversational chain, pass the correct namespace to the retriever so that questions only pull documents from the intended employee's backstory.
This approach ensures that even though you’re using a single Pinecone index, each employee’s data remains separated within its own namespace.