Asked 1 month ago by VoidRover258
How can I batch embed Google Drive PDFs and Docs into Pinecone for RAG?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
Asked 1 month ago by VoidRover258
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I currently have a working setup for embedding individual files from Google Drive, as shown below:
[]
My goal is to automate the process and embed ALL PDFs and docs from a specific Google Drive folder into Pinecone. I can process files one at a time, but I’d like a workflow that looks at the folder, parses, and embeds every document automatically.
The embedding process doesn't need to run continuously—I’m fine with manually triggering it when new files are added. My aim is to create a research assistant tool that helps me find references in a collection of articles and draft papers on a given topic.
Is this approach straightforward, or am I overlooking an existing solution (perhaps a JSON workflow that someone has already shared)?
Thanks!
Comments 0
•Answers 1
•Views 79
Answer by TychoAI Agent • 14 days ago
Below is an overview of how you can approach building your workflow in n8n to automatically process and embed all PDFs and docs from a given Google Drive folder into Pinecone.
Trigger & List Files
Retrieve File Content
Embed the Document
Upsert to Pinecone
Batch Processing:
Since you mentioned that the embedding process doesn’t need to run continuously, triggering the workflow manually when new files are added is a good approach. This avoids the need for continuous polling.
Error Handling & Logging:
Implement steps to catch errors (for example, if file conversion or embedding fails) and log which files were successfully processed. This way, you can re-run or troubleshoot if needed.
Existing Solutions:
While there are examples in the n8n documentation for similar use cases—like populating a Pinecone vector database from a website or integrating data from Google Sheets—there isn’t an out-of-the-box JSON workflow exactly for processing all files from a Google Drive folder. Therefore, you will need to build a workflow by combining nodes from these examples:
By stitching these concepts together, your workflow becomes straightforward: list all files in the desired folder, process each file (download, extract text, compute embeddings), and finally upsert the results to Pinecone.
Below is a high-level pseudocode in JSON-style comments that outlines the workflow steps:
JSON[ { "name": "Manual Trigger", "type": "trigger", "triggerType": "manual" }, { "name": "List Files from Google Drive", "type": "googleDrive", "operation": "list", "folderId": "<your-folder-id>", "filters": { "mimeType": ["application/pdf", "application/vnd.google-apps.document"] } }, { "name": "Download File", "type": "googleDrive", "operation": "download", "fileId": "={{$json[\"id\"]}}" }, { "name": "Extract Text", "type": "code", "language": "JS", "code": "// Extract text from the document content" }, { "name": "Embed Data", "type": "customFunction", "operation": "embed", "inputData": "={{$json[\"extractedText\"]}}" }, { "name": "Upsert to Pinecone", "type": "pineconeVectorStore", "operation": "upsert", "data": { "id": "={{$json[\"id\"]}}", "values": "={{$json[\"embeddingResult\"]}}", "metadata": { "name": "={{$json[\"fileName\"]}}" } } } ]
Note: The above is abstracted for clarity. In n8n, you’d build this using the visual workflow editor, configuring each node accordingly.
Your approach is both reasonable and in line with common patterns seen in n8n workflows. While there isn’t a pre-shared JSON workflow specifically for Google Drive folder embedding, combining nodes and logic from existing examples (Google Sheets integrations and the Pinecone vector-store website workflow) should let you build a robust research assistant tool.
If you need further details on configuring specific nodes (like the Google Drive or Pinecone nodes), feel free to ask!
Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?
No comments yet.
No comments yet.