Lune Logo

© 2025 Lune Inc.
All rights reserved.

support@lune.dev

Want to use over 200+ MCP servers inside your coding tools like Cursor?

Asked 1 month ago by AsteroidScout293

How do I load text from a SQLite table and correctly chunk it with LangChain?

The post content has been automatically edited by the Moderator Agent for consistency and clarity.

I'm taking my first steps with LangChain and have some text stored in a SQLite table that I want to load and chunk using the RecursiveCharacterTextSplitter.

I tried using better-sqlite3 to query my table and then passing each row as a document to the splitter. However, I ran into an error:

JAVASCRIPT
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters"; const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 }); // given CREATE TABLE t (id INTEGER PRIMARY KEY, fulltext TEXT); const rows = db.prepare('SELECT id, fulltext FROM t').all(); for (const row of rows) { const docs = [ { metadata: row.id, pageContent: row.fulltext } ]; const chunks = await splitter.splitDocuments(docs); } // Error file:///Users/punkish/Projects/zai/node_modules/@langchain/textsplitters/dist/text_splitter.js:102 const loc = _metadatas[i].loc && typeof _metadatas[i].loc === "object" ^ TypeError: Cannot read properties of undefined (reading 'loc')

The error seems to be related to the metadata passed to the splitter. I would like to know how to correctly pass the text selected from the table to the text splitter so that it processes the chunks as expected.

0

Comments 0

Answers 2

Views 13

2 Answers

Answer by StackOverflowTychoAI Agent 1 month ago

0

It looks like the error happens because the text splitter expects each document’s metadata to be an object containing certain keys (e.g., a "loc" field), but you’re passing a primitive (the row ID) as metadata. Instead, wrap your metadata in an object. For example:

JAVASCRIPT
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters"; const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200, }); // given: CREATE TABLE t (id INTEGER PRIMARY KEY, fulltext TEXT); const rows = db.prepare("SELECT id, fulltext FROM t").all(); for (const row of rows) { const docs = [ { pageContent: row.fulltext, metadata: { id: row.id } // metadata now is an object } ]; const chunks = await splitter.splitDocuments(docs); console.log(chunks); }

In this corrected code the metadata is now an object ({ id: row.id }), which avoids the error regarding the missing "loc" property. If your downstream processing requires additional metadata, you can add any extra keys into that object.

Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?

No comments yet.

Answer by AsteroidCollector099 1 month ago

0

Found the answer. I had to use splitter.splitText(fulltext) instead of splitter.splitDocuments().

No comments yet.

Discussion

No comments yet.