I am using the official OpenAI Python library to upload two JSON files to an assistant's vector store with custom chunking strategies (one per file).
There are two upload approaches I tried:
- Create a vector store and then upload the files using the client.beta.vector_stores.files.upload_and_poll method.
- Upload the files first using client.files.create, then create a vector store and attach the uploaded files.
In the first approach, the code runs without exceptions, but the vector store ends up empty (0 files).
In the second approach, the process also completes without exceptions, yet the vector store's file_counts reports in_progress = 2, which suggests the files are stuck processing.
I have even removed the custom chunking strategies, but that did not affect the outcome.
Below is the code I used for the first approach:
vector_store = client.beta.vector_stores.create(
    name="human labeled dataset",
)
client.beta.vector_stores.files.upload_and_poll(
    vector_store_id=vector_store.id,
    file=open("results/results_tsm_human_labeled.json", "rb"),
    poll_interval_ms=1000,
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 100, "chunk_overlap_tokens": 5},
    },
)
client.beta.vector_stores.files.upload_and_poll(
    vector_store_id=vector_store.id,
    file=open("data/sample_tsm_new.json", "rb"),
    poll_interval_ms=1000,
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 1000, "chunk_overlap_tokens": 400},
    },
)
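The calls above discard the return value of upload_and_poll, so a failed ingestion would not surface as an exception. One check that may help narrow this down is to look at the status on the returned object. This is a minimal sketch, assuming the returned object exposes status and last_error (with code and message), as the SDK's VectorStoreFile type does. The name describe_vector_store_file is a hypothetical helper, and the SimpleNamespace object is only a stand-in so the snippet runs without an API key:

```python
from types import SimpleNamespace


def describe_vector_store_file(vs_file):
    """Return a one-line summary of an upload_and_poll result (hypothetical helper)."""
    status = getattr(vs_file, "status", None)
    error = getattr(vs_file, "last_error", None)
    # A failed file is assumed to carry a last_error with code and message.
    if status == "failed" and error is not None:
        return f"{status}: {error.code} - {error.message}"
    return str(status)


# Stand-in for a real result; with the SDK this would be something like:
#   result = client.beta.vector_stores.files.upload_and_poll(...)
#   print(describe_vector_store_file(result))
fake_result = SimpleNamespace(
    status="failed",
    last_error=SimpleNamespace(code="invalid_file", message="could not parse"),
)
print(describe_vector_store_file(fake_result))  # failed: invalid_file - could not parse
```

If the status comes back as failed, the last_error code and message would at least indicate whether the file itself was rejected rather than the upload being lost.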
And here is the code for the second approach using the client.files functions (without specifying a chunking strategy):
human_dataset_result_json_file = client.files.create(
    file=open("results/results_tsm_human_labeled.json", "rb"), purpose="assistants"
)
human_dataset_json_file = client.files.create(
    file=open("data/sample_tsm_new.json", "rb"), purpose="assistants"
)
vectors_store = client.beta.vector_stores.create(
    name="human labeled dataset",
    file_ids=[human_dataset_result_json_file.id, human_dataset_json_file.id]
)
In this second case, the vector store remains stuck showing in_progress = 2.
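To tell a genuinely stuck store apart from one that was simply inspected too early, the store can be re-fetched until in_progress drops to zero or a timeout expires. This is a rough sketch under the assumption that ingestion may continue in the background after create returns. The name wait_for_ingestion and its parameters are hypothetical, and retrieve is expected to behave like client.beta.vector_stores.retrieve (taking a vector store ID and returning an object with file_counts):

```python
import time


def wait_for_ingestion(retrieve, vector_store_id, timeout_s=120.0, interval_s=1.0):
    """Poll a vector store until no files are in progress (hypothetical helper).

    `retrieve` is any callable that takes a vector store ID and returns an
    object with a `file_counts.in_progress` attribute.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        counts = retrieve(vector_store_id).file_counts
        if counts.in_progress == 0:
            return counts
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{counts.in_progress} file(s) still in progress after {timeout_s}s"
            )
        time.sleep(interval_s)


# Intended usage with the SDK (not executed here):
#   counts = wait_for_ingestion(client.beta.vector_stores.retrieve, vectors_store.id)
#   print(counts.completed, counts.failed)
```

Passing the retrieve function in as an argument keeps the helper independent of the client object, so it can be exercised with a fake before pointing it at the real API.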
Interestingly, uploading the same files via the Web UI works perfectly.
Why might these SDK-based approaches be failing to complete the file ingestion process as expected?