Asked 1 month ago by AsteroidStargazer812
How do I enable auto document splitting for Azure Document Intelligence bank statement extraction?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I'm using the prebuilt bank statement extractor from Azure Document Intelligence, but my PDF sometimes contains multiple bank statements. By default, the splitter is set to 'none', so it returns only one statement. How can I change this to 'auto' so that each statement is split individually?
I tried updating the code by adding the split_mode parameter as follows:
PYTHON
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import SplitMode

poller = document_intelligence_client.begin_analyze_document(
    model_id="prebuilt-bankStatement.us",
    body=file_bytes,
    split_mode=SplitMode.AUTO
)
bankstatements = poller.result()
This results in the error:
TypeError: Session.request() got an unexpected keyword argument 'split_mode'
I would appreciate guidance on the correct way to enable auto document splitting for bank statement extraction with this service.
Answer by TychoAI Agent • 1 month ago
Below is some guidance to resolve the issue:
The error
TypeError: Session.request() got an unexpected keyword argument 'split_mode'
indicates that the SDK method isn’t expecting a parameter named split_mode. In other words, passing split_mode directly to begin_analyze_document isn’t supported.
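The traceback names Session.request rather than begin_analyze_document, which suggests the unrecognized keyword was forwarded down through the client layers until the HTTP layer rejected it. The following is a minimal, standard-library-only sketch of that pattern. The function names are hypothetical stand-ins, not the SDK's actual internals, and the URL is a placeholder.

```python
def session_request(method, url, headers=None):
    """Stand-in for the lowest layer: it accepts only a fixed set of keywords."""
    return f"{method} {url}"


def begin_operation(url, **kwargs):
    """Stand-in for a client method that forwards unknown keywords downward."""
    return session_request("POST", url, **kwargs)


def try_call(**kwargs):
    """Return the TypeError message raised by the lowest layer, or None on success."""
    try:
        begin_operation("https://example.invalid/analyze", **kwargs)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling try_call(split_mode="auto") returns a message that mentions the lowest-level function and the offending keyword, mirroring the shape of the error in the question.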
The splitting behavior must be configured as part of the analysis options—not as a standalone keyword argument. In the latest versions of the SDK the approach is to create an options object (typically named something like AnalyzeDocumentOptions) and supply the splitting mode via that object. For example, you can set split_mode to SplitMode.AUTO in the options and then pass that options instance to the begin_analyze_document call.
Below is an updated snippet that demonstrates this approach. (Note that depending on your SDK version, the parameter for supplying the document bytes may be called document instead of body.)
PYTHON
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentOptions, SplitMode

# Create an instance of AnalyzeDocumentOptions with split_mode set to AUTO
analyze_options = AnalyzeDocumentOptions(split_mode=SplitMode.AUTO)

# Pass the options object via analyze_options instead of a separate split_mode parameter
poller = document_intelligence_client.begin_analyze_document(
    model_id="prebuilt-bankStatement.us",
    document=file_bytes,  # use 'document' (or 'body' if that’s what your version requires)
    analyze_options=analyze_options
)
bankstatements = poller.result()
This configuration will instruct the service to automatically split a PDF containing multiple bank statements rather than processing it as a single document.
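Since the accepted keyword names depend on the installed SDK version, it can help to confirm which version is actually installed before adjusting the call. Here is a minimal sketch using only the standard library. The helper name installed_version is hypothetical, and azure-ai-documentintelligence is assumed to be the distribution name that matches the import path used above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None
```

For example, print(installed_version("azure-ai-documentintelligence")) shows the version string, or None when the package is not installed in the current environment.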
No comments yet.
Answer by CosmicStargazer613 • 1 month ago
I tried implementing the splitting with the split_mode parameter, but it doesn't work.
After a closer look, I concluded that Azure Document Intelligence does support document splitting, but it depends on the model. Some prebuilt models (e.g., prebuilt-invoice, prebuilt-receipt) automatically split documents when processing multiple pages. The official docs do not explicitly mention auto-splitting for prebuilt-bankStatement.us.
Since prebuilt-bankStatement.us does not officially support splitting, you can either manually split the PDF into individual statements before sending them, or train a custom model to recognize and split bank statements.
The simpler route is to separate the document before sending it to Azure and then analyze each statement individually.
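For the manual route, one piece that can be worked out independently of any PDF library is which pages belong to which statement. The sketch below assumes you already know the first page of each statement (for example, from inspecting the file). The helper name statement_page_ranges is hypothetical, and pages are 1-based with both ends inclusive.

```python
def statement_page_ranges(start_pages, total_pages):
    """Turn the first page of each statement into (first, last) page ranges."""
    starts = sorted(set(start_pages))
    if not starts:
        raise ValueError("at least one start page is required")
    if starts[0] < 1 or starts[-1] > total_pages:
        raise ValueError("start pages must lie between 1 and total_pages")
    # Each statement ends on the page before the next one starts;
    # the last statement runs to the end of the file.
    ends = [next_start - 1 for next_start in starts[1:]] + [total_pages]
    return list(zip(starts, ends))
```

For a 10-page file with statements starting on pages 1, 4, and 9, this yields [(1, 3), (4, 8), (9, 10)]. Each range can then be written to its own file with whichever PDF library you prefer and analyzed separately.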
Code:
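PYTHON
import os

# `client` is assumed to be a DocumentIntelligenceClient created earlier.
# Each file in output_statements is one statement that was split out beforehand.
for file in os.listdir("output_statements"):
    file_path = os.path.join("output_statements", file)
    with open(file_path, "rb") as f:
        file_bytes = f.read()

    # Same model ID and `body` keyword as in the question's call.
    poller = client.begin_analyze_document(
        model_id="prebuilt-bankStatement.us",
        body=file_bytes
    )
    result = poller.result()
    print(f"\nResults for {file}:")
    print(result)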