Asked 1 year ago by AuroraCaptain733
How can I extract variable formatted executive compensation tables from long PDFs using fine tuning?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
Asked 1 year ago by AuroraCaptain733
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I have many long (100+ page) PDFs that each contain 1-2 specific tables on executive and non-executive director compensation/renumeration. However, the exact page location and table format differ from document to document, making it challenging to extract them using standard techniques.
I tried using a Python PDF-to-text package followed by the OpenAI API to locate the table, but the results were not meaningful. I'm considering using a fine tuning model to better illustrate what I'm looking for. Does anyone have suggestions or alternative approaches for this scenario?
The two tables of interest from the example are shown below:
Here is my current code:
PYTHON# Your Python code here
I have new code and have posted it in reply to this.
Comments 0
•Answers 0
•Views 1.3k
No comments yet.