Closed

Write a Python script that extracts PDF annotations and exports the data to text files

I have a batch of several hundred PDF files that have been annotated (mostly highlights, but some comments) using a variety of software platforms (iOS Goodreader, Skim, Adobe Acrobat, Apple [login to view URL]). The script should parse a specified range of PDF files and perform the relevant actions quickly and without errors. It should primarily use built-in functions, but may use a small number of reasonably standard libraries (e.g. PyPDF2, poppler via python-poppler-qt5, PDFMiner, etc.). Desired actions are as follows:

(1) Extract annotations from the PDF files and export a markdown-formatted list of comments and highlights to an individual file for each PDF (naming convention: `[original PDF name][login to view URL]`). Note: if text is embedded in the PDF, provide the page number, the author of the annotation, and the date/time the annotation was generated (ISO 8601 format), followed by the full text of the highlighted passage or comment. If text is not embedded in the PDF (e.g. page images only), provide a list with the page number, user, and date/time of each annotation.

(2) Extract annotations from the PDF files and export a markdown-formatted list of comments and highlights to a single markdown file (naming convention: `[login to view URL]`), following the conventions noted above.

(3) Extract annotations from the PDF files and export a dataframe as a formatted JSON or CSV table, with page number, annotation type, content, author, and date/time (ISO 8601 format).

(4) Produce a raw dump of all annotations from all files, in the original XML format, to a text file (with the name of each source file included in the dump).
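
To make actions (1) to (3) more concrete, here is a minimal sketch of how annotation records might be normalised and exported. It is an illustration under assumptions, not a required design: every function name is hypothetical, the markdown layout is a guess (the naming conventions above are not visible), and `read_annotations` assumes PyPDF2 3.x is installed. The keys `/Subtype`, `/T`, `/M` and `/Contents` are standard PDF annotation dictionary entries; `/M` is the last-modified date, which may differ from the creation date. Recovering the highlighted text itself (as opposed to comment text) and the raw dump in action (4) are not covered.

```python
import csv
import io
import json
import re
from datetime import datetime, timedelta, timezone

FIELDS = ["file", "page", "type", "author", "date", "content"]
# PDF date strings look like D:YYYYMMDDHHmmSS+HH'mm'; trailing parts are optional.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(\d{2})?'?)?)?"
)


def pdf_date_to_iso(raw):
    """Convert a PDF date string to ISO 8601; return '' if it cannot be parsed."""
    match = _PDF_DATE.match((raw or "").strip())
    if match is None:
        return ""
    *parts, sign, off_h, off_m = match.groups()
    year, month, day, hour, minute, second = (int(p) if p else None for p in parts)
    try:
        tzinfo = None
        if sign in ("Z", "z"):
            tzinfo = timezone.utc
        elif sign:
            offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
            tzinfo = timezone(offset if sign == "+" else -offset)
        stamp = datetime(year, month or 1, day or 1, hour or 0,
                         minute or 0, second or 0, tzinfo=tzinfo)
    except ValueError:  # out-of-range field, e.g. month 13
        return ""
    return stamp.isoformat()


def to_record(filename, page, annot):
    """Flatten one annotation dictionary into a plain dict with FIELDS as keys."""
    return {
        "file": filename,
        "page": page,
        "type": str(annot.get("/Subtype", "")).lstrip("/"),
        "author": str(annot.get("/T", "")),
        "date": pdf_date_to_iso(str(annot.get("/M", ""))),
        "content": str(annot.get("/Contents", "")),
    }


def read_annotations(path):
    """Yield records from a real PDF. Assumption: PyPDF2 3.x is available."""
    from PyPDF2 import PdfReader  # lazy import so the exporters run without it
    reader = PdfReader(path)
    for number, page in enumerate(reader.pages, start=1):
        if "/Annots" in page:
            for ref in page["/Annots"]:
                yield to_record(str(path), number, ref.get_object())


def to_markdown(records):
    """Render records as a markdown list, with one heading per source file."""
    lines, current = [], None
    for rec in records:
        if rec["file"] != current:
            current = rec["file"]
            lines += [f"## {current}", ""]
        line = f"- p. {rec['page']}, {rec['type']}, {rec['author']}, {rec['date']}"
        if rec["content"]:  # image-only PDFs may have no text to show
            line += f": {rec['content']}"
        lines.append(line)
    return "\n".join(lines) + "\n"


def to_csv(records):
    """Render records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Render records as a JSON array."""
    return json.dumps(records, indent=2, ensure_ascii=False)


# Demonstration with a hand-written annotation dictionary (no PDF needed).
sample = {"/Subtype": "/Highlight", "/T": "jk",
          "/M": "D:20200115103000+01'00'", "/Contents": "Key claim"}
print(to_markdown([to_record("paper.pdf", 3, sample)]))
```

Keeping the record format separate from the PDF library means the exporters can be tested without sample files, and the reader could be swapped for PDFMiner or poppler if PyPDF2 proves unreliable on some of the annotating platforms.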

Allow the user to set basic options at the command line:

(a) select the output format (one of 1-4 above)

(b) filter the input PDF files using a simple regexp on the filenames specified on the command line

(c) specify whether to overwrite existing markdown files
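
As a rough sketch of how options (a) to (c) could be exposed, the following uses `argparse` from the standard library. The flag names (`--format`, `--filter`, `--overwrite`), the format labels, and the helper names are all hypothetical choices for illustration; nothing in the brief fixes them.

```python
import argparse
import re
from pathlib import Path

# Hypothetical labels for output actions 1-4.
FORMATS = ("per-file-md", "combined-md", "table", "raw")


def build_parser():
    """Create the command-line parser for the three options."""
    parser = argparse.ArgumentParser(description="Export PDF annotations.")
    parser.add_argument("paths", nargs="+", type=Path,
                        help="PDF files to consider")
    parser.add_argument("--format", choices=FORMATS, default="per-file-md",
                        help="output format, matching actions 1-4")
    parser.add_argument("--filter", default=None, metavar="REGEX",
                        help="only process files whose name matches REGEX")
    parser.add_argument("--overwrite", action="store_true",
                        help="replace existing markdown files")
    return parser


def select_files(paths, pattern=None):
    """Keep .pdf paths whose file name matches the regular expression."""
    regex = re.compile(pattern) if pattern else None
    return [
        path for path in paths
        if path.suffix.lower() == ".pdf"
        and (regex is None or regex.search(path.name))
    ]


def may_write(target, overwrite):
    """Return True if the target is absent or overwriting was requested."""
    return overwrite or not Path(target).exists()


# Demonstration with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--format", "table", "--filter", "^2019", "2019-smith.pdf", "notes.txt"]
)
print(args.format, [p.name for p in select_files(args.paths, args.filter)])
```

Matching the regexp against `path.name` rather than the full path keeps the filter simple, as requested; matching on directories would need a different rule.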

Work delivery:

All files are to be deposited in a GitHub repository ([login to view URL]) for testing and final delivery (via GitHub pull requests). Code should be well commented. The freelancer should be comfortable developing code that will be placed under an open license (BSD or CC-BY). The final script (as below) should pass pylint with a score of 7 or higher.

Milestones:

- successful execution of the script on macOS, parsing sample PDF files (to be provided) and performing actions 1-4 as defined above

- script passes pylint with a score of 7 or higher; final pull request deposits the script with a documentation README covering operation and installation (on macOS using Python 3.6+)

Skills: Python, Mac OS, Software Architecture, PDF


About the Employer:
( 2 reviews ) Edgbaston, United Kingdom

Project ID: #23509103

14 freelancers are bidding on average £156 for this job

zekovicm

Hi there, I am a data extraction expert from Bosnia & Herzegovina, Europe. I have carefully gone through your requirements and I can create software that will extract data from that PDF. Check out my profile and for More

£179 GBP in 3 days
(44 Reviews)
6.7
exansoft

Hello, I'm an experienced Python developer with strong expertise in extracting and/or scraping data from PDF files. Usually this is an interesting task but it may not be trivial. Its complexity depends on the pdf file itse More

£350 GBP in 7 days
(50 Reviews)
6.2
hsh564cf84accd96

I am writing this proposal in order to work for you in Software and Web Development. We are highly trained professional developers seeking to freelance and earn online. Having a flair in programming and development I More

£20 GBP in 7 days
(8 Reviews)
3.4
BakshiGulam

Hi Jeremy Kidwell, I've 7+ years of software engineering experience working for companies like Alcatel Lucent, Dell, HPE, Juniper Networks, etc. as well as freelancer. I read through your requirement and would like to More

£250 GBP in 30 days
(3 Reviews)
3.6
sunset524

Hi, there. I think you have to use pdfminer library. for task 1,2,3,4, I'll finish every task. Let's discuss more detail. Thanks

£300 GBP in 7 days
(4 Reviews)
3.2
hayteekeys

Hello, I am an MSC majored in mathematics. I have rich exp in ACM/ICPC and deep understandings of algorithms. I attended at ACM regional contest several times and won medals there. I am an MSc majored in mathematics. ( More

£20 GBP in 1 day
(2 Reviews)
2.3
AleksandarDikic

Hello. I have some experiences for similar projects with yours. I have rich experiences as a Python developer for 12 years. I have developed 250+ projects based Python, Machine learning and 7 of them are used for huge More

£150 GBP in 7 days
(4 Reviews)
2.1
egormaksimenko

I read your project spec and understood your requirements. I have developed similar projects like this and I can show you the demo have done. One of them is to extract the text from pdf files using pdfminder and dete More

£250 GBP in 5 days
(3 Reviews)
1.9
rvtechsolution

Hi I am iOS and Android APPs expert having 7+ years experience using objective-c and swift programming language and ready to Write python script which can extract PDF annotations and export data to text files. P More

£150 GBP in 3 days
(2 Reviews)
1.7
sysseccon

Hello, Python and Django expert here! My Expertise in both Front-end & backend which contains Python, PHP, Codeigniter, Laravel, MVC, REST API, HTML/CSS/JAVASCRIPT, Angular Js, React, Wordpress, Woocommerce/Ecommerce More

£300 GBP in 15 days
(1 Review)
1.0
Yangwudi

I have a solution to do it! We have use poppler library, I think so. Hi, dear client I've read your project carefully. As a 10+ years experienced python/php/pdf expert, I think I could do it perfectly. I'd like to hea More

£60 GBP in 2 days
(1 Review)
0.6
maddy4994

Hi, I have good experience with scraping using python libraries for product listing website and provide scraped data in csv format. I am using scrappy, beautifulsoup and requests library to do this job. I have read yo More

£35 GBP in 1 day
(0 Reviews)
0.0
greesol

Hello! I am very interested in your posted project. I have been looking for this kind of project on Freelancer for a long time, since I have rich experience with it. I think this project is very suitable for me and i am su More

£20 GBP in 1 day
(0 Reviews)
0.0
dataspro

Hello!! We are DSPro, a software development agency specialized in providing services and products to companies, through cutting edge technologies such as Cognitive Computing, Backend systems, Data Pipelines, M More

£103 GBP in 7 days
(1 Review)
0.0