Description
1) Develop a web page on the CoreUI front end with a PDF file upload, a dropdown menu listing 3 forms (IRS Form 1120, IRS Form 1120S, IRS Form 1165), and an Upload button.
2) After the file uploads, set up basic pre-processing (clean-up) and parsing (OCR) of page 1 of IRS Form 1120, including:
a) Clean up of Image, such as:
i) Turn to Black & White / Crop / Extract Lines (Tools: OpenCV, Numpy)
ii) Set up coordinate-based “Sub-Image Areas” (e.g. a Sub-Image Area for the “output box” of 1a. Gross Receipts and Sales; see the Word file for the full description and pictures)
iii) Parse Sub Image Area with Tesseract OCR
3) Set up a page with a PDF viewer and an “Editable Output” field (the value from 1a. Gross Receipts and Sales) so that the user can confirm the parsing
4) Add a Save button that saves the original file, the cleaned-up file, the parsed text, and the user-verified text to the database
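To make step 4 concrete, here is a minimal sketch of the save operation. The description does not name a database, so SQLite (Python's built-in sqlite3 module) is only an assumption, and the table name parsed_forms, its columns, and the function save_result are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical schema: one row per uploaded form, holding the four items
# that step 4 asks to save. SQLite itself is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS parsed_forms (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    form_type     TEXT NOT NULL,
    original_file BLOB NOT NULL,
    cleaned_file  BLOB NOT NULL,
    parsed_text   TEXT NOT NULL,
    verified_text TEXT NOT NULL
)
"""


def save_result(conn, form_type, original_file, cleaned_file,
                parsed_text, verified_text):
    """Store one upload and return the id of the new row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(SCHEMA)
        cursor = conn.execute(
            "INSERT INTO parsed_forms (form_type, original_file, "
            "cleaned_file, parsed_text, verified_text) "
            "VALUES (?, ?, ?, ?, ?)",
            (form_type, original_file, cleaned_file,
             parsed_text, verified_text),
        )
    return cursor.lastrowid
```

Keeping the OCR output and the user-verified value in separate columns makes it possible to measure later how often the parser was corrected.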
Documentation: Tesseract OCR, CoreUI framework
Other Tools:
● OpenCV
● Numpy
● PDF Miner
● Image Magick (constraint-based algorithm to figure out where the line is)
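The clean-up and parsing steps in 2a could be sketched roughly as follows. This is an illustration under stated assumptions, not the required implementation: it uses Numpy only for the image work (OpenCV's cv2.threshold would be the usual choice for binarization), it finds ruling lines with a simple row-projection heuristic rather than the constraint-based approach mentioned for Image Magick, and the area coordinates, thresholds, and function names are all hypothetical. The OCR call assumes the pytesseract wrapper and a local Tesseract install.

```python
import numpy as np

# Hypothetical Sub-Image Area, as (left, top, right, bottom) fractions of the
# page. The real coordinates for box 1a must be measured on the form itself.
SUB_IMAGE_AREAS = {"1a_gross_receipts": (0.75, 0.30, 0.98, 0.33)}


def to_black_and_white(gray, threshold=128):
    """Binarize a grayscale page: ink becomes 0, background becomes 255."""
    return np.where(gray < threshold, 0, 255).astype(np.uint8)


def find_horizontal_lines(bw, min_fill=0.8):
    """Return (first_row, end_row_exclusive) spans that are mostly black."""
    is_line = (bw == 0).mean(axis=1) >= min_fill
    spans, start = [], None
    for row, flag in enumerate(is_line):
        if flag and start is None:
            start = row
        elif not flag and start is not None:
            spans.append((start, row))
            start = None
    if start is not None:
        spans.append((start, len(is_line)))
    return spans


def crop_area(image, area):
    """Cut a Sub-Image Area, given as page fractions, out of the page."""
    height, width = image.shape[:2]
    left, top, right, bottom = area
    return image[round(top * height):round(bottom * height),
                 round(left * width):round(right * width)]


def ocr_area(sub_image):
    """Run Tesseract on one Sub-Image Area and return the stripped text."""
    # Imported lazily so the clean-up helpers work without Tesseract installed.
    import pytesseract
    # --psm 7 asks Tesseract to treat the image as a single line of text.
    return pytesseract.image_to_string(sub_image, config="--psm 7").strip()
```

A typical call chain would be to_black_and_white on the rendered page, crop_area with SUB_IMAGE_AREAS["1a_gross_receipts"], then ocr_area on the result. Expressing areas as fractions rather than pixels keeps them valid if the page is rendered at a different resolution.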
Dear Sir,
How are you?
I am very interested in your project and am ready to start on it right away.
I will work hard and do my best for you.
Best regards
We have already completed many OCR projects with Tesseract, and we have extensive experience parsing PDF pathology reports. We are therefore extremely confident about this project.
About us - we are a team of data scientists and developers with over 10 years of development experience in web/mobile/desktop applications in Java, Python, C++, and JavaScript.
Please contact us over chat for further discussions.
https://www.freelancer.com/projects/php/License-Plate-Detection-With-Chinese/
https://www.freelancer.com/projects/Python/Qualitative-Comparative-Study-Face-13933384/
We are a team of developers who have worked with Python, OpenCV, Numpy, SciPy, and machine learning. Let's discuss it over chat.