There will be a PDF file in a standard format. You need to develop a tool that extracts its data and produces output in a format readable by an application developed in PHP.
Details about the PDF file format:
Broadly, every PDF report contains two types of data: the "data field name" and the "data".
For example:
Data field name: Date of Birth
Data: 17-09-1970
There are seven (7) categories of data fields in any report:
1. Header information (which contains the consumer name, report date/time, and Control Number)
2. Score (a numeric value ranging from 300 to 900, or alphabetic -NH or NA, or sometimes 0 or -1)
3. Personal information
4. Contact Information
5. Employment information
6. Account Information
7. Enquiry Information
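Because the Score in category 2 can take several shapes, the tool will probably need a small normalisation step. The following is only a sketch: the function name `parse_score` and the labels "numeric", "code", "special" and "unknown" are assumptions made for illustration, and it accepts the alphabetic codes with or without a leading hyphen because the report layout has not been confirmed.

```python
def parse_score(raw):
    """Classify a raw score string (hypothetical helper, not a fixed spec)."""
    text = raw.strip().upper()
    # Alphabetic codes: accept "NH"/"NA" with or without a leading hyphen.
    code = text.lstrip("-").strip()
    if code in {"NH", "NA"}:
        return {"kind": "code", "value": code}
    try:
        number = int(text)
    except ValueError:
        return {"kind": "unknown", "value": raw.strip()}
    if 300 <= number <= 900:
        return {"kind": "numeric", "value": number}
    if number in (0, -1):
        return {"kind": "special", "value": number}
    # Anything else is kept as-is so it can be reviewed manually.
    return {"kind": "unknown", "value": raw.strip()}
```

Values outside the listed shapes are returned as "unknown" rather than dropped, so an unexpected report does not silently lose its score.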
In our experience, one or more of these categories may be missing from a particular report.
For example, some reports will not have the Employment Information category, and some will not have the Enquiry Information category.
Within a category, the list of "data field names" should generally be constant. However, we have found that in many cases the data field names present also vary from report to report. We can, however, prepare an exhaustive list of the "data field names" that may appear in any given report.
For example, say the Personal Information category has 7 data field names, viz. Name, Date of Birth, Gender, Identification Type, Number, Issue date and Expiration date. This list is exhaustive; there will definitely not be any additional field names. Some reports will have only the first 3 field names, because no identification number is available for that person. Some reports will have all 7 field names.
We can consider an approach like this:
We give the tool the exhaustive list of categories and the full list of "data field names". The tool picks up the data value against each "data field name" and outputs it, with an exception rule: if any category or data field name is entirely absent, the tool should simply ignore it and output the data value as blank.
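A rough sketch of this rule is given below. It assumes the PDF text has already been extracted and that each field appears on its own line as "Field name: value"; the real layout may differ. The function name `extract_fields`, the sample name in `sample_text`, and the choice of JSON as the output format (which PHP can read with `json_decode`) are illustrative assumptions, not requirements.

```python
import json
import re

# Exhaustive field list for one category, supplied to the tool as input.
PERSONAL_FIELDS = [
    "Name", "Date of Birth", "Gender", "Identification Type",
    "Number", "Issue date", "Expiration date",
]

def extract_fields(text, field_names):
    """Return {field name: value}, with "" for any field not found."""
    result = {}
    for name in field_names:
        # Match "<name>: <value>" at the start of a line (assumed layout).
        pattern = r"^[ \t]*" + re.escape(name) + r"[ \t]*:[ \t]*(.*)$"
        match = re.search(pattern, text, re.MULTILINE | re.IGNORECASE)
        result[name] = match.group(1).strip() if match else ""
    return result

# Made-up sample with only the first 3 fields present.
sample_text = """Name: A. Kumar
Date of Birth: 17-09-1970
Gender: Male
"""

record = extract_fields(sample_text, PERSONAL_FIELDS)
print(json.dumps(record, indent=2))
```

Every field in the exhaustive list always appears as a key in the output, so the PHP side can rely on a fixed structure and only needs to check for blank values.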
Also, one very important aspect of the "Account Information" category is multiplicity.
For example, say there are 25 "data field names" in a particular account (viz. an ICICI credit card), which is an exhaustive list; the tool will capture the data for each of those field names. However, a report may have one credit account or 100 credit accounts. The tool should be able to capture all of that data.
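One possible way to handle this multiplicity is sketched below. It assumes every account block begins with the same first field (here a hypothetical "Member Name") and that fields are laid out as "Field name: value" lines; the three field names, the sample values and both function names are made up for illustration, and the real list would hold all 25 names.

```python
import re

# Hypothetical subset of the exhaustive account field list.
ACCOUNT_FIELDS = ["Member Name", "Account Type", "Current Balance"]

def split_accounts(section_text, first_field):
    """Cut the section into one chunk per account, at each first_field line."""
    pattern = re.compile(
        r"^[ \t]*" + re.escape(first_field) + r"[ \t]*:", re.MULTILINE
    )
    starts = [m.start() for m in pattern.finditer(section_text)]
    ends = starts[1:] + [len(section_text)]
    return [section_text[s:e] for s, e in zip(starts, ends)]

def parse_block(block, field_names):
    """Read known fields from one account chunk; missing ones stay blank."""
    values = {name: "" for name in field_names}
    for line in block.splitlines():
        name, sep, value = line.partition(":")
        key = name.strip()
        if sep and key in values:
            values[key] = value.strip()
    return values

# Made-up section containing two accounts; the second lacks a balance.
sample_section = """Member Name: ICICI
Account Type: Credit Card
Current Balance: 1200
Member Name: Example Bank
Account Type: Auto Loan
"""

accounts = [
    parse_block(block, ACCOUNT_FIELDS)
    for block in split_accounts(sample_section, "Member Name")
]
```

The result is a list with one dictionary per account, so the same code path covers a report with one account or with 100.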
I have done PDF scraping before and can probably do this using Python. Please send me a few examples of these PDF files. I will update my bid accordingly.
I am a software architect with more than 6 years of professional experience. We can use Python or Perl to read data from the PDF. We can discuss the design and functional specs in detail.
I am looking for a position that recognizes my talent and gives me an opportunity to do my best for the benefit of the company. I am an experienced WordPress developer and feel my skills will greatly benefit your jobs. I have also worked as a website developer throughout my career. If hired, I will put in my best work. I am ready to be hired and to start work on your jobs.
I am an experienced Python/Django developer and have been building web applications and web-based APIs.
I have done similar work on a project at my previous company and am therefore comfortable with this type of project.
I plan to use Python modules as required. If you can provide me with a sample of your work/PDF file and the format you want for your PHP application, I can provide you with a better and more scalable solution.
Message me at mohdirshadmi4 at gmail dot com
PS:
We are a group of three expert Python/Django/PHP developers, each experienced in their respective field.