517476 CV/resume parser

In Progress Posted Aug 15, 2011 Paid on delivery

re-posting.

We require a CV parser that will extract:

1) Name,

2) First line of address

3) Second line of address

4) County

5) Country

6) Post code/zip code/area code/district code

7) Email address,

8) Telephone number,

9) Nationality

10) Website

11) Personal objectives

12) Hobbies and interests

13) References

14) Achievements

15) Career history - for each company they have worked in we need separate xml records for:

15.1) company name

15.2) job title

15.3) start date

15.4) end date - NULL if this is their current job

15.5) job description

16) Education history - for each education or training course we need separate xml records for:

16.1) start date

16.2) end date

16.3) Education level - for example, in England this could be: Degree, Secondary, Higher, Other.

16.4) If degree, then degree level - for example, in England this could be: high school, non postgrad, associate, post grad, bachelor, master, doctorate, other

16.5) School/college/institution name

16.6) Qualification - e.g. GCSE, BSc. etc.

16.7) Subject and grade. For example, Maths grade A.

16.8) Any summary for that qualification that may be available.

17) Any skills that can be detected from the CV - for example, Java, Sage, MS Word, Running etc.
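As a rough illustration of the kind of pattern matching involved (sketched in Python for brevity; the deliverable itself is still expected in PHP), the snippet below pulls an email address and telephone number out of plain text and turns a date range into a start/end pair, with the end date left as NULL for a current job. The regular expressions and the names `extract_contact` and `parse_date_range` are assumptions for illustration only; real CVs will need far more robust handling.

```python
import re
from typing import Optional, Tuple

# Deliberately loose patterns; assumptions for illustration, not a full solution.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")
DATE_RANGE_RE = re.compile(
    r"(\d{4})\s*(?:-|\u2013|to)\s*(\d{4}|present|current|now)", re.IGNORECASE
)


def extract_contact(text: str) -> dict:
    """Return the first email address and telephone number found, or None."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "telephone": phone.group(0) if phone else None,
    }


def parse_date_range(text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (start, end); end is None (NULL) when the job is current."""
    match = DATE_RANGE_RE.search(text)
    if not match:
        return None
    start, end = match.group(1), match.group(2)
    if not end.isdigit():  # "present", "current", "now"
        end = None
    return start, end
```

For example, `parse_date_range("2009 - Present")` yields a start year with no end date, which maps onto item 15.4 above.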

I would expect an authenticated REST interface (written in PHP) that we can send CVs to. I suggest you collect a wide range of CV formats from the internet and make sure the parser can handle them. It will need to parse any CV format sent to it, so it requires a lot of intelligent pattern matching.
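The authentication scheme is not fixed here. One possible approach, shown purely as a sketch (in Python for brevity, although the deliverable is PHP), is for the client to sign each uploaded CV with a shared secret using HMAC-SHA256 and for the server to verify the signature before parsing. The function names `sign` and `is_authentic` and the use of a shared secret are assumptions, not part of the spec.

```python
import hashlib
import hmac


def sign(body: bytes, secret: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(body: bytes, signature: str, secret: bytes) -> bool:
    """Check a client-supplied signature using a constant-time comparison."""
    return hmac.compare_digest(sign(body, secret), signature)
```

The constant-time comparison avoids leaking how much of a forged signature was correct.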

It needs to open the CV using Apache Tika (which can open PDFs, Word and other text documents), parse the CV, and then output the parsed data in HR-XML format. You should research the elements of this standard before bidding.
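To give a feel for the output stage, here is a minimal sketch (Python for brevity; the deliverable is PHP) that serialises already-parsed fields into XML, with one record per job. It assumes text extraction has already happened. The element names (`Resume`, `CareerHistory`, `Job` and so on) and the `null` attribute are placeholders of my own and are NOT the real HR-XML element names; the actual output must follow the HR-XML schema.

```python
import xml.etree.ElementTree as ET


def to_xml(parsed: dict) -> str:
    """Serialise parsed CV fields to XML using placeholder element names."""
    root = ET.Element("Resume")
    ET.SubElement(root, "Name").text = parsed.get("name", "")
    history = ET.SubElement(root, "CareerHistory")
    for job in parsed.get("jobs", []):
        record = ET.SubElement(history, "Job")  # one record per company
        ET.SubElement(record, "CompanyName").text = job["company_name"]
        ET.SubElement(record, "JobTitle").text = job["job_title"]
        ET.SubElement(record, "StartDate").text = job["start_date"]
        end = ET.SubElement(record, "EndDate")
        if job.get("end_date") is None:
            end.set("null", "true")  # current job: no end date
        else:
            end.text = job["end_date"]
    return ET.tostring(root, encoding="unicode")
```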

The parser also needs to be able to parse different languages, including:

Dutch

Chinese

Hindi

Japanese

German

Spanish

French

Russian

Brazilian Portuguese

Malay/Indonesian

Arabic

Bengali

It should be written primarily in PHP on a LAMP stack - it should be able to parse a complete CV within 1.5 seconds at the most.
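One simple way to check the 1.5 second limit during development is to wrap the parser in a timer. The sketch below is in Python for brevity (the deliverable is PHP), and `timed_parse` and the `parse` callable are hypothetical names standing in for whatever entry point the parser ends up with.

```python
import time

BUDGET_SECONDS = 1.5  # upper limit for parsing one complete CV


def timed_parse(parse, document):
    """Run parse(document) and report the elapsed time against the budget."""
    start = time.perf_counter()
    result = parse(document)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= BUDGET_SECONDS
```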

I will provide a full spec to the winning bidder. I expect the parser to be able to read keywords from a database, but it must not rely on those keywords to parse the CV, as the CV may not contain them. Remember, everyone writes their resume/CV differently :-)
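To illustrate what "read keywords but not rely on them" could look like, here is a small sketch (Python for brevity; the deliverable is PHP). It first matches known keywords, then falls back to a heuristic that picks up items listed after a "Skills" heading even when they are not in the keyword list. The function name `detect_skills` and the heading heuristic are assumptions for illustration; the full spec may require something quite different.

```python
import re


def detect_skills(text: str, known_keywords: list) -> list:
    """Find skills via known keywords, then via a 'Skills:' line heuristic."""
    found = []
    lowered = text.lower()
    for keyword in known_keywords:
        # Match the keyword only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(keyword.lower()) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.append(keyword)
    # Fallback: accept comma- or semicolon-separated items after "Skills".
    match = re.search(r"^skills\s*:?\s*(.+)$", text, re.IGNORECASE | re.MULTILINE)
    if match:
        seen = {skill.lower() for skill in found}
        for item in re.split(r"[,;]", match.group(1)):
            item = item.strip()
            if item and item.lower() not in seen:
                found.append(item)
                seen.add(item.lower())
    return found
```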

I would be willing to pay a deposit, but the remainder of the payment would depend on a demonstrably working system. I would also want to inspect the code at regular intervals throughout the project to ensure that the developer's code and design quality meet my client's needs.

I would expect the application to interact with the database via memcache rather than directly.
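A common way to put a cache in front of a database is the cache-aside pattern: look in the cache first, and only query the database on a miss. The sketch below (Python for brevity; the deliverable is PHP) uses a plain dictionary as a stand-in for memcache and a callable as a stand-in for the database query. `CachedStore` and `load_from_db` are hypothetical names.

```python
class CachedStore:
    """Cache-aside lookup: try the cache, fall back to the database on a miss."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # stand-in for a memcache client
        self.load_from_db = load_from_db  # stand-in for a database query

    def get(self, key):
        value = self.cache.get(key)
        if value is None:
            value = self.load_from_db(key)
            self.cache[key] = value       # populate the cache for next time
        return value
```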

Note, I do not believe this is a simple project. This project is for software engineers rather than script kiddies. Please only provide a quote if you believe you can really do this and please be realistic on your timescales.

Apache, Arabic Translator, French Translator, German Translator, Japanese Translator, Java, Odd Jobs, PHP, Portuguese Translator, Research, Russian Translator, Spanish Translator, XML

Project ID: #2263410

About the project

Remote project Active Jul 11, 2012