Auto replace text in pdf documents based on lookup table
$250-750 USD
In Progress
Posted almost 13 years ago
Paid on delivery
Note that you will need programming skills for this project, but I didn't want to limit it to one language by listing them in the skills.
The purpose of this project is to replace the English text that already exists in approximately 10,000 pdf documents with translated text in another language. Both the English and other language text will be contained in a spreadsheet document (preferably OpenOffice). The spreadsheet with the translations will already exist and will simply be used by this project; creating the spreadsheet is not part of this project, nor is any translating.
Ideally, the program will look at each entry/record in the spreadsheet, look for any instance of that exact text in the pdf documents, and replace the English text with the translated text. Multiple fonts currently exist in the English pdfs and some do not support the special characters found in other languages, so font replacement will also be a necessary option. Additionally, the replaced text should be reformatted to (i) fit within the width of the page (e.g. resize the font if necessary), and (ii) be re-centered.
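To make the fit-and-re-center requirement concrete, here is a rough Python sketch of the arithmetic only. It is not tied to any PDF library. The function names, the margin and minimum-size parameters, and the `rough_width` metric are all assumptions for illustration; a real solution would take text widths from the actual font metrics.

```python
from typing import Callable, Tuple


def fit_and_center(
    text: str,
    font_size: float,
    page_width: float,
    margin: float,
    width_at_unit_size: Callable[[str], float],
    min_size: float = 6.0,
) -> Tuple[float, float]:
    """Return (font_size, x) so the text fits between the margins and is centered.

    Assumptions: width_at_unit_size(text) gives the text width at font size 1,
    and width scales linearly with font size.
    """
    available = page_width - 2 * margin
    unit_width = width_at_unit_size(text)
    size = font_size
    # Shrink only when the text would overflow; never enlarge it.
    # If min_size is reached, the text may still overflow the margins.
    if unit_width > 0 and unit_width * size > available:
        size = max(min_size, available / unit_width)
    text_width = unit_width * size
    # Left edge that places the text midway between the two margins.
    x = margin + (available - text_width) / 2
    return size, x


def rough_width(text: str) -> float:
    """Placeholder metric (assumption): each character is 0.5 units wide at size 1."""
    return 0.5 * len(text)


# US Letter width in points (612) with 1-inch margins (72), original size 12 pt.
size, x = fit_and_center("Bonjour Jean!", 12.0, 612.0, 72.0, rough_width)
print(size, x)  # 12.0 267.0
```

Short translations keep their original size and only get a new horizontal position; longer ones are scaled down until they fit the available width.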
Example: Below is a very simple example showing how the spreadsheet would be set up. Only two records are shown, and they have been converted to comma-delimited form so it is clear what each field contains.
1,Hello Bill!,Bonjour Jean!
2,Goodbye Bill,Au revoir Jean
The desired program would look for all instances of "Hello Bill!" in all of the documents in a directory and replace them with "Bonjour Jean!". In the current English documents, the English text from each record would be the only text on its line. The program would also reduce the font size, if necessary, to fit the translated text on the line, and re-center the text. It would then look for all instances of "Goodbye Bill" and replace them with "Au revoir Jean", again changing the font size if necessary and re-centering.
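As an illustration of the lookup itself, the following Python sketch assumes the spreadsheet has been exported to CSV with the three columns shown above (record number, English text, translated text) and that each pdf page has already been extracted into a list of text lines. The extraction and writing-back steps depend on whichever PDF library is chosen and are not shown; `load_lookup` and `replace_lines` are hypothetical names.

```python
import csv
import io
from typing import Dict, List

# The two sample records from the posting, in comma-delimited form.
SAMPLE = "1,Hello Bill!,Bonjour Jean!\n2,Goodbye Bill,Au revoir Jean\n"


def load_lookup(csv_text: str) -> Dict[str, str]:
    """Map each English string (column 2) to its translation (column 3)."""
    table: Dict[str, str] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or incomplete records
        table[row[1]] = row[2]
    return table


def replace_lines(lines: List[str], table: Dict[str, str]) -> List[str]:
    """Replace a line only when its whole text exactly matches an English entry."""
    return [table.get(line.strip(), line) for line in lines]


table = load_lookup(SAMPLE)
print(replace_lines(["Hello Bill!", "Page 1", "Goodbye Bill"], table))
# ['Bonjour Jean!', 'Page 1', 'Au revoir Jean']
```

Matching on the whole line relies on the English text being the only text on its line, as described above; lines with no exact match are left unchanged.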
Note: Of course, the lookup could be done the other way around. The program could read the text in each pdf document, look for that text in the lookup table, and replace it if a match is found. This, come to think of it, would probably be faster, but either way is fine as long as the end result is the same. An advantage of this approach is that the program could also write an error file listing the text that was not translated, so that it can be added to the database.
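A minimal Python sketch of this document-driven approach might look like the following. It assumes each document's text is already available as a list of lines and that the lookup table is a plain dictionary; the function names and the tab-separated report layout are assumptions, not requirements.

```python
from typing import Dict, List, Tuple


def translate_document(
    lines: List[str], table: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Translate each line found in the table; collect the ones that are not."""
    translated: List[str] = []
    missing: List[str] = []
    for line in lines:
        key = line.strip()
        if not key:
            translated.append(line)  # keep blank lines as they are
        elif key in table:
            translated.append(table[key])
        else:
            translated.append(line)  # leave untranslated text in place
            if key not in missing:
                missing.append(key)  # report each missing string once
    return translated, missing


def format_error_report(doc_name: str, missing: List[str]) -> str:
    """One tab-separated line per untranslated string, ready to write to a file."""
    return "".join(f"{doc_name}\t{text}\n" for text in missing)


table = {"Hello Bill!": "Bonjour Jean!", "Goodbye Bill": "Au revoir Jean"}
new_lines, missing = translate_document(["Hello Bill!", "See you soon"], table)
print(new_lines)  # ['Bonjour Jean!', 'See you soon']
print(format_error_report("sample.pdf", missing), end="")
```

Each document costs one dictionary lookup per line, rather than one search of every document per spreadsheet record, which is why this direction is likely to be faster with around 10,000 files.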
The successful freelancer will be provided with sample pdf documents and a sample spreadsheet for the purposes of developing the software/script. Must be willing to fix bugs if they exist.
The program/script will be run in a Windows 7 environment.
You will be more likely to win this project if you describe your experience and/or provide samples of your work related to what is desired in this project.