Flexible Text Parser

Completed Posted Dec 20, 2006 Paid on delivery

The input format of the text files is to be specified at each run in XML, in the form of labels:

1. Record separator (in the form of a pattern and/or implicit by record end)
2. For each record
   1. Max number of rows (optional)
   2. Format of rowK
      i. WordN = labelM
         1. A label may be in the Type/Value format, in which case it should be allowed to be optional (for example, when the Type tag is missing in the text). The format of a Type/Value variable may be T:V, T=V, T V, etc.
         2. Identification of non-record data should also be possible.
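To make the label-based format above more concrete, here is a minimal sketch. It is written in Python only to keep it short (the project itself targets C#), and the XML element and attribute names (`record`, `separator`, `maxRows`, `row`, `word`, `label`, `typed`) are assumptions made for this illustration, not part of the specification. It handles only the T:V and T=V forms of a Type/Value label.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical config: element and attribute names are illustrative only.
CONFIG = """
<parser>
  <record separator="^---$" maxRows="2">
    <row index="1">
      <word index="1" label="Name"/>
      <word index="2" label="City" typed="true"/>
    </row>
  </record>
</parser>
"""

# Type/Value forms T:V and T=V (the "T V" form is not handled in this sketch).
TYPE_VALUE = re.compile(r"^(?P<type>\w+)[:=](?P<value>.+)$")


def load_rows(xml_text):
    """Return (separator, max_rows, {row_index: [(word_index, label, typed)]})."""
    record = ET.fromstring(xml_text).find("record")
    rows = {}
    for row in record.findall("row"):
        rows[int(row.get("index"))] = [
            (int(w.get("index")), w.get("label"), w.get("typed") == "true")
            for w in row.findall("word")
        ]
    max_rows = record.get("maxRows")
    return record.get("separator"), int(max_rows) if max_rows else None, rows


def label_row(words, word_rules):
    """Map the words of one row to their configured labels."""
    fields = {}
    for index, label, typed in word_rules:
        if index > len(words):
            continue  # word missing: leave the label unset
        word = words[index - 1]
        if typed:
            match = TYPE_VALUE.match(word)
            # The Type tag is optional: fall back to the bare word.
            word = match.group("value") if match else word
        fields[label] = word
    return fields


separator, max_rows, rows = load_rows(CONFIG)
print(label_row("Smith City:Paris".split(), rows[1]))
print(label_row("Jones London".split(), rows[1]))
```

Both rows yield a `City` value, with or without the `City:` tag. A value that itself contains `:` or `=` would be misread by this simple pattern, so a real configuration would need to name the expected Type tag explicitly.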

After each record is parsed, there should be a routine where the record can be printed in a new format. This routine can be empty or contain test code; writing to the Excel file itself will be done later, at integration time, and is not part of this project. Only the creation of the flexible parser forms part of this project.
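The end-of-record routine can be pictured as a callback handed to the parser. The sketch below is in Python for brevity (the project targets C#); `parse_records`, `on_record`, `print_record`, and the `---` separator are hypothetical names and values chosen for illustration.

```python
import re


def parse_records(lines, separator_pattern, on_record, max_rows=None):
    """Group lines into records and call on_record(rows) after each one.

    A record ends at a line matching separator_pattern, or implicitly
    once max_rows rows have been collected (if max_rows is given).
    """
    separator = re.compile(separator_pattern)
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if separator.match(line):
            if rows:
                on_record(rows)
                rows = []
            continue
        rows.append(line)
        if max_rows is not None and len(rows) == max_rows:
            on_record(rows)
            rows = []
    if rows:  # the last record has no trailing separator
        on_record(rows)


def print_record(rows):
    """Placeholder routine: could be empty, or hold test code as here."""
    print(" | ".join(rows))


text = ["a 1", "b 2", "---", "c 3", "---", "d 4"]
parse_records(text, r"^---$", print_record)
```

Because the parser only ever calls `on_record`, the Excel-writing code can later replace `print_record` at integration time without any change to the parser.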

Sample data (see end of attached file)

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

* * *

This broadcast message was sent to all bidders on Wednesday Dec 27, 2006 8:08:26 AM:

The programming language is C#/.NET 2.0. We use Visual Studio Express at this time.

* * *

This broadcast message was sent to all bidders on Wednesday Dec 27, 2006 11:57:07 AM:

Someone smart pointed out a few things on the attached example. Here are the things to keep in mind:

Some text fields start at fixed character positions in the line. Some fields start at fixed token positions. The fields may be variable or fixed length (in which case sometimes they may get truncated, like the date in the example). The fields may be preceded by a type: keyword. A format specifier for a field may or may not be required. In case a format specifier is given the field may also be made optional (since format checking allows detection of the presence of the field).
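The field cases listed above could be captured by one rule object per field. The following is a sketch only, in Python for brevity (the project targets C#): the `FieldRule` class, its attribute names, and the use of a regular expression as the "format specifier" are all assumptions, and the keyword is handled only when it is attached to the value (as in `Qty:5`).

```python
import re


class FieldRule:
    """Hypothetical lexical rule; every attribute name here is an assumption."""

    def __init__(self, name, column=None, token=None, length=None,
                 keyword=None, pattern=None, optional=False):
        self.name = name
        self.column = column      # fixed character position (0-based)
        self.token = token        # fixed token position (0-based)
        self.length = length      # fixed length; None means variable length
        self.keyword = keyword    # e.g. "Qty:" preceding the value
        self.pattern = pattern    # format specifier as a regular expression
        self.optional = optional  # only meaningful when pattern is given


def extract(line, rule):
    """Return the field text, or None if an optional field is absent.

    Exactly one of rule.column and rule.token is expected to be set.
    """
    if rule.column is not None:
        rest = line[rule.column:]
        # A fixed-length slice may be cut short if the line itself is truncated.
        text = rest[:rule.length] if rule.length else (rest.split() or [""])[0]
    else:
        tokens = line.split()
        text = tokens[rule.token] if rule.token < len(tokens) else ""
        if rule.length:
            text = text[:rule.length]
    if rule.keyword and text.startswith(rule.keyword):
        text = text[len(rule.keyword):]
    text = text.strip()
    if rule.pattern and not re.fullmatch(rule.pattern, text):
        if rule.optional:
            return None  # the format check detected that the field is missing
        raise ValueError(f"{rule.name}: {text!r} does not match {rule.pattern}")
    return text


# Columns: code at 0, item at 6, date at 16 (date truncated to 9 characters).
line = "A17".ljust(6) + "widget".ljust(10) + "2006-12-2"
print(extract(line, FieldRule("code", column=0, length=3)))
print(extract(line, FieldRule("item", token=1)))
print(extract(line, FieldRule("date", column=16, length=10,
                              pattern=r"\d{4}-\d{2}-\d{1,2}")))
print(extract("Qty:5 blue", FieldRule("qty", token=0, keyword="Qty:",
                                      pattern=r"\d+")))
```

The date pattern deliberately accepts a one-digit day so that a truncated date still passes the format check; how strict such patterns should be is a decision for whoever writes the configuration.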

The parser input file must have lexical analysis rules so that the user can configure all these cases properly (as the user will have most information).

Each text input file may be parsed by a grammar at a higher level. The interesting rules above apply to the lexical analysis part.

Summarising, a good solution to this problem will have the following pieces: a grammar for the text file, lexical analysis rules (including the ones I mentioned above), and a callback function that is called at the end of each record.

* * *

This broadcast message was sent to all bidders on Wednesday Dec 27, 2006 12:21:24 PM:

Hello,

Here are revised specs based on last broadcast message.

Revised Requirements for Flexible Text Parser based on messages received from RAC bidders.

Input to the required program is a text file and an XML config file. The config file specifies the grammar and lexical rules for the text file. You may use any third-party code whose license permits commercial use. If the third-party license requires it, we can even allow you to contribute your source code back under your name (you would of course do that yourself). We expect any third-party code you use to be stable, discussed with us, and agreed upon (we will need to evaluate all options).

A text file example is given at the end of this document. You may assume that the records come after a block of descriptive text. Each record will be specified by a grammar, and at the end of parsing the record the program must allow for a callback with all the fields (much like bison/flex work on UNIX). The code (written by you or generated) must be in C#. If any other language is used, we will need COM APIs to invoke it from C#/.NET 2.0, which is a managed environment. If anything breaks down, you are responsible. The goal is to get it working in a C#/.NET 2.0 environment.

The grammar for each record is specified using rules (look at a bison config file on UNIX). We will need all features for rules (such as empty rules, multiple rules for a single record, etc.). The lexical parser must allow for the following things, as pointed out in a recent broadcast message:

Some text fields start at fixed character positions in the line. Some fields start at fixed token positions. The fields may be variable or fixed length (in which case sometimes they may get truncated, like the date in the example). The fields may be preceded by a type: keyword. A format specifier for a field may or may not be required. In case a format specifier is given the field may also be made optional (since format checking allows detection of the presence of the field).

The parser input file must have lexical analysis rules so that the user can configure all these cases properly (as the user will have most information).

Each text input file may be parsed by a grammar at a higher level. The interesting rules above apply to the lexical analysis part.

Summarising, a good solution to this problem will have the following pieces: a grammar for the text file, lexical analysis rules (including the ones I mentioned above), and a callback function that is called at the end of each record.
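As a rough picture of how the grammar and the callback fit together, here is a small sketch in Python (used for brevity; the project targets C#). The rule table, the token types (`NAME`, `DATE`, `NUMBER`, `COMMENT`), and the function names are hypothetical. It is a naive backtracking matcher, not the LALR algorithm that bison generates, and it would loop forever on left-recursive rules.

```python
# Hypothetical grammar: each nonterminal maps to its alternative rules;
# an empty list [] is an empty rule. "record" has two rules.
GRAMMAR = {
    "record": [["NAME", "opt_date", "amounts"], ["COMMENT"]],
    "opt_date": [["DATE"], []],
    "amounts": [["NUMBER", "amounts"], []],
}


def derive(symbol, tokens, pos):
    """Yield (next_pos, matched_tokens) for each way symbol matches at pos."""
    if symbol not in GRAMMAR:  # terminal: compare with the token type
        if pos < len(tokens) and tokens[pos][0] == symbol:
            yield pos + 1, [tokens[pos]]
        return
    for alternative in GRAMMAR[symbol]:
        yield from derive_sequence(alternative, tokens, pos)


def derive_sequence(symbols, tokens, pos):
    """Yield (next_pos, matched_tokens) for a sequence of symbols."""
    if not symbols:
        yield pos, []
        return
    for mid, head in derive(symbols[0], tokens, pos):
        for end, tail in derive_sequence(symbols[1:], tokens, mid):
            yield end, head + tail


def parse_record(tokens, on_record):
    """Call on_record with the fields if the whole record matches."""
    for end, fields in derive("record", tokens, 0):
        if end == len(tokens):
            on_record(fields)
            return True
    return False


# Tokens are (type, text) pairs as a lexical stage might produce them.
tokens = [("NAME", "Smith"), ("DATE", "2006-12-20"),
          ("NUMBER", "10"), ("NUMBER", "20")]
parse_record(tokens, lambda fields: print([text for _, text in fields]))
```

The lexical rules decide which `(type, text)` pairs exist; the grammar only decides which sequences of types form a record, which keeps the two configuration parts independent.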

## Platform

XP/2000

Engineering, Microsoft, MySQL, PHP, Software Architecture, Software Testing, Windows Desktop

Project ID: #3964917

About the project

6 proposals Remote project Active Jan 1, 2007

Awarded to:

denisvvw ($350 USD in 21 days)

6 freelancers are bidding on average $138 for this job

| Freelancer | Bid (USD) | Delivery | Reviews | Rating | Proposal |
| --- | --- | --- | --- | --- | --- |
| denisvvw | $350 | 21 days | 134 | 7.4 | See private message. |
| marchent | $106.25 | 21 days | 169 | 6.3 | See private message. |
| vw2244935vw | $51 | 21 days | 39 | 4.5 | See private message. |
| amovw | $127.5 | 21 days | 22 | 4.1 | See private message. |
| vw2190129vw | $127.5 | 21 days | 4 | 3.2 | See private message. |
| smartcoder12 | $68 | 21 days | 2 | 2.3 | See private message. |