Perl Script to Automatically & Properly Space English Words

Closed | Posted Jan 12, 2007 | Paid on delivery

I have a text file containing 80,000 rows. Each row contains one or more words that have been run together without spaces. An example input file can be found at: [url removed, login to view]

I need a Perl script that automatically and correctly inserts spaces into each row, then saves the output to a new file "[url removed, login to view]". There are various ways to accomplish this, and most of them will likely involve checking the words against a dictionary file.

There are three requirements:

- The program must be written in Perl.
- At least 95 out of every 100 words must be correctly spaced. In other words, at most 5 errors per 100 words are allowed.
- An 80,000-row file must take no longer than 5 minutes to process.

---------------------------

In your bid, please describe the methodology you will use to space the words. If you plan to run the words through a dictionary alone, you will run into problems with strings that have more than one valid split, such as "therestopenlarge". When more than one solution exists, output each solution on the same line of the new file, separated by a tab. For this example, the new row would contain `There stop enlarge` and `The rest open large`, separated by a tab.

I believe another difficulty will be short words such as "of" and "the", i.e. words shorter than 4 characters. For example, "the" is a prefix of "there", so "there" should only be split into "the re" if both "the" and "re" were words. Since "re" is not a word, no space should be inserted.

Generally, each file has a theme, such as "equipment" in the example file. One idea would be to make a pass over the entire file to build a lexicon of common words, and then split out frequently occurring words longer than 4 characters. I am not sure how well that method would work.
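As a rough illustration of the dictionary approach described above, and of why it can yield more than one answer, here is a minimal sketch that enumerates every way to split a row into dictionary words and joins the alternatives with a tab, as the posting requests. It is written in Python rather than Perl purely to show the technique. The function names `all_segmentations` and `space_row`, the tiny demo word set, and the choice to pass unsplittable rows through unchanged are all hypothetical. A real solution would still need a Perl port, a full dictionary file, and some way of ranking candidates to reach the 95% accuracy target.

```python
from functools import lru_cache


def all_segmentations(text, words):
    """Return every way to split `text` into words from the set `words`.

    Assumes `words` is a set of lowercase dictionary entries. Matching is
    case-insensitive, but the original casing of `text` is kept in the output.
    """
    lowered = text.lower()
    longest = max((len(w) for w in words), default=0)

    @lru_cache(maxsize=None)
    def split_from(start):
        # All segmentations of text[start:], each as a tuple of words.
        if start == len(lowered):
            return ((),)  # exactly one way to split an empty suffix: no words
        results = []
        # Only try candidates no longer than the longest dictionary entry.
        for end in range(start + 1, min(len(lowered), start + longest) + 1):
            if lowered[start:end] in words:
                for rest in split_from(end):
                    results.append((text[start:end],) + rest)
        return tuple(results)

    return [" ".join(seg) for seg in split_from(0)]


def space_row(row, words):
    """Space one input row. Multiple solutions are joined with a tab."""
    row = row.strip()
    options = all_segmentations(row, words)
    # Rows with no complete dictionary split are passed through unchanged.
    return "\t".join(options) if options else row


if __name__ == "__main__":
    # Hypothetical mini-dictionary; a real run would load a full word list.
    demo_words = {"the", "there", "rest", "stop", "open", "large", "enlarge"}
    print(space_row("therestopenlarge", demo_words))
```

Memoizing `split_from` means each suffix of a row is solved only once. However, the number of complete solutions can still grow quickly on long rows made of many short words. That is one reason a frequency lexicon, such as the per-file one suggested in the posting, would likely be needed to prune or rank candidates rather than print them all.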

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment: Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others, including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform

PERL

Engineering, Linux, MySQL, PHP, Software Architecture, Software Testing

Project ID: #2804734

About the project

5 proposals | Remote project | Active Feb 2, 2007

5 freelancers bid an average of $75 for this job

| Freelancer | Proposal | Bid | Delivery | Reviews | Rating |
|---|---|---|---|---|---|
| skunkwrks | See private message. | $85 USD | 10 days | 28 | 5.3 |
| mtateam | See private message. | $68 USD | 10 days | 19 | 3.9 |
| leptonixvw | See private message. | $85 USD | 10 days | 4 | 2.5 |
| mikearma10 | See private message. | $85 USD | 10 days | 0 | 0.0 |
| hooande | See private message. | $51 USD | 10 days | 2 | 0.0 |