Extracting Text from HTML with Encoding Resolution

Cancelled · Posted Jun 1, 2010 · Paid on delivery

Hello,

I need a script that takes an HTML file as input (possibly in any of many encodings, such as UTF, ASCII, etc., in English and non-English languages) and outputs only the text in ASCII encoding (preserving characters such as ü, ö, à...). It is part of a small web crawler that collects a set number of texts for linguistic analysis.

The main focus is encoding.

Perl 5.8/5.10

Best Regards
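A minimal sketch of the requested conversion, using only the core Encode module so it should run on Perl 5.8 and 5.10. It is an illustration under stated assumptions, not a finished crawler component. The charset is guessed from a byte-order mark, then a <meta> charset declaration, then a strict UTF-8 check, with windows-1252 (cp1252) assumed as the final fallback. Tags are stripped with regular expressions rather than a real HTML parser, and only numeric entities plus a few named ones are decoded. The names detect_charset and html_to_text and the script name html2text.pl are hypothetical. Because ASCII has no code points for characters such as ü, ö and à, the sketch writes UTF-8, which is byte-for-byte identical to ASCII for plain English text. If strict 7-bit ASCII output is really required, encode('ascii', $text, Encode::FB_XMLCREF()) would instead turn each non-ASCII character into a numeric character reference.

```perl
#!/usr/bin/perl
# Sketch: convert an HTML file in an unknown charset into plain text.
# Uses only core modules (assumed target: Perl 5.8 / 5.10).
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Guess the charset of raw HTML bytes. Assumed order of precedence:
# byte-order mark, then a <meta> declaration, then a strict UTF-8
# check, then windows-1252 (cp1252) as a last resort.
sub detect_charset {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /^\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /^\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /^\xFE\xFF/;

    # Matches <meta charset="..."> and the older
    # <meta http-equiv="Content-Type" content="text/html; charset=...">.
    if ($bytes =~ /<meta[^>]+charset\s*=\s*["']?\s*([-\w.:]+)/i) {
        my $declared = $1;               # copy before calling into Encode
        my $enc = find_encoding($declared);
        return $enc->name if $enc;       # skip names Encode does not know
    }

    # No usable declaration: accept UTF-8 only if the bytes decode cleanly.
    # decode() may modify its input when CHECK is set, so work on a copy.
    my $copy = $bytes;
    return 'UTF-8' if eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    return 'cp1252';
}

# Decode the bytes, strip the markup and return a Perl character string.
sub html_to_text {
    my ($bytes) = @_;
    # Default CHECK: unmappable bytes become the substitution character.
    my $html = decode(detect_charset($bytes), $bytes);
    $html =~ s/^\x{FEFF}//;              # drop a decoded BOM, if any

    # Crude regex-based tag removal; a real crawler should use a parser
    # such as HTML::Parser instead.
    $html =~ s/<!--.*?-->//gs;
    $html =~ s/<(script|style)\b.*?<\/\1\s*>//gis;
    $html =~ s/<[^>]*>/ /g;

    # Numeric entities plus a few common named ones (HTML::Entities has
    # the full table). &nbsp; is mapped to a plain space on purpose.
    my %named = (amp => '&', lt => '<', gt => '>', quot => '"',
                 apos => "'", nbsp => ' ');
    $html =~ s/&#(\d+);/chr($1)/ge;
    $html =~ s/&#x([0-9a-fA-F]+);/chr(hex $1)/ge;
    $html =~ s/&(\w+);/exists $named{$1} ? $named{$1} : "&$1;"/ge;

    # Collapse runs of whitespace and trim the ends.
    $html =~ s/\s+/ /g;
    $html =~ s/^ | $//g;
    return $html;
}

# Usage: perl html2text.pl page.html > page.txt
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    binmode STDOUT, ':encoding(UTF-8)';
    print html_to_text($bytes), "\n";
}
```

In a crawler, the charset parameter of the HTTP Content-Type header, when present, takes precedence over a <meta> declaration under the HTML specification, so it could be checked before falling back to detect_charset.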

Perl

Project ID: #701904

Project info

6 proposals · Remote project · Active Jun 4, 2010

6 freelancers bid an average of $73 for this job

freelance4hire80

See PM.

$80 USD in 1 day
(39 reviews)
6.1
dxxd116

I am an expert in data extraction with Perl. Looking forward to working on this project.

$90 USD in 1 day
(1 review)
3.4
amsak

Please see PM.

$70 USD in 10 days
(5 reviews)
3.0
sitescraper90

I have the code base ready. I do lots of non-English language crawling on sites that use all sorts of character encoding sets (iso-8859-1 to utf-8) and have managed to solve this problem for good.

$100 USD in 1 day
(0 reviews)
0.0
Lampbird

I have 7 years of experience with Perl and can finish this project on time.

$45 USD in 2 days
(0 reviews)
0.0
rsunlight

I have read your requirements carefully. I have 2 years of experience in Perl scraping, including well-known websites such as Expedia, Orbitz, Travelocity, etc.

$50 USD in 5 days
(0 reviews)
0.0