PART-OF-SPEECH TAGGING IN PRACTICE

La lepre ha fatto un salto.
(Italian: "The hare made a jump.")

Tokenization:

La
lepre
ha
fatto
un
salto
.

POS-tagging:

La	ART
lepre	NOM
ha	AUX
fatto	VER
un	ART
salto	NOM
.	PUN

Lemmatization:

La	ART	il
lepre	NOM	lepre
ha	AUX	avere
fatto	VER	fare
un	ART	un
salto	NOM	salto
.	PUN	.
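
The three steps above can be sketched in a few lines of Python. This
is only an illustration: the lookup table TOY_LEXICON and the function
names are invented for this example and cover this one sentence. A
real tagger such as TreeTagger relies on a trained model, not on a
word list.

```python
import re

# Toy lexicon for the example sentence only: lowercase form -> (tag, lemma).
# Hypothetical data, not taken from any real tagger's resources.
TOY_LEXICON = {
    "la": ("ART", "il"),
    "lepre": ("NOM", "lepre"),
    "ha": ("AUX", "avere"),
    "fatto": ("VER", "fare"),
    "un": ("ART", "un"),
    "salto": ("NOM", "salto"),
    ".": ("PUN", "."),
}


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def annotate(sentence):
    """Return (word, pos, lemma) triples for each token.

    Tokens missing from the toy lexicon get the tag UNK and keep
    their own form as lemma.
    """
    rows = []
    for token in tokenize(sentence):
        pos, lemma = TOY_LEXICON.get(token.lower(), ("UNK", token))
        rows.append((token, pos, lemma))
    return rows


if __name__ == "__main__":
    # Print one token per line, tab delimited: word, tag, lemma.
    for row in annotate("La lepre ha fatto un salto."):
        print("\t".join(row))
```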

**********

tagwrapper.pl Documentation

This script tags English, German, Italian, French and Spanish text by
invoking the appropriate taggers, and produces output in the format
expected by CWB:

<corpus>
<text id="...">
<s>
The DET the
dogs N dog
...
</s>
...
</text>
...
</corpus>

where, independently of the tagger output, the positional attributes
are always arranged in the order: word pos lemma (tab delimited).
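
As a rough illustration of this normalization step (not the code of
tagwrapper.pl itself), the Python sketch below reorders tab-delimited
tagger output into word pos lemma and wraps it in an <s> element. The
function names and the "word lemma pos" input order are assumptions
made for the example; the column order of any real tagger should be
checked against its own documentation.

```python
def to_cwb_rows(lines, column_order):
    """Reorder tab-delimited tagger output into (word, pos, lemma) rows.

    column_order names the columns in the order the tagger emits them,
    for example ("word", "lemma", "pos"). Empty lines are skipped.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = dict(zip(column_order, line.split("\t")))
        rows.append((fields["word"], fields["pos"], fields["lemma"]))
    return rows


def wrap_sentence(rows):
    """Render one sentence as an <s> element, one token per line."""
    body = ["\t".join(row) for row in rows]
    return "\n".join(["<s>"] + body + ["</s>"])


if __name__ == "__main__":
    # Hypothetical tagger output with the lemma in the second column.
    raw = ["The\tthe\tDET\n", "dogs\tdog\tN\n"]
    print(wrap_sentence(to_cwb_rows(raw, ("word", "lemma", "pos"))))
```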

The <text> elements are present only if the -d option is used (see
below).

The relevant taggers must be on the user's PATH. They are:

tree-tagger-english
tree-tagger-german
ita_tree_tagger_wrapper.pl
analyzer

The tagsets are those used by these taggers, unless replacement tags
are provided in the parameter files bundled in the __DATA__ section of
the script.

Usage:

tagwrapper.pl -l langcode [-d delimiter] inputfile > taggedoutput

tagwrapper.pl -h | more

-l langcode: one of en de it fr es

-d delimiter: if a line begins with delimiter, the first string
 following delimiter is used as an id and a corresponding text element
 is introduced in the output

-h: prints this information and quits
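
The behaviour described for -d can be sketched in Python as follows.
This is a reading of the description above, not the script's actual
code: the function names are invented, and two details are assumptions
of the sketch (lines before the first delimiter are dropped, and the
id is XML-escaped when the opening tag is built).

```python
from xml.sax.saxutils import quoteattr


def split_texts(lines, delimiter):
    """Group input lines into (text_id, body_lines) pairs.

    A line that starts with the delimiter opens a new text; the first
    whitespace-separated string after the delimiter becomes its id.
    """
    texts = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(delimiter):
            rest = line[len(delimiter):].split()
            text_id = rest[0] if rest else ""
            texts.append((text_id, []))
        elif texts:
            texts[-1][1].append(line)
        # Assumption: lines before the first delimiter are ignored.
    return texts


def open_text_tag(text_id):
    """Build the opening <text> tag (assumption: the id is XML-escaped)."""
    return "<text id=%s>" % quoteattr(text_id)


if __name__ == "__main__":
    sample = [
        "CURRENT URL http://example.org/a\n",
        "La lepre ha fatto un salto.\n",
    ]
    for text_id, body in split_texts(sample, "CURRENT URL"):
        print(open_text_tag(text_id))
        print("\n".join(body))
        print("</text>")
```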

The script is controlled by various parameter files that are bundled
at the bottom of the script in the __DATA__ section.

Copyright 2005, Marco Baroni and Sara Piccioni

This program is free software. You may copy or redistribute it under
the same terms as Perl itself.

**********

For corpora built with our method, the delimiter string will be
CURRENT URL. For example:

tagwrapper.pl -l it -d "CURRENT URL" corpus.txt > corpus.tgd
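
Once a file such as corpus.tgd exists, it can be read back one
sentence at a time. The Python sketch below assumes the output follows
the layout shown in the documentation above (one token per line, three
tab-delimited fields, <s> and <text id="..."> tags on their own
lines); the function name read_sentences is invented for the example.

```python
import re

# Matches an opening tag such as <text id="http://example.org/a">.
TEXT_OPEN = re.compile(r'<text id="([^"]*)">')


def read_sentences(lines):
    """Yield (text_id, tokens) for every <s> element.

    tokens is a list of (word, pos, lemma) tuples; text_id is None
    when the sentence is not inside a <text> element.
    """
    text_id = None
    tokens = None
    for line in lines:
        line = line.rstrip("\n")
        match = TEXT_OPEN.match(line)
        if match:
            text_id = match.group(1)
        elif line == "</text>":
            text_id = None
        elif line == "<s>":
            tokens = []
        elif line == "</s>":
            if tokens is not None:
                yield text_id, tokens
            tokens = None
        elif tokens is not None and line.count("\t") == 2:
            tokens.append(tuple(line.split("\t")))


if __name__ == "__main__":
    sample = [
        "<corpus>",
        '<text id="http://example.org/a">',
        "<s>",
        "La\tART\til",
        "lepre\tNOM\tlepre",
        "</s>",
        "</text>",
        "</corpus>",
    ]
    for text_id, tokens in read_sentences(sample):
        print(text_id, [lemma for _, _, lemma in tokens])
```

To run it on a real file, pass an open file handle to read_sentences
in place of the sample list; the file's character encoding has to be
chosen by the user, since the documentation above does not state it.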

Taggers used via tagwrapper.pl:

TreeTagger:
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html

FreeLing:
http://garraf.epsevg.upc.es/freeling/

Tagsets:

Italian, German, English, French: see the TreeTagger page

Spanish:
http://sslmit.unibo.it/~baroni/termsett/05_1/spanishtags.txt