EvaLatin 2020: data

Sprugnoli, Rachele

Please use the following text to cite this item:

Sprugnoli, Rachele; Pellegrini, Matteo; Cecchini, Flavio Massimiliano and Passarotti, Marco, 2020, EvaLatin 2020: data, CLARIN DSpace, http://hdl.handle.net/20.500.11752/OPEN-526

Authors

Sprugnoli, Rachele; Pellegrini, Matteo; Cecchini, Flavio Massimiliano and Passarotti, Marco

Item identifier

http://hdl.handle.net/20.500.11752/OPEN-526

Project URL

https://github.com/CIRCSE/LT4HALA/tree/master/data_and_doc

Demo URL

https://github.com/CIRCSE/LT4HALA/blob/master/data_and_doc/gold_EvaLatin/Horatius-Carmina_GOLD.conllu

Referenced by

https://www.aclweb.org/anthology/2020.lt4hala-1.16.pdf

Date issued

2020

Type

corpus, text

Size

341,419 tokens, 16 files

Language(s)

Latin

Description

Training and gold test data released for EvaLatin 2020, the evaluation campaign of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e. Lemmatization and Part-of-Speech tagging, were aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts taken from the Perseus Digital Library, processed with UDPipe models and then manually corrected by Latin experts. The training set includes only prose texts by Classical authors. The test set includes prose texts by the same authors represented in the training set, together with poetry and texts from the Medieval period.
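The gold file linked under Demo URL has a .conllu extension, so the data can presumably be read as CoNLL-U. The following is a minimal Python sketch for extracting the word form, lemma and part-of-speech tag of each token, assuming the standard ten-column, tab-separated CoNLL-U layout (ID, FORM, LEMMA, UPOS, ...). The function name read_conllu and the sample sentence are hypothetical and are not taken from the dataset.

```python
from typing import Iterator, List, Tuple

# (form, lemma, upos) for a single token
Token = Tuple[str, str, str]


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of (form, lemma, upos) tuples per sentence.

    Assumes the standard CoNLL-U layout: tab-separated columns,
    '#' comment lines, and a blank line between sentences.
    """
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level metadata
        cols = line.split("\t")
        if len(cols) < 4:
            continue  # not a token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        sentence.append((cols[1], cols[2], cols[3]))
    if sentence:
        yield sentence


# Illustrative sample only; NOT copied from the EvaLatin files.
SAMPLE = "\n".join([
    "# text = puella rosam amat",
    "\t".join(["1", "puella", "puella", "NOUN", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["2", "rosam", "rosa", "NOUN", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["3", "amat", "amo", "VERB", "_", "_", "_", "_", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE):
        for form, lemma, upos in sent:
            print(form, lemma, upos, sep="\t")
```

To use it on a downloaded file, read the file as UTF-8 text and pass its content to read_conllu. Columns beyond UPOS are ignored here, since the two shared tasks concern only lemmas and part-of-speech tags.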

Publisher

CIRCSE Research Centre, Università Cattolica del Sacro Cuore

Acknowledgement