EvaLatin 2020: data

Please use the following text to cite this item:
Sprugnoli, Rachele; Pellegrini, Matteo; Cecchini, Flavio Massimiliano and Passarotti, Marco, 2020, EvaLatin 2020: data, CLARIN DSpace, http://hdl.handle.net/20.500.11752/OPEN-526
Date issued
2020
Size
341,419 tokens, 16 files
Language(s)
Latin
Description
Training and gold-standard test data released for EvaLatin 2020, the evaluation campaign for NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e., Lemmatization and Part-of-Speech tagging, aimed to foster research in language technologies for Classical languages. The shared dataset consists of texts taken from the Perseus Digital Library, processed with UDPipe models and then manually corrected by Latin experts. The training set includes only prose texts by Classical authors. In addition to prose texts by the same authors represented in the training set, the test set also includes poetry and texts from the Medieval period.
Files in this item
Name
EvaLatin-dataset.zip
Size
2.92 MB
Format
application/zip
Description
Zip
MD5
d6b806a96bd69e2ad35bfa90174ddd33
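The MD5 checksum above can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal Python sketch, assuming the archive has been saved locally; the helper names `md5_of_file` and `verify` are hypothetical and not part of any tool distributed with the dataset.

```python
import hashlib

# Checksum listed on this page for EvaLatin-dataset.zip
EXPECTED_MD5 = "d6b806a96bd69e2ad35bfa90174ddd33"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare the file's MD5 digest with the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("EvaLatin-dataset.zip")` should return `True` for an intact download saved under that (assumed) filename; a `False` result suggests the file was corrupted or truncated and should be downloaded again.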