EvaLatin 2020: data
Please use the following text to cite this item:
Sprugnoli, Rachele; Pellegrini, Matteo; Cecchini, Flavio Massimiliano and Passarotti, Marco, 2020, EvaLatin 2020: data, CLARIN DSpace, http://hdl.handle.net/20.500.11752/OPEN-526
Authors
Item identifier
Referenced by
Date issued
2020
Size
341,419 tokens, 16 files
Language(s)
Description
Training and gold test data released in EvaLatin 2020, the evaluation campaign of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e. Lemmatization and Part-of-Speech tagging, were aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts taken from the Perseus Digital Library, processed with UDPipe models and then manually corrected by Latin experts. The training set includes only prose texts by Classical authors. The test set includes prose texts by the same authors represented in the training set, as well as poetry and texts from the Medieval period.
Acknowledgement
European Union
Project code: EC/H2020/769994
Project name: LiLa - Linking Latin. Building a Knowledge Base of Linguistic Resources for Latin
Subject(s)
Collections
This item is Publicly Available
and licensed under:
Files in this item
- Name: EvaLatin-dataset.zip
- Size: 2.92 MB
- Format: application/zip
- Description: Zip
- MD5: d6b806a96bd69e2ad35bfa90174ddd33
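
The MD5 value listed above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch, not part of the original record: it assumes the archive was saved as EvaLatin-dataset.zip in the current directory, and the helper names md5_of_file and archive_is_intact are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 checksum as published in the file listing of this item.
EXPECTED_MD5 = "d6b806a96bd69e2ad35bfa90174ddd33"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 digest with the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local download location; adjust to wherever the file was saved.
    archive = Path("EvaLatin-dataset.zip")
    if not archive.exists():
        print(f"{archive} not found")
    elif archive_is_intact(archive):
        print("MD5 matches the published checksum")
    else:
        print("MD5 mismatch: the download may be corrupted or incomplete")
```

A mismatch usually means the download was truncated or altered, in which case fetching the archive again is the simplest remedy.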
