Please use the following text to cite this item:
Sprugnoli, Rachele; Pellegrini, Matteo; Cecchini, Flavio Massimiliano and Passarotti, Marco, 2020, EvaLatin 2020: data, CLARIN DSpace, http://hdl.handle.net/20.500.11752/OPEN-526
dc.contributor.author | Sprugnoli, Rachele |
dc.contributor.author | Pellegrini, Matteo |
dc.contributor.author | Cecchini, Flavio Massimiliano |
dc.contributor.author | Passarotti, Marco |
dc.date.accessioned | 2021-03-09T10:26:37Z |
dc.date.available | 2021-03-09T10:26:37Z |
dc.date.issued | 2020 |
dc.description | Training and gold test data released in EvaLatin 2020, the evaluation campaign of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e., Lemmatization and Part-of-Speech tagging, aimed to foster research on language technologies for Classical languages. The shared dataset consists of texts taken from the Perseus Digital Library, processed with UDPipe models and then manually corrected by Latin experts. The training set includes only prose texts by Classical authors. Besides prose texts by the same authors represented in the training set, the test set also includes poetry and texts from the Medieval period. |
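Lemmatization and PoS tagging are typically evaluated token by token against a gold annotation. The Python sketch below computes per-token accuracy for lemmas and UPOS tags. It is an illustration, not the official EvaLatin scorer: it assumes gold and system output share the same tokenization, and the function name `accuracy` is hypothetical.

```python
from typing import List, Tuple

# One annotated token: (lemma, upos). Gold and predicted sequences are assumed
# to be aligned one-to-one, i.e. produced from the same tokenization.
Token = Tuple[str, str]


def accuracy(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (lemma accuracy, UPOS accuracy) as fractions in [0, 1]."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold:
        raise ValueError("nothing to score")
    # Count tokens whose lemma (index 0) or UPOS tag (index 1) match exactly.
    lemma_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    upos_hits = sum(g[1] == p[1] for g, p in zip(gold, pred))
    n = len(gold)
    return lemma_hits / n, upos_hits / n
```

With toy annotations (illustrative only, not taken from the EvaLatin gold data), gold `[("Gallia", "PROPN"), ("sum", "AUX"), ("omnis", "DET"), ("divido", "VERB")]` against predictions `[("Gallia", "PROPN"), ("sum", "VERB"), ("omnis", "ADJ"), ("divisus", "VERB")]` gives `(0.75, 0.5)`: three of four lemmas and two of four tags match.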
dc.identifier.uri | http://hdl.handle.net/20.500.11752/OPEN-526 |
dc.language.iso | lat |
dc.publisher | CIRCSE Research Centre, Università Cattolica del Sacro Cuore |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/769994 |
dc.relation.isreferencedby | https://www.aclweb.org/anthology/2020.lt4hala-1.16.pdf |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.label | PUB |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.source.uri | https://github.com/CIRCSE/LT4HALA/tree/master/data_and_doc |
dc.subject | Latin |
dc.subject | POS tagging |
dc.subject | Lemmatization |
dc.title | EvaLatin 2020: data |
dc.type | corpus |
local.branding | OPEN |
local.contact.person | Rachele Sprugnoli rachele.sprugnoli@unicatt.it Università Cattolica del Sacro Cuore |
local.demo.uri | https://github.com/CIRCSE/LT4HALA/blob/master/data_and_doc/gold_EvaLatin/Horatius-Carmina_GOLD.conllu |
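The demo file linked above is in CoNLL-U format. A minimal reader for the form, lemma and UPOS columns could look like the sketch below. It assumes the EvaLatin files follow the standard 10-column CoNLL-U layout (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) and raises an error on lines that do not; the function name `read_conllu` is hypothetical.

```python
from typing import Iterable, List, Tuple

# (FORM, LEMMA, UPOS) for one syntactic word.
Word = Tuple[str, str, str]


def read_conllu(lines: Iterable[str]) -> List[List[Word]]:
    """Group CoNLL-U lines into sentences of (form, lemma, upos) triples.

    Comment lines (starting with '#') are skipped, as are multiword-token
    ranges (IDs like '2-3') and empty nodes (IDs like '5.1').
    """
    sentences: List[List[Word]] = []
    current: List[Word] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}: {line!r}")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue
        current.append((cols[1], cols[2], cols[3]))
    if current:
        # The input may not end with a blank line.
        sentences.append(current)
    return sentences
```

A file can be passed directly, e.g. `with open("Horatius-Carmina_GOLD.conllu", encoding="utf-8") as f: sentences = read_conllu(f)`, and `sum(len(s) for s in sentences)` then counts the syntactic words read.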
local.files.count | 1 |
local.files.size | 0 |
local.has.files | yes |
local.language.name | Latin |
local.size.info | 341,419 tokens |
local.size.info | 16 files |
local.sponsor | euFunds EC/H2020/769994 European Union LiLa - Linking Latin. Building a Knowledge Base of Linguistic Resources for Latin info:eu-repo/grantAgreement/EC/H2020/769994 |
metashare.ResourceInfo#ContentInfo.mediaType | text |
This item is publicly available and licensed under Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Files in this item
Name | Size | Format | Description | MD5 |
EvaLatin-dataset.zip | 2.92 MB | application/zip | Zip | d6b806a96bd69e2ad35bfa90174ddd33 |
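The MD5 checksum listed for the archive can be used to check that a download arrived intact. A minimal sketch with Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_expected` are hypothetical, and the archive is assumed to be saved locally under its listed name.

```python
import hashlib

# Checksum published in the file listing of this item.
EXPECTED_MD5 = "d6b806a96bd69e2ad35bfa90174ddd33"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("EvaLatin-dataset.zip")` should return `True` for an uncorrupted copy of the archive. Reading in chunks keeps memory use constant regardless of file size.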
