Please use the following text to cite this item or export to a predefined format:
Sprugnoli, Rachele; Pellegrini, Matteo; Cecchini, Flavio Massimiliano and Passarotti, Marco, 2020, EvaLatin 2020: data, CLARIN DSpace, http://hdl.handle.net/20.500.11752/OPEN-526
dc.contributor.author: Sprugnoli, Rachele
dc.contributor.author: Pellegrini, Matteo
dc.contributor.author: Cecchini, Flavio Massimiliano
dc.contributor.author: Passarotti, Marco
dc.date.accessioned: 2021-03-09T10:26:37Z
dc.date.available: 2021-03-09T10:26:37Z
dc.date.issued: 2020
dc.description: Training and gold test data released in EvaLatin 2020, the evaluation campaign of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e. Lemmatization and Part-of-Speech tagging, were aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts taken from the Perseus Digital Library, processed with UDPipe models and then manually corrected by Latin experts. The training set includes only prose texts by Classical authors. The test set includes prose texts by the same authors represented in the training set, as well as poetry and texts from the Medieval period.
dc.identifier.uri: http://hdl.handle.net/20.500.11752/OPEN-526
dc.language.iso: lat
dc.publisher: CIRCSE Research Centre, Università Cattolica del Sacro Cuore
dc.relation: info:eu-repo/grantAgreement/EC/H2020/769994
dc.relation.isreferencedby: https://www.aclweb.org/anthology/2020.lt4hala-1.16.pdf
dc.rights: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.label: PUB
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source.uri: https://github.com/CIRCSE/LT4HALA/tree/master/data_and_doc
dc.subject: Latin
dc.subject: POS tagging
dc.subject: Lemmatization
dc.title: EvaLatin 2020: data
dc.type: corpus
local.branding: OPEN
local.contact.person: Rachele Sprugnoli rachele.sprugnoli@unicatt.it Università Cattolica del Sacro Cuore
local.demo.uri: https://github.com/CIRCSE/LT4HALA/blob/master/data_and_doc/gold_EvaLatin/Horatius-Carmina_GOLD.conllu
local.files.count: 1
local.files.size: 0
local.has.files: yes
local.language.name: Latin
local.size.info: 341,419 tokens
local.size.info: 16 files
local.sponsor: euFunds EC/H2020/769994 European Union LiLa - Linking Latin. Building a Knowledge Base of Linguistic Resources for Latin info:eu-repo/grantAgreement/EC/H2020/769994
metashare.ResourceInfo#ContentInfo.mediaType: text
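
The demo file listed in this record has a .conllu extension, which suggests the data follows the CoNLL-U layout (tab-separated columns, with FORM, LEMMA and UPOS in columns 2 to 4). Assuming that holds for all files, a minimal Python sketch for pulling out the lemma and part-of-speech annotations might look like the following. The function name `read_conllu` and the sample sentence are illustrative assumptions and are not taken from the dataset.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str, str]  # (form, lemma, upos)


def read_conllu(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of (form, lemma, upos) tuples per sentence.

    Assumes standard CoNLL-U: '#' comment lines, blank lines between
    sentences, tab-separated columns ID, FORM, LEMMA, UPOS, ...
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment or metadata
        cols = line.split("\t")
        # Skip malformed rows, multiword ranges (e.g. "1-2")
        # and empty nodes (e.g. "1.1").
        if len(cols) < 4 or not cols[0].isdigit():
            continue
        sentence.append((cols[1], cols[2], cols[3]))
    if sentence:
        yield sentence


# Invented sample for illustration only; not copied from the corpus.
SAMPLE = (
    "# sent_id = example-1\n"
    "1\tGallia\tGallia\tPROPN\t_\t_\t_\t_\t_\t_\n"
    "2\test\tsum\tAUX\t_\t_\t_\t_\t_\t_\n"
    "3\tdivisa\tdivido\tVERB\t_\t_\t_\t_\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
```

To read a real file, an open file handle can be passed in place of `SAMPLE.splitlines()`, since the function accepts any iterable of lines.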

Files in this item
Name: EvaLatin-dataset.zip
Size: 2.92 MB
Format: application/zip
Description: Zip
MD5: d6b806a96bd69e2ad35bfa90174ddd33
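
The MD5 value listed for the archive can be used to check that a download is intact. The following is a minimal Python sketch using the standard `hashlib` module. The local path `EvaLatin-dataset.zip` and the helper names `md5_of_file` and `matches_record` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# MD5 digest as listed in this record for EvaLatin-dataset.zip.
EXPECTED_MD5 = "d6b806a96bd69e2ad35bfa90174ddd33"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("EvaLatin-dataset.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if matches_record(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is suitable here only as an integrity check against accidental corruption, not as a guarantee against deliberate tampering.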