Please use the following text to cite this item:
Favaro, Manuel; Biffi, Marco and Montemagni, Simonetta, 2022, TrAVaSI_VoDIM Corpus, CLARIN DSpace, http://hdl.handle.net/20.500.11752/ILC-985
dc.contributor.author: Favaro, Manuel
dc.contributor.author: Biffi, Marco
dc.contributor.author: Montemagni, Simonetta
dc.date.accessioned: 2023-01-09T08:44:35Z
dc.date.available: 2023-01-09T08:44:35Z
dc.date.issued: 2022
dc.description: The TrAVaSI_VoDIM Corpus is a sample of the corpus built for the Vocabolario Dinamico Dell’Italiano Moderno (VoDIM, Marazzini and Maconi, 2018), which gathers Italian texts from 1861, the year of the Unification of Italy, to the present day. TrAVaSI_VoDIM is balanced and representative of different prose domains (art, gastronomy, law, newspapers, literature, popular fiction, science), for a total of about 21,000 tokens. TrAVaSI_VoDIM is morpho-syntactically annotated and lemmatized. The annotation, which conforms to the Universal Dependencies standard (UD, De Marneffe et al. 2021), was carried out semi-automatically: TrAVaSI_VoDIM was first annotated automatically with the Stanza “combined” model for Italian, and the automatic annotation was then manually revised. The resulting corpus has also been used to retrain Stanza on historical varieties of Italian, with encouraging results.
dc.identifier.uri: http://hdl.handle.net/20.500.11752/ILC-985
dc.language.iso: ita
dc.publisher: Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR)
dc.publisher: Accademia della Crusca
dc.relation.isreferencedby: http://ceur-ws.org/Vol-2769/paper_86.pdf
dc.rights: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.label: PUB
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source.uri: http://www.ilc.cnr.it/it/content/travasi
dc.subject: historical annotated corpora
dc.subject: linguistic annotation
dc.subject: Universal Dependencies
dc.title: TrAVaSI_VoDIM Corpus
dc.type: corpus
local.branding: ILC
local.contact.person: Simonetta Montemagni simonetta.montemagni@ilc.cnr.it Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR)
local.files.count: 1
local.files.size: 236994
local.has.files: yes
local.language.name: Italian
local.size.info: 21000 tokens
local.sponsor: Other 249795 Regione Toscana (POR FSE 2014-2020 - Asse A - Priorità A.2 – Obiettivo A.2.1 – Azione A.2.1.7) Trattamento Automatico di Varietà Storiche di Italiano (TrAVaSI)
metashare.ResourceInfo#ContentInfo.mediaType: text
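
The description above says the annotation conforms to Universal Dependencies, so the distributed file can plausibly be read as CoNLL-U, the usual UD exchange format. This record does not state the file format, so that is an assumption. Under that assumption, the following is a minimal sketch of a reader that yields sentences and counts universal part-of-speech tags. The names `read_conllu`, `upos_counts` and `SAMPLE` are hypothetical, and the two-word sample is illustrative only: it is not taken from the corpus.

```python
from collections import Counter

# Illustrative CoNLL-U fragment (NOT taken from the corpus): ten tab-separated
# columns per word line, "#" comment lines, blank line between sentences.
SAMPLE = """\
# text = Il corpus
1\tIl\til\tDET\tRD\t_\t2\tdet\t_\t_
2\tcorpus\tcorpus\tNOUN\tS\t_\t0\troot\t_\t_

"""

# Column names as defined by the CoNLL-U format.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(text):
    """Yield each sentence as a list of dicts, one dict per syntactic word."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1"),
        # keeping only the syntactic words that carry the annotation.
        if not cols[0].isdigit():
            continue
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence  # file did not end with a blank line


def upos_counts(text):
    """Count universal POS tags over all syntactic words in the text."""
    return Counter(tok["upos"] for sent in read_conllu(text) for tok in sent)
```

Because multiword-token lines are skipped, a count obtained this way refers to syntactic words and may differ from the token figure given in local.size.info, depending on how tokens were counted there.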