Please use the following text to cite this item:
Marinelli, Rita; et al., 2024, PAROLE reference corpus, CLARIN DSpace, http://hdl.handle.net/20.500.11752/ILC-1010
dc.contributor.author: Marinelli, Rita
dc.contributor.author: Biagini, Lisa
dc.contributor.author: Bindi, Remo
dc.contributor.author: Goggi, Sara
dc.contributor.author: Monachini, Monica
dc.contributor.author: Orsolini, Paola
dc.contributor.author: Picchi, Eugenio
dc.contributor.author: Rossi, Sergio
dc.contributor.author: Calzolari, Nicoletta
dc.contributor.author: Zampolli, Antonio
dc.date.accessioned: 2024-07-19T13:19:26Z
dc.date.available: 2024-07-19T13:19:26Z
dc.date.issued: 2024-03-20
dc.description: The PAROLE project (Preparatory Action for Linguistic Resources Organization for Language Engineering) produced a set of harmonized corpora and lexicons for a large number of European languages. Each corpus, made up of 20 million words, was built as a reference corpus for Human Language Technology applications: to provide full information about a large variety of text types in the language considered, to represent the use of contemporary language, and to become the first nucleus of an electronic text library. The texts are stored in a common format following the standards recommended in the CES (Corpus Encoding Standard), according to flexibility and multifunctionality criteria. The texts belong to a wide range of media and genres, selected in proportions intended to reflect their prominence in society, and are classified according to medium, genre, topic and time of production. For more information, see also: Goggi, Sara, Lisa Biagini, Remo Bindi, and Sergio Rossi. 1997. ‘Italian Corpus Documentation - LE-PAROLE WP2.11’, October. https://zenodo.org/records/8167985. Marinelli, Rita, Lisa Biagini, Remo Bindi, Sara Goggi, Monica Monachini, Paola Orsolini, Eugenio Picchi, Sergio Rossi, Nicoletta Calzolari, and A. Zampolli. 1996. ‘The Italian “Parole” Corpus : An Overview’. Linguistica Computazionale Computational Linguistics in Pisa-Special Issue I (XVI/XVII, 1996/1997): 401–21. https://doi.org/10.1400/18167. https://www.ilc.cnr.it/wp-content/uploads/2022/05/Z224.pdf The corpus is annotated at the textual level, with some Named Entity annotation. A portion of this corpus was annotated with morpho-syntactic information and is available here: Sara Goggi, Remo Bindi, Lisa Biagini and Sergio Rossi, 1997, Corpus Parole (3 milions words), ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa, http://hdl.handle.net/20.500.11752/ILC-1001.
dc.identifier.uri: http://hdl.handle.net/20.500.11752/ILC-1010
dc.language.iso: ita
dc.publisher: Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR)
dc.relation.isreferencedby: https://zenodo.org/records/8167985
dc.relation.isreferencedby: https://doi.org/10.1400/18167
dc.relation.isreferencedby: https://www.ilc.cnr.it/wp-content/uploads/2022/05/Z224.pdf
dc.rights: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
dc.rights.label: PUB
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Corpus
dc.subject: Reference corpus
dc.subject: PAROLE project
dc.subject: SGML
dc.subject: Databases
dc.title: PAROLE reference corpus
dc.type: corpus
local.branding: ILC
local.contact.person: Francesca Frontini francesca.frontini@ilc.cnr.it Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR)
local.demo.uri: http://dbtvm1.ilc.cnr.it/corpus/parole.htm
local.files.count: 1
local.files.size: 79848617
local.has.files: yes
local.language.name: Italian
local.size.info: 20948736 words
local.size.info: 30399 articles
metashare.ResourceInfo#ContentInfo.mediaType: text