Please use the following text to cite this item:
Perišić, Olja; Stanković, Ranka; Vitas, Duško; Krstev, Cvetana and Moderc, Saša, 2022, It-Sr-NER: CLARIN compatible NER and geoparsing web services for parallel texts: case study Italian and Serbian, CLARIN DSpace, http://hdl.handle.net/20.500.11752/OPEN-980
dc.contributor.author: Perišić, Olja
dc.contributor.author: Stanković, Ranka
dc.contributor.author: Vitas, Duško
dc.contributor.author: Krstev, Cvetana
dc.contributor.author: Moderc, Saša
dc.date.accessioned: 2022-09-22T12:55:44Z
dc.date.available: 2022-09-22T12:55:44Z
dc.date.issued: 2022-09-12
dc.description: It-Sr-NER-corp is an Italian/Serbian bilingual corpus of 10,000 aligned sentences. It was compiled within the It-Sr-project from samples of several Italian novels translated into Serbian and Serbian novels translated into Italian. It was built to support the development of a CLARIN-compatible NER web service for parallel texts, with Italian and Serbian as the case study. The 10,000 natural-language segments are split into 4 files: one file of 1,000 segments and three files of 3,000 segments each. The corpus comprises: 1) plain-text versions in Italian and Serbian, with one segment per line; 2) bilingual aligned segments in TMX (Translation Memory eXchange) format; 3) monolingual text and TMX files with automatically annotated named entities for six NER classes: demonyms (DEMO), works of art (WORK), person names (PERS), places (LOC), events (EVENT) and organizations (ORG). The It-Sr-NER annotation uses a Convolutional Neural Network architecture within the spaCy tool: WikiNER for Italian (Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, James R Curran) and SrpCNNER for Serbian (Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić, Branislava Šandrih Todorović).
dc.identifier.uri: http://hdl.handle.net/20.500.11752/OPEN-980
dc.language.iso: srp
dc.language.iso: ita
dc.publisher: Università degli studi di Torino
dc.rights: Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.label: PUB
dc.rights.uri: https://creativecommons.org/licenses/by/4.0
dc.source.uri: https://github.com/rankastankovic/It-Sr-NER/
dc.subject: NER
dc.subject: TXM
dc.subject: Named Entity Recognition
dc.subject: aligned corpus
dc.subject: Serbian
dc.subject: Italian
dc.title: It-Sr-NER: CLARIN compatible NER and geoparsing web services for parallel texts: case study Italian and Serbian
dc.type: corpus
local.branding: OPEN
local.contact.person: Olja Perišić olja.perisic@unito.it Università degli studi di Torino
local.demo.uri: https://github.com/rankastankovic/It-Sr-NER/tree/main/corpus
local.files.count: 1
local.files.size: 0
local.has.files: yes
local.language.name: Serbian
local.language.name: Italian
local.size.info: 10000 sentences
local.sponsor: Other CLARIN Bridging Gaps project CLARIN ERIC CLARIN Bridging Gaps
metashare.ResourceInfo#ContentInfo.mediaType: text
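The TMX files pair each Italian segment with its Serbian counterpart. The following minimal sketch reads such pairs using only the Python standard library. It assumes the files follow TMX 1.4, with one `<tu>` per aligned segment and one `<tuv xml:lang="...">` per language, and that the language codes start with `it` and `sr`. The function name `read_tmx_pairs` and the sample content are hypothetical. Check the actual language codes and any inline annotation markup against the files in the archive.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang="it", tgt_lang="sr"):
    """Return (source_text, target_text) pairs from a TMX file.

    `source` may be a path or a binary file-like object. Languages are matched
    on their primary subtag, so "sr-Latn" or "IT" also count as "sr" / "it".
    """
    root = ET.parse(source).getroot()
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                primary = lang.split("-")[0].lower()
                # itertext() flattens any inline markup inside <seg> to plain text.
                segs[primary] = "".join(seg.itertext()).strip()
        # Keep only translation units that contain both languages.
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


# Hypothetical two-unit TMX; the second unit lacks Serbian and is skipped.
SAMPLE_TMX = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="it" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="it"><seg>Buongiorno, Marco.</seg></tuv>
      <tuv xml:lang="sr-Latn"><seg>Dobar dan, Marko.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="it"><seg>Solo italiano.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for it_text, sr_text in read_tmx_pairs(io.BytesIO(SAMPLE_TMX)):
        print(it_text, "|", sr_text)
```

The sketch matches on the primary language subtag rather than the full code. This keeps it working whether the Serbian side is tagged `sr`, `sr-Latn` or `sr-Cyrl`.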
This item is Publicly Available
and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Files in this item
Name: It-Sr-NER-corp.zip
Size: 6.95 MB
Format: application/zip
Description: Zip
MD5: f4086b834ae5a57a51111d35b750b248
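The listed MD5 value can be used to check that a downloaded copy of the archive is intact. Below is a minimal sketch using Python's `hashlib`. The helper names and the default local file name `It-Sr-NER-corp.zip` are assumptions, not part of the repository. MD5 catches accidental corruption but does not protect against deliberate tampering.

```python
import hashlib
import sys

# MD5 listed above for It-Sr-NER-corp.zip.
EXPECTED_MD5 = "f4086b834ae5a57a51111d35b750b248"


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so it is never fully loaded into memory."""
    with open(path, "rb") as fh:
        return md5_of_chunks(iter(lambda: fh.read(chunk_size), b""))


if __name__ == "__main__":
    # Hypothetical local path: wherever the archive was saved after download.
    path = sys.argv[1] if len(sys.argv) > 1 else "It-Sr-NER-corp.zip"
    actual = md5_of_file(path)
    print("checksum OK" if actual == EXPECTED_MD5 else f"checksum MISMATCH: {actual}")
```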