Italian Sense Inventory
Please use the following text to cite this item:
Poli, Francesca, 2021, Italian Sense Inventory, CLARIN DSpace, http://hdl.handle.net/20.500.11752/OPEN-557
Authors: Poli, Francesca
Item identifier: http://hdl.handle.net/20.500.11752/OPEN-557
Date issued: 2021-04-29
Size: 12,944 entries
Language(s): Italian
Description
The present Sense Inventory is an Italian language resource automatically derived from two Italian computational lexicons: ItalWordNet (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-62) and PAROLE-SIMPLE-CLIPS (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-88). It was built in collaboration with the CNR Institute of Computational Linguistics (CNR-ILC) as an experiment within the ELEXIS project (https://elex.is/), with the aim of producing a concise, structured inventory of senses for the sense annotation of the ELEXIS WSD test corpus. The Sense Inventory is therefore based on the lemmas occurring in the ELEXIS test corpus and on the merged sense information derived from the two existing lexicons.
The Python program developed to build the Sense Inventory automatically takes the ELEXIS dataset as input, extracts the lemmas from its sentences, and looks up all related senses in the two lexicons mentioned above. It also uses 'iwnmapdb', a database mapping senses between the two lexicons, which is available on request from CNR-ILC. The extracted and checked data are then arranged in a formal structure in which the following details are given for each lemma-PoS pair:
- Unmapped senses extracted from PAROLE-SIMPLE-CLIPS (PSC),
- Mapped senses extracted from the mapping database 'iwnmapdb',
- Unmapped senses extracted from ItalWordNet (IWN).
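The merging step described above can be sketched as follows. This is a minimal illustration, not the original program: the ELEXIS sentences are assumed to be lists of (form, lemma, pos) tuples, the two lexicons and 'iwnmapdb' are replaced by plain in-memory dictionaries, and each PSC sense is assumed to map to at most one IWN sense. The function names collect_lemmas and merge_senses, and every identifier in the usage example, are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# A (lemma, pos) key, e.g. ("casa", "NOUN").
LemmaPos = Tuple[str, str]

# Parts of speech kept in the inventory (ADV, ADJ, NOUN, VERB).
CONTENT_POS = {"ADV", "ADJ", "NOUN", "VERB"}


def collect_lemmas(sentences: List[List[Tuple[str, str, str]]]) -> Set[LemmaPos]:
    """Collect distinct (lemma, pos) pairs from tokenised sentences.

    Assumption: each token is a (form, lemma, pos) tuple; the real
    ELEXIS file format may differ.
    """
    return {
        (lemma, pos)
        for sentence in sentences
        for _form, lemma, pos in sentence
        if pos in CONTENT_POS
    }


def merge_senses(
    key: LemmaPos,
    psc: Dict[LemmaPos, List[str]],   # hypothetical stand-in for PAROLE-SIMPLE-CLIPS
    iwn: Dict[LemmaPos, List[str]],   # hypothetical stand-in for ItalWordNet
    mapping: Dict[str, str],          # hypothetical stand-in for 'iwnmapdb' (PSC id -> IWN id)
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (psc_id, iwn_id) pairs: unmapped PSC, mapped, then unmapped IWN."""
    psc_ids = psc.get(key, [])
    iwn_ids = iwn.get(key, [])
    # A pair counts as mapped only if the mapped IWN sense belongs to the same lemma-PoS.
    mapped = [(p, mapping[p]) for p in psc_ids if mapping.get(p) in iwn_ids]
    mapped_psc = {p for p, _ in mapped}
    mapped_iwn = {i for _, i in mapped}
    only_psc = [(p, None) for p in psc_ids if p not in mapped_psc]
    only_iwn = [(None, i) for i in iwn_ids if i not in mapped_iwn]
    return only_psc + mapped + only_iwn
```

For example, with psc = {("casa", "NOUN"): ["USem1", "USem2"]}, iwn = {("casa", "NOUN"): ["iwn_10", "iwn_11"]} and mapping = {"USem2": "iwn_10"}, merge_senses(("casa", "NOUN"), psc, iwn, mapping) returns [("USem1", None), ("USem2", "iwn_10"), (None, "iwn_11")]: one unmapped PSC sense, one mapped pair and one unmapped IWN sense, in the order of the list above.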
All fields with no value are filled with None.
The tab-separated file thus has the following columns:
- LEMMA
- POS
- CONCATENATED DEFINITION PSC-IWN
- USEMID PSC
- DEFINITION PSC
- EXAMPLE PSC
- SEMANTIC TYPE PSC
- SYNSETID IWN
- SENSEID IWN
- DEFINITION IWN
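A minimal reader for this format might look like the sketch below. It assumes UTF-8 text, one entry per line, exactly the ten columns listed above in that order, and the literal string None for empty fields. The snake_case keys in COLUMNS are hypothetical labels of our own, and skipping a row whose first field is LEMMA is only a precaution in case the file has a header line.

```python
import csv
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical short keys for the ten columns, in file order.
COLUMNS = [
    "lemma", "pos", "concatenated_definition_psc_iwn", "usem_id_psc",
    "definition_psc", "example_psc", "semantic_type_psc",
    "synset_id_iwn", "sense_id_iwn", "definition_iwn",
]


def read_inventory(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per entry, turning the literal 'None' into Python None."""
    # QUOTE_NONE keeps quotation marks inside definitions and examples verbatim.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Skip blank lines and a possible header row.
        if not fields or fields[0] == "LEMMA":
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}: {fields!r}")
        yield {name: (None if value == "None" else value) for name, value in zip(COLUMNS, fields)}
```

To read the released file, one could open corrected_si.txt with encoding="utf-8" and newline="" and pass the file object to read_inventory. The strict field-count check is a design choice: it surfaces malformed lines early instead of silently shifting values into the wrong columns.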
The Sense Inventory contains 3,860 lemmas with an ADV, ADJ, NOUN or VERB part of speech. It reports 12,944 senses and mappings, out of a total of 15,672 senses extracted from PAROLE-SIMPLE-CLIPS and ItalWordNet; 3,461 mappings were extracted from the mapping database 'iwnmapdb' and included in the Sense Inventory as relevant senses.
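Figures of this kind could be re-derived from the released file with a small counting routine. The following is a sketch under assumptions: rows are dicts keyed by the hypothetical names lemma, pos, usem_id_psc and sense_id_iwn (with None for empty fields), a row counts as a mapping when both the PSC and the IWN identifiers are filled in, and lemmas are counted as distinct lemma-PoS pairs. The published counts may have been computed differently, so this routine is not guaranteed to reproduce 3,860, 12,944 and 3,461 exactly.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def summarize(rows: Iterable[Dict[str, Optional[str]]]) -> Dict[str, int]:
    """Count entries, distinct lemma-PoS pairs and rows carrying both PSC and IWN ids."""
    entries = 0
    pairs: Set[Tuple[Optional[str], Optional[str]]] = set()
    mapped = 0
    for row in rows:
        entries += 1
        pairs.add((row["lemma"], row["pos"]))
        # Assumption: a mapped entry is one where both identifiers are present.
        if row["usem_id_psc"] is not None and row["sense_id_iwn"] is not None:
            mapped += 1
    return {"entries": entries, "lemma_pos_pairs": len(pairs), "mapped": mapped}
```

Counting lemma-PoS pairs rather than bare lemma strings is itself an assumption; if the reported 3,860 refers to distinct lemma strings, the set should hold row["lemma"] alone.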
This item is publicly available and licensed under:
Files in this item
- Name: Tesi triennale - Poli Francesca.pdf (bachelor's thesis)
- Size: 1.25 MB
- Format: application/pdf
- Description: Adobe PDF
- MD5: 900fc180961d16aec7e78c3b860d2046

- Name: corrected_si.txt
- Size: 2.32 MB
- Format: text/plain
- Description: Text
- MD5: 1754a9dae1356f38143adfa4b933a4b5
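The MD5 values listed for each file can be used to check that a download is intact. A minimal sketch using Python's standard hashlib module follows; the helper names md5_of_file and matches_checksum are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with a published hex value (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, matches_checksum("corrected_si.txt", "1754a9dae1356f38143adfa4b933a4b5") should return True for an intact copy of the file saved in the current directory. Reading in chunks keeps memory use constant regardless of file size.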

The file preview has not been generated yet. Please try again later or contact the system administrator test@test.sk