ILC4CLARIN Repository Home

Linguistic Data and NLP Tools

Citation Support (with Persistent IDs)

Deposit Free and Safe

License of your Choice (Open licenses encouraged)

Easy to Find

Easy to Cite

“There ought to be only one grand dépôt of art in the world, to which the artist might repair with his works, and on presenting them receive what he required...” Ludwig van Beethoven, 1801

What's New

corpus OPEN

NomadLingo1.0 open

Author(s):

Tedesco, Novella; Bernardini, Silvia; Cervini, Cristiana

Description:

The NomadLingo1.0 corpus contains transcripts of extracts from naturally occurring conversations, audio-recorded between November 2023 and April 2024 at social events organised and promoted by digital nomad communities based in Madeira and the Canary Islands. The transcribed recordings total 11 hours 38 minutes. For further information about the texts in the corpus see Section 4. The corpus aims to represent translingual interactions based on the fluid use of English as a lingua franca, other linguae francae such as Spanish, and strategies of transcultural communication such as intercomprehension and peer/self-translation.

This item contains 1 file (906.12 KB).

Publicly Available

corpus OPEN

Parlement of Foules, a digital diplomatic edition

Author(s):

Pshenichnova, Iulia

Description:

A digital edition of the Middle English poem “Parlement of Foules” by Geoffrey Chaucer, featuring a diplomatic transcription of the text found in MS Gg.4.27(1), Cambridge University Library. The edition is encoded in XML according to the TEI Guidelines and includes manuscript description metadata, the full transcription, and links to the electronic facsimile hosted on the Cambridge University Library website. The transcription preserves original spelling, punctuation, and scribal choices, with selective expansion of abbreviations.

This item contains 1 file (232.21 KB).

Publicly Available

lexicalConceptualResource ILC

DH ATLAS: Knowledge Graph

Author(s):

Giacomini, Sebastiano; et al.

Description:

A knowledge graph representing Italian Digital Cultural Heritage projects. The DH ATLAS Knowledge Graph is currently available as a set of Turtle files and gathers metadata on a list of examined research products and their related entities. This release includes Turtle (.ttl) serializations of the records created during the Datathon held as part of the ATLAS workshop on March 26, 2025.

This item contains 240 files (470.31 KB).

Publicly Available

Most Viewed Items - Last Month

corpus OPEN

NomadLingo1.0 open

Author(s):

Tedesco, Novella; Bernardini, Silvia; Cervini, Cristiana

This item contains 1 file (906.12 KB).

Publicly Available

text OPEN

De lapidibus / digital edition published by digilibLT digital library of late-Latin texts

Author(s):

Damigeron

Description:

Linguistic correction: Nadia Rosso. XML encoding: Nadia Rosso. Project home page: https://digiliblt.uniupo.it/ Documentation: https://digiliblt.uniupo.it/progetto.php

This item contains 1 file (77.52 KB).

Publicly Available

corpus OPEN

DemCorpus-Basilicata: Dementia Corpus

Author(s):

Martinelli, Elena; et al.

Description:

This corpus consists of semi-spontaneous speech data produced by elderly residents of the Basilicata region in Italy. In total, 40 individuals participated: the patient group consists of 20 participants with a diagnosis of dementia (9 with Alzheimer’s disease, 2 with mixed dementia, 5 with dementia not further specified, 3 with vascular dementia, and 1 with frontotemporal dementia), and the control group consists of 20 healthy individuals matched for age, gender, and geographical origin. Three linguistic tasks were administered to all participants: two narrative tasks (the first about an excursion or a trip, the second about Christmas festivities) and an image description task. This resulted in 8 hours and 50 minutes of recorded semi-spontaneous speech, which was then transcribed, segmented, and annotated using ELAN. This research project was approved by the Bioethics Committee of the Alma Mater Studiorum - University of Bologna (no. 0072032/2022). Due to Italian privacy policy, the raw data of the corpus (i.e., speech recordings, transcriptions, and clinical information of the participants) are not available. Processed data (i.e., tables of acoustic/rhythmic/lexical/syntactic values, with the speakers' names masked by an alphanumeric acronym to ensure anonymity) are available from the contact person upon reasonable request.

This item contains no files.