Linguistic Data and NLP Tools
Find
Citation Support (with Persistent IDs)
Deposit Free and Safe
License of your Choice (Open licenses encouraged)
Easy to Find
Easy to Cite
“There ought to be only one grand dépôt of art in the world, to which the artist might repair with his works, and on presenting them receive what he required...” Ludwig van Beethoven, 1801

corpus OPEN
Author(s):
Description:
The NomadLingo1.0 corpus contains transcripts of extracts from naturally occurring conversations that were audio-recorded between November 2023 and April 2024 at social events organised and promoted in digital nomad communities based in Madeira and the Canary Islands. The transcribed recordings total 11 hours 38 minutes. For further information about the texts in the corpus, see Section 4.
The corpus aims to represent translingual interactions based on the fluid use of English as a lingua franca, other linguae francae such as Spanish, and strategies of transcultural communication such as intercomprehension and peer/self-translation.
Publicly Available
corpus OPEN
Author(s):
Description:
A digital edition of the Middle English poem “Parlement of Foules” by Geoffrey Chaucer, featuring a diplomatic transcription of the text found in MS Gg.4.27(1), Cambridge University Library. The edition is encoded in XML format according to TEI Guidelines and includes manuscript description metadata, the full transcription, and links to the electronic facsimile hosted on the Cambridge University Library website. The transcription preserves original spelling, punctuation, and scribal choices, with selective expansion of abbreviations.
Publicly Available
lexicalConceptualResource ILC
Author(s):
Giacomini, Sebastiano ; et al.
Description:
A knowledge graph representative of Italian Digital Cultural Heritage projects.
The DH ATLAS Knowledge Graph is currently available as a set of Turtle (.ttl) files and gathers metadata on the research products examined and their related entities. This release includes the Turtle serializations of the records created during the Datathon held as part of the ATLAS workshop on March 26, 2025.
Publicly Available
Most Viewed Items - Last Month
lexicalConceptualResource ILC
Author(s):
Description:
The Italian Sentiment Lexicon (Lessico Italiano dei Sentimenti) was developed semi-automatically from ItalWordNet v.2, starting from a list of 1,000 manually checked seed keywords. It contains 24,293 lexical entries annotated for positive, negative, or neutral polarity. It is distributed in XML-LMF format.
Publicly Available