Pavese in words
Lemmatization
Lemmatization is a complex process that involves morphological and syntactic considerations, hermeneutic sensitivity, and a deep knowledge of the work being lemmatized. The procedure developed by the UniCT team is based on the method of lexicographical processing of literary texts originally devised by Savoca (Savoca 2000) and further implemented at the CINUM laboratory for the project www.pirandellonazionale.it.
The protocol unfolds in three main phases:
Text Verification
The initial phase involves meticulous philological control of the input text, often leading to significant innovations in the field of textual criticism (Savoca–Sichera 1997).
Tokenization and Lemmatization
Once encoded in UTF-8, the text is tokenized into individual occurrences. Each token is then lemmatized and assigned a Part of Speech (POS) tag. Thanks to a constantly updated lexical database and a reinforcement learning system, human intervention is minimal, usually limited to resolving ambiguities such as homographs (e.g., che as conjunction vs. che as pronoun). Lemma identification relies on controlled vocabularies and probabilistic context analysis, in line with state-of-the-art Natural Language Processing methodologies.
Concordance and Lexical Tools Generation
In the final phase, a KWIC (Key Word In Context) concordance is generated, along with other useful lexicographical tools:
frequency lists (alphabetical, by occurrence, by POS category)
lists of shared or unique lemmas across works
statistical tables (total number of lemmas and word forms, absolute and relative POS frequency, etc.)
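The second and third phases can be illustrated with a minimal Python sketch. It is not the UniCT team's software: the tiny LEXICON, the function names, and the sample sentence (invented for this illustration, not a quotation from Pavese) are all assumptions, and the lexicon entries are deliberately simplified. Instead of probabilistic context analysis, the sketch simply flags homographs such as che for human review, then builds a KWIC concordance and a lemma frequency list.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: word form -> candidate (lemma, POS) pairs.
# Entries are simplified for illustration; a real lexical database is far larger.
LEXICON = {
    "so": [("sapere", "VERB")],
    "che": [("che", "CONJ"), ("che", "PRON")],  # homograph: two candidates
    "la": [("il", "DET")],
    "luna": [("luna", "NOUN")],
    "sale": [("salire", "VERB")],
    "e": [("e", "CONJ")],
    "vedo": [("vedere", "VERB")],
}


def tokenize(text):
    """Split text into lowercase word tokens; accented letters are preserved."""
    return re.findall(r"[^\W\d_]+", text.lower())


def lemmatize(tokens):
    """Return (form, lemma, pos, needs_review) for each token."""
    rows = []
    for form in tokens:
        candidates = LEXICON.get(form, [])
        if len(candidates) == 1:
            lemma, pos = candidates[0]
            rows.append((form, lemma, pos, False))
        elif candidates:
            # Ambiguous form: keep the first candidate provisionally
            # and flag the token for a human editor.
            lemma, pos = candidates[0]
            rows.append((form, lemma, pos, True))
        else:
            # Unknown form: fall back to the form itself and flag it.
            rows.append((form, form, "UNK", True))
    return rows


def kwic(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def lemma_frequencies(rows):
    """Count occurrences per lemma."""
    return Counter(lemma for _, lemma, _, _ in rows)


# Invented sample sentence, used only to exercise the functions.
SAMPLE = "So che la luna sale e vedo la luna che sale."

tokens = tokenize(SAMPLE)
rows = lemmatize(tokens)
concordance = kwic(tokens, "luna")
freqs = lemma_frequencies(rows)

for left, key, right in concordance:
    print(f"{left:>10} [{key}] {right}")
for lemma, count in freqs.most_common():
    print(lemma, count)
```

Sorting the same counter alphabetically, or grouping rows by their POS tag, would give the other frequency lists mentioned above; comparing the lemma sets of two works with set intersection and difference would give the shared and unique lemma lists.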
Unlike traditional printed concordances, this natively digital project leverages modern data visualization techniques, allowing for significantly improved usability, broader access to lexical data, and enhanced query and output capabilities.
The current goal is to produce a comprehensive vocabulary of Pavese’s major works: Lavorare stanca, Paesi tuoi, La bella estate, Prima che il gallo canti, Dialoghi con Leucò, La luna e i falò, and Verrà la morte e avrà i tuoi occhi.