From Mexico City to medieval Italy
Born in Mexico City and raised in Basking Ridge, New Jersey, Avilés-García grew up bilingual in English and Spanish, and he fell in love with Italian during summers in Sicily.
So when he was looking for a subject to tackle with his AI language modeling skills, his French and Italian adviser Simone Marchesi steered him toward one of the greatest works in any language: Dante’s “Divine Comedy,” a three-volume journey from Hell to Paradise written between 1308 and 1321.
![Student standing between two female faculty members outside](/sites/default/files/styles/embedded_landscape/public/uploads/20240501_fernandoalvies-garcia_mr_00187_web_0.jpeg?itok=bvHhhly-)
Fernando Avilés-García (center), shown with Gaetana Marrone-Puglia, professor of French and Italian (at left), and Christiane Fellbaum, a lecturer with the rank of professor in computer science, linguistics and the Council of the Humanities. Photo by Matthew Raspanti
Just one problem: Dante wrote in an archaic form of a Tuscan dialect, so even modern Italian language models struggle with the text, and English-trained models fare much worse.
“Dante is the father of the Italian language, but his text is not standard Italian,” said Marchesi, a professor of French and Italian and a 2002 Ph.D. graduate of Princeton in comparative literature. It took months of effort, and collaborations with programmers from the University of Pisa, for Avilés-García to train his model to parse medieval Italian.
“Once you have that, you can run fun and intriguing and promising queries, as Fernando has been doing,” Marchesi said.
Shining a new light
Avilés-García began quantifying words that frequently appear together in the Comedy.
He struck gold when he ran queries on the noun “love” (amore). He guessed some words that would accompany it — Beatrice (Dante’s muse), heart, the verb love (amare), affection, sweet, beautiful, beauty, woman, wife, desire, flesh — then ran the model.
He was surprised that almost none of his guesses regularly appear within 15 words of amore, but many words related to light (shine, star, ray) and darkness (night) do. When he turned back to the text, that unexpected connection unlocked a new insight. “Dante describes Hell as a place devoid of stars,” he said. “Then I started seeing that Hell is defined by an absence of this much bigger thing: love.”
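For readers curious what a "within 15 words" query looks like in practice, here is a minimal sketch of windowed co-occurrence counting. It is not Avilés-García's model: his system was trained to parse medieval Italian, while this illustration only splits text on word characters and matches exact spellings, so it would treat "amor" and "amore" as different words. The function name `cooccurrences` and its parameters are hypothetical, chosen for illustration; the sample text is the final line of the Paradiso.

```python
import re
from collections import Counter


def cooccurrences(text, target, window=15):
    """Count tokens found within `window` tokens of each occurrence of `target`.

    Assumption: naive tokenization on word characters, no lemmatization,
    so only the exact spelling given in `target` is matched.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        # Look up to `window` tokens to the left and right of the match.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Final line of the Paradiso: "the love that moves the sun and the other stars".
sample = "l'amor che move il sole e l'altre stelle"
counts = cooccurrences(sample, "amor", window=15)
narrow = cooccurrences(sample, "amor", window=2)
print(counts.most_common())
```

With the 15-word window, "sole" and "stelle" both count as neighbors of "amor"; with a 2-word window they fall outside the range. Run over the whole poem, tallies like these are what would show whether words for light turn up near love more often than the expected ones.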
One of the strengths of interdisciplinary AI research at Princeton is the presence of deep expertise in many subject areas. In this case, Avilés-García turned to one of the world’s leading Dante experts, Marchesi, to ask whether the connection between stars and love and Hell was a trite observation that scholars have recognized for centuries, a radically new concept, or something in between.
“What he has found is real, I would say, and not self-evident,” Marchesi said. Most scholars, he added, have focused on the role of stars as navigational tools, and thus Hell as a disorienting place. “Fernando has proved that a larger conceptual constellation is at stake in their absence.”
Marchesi says he is intrigued by the promise of this new language model. “When you get trained for your job as an academic, you get trained to answer old questions,” he said. “The really exciting part is crossing paths with someone who can ask new questions.”
He looks forward to using this AI model and its future iterations in his own research. “Someone who is a Princetonian once is a Princetonian forever,” he said. “I can reach out to Fernando wherever he goes after Princeton and ask questions and get friendly answers. It’s beautiful.”