16–17 Dec 2021 ONLINE
Évora, Portugal
Europe/Lisbon timezone

Processing Spanish Golden Age theatre with Python: Data structures for versified plays

15m

Contributed Talk Using Python and Julia in Digital Humanities

Speaker

Fernando Sanz-Lázaro (Universität Wien)

Description

The project **** has developed algorithms and data structures in Python 3 to enable distant reading of Spanish Golden Age plays. We start from plain-text files that have previously been given a relatively simple structure: each line represents one entity, be it metadata, a character name, a speech, or a stage direction, and finer distinctions are marked with tab characters and a small set of tags. These texts are processed line by line to extract the information describing each speech or stage direction. The speech lines and their associated data are stored in a pandas DataFrame.
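
The abstract does not spell out the markup, so the following parser assumes an invented format purely for illustration: `@key<TAB>value` for metadata, an untabbed line naming the speaking character, one leading tab for a verse, two leading tabs for the second half of a verse shared with the previous speaker, and `%` for a stage direction. `parse_play` and every tag choice here are hypothetical. The sketch only shows the line-by-line approach: each line is classified by its prefix and turned into a record, and a running `verse_id` lets the halves of a shared verse be reunited later.

```python
def parse_play(lines):
    """Split a play in a hypothetical tab/tag format into metadata and row records."""
    metadata, records = {}, []
    character, verse_id = None, 0
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@"):  # metadata: @key<TAB>value
            key, _, value = line[1:].partition("\t")
            metadata[key.strip()] = value.strip()
        elif line.startswith("%"):  # stage direction
            records.append({"kind": "stage", "character": None,
                            "verse_id": None, "text": line[1:].strip()})
        elif line.startswith("\t\t"):  # completes the previous speaker's verse
            records.append({"kind": "speech", "character": character,
                            "verse_id": verse_id, "text": line.strip()})
        elif line.startswith("\t"):  # a new verse of the current speech
            verse_id += 1
            records.append({"kind": "speech", "character": character,
                            "verse_id": verse_id, "text": line.strip()})
        else:  # an untabbed line names the speaking character
            character = line.strip()
    return metadata, records


sample = [
    "@title\tToy play",
    "ROSAURA",
    "\tHipogrifo violento,",
    "%Stage direction text",
    "A",
    "\tuno dos",  # toy shared verse, first half
    "B",
    "\t\ttres cuatro",  # toy shared verse, second half
]
meta, rows = parse_play(sample)
print(meta, rows)
```

Passing the records to `pandas.DataFrame(rows)` would give one row per speech line or stage direction, with `verse_id` available for grouping shared verses.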

We have implemented the library libscansion, which provides the class scansion to process a single verse. Its constructor takes the verse itself as a string and a list of integers giving the expected numbers of metric syllables (NoS), sorted by probability. The rationale is that verses of the same length tend to occur in groups. The class stores the speech, the rhythmic pattern, the rhyme, and the assonance as string attributes; the NoS and the position of the rhyme stress as integers; the metric syllables as a list of strings; and the expected NoS as a list of integers.
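
To give a concrete picture of the attribute layout just described, here is a hypothetical mirror of it as a Python dataclass. The class and field names (`VerseScansion`, `rhythm`, `rhyme_stress`, and so on) and the `-`/`+` rhythm notation are illustrative assumptions, not libscansion's actual API. The helper `assonance_of` uses a simplified definition of assonance as the vowels of the rhyme (from the stressed vowel to the end); it keeps every vowel, whereas traditional metrics disregards weak vowels in diphthongs and the middle vowel of proparoxytones.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def assonance_of(rhyme):
    """Simplified assonance: all vowels of the rhyme, written accents removed."""
    plain = rhyme.lower().translate(str.maketrans("áéíóú", "aeiou"))
    return "".join(c for c in plain if c in "aeiou")


@dataclass
class VerseScansion:
    """Hypothetical mirror of the attributes described for the scansion class."""
    speech: str                   # the verse as given
    expected_nos: list[int]       # candidate NoS, most probable first
    rhythm: str = ""              # rhythmic pattern (assumed notation: + stressed, - unstressed)
    rhyme: str = ""               # letters from the rhyme stress to the end
    assonance: str = ""           # vowels of the rhyme
    nos: int = 0                  # number of metric syllables
    rhyme_stress: int = 0         # 1-based position of the rhyme stress
    syllables: list[str] = field(default_factory=list)  # metric syllables


v = VerseScansion(
    speech="Hipogrifo violento,",
    expected_nos=[7, 8, 11],
    rhythm="--+--+-",
    rhyme="ento",
    assonance=assonance_of("ento"),
    nos=7,
    rhyme_stress=6,
    syllables=["hi", "po", "gri", "fo", "vio", "len", "to"],
)
print(v)
```

In Spanish metrics the rhyme stress always falls on the penultimate metric syllable, which is why `rhyme_stress` equals `nos - 1` in this example.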

The scansion class includes methods to translate each plain word into a tuple and to represent the verse as a list of such tuples. Each tuple has two elements: a list of phonological syllables and a PoS (part-of-speech) tag. PoS-based rules and a dictionary determine whether each word carries metric stress, so that its tonic syllable can be marked. The list is then flattened into a list of syllables and re-evaluated against the expected NoS: syllables are separated or joined according to the poetic rules of metre adjustment so as to match the first element of the list of expected NoS. If this is impossible, the algorithm tries the following values in turn until one succeeds, and promotes the matching value to the first position of the list. Once a suitable syllabic distribution has been obtained, the attributes are assigned accordingly.
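
The procedure above can be illustrated with a deliberately simplified sketch; it is not libscansion's code. The PoS tags (`DET`, `ADP`, `CCONJ`, `SCONJ`), all function names, and the `_` marker for joined syllables are hypothetical. Words arrive already split into phonological syllables, only synaloepha (joining a vowel-final syllable with a vowel-initial word) is modelled, dieresis and other separations are not, and sites are merged left to right. It does apply the standard Spanish counting rule: the metric NoS is the number of syllables up to and including the last stressed one, plus one.

```python
VOWELS = set("aeiouáéíóúü")
# Hypothetical PoS tags treated as metrically unstressed (articles, prepositions, conjunctions).
UNSTRESSED_POS = {"DET", "ADP", "CCONJ", "SCONJ"}


def default_stress(syllables):
    """Tonic syllable index for words without a written accent:
    penultimate if the word ends in a vowel, n or s; otherwise the last one."""
    if len(syllables) == 1:
        return 0
    last = syllables[-1][-1].lower()
    return len(syllables) - 2 if last in VOWELS or last in "ns" else len(syllables) - 1


def flatten(words, stress_dict):
    """(syllables, pos) tuples -> flat list of [syllable, is_stressed, starts_word]."""
    flat = []
    for syllables, pos in words:
        stressed = None  # unstressed PoS: the word has no tonic syllable
        if pos not in UNSTRESSED_POS:
            key = "".join(syllables).lower()
            stressed = stress_dict.get(key, default_stress(syllables))  # dictionary wins
        for i, syl in enumerate(syllables):
            flat.append([syl, i == stressed, i == 0])
    return flat


def starts_with_vowel(syl):
    s = syl.lower()
    if s.startswith("h"):  # h is silent
        s = s[1:]
    return bool(s) and s[0] in VOWELS


def scan(words, expected, stress_dict=None):
    flat = flatten(words, stress_dict or {})
    last_stress = max(i for i, (_, stressed, _) in enumerate(flat) if stressed)
    # Synaloepha sites: word boundaries before the rhyme stress that join vowel to vowel.
    sites = [j for j in range(last_stress)
             if flat[j + 1][2]
             and flat[j][0][-1].lower() in VOWELS
             and starts_with_vowel(flat[j + 1][0])]
    base = last_stress + 2  # syllables up to the rhyme stress, plus one
    for rank, nos in enumerate(expected):
        merges = base - nos  # joins needed to reach this NoS
        if 0 <= merges <= len(sites):
            break
    else:
        raise ValueError("no expected NoS fits this verse")
    expected = [nos] + expected[:rank] + expected[rank + 1:]  # promote the match
    chosen = set(sites[:merges])  # simplification: merge the leftmost sites first
    metric = []
    for j, (syl, _, _) in enumerate(flat):
        if j - 1 in chosen:
            metric[-1] += "_" + syl  # joined by synaloepha
        else:
            metric.append(syl)
    return {"nos": nos, "syllables": metric,
            "stress_index": last_stress - merges, "expected": expected}


result = scan([(["la"], "DET"), (["vi", "da"], "NOUN"), (["es"], "AUX"),
               (["sue", "ño"], "NOUN")], [8, 5])
print(result["syllables"])  # ['la', 'vi', 'da_es', 'sue', 'ño']
```

Here 8 cannot be reached (the verse has only six syllables up to its last stress plus one), so the sketch falls back to 5, applies the single available synaloepha, and moves 5 to the front of the list.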

We iterate over the data frame, creating an object for each verse and passing as parameters the speech (or the joined speeches, in the case of verses shared between characters) and a sorted list of expected NoS, which is initialised with typical metres on the first iteration. The relevant attributes of each object are added to the data frame, and the updated list of expected NoS is passed on when the object is created for the next verse. The resulting structure is stored as a CSV file for use in distant-reading analyses.
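
The iteration can be sketched with the standard library alone. Here `scan_play` and `write_csv` are hypothetical names, the starting list `[8, 11, 7]` is an assumed set of typical metres, and `stand_in` is a crude placeholder that counts vowel groups rather than performing real scansion; in practice the scansion object would take its place. Rows sharing a `verse_id` (a shared verse) are consecutive, so `itertools.groupby` joins them, and the ranking returned for one verse is handed on to the next.

```python
import csv
import io
import re
from itertools import groupby

# Assumed starting ranking of typical metres (octosyllable, hendecasyllable, heptasyllable).
TYPICAL_METRES = [8, 11, 7]


def scan_play(rows, scanner, expected=None):
    """Scan speech rows in order; rows sharing a verse_id form one (shared) verse.

    scanner(text, expected) must return a dict with 'nos' and the re-ranked 'expected'.
    """
    expected = list(expected or TYPICAL_METRES)
    out = []
    for _verse_id, group in groupby(rows, key=lambda r: r["verse_id"]):
        parts = list(group)
        text = " ".join(p["text"] for p in parts)  # join the halves of a shared verse
        result = scanner(text, expected)
        expected = result["expected"]  # the updated ranking is used for the next verse
        for p in parts:
            out.append({**p, "verse": text, "nos": result["nos"]})
    return out, expected


def write_csv(records, fh):
    """Write the enriched rows as CSV, using the first record's keys as the header."""
    writer = csv.DictWriter(fh, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)


def stand_in(text, expected):
    """Crude placeholder, NOT real scansion: counts vowel groups as syllables."""
    nos = len(re.findall(r"[aeiouáéíóú]+", text.lower()))
    return {"nos": nos, "expected": [nos] + [n for n in expected if n != nos]}


rows = [
    {"verse_id": 1, "character": "ROSAURA", "text": "Hipogrifo violento,"},
    {"verse_id": 2, "character": "A", "text": "uno dos"},  # toy shared verse
    {"verse_id": 2, "character": "B", "text": "tres cuatro"},
]
out, final_expected = scan_play(rows, stand_in)
buf = io.StringIO()
write_csv(out, buf)
print(buf.getvalue())
```

Threading `expected` through the loop is what keeps the metre of a run of same-length verses at the head of the list, in line with the rationale given for sorting the expected NoS.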

Primary author

Fernando Sanz-Lázaro (Universität Wien)
