Analysing the Literature

We are taking two broad approaches to studying the literature. The first, using text mining techniques, aims to help understand the characteristics of the discourse around the fantasias, how it differs from that around comparable works, and how it varies with time and with the professional domain of the writer (for example, the difference between performers and analysts). The second, using slower, more conventional techniques, seeks to unpick some of the concepts and tools used in describing and analysing the music, with the aim of finding ways of expressing these within a computer music information retrieval environment and, ultimately, implementing them.

Text Mining

Thus far, only a small number of texts have been subjected to a fairly limited array of text mining tools (word counts and some collocation analysis). Already, aspects like national and comparison vocabularies seem unusually common but this must be validated against appropriate control data sets, as well as with larger samples, for more fine-grained conclusions to be drawn.
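
As an illustration only, the two techniques named above (word counts and a simple collocation count) can be sketched in a few lines of Python. This is a minimal sketch and not the tooling used in the project: the function names, the window size and the sample sentence are all assumptions made for the example, and the sentence is invented rather than taken from the corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def collocations(tokens, window=2):
    """Count unordered pairs of distinct words found within `window` tokens of each other."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


# Invented sample sentence, used only to exercise the functions.
sample = "The English fantasia is unlike the Italian fantasia; the English style is older."
tokens = tokenize(sample)
counts = word_counts(tokens)
pairs = collocations(tokens, window=1)  # window=1 counts adjacent words only

print(counts.most_common(3))
print(pairs.most_common(3))
```

Running the same counts over a control corpus and comparing relative frequencies would be the natural next step for the validation mentioned above.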

Music Information Retrieval (MIR) vocabulary

Even a small number of texts quickly reveals the diversity of ways in which musical information is processed and communicated by experts. Examining these texts can provide an interesting alternative or supplement to user surveys or machine learning techniques for designing interfaces and tools for Music Information Retrieval (henceforth MIR). We do not intend to reproduce or automatically parse such texts, but suggest that they transmit important knowledge, the structure and content of which is worth modelling.

The first, and most important, thing to note is that most discussion in musicological texts is not directly about the music: we have historical and bibliographical information, argument, value judgement and other vital components of the discipline. A key aspect, then, of any retrieval system must be to store and retrieve those aspects of the extra-musical information that are susceptible to machine encoding and analysis.

Of the musical elements of the discussions, we see some common types of description and topics, each with a varying degree of vocabulary overlap. Location specifications and the language of juxtaposition and comparison run through almost all discussions of music and must be modelled in at least the majority of their forms within any successful computer system. More music-theoretical language, such as that describing tonality, contrapuntal technique or timbre, is also common, but much more domain-specific and even scholar-specific. What follows is a brief overview of some of the relevant topics.

Location and specification
At the most basic level, one must specify the piece or pieces being discussed before describing them. To give more specific information, the piece must be subdivided and areas of it specified. Examples of the latter type of location can be seen in this quote from Andrew Ashbee’s book on the music of Jenkins (location terms are in italics):

‘Neither of the last two paragraphs are splintered in this way, nor does any cadence provide a breathing space. The third section, broad and with fine suspensions, lacks a sense of direction...At 27 bars the final section is too long.’
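
One possible way of modelling such location specifications is sketched below. It is a minimal sketch under stated assumptions, not the project's data model: the `Location` class, its field names and the bar numbers in the example are all hypothetical (Ashbee's text gives the length of the final section, 27 bars, but not where it begins).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """A region of a piece, given by work, optional section and optional bar range."""

    work: str                        # identifier of the piece (hypothetical format)
    section: Optional[int] = None    # 1-based section number, if the text names one
    first_bar: Optional[int] = None  # inclusive
    last_bar: Optional[int] = None   # inclusive

    def length_in_bars(self):
        """Return the number of bars covered, or None if the range is unknown."""
        if self.first_bar is None or self.last_bar is None:
            return None
        return self.last_bar - self.first_bar + 1

    def contains_bar(self, bar):
        """Return True if `bar` falls inside a known bar range."""
        if self.first_bar is None or self.last_bar is None:
            return False
        return self.first_bar <= bar <= self.last_bar


# Invented bar numbers chosen so that the section is 27 bars long.
final_section = Location(work="example-work", section=4, first_bar=40, last_bar=66)
print(final_section.length_in_bars())
```

Leaving the section and bar fields optional allows vaguer references (a whole piece, or 'the third section' with no bar numbers) to be stored in the same structure.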

Juxtaposition and comparison
Comparison of musical material can be internal, in which case it generally reflects formal attributes of the piece, or external, in which case the implication may be one of a compositional model, as in this quote from Howard (2007):

‘Purcell probably added the fine chromatic conclusion, which follows an F major cadence with an E major chord and then gradually returns home through a maze of chromatic inflections, after studying the second D minor fantasia in Locke’s Consort of Four Parts, which ends with a similar surprise.’

or it may be more generic, as in this example from Adams (1995):

‘the extremity of Fantasia No. 12 (Z. 743), with its chromaticism in the style of a Jacobean ‘hexachord’ fantasia is not typical.’

The comparison is essentially independent of the technique used to make it - in other words, it may be a simple assertion of similarity, or it may give a musical or extramusical justification.
This category has been used to contain the more directly musical aspects of discourse. Topics might include fundamentals (such as time, pitch and notational details), texture, instrumentation, rhythm, tonal space or melody and counterpoint.

Examples of the full complexity of combinations available in a single sentence include:

‘In Fantasia 4... the double-augmented subject entry in bar 14 is combined with a series of entries in the original note values creating a four-part imitative texture in which each voice plays an equal role.’ (Howard, 2007)

‘Inversion is found in most of Purcell’s four-part fantasias, and it is combined with augmentation in the opening section of Z739, with single and double augmentation in the opening section of Z735, and with single, double, and triple augmentation in an astonishing passage towards the end of no. 12 Z743 (31 August 1680)’ (Holman, 1995)

Clearly some of these texts will be descriptive and the purpose of studying them in this light would be to present the information they give in a sensitive and appropriate way to the user of a digital resource. In other cases, they can prompt questions about whether what is discussed is true in other cases. Here, the ability to search for the named attributes becomes necessary, and appropriate tools can be designed.
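
To give a flavour of what searching for a named attribute might involve, the sketch below looks for augmented entries of a subject in a single voice. It is an illustration under several assumptions, not a tool from the project: notes are represented as (MIDI pitch, duration) pairs, an entry must match the subject's interval sequence exactly (so tonal answers and inversions are not found), and the subject and voice are invented. The choice of factors to search (1, 2 and 4) is likewise an assumption about how the terms in the quotations might map onto note values.

```python
from fractions import Fraction


def intervals(notes):
    """Return the semitone steps between successive notes."""
    return [b[0] - a[0] for a, b in zip(notes, notes[1:])]


def is_augmentation(subject, candidate, factor):
    """True if `candidate` has the subject's intervals with durations scaled by `factor`."""
    if len(subject) != len(candidate):
        return False
    same_intervals = intervals(subject) == intervals(candidate)
    scaled = all(c[1] == s[1] * factor for s, c in zip(subject, candidate))
    return same_intervals and scaled


def find_entries(subject, voice, factors=(1, 2, 4)):
    """Return (start_index, factor) for every window of `voice` that matches the subject."""
    n = len(subject)
    hits = []
    for start in range(len(voice) - n + 1):
        window = voice[start : start + n]
        for factor in factors:
            if is_augmentation(subject, window, factor):
                hits.append((start, factor))
    return hits


# Invented four-note subject: (MIDI pitch, duration in crotchets).
subject = [(62, Fraction(1)), (64, Fraction(1, 2)), (65, Fraction(1, 2)), (69, Fraction(2))]
# Invented voice: two free notes, then the subject a fifth higher in doubled note values.
voice = [(60, Fraction(1)), (67, Fraction(1))] + [(p + 7, d * 2) for p, d in subject]

hits = find_entries(subject, voice)
print(hits)
```

A practical tool would need to work across all voices and tolerate the small alterations that composers make to a subject, which this exact-match sketch does not attempt.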


We have developed a database schema for storing metadata pertaining to the digital representations we hold of the score, performance, and literary items, and for storing the numerous relationships which hold between those items.

A diagram of the complete schema is given here.

The principal entities in the schema are:

Records of this table represent musical works and bear fields such as title and catalogue_number.
Records of this table represent a score of a work. The concept is quite abstract and relies on two specialised tables (editions and inscriptions) for describing real scores. However, it provides a useful mechanism for generalising about concepts which pertain to any type of written score.
Records of this table represent any kind of inscribed object which may be a container for written_instances. We define two concrete implementations of this concept: musical_publications (for printed publications, and correlating with the editions form of written_instance), and manuscripts (for hand-written sources, and correlating with the inscriptions form of written_instance).
Records of this table represent texts such as analyses, concert or record reviews, and record sleeve notes.
Records of this class represent identifiable entities who may be responsible for producing some item featured in the Purcell Plus corpus. These include composers, authors, performers, and conductors.
Records of this class represent the occasion of a musical work having been performed (whatever the purpose).
Records of this class act as a container for a collection of performances records and represent a programmed event of performances given before an audience.
Records of this class represent the act of making a record of a performance of a musical work (the performance being represented as a record of the performances table).
Records of this class represent publicly issued recordings of music on physical media (such as CD or vinyl) or published online. They act as a container for a collection of records from the recording table.
Records of this class represent objects which are part of the Purcell Plus collection and which may be digitally encoded (including digital audio, images, text, and scores) or physical. Records bear properties including resource_content_type, which indicates what kind of information the object contains (audio, text, notation), and resource_nature, which indicates whether the resource is an encoded representation of some physical object (such as a performance or a score) or whether it contains analytical data (such as the results of executing a beat finding algorithm).

As well as these entities which refer to more or less physical objects, the database schema includes tables for describing relationships between them.

Records of this table assert that a musical_works was composed by a persons.
Records of this table assert that a written_instances appeared in a sources, for example that an inscription was copied into a manuscript.
Records of this table assert that a literary_works was written by a persons.
There are numerous tables whose names take this form, each of which asserts that a literary_works makes some sort of reference to a record of some other table in the database (e.g. text_ref_edition describes a reference to a printed edition of a piece). Currently we store no additional semantics on the nature of the reference.
There are numerous tables whose names take this form, each of which asserts that a resources is of a particular kind of entity.
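
The general pattern of entity tables linked by relationship tables can be illustrated with a small, self-contained sketch using SQLite from Python. This is not the project's schema: only the table names musical_works and persons and the fields title and catalogue_number come from the description above, while the relationship table name work_composer, the id and name columns, and the choice of SQLite are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE musical_works (
        id INTEGER PRIMARY KEY,          -- hypothetical key column
        title TEXT NOT NULL,
        catalogue_number TEXT
    );
    CREATE TABLE persons (
        id INTEGER PRIMARY KEY,          -- hypothetical key column
        name TEXT NOT NULL               -- hypothetical column
    );
    -- Hypothetical name for the table asserting that a work was composed by a person.
    CREATE TABLE work_composer (
        musical_work_id INTEGER NOT NULL REFERENCES musical_works(id),
        person_id INTEGER NOT NULL REFERENCES persons(id),
        PRIMARY KEY (musical_work_id, person_id)
    );
    """
)

# One work, one person, and one assertion linking them.
conn.execute(
    "INSERT INTO musical_works (id, title, catalogue_number) VALUES (?, ?, ?)",
    (1, "Fantasia No. 12", "Z. 743"),
)
conn.execute("INSERT INTO persons (id, name) VALUES (?, ?)", (1, "Henry Purcell"))
conn.execute("INSERT INTO work_composer (musical_work_id, person_id) VALUES (?, ?)", (1, 1))

# Follow the relationship table to list each work with its composer.
rows = conn.execute(
    """
    SELECT w.title, w.catalogue_number, p.name
    FROM work_composer AS wc
    JOIN musical_works AS w ON w.id = wc.musical_work_id
    JOIN persons AS p ON p.id = wc.person_id
    """
).fetchall()
print(rows)
```

Keeping each relationship in its own table, as here, means that a work with several attributed composers, or a person linked to many works, needs no change to the entity tables.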