
Tim's Final Project


For this project, I wanted to make use of the enormous amount of digitized sheet music available online in the public domain ( , ). There's so much free material out there, and so many people have been working on it, that it seemed to be asking to be used in a novel way. However, short of some complex graphical pattern matching, I knew that I would not be able to simply pull musical data out of a PDF file. So I turned my attention to the LilyPond typesetting software ( ). LilyPond lets the user create nice-looking sheet music by coding it up (much like LaTeX), and it is much easier to extract data from that code than from a rendered page.

But what data to extract?

I thought it would be interesting to look at Bach chorales (or other 4-part harmonies) and build a probability scheme that gives the likelihood of one chord following another (a Markov model). Using this data, I would then theoretically be able to 'create' a new chorale simply by giving the chain an initial chord.
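
To make the idea concrete, here is a minimal sketch of a first-order chord chain. It is not the project's actual code: the class name ChordChain, the Roman-numeral chord labels, and the toy training sequence are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: chords are plain strings such as "I" or "V".
class ChordChain {
    // For each chord, every chord that was observed to follow it.
    // Repeated entries make frequent successors proportionally more likely.
    private final Map<String, List<String>> followers = new HashMap<>();
    private final Random random;

    ChordChain(long seed) {
        this.random = new Random(seed);
    }

    // Record every adjacent pair of chords in one song.
    void learn(List<String> song) {
        for (int i = 0; i + 1 < song.size(); i++) {
            followers.computeIfAbsent(song.get(i), k -> new ArrayList<>())
                     .add(song.get(i + 1));
        }
    }

    // Estimated probability that `next` follows `current`.
    double probability(String current, String next) {
        List<String> options = followers.get(current);
        if (options == null) {
            return 0.0;
        }
        long hits = options.stream().filter(next::equals).count();
        return (double) hits / options.size();
    }

    // Pick a successor at random, weighted by observed frequency.
    // Returns null for a chord that never had a successor in the training data.
    String next(String current) {
        List<String> options = followers.get(current);
        if (options == null) {
            return null;
        }
        return options.get(random.nextInt(options.size()));
    }

    public static void main(String[] args) {
        ChordChain chain = new ChordChain(42L);
        chain.learn(List.of("I", "IV", "V", "I", "ii", "V", "I"));
        // Start from an initial chord and let the chain generate the rest.
        String chord = "I";
        for (int i = 0; i < 8 && chord != null; i++) {
            System.out.print(chord + " ");
            chord = chain.next(chord);
        }
        System.out.println();
    }
}
```

Storing the raw list of successors, rather than a normalized matrix, keeps the sketch short: sampling uniformly from the list is equivalent to sampling from the transition probabilities.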

Well, I couldn't find enough Bach chorales in the LilyPond format, but I did find a good number of 4-part hymns at (I have emailed Geoff Horton about the project, and am awaiting his reply). These files are perfect because they were all coded by Mr. Horton and thus share a similar style, which means the same parsing approach works for all of them. They are all very simple songs, and all are written in 4 parts.

I wrote a Java program to connect to, find the LilyPond files, and parse them to get at the needed data. The program then goes through the four parts of each song, chord by chord, and builds a single transition probability matrix that encompasses all the songs. (This is of course normalized by key, so that a tonic chord in first inversion in A major has the same value as a tonic chord in first inversion in C major.) The result is a huge list of all the chords present and, for each one, a list of the chords that follow it.
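
The key normalization can be sketched as follows. This is only one plausible way to do it, not necessarily what my parser does: it assumes a chord is stored as four MIDI note numbers (bass first), and the class name KeyNormalizer is hypothetical. Each voice is reduced to its distance in semitones above the tonic, which also discards octave information.

```java
import java.util.Arrays;

// Hypothetical sketch: a chord is four MIDI note numbers, bass first.
class KeyNormalizer {
    // Express each voice as semitones above the tonic (0-11), keeping the
    // bass-to-soprano order so that inversions stay distinguishable.
    static int[] normalize(int[] midiNotes, int tonicPitchClass) {
        int[] degrees = new int[midiNotes.length];
        for (int i = 0; i < midiNotes.length; i++) {
            // floorMod keeps the result in 0-11 even when the difference is negative.
            degrees[i] = Math.floorMod(midiNotes[i] - tonicPitchClass, 12);
        }
        return degrees;
    }

    public static void main(String[] args) {
        // Tonic triad in first inversion (third in the bass) in two keys.
        int[] inC = {52, 60, 67, 72}; // E3 C4 G4 C5, key of C (pitch class 0)
        int[] inA = {49, 57, 64, 69}; // C#3 A3 E4 A4, key of A (pitch class 9)
        System.out.println(Arrays.toString(normalize(inC, 0)));
        System.out.println(Arrays.toString(normalize(inA, 9)));
    }
}
```

Both calls yield the same array, so the two chords land in the same row of the transition matrix.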

Using OSC, I send this data, chord by chord, to ChucK, which plays it. The result is something that actually sounds very much like a hymn (the SinOscs give it an 'organ prelude' quality), except that it goes on until you stop the program.
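
For readers curious about what goes over the wire, here is a rough sketch of sending one chord as an OSC message. Note that it does not use the Illposed library that the project relies on: it hand-encodes the message with the standard library so that the example is self-contained. The address "/chord", the port 6449, and the class name OscChordSender are all assumptions, and the ChucK side would need to listen for a matching address and four integer arguments.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: hand-rolled OSC encoding, not the Illposed library.
class OscChordSender {
    // OSC strings are ASCII, null-terminated, and zero-padded to a multiple of 4 bytes.
    static byte[] oscString(String s) {
        byte[] raw = s.getBytes(StandardCharsets.US_ASCII);
        int paddedLength = (raw.length / 4 + 1) * 4; // always leaves room for a null
        byte[] out = new byte[paddedLength];         // new arrays are zero-filled
        System.arraycopy(raw, 0, out, 0, raw.length);
        return out;
    }

    // Build one OSC message: address, type tag string (",iiii"), then int32 arguments.
    static byte[] encode(String address, int[] notes) {
        StringBuilder tags = new StringBuilder(",");
        for (int i = 0; i < notes.length; i++) {
            tags.append('i');
        }
        byte[] addr = oscString(address);
        byte[] typeTags = oscString(tags.toString());
        ByteBuffer buf = ByteBuffer.allocate(addr.length + typeTags.length + 4 * notes.length);
        buf.put(addr).put(typeTags);
        for (int n : notes) {
            buf.putInt(n); // ByteBuffer is big-endian by default, as OSC requires
        }
        return buf.array();
    }

    // Send the encoded message as a single UDP datagram.
    static void send(String host, int port, String address, int[] notes) throws Exception {
        byte[] payload = encode(address, notes);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        // Bass, tenor, alto, soprano as MIDI note numbers (assumed layout).
        send("127.0.0.1", 6449, "/chord", new int[]{48, 55, 64, 72});
    }
}
```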

The initial version suffered from some annoying voice-leading issues because of how the chord data type is initialized. For example, the leading tone would always resolve down to the tonic, jumping a 7th. I have attempted to correct this by making each voice move to the closest available note (up a 2nd instead of down a 7th).
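
The closest-note rule can be sketched like this, assuming voices are tracked as MIDI note numbers and the next chord only specifies a pitch class for each voice. The class and method names are hypothetical, and this is an illustration of the rule rather than the code in the download.

```java
// Hypothetical sketch of the "move to the closest note" rule.
class VoiceLeading {
    // Given the MIDI note a voice is currently on and the pitch class (0-11)
    // it should move to, return the MIDI note with that pitch class that is
    // closest to the current note. A tritone (6 semitones either way) moves up.
    static int nearest(int currentMidi, int targetPitchClass) {
        int up = Math.floorMod(targetPitchClass - currentMidi, 12); // semitones upward, 0-11
        int down = up - 12;                                         // same pitch class going down
        return currentMidi + (up <= 6 ? up : down);
    }

    public static void main(String[] args) {
        // Leading tone B4 (71) resolving to the tonic C (pitch class 0):
        // the voice steps up to C5 (72) instead of leaping down to C4 (60).
        System.out.println(nearest(71, 0));
    }
}
```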

  • The 'w' and 's' keys change the key of the piece.
  • Holding the 'e' and 'd' keys changes the master volume.
  • Holding the 'r' 'f', 't' 'g', 'y' 'h', 'u' 'j' keys changes the volume of individual voices.
  • Tilting the computer changes the tempo (the rate at which chords are queried).
    • The '[' and ']' keys double the tempo or cut it in half.


Applying Markov models to music is nothing new ([1] [2]). Music database organization and querying are heavily influenced by this method of analysis. However, it is remarkable how easy it is to apply these simple statistical techniques (yes, I know what I've done isn't nearly as complex as the above articles, but it's a start).

I deliberately make my program connect to the internet (instead of just downloading the files) because I want to emphasize the expandability of this technique. With some changes to the parsing code, I think it is very reasonable to assume that what I've done can be applied to other types of music and to files on other websites. With some graphical pattern matching know-how, I don't see why this couldn't also be applied to the broad range of PDF files available. Instead of having to parse a direct audio signal, wouldn't it be much simpler to catalog music based on the sheet music we already have? Imagine making a transition matrix for all of Beethoven's works: click play, and a new piano sonata comes out every time (OK, I'll concede that's perhaps a little far-fetched).

One complaint that I have with my code is that it chooses the next chord based only on the single preceding chord. The resultant hymn would make much more sense if, for example, a cadence were set up not only by the V chord, but first by a ii and then a V chord. This does occur in my program, but only because V chords happen to follow ii chords a lot; there is no notion of an extended chord progression.
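
One way to address this, which I have not implemented, would be a second-order chain that keys each transition on the previous two chords rather than one. The sketch below is only an illustration under that assumption; the class name SecondOrderChain, the string chord labels, and the toy progression are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: the next chord depends on the previous TWO chords.
class SecondOrderChain {
    // Maps a pair of consecutive chords to every chord observed after that pair.
    private final Map<String, List<String>> followers = new HashMap<>();
    private final Random random;

    SecondOrderChain(long seed) {
        this.random = new Random(seed);
    }

    // Combine two chord labels into one map key ("|" is assumed not to occur in labels).
    private static String key(String previous, String current) {
        return previous + "|" + current;
    }

    // Record every run of three consecutive chords in one song.
    void learn(List<String> song) {
        for (int i = 0; i + 2 < song.size(); i++) {
            followers.computeIfAbsent(key(song.get(i), song.get(i + 1)), k -> new ArrayList<>())
                     .add(song.get(i + 2));
        }
    }

    // Estimated probability that `next` follows the pair (previous, current).
    double probability(String previous, String current, String next) {
        List<String> options = followers.get(key(previous, current));
        if (options == null) {
            return 0.0;
        }
        long hits = options.stream().filter(next::equals).count();
        return (double) hits / options.size();
    }

    // Sample a successor for the pair, or null if the pair was never seen.
    String next(String previous, String current) {
        List<String> options = followers.get(key(previous, current));
        if (options == null) {
            return null;
        }
        return options.get(random.nextInt(options.size()));
    }

    public static void main(String[] args) {
        SecondOrderChain chain = new SecondOrderChain(7L);
        chain.learn(List.of("I", "ii", "V", "I", "IV", "V", "vi"));
        // In this toy data, a V reached from ii resolves to I,
        // while a V reached from IV moves to vi.
        System.out.println(chain.next("ii", "V"));
        System.out.println(chain.next("IV", "V"));
    }
}
```

The trade-off is data: with two chords of context there are many more distinct states, so a small corpus of hymns would leave many pairs with only one observed successor.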

It is interesting to note, however, that the resultant hymn still makes sense (most of the time). In contrast, generating a text file from the transition probabilities of individual letters would produce something unintelligible. Perhaps if the transitions were based on individual words (which are closer to chords than to notes), more meaning would emerge. Regardless, I think this tells us something quite interesting about the components of music and language: while both are built on generative hierarchical structures, music is far more malleable. Music is much more abstract, and can mean many things to different people, whereas in language, "metaphors aside, the word 'giraffe' always refers to a long-necked quadruped, and never to a washing machine" (Jourdain 276).


You can download the code here.

How to Run

  • Unzip the file.
  • Open up a terminal window and navigate to the location of the downloaded files.
  • Open up miniAudicle, and open the file "" (found in the KeelerFinal folder).
  • Type: java GetData in the terminal window.
    • You should see a list of all the .ly files appear one by one as they are parsed.
  • Now start the virtual machine, and add the Hymnal shred.
    • miniAudicle may crash the first time. Don't worry, just start it again.

Audio Example

(no visual component... :( )


This file is also in the zipped folder above.

It's important to remember that I really don't have much control over how this piece sounds. The chords are chosen by the program. I had to wait for a cadence-like moment to stop recording.

Things that are determined by the user: you'll hear some chords that are either half as long or twice as long as the others, as well as a key change. The bass and tenor also cut out at one point.


  • I of course have to credit Geoff Horton for providing the necessary lilypond files. (
    • I have emailed Geoff about this project, and am awaiting a reply.
  • Illposed software for creating nice Java OSC libraries. (illposed osc)
  • Robert Sedgewick, Kevin Wayne, and other instructors/developers for COS 226, who provided some very helpful Java classes.
  • My friend John Morris who worked with me on a COS 226 assignment when we developed the that I use in this program.

This project represents my own work in accordance with University regulations.