From CSWiki
Revision as of 20:25, 29 March 2008 by Gewang (Learning framework)


Our fancy ISMIR paper outline.

Notes for us

6 page limit.

Title: Support for MIR prototyping and real-time applications in the ChucK programming language

Key points:

  • There is no general purpose language for MIR prototyping that both gives access to analysis building blocks and allows for low-level coding
  • This sort of framework can be really useful for fast prototyping, flexible coding, and education
  • Furthermore, no existing framework combines MIR with performance and synthesis; ChucK does.
  • We're going to show what is available in the language, and examples of how to work with it to accomplish MIR tasks


(re + ge)

In this paper, we discuss the recent addition of audio analysis and machine learning infrastructure to the ChucK music programming language, which makes it a suitable and unique tool both for prototyping music information retrieval systems and for applying music information retrieval algorithms in real-time music performance contexts. From its inception, ChucK has offered high-level control over building-block components, paired with fine-grained, low-level, sample-synchronous manipulation, in a programming environment that encourages on-the-fly experimentation. The new analysis and learning capabilities were designed to preserve this breadth of control and the "do it yourself" approach to creating algorithms and systems. As a result, the programmer can experiment with new features, signal processing techniques, and learning algorithms with ease and flexibility. These new capabilities are also tightly integrated into ChucK's synthesis framework, making it trivial to use the results of analysis and learning tasks to drive music creation and interaction in real time. We present our motivations, the additions to the language, various ChucK-based approaches to rapid MIR prototyping, and several case studies. We also describe a new toolkit for rapidly experimenting with MIR algorithms and systems, and motivate possibilities for crafting MIR-based live performance.

Designed as a language for composers and performers, ChucK offers programmers high-level control over building block components as well as fine-grained, sample-level control of audio data.

Something about bridging gap between MIR and performance

In this paper, we discuss recent additions to the ChucK music programming language that make it a suitable and unique tool for music information retrieval prototyping tasks and for applying music information retrieval algorithms to music in live performance settings. Fills a niche w/ low & high-level control => prototyping. Also, as a language originally designed for the live performer, ChucK

the new analysis and learning capabilities of ChucK are exposed at both these levels, and tightly integrated with its synthesis and control capabilities to facilitate analysis and learning in real-time contexts. We discuss the language capabilities in some detail, outline examples of how an MIR researcher can use ChucK for prototyping and applying music analysis in real time, and present a toolkit written in ChucK to facilitate these tasks.


ChucK is a computer music programming language whose primary design goals included

tight control object oriented & built in objects -> high level good for prototyping in music synthesis (e.g., can both build filters from scratch and experiment with re-networking unit generators)


(re, ge)

Catchy first paragraph:

(does the ICMC paper start like this? if so, we should change it)

ChucK began as a high-level programming language for music and sound synthesis. Its design goals included:
  • offering the musician a wide breadth of programmable control, from the structural level down to the sample level;
  • using a clear and concise syntax;
  • employing a set of abstractions and built-in objects to facilitate rapid prototyping and live coding.

We have recently expanded the language to support audio analysis and machine learning, with two primary goals. The first is to offer real-time and on-the-fly analysis and learning capabilities to computer music composers and performers. The second is to offer music information retrieval (MIR) researchers a new platform for rapid prototyping and for easily porting algorithms to a real-time performance context. Our previous papers \ref[icmc2007] and \ref[icmc2008] focus on the former goal; in this paper we address the latter.

We begin in Section AAA by suggesting that music performance can and should be among the more significant focal points and application domains of MIR research, and we motivate the need for more tools shared between MIR and performance. We also touch on the state of prototyping toolkits in MIR, then describe the ChucK language as it is used for music creation, including prototyping and live coding systems. In Section AAA, we describe in some detail how we have incorporated analysis and learning into the language. We paid attention to two things: preserving the flexible and powerful control that makes ChucK suited to prototyping and experimenting with new algorithms, and integrating the new functionality tightly and naturally with ChucK's synthesis framework. Sections AAA and AAA illustrate the new potential of ChucK as an MIR rapid prototyping workbench. They introduce a working pipeline for MIR tasks and present three case studies of how we have been using ChucK to perform and teach music analysis. Finally, in Section AAA we discuss ongoing directions for ChucK as an MIR tool and announce the release of a repository of examples and supporting code for MIR researchers who wish to experiment with the language.

(ge:) something about the importance of rapid prototyping, and also how rapid prototyping enables new MIR-based performance practices.

____. (symbolic representations of "recorded"/non-performed; reviews; etc.).

Why is the focus so narrow? One reason for excluding performed music is the difficulty of translating tools for MIR into a real-time context.

Background & Motivation

(re, ge)

MIR and performance (re)

  • MIR & performance
    • How performers/composers have used MIR-like algorithms, & what tools they use
    • Why performance should get more attention from MIR!!!
      • What is MIR, really?
      • What good are MIR tools doing for the world?
      • How is an MIR researcher supposed to evaluate and improve his/her work?
  • MIR prototyping
    • Define prototyping for our purposes; why is it important? (+ side benefits)
    • What tools exist

Prototyping in MIR

Let us briefly digress and motivate our other major goal in this work, to provide a new rapid prototyping environment for MIR research. We summarize our basic requirements for an MIR prototyping environment as follows: the ability to design new signal processing algorithms, audio features, and learning algorithms; the ability to apply new and existing signal processing, feature extraction, and learning algorithms in new ways; and the ability to do these tasks quickly by taking advantage of high-level building blocks for common tasks, and by specifying the system either via a GUI or concise and clear code. There do exist several programming environments for MIR that accommodate many of the above requirements, including Matlab, M2K, Weka, and AAAmore?Marsyas?Clam?jAudio?, and their popularity suggests that they meet the needs of many MIR research tasks. We propose that ChucK meets all of these requirements and inhabits a unique place in the palette of tools at the MIR researcher's disposal, not only because of its easy accommodation of real-time and performance tasks, but also because of its particular approach to dealing with time. AAAchange?

Music information retrieval and music performance

(note: work in transition from prototyping -> performance, citing that prototyping naturally lends itself to on-the-fly, real-time control and tuning of MIR-based performance systems.)

Research in music information retrieval has primarily focused on analyzing and understanding recorded music and other non-performative musical representations and metadata. However, many music information retrieval tasks are directly relevant to real-time interactive performance. Examples include mood and style analysis, instrumentation and harmony identification, and transcription and score alignment. A core focus of MIR is building computer systems that understand musical audio at a semantic level (\cite[downie?]), so that humans can search through, retrieve, visualize, and otherwise interact with musical data in a meaningful way. Making sense of audio data at this higher level is also essential to machine musicianship. There, the performing computer, like any musically trained human collaborator, is charged with interacting with other performers in musically appropriate ways \cite[rowe].

Despite the shared need to bridge the "semantic gap" between low-level audio features and higher-level musical properties, there does not exist a shared tool set for accomplishing this task in computer music and MIR. MIR researchers employ an abundance of tools for performing signal processing, feature extraction, and machine learning and computer modeling to better understand audio data. Some of these tools are general-purpose signal processing or machine learning packages, such as Matlab or Weka AAAcite, and others such as CLAM, MARSYAS, M2K, and jAAA have been designed specifically for MIR AAAcite. Most of these languages and frameworks were not designed to be used for computer music performance, and most do not perform synthesis and do not suffice for real-time computation; as a result, none have widespread use among computer musicians. On the other hand, most computer music languages do not readily accommodate analysis of audio in the language; for example, spectral analysis and processing tasks must be coded as C++ externals in order to be used in SuperCollider or Max/MSP. Enterprising composers and performers have of course been writing such externals and standalone code to accomplish pitch tracking, score following, harmonic analysis, and other tasks for many years. However, the requirement that specialized code be pushed into externals is a barrier to rapid development and experimentation by programming novices and seasoned researchers alike. So, despite the many shared tasks of MIR researchers and computer musicians, the dominant programming paradigms in the two fields do not include natural avenues for code sharing or collaboration between these groups.

Computer musicians would undoubtedly benefit from lower barriers to adapting state-of-the-art MIR algorithms for their real-time performance needs. We additionally posit that MIR researchers can benefit from increased collaboration with real musicians. Many standard MIR research tasks mentioned above face challenges including copyright restrictions on obtaining data or releasing systems to the public, and difficulty or expense in obtaining ground truth. MIR systems for tasks such as transcription or mood or style classification in the context of a particular musical composition or performance paradigm can circumvent such problems: the relevant data is freely available (the composer or ensemble wants you to have it!), the ground truth may be well-defined, or it may be easy to construct (composers and performers have built-in incentive for the system to perform well). There is also the benefit that an MIR researcher can make an impact on people's experiences with music, which is sadly still hard for those researchers unaffiliated with the music industry.

In summary, one major goal of our work with ChucK is to provide a tool that meets the needs of two groups. The first is computer musicians who require MIR-like analysis algorithms. The second is MIR researchers interested in producing tools that can be used in real time, possibly in conjunction with sound synthesis and performance. We also hope that by making work at the intersection of MIR and computer music easier, we will encourage more work in this area and foster a richer cross-pollination of the two fields than has occurred so far.

(note: relate back to prototyping, arguing that if we can do this in live performance, then it would naturally feedback into research)

Scratch: Research in music information retrieval has primarily focused on analyzing and understanding recorded music and other non-performative musical representations and metadata. However, many music information retrieval tasks, such as analyzing the mood, rhythm, style, harmony, or instrumentation of music are directly relevant to real-time interactive performance. Furthermore, many established approaches to these tasks are appropriate to a real-time context. However, there has been a paucity of tools and environments that accommodate MIR-style analysis and learning in addition to real-time synthesis and interactive performance. We have recently augmented the ChucK music programming language with analysis and learning capabilities in an effort to begin to bridge the gap between MIR and music performance, and to allow MIR researchers and computer music performers to leverage each other's expertise, tools, and experiences.

Cite Raphael in here somewhere.

ChucK (ge)

  • ChucK (ge)
    • A short history
    • Examples of how to use the language (esp. UGens & time control)
    • OTF & timeliness make it suited for prototyping, though it was not originally designed for analysis
  • In summary, our goal was to modify ChucK & build tools in it to foster a tighter connection between MIR and performance, and to provide fast prototyping capability in a new framework for MIRers

ChucK is an ongoing, open-source research experiment in designing a computer music programming language from the "ground up". A main focus of the design was the precise programmability of time and concurrency, with an emphasis on encouraging concise, readable code. System throughput for real-time audio remains an important consideration. First and foremost, however, the language was designed to provide maximal control and flexibility for the audio programmer. In particular, the various components of the design are as follows:

  • flexibility: allows programmers to specify both high-level and low-level time-based operations, in a single unified and well-defined mechanism
  • concurrency: programmers can craft and precisely synchronize parallel code modules that share both data and time
  • readability: the language attempts to provide a strong correspondence between code structure, time, and audio building blocks. ChucK appears fairly successful at this, as suggested by its increasing use as a teaching tool in computer music programs, including at Princeton, Stanford, Georgia Tech, and CalArts.
  • a do-it-yourself approach: by combining the ease of high-level computer music environments with the expressiveness of lower-level languages, ChucK is able to support high-level musical/sonic representations, as well as the prototyping and implementation of low-level, "white-box" signal-processing elements in the same language.
  • on-the-fly: by leveraging the ChucKian approach to programming, it is possible and often beneficial to write and experiment with code on the fly, allowing programs to be edited as they run.

(note: provide concise example of time/concurrency)

There are no fixed control rates in the language, which explicitly leaves programmers to define their own rates for various parts of the system. For example, it is possible to assert control over any Unit Generator (UGen) at any point in time, and at any rate, in a sample-synchronous manner. Furthermore, many processes can share a central notion of time, making it possible to reason naturally about parallel code in terms of time. The timing mechanism also lends itself directly to a concurrent programming model, which is essential for expressively capturing parallelism. Multiple processes (called shreds), each advancing time in its own manner, can be synchronized and serialized directly from the timing information. Using this timing/concurrency model, on-the-fly programming can be carried out by exchanging time-aware code segments. Together, these components form a system for experimenting with sound synthesis for composition and performance, and more recently for creating real-time analysis and MIR-based programs.
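The note above asks for a concise example of time and concurrency. The block below is a language-neutral sketch in Python. It is not ChucK, and it is not how the ChucK virtual machine is implemented. It models shreds as generators that yield the number of samples they want to advance. A scheduler keeps one shared logical clock, measured in samples, and always resumes the shred that is due next. Execution is therefore non-preemptive, and its ordering is determined entirely by timing. `Scheduler`, `spork`, `run`, and `ticker` are hypothetical names chosen to echo ChucK's `spork ~` and `=> now` idioms.

```python
import heapq
import itertools


class Scheduler:
    """Toy strongly-timed scheduler: one shared logical clock, in samples."""

    def __init__(self):
        self.now = 0                          # shared logical time (samples)
        self._queue = []                      # entries: (wake_time, order, shred)
        self._order = itertools.count()       # FIFO tie-break for equal wake times

    def spork(self, shred):
        """Start a generator-based 'shred' at the current logical time."""
        heapq.heappush(self._queue, (self.now, next(self._order), shred))

    def run(self, until):
        """Resume due shreds in time order until logical time `until`."""
        while self._queue and self._queue[0][0] <= until:
            wake, _, shred = heapq.heappop(self._queue)
            self.now = wake                   # time jumps straight to the next event
            try:
                advance = next(shred)         # shred runs until it yields a duration
            except StopIteration:
                continue                      # shred finished; drop it
            heapq.heappush(self._queue, (self.now + advance, next(self._order), shred))
        self.now = until


def ticker(sched, name, period, log):
    """A shred that records the logical time, then waits `period` samples."""
    while True:
        log.append((sched.now, name))
        yield period                          # loosely analogous to `period::samp => now;`


if __name__ == "__main__":
    sched = Scheduler()
    log = []
    sched.spork(ticker(sched, "fast", 100, log))
    sched.spork(ticker(sched, "slow", 250, log))
    sched.run(until=500)
    print(log)
```

Running it logs `fast` every 100 samples and `slow` every 250. At sample 500 both are due, and they resume in the order they were scheduled. This reproducible interleaving is what lets parallel code be reasoned about in terms of time.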

scratch More specifically, ChucK makes time itself computable. A program can be self-aware in time and can control the rate of its own progress through time. There are no fixed control rates in the language, which explicitly leaves programmers to define their own rates for various parts of the system. For example, it is possible to assert control over any Unit Generator (UGen) at any point in time, and at any rate, in a sample-synchronous manner. Furthermore, many processes can share a central notion of time, making it possible to reason naturally about parallel code in terms of time. The timing mechanism also lends itself directly to a concurrent programming model, which is essential for expressively capturing parallelism. Multiple processes (called shreds), each advancing time in its own manner, can be synchronized and serialized directly from the timing information. Thus arises our concept of a strongly-timed language, in which processes have precise control over their own timing and synchronization. Using this timing/concurrency model, on-the-fly programming can be carried out by exchanging time-aware code segments. To further facilitate this, the Audicle provides a graphical environment for writing ChucK programs on the fly. It visualizes the programs in terms of code, audio synthesis, concurrency, and timing, all in real time. Together, ChucK, on-the-fly programming, and the Audicle form a system and workbench for experimenting with sound synthesis for composition and performance, and more recently for creating real-time analysis and MIR-based programs.

Recent additions to ChucK to allow for prototyping and realtime MIR

Unit analyzers


  • Say what they are, ...
  • Available features!

In 2007, the authors introduced a language-based solution for combining audio analysis and synthesis in ChucK's high-level programming environment (cite icmc 2007, again?). The new analysis framework inherited the sample-synchronous precision and clarity of the existing synthesis framework, while adding analysis-specific mechanisms where appropriate. The solution consisted of three key components. First, we introduced the notion of a Unit Analyzer (UAna). It is similar to its synthesis counterpart, the Unit Generator (UGen), but augmented with a set of operations and semantics tailored towards analysis. Second, an augmented dataflow model with new datatypes, operators, and objects was provided, both in the language and in the underlying system design and implementation. Third, the analysis framework uses ChucK's existing timing, concurrency, and on-the-fly programming mechanisms to precisely control analysis processes.

For example, it is possible to instantiate FFT/IFFT objects for spectral analysis and pass their output via the analysis network to feature extractors. Concurrent processes can then operate on and manipulate these objects in a truly sample-synchronous fashion. The programmer has complete and dynamic control over analysis parameters, such as:
  • FFT/IFFT sizes;
  • analysis windows;
  • hop sizes;
  • how often (down to the sample) to take an FFT or extract features.
(note: add short example of FFT + centroid). This programming model can represent precisely timed algorithms just as well as low-level languages such as C++ and Java can. The primary advantages are threefold:

  • conciseness and readability: ChucK takes care of real-time audio and buffering in its audio synthesis/analysis frameworks, and the language is tailored for audio. As a result, the same algorithm or system can be prototyped/implemented with much less code, often with up to a factor of 10 reduction in code and in development time (note, validate this claim somehow). As an example, a simple real-time system to do AAA in ChucK takes AAA lines of code, whereas the same system in C++/Marsyas/CLAM takes AAA lines of code. This isn't to say Marsyas/CLAM are verbose, but rather that verbosity is encouraged by the underlying language (C++, Java).
  • rapid turnaround experimentation: this is really friggin' key. Through on-the-fly programming and ChucK's concise audio programming syntax, one can quickly prototype systems and sub-systems, changing parameters as well as the underlying structure of systems and experiencing the results almost immediately. (note: maybe clarify actual advantages, such as cutting down compiling time, trying things immediately, I don't know)
  • concurrency: with ChucK, it is possible and straightforward to write truly sample-synchronous, concurrent code for both audio synthesis and analysis. This is difficult to achieve in C++/Java, or in any libraries built in these languages. Because C++ and Java rely on preemptive, thread-based concurrency, it can be extremely challenging (and/or inefficient) to run different parts of an analysis/synthesis audio program in different threads in a sample-synchronous way. Yet such approaches to systems design can be highly beneficial. For example, consider a system that performs multi-rate analysis/feature extraction in parallel, with different processes extracting features at different rates and with different parameters. A further process collects and uses the results. This can be achieved in a few dozen lines of ChucK. The same system in C++/Java would need to contend with thread instantiation, synchronization, and data consistency, as well as buffering and bookkeeping of audio data and feature data. (figure: show multi-rate FFT/feature extraction)
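The notes above ask for a short FFT + centroid example and a multi-rate figure. The sketch below covers both in plain Python rather than ChucK. Each hypothetical `CentroidExtractor` keeps a sliding window of samples. Every `hop` samples it applies a periodic Hann window, takes a direct DFT, and computes the spectral centroid. `run` drives all extractors from one shared sample clock, in a fixed order, and appends results to a collector that a downstream process could read. This sequential loop stands in for what ChucK would express as concurrent shreds. The direct O(N²) DFT is used only to keep the example dependency-free.

```python
import cmath
import math
from collections import deque


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(frame):
    """|X[k]| for k = 0..N/2 via a direct DFT (slow, but dependency-free)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mags, sample_rate, fft_size):
    """Magnitude-weighted mean frequency, in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / fft_size
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


class CentroidExtractor:
    """Windowed FFT + centroid over the last `fft_size` samples, every `hop` samples."""

    def __init__(self, name, fft_size, hop, sample_rate):
        self.name, self.fft_size, self.hop, self.sr = name, fft_size, hop, sample_rate
        self.window = hann(fft_size)
        self.buffer = deque(maxlen=fft_size)      # sliding analysis window

    def tick(self, sample, elapsed, collector):
        self.buffer.append(sample)
        # Fire exactly on the sample clock, once a full window is available.
        if elapsed % self.hop == 0 and len(self.buffer) == self.fft_size:
            frame = [s * w for s, w in zip(self.buffer, self.window)]
            c = spectral_centroid(magnitude_spectrum(frame), self.sr, self.fft_size)
            collector.append((elapsed, self.name, c))


def run(signal, extractors):
    """Advance one shared sample clock; every extractor sees every sample in a fixed order."""
    collector = []                                # read by a downstream 'consumer' process
    for n, sample in enumerate(signal):
        for ex in extractors:
            ex.tick(sample, n + 1, collector)     # n + 1 samples have elapsed
    return collector


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(512)]
    results = run(tone, [CentroidExtractor("short", 64, 32, sr),
                         CentroidExtractor("long", 256, 128, sr)])
    for elapsed, name, centroid in results:
        print(elapsed, name, round(centroid, 3))
```

Take a 1000 Hz sine sampled at 8 kHz. The 64-point extractor fires every 32 samples and the 256-point one every 128 samples. Both report a centroid of 1000 Hz, because the tone falls exactly on a DFT bin for both sizes. When both fire on the same sample, their results land in the collector in a fixed, reproducible order.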

These highly useful flexibilities come with a tradeoff: system performance and throughput. A system implemented in reasonable ChucK code would likely run much more efficiently as an optimized C++ implementation. This is due to the low-level nature of languages like C++ and Java and to their accompanying optimizing compilers. It may therefore be desirable to prototype a system (and/or parts of a system) in ChucK, taking advantage of its flexibility and rapid experimentation. If needed, a "production" system can then be implemented in C++/CLAM/Marsyas. Using ChucK as a prototyping workbench for MIR can not only drastically reduce coding time, but also suggest new directions and ideas. For researchers experimenting with new MIR algorithms, such a prototyping stage can be instrumental in crafting new systems and in testing the feasibility of new ideas. We will present some example "working pipelines" for MIR prototyping in Section AAA.

Learning framework

(re) Reference ICMC paper and make note of cool areas this opens up, but which we don't have time to discuss here (side benefit: make it clear that this isn't a copy of the ICMC paper)

A natural consequence of ChucK's new analysis capabilities is that analysis results can be computed and used as features whose relationship to high-level musical concepts can be learned via labeled examples and standard classification algorithms.

-This is what's done in MIR, and what we hope to see done more often in computer music

-Weka is a popular framework for applied machine learning in MIR and other fields, so we have designed ChucK's learning infrastructure based on Weka to make it object oriented and extensible in the future, and to reduce the learning curve for MIR users familiar with Weka. (note: cite, perhaps weaken link to weka).

Describe FeatureCollector, Instance, Instances, Classifier.

Describe available features in a table.

Make note that this is all implemented in ChucK, so users can change classification not only by adjusting parameters (e.g., number of rounds for boosting), but also by changing the classifiers themselves (e.g., AdaBoost.M1 can easily be adapted to AdaBoost.MH).

Mention that an example of learning appears in Section AAA.

One consequence of making standard classification algorithms available in a real-time context is that the training stage itself can be executed on-the-fly, during a rehearsal or even a performance. Given an appropriate control interface for providing class labels in real-time, a user can choose the concept to be learned on the spot. For example, in a live-coding improvisation with a violinist and a vocalist, the programmer may wish the computer to execute one function whenever the singer sings, and another whenever the violinist plays. After specifying the desired features and classifier in the code, the programmer can indicate...

We refer the reader to \cite[icmc2008] for a deeper exploration of learning in real-time performance. Here, we underscore that one design goal of the learning component is to make learning easy enough to implement in such a context, which is even more time-critical than a prototyping context. At the same time, feature extraction can be performed in ChucK with sample-level control, and the learning algorithms themselves are written in ChucK. Specificity and flexibility are therefore not sacrificed in the pursuit of rapid development.

Let us point out that the implementation of ChucK's analysis and learning capabilities preserves its suitability for rapid prototyping, and extends this capability to many MIR tasks. In particular, the user retains sample-level control over audio processing, allowing easy implementation of new feature extraction methods and signal processing algorithms directly in the language. Learning algorithms are also implemented in the language, affording the user arbitrary modification of these algorithms at any time, without the need to recompile. Additionally, the user can choose from many standard MIR features and several standard classifiers that have already been implemented, and use these out-of-the-box. Furthermore, the ability to code features and algorithms precisely in ChucK, within its object-oriented framework, means that the user can create new "white-box" features and classifiers for immediate usage. In summary, ChucK meets all our requirements for an MIR prototyping language, as we've defined above.

introduce some "working pipelines"

intent: give reader a feel for "if I have an idea and want to code it quickly, how would this work in ChucK?" (and why would I not use matlab, m2k, etc.) ge? the working pipeline to my brain is busted

one potential pipeline

  • get an idea for an MIR algorithm (e.g., a new way to do speaker classification, or vowel-consonant, or perhaps a way to gauge the acoustics of the space we are in)
  • start writing code:
    • quickly connect together unit analyzers (FFT's, feature extractors, etc.) to create analysis network(s)
    • specify initial parameters
    • write control code, potentially in concurrent ChucK processes (shreds). New shreds are specified simply as functions, and then sporked when they are to be instantiated as running code. The advantage is that all shreds are automatically synchronized and scheduling is non-preemptive, which precludes many concurrent programming pitfalls (deadlock, race conditions, and data corruption). One can focus on the content of the algorithm, rather than on underlying language details
    • run!! observe!!
    • tweak via on-the-fly programming: both parameters and underlying structures
    • done!!
    • adding synthesis would be straightforward (congruent but distinct analysis/synthesis framework)

another, high-level view

c++/java: idea -> c++ -> pain -> compile -> run -> pain -> done;

chuck: idea => chuck => run/edit/tweak => done;

case studies

We now present three case studies of music analysis systems that have been built in ChucK.
  • The first is a trained vowel identification system with a GUI. It illustrates the conciseness of interactive ChucK code and the ease with which GUIs can be incorporated.
  • The second is an implementation of Bergstra et al.'s genre classification system. It illustrates how easily a state-of-the-art MIR feature extraction and classification system can be ported to ChucK and applied to real-time audio input.
  • The third is a set of example applications created by Stanford students in a beginning computer music course taught by one of the authors. It illustrates ChucK's ability to support new work by novices in MIR and computer music AAAge?.
The code for case studies 1 and 2 is available to the public (see Section AAA).

1: Perry GUI vowel

In order to visualize spectral properties of a sound in real-time, a filter-based sub-band spectral analyzer was implemented in ChucK using the MAUI framework for graphical widgets \cite{maui}. The GUI appears in Figure AAA; sliders update to indicate the power in each octave sub-band in real-time. The entire program for this task is 37 lines of code, 12 of which manage the display.
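The paragraph does not specify which filters were used, so the following is only one plausible reading. The Python sketch below is not the authors' ChucK code. It runs one band-pass biquad per octave and reports the mean-square output of each band; these per-band powers are what the sliders would display. Three things are assumptions:
  • the filter design, taken from the Audio EQ Cookbook band-pass with constant 0 dB peak gain;
  • a Q of √2, roughly one octave wide;
  • the function names and the band centers.

```python
import math


def bandpass_coeffs(f0, sample_rate, q):
    """Audio EQ Cookbook band-pass biquad (constant 0 dB peak gain), normalized by a0."""
    w0 = 2 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def band_power(signal, f0, sample_rate, q=math.sqrt(2), skip=0):
    """Mean-square output of one band-pass filter, ignoring the first `skip` samples."""
    (b0, b1, b2), (a1, a2) = bandpass_coeffs(f0, sample_rate, q)
    x1 = x2 = y1 = y2 = 0.0
    acc, count = 0.0, 0
    for n, x in enumerate(signal):
        # Direct form I difference equation.
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        if n >= skip:                     # skip the filter's start-up transient
            acc += y * y
            count += 1
    return acc / count if count else 0.0


def octave_powers(signal, sample_rate, centers, skip=0):
    """Power in each octave band, keyed by band center frequency (Hz)."""
    return {f0: band_power(signal, f0, sample_rate, skip=skip) for f0 in centers}


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr)]
    for f0, p in octave_powers(tone, sr, [125, 250, 500, 1000, 2000, 4000], skip=2000).items():
        print(f0, round(p, 4))
```

A real-time version would keep the filter state across audio blocks instead of processing a whole buffer at once. The per-band arithmetic would be the same.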

With a simple modification, this code and display can be used to train a classifier on any task for which sub-band power is a useful feature, and then to classify new inputs (with useful visual feedback). For example, the following requires under 100 lines of code:
  • training a nearest-neighbor classifier to recognize each of 11 spoken vowels (and silence);
  • classifying the vowel of new inputs;
  • constructing the training and feedback graphical interface shown in Figure AAA.
This count includes the nearest-neighbor classifier coded from scratch (i.e., the KNN ChucK class was not used here).
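The from-scratch classifier mentioned above is not reproduced here. Instead, a minimal k-nearest-neighbor sketch in Python shows how little such a classifier needs. The class name is hypothetical, and Euclidean distance with a majority vote are assumptions. Training just stores labeled sub-band-power vectors, and classification ranks them by distance to the new input.

```python
import math
from collections import Counter


class KNearestNeighbor:
    """k-nearest-neighbor classifier over fixed-length feature vectors (e.g. sub-band powers)."""

    def __init__(self, k=1):
        self.k = k
        self.examples = []                    # list of (features, label)

    def add(self, features, label):
        """Store one labeled training example."""
        self.examples.append((list(features), label))

    def classify(self, features):
        """Majority label among the k closest stored examples (Euclidean distance)."""
        if not self.examples:
            raise ValueError("no training examples yet")
        ranked = sorted(self.examples, key=lambda ex: math.dist(ex[0], features))
        # Counter keeps first-seen order, so vote ties go to the nearer label.
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    knn = KNearestNeighbor(k=1)
    knn.add([1.0, 0.0, 0.0], "a")
    knn.add([0.0, 1.0, 0.0], "i")
    knn.add([0.0, 0.0, 0.0], "silence")
    print(knn.classify([0.9, 0.1, 0.0]))
```

With k = 1 this reduces to plain nearest-neighbor. Larger k trades some responsiveness to individual examples for robustness to noisy training frames.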

Clearly, ChucK allows for concise specification of signal processing analysis tasks and for real-time GUI display and input, bitch!

2: Bergstra Adaboost

An obvious concern for MIR researchers interested in ChucK is whether more complex feature extraction and learning algorithms are possible. To address this, we have implemented one of the genre classification systems of Bergstra et al. described in \cite{bergstra}. They use AAA standard audio features, including AAA, AAA, and AAA, as input to an AdaBoost classifier with decision stumps as weak learners. This system performed AAAvery well at MIREX 200AAA, classifying with AAA% accuracy on a dataset containing AAA hours of music from AAA genres/artists.
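Bergstra et al.'s exact configuration is left as placeholders above, so the following is only a generic sketch in Python rather than ChucK. It implements binary discrete AdaBoost, using one-feature threshold stumps as weak learners. Multiclass genre or artist labels would need a variant such as AdaBoost.M1 or AdaBoost.MH. All function names are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Best one-feature threshold classifier under sample weights w (which sum to 1)."""
    best = None                               # (weighted_error, feature, threshold, polarity)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump, row):
    _, f, thr, polarity = stump
    return polarity if row[f] >= thr else -polarity


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)            # avoid log(1/0) when a stump is perfect
        if err >= 0.5:
            break                             # weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified examples gain weight, correct ones lose it; then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [-1, -1, -1, 1]                       # logical AND: no single stump is correct
    model = adaboost(X, y, rounds=3)
    print([predict(model, row) for row in X])
```

Consider the four-point AND problem, where no single stump is correct. Here three boosting rounds combine a constant stump with one stump on each feature, and together they classify all four points. The number of rounds is the kind of parameter the learning-framework notes suggest exposing to the user.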

The features used in \cite{AAA} are included as standard features in ChucK. The following code specifies their extraction from the microphone input:


Furthermore, the following code constructs the classifier:


We have embedded the code above in a simple, keyboard-driven interface for real-time, on-the-fly training and classification of audio. The user can specify that the incoming audio provides examples of a particular class (thereby initiating the construction of labeled Instances from features extracted from this audio), initiate the training rounds of AdaBoost using all available Instances, or direct the trained classifier to classify the incoming audio, outputting the class label to the screen.
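The interface is described only in prose. A hypothetical Python sketch of its control logic makes the three modes concrete:
  • a digit key starts recording labeled instances for that class;
  • `t` trains on everything collected so far;
  • `c` switches to classifying each new feature frame.
The key bindings, the class names, and the stand-in nearest-centroid classifier are all assumptions, not the ChucK implementation.

```python
class TrainClassifyLoop:
    """Hypothetical controller for a keyboard-driven train/classify interface."""

    def __init__(self, classifier):
        self.classifier = classifier          # any object with train(instances) and classify(features)
        self.instances = []                   # labeled (features, label) pairs
        self.mode = "idle"
        self.label = None

    def on_key(self, key):
        """Digit: record examples of that class; 't': train; 'c': classify; else idle."""
        if key.isdigit():
            self.mode, self.label = "record", int(key)
        elif key == "t":
            self.classifier.train(self.instances)
            self.mode = "idle"
        elif key == "c":
            self.mode = "classify"
        else:
            self.mode = "idle"

    def on_frame(self, features):
        """Called once per analysis frame; returns a label only in classify mode."""
        if self.mode == "record":
            self.instances.append((list(features), self.label))
        elif self.mode == "classify":
            return self.classifier.classify(features)
        return None


class NearestCentroid:
    """Tiny stand-in classifier: predicts the label whose mean feature vector is closest."""

    def __init__(self):
        self.centroids = {}

    def train(self, instances):
        sums, counts = {}, {}
        for feats, label in instances:
            acc = sums.setdefault(label, [0.0] * len(feats))
            for i, v in enumerate(feats):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def classify(self, features):
        return min(self.centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(self.centroids[lab], features)))
```

Because training is just a method call on whatever has been collected, the user can alternate between recording, training, and classifying as often as desired while audio keeps streaming.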

Testing of this system reveals the impressive strength of Bergstra's features and classifier. Using only AAA seconds of audio from each of two artists, the system is able to classify new songs with AAA% accuracy. Furthermore, the keyboard interface makes it easy to apply this system to new tasks, such as speaker identification and instrument identification, without modifying the algorithm or features.

3: Ge stanford stuff yo

conclusion & map out future work


We have made available the ChucK learning infrastructure described in Section AAA, code for case studies 1 and 2 described in Section AAA, as well as code for several other example tasks, as part of the Small Music Information Retrieval toolKit (SMIRK) \footnote{smirk webpage}. The predecessor to this toolkit is the Small Musically-Expressive Laptop Toolkit (SMELT), which provides a set of user interface examples and templates in ChucK, and which we have found to be an extremely useful educational tool for newcomers to ChucK interested in laptop performance. SMIRK will analogously provide a permanent repository for music information retrieval infrastructure and examples using ChucK.

Imminent work

Several improvements to the music information retrieval capabilities of ChucK are currently underway. First, support for asynchronous file I/O will allow the reading and writing of large datasets in a way that does not interfere with audio production. Second, additional classifiers are being added, and support for a classifier plug-in framework is being investigated. Support for modeling is also a top priority, beginning with hidden Markov models.

Discussion /take-home points

  • ChucK has expanded to a new frontier
  • We will be continuing to work on it, driven by our own applications as well as requests from the ChucK user community. We are excited at the possibility of this community expanding to include music information retrieval researchers.