ChucK/ISMIR2008


Our fancy ISMIR paper outline.

Notes for us

6 page limit.

Title: Support for MIR prototyping and real-time applications in the ChucK programming language

Key points:

  • There is no general purpose language for MIR prototyping that both gives access to analysis building blocks and allows for low-level coding
  • This sort of framework can be really useful for fast prototyping, flexible coding, and education
  • Furthermore, no existing framework combines MIR analysis with performance / synthesis; ChucK does this.
  • We're going to show what is available in the language, and examples of how to work with it to accomplish MIR tasks

Abstract

In this paper, we discuss the recent additions of audio analysis and machine learning infrastructure to the ChucK music programming language that make it a suitable and unique tool both for music information retrieval (MIR) system prototyping and for applying MIR algorithms in real-time music performance contexts. From its inception, ChucK has offered high-level control over building block components paired with low-level, sample-synchronous manipulation in a programming environment that encourages on-the-fly experimentation. The new analysis and learning capabilities of the language preserve this breadth of control options and ChucK's "do it yourself" approach to systems design, allowing the programmer to experiment with new features, signal processing techniques, and learning algorithms with ease and flexibility. Additionally, ChucK's new capabilities are tightly integrated into its synthesis framework, making it trivial to use the results of analysis and learning tasks to drive real-time music creation and interaction. In this paper, we motivate and describe the recent additions to the language, outline a ChucK-based approach to rapid MIR prototyping, and present three case studies in which we have applied ChucK to audio analysis and MIR tasks. We also introduce a new toolkit to facilitate experimentation with ChucK's new capabilities, which we hope will provide a complementary working framework for MIR researchers and lower the barriers for applying MIR algorithms in live music performance.


Introduction

ChucK began as a high-level programming language for music and sound synthesis, whose design goals included offering the musician user a wide breadth of programmable control-- from the structural level down to the sample level-- using a clear and concise syntax, and employing a set of abstractions and built-in objects to facilitate rapid prototyping and live coding. We have recently expanded the language to provide support for audio analysis and machine learning, with two primary goals: first, to offer real-time and on-the-fly analysis and learning capabilities to computer music composers and performers; and second, to offer music information retrieval (MIR) researchers a new platform for rapid prototyping and for easily porting algorithms to a real-time performance context. Our previous papers \ref[icmc2007] and \ref[icmc2008] focus on the former goal; in this paper we address the latter.

We begin in section AAA by suggesting that music performance can and should be more significant among the focal points and application domains of MIR research, and we motivate the need for additional shared tools between MIR and performance. We also touch on the state of prototyping toolkits in MIR, and we describe the ChucK language as it is used for music creation, including prototyping and live coding systems. In section AAA, we describe in some detail how we have incorporated analysis and learning into the language, with attention both to preserving the flexible and powerful control that makes ChucK suited for prototyping and experimentation, and to tightly and naturally integrating the new functionality with ChucK's synthesis framework. Sections AAA and AAA illustrate the new potential of ChucK as an MIR rapid prototyping workbench, introducing an example working pipeline for prototyping an MIR task and presenting three case studies of how we have used ChucK to perform and teach music analysis. Finally, in Section AAA we discuss ongoing work to improve ChucK as an MIR tool and announce the release of a repository of examples and supporting code for MIR researchers desiring to experiment with the language.

(ge:) something about the importance of rapid prototyping, and also how rapid prototyping enables new MIR-based performance practices.

Background & Motivation

(re, ge)

MIR and performance (re)

Prototyping in MIR

Let us briefly digress to motivate our other major goal in this work: to provide a new rapid prototyping environment for MIR research. We summarize our basic requirements for an MIR prototyping environment as follows: the ability to design new signal processing algorithms, audio features, and learning algorithms; the ability to apply new and existing signal processing, feature extraction, and learning algorithms in new ways; and the ability to do these tasks quickly, by taking advantage of high-level building blocks for common tasks and by specifying the system either via a GUI or via concise and clear code. There do exist several programming environments for MIR that accommodate many of the above requirements, including Matlab, M2K, Weka, and AAAmore?Marsyas?Clam?jAudio?, and their popularity suggests that they meet the needs of many MIR research tasks. We propose that ChucK meets all of these requirements and inhabits a unique place in the palette of tools at the MIR researcher's disposal, not only because of its easy accommodation of real-time and performance tasks, but also because of its particular approach to dealing with time. AAAchange?


Music information retrieval and music performance

(note: work in the transition from prototyping -> performance, citing that prototyping naturally lends itself to on-the-fly, real-time control and tuning of MIR-based performance systems.)

Research in music information retrieval has primarily focused on analyzing and understanding recorded music and other non-performative musical representations and metadata. However, many music information retrieval tasks, such as mood and style analysis, instrumentation and harmony identification, and transcription and score alignment, are directly relevant to real-time interactive performance. A core focus of MIR is building computer systems that understand musical audio at a semantic level (\cite[downie?]), so that humans can search through, retrieve, visualize, and otherwise interact with musical data in a meaningful way. Making sense of audio data at this higher level is also essential to "machine musicianship", wherein the performing computer-- like any musically trained human collaborator-- is charged with interacting with other performers in musically appropriate ways \cite[rowe].

Despite the shared need to bridge the "semantic gap" between low-level audio features and higher-level musical properties, there does not exist a shared tool set for accomplishing this task in computer music and MIR. MIR researchers employ an abundance of tools for performing signal processing, feature extraction, machine learning, and computer modeling to better understand audio data. Some of these tools are general-purpose signal processing or machine learning packages, such as Matlab or Weka AAAcite, and others, such as CLAM, MARSYAS, M2K, and jAAA, have been designed specifically for MIR AAAcite. Most of these languages and frameworks were not designed for computer music performance: most do not perform synthesis and are not suited to real-time computation; as a result, none have widespread use among computer musicians. On the other hand, most computer music languages do not readily accommodate analysis of audio in the language; for example, spectral analysis and processing tasks must be coded as C++ externals in order to be used in SuperCollider or Max/MSP. Enterprising composers and performers have of course been writing such externals and standalone code to accomplish pitch tracking, score following, harmonic analysis, and other tasks for many years. However, the requirement that specialized code be pushed into externals is a barrier to rapid development and experimentation by programming novices and seasoned researchers alike. So, despite the many shared tasks of MIR researchers and computer musicians, the dominant programming paradigms in the two fields do not include natural avenues for code sharing or collaboration between these groups.

Computer musicians would undoubtedly benefit from lower barriers to adapting state-of-the-art MIR algorithms for their real-time performance needs. We additionally posit that MIR researchers can benefit from increased collaboration with real musicians. Many standard MIR research tasks mentioned above face challenges including copyright restrictions on obtaining data or releasing systems to the public, and difficulty or expense in obtaining ground truth. MIR systems for tasks such as transcription or mood or style classification in the context of a particular musical composition or performance paradigm can circumvent such problems: the relevant data is freely available (the composer or ensemble wants you to have it!), the ground truth may be well-defined, or it may be easy to construct (composers and performers have built-in incentive for the system to perform well). There is also the benefit that an MIR researcher can make an impact on people's experiences with music, which is sadly still hard for those researchers unaffiliated with the music industry.

In summary, one major goal of our work with ChucK is to provide a tool that meets the needs of computer musicians requiring MIR-like analysis algorithms, and of MIR researchers interested in producing tools that can be used in real-time, possibly in conjunction with sound synthesis and performance. We also hope that by making work at the intersection of MIR and computer music easier, we will encourage more work in this area, and facilitate a richer cross-pollination of these two fields than has been happening.

(note: relate back to prototyping, arguing that if we can do this in live performance, then it would naturally feedback into research)


Scratch: Research in music information retrieval has primarily focused on analyzing and understanding recorded music and other non-performative musical representations and metadata. However, many music information retrieval tasks, such as analyzing the mood, rhythm, style, harmony, or instrumentation of music, are directly relevant to real-time interactive performance. Furthermore, many established approaches to these tasks are appropriate to a real-time context. However, there has been a paucity of tools and environments that accommodate MIR-style analysis and learning in addition to real-time synthesis and interactive performance. We have recently augmented the ChucK music programming language with analysis and learning capabilities in an effort to begin to bridge the gap between MIR and music performance, and to allow MIR researchers and computer music performers to leverage each other's expertise, tools, and experiences.

Also, cite Raphael in here somewhere.

ChucK (ge)

  • ChucK (ge)
    • A short history
    • Examples of how to use the language (esp. UGens & time control)
    • OTF & timeliness make it suited for prototyping, but not originally for analysis
  • In summary, our goal was to modify ChucK & build tools in it to foster a tighter connection between MIR and performance, and to provide fast prototyping capability in a new framework for MIRers

ChucK is an ongoing, open-source research experiment in designing a computer music programming language from the "ground up". A main focus of the design was the precise programmability of time and concurrency, with an emphasis on encouraging concise, readable code. System throughput for real-time audio remains an important consideration, but first and foremost, the language was designed to provide maximal control and flexibility for the audio programmer. In particular, the various components of the design are as follows:

  • flexibility: allows programmers to specify both high-level and low-level time-based operations, in a single unified and well-defined mechanism
  • concurrency: programmers can craft and precisely synchronize parallel code modules that share both data and time
  • readability: the language attempts to provide a strong correspondence between code structure, time, and audio building blocks; ChucK is fairly good at doing this, as the language is increasingly being used as a teaching tool in computer music programs, including at Princeton, Stanford, Georgia Tech, and CalArts.
  • a do-it-yourself approach: by combining the ease of high-level computer music environments with the expressiveness of lower-level languages, ChucK is able to support high-level musical/sonic representations, as well as the prototyping and implementation of low-level, "white-box" signal-processing elements in the same language.
  • on-the-fly: by leveraging the ChucKian approach to programming, it is possible and often beneficial to write and experiment with code on-the-fly, allowing programs to be edited as they run.

(note: provide concise example of time/concurrency)

There are no fixed control rates in the language, explicitly leaving programmers to define their own rates for various parts of the system. For example, it is possible to assert control on any Unit Generator (UGen) at any point in time, and at any rate, in a sample-synchronous manner. Furthermore, many processes can share a central notion of time, making it possible to naturally reason about parallel code based on time. Next, the timing mechanism lends itself directly to a concurrent programming model, which is essential to expressively capture parallelism. Multiple processes (called shreds), each advancing time in its own manner, can be synchronized and serialized directly from the timing information. Using this timing/concurrency model, on-the-fly programming can be carried out by exchanging time-aware code segments. Together, these components form a system for experimenting with sound synthesis for composition and performance and, more recently, for the creation of real-time analysis and MIR-based programs.
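
Below is a minimal sketch illustrating the timing and concurrency mechanisms just described (the patch, rates, and frequency range are arbitrary choices for illustration): time is advanced explicitly by chucking durations to now, and the same function can be sporked as multiple shreds, each advancing time at its own rate.

// a simple synthesis patch: sine oscillator to the dac
SinOsc s => dac;

// a control function, to be run as its own shred
fun void wobble( dur rate )
{
    while( true )
    {
        // choose a new frequency
        Std.rand2f( 200, 800 ) => s.freq;
        // advance time; this also yields to other shreds
        rate => now;
    }
}

// spork two concurrent shreds, each advancing time at its own rate
spork ~ wobble( 100::ms );
spork ~ wobble( 1::second );

// keep the parent shred (and its children) alive
while( true ) 1::second => now;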


scratch More specifically, ChucK enables time itself to be computable, and allows a program to be self-aware in time and to control the rate of its own progress through time. There are no fixed control rates in the language, explicitly leaving programmers to define their own rates for various parts of the system. For example, it is possible to assert control on any Unit Generator (UGen) at any point in time, and at any rate, in a sample-synchronous manner. Furthermore, many processes can share a central notion of time, making it possible to naturally reason about parallel code based on time. Next, the timing mechanism lends itself directly to a concurrent programming model, which is essential to expressively capture parallelism. Multiple processes (called shreds), each advancing time in its own manner, can be synchronized and serialized directly from the timing information. Thus arises our concept of a strongly-timed language, in which processes have precise control over their own timing and synchronization. Using this timing/concurrency model, on-the-fly programming can be carried out by exchanging time-aware code segments. To further facilitate this, the Audicle provides a graphical environment in which to write ChucK programs on-the-fly, and to visualize the programs in terms of code, audio synthesis, concurrency, and timing, all in real-time. Together, ChucK, on-the-fly programming, and the Audicle form a system and workbench for experimenting with sound synthesis for composition and performance and, more recently, for the creation of real-time analysis and MIR-based programs.

Recent additions to ChucK to allow for prototyping and realtime MIR

Unit analyzers

(ge)

  • Say what they are, ...
  • Available features!

In 2007, the authors introduced a language-based solution to combining audio analysis and synthesis in the same high-level programming environment of ChucK (cite icmc 2007, again?). The new analysis framework inherited the same sample-synchronous precision and clarity of the existing synthesis framework, while adding analysis-specific mechanisms where appropriate. The solution consisted of three key components. First, we introduced the notion of a Unit Analyzer (UAna), similar to its synthesis counterpart, the Unit Generator (UGen), but augmented with a set of operations and semantics tailored towards analysis. Second, an augmented dataflow model with new datatypes, operators, and objects was provided both in the language and in the underlying system design/implementation. Third, the analysis framework makes use of the existing timing, concurrency, and on-the-fly programming mechanisms in ChucK as a way to precisely control analysis processes.

For example, it is possible to instantiate FFT/IFFT objects for spectral analysis, pass their results via the analysis network to feature extractors, and allow concurrent processes to operate and be manipulated in a truly sample-synchronous fashion. The programmer has complete and dynamic control over analysis parameters such as FFT/IFFT sizes, analysis windows, hop sizes, and how often (down to the sample) to take an FFT or to extract features. (note: a short example of FFT + centroid extraction appears after the list below.) This programming model is capable of representing precisely timed algorithms just as low-level languages such as C++ and Java can. The primary advantages are threefold:

  • conciseness and readability: because ChucK takes care of the real-time audio and buffering in the audio synthesis/analysis frameworks, and due to the tailored-for-audio nature of the language, the same algorithm or system can be prototyped/implemented with much less code, often with up to a factor of 10 reduction in code and in development time (note: validate this claim somehow). As an example, a simple real-time system to do AAA in ChucK takes AAA lines of code, whereas the same system in C++/Marsyas/CLAM takes AAA lines of code. This is not to say that Marsyas/CLAM are verbose, but rather that verbosity is encouraged by the underlying languages (C++, Java).
  • rapid-turnaround experimentation: this is a key advantage. Through the application of on-the-fly programming and ChucK's concise audio programming syntax, one can quickly prototype systems and sub-systems, changing parameters as well as the underlying structure of systems and experiencing the results almost immediately. (note: maybe clarify actual advantages, such as cutting down compile time, trying things immediately, etc.)
  • concurrency: with ChucK, it's possible and straightforward to write truly sample-synchronous, concurrent code for both audio synthesis and analysis, which is something that is difficult to achieve in C++/Java, or any libraries built in these languages. Due to the languages' (C++ and Java) support for preemptive, thread-based concurrency, it can be extremely challenging (and/or inefficient) to represent different parts of analysis/synthesis audio programs in different threads in a sample-synchronous way. Yet, such approaches to systems design can be highly beneficial. For example, consider a system where we want to perform multi-rate analysis/feature-extraction in parallel (e.g., different processes extract features at different rates and with different parameters), and collect and use the result in yet another process. This can be achieved in a few dozen lines of ChucK, whereas the same system in C++/Java would need to contend with issues of thread-instantiation, synchronization, data consistency, as well as buffering and bookkeeping of audio data and feature data. (figure: show multi-rate FFT/feature extraction)
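
For concreteness, the following minimal sketch (the feature choice, FFT size, window, and hop sizes are arbitrary illustrations, not prescriptions) shows an FFT feeding a spectral centroid extractor, with a second feature (spectral flux) extracted concurrently at a different hop size; this is the kind of multi-rate network described in the last bullet, and also serves as the short FFT + centroid example noted above.

// analysis network: FFT feeding two feature extractors
adc => FFT fft =^ Centroid centroid => blackhole;
fft =^ Flux flux => blackhole;

// analysis parameters (changeable at any time)
1024 => fft.size;
Windowing.hann( 1024 ) => fft.window;

// extract one feature every 'hop', forever
fun void extract( UAna feature, string name, dur hop )
{
    while( true )
    {
        // compute the feature (implicitly computes the upstream FFT)
        feature.upchuck() @=> UAnaBlob blob;
        <<< name, blob.fval(0) >>>;
        // advance time by this shred's own hop size
        hop => now;
    }
}

// two concurrent extraction shreds at different rates
spork ~ extract( centroid, "centroid:", 512::samp );
spork ~ extract( flux, "flux:", 1024::samp );

// keep going
while( true ) 1::second => now;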

This flexibility comes with a tradeoff: system performance and throughput. The same system implemented in reasonable ChucK code would likely run much more efficiently in an optimized C++ implementation, due to the low-level nature of languages like C++ and Java and to their accompanying optimizing compilers. In this regard, it may be desirable to leverage the flexibility and rapid experimentation abilities of ChucK to prototype a system (and/or parts of a system) and, if needed, then implement a "production" system in C++/CLAM/Marsyas. Using ChucK as a prototyping workbench for MIR not only can drastically reduce coding time, but can also suggest new directions and ideas. For researchers experimenting with new MIR algorithms, such a prototyping stage can be instrumental in crafting new systems and in testing the feasibility of new ideas. We will present some example "working pipelines" for MIR prototyping in Section AAA.

Learning framework

A natural consequence of ChucK's new analysis capabilities is that analysis results can be computed and used as features whose relationship to high-level musical concepts can be learned via labeled examples and standard classification algorithms. Learning by example is a common and powerful technique in MIR, and this paradigm has been applied successfully to high-level concepts such as genre and artist labels \cite{aaa}, mood \cite{aaa}, and transcription \cite{mandel}. Exemplar-based learning has also been important in computer music, where, for example, neural networks have been used in many tasks for over a decade \cite{aaa}. ChucK's learning framework was designed with the recognition that a general tool for learning could be useful for both MIR analysis tasks and creative compositional tasks, and with the acknowledgment that classification of new inputs is a task that can naturally be performed in real time, as new input becomes available.

The learning framework is built in ChucK.

Weka \cite{weka} is a popular Java framework for applied machine learning in MIR and other fields, and we have designed ChucK's object-oriented learning architecture and class names to mirror aspects of Weka's to reduce the learning curve for MIR researchers familiar with Weka. Each data point, which consists of a feature vector and (optionally) a class label, is represented by an Instance object. A dataset, consisting of one or more Instance objects, is represented as an Instances object. All classifiers inherit from the Classifier class, which has functions for training on an Instances dataset and classifying a new Instance.
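
A minimal sketch of how these classes might fit together follows; the method names not shown elsewhere in this paper (setLabel, add, train) are assumptions based on the Weka-like design rather than a documented API, and a FeatureCollector fc is assumed to exist as in the code of Section AAA.

// a labeled dataset and a classifier (KNN is a Classifier subclass)
Instances training;
KNN knn;

// build a labeled training example from the current feature vector
Instance i;
i.setFeatures( fc.featureVector() );
i.setLabel( 0 );               // (assumed) assign a class label
training.add( i );             // (assumed) add to the dataset

// train on the collected examples, then classify a new input
knn.train( training );         // (assumed) Classifier training call
Instance unknown;
unknown.setFeatures( fc.featureVector() );
knn.test( unknown ) => int predictedClass;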


so we have designed ChucK's learning infrastructure based on Weka to make it object oriented and extensible in the future, and to reduce the learning curve for MIR users familiar with Weka. (note: cite, perhaps weaken link to weka).

Describe FeatureCollector, Instance, Instances, Classifier.

Describe available features in a table.

Make note that this is all implemented in chuck, so users can not only change classification by adjusting parameters (e.g., number of rounds for boosting), but also by changing the classifiers themselves (e.g., AdaBoost.M1 can easily be adapted to AdaBoost.MH).

Mention that an example of learning appears in Section AAA.

One consequence of making standard classification algorithms available in a real-time context is that the training stage itself can be executed on-the-fly, during a rehearsal or even a performance. Given an appropriate control interface for providing class labels in real-time, a user can choose the concept to be learned on the spot. For example, in a live-coding improvisation with a violinist and a vocalist, the programmer may wish the computer to execute one function whenever the singer sings, and another whenever the violinist plays. After specifying the desired features and classifier in the code, the programmer can direct the computer to generate new instances of the appropriate class from the incoming audio as each musician plays, train the classifier on these labeled instances, and then use the classifier's real-time predictions to trigger the appropriate functions during the performance.

We refer the reader to \cite[icmc2008] for a deeper exploration of learning in real-time performance; here, we underscore that the design goals of the learning component include making learning easy enough to implement in such a context, which is even more time-critical than a prototyping context. At the same time, the fact that feature extraction can be performed in ChucK with sample-level control, and that the learning algorithms themselves are written in ChucK, ensures that specificity and flexibility are not sacrificed in the pursuit of supporting rapid development.

Let us point out that the implementation of ChucK's analysis and learning capabilities preserves its suitability for rapid prototyping, and extends this capability to many MIR tasks. In particular, the user retains sample-level control over audio processing, allowing easy implementation of new feature extraction methods and signal processing algorithms directly in the language. Learning algorithms are also implemented in the language, affording the user arbitrary modification of these algorithms at any time, without the need to recompile. Additionally, the user can choose from many standard MIR features and several standard classifiers that have already been implemented, and use these out-of-the-box. Furthermore, the ability to code features and algorithms precisely in ChucK, within its object-oriented framework, means that the user can create new "white-box" features and classifiers for immediate usage. In summary, ChucK meets all our requirements for an MIR prototyping language, as we've defined above.

introduce some "working pipelines"

intent: give reader a feel for "if I have an idea and want to code it quickly, how would this work in chucK?" (and why would I not use matlab, m2k, etc.) ge? the working pipeline to my brain is busted

one potential pipeline

  • Let's say a researcher or computer music performer gets an idea for an analysis/MIR-based algorithm (e.g., a new way to do speaker/instrument identification, or vowel-consonant classification, or perhaps a way to gauge the acoustics of the space the performance is currently in)
  • with ChucKian prototyping, one can start writing relevant code immediately:
    • quickly instantiate and connect unit analyzers (FFTs, feature extractors, etc.) to create analysis network(s)
    • specify any initial parameters
    • write control code, potentially in concurrent ChucK processes (shreds). New shreds are specified simply as functions, and then sporked when they are to be instantiated as running code; the advantage here is that all shreds are automatically synchronized and non-preemptive, precluding many concurrent-programming pitfalls (deadlock, race conditions, and data inconsistency). One can focus on the content of the algorithm, rather than on underlying language details and synchronization bookkeeping.
    • as soon as any preliminary parts of the system are written, we can immediately run the code and observe!
    • as changes are desired, the programmer can tweak both parameters and underlying structures via on-the-fly programming (see the sketch after this list). For example, the programmer can easily modify the STFT window type, window size, and FFT size on-the-fly; the hop size is a natural consequence of advancing time in the analysis loop, so controlling/modifying hop sizes dynamically is also trivial. More importantly, the consequences of these changes can be experienced/verified immediately (e.g., on the order of milliseconds). Such a near-zero-delay prototyping feedback loop can be essential for tuning a system for optimal performance. This is where ChucK's powers can be best harnessed in a prototyping situation. The intent here is that a developer would spend most of their time in this stage (as opposed to an "offline coding" stage).
    • iterate until done!
  • one advantage of such a pipeline is that the programmer is encouraged to write the system to run in real time as much as possible, 1) making it straightforward to adapt it to non-real-time operation, and 2) making it immediately adaptable to real-time performance settings
  • adding synthesis would be straightforward (congruent but distinct analysis/synthesis frameworks)
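
As referenced in the list above, here is a minimal sketch of this kind of on-the-fly parameter tweaking (the specific parameter values and the 5-second trigger are arbitrary): analysis parameters are ordinary variables and UAna parameters, so they can be re-chucked at any time while the analysis loop runs.

// analysis network and initial parameters
adc => FFT fft =^ Centroid centroid => blackhole;
1024 => fft.size;
Windowing.hann( 1024 ) => fft.window;
512 => int hop;

// change window type, FFT size, and hop size after 5 seconds
fun void tweak()
{
    5::second => now;
    2048 => fft.size;
    Windowing.hamming( 2048 ) => fft.window;
    256 => hop;
}
spork ~ tweak();

// analysis loop; picks up the new parameters as soon as they change
while( true )
{
    centroid.upchuck();
    hop::samp => now;
}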

another, high-level view

c++/java: idea -> c++ -> pain -> compile -> run -> pain -> done;

chuck: idea => chuck => run/edit/tweak => done;

case studies

We now present three case studies of music analysis systems that have been built in ChucK. The first is a trained vowel identification system with a GUI interface, which illustrates the conciseness of interactive ChucK code and ease with which GUIs can be incorporated. The second is an implementation of Bergstra et al.'s genre classification system, which illustrates the ease of porting a state-of-the-art MIR feature extraction and classification system to ChucK and applying it to real-time audio input. The third is a set of example applications created by Stanford students in a beginning computer music course taught by one of the authors, which illustrates the ability of ChucK to support new work by novices in MIR and computer music AAAge?. The code for case studies 1 and 2 is available to the public (see Section AAA).

1: Perry GUI vowel

In order to visualize spectral properties of a sound in real-time, a filter-based sub-band spectral analyzer was implemented in ChucK using the MAUI framework for graphical widgets \cite{maui}. The GUI appears in Figure AAA; sliders update to indicate the power in each octave sub-band in real-time. The entire program for this task is 37 lines of code, 12 of which manage the display.

With a simple modification, this code and display can be used to train a classifier on any task for which sub-band power is a useful feature, then perform classification (and useful visual feedback) on new inputs. For example, code to train a nearest-neighbor classifier to recognize each of 11 spoken vowels (and silence), to classify the vowel of new inputs, and to construct the training and feedback graphical interface shown in Figure AAA, requires under 100 lines of code. This includes the nearest-neighbor classifier coded from scratch (i.e., the KNN ChucK class was not used here).
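
To give a flavor of what "coded from scratch" looks like in ChucK, the following is a minimal, illustrative nearest-neighbor lookup over stored feature vectors; it is not the actual case-study code, and the capacity and dimensionality are placeholders.

// capacity: up to 100 training examples, 4 features each (placeholders)
float examples[100][4];
int labels[100];
0 => int count;   // number of training examples stored so far

// return the label of the stored example closest to 'query'
fun int nearest( float query[] )
{
    0 => int best;
    1000000000.0 => float bestDist;
    for( 0 => int n; n < count; n++ )
    {
        0.0 => float d;
        for( 0 => int k; k < 4; k++ )
        {
            query[k] - examples[n][k] => float diff;
            d + diff * diff => d;
        }
        if( d < bestDist ) { d => bestDist; n => best; }
    }
    return labels[best];
}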

Clearly, ChucK allows for concise specification of signal processing and analysis tasks and for real-time GUI display and input.

(perry: estimation of lines of code to do the same in C++/C/Java)

2: Bergstra Adaboost

Of obvious concern to MIR researchers interested in ChucK is whether more complex feature extraction and learning algorithms are possible. To address this, we have implemented one of the genre and artist classification systems of Bergstra et al. described in \cite{bergstra}. They use eight types of standard audio features, including FFT coefficients, real cepstral coefficients, mel-frequency cepstral coefficients, zero-crossing rate, spectral spread, spectral centroid, spectral rolloff, and LPC coefficients. Means and variances are computed for each feature over a segment of consecutive frames, then classified by an AdaBoost classifier using decision stumps as the weak learner. This system performed in the top two submissions for genre and artist classification at MIREX 2005.

All eight features used in \cite{AAA} are included as standard features in ChucK. The following code specifies their extraction from the microphone input and instantiates an AdaBoost classifier:

adc => FFT fft => FeatureCollector fc; // extract fft
adc => ZCR zcr => fc; // extract other features from raw audio
adc => MFCC mfcc => fc;
adc => RCC rcc => fc;
fft =^ SpectralSpread spread => fc; // extract spectral features from fft
fft =^ Centroid centroid => fc;
fft =^ Rolloff rolloff => fc;

AdaBoostMH adaboost; // ... set parameters of classifier

To compute all features every frame (without overlap), and classify each frame with a trained classifier, one can spork the following function (training on labeled input is analogous):

while (true) {

    FRAME_LEN::samp => now;
    fc.upchuck(); // triggers computation
    new Instance @=> Instance @ i;
    i.setFeatures(fc.featureVector());
    //make a prediction
    adaboost.test(i) => int predictedClass; 
    // ... etc.

}

Incorporating Bergstra's computation of mean and variance for a set of contiguous frames involves a few more lines of code.
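
For instance, those extra lines might look roughly like the following sketch, which accumulates per-frame feature vectors into running sums (the segment length, feature dimensionality, and the assumption that featureVector() returns a float array are illustrative, not taken from the actual implementation):

// accumulate per-frame feature vectors over one segment
20 => int D;             // assumed feature-vector dimensionality
100 => int NUM_FRAMES;   // assumed frames per segment
float sum[20];
float sumsq[20];

for( 0 => int f; f < NUM_FRAMES; f++ )
{
    FRAME_LEN::samp => now;
    fc.upchuck();                      // compute this frame's features
    fc.featureVector() @=> float x[];
    for( 0 => int k; k < D; k++ )
    {
        sum[k] + x[k] => sum[k];
        sumsq[k] + x[k] * x[k] => sumsq[k];
    }
}
// per-dimension mean: sum[k] / NUM_FRAMES
// per-dimension variance: sumsq[k] / NUM_FRAMES - mean * mean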

We have embedded the code above in a simple, keyboard-driven interface for real-time, on-the-fly training and classification of audio. The user can specify that the incoming audio provides examples of a particular class (thereby initiating the construction of labeled Instances from features extracted from this audio), initiate the training rounds of AdaBoost using all available Instances, or direct the trained classifier to classify the incoming audio, outputting the class label to the screen.
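
A minimal sketch of such a keyboard interface, using ChucK's HID support, is shown below; the key bindings are hypothetical, and the actions are shown as print statements standing in for the training and classification code described above.

// open the default keyboard (device 0) as a HID device
Hid hid;
HidMsg msg;
if( !hid.openKeyboard( 0 ) ) me.exit();

while( true )
{
    hid => now;                        // wait for a keyboard event
    while( hid.recv( msg ) )
    {
        if( !msg.isButtonDown() ) continue;
        // hypothetical bindings by ASCII code: '1', '2', 'T', 'C'
        if( msg.ascii == 49 )       <<< "label incoming audio as class 0" >>>;
        else if( msg.ascii == 50 )  <<< "label incoming audio as class 1" >>>;
        else if( msg.ascii == 84 )  <<< "train AdaBoost on collected Instances" >>>;
        else if( msg.ascii == 67 )  <<< "classify incoming audio" >>>;
    }
}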

Testing of this system reveals the impressive strength of Bergstra's features and classifier; in fact, using only 6 seconds of audio from each of two artists, the system is able to classify new songs with over 80% accuracy. Furthermore, it is easy to use the keyboard interface to experiment with applying this system to new tasks, for example speaker identification and instrument identification, without modification to the algorithm or features.

3: Ge stanford stuff yo (aka ChucK as a MIR teaching tool)

In this case study, we evaluate the framework in the context of using ChucK to teach audio analysis and basic MIR in introductory computer music courses at Stanford University's Center for Computer Research in Music and Acoustics and at Princeton University. By leveraging the clarity and flexibility of representing analysis and MIR concepts in the language, as well as the rapid prototyping / on-the-fly programming aspects of the system, the instructors were able to teach the following topics to students familiar with the basics of ChucK:

  • practical real-time short-time Fourier analysis of actual audio signals (both recorded and live), demonstrating on-the-fly the effects of window type, window size, FFT size, and hop size on aspects of the analysis.
  • basics of audio features and their extraction algorithms, showing in real time how differences between audio signals are captured by spectral and time-domain features.
  • basics of feature-based classification, ranging from speaker identification, to vowel/consonant-based panning, to how learning might be used to recognize interactive gestures.
  • spectral processing, basic pitch extraction and beat tracking.

Additionally, the concepts were taught in conjunction with a large class project to design a "computer-mediated" performance. By using the rapid prototyping approach in a pedagogical context, the students were able to quickly gain valuable working intuitions about both MIR and how one might go about building MIR software systems. Although the use of analysis/MIR algorithms was not required as part of the project, more than one-third of the class employed extracted high-level semantic information as components of their final project. These ranged from real-time algorithmic processes that used extracted features to help craft compelling accompaniments to a live flutist, to analyses of speech that were transformed into gestures controlling a disklavier, to amplitude/pitch-event-triggered generative clouds. While these projects used basic MIR components, it was encouraging to witness the students (most of whom had not worked with MIR/audio analysis before) eagerly and efficiently experiment over a 2-3 week period, and craft successful musical performances from these investigations. We also believe this ChucK-based framework can be helpful for teaching more advanced MIR courses: the language is straightforward to read, and does not attempt to abstract away low-level parameters, nor their timing or inter-shred scheduling.

conclusion & map out future work

SmirK

We have made available the ChucK learning infrastructure described in Section AAA, code for case studies 1 and 2 described in Section AAA, as well as code for several other example tasks, as part of the Small Music Information Retrieval toolKit (SMIRK) \footnote{smirk webpage}. The predecessor to this toolkit is the Small Musically Expressive Laptop Toolkit (SMELT) \footnote{SMELT}, which provides a set of user interface examples and templates in ChucK, and which we have found to be an extremely useful educational tool for newcomers to ChucK interested in laptop performance. SMIRK will analogously provide a permanent and growing repository of music information retrieval infrastructure and examples using ChucK, targeted at researchers as well as educators and students.

Imminent work

Several improvements to the music information retrieval capabilities of ChucK are currently underway. First, support for asynchronous file I/O will allow the reading and writing of large datasets in a way that does not interfere with audio production. Second, additional classifiers are being added (though we emphasize that anyone can easily integrate their own ChucK classifiers into this framework by inheriting from the Classifier class). Support for modeling is also a top priority, beginning with hidden Markov models.
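
To illustrate the extension point mentioned above, here is a minimal sketch of a user-defined classifier; the method names and signatures (train, test) are assumptions consistent with the code in Section AAA rather than a documented API.

// a (hypothetical) user-defined classifier: always predicts the most common class
class MajorityClassifier extends Classifier
{
    int majority;   // most frequent class label seen in training

    fun void train( Instances data )
    {
        // ... count class labels in 'data' and store the most frequent
        //     label in 'majority' (omitted) ...
    }

    fun int test( Instance i )
    {
        return majority;
    }
}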

Conclusion

Our recent additions to ChucK have greatly expanded its capabilities beyond the realm of sound synthesis. Analysis and learning support make the language a viable platform both for rapid MIR prototyping and for applying MIR algorithms in real-time music performance.


  • ChucK has expanded to a new frontier
  • We will be continuing to work on it, driven by our own applications as well as requests from the ChucK user community. We are excited at the possibility of this community expanding to include music information retrieval researchers.