From CSWiki

Kevin Chou's Final Project: A Speech-Centric Ambience Generator

The Project

This project is a first attempt to generate a soundscape from the audio qualities of speech. In this initial implementation, certain features of the audio are sonified as a complement to an input speech file (see Future Work). You can listen to a sonified version of Martin Luther King Jr.'s "I Have a Dream" speech here.

(Boring) Details

An important feature of this speech-based soundscape generator is a sliding buffer that holds the data for the four features implemented with ChucK's unit analyzers--Centroid, Flux, RMS, and Rolloff. The raw data, taken at the sample rate (more precisely, at a rate set by the size of the FFT window), is not terribly useful or coherent on its own--it is much more useful to find how the data changes over time. Toward this end, averages and standard deviations are computed in real time using a running-total method--recalculating each average and standard deviation from scratch after every new data point could not keep up with the sampling rate. This buffer-plus-statistics model is the basis for the sonic qualities of the project.
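The running-total idea above can be sketched as follows. This is an illustrative Python sketch, not the project's actual ChucK code: the class and method names are made up, but the technique is the same--keep a running sum and sum of squares over a fixed-size window so each new feature sample updates the mean and standard deviation in O(1), instead of rescanning the whole buffer.

```python
from collections import deque
import math

class SlidingStats:
    """Running mean/std over a fixed-size sliding window (illustrative).

    Rather than recomputing statistics over the whole buffer after every
    new data point, we maintain a running sum and sum of squares:
    each push is O(1), which is what lets the statistics keep up with
    the analyzer's output rate.
    """

    def __init__(self, size):
        self.size = size
        self.buf = deque()
        self.total = 0.0      # running sum of samples in the window
        self.total_sq = 0.0   # running sum of squared samples

    def push(self, x):
        self.buf.append(x)
        self.total += x
        self.total_sq += x * x
        # Evict the oldest sample once the window is full.
        if len(self.buf) > self.size:
            old = self.buf.popleft()
            self.total -= old
            self.total_sq -= old * old

    def mean(self):
        return self.total / len(self.buf)

    def std(self):
        m = self.mean()
        # Population variance via E[x^2] - (E[x])^2; clamp tiny
        # negative values caused by floating-point rounding.
        var = max(self.total_sq / len(self.buf) - m * m, 0.0)
        return math.sqrt(var)
```

In the project itself, one such window would be kept per feature (Centroid, Flux, RMS, Rolloff), fed each time the FFT hop produces a new analysis frame.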

Running the Code

To run the code:

  • Make sure you have miniAudicle or ChucK installed: ChucK
  • Careful with directories! Either set the home directory to the folder containing the ChucK files, or run from that folder on the command line/Terminal.
  • To run the program: % chuck automatically adds all needed files. Enjoy!

Download the Code

Find it here:

Future Work

The original goal for this project was to implement a speech-centric emotion musicalizer (is that a word?). In other words, the program would analyze a speech from a file or a mic and output an ambience that follows the tone, emotion, or other attributes of the speech. Thus, the program would sonically convey a passionate or boring speech in a way that makes musical sense.

Toward this goal:

  • A more thorough investigation of speech qualities and their correlation to audio features. This will entail a look at audio from many sources--speech by many different people--in order to find which audio features correlate with certain emotions.
  • An exploration of other sonic descriptors for emotions. Potential sources are natural recordings.
  • An evaluation of the current code--streamlining it to prevent memory overflow. Also needed is a rethinking of how the sounds themselves are implemented.