
The Really Good Title Without TAPESTREA

  • Pasta Tree, E. Rat Pesta, Ape Treats
  • Ananya Misra, Perry Cook, Ge Wang
  • Department of Computer Science (also Music), Princeton University
  • {amisra,prc,gewang}@cs.princeton.edu


ABSTRACT

Coming up eventually, along with references. (Note: don't forget these parts...)

INTRODUCTION

In the 1940s and 50s, Pierre Schaeffer developed musique concrète. Unlike traditional music, musique concrète starts with existing or "concrete" recorded sounds, which are organized into abstract musical structures. The existing recordings often include natural and industrial sounds that are not conventionally musical, but can be manipulated to make music, either by editing magnetic tape or, now more commonly, through digital sampling. Typical manipulations include cutting, copying, reversing, looping and changing the speed of recorded segments.

Today, several other forms of electronic/electroacoustic music also involve manipulating a set of recorded sounds. Acousmatic music [cite Dhomont], for instance, evolved from musique concrète and refers to compositions designed for environments that emphasize the sound itself rather than the performance-oriented aspects of the piece.

The acoustic ecology [cite Schafer] movement gave rise to soundscape composition [cite Truax], the creation of realistic soundscapes from recorded environmental sounds. One of the key features of soundscape composition, according to Truax, is that "most pieces can be placed on a continuum between what might be called 'found sound' and 'abstracted' approaches." However, while "contemporary signal processing techniques can easily render such sounds unrecognizable and completely abstract," a soundscape composition piece remains recognizable even at the abstract end of the continuum.

Sound designers for movies, theater and art often have a related goal of starting with real-world sounds and creating emotionally evocative sound scenes, which are still real, yet transformed and transformative. Classic examples include mixing a transformed lion's roar with other sounds to accompany the wave sounds in "The Perfect Storm," and the sound design for "Black Hawk Down" [cite Paul Rudy's Miami ICMC paper and provide another informative clause or something]. These sound designers are "sound sculptors" as well, but transform sounds to enhance or create a sense of reality, rather than for musical purposes.

Artists from all of the above backgrounds share the process of manipulating recordings, but aim to achieve different effects. We present a single framework for starting with recordings and producing sounds that can lie anywhere on a 'found' to 'unrecognizable' continuum. 'Found' sounds can be modified in subtle ways or extended indefinitely, while moving towards the 'unrecognizable' end of the spectrum unleashes a range of manipulations beyond time-domain techniques. In fact, the same set of techniques applies throughout the continuum, differing only in how it is used. We call this framework TAPESTREA: Techniques and Paradigms for Expressive Synthesis, Transformation and Rendering of Environmental Audio.

TAPESTREA manipulates recorded sounds in two phases. In the analysis phase, the sound is separated into reusable components that map to individual foreground events or background. In the synthesis phase, these components are parametrically transformed, combined and re-synthesized using time- and frequency-domain techniques that can be controlled on multiple levels. While we highlight the synthesis methods in this paper, the analysis phase is also integral as it enables the most flexible means for dealing with real-world sonic material.

RELATED WORK

Related techniques used for musical composition include spectral modeling synthesis [cite Serra/Smith] and granular synthesis [cite Truax, Roads-microsound]. Spectral modeling synthesis separates a sound into sinusoids and noise, and was originally used for modeling instrument sounds. Granular synthesis, in contrast, functions in the time domain and involves continuously controlling very brief sonic events or sound grains. TAPESTREA employs aspects of both, using separation techniques on environmental sounds and controlling the temporal placement of resulting events.

Another technique used in TAPESTREA is an extension of a wavelet tree learning algorithm [cite Dubnov] for sound texture synthesis. This method performs a wavelet decomposition on a sound clip and uses machine learning on the wavelet coefficients to generate similar non-repeating sound texture. The algorithm works well for sounds that are mostly stochastic, but can break up extended pitched portions. It can also be slow in its original form. TAPESTREA takes advantage of this technique by improving the speed of the algorithm and using it only on the types of sounds for which it works well.

ANALYSIS PHASE

TAPESTREA starts by separating a recording into deterministic events (the sinusoidal or pitched components of the sound), transient events (brief noisy bursts of energy), and the remaining stochastic background or din. This separation can be parametrically controlled and takes place in the analysis phase.

Deterministic events are foreground events extracted by sinusoidal modeling based on the spectral modeling framework [cite Serra]. Overlapping frames of the sound are transformed into the frequency domain using the FFT. For each spectral frame, the n highest peaks above a specified magnitude threshold are recorded, where n can range from 1 to 50. These peaks can also be loaded from a preprocessed file. The highest peaks from every frame are then matched across frames by frequency, subject to a controllable "frequency sensitivity" threshold, to form sinusoidal tracks. Tracks can remain "mute" (below the magnitude threshold) for up to a specified maximum number of frames, and are discarded if they fail to satisfy a minimum track length requirement. The remaining tracks are optionally grouped [cite Ellis, Melih and Gonzalez?] by harmonicity, common amplitude and frequency modulation, and common onset/offset to form deterministic events, which are essentially collections of related sinusoidal tracks. If the grouping option is not selected, each track is interpreted as a separate deterministic event.
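
To make the track-building step concrete, here is a minimal Python sketch of per-frame peak picking and frequency-based track matching. The names, thresholds, and data layout are our own assumptions rather than TAPESTREA's actual code, and the mute-frame allowance and minimum-track-length rules are omitted for brevity.

    import numpy as np

    def spectral_peaks(frame_fft, n_peaks=10, mag_threshold=0.01):
        """Return the bins of the n strongest local maxima above the
        magnitude threshold in one spectral frame."""
        mags = np.abs(frame_fft)
        # local maxima: larger than both neighbors
        is_peak = (mags[1:-1] > mags[:-2]) & (mags[1:-1] > mags[2:])
        bins = np.where(is_peak)[0] + 1
        bins = bins[mags[bins] > mag_threshold]
        return bins[np.argsort(mags[bins])[::-1][:n_peaks]]

    def extend_tracks(tracks, peak_bins, frame_idx, freq_sensitivity=3):
        """Attach each peak to the nearest existing track within the
        frequency-sensitivity threshold; otherwise start a new track.
        Each track is a list of (frame_idx, bin) points. (A fuller
        version would prevent two peaks joining one track per frame.)"""
        for b in peak_bins:
            live = [t for t in tracks
                    if abs(t[-1][1] - b) <= freq_sensitivity]
            if live:
                best = min(live, key=lambda t: abs(t[-1][1] - b))
                best.append((frame_idx, int(b)))
            else:
                tracks.append([(frame_idx, int(b))])
        return tracks

A grouping pass would then cluster these tracks by harmonicity and common modulation to form deterministic events.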

Transient events or brief noisy foreground events are usually detected in the time-domain by observing changes in signal energy over time [cite Verma and Meng, Bello et al.?]. TAPESTREA analyzes the recorded sound using a non-linear one-pole envelope follower filter with a sharp attack and slow decay and finds points where the derivative of the envelope is above a threshold. These points mark sudden increases in energy and are interpreted as transient onsets. A transient event is considered to last for up to half a second from its onset; its exact length can be controlled within that range.
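
A minimal sketch of this detector follows; the filter coefficients and derivative threshold are illustrative assumptions, not TAPESTREA's actual values.

    import numpy as np

    def transient_onsets(x, sr, attack=0.99, decay=0.9995, thresh=1e-4):
        """One-pole envelope follower with sharp attack and slow decay;
        onsets are where the envelope's derivative exceeds thresh."""
        env = np.empty(len(x))
        e = 0.0
        for i, s in enumerate(np.abs(x)):
            a = attack if s > e else decay  # faster smoothing on the way up
            e = a * e + (1.0 - a) * s
            env[i] = e
        onsets, last = [], -sr
        for i in np.where(np.diff(env) > thresh)[0]:
            if i - last > sr // 2:          # events last up to half a second
                onsets.append(int(i))
                last = i
        return onsets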

The stochastic background represents parts of the recording that constitute background noise, and is obtained by removing the detected deterministic and transient events from the original sound. Deterministic events are removed by eliminating the peaks of each sinusoidal track from the corresponding spectral frames. To eliminate a peak, the magnitudes of the bins beneath the peak are smoothed down, while the phase in these bins is randomized. Transient events, in turn, are removed in the time-domain by applying wavelet tree learning [cite Dubnov] to generate a sound clip that resembles nearby transient-free segments of the original recording. This synthesized "clean" background replaces the samples containing the transient event to be removed.
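
The peak-removal step on a single spectral frame might look like the following sketch; the peak width and the linear interpolation scheme are assumptions for illustration.

    import numpy as np

    def remove_peak(frame_fft, peak_bin, width=4):
        """Smooth the magnitudes under a sinusoidal peak down to the level
        of the surrounding spectrum and randomize the phase in those bins."""
        out = frame_fft.copy()
        lo = max(0, peak_bin - width)
        hi = min(len(out) - 1, peak_bin + width)
        # interpolate magnitude between the edges of the peak region
        mags = np.linspace(np.abs(out[lo]), np.abs(out[hi]), hi - lo + 1)
        phases = np.random.uniform(-np.pi, np.pi, hi - lo + 1)
        out[lo:hi + 1] = mags * np.exp(1j * phases)
        return out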

Separating a sound into components in this way has several advantages. The distinction between foreground and background components is semantically clear to humans, who can therefore work within the framework with a concrete understanding of what each component represents. The different types of components are also stored and processed differently according to their defining characteristics, thus allowing flexible transformations on individual components. Each transformed component can be saved as a template and later reloaded, reused, copied, further transformed, or otherwise treated as a single object. In addition, the act of separating a sound into smaller sounds makes it possible to "compose" a variety of pieces by combining these constituents in diverse ways.

Describe analysis user interface?

SYNTHESIS PHASE

Once the components of a sound have been separated and saved as templates, TAPESTREA allows each template to be transformed and synthesized individually. At this point, TAPESTREA also offers additional synthesis templates to control the placement or distribution of basic components in a composition. The transformation and synthesis options for the different template types are as follows:

Deterministic Events

Deterministic events are synthesized from their tracks using sinusoidal re-synthesis. Frequency and magnitude between consecutive frames in a track are linearly interpolated, and time-domain samples are computed from this information.

The track representation allows considerable flexibility in applying frequency and time transformations on a deterministic event. The event's frequency (related to its pitch) can be linearly scaled before computing the time-domain samples, simply by multiplying the frequency at each point on its tracks by a specified factor. Similarly, the event can be stretched or shrunk in time by scaling the time values in the time-to-frequency trajectories of its tracks. This technique works for almost any frequency or time scaling factor without producing artifacts. Frequency and time transformations can take place in real-time in TAPESTREA, thus allowing an event to be stretched, shrunk or pitch shifted even as it is being synthesized.
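
The following Python sketch illustrates this kind of track resynthesis with frequency and time scaling; the track format and parameter names are assumptions, not TAPESTREA's interface.

    import numpy as np

    def synthesize_track(track, sr, hop, freq_scale=1.0, time_scale=1.0):
        """track: list of (frame_index, freq_hz, magnitude) points.
        Linearly interpolates frequency and magnitude between consecutive
        frames; freq_scale shifts pitch, time_scale stretches or shrinks."""
        if len(track) < 2:
            return np.zeros(0)
        out, phase = [], 0.0
        for (f0, fr0, m0), (f1, fr1, m1) in zip(track[:-1], track[1:]):
            n = max(1, int(round((f1 - f0) * hop * time_scale)))
            t = np.arange(n) / n
            freq = (fr0 + t * (fr1 - fr0)) * freq_scale  # interpolated, scaled Hz
            mag = m0 + t * (m1 - m0)                     # interpolated amplitude
            phases = phase + np.cumsum(2 * np.pi * freq / sr)
            out.append(mag * np.sin(phases))
            phase = phases[-1]
        return np.concatenate(out)

Since the scaling only modifies the trajectory data before the samples are computed, the factors can be changed on the fly, matching the real-time behavior described above.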

Transient Events

Since transient events are brief by definition, TAPESTREA stores them directly as time-domain audio frames. Synthesizing a transient event without any transformations, therefore, involves playing back the samples in the audio frame.

In addition, TAPESTREA allows time-stretching and pitch-shifting of transient events. This is implemented using a phase vocoder [cite Dolson], which limits the scaling factors to a smaller, perhaps more reasonable, range than is available for deterministic events, yet one large enough to create noticeable effects.
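
A textbook phase-vocoder time-stretch over a transient's stored samples might look like this sketch; the window size and hop are assumptions, and pitch-shifting can be obtained by time-stretching followed by resampling.

    import numpy as np

    def stretch(x, rate, n_fft=1024, hop=256):
        """Time-stretch x by 1/rate: analysis frames advance by rate * hop,
        synthesis frames by hop, with per-bin phase accumulation."""
        win = np.hanning(n_fft)
        omega = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft  # rad/sample
        ana_hop = hop * rate
        positions = np.arange(0, len(x) - n_fft, ana_hop)
        out = np.zeros(len(positions) * hop + n_fft)
        phase, prev = None, None
        for k, pos in enumerate(positions):
            i = int(pos)
            spec = np.fft.rfft(win * x[i:i + n_fft])
            if phase is None:
                phase = np.angle(spec)
            else:
                # wrapped deviation from the expected per-bin phase advance
                dphi = np.angle(spec) - np.angle(prev) - omega * ana_hop
                dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
                phase = phase + (omega + dphi / ana_hop) * hop
            prev = spec
            grain = np.fft.irfft(np.abs(spec) * np.exp(1j * phase))
            out[k * hop:k * hop + n_fft] += win * grain
        return out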

Transient events by nature can also act as "grains" for traditional granular synthesis [cite Truax/Roads again?]. The frequency and time transformation tools for transients, along with the additional synthesis templates described in Sections 4.4 to 4.6, can thus provide an interactive "granular synthesis" interface.

Stochastic Background

The internal representation of a stochastic background template begins with a link to a sound file containing the related background component extracted in the analysis phase. However, merely looping through this sound file or randomly mixing segments of it does not produce a satisfactory background sound. Instead, our goal here is to generate ongoing background that sounds controllably similar to the original extracted stochastic background.

Therefore, the stochastic background is synthesized from the saved sound file using an extension of the wavelet tree learning algorithm [cite Dubnov]. In the original algorithm, the saved background sound is decomposed into a wavelet tree where each node represents a wavelet coefficient, with depth corresponding to resolution. The wavelet coefficients are computed using the Daubechies wavelet with 5 vanishing moments. A new wavelet tree is then built, with each node selected based on the similarity of its ancestors and its first k predecessors (nodes at the same depth but associated with earlier time samples) to corresponding sequences of nodes in the original tree. The learning algorithm also takes into account the amount of randomness desired. Finally, the new wavelet tree undergoes an inverse wavelet transform to produce the synthesized background sound.

We added the option of incorporating randomness into the first step of the learning and modified k to be a fraction of the total number of nodes at the current depth, instead of a fixed number. We also found that we can avoid learning the coefficients at the highest resolutions without perceptually altering the results. Since the wavelet tree is binary, every additional level learned approximately doubles the learning time, so skipping the highest-resolution levels decreases this time by close to half for each level skipped. This optimization allowed us to build a real-time version of the wavelet tree analysis and synthesis. In addition, interactive control over the learning parameters lets users immediately observe the effects of changing specific parameters and adjust them accordingly, without restarting the process multiple times. The wavelet tree learning also works better with the separated stochastic background as input, since the harmonic events it would otherwise garble have been removed.
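
To make the procedure concrete, here is a heavily simplified Python sketch in the spirit of the algorithm, using PyWavelets and folding in the two modifications above (k as a fraction of the level size, and skipping the finest levels). It matches predecessor context only, omitting ancestor similarity, and runs in quadratic time per level, so it illustrates the idea rather than reproducing TAPESTREA's real-time implementation; all parameter values are assumptions.

    import numpy as np
    import pywt  # PyWavelets

    def learn_texture(x, k_frac=0.05, randomness=0.1, skip_finest=2):
        """Resynthesize a texture from x by relearning each wavelet
        level's coefficients from their predecessor context."""
        coeffs = pywt.wavedec(x, 'db5')     # Daubechies, 5 vanishing moments
        new = [coeffs[0].copy()]            # keep approximation unchanged
        n_levels = len(coeffs) - 1
        for d, level in enumerate(coeffs[1:], 1):
            if d > n_levels - skip_finest:  # copy (skip learning) finest levels
                new.append(level.copy())
                continue
            n = len(level)
            k = max(1, int(k_frac * n))     # context size: fraction of level
            out = np.empty(n)
            out[:k] = level[:k]             # seed with the original opening
            for i in range(k, n):
                ctx = out[i - k:i]
                # distance from each original context to the current one
                dists = np.array([np.sum((level[j - k:j] - ctx) ** 2)
                                  for j in range(k, n)])
                # wider candidate pool = more randomness in the result
                m = max(1, int(randomness * len(dists)))
                j = np.random.choice(np.argsort(dists)[:m]) + k
                out[i] = level[j]
            new.append(out)
        return pywt.waverec(new, 'db5')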

  • Wavelet tree learning.
  • Improvements can go here.

Event Loops

  • For repeating a single event.
  • Random frequency and time transformations on each instance.
  • Periodicity of event distribution: Gaussian.
  • Can also be replaced by other distributions such as Poisson or your own invention.
  • Density of events (how frequently they recur) relates to granular synthesis for transients; see the sketch below.
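
A sketch of such an event loop as a generator of onset times and per-instance transformations; the names and ranges are assumptions for illustration.

    import random

    def event_loop(duration, mean_period, stddev,
                   freq_range=(0.9, 1.1), time_range=(0.9, 1.1)):
        """Yield (onset_seconds, freq_scale, time_scale) for each
        repetition of a single event template."""
        t = 0.0
        while t < duration:
            yield (t,
                   random.uniform(*freq_range),  # random pitch transformation
                   random.uniform(*time_range))  # random time stretch/shrink
            # Gaussian periodicity; swapping in
            # random.expovariate(1.0 / mean_period) gives Poisson onsets
            t += max(0.01, random.gauss(mean_period, stddev))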

Timelines

  • Explicitly state when a template is played in relation to other templates.
  • Timelines within timelines... multiresolution synthesis?

Mixed Bags

  • Controlling relative density of multiple templates...
  • Combines features of timelines and loops; see the sketch below.
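
A minimal sketch of a "mixed bag": templates drawn at random with controllable relative densities. The names and weighting scheme are assumptions for illustration.

    import random

    def mixed_bag(templates, weights, duration, mean_period):
        """Yield (onset_seconds, template) pairs, choosing each instance
        from templates in proportion to its weight."""
        t = 0.0
        while t < duration:
            yield t, random.choices(templates, weights=weights)[0]
            t += random.expovariate(1.0 / mean_period)  # loop-like recurrence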

Score Language

  • ChucK
  • Precise control beyond what sliders can provide.
  • Any other specific features?

Pitch and Time Quantizations

  • Quantize pitch to a scale.
  • A few pre-programmed pitch tables or one that's user-programmable.
  • The ability to change pitch table on a timeline would be good (hmm).
  • Quantizing time to a time grid for rhythm.

These could probably be done through ChucK, but maybe there's a clean way to do it through TAPESTREA itself.
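
Either way, the quantizations themselves are simple; here is a minimal sketch of both, in which the pitch table, root, and grid spacing are illustrative assumptions.

    import math

    def quantize_pitch(freq_hz, scale=(0, 2, 4, 7, 9), base_hz=220.0):
        """Snap a frequency to the nearest degree of a pitch table
        (default: a pentatonic scale rooted at base_hz)."""
        semis = 12.0 * math.log2(freq_hz / base_hz)
        octave, within = divmod(semis, 12.0)
        # include 12 so values near the octave snap upward correctly
        nearest = min(list(scale) + [12], key=lambda s: abs(s - within))
        return base_hz * 2.0 ** (octave + nearest / 12.0)

    def quantize_time(t, grid=0.125):
        """Snap an onset time to the nearest point on a rhythmic grid."""
        return round(t / grid) * grid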

Other controls

  • Reverb, gain, pan

Discussions/Contributions/Sound samples

  • what?

CONCLUSIONS

  • Real references to be added.
