'''The Really Good Title Without TAPESTREA'''
* Pasta Tree, E. Rat Pesta, Ape Treats
 
* Ananya Misra, Perry Cook, Ge Wang
 
* Department of Computer Science (also Music), Princeton University
 
* {amisra,prc,gewang}@cs.princeton.edu
 
 
 
 
 
= ABSTRACT =
 
Coming up eventually, along with references. (Note: don't forget these parts...)
 
 
 
= INTRODUCTION =
 
In the 1940s and 50s, Pierre Schaeffer developed ''musique concrète''. Unlike traditional music, ''musique concrète'' starts with existing or ''concrete'' recorded sounds, which are organized into abstract musical structures. The source recordings often include natural and industrial sounds that are not conventionally musical, but that can be manipulated to make music, either by editing magnetic tape or, now more commonly, through digital sampling. Typical manipulations include cutting, copying, reversing, looping and changing the speed of recorded segments.
 
 
 
Today, several other forms of electronic/electroacoustic music also involve manipulating a set of recorded sounds. Acousmatic music [cite Dhomont], for instance, evolved from ''musique concrète'' and refers to compositions designed for environments that emphasize the sound itself rather than the performance-oriented aspects of the piece.
 
 
 
The acoustic ecology [cite Schafer] movement gave rise to soundscape composition [cite Truax], the creation of realistic soundscapes from recorded environmental sounds. One of the key features of soundscape composition, according to Truax, is that "most pieces can be placed on a continuum between what might be called `found sound' and `abstracted' approaches." However, while "contemporary signal processing techniques can easily render such sounds unrecognizable and completely abstract," a soundscape composition remains recognizable even at the abstract end of the continuum.
 
 
 
Sound designers for movies, theater and art often have a related goal of starting with real-world sounds and creating emotionally evocative sound scenes, which are still real, yet transformed and transformative. Classic examples include mixing a transformed lion's roar with other sounds to accompany the wave sounds in ''The Perfect Storm'', and incorporating a helicopter theme into the sound design for ''Black Hawk Down'' [cite Rudy]. These sound designers are "sound sculptors" as well, but transform sounds to enhance or create a sense of reality, rather than for musical purposes.
 
 
 
Artists from all of the above backgrounds share the process of manipulating recordings, but aim to achieve different effects. We present a single framework for starting with recordings and producing sounds that can lie anywhere on a `found' to `unrecognizable' continuum. `Found' sounds can be modified in subtle ways or extended indefinitely, while moving towards the `unrecognizable' end of the spectrum unleashes a range of manipulations beyond time-domain techniques. In fact, the same set of techniques applies throughout the continuum, differing only in how it is used. We call this framework TAPESTREA: Techniques and Paradigms for Expressive Synthesis, Transformation and Rendering of Environmental Audio.
 
 
 
TAPESTREA manipulates recorded sounds in two phases. In the analysis phase, the sound is separated into reusable components that map to individual foreground events or to the background. In the synthesis phase, these components are parametrically transformed, combined and re-synthesized using time- and frequency-domain techniques that can be controlled on multiple levels. While this paper highlights the synthesis methods, the analysis phase is equally integral, as it provides the most flexible means of dealing with real-world sonic material.
 
 
 
= RELATED WORK =
 
Related techniques used for musical composition include spectral modeling synthesis [cite Serra/Smith] and granular synthesis [cite Truax, Roads-microsound]. Spectral modeling synthesis separates a sound into sinusoids and noise, and was originally used for modeling instrument sounds. Granular synthesis, in contrast, functions in the time domain and involves continuously controlling very brief sonic events or sound grains. TAPESTREA employs aspects of both, using separation techniques on environmental sounds and controlling the temporal placement of the resulting events.
 
 
 
Another technique used in TAPESTREA is an extension of a wavelet tree learning algorithm [cite Dubnov] for sound texture synthesis. This method performs a wavelet decomposition on a sound clip and uses machine learning on the wavelet coefficients to generate a similar, non-repeating sound texture. The algorithm works well for sounds that are mostly stochastic, but it can break up extended pitched portions and can be slow in its original form. TAPESTREA takes advantage of this technique by improving the speed of the algorithm and by using it only on the types of sound for which it works well.
 
 
 
= ANALYSIS PHASE =
 
TAPESTREA starts by separating a recording into ''deterministic events'', the sinusoidal or pitched components of the sound; ''transient events'', brief noisy bursts of energy; and the remaining ''stochastic background'', or din. This separation can be parametrically controlled and takes place in the analysis phase.
 
 
 
Deterministic events are foreground events extracted by sinusoidal modeling, based on the spectral modeling framework [cite Serra]. Overlapping frames of the sound are transformed into the frequency domain using the FFT. For each spectral frame, the ''n'' highest peaks above a specified magnitude threshold are recorded, where ''n'' can range from 1 to 50. These peaks can also be loaded from a preprocessed file. The highest peaks from every frame are then matched across frames by frequency, subject to a controllable "frequency sensitivity" threshold, to form sinusoidal tracks. Tracks can be "mute", or below the magnitude threshold, for up to a specified maximum number of frames, or can be discarded if they fail to satisfy a minimum track length requirement. The remaining tracks are optionally grouped [cite Ellis, Melih and Gonzalez?] by harmonicity, common amplitude and frequency modulation, and common onset/offset, to form deterministic events, which are essentially collections of related sinusoidal tracks. If the grouping option is not selected, each track is interpreted as a separate deterministic event.
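
As a rough illustration, the peak picking and track matching might be sketched as follows. This is a simplified, hypothetical Python rendering, not TAPESTREA's actual implementation: the parameter names and the greedy nearest-frequency matching strategy are assumptions.

<pre>
import numpy as np

def pick_peaks(mag_frame, n_peaks, threshold):
    """Return (bin, magnitude) for the n highest local maxima above threshold."""
    peaks = [(b, mag_frame[b]) for b in range(1, len(mag_frame) - 1)
             if mag_frame[b] > threshold
             and mag_frame[b] >= mag_frame[b - 1]
             and mag_frame[b] >= mag_frame[b + 1]]
    return sorted(peaks, key=lambda p: p[1], reverse=True)[:n_peaks]

def match_tracks(frames, n_peaks=10, threshold=0.01,
                 freq_sensitivity=3, max_mute=4, min_length=8):
    """Greedily link per-frame peaks into sinusoidal tracks by frequency proximity."""
    done = []     # finished tracks; each is a list of (frame, bin, magnitude)
    active = []   # (track, frames spent "mute")
    for t, mag in enumerate(frames):
        peaks = pick_peaks(mag, n_peaks, threshold)
        survivors = []
        for track, mute in active:
            last_bin = track[-1][1]
            best = min(peaks, key=lambda p: abs(p[0] - last_bin), default=None)
            if best and abs(best[0] - last_bin) <= freq_sensitivity:
                track.append((t, best[0], best[1]))
                peaks.remove(best)            # each peak extends at most one track
                survivors.append((track, 0))
            elif mute < max_mute:             # track may stay "mute" briefly
                survivors.append((track, mute + 1))
            elif len(track) >= min_length:    # keep only sufficiently long tracks
                done.append(track)
        survivors += [([(t, b, m)], 0) for b, m in peaks]  # leftover peaks seed new tracks
        active = survivors
    done += [tr for tr, _ in active if len(tr) >= min_length]
    return done
</pre>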
 
 
 
Transient events, or brief noisy foreground events, are usually detected in the time domain by observing changes in signal energy over time [cite Verma and Meng, Bello et al.?]. TAPESTREA analyzes the recorded sound using a non-linear one-pole envelope follower filter with a sharp attack and slow decay, and finds points where the derivative of the envelope is above a threshold. These points mark sudden increases in energy and are interpreted as transient onsets. A transient event is considered to last for up to half a second from its onset; its exact length can be controlled within that range.
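
A minimal sketch of this detection, assuming hypothetical attack/release coefficients and thresholds (in TAPESTREA these are user-controllable):

<pre>
import numpy as np

def detect_transients(x, sr, attack=0.2, release=0.001,
                      deriv_threshold=0.005, min_gap=0.05):
    """Mark transient onsets where the derivative of a sharp-attack,
    slow-decay envelope exceeds a threshold."""
    env = np.zeros(len(x))
    for i in range(1, len(x)):
        target = abs(x[i])
        a = attack if target > env[i - 1] else release  # non-linear one-pole
        env[i] = env[i - 1] + a * (target - env[i - 1])
    deriv = np.diff(env, prepend=env[0])
    onsets, last = [], -min_gap * sr
    for i, d in enumerate(deriv):
        if d > deriv_threshold and i - last >= min_gap * sr:
            onsets.append(i)                 # sudden energy increase -> onset
            last = i
    return onsets                            # each event lasts up to 0.5 s from its onset
</pre>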
 
 
 
The stochastic background represents parts of the recording
 
that constitute background noise, and is obtained by removing
 
the detected deterministic and transient events from
 
the original sound. Deterministic events are removed by eliminating
 
the peaks of each sinusoidal track from the corresponding
 
spectral frames. To eliminate a peak, the magnitudes
 
of the bins beneath the peak are smoothed down, while
 
the phase in these bins is randomized. Transient events, in
 
turn, are removed in the time-domain by applying wavelet
 
tree learning [cite Dubnov] to generate a sound clip that resembles
 
nearby transient-free segments of the original recording.
 
This synthesized "clean" background replaces the samples
 
containing the transient event to be removed.
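
The peak removal step might look roughly like this (a sketch; the peak width and the noise-floor estimate are assumptions):

<pre>
import numpy as np

def remove_peak(spectrum, peak_bin, width=4):
    """Erase one sinusoidal peak from a complex spectral frame: smooth the
    magnitudes under the peak down to the surrounding noise floor and
    randomize the phase in those bins."""
    lo = max(0, peak_bin - width)
    hi = min(len(spectrum) - 1, peak_bin + width)
    floor = 0.5 * (abs(spectrum[lo]) + abs(spectrum[hi]))  # assumed noise floor
    for b in range(lo, hi + 1):
        mag = min(abs(spectrum[b]), floor)
        spectrum[b] = mag * np.exp(1j * np.random.uniform(-np.pi, np.pi))
    return spectrum
</pre>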
 
 
 
Separating a sound into components in this way has several advantages. The distinction between foreground and background components is semantically clear to humans, who can therefore work within the framework with a concrete understanding of what each component represents. The different types of components are also stored and processed differently, according to their defining characteristics, thus allowing flexible transformations of individual components. Each transformed component can be saved as a template and later reloaded, reused, copied, further transformed, or otherwise treated as a single object. In addition, the act of separating a sound into smaller sounds makes it possible to "compose" a variety of pieces by combining these constituents in diverse ways.
 
 
 
The user interface for the analysis phase is described in the Interface section below.
 
 
 
= SYNTHESIS PHASE =
 
Once the components of a sound have been separated and saved as templates, TAPESTREA allows each template to be transformed and synthesized individually. At this point, TAPESTREA also offers additional synthesis templates to control the placement or distribution of basic components in a composition. The transformation and synthesis options for the different template types are as follows:
 
 
 
== Deterministic Events ==
 
 
 
Deterministic events are synthesized from their tracks using sinusoidal re-synthesis. Frequency and magnitude between consecutive frames in a track are linearly interpolated, and time-domain samples are computed from this information.
 
 
 
The track representation allows considerable flexibility in applying frequency and time transformations on a deterministic event. The event's frequency (related to its pitch) can be linearly scaled before computing the time-domain samples, simply by multiplying the frequency at each point on its tracks by a specified factor. Similarly, the event can be stretched or shrunk in time by scaling the time values in the time-to-frequency trajectories of its tracks. This technique works for almost any frequency or time scaling factor without producing artifacts. Frequency and time transformations can take place in real-time in TAPESTREA, thus allowing an event to be stretched, shrunk or pitch shifted even as it is being synthesized.
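
A minimal sketch of this re-synthesis with both transformations, assuming tracks stored as (frame index, frequency in Hz, magnitude) triples (an assumed layout, not the actual data structure):

<pre>
import numpy as np

def synthesize_track(track, hop, sr, freq_scale=1.0, time_scale=1.0):
    """Sinusoidal re-synthesis of one track, linearly interpolating
    frequency and magnitude between consecutive frames."""
    out, phase = [], 0.0
    for (f0, hz0, m0), (f1, hz1, m1) in zip(track, track[1:]):
        n = int((f1 - f0) * hop * time_scale)          # time scaling stretches segments
        for i in range(n):
            t = i / n
            hz = (hz0 + t * (hz1 - hz0)) * freq_scale  # frequency scaling shifts pitch
            mag = m0 + t * (m1 - m0)
            phase += 2.0 * np.pi * hz / sr             # accumulate phase sample by sample
            out.append(mag * np.sin(phase))
    return np.array(out)
</pre>

Because the scaling is applied as the track is traversed, changing the factors mid-synthesis is straightforward, which suggests how real-time stretching and shifting falls out of the track representation.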
 
 
 
== Transient Events ==
 
 
 
Since transient events are brief by definition, TAPESTREA stores them directly as time-domain audio frames. Synthesizing a transient event without any transformations, therefore, involves playing back the samples in the audio frame.
 
 
 
In addition, TAPESTREA allows transient events to be time-stretched and pitch-shifted. This is implemented using a phase vocoder [cite Dolson], which limits the scaling factors to a smaller, and perhaps more reasonable, range than is available for deterministic events, yet one large enough to create noticeable effects.
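
For illustration, a bare-bones phase vocoder time-stretch (window and hop sizes are assumptions, and overlap-add normalization is omitted for brevity; pitch-shifting can be obtained by stretching and then resampling):

<pre>
import numpy as np

def pv_stretch(x, stretch, n_fft=1024, hop=256):
    """Basic phase vocoder: analyze with one hop size, resynthesize with
    hop*stretch, advancing phase by each bin's estimated instantaneous frequency."""
    win = np.hanning(n_fft)
    syn_hop = int(hop * stretch)
    frames = [np.fft.rfft(win * x[i:i + n_fft])
              for i in range(0, len(x) - n_fft, hop)]
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop / n_fft  # expected advance
    out = np.zeros(syn_hop * len(frames) + n_fft)
    phase = np.angle(frames[0])
    for k, frame in enumerate(frames):
        if k > 0:
            dphi = np.angle(frame) - np.angle(frames[k - 1]) - omega
            dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))   # wrap to [-pi, pi]
            phase += (omega + dphi) * (syn_hop / hop)          # scaled phase advance
        grain = np.fft.irfft(np.abs(frame) * np.exp(1j * phase)) * win
        out[k * syn_hop : k * syn_hop + n_fft] += grain        # overlap-add
    return out
</pre>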
 
 
 
Transient events by nature can also act as "grains" for traditional granular synthesis [cite Truax/Roads again?]. The frequency and time transformation tools for transients, along with the additional synthesis templates described below (Event Loops, Timelines and Mixed Bags), can thus provide an interactive "granular synthesis" interface.
 
 
 
== Stochastic Background ==
 
 
 
The internal representation of a ''stochastic background'' template begins with a link to a sound file containing the related background component extracted in the analysis phase. However, merely looping through this sound file or randomly mixing segments of it does not produce a satisfactory background sound. Instead, our goal here is to generate ongoing background that sounds controllably similar to the original extracted stochastic background.
 
 
 
Therefore, the stochastic background is synthesized from the saved sound file using an extension of the wavelet tree learning algorithm [cite Dubnov]. In the original algorithm, the saved background sound is decomposed into a wavelet tree in which each node represents a wavelet coefficient, with depth corresponding to resolution. The wavelet coefficients are computed using the Daubechies wavelet with 5 vanishing moments. A new wavelet tree is then built, with each node selected according to the similarity of its ancestors and its first ''k'' predecessors (nodes at the same depth but associated with earlier time samples) to the corresponding sequences of nodes in the original tree. The learning algorithm also takes into account the amount of randomness desired. Finally, the new wavelet tree undergoes an inverse wavelet transform to provide the synthesized time-domain samples. This learning technique works best with the separated stochastic background as input, since the harmonic events it would otherwise chop up have been removed.
 
 
 
TAPESTREA uses a modified and optimized version of the algorithm, which follows the same basic steps but differs in some details. For instance, the modified algorithm includes the option of incorporating randomness into the first level of learning, and it treats ''k'' as dependent on the depth of a node rather than as a constant. More importantly, it can optionally avoid learning the coefficients at the highest resolutions. These resolutions roughly correspond to high frequencies; randomness at these levels does not significantly alter the results, yet the learning involved in attaining that randomness takes the most time. Optionally stopping the learning at a lower level thus speeds up the algorithm and allows it to run in real-time.
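
A heavily simplified, unoptimized sketch of the depth-limited idea, using the PyWavelets `db5` wavelet (the similarity measure and candidate search here are stand-ins for the actual learning step, not a faithful reproduction of it):

<pre>
import numpy as np
import pywt

def background_resynth(x, k=2, similarity=0.5, max_learn_depth=6):
    """Wavelet tree learning, loosely after the algorithm described above:
    re-sample coefficients at coarser levels based on predecessor similarity,
    and copy the finest (high-frequency) levels verbatim -- the optimization."""
    coeffs = pywt.wavedec(x, 'db5')               # [approx, coarse ... fine details]
    new_coeffs = [coeffs[0]]                      # approximation kept as-is
    for depth, c in enumerate(coeffs[1:], start=1):
        if depth > max_learn_depth or len(c) <= k:
            new_coeffs.append(c)                  # skip learning at high resolutions
            continue
        out = list(c[:k])
        scale = np.abs(c).mean() + 1e-9
        for i in range(k, len(c)):
            ctx = np.array(out[-k:])              # the k most recent outputs
            cands = [j for j in range(k, len(c))
                     if np.abs(c[j - k:j] - ctx).mean() <= similarity * scale]
            out.append(c[np.random.choice(cands)] if cands else c[i])
        new_coeffs.append(np.array(out))
    return pywt.waverec(new_coeffs, 'db5')
</pre>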
 
 
 
Further, TAPESTREA offers interactive control over the learning parameters in the form of "randomness" and "similarity" parameters. The size of the sound segment to be analyzed as one unit can also be controlled: larger sizes yield a "smooth" synthesized background, while smaller sizes yield a more "chunky" one. Creatively manipulating these parameters can, in fact, yield interesting (and weird) musical compositions generated through "stochastic background" alone.
 
 
 
== Event Loops ==
 
 
 
Event loops are synthesis templates designed to facilitate the parametric repetition of a single event. Any deterministic or transient event template can be formed into a loop. When the loop is played, instances of the associated event are synthesized at the specified density and periodicity, and within a specified range of random transformations. These parameters can be modified while the loop is playing, to let the synthesized sound change gradually.
 
 
 
The density refers to how many times the event is repeated per second, and could be on the order of 0.001 to 1000. At the higher densities, and especially for transient events, the synthesized sound is often perceived as continuous, thus resembling granular synthesis.
 
 
 
The periodicity, ranging from 0 to 1, denotes how periodic the repetition is, with a periodicity of 1 meaning that the event is repeated at fixed time intervals. The interval between consecutive occurrences of an event is generally determined by feeding the desired periodicity and density into a Gaussian random number generator. It is straightforward to replace this generator with one that follows a Poisson or other user-specified probability distribution.
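
Under the assumption that periodicity maps linearly to the spread of the Gaussian (one plausible mapping; the text does not pin this down), the inter-onset generator reduces to a few lines:

<pre>
import random

def next_interval(density, periodicity):
    """Seconds until the next event instance. The mean interval is 1/density;
    periodicity 1 collapses the Gaussian to that fixed interval, while lower
    periodicity widens it."""
    mean = 1.0 / density
    sigma = (1.0 - periodicity) * mean   # assumed periodicity-to-spread mapping
    return max(0.0, random.gauss(mean, sigma))
</pre>

Swapping `random.gauss` for, say, `random.expovariate(density)` gives Poisson-process timing, as the text suggests.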
 
 
 
In addition to the parameters for specifying the temporal placement of events, TAPESTREA allows each instance of the recurring event to be randomly transformed within a range. The range is determined by selected average frequency- and time-scale factors, and a randomness factor that dictates how far an individual transformation may vary from the average. Individual transformation parameters are uniformly selected from within this range. Apart from frequency and time scaling, the gain and pan of event instances can also randomly vary in the same way.
 
 
 
== Timelines ==
 
 
 
While a loop parametrically controls the repetition of a single event, with some amount of randomization, a timeline allows a template to be explicitly placed in time, in relation to other templates. Any number of existing templates can be added to a timeline, as well as deleted from it or re-positioned within it once they have been added.
 
 
 
A template's location on the timeline indicates its onset time with respect to when the timeline starts playing. When a timeline is played, each template on it is synthesized at the appropriate onset time, and is played for its duration or until the end of the timeline is reached. The duration of the entire timeline can be on the order of milliseconds to weeks, and may be modified after the timeline's creation.
 
 
 
TAPESTREA also allows the placement of timelines within timelines (or even within themselves). This allows for template placement to be controlled at multiple time-scales or levels, making for a "multiresolution synthesis."
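
A toy sketch of the nesting, with printing standing in for audio synthesis (the class names and layout are hypothetical):

<pre>
from dataclasses import dataclass, field

@dataclass
class Template:
    name: str
    duration: float          # seconds

    def synthesize(self, start):
        print("%8.3f s  %s" % (start, self.name))   # stand-in for rendering audio

@dataclass
class Timeline(Template):
    entries: list = field(default_factory=list)     # (onset, template) pairs

    def add(self, onset, template):
        self.entries.append((onset, template))

    def synthesize(self, start=0.0):
        # Recursion is what lets timelines nest within timelines:
        # each level offsets its children by its own start time.
        for onset, template in self.entries:
            if onset < self.duration:               # clip to the timeline's duration
                template.synthesize(start + onset)

phrase = Timeline("phrase", 10.0)
phrase.add(0.0, Template("bird chirp", 1.5))
phrase.add(4.0, Template("door slam", 0.3))
piece = Timeline("piece", 60.0)
piece.add(0.0, phrase)
piece.add(20.0, phrase)   # the same nested timeline, re-placed in time
piece.synthesize()
</pre>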
 
 
 
== Mixed Bags ==
 
 
 
Another template for synthesis purposes is the mixed bag, which is designed to control the relative densities of multiple, possibly repeating, templates. Like a timeline, a mixed bag can contain any number of templates, but these templates are randomly placed in time and transformed, as in loops. The goal is to facilitate the synthesis of a composition with many repeating components, without specifying precisely when each event occurs. The real-time parameters for controlling this also enable the tone of a piece to change over time while using the same set of components, simply by synthesizing these components differently.
 
 
 
When a template is added to a mixed bag, it can be set to play either once or repeatedly. It also has a "likelihood" parameter, which determines the probability of that template's being played in preference over any of the other templates in the bag. Finally, it has a "randomness" parameter, which controls the range for random transformations on that template, analogous to the randomness control in event loops.
 
 
 
Beyond these individual template parameters, each mixed bag has overall periodicity and density settings, which control the temporal distribution of repeating templates in the same way that an event loop does. However, while an event loop plays instances of a single event, a mixed bag randomly selects a repeating template from its list whenever it is time to synthesize a new instance. Templates with higher likelihood settings are more likely to be selected for synthesis.
 
 
 
One way to think of a mixed bag is as a physical bag of marbles. The overall periodicity and density parameters determine how often someone dips a hand into the bag and pulls out a marble, or a template to be synthesized. The likelihood setting of a template, or marble, controls how likely the hand is to pull out that particular marble. A repeating marble is tossed back into the bag as soon as the hand has examined it.
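
The marble metaphor translates almost directly into likelihood-weighted sampling (a sketch; the dictionary layout is assumed):

<pre>
import random

def pull_marble(bag):
    """Pick the next template from a mixed bag: weighted random choice among
    the marbles still in the bag. Each marble is a dict like
    {"template": ..., "likelihood": 2.0, "repeat": True}."""
    eligible = [m for m in bag if m["repeat"] or not m.get("played")]
    if not eligible:
        return None
    weights = [m["likelihood"] for m in eligible]
    marble = random.choices(eligible, weights=weights, k=1)[0]
    marble["played"] = True      # one-shot marbles are not tossed back
    return marble["template"]
</pre>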
 
 
 
== Score Language ==
 
 
 
The templates described so far are primarily manipulated through a visual interface. Finer control over the synthesis can be obtained through the use of a score language. The audio programming language ChucK [cite a few ChucK papers] can be used both for specifying precise parameter values and for controlling exactly how these values change over time.
 
 
 
* Paragraph or two about ChucK?
 
* Paragraph on key (i.e. implemented) features.
 
 
 
== Pitch and Time Quantizations ==
 
 
 
More customized musical control can be exerted by quantizing pitches and times to either standard or user-specified values.
 
 
 
The pitch of a deterministic event, as well as its frequency scaling factor, can be quantized to a scale defined by a pre-programmed or user-programmable pitch table. This supports the synthesis of scale-dependent music through transformations on one or more deterministic events.
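
For instance, quantizing a frequency scaling factor to a pitch table might work as follows (a sketch; the major-scale table and circular snapping are illustrative choices, not the actual quantization rule):

<pre>
import math

def quantize_freq_scale(factor, scale=(0, 2, 4, 5, 7, 9, 11)):
    """Snap a frequency scaling factor to the nearest pitch in a pitch table
    (semitone offsets within an octave), in any octave."""
    semis = 12.0 * math.log2(factor)
    octave, pc = divmod(semis, 12.0)
    nearest = min(scale, key=lambda s: min(abs(pc - s), 12 - abs(pc - s)))
    if abs(pc - nearest) > 6:        # snapped across the octave boundary
        octave += 1 if pc > nearest else -1
    return 2.0 ** ((octave * 12 + nearest) / 12.0)
</pre>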
 
 
 
Rhythm can be specified in loops by quantizing the event density parameter to a time grid. Further, a timeline can be set to follow the same time grid, so that templates on a timeline can be triggered at particular beats.
 
 
 
* The ability to change the pitch table on a timeline... what exactly does this mean?
 
* These could probably be done through ChucK, but maybe there's a clean way to do it through TAPESTREA itself.
 
 
 
== Other Controls ==
 
 
 
Beyond the synthesis parameters mentioned so far, TAPESTREA offers some generic synthesis and playback controls. The gain and stereo panning of each template can be controlled individually, and can be set randomly for templates in event loops and mixed bags. A reverb effect adapted from STK [cite?] can also be added to the final synthesized sound.
 
 
 
= INTERFACE =
 
 
 
TAPESTREA includes a visual interface for interactively analyzing and synthesizing sound. The analysis interface provides features for sinusoidal analysis and grouping, as well as transient separation and removal. The synthesis interface offers ways to transform and synthesize the components extracted during analysis, by manipulating the templates described in the previous section.
 
 
 
On the analysis side, a loaded sound is simultaneously displayed in the form of a waveform and a spectrogram. The spectrogram display can also be toggled with a frame-by-frame spectrum view. Selecting a rectangle on the spectrogram, or selecting an analysis region on the waveform and the frame-by-frame spectrum, limits the analysis to the associated time and frequency ranges. This facilitates the selection and extraction of specific deterministic events. Sinusoidal analysis parameters, such as the number of tracks to find per frame or the peak magnitude threshold, can be set via sliders before each extraction. The magnitude threshold can also be "tilted" so that different frequency bins have different thresholds. The frame-by-frame spectrum displays the threshold as a line, allowing the tilt to be visually adjusted.
 
 
 
After the separation takes place, the sinusoidal tracks found are marked on the spectrogram display. Each deterministic event found in this analysis can be individually played and saved as a template for use in the synthesis phase. The residue from the analysis can also be saved, played, and loaded as a sound file for further analysis. To control how tracks are grouped into deterministic events, certain grouping parameters are also available as sliders. These include closeness thresholds for grouping by harmonicity, common modulation and onset/offset time, as well as control over the amount by which two tracks should overlap in time to be included in the same event, and the minimum number of frames over which an event should last.
 
 
 
Similar controls are available for transient analysis. The attack and decay of the envelope follower filter, the energy threshold for what constitutes a transient, and the length of detected transients can all be modified in real-time via sliders. Detected transients can be individually replayed and saved as templates. Finally, all detected transients can be removed to obtain a residue or background, which can again be saved, played or loaded for further analysis. Saved templates are also visible in a "library" view on the analysis interface.
 
 
 
The synthesis interface provides access to the current library of saved templates, displayed as objects based on their type. Templates saved to file during prior sessions can be loaded into the library as well. Selecting any template in the library displays a set of parameters suited to the template type, which control its transformation and synthesis as described earlier. A selected template can be synthesized to generate sound at any time, including while its transformation parameters are being modified.
 
 
 
In this interface, it is also possible to instantiate new templates. Any existing template can be copied, while deterministic and transient event templates can also be saved as event loops. New timelines and mixed bags can be freely created, and existing templates can be dragged onto or off these as needed. Templates can also be deleted from the library, provided they are not being used in a timeline or a mixed bag. Finally, while sound is generally synthesized in real-time, TAPESTREA offers the option of writing the synthesized sound to file.
 
 
 
= DISCUSSION / CONTRIBUTIONS / SOUND SAMPLES =
 
* what?
 
 
 
= CONCLUSIONS =
 
* Real references to be added.
 
 
 
= REFERENCES =
 
 
