'''The Really Good Title Without TAPESTREA'''
* Pasta Tree, E. Rat Pesta, Ape Treats
 
* Ananya Misra, Perry Cook, Ge Wang
 
* Department of Computer Science (also Music), Princeton University
 
* {amisra,prc,gewang}@cs.princeton.edu
 
 
 
 
 
= ABSTRACT =
 
Coming up eventually, along with references. (Note: don't forget these parts...)
 
 
 
= INTRODUCTION =
 
In the 1940s and 50s, Pierre Schaeffer developed musique concrète. Unlike traditional music, musique concrète starts with existing or concrete recorded sounds, which are organized into abstract musical structures. The existing recordings often include natural and industrial sounds that are not conventionally musical, but can be manipulated to make music, either by editing magnetic tape or, now more commonly, through digital sampling. Typical manipulations include cutting, copying, reversing, looping and changing the speed of recorded segments.
 
 
 
Today, several other forms of electronic/electroacoustic music also involve manipulating a set of recorded sounds. Acousmatic music [cite Dhomont], for instance, evolved from musique concrète and refers to compositions designed for environments that emphasize the sound itself rather than the performance-oriented aspects of the piece.
 
 
 
The acoustic ecology [cite Schafer] movement gave rise to soundscape composition [cite Truax], or the creation of realistic soundscapes from recorded environmental sounds. One of the key features of soundscape composition, according to Truax, is that "most pieces can be placed on a continuum between what might be called 'found sound' and 'abstracted' approaches." However, while "contemporary signal processing techniques can easily render such sounds unrecognizable and completely abstract," a soundscape composition piece remains recognizable even at the abstract end of the continuum.
 
 
 
Sound designers for movies, theater and art often have a related goal of starting with real world sounds and creating emotionally evocative sound scenes, which are still real, yet transformed and transformative. Classic examples include mixing a transformed lion's roar with other sounds to accompany the wave sounds in "Perfect Storm," and the sound design for "Black Hawk Down" [cite Paul Rudy's Miami ICMC paper and provide another informative clause or something]. These sound designers are "sound sculptors" as well, but transform sounds to enhance or create a sense of reality, rather than for musical purposes.
 
 
 
Artists from all of the above backgrounds share the process of manipulating recordings, but aim to achieve different effects. We present a single framework for starting with recordings and producing sounds that can lie anywhere on a 'found' to 'unrecognizable' continuum. 'Found' sounds can be modified in subtle ways or extended indefinitely, while moving towards the 'unrecognizable' end of the spectrum unleashes a range of manipulations beyond time-domain techniques. In fact, the same set of techniques applies throughout the continuum, differing only in how the techniques are used. We call this framework TAPESTREA: Techniques and Paradigms for Expressive Synthesis, Transformation and Rendering of Environmental Audio.
 
 
 
TAPESTREA manipulates recorded sounds in two phases. In the analysis phase, the sound is separated into reusable components that map to individual foreground events or background. In the synthesis phase, these components are parametrically transformed, combined and re-synthesized using time- and frequency-domain techniques that can be controlled on multiple levels. While we highlight the synthesis methods in this paper, the analysis phase is also integral as it enables the most flexible means for dealing with real-world sonic material.
 
 
 
= RELATED WORK =
 
Related techniques used for musical composition include spectral modeling synthesis [cite Serra/Smith] and granular synthesis [cite Truax, Roads-microsound]. Spectral modeling synthesis separates a sound into sinusoids and noise, and was originally used for modeling instrument sounds. Granular synthesis, in contrast, functions in the time-domain and involves continuously controlling very brief sonic events or sound grains. TAPESTREA employs aspects of both, using separation techniques on environmental sounds and controlling the temporal placement of resulting events.
 
 
 
Another technique used in TAPESTREA is an extension of a wavelet tree learning algorithm [cite Dubnov] for sound texture synthesis. This method performs a wavelet decomposition on a sound clip and uses machine learning on the wavelet coefficients to generate similar, non-repeating sound texture. The algorithm works well for sounds that are mostly stochastic, but can break up extended pitched portions. It can also be slow in its original form. TAPESTREA takes advantage of this technique by improving the speed of the algorithm and by only using it on the types of sound for which it works well.
 
 
 
= ANALYSIS PHASE =
 
TAPESTREA starts by separating a recording into deterministic events (the sinusoidal or pitched components of the sound), transient events (brief noisy bursts of energy), and the remaining stochastic background or din. This separation can be parametrically controlled and takes place in the analysis phase.
 
 
 
Deterministic events are foreground events extracted by sinusoidal modeling based on the spectral modeling framework [cite Serra]. Overlapping frames of the sound are transformed into the frequency domain using the FFT. For each spectral frame, the n highest peaks above a specified magnitude threshold are recorded, where n can range from 1 to 50. These peaks can also be loaded from a preprocessed file. The highest peaks from every frame are then matched across frames by frequency, subject to a controllable "frequency sensitivity" threshold, to form sinusoidal tracks. Tracks can be "mute" or below the magnitude threshold for a specified maximum number of frames, or can be discarded if they fail to satisfy a minimum track length requirement. Undiscarded tracks are optionally grouped [cite Ellis, Melih and Gonzalez?] by harmonicity, common amplitude and frequency modulation, and common onset/offset, to form deterministic events, which are essentially collections of related sinusoidal tracks. If the grouping option is not selected, each track is interpreted as a separate deterministic event.
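
To make the peak picking and track formation concrete, here is a minimal Python sketch (assuming numpy). The frame handling, thresholds, and greedy matching are illustrative simplifications rather than TAPESTREA's actual implementation, and the allowance for temporarily "mute" tracks is omitted.

<pre>
import numpy as np

def spectral_peaks(frame, sr, n_peaks=10, mag_thresh=0.01):
    # n highest local maxima above a magnitude threshold, as (freq_hz, mag) pairs
    win = np.hanning(len(frame))
    spec = np.abs(np.fft.rfft(frame * win))
    idx = np.where((spec[1:-1] > spec[:-2]) & (spec[1:-1] > spec[2:]))[0] + 1
    idx = idx[spec[idx] > mag_thresh]
    idx = idx[np.argsort(spec[idx])[::-1][:n_peaks]]
    return [(i * sr / len(frame), spec[i]) for i in idx]

def build_tracks(frames, sr, freq_sensitivity=30.0, min_len=5):
    # greedily link per-frame peaks into sinusoidal tracks while frequencies
    # stay within freq_sensitivity Hz; drop tracks shorter than min_len frames
    tracks, active = [], []              # a track is a list of (frame, freq, mag)
    for t, frame in enumerate(frames):
        peaks = spectral_peaks(frame, sr)
        still_active = []
        for track in active:
            last_freq = track[-1][1]
            match = min(peaks, key=lambda p: abs(p[0] - last_freq), default=None)
            if match is not None and abs(match[0] - last_freq) < freq_sensitivity:
                track.append((t, match[0], match[1]))
                peaks.remove(match)
                still_active.append(track)
            else:
                tracks.append(track)     # track ends when no close peak is found
        active = still_active + [[(t, f, m)] for f, m in peaks]
    return [tr for tr in tracks + active if len(tr) >= min_len]
</pre>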
 
 
 
Transient events or brief noisy foreground events are usually detected in the time-domain by observing changes in signal energy over time [cite Verma and Meng, Bello et al.?]. TAPESTREA analyzes the recorded sound using a non-linear one-pole envelope follower filter with a sharp attack and slow decay, and finds points where the derivative of the envelope is above a threshold. These points mark sudden increases in energy and are interpreted as transient onsets. A transient event is considered to last for up to half a second from its onset; its exact length can be controlled within that range.
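
A minimal sketch of this style of detector, in Python with numpy; the attack/release coefficients, derivative threshold, and default event length are illustrative values, not the ones used in TAPESTREA.

<pre>
import numpy as np

def detect_transients(x, sr, attack=0.1, release=0.9995,
                      deriv_thresh=0.005, max_len_sec=0.5):
    # one-pole envelope follower: small coefficient on rising input (sharp attack),
    # coefficient near 1 on falling input (slow decay)
    env = np.empty(len(x))
    e = 0.0
    for i, s in enumerate(np.abs(x)):
        coeff = attack if s > e else release
        e = coeff * e + (1.0 - coeff) * s
        env[i] = e

    # onsets are points where the envelope's derivative exceeds a threshold
    deriv = np.diff(env, prepend=env[0])
    onsets = np.where(deriv > deriv_thresh)[0]

    # each transient event lasts up to max_len_sec from its onset
    max_len = int(max_len_sec * sr)
    events, last_end = [], -1
    for i in onsets:
        if i > last_end:
            events.append((int(i), min(int(i) + max_len, len(x))))
            last_end = int(i) + max_len
    return events                        # list of (start, end) sample ranges
</pre>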
 
 
 
The stochastic background represents parts of the recording
 
that constitute background noise, and is obtained by removing
 
the detected deterministic and transient events from
 
the original sound. Deterministic events are removed by eliminating
 
the peaks of each sinusoidal track from the corresponding
 
spectral frames. To eliminate a peak, the magnitudes
 
of the bins beneath the peak are smoothed down, while
 
the phase in these bins is randomized. Transient events, in
 
turn, are removed in the time-domain by applying wavelet
 
tree learning [cite Dubnov] to generate a sound clip that resembles
 
nearby transient-free segments of the original recording.
 
This synthesized "clean" background replaces the samples
 
containing the transient event to be removed.
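
The peak removal step might look roughly like the following Python sketch (numpy assumed); the fixed half-width of the peak region and the linear magnitude interpolation are guesses for illustration, not TAPESTREA's exact rule.

<pre>
import numpy as np

def remove_peak(spectrum, peak_bin, half_width=4):
    # spectrum: one complex FFT frame; peak_bin: bin index of a track's peak
    out = spectrum.copy()
    lo = max(peak_bin - half_width, 0)
    hi = min(peak_bin + half_width, len(spectrum) - 1)
    mags = np.abs(out)
    # smooth the magnitudes down by interpolating across the peak region
    smoothed = np.linspace(mags[lo], mags[hi], hi - lo + 1)
    # randomize the phase in the affected bins
    phases = np.random.uniform(-np.pi, np.pi, hi - lo + 1)
    out[lo:hi + 1] = smoothed * np.exp(1j * phases)
    return out
</pre>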
 
 
 
Separating a sound into components in this way has several advantages. The distinction between foreground and background components is semantically clear to humans, who can therefore work within the framework with a concrete understanding of what each component represents. The different types of components are also stored and processed differently according to their defining characteristics, thus allowing flexible transformations on individual components. Each transformed component can be saved as a template and later reloaded, reused, copied, further transformed, or otherwise treated as a single object. In addition, the act of separating a sound into smaller sounds makes it possible to "compose" a variety of pieces by combining these constituents in diverse ways.
 
 
 
Describe analysis user interface?
 
 
 
= SYNTHESIS PHASE =
 
* Templates are synthesized individually with transformations.
 
* Gain / pan control exists
 
 
 
== Deterministic Events ==
 
* Sinusoidal resynthesis.
 
* Pitch and time transformations, real-time (a minimal resynthesis sketch follows).
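
As an illustration only, assuming a hypothetical track format of per-frame (frequency, magnitude) pairs rather than TAPESTREA's internal representation, pitch can be shifted by scaling frequencies and duration by rescaling the breakpoint positions before oscillator-bank resynthesis:

<pre>
import numpy as np

def resynth_track(track, sr, hop, pitch_scale=1.0, time_stretch=1.0):
    # track: list of (freq_hz, mag) breakpoints, one per analysis frame
    freqs = np.array([f for f, _ in track]) * pitch_scale      # pitch transformation
    mags = np.array([m for _, m in track])
    n_out = int(len(track) * hop * time_stretch)                # time transformation
    frame_pos = np.arange(n_out) / (hop * time_stretch)         # fractional frame index
    f = np.interp(frame_pos, np.arange(len(track)), freqs)      # interpolate envelopes
    a = np.interp(frame_pos, np.arange(len(track)), mags)
    phase = 2.0 * np.pi * np.cumsum(f) / sr                     # integrate frequency
    return a * np.sin(phase)
</pre>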
 
 
 
== Transient Events ==
 
* A phase vocoder exists but is not yet stable, so the composer has the standard pitch/time stretching tools.

* Transients make nice "grains" for traditional granular synthesis; a toy grain-scattering sketch follows.
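
A toy sketch of treating an extracted transient as a grain for granular synthesis (Python/numpy; the density parameter and random gain range are invented for illustration):

<pre>
import numpy as np

def scatter_grains(grain, out_len, sr, density_hz=8.0, seed=0):
    # scatter copies of a short transient across an output buffer at random
    # onsets (exponential inter-onset times) with a random gain per grain
    rng = np.random.default_rng(seed)
    out = np.zeros(out_len)
    t = 0.0
    while True:
        t += rng.exponential(1.0 / density_hz)
        start = int(t * sr)
        if start >= out_len:
            break
        end = min(start + len(grain), out_len)
        out[start:end] += rng.uniform(0.5, 1.0) * grain[:end - start]
    return out
</pre>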
 
 
 
== Stochastic Background ==
 
* Wavelet tree learning.
 
* Improvements can go here.
 
 
 
== Event Loops ==
 
* For repeating a single event.
 
* Random frequency and time transformations on each instance.
 
* Periodicity of event distribution: Gaussian.
 
* Can also be replaced by other distributions such as Poisson or your own invention.
 
* Density of events (how frequently they recur) relates to granular synthesis for transients; see the scheduling sketch after this list.
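
A minimal scheduling sketch in Python (the parameter names, ranges, and the clamping of tiny gaps are illustrative assumptions): each instance gets its own random pitch and time-stretch factors, and inter-onset times are drawn from a Gaussian around the nominal period; swapping in an exponential draw gives Poisson-like spacing.

<pre>
import random

def event_loop(duration_sec, period_sec, jitter=0.1,
               pitch_range=(0.8, 1.25), stretch_range=(0.9, 1.1), seed=None):
    # returns (onset_time, pitch_factor, time_stretch) for each instance
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < duration_sec:
        events.append((t, rng.uniform(*pitch_range), rng.uniform(*stretch_range)))
        # Gaussian periodicity; use rng.expovariate(1.0 / period_sec) for Poisson
        t += max(1e-3, rng.gauss(period_sec, jitter * period_sec))
    return events
</pre>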
 
 
 
== Timelines ==
 
* Explicitly state when a template is played in relation to other templates.
 
* Timelines within timelines... multiresolution synthesis? (A small data-structure sketch follows.)
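
One plausible data-structure sketch in Python (class and method names are hypothetical, not TAPESTREA's API): a timeline holds (start time, template) pairs, and since a timeline is itself a template it can be nested and later flattened to absolute times.

<pre>
class Timeline:
    def __init__(self):
        self.entries = []                       # list of (start_sec, template)

    def add(self, start_sec, template):
        self.entries.append((start_sec, template))

    def flatten(self, offset=0.0):
        # resolve nested timelines into a flat list of (absolute_time, template)
        out = []
        for start, tmpl in sorted(self.entries, key=lambda e: e[0]):
            if isinstance(tmpl, Timeline):
                out.extend(tmpl.flatten(offset + start))
            else:
                out.append((offset + start, tmpl))
        return out
</pre>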
 
 
 
== Mixed Bags ==
 
* Controlling relative density of multiple templates...
 
* Combines features of timelines and loops (see the sketch below).
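
A rough sketch of the idea in Python (the weighting scheme and exponential spacing are assumptions for illustration): templates are drawn at random, with likelihood proportional to a per-template weight, at randomly spaced onsets.

<pre>
import random

def mixed_bag(templates, weights, duration_sec, mean_gap_sec=1.0, seed=None):
    # weighted random choice of which template sounds next, loop-style timing
    rng = random.Random(seed)
    schedule, t = [], 0.0
    while t < duration_sec:
        schedule.append((t, rng.choices(templates, weights=weights, k=1)[0]))
        t += rng.expovariate(1.0 / mean_gap_sec)
    return schedule
</pre>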
 
 
 
== Score Language ==
 
* ChucK
 
* Precise control beyond what sliders can provide.
 
* Any other specific features?
 
 
 
== Pitch and Time Quantizations ==
 
* Quantize pitch to a scale.
 
* A few pre-programmed pitch tables or one that's user programmable.
 
* The ability to change pitch table on a timeline would be good (hmm).
 
* Quantizing time to a time grid for rhythm.
 
These could probably be done through ChucK, but maybe there's a clean way to do it through TAPESTREA itself; a minimal quantization sketch follows.
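
A minimal sketch of both quantizations in Python (the scale table, reference pitch, and grid size are illustrative defaults, user-replaceable):

<pre>
import math

def quantize_pitch(freq_hz, scale_degrees=(0, 2, 4, 5, 7, 9, 11), ref_hz=261.63):
    # snap a frequency to the nearest degree of a scale (default: major scale on C4)
    semitones = 12.0 * math.log2(freq_hz / ref_hz)
    octave, degree = divmod(round(semitones), 12)
    nearest = min(scale_degrees, key=lambda d: abs(d - degree))
    return ref_hz * 2.0 ** (octave + nearest / 12.0)

def quantize_time(t_sec, grid_sec=0.125):
    # snap an onset time to the nearest point on a rhythmic grid
    return round(t_sec / grid_sec) * grid_sec
</pre>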
 
There is also a secret (not-so-secret) reverb face. Include?
 
 
 
= Discussions/Contributions/Sound samples/ =
 
* what?
 
 
 
= CONCLUSIONS =
 
* Real references to be added.
 
 
 
= REFERENCES =
 
 
