'''The Really Good Title Without TAPESTREA'''

* Pasta Tree, E. Rat Pesta, Ape Treats
* Ananya Misra, Perry Cook, Ge Wang
* Department of Computer Science (also Music), Princeton University
* {amisra,prc,gewang}@cs.princeton.edu

= ABSTRACT =

Traditional software synthesis systems, such as Music V, utilize an instance model of computation in which each note instantiates a new copy of an instrument. An alternative is the resource model, exemplified by MIDI "mono mode," in which multiple updates can modify a sound continuously, and where multiple notes share a single instrument. We have developed a unified, general model for describing combinations of instances and resources. Our model is a hierarchy in which resource-instances at one level generate output, which is combined to form updates to the next level. The model can express complex system configurations in a natural way.

= INTRODUCTION =

In the 1940s and 50s, Pierre Schaeffer developed musique concrète. Unlike traditional music, musique concrète starts with existing or concrete recorded sounds, which are organized into abstract musical structures. The existing recordings often include natural and industrial sounds that are not conventionally musical, but can be manipulated to make music, either by editing magnetic tape or, now more commonly, through digital sampling. Typical manipulations include cutting, copying, reversing, looping and changing the speed of recorded segments.

Today, several other forms of electronic/electroacoustic music also involve manipulating a set of recorded sounds. Acousmatic music [cite Dhomont], for instance, evolved from musique concrète and refers to compositions designed for environments that emphasize the sound itself rather than the performance-oriented aspects of the piece.

The acoustic ecology [cite Schafer] movement gave rise to soundscape composition [cite Truax], or the creation of realistic soundscapes from recorded environmental sounds. One of the key features of soundscape composition, according to Truax, is that "most pieces can be placed on a continuum between what might be called `found sound' and `abstracted' approaches." However, while "contemporary signal processing techniques can easily render such sounds unrecognizable and completely abstract," a soundscape composition piece remains recognizable even at the abstract end of the continuum.

Sound designers for movies, theater and art often have a related goal of starting with real world sounds and creating emotionally evocative sound scenes, which are still real, yet transformed and transformative. Classic examples include mixing a transformed lion's roar with other sounds to accompany the wave sounds in "The Perfect Storm." These sound designers are "sound sculptors" as well, but transform sounds to enhance or create a sense of reality, rather than for musical purposes.

Artists from all of the above backgrounds share the process of manipulating recordings, but aim to achieve different effects. We present a single framework for starting with recordings and producing sounds that can lie anywhere on a `found' to `unrecognizable' continuum. `Found' sounds can be modified in subtle ways or extended indefinitely, while moving towards the `unrecognizable' end of the spectrum unleashes a range of manipulations beyond time-domain techniques. In fact, the same set of techniques applies throughout the continuum, differing only in how they are used. We call this framework TAPESTREA: Techniques and Paradigms for Expressive Synthesis, Transformation and Rendering of Environmental Audio.

TAPESTREA manipulates recorded sounds in two phases. In the analysis phase, the sound is separated into reusable components that map to individual foreground events or background. In the synthesis phase, these components are parametrically transformed, combined and re-synthesized using time- and frequency-domain techniques that can be controlled on multiple levels. While we highlight the synthesis methods in this paper, the analysis phase is also integral, as it makes the synthesis possible.

= RELATED WORK =

Related techniques used for musical composition include spectral modeling synthesis [cite Serra/Smith] and granular synthesis [cite ??]. Spectral modeling synthesis separates a sound into sinusoids and noise, and was originally used for modeling instrument sounds. Granular synthesis, in contrast, functions in the time domain and involves continuously controlling very brief sonic events or sound grains. TAPESTREA employs aspects of both, using separation techniques on environmental sounds and controlling the temporal placement of resulting events.

Another technique used in TAPESTREA is an extension of a wavelet tree learning algorithm [cite Dubnov] for sound texture synthesis. This method performs a wavelet decomposition on a sound clip and uses machine learning on the wavelet coefficients to generate similar, non-repeating sound texture. The algorithm works well for sounds that are mostly stochastic, but can break up extended pitched portions. It can also be slow in its original form. TAPESTREA takes advantage of this technique by using a faster version of it on the types of sound for which it works well.

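To make the idea concrete, the following is a deliberately simplified Python/numpy sketch, not TAPESTREA's implementation or the exact algorithm of [cite Dubnov]: it performs a Haar wavelet analysis, replaces each detail coefficient with that of a node whose parent coefficient is similar (the original algorithm conditions on the full ancestor and predecessor context), and resynthesizes. All names, the tolerance, and the level count are illustrative.

<pre>
import numpy as np

def haar_analysis(x, levels):
    """Haar wavelet analysis: returns the final approximation plus a list of
    detail-coefficient arrays, coarsest level first."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        if len(approx) % 2:                        # pad to even length
            approx = np.append(approx, 0.0)
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail coefficients
        approx = (even + odd) / np.sqrt(2)         # next (coarser) approximation
    return approx, details[::-1]

def haar_synthesis(approx, details):
    """Inverse of haar_analysis (details given coarsest level first)."""
    for d in details:
        up = np.empty(2 * len(d))
        up[0::2] = (approx[:len(d)] + d) / np.sqrt(2)
        up[1::2] = (approx[:len(d)] - d) / np.sqrt(2)
        approx = up
    return approx

def resample_tree(details, rng, tol=0.1):
    """For each detail coefficient, substitute the coefficient of a randomly
    chosen node whose parent coefficient is similar, so local structure is
    kept while the global ordering is shuffled."""
    new_details = [details[0].copy()]              # keep the coarsest level
    for parent, child in zip(details[:-1], details[1:]):
        out = np.empty_like(child)
        for i in range(len(child)):
            p = parent[i // 2]                     # this node's parent coefficient
            cand = np.flatnonzero(np.abs(parent - p) <= tol * (np.abs(p) + 1e-9))
            j = rng.choice(cand)                   # pick a similar subtree
            out[i] = child[min(2 * j + (i % 2), len(child) - 1)]
        new_details.append(out)
    return new_details

# Usage: generate a clip that resembles a stochastic input without copying it.
rng = np.random.default_rng(0)
clip = rng.standard_normal(4096)                   # stand-in for a recorded clip
approx, details = haar_analysis(clip, levels=8)
texture = haar_synthesis(approx, resample_tree(details, rng))
</pre>
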
= ANALYSIS PHASE =

TAPESTREA starts by separating a recording into deterministic events, or the sinusoidal or pitched components of the sound; transient events, or brief noisy bursts of energy; and the remaining stochastic background, or din. This separation can be parametrically controlled and takes place in the analysis phase.

Deterministic events are foreground events extracted by sinusoidal modeling based on the spectral modeling framework. Overlapping frames of the sound are transformed into the frequency domain using the FFT. For each spectral frame, the n highest peaks above a specified magnitude threshold are recorded, where n can range from 1 to 50. These peaks can also be loaded from a preprocessed file. The highest peaks from every frame are then matched across frames by frequency, subject to a controllable "frequency sensitivity" threshold, to form sinusoidal tracks. Tracks can be "mute" or below the magnitude threshold for a specified maximum number of frames, or can be discarded if they fail to satisfy a minimum track length requirement. Undiscarded tracks are optionally grouped [cite Ellis, Melih and Gonzalez?] by harmonicity, common amplitude and frequency modulation, and common onset/offset, to form deterministic events, which are essentially collections of related sinusoidal tracks. If the grouping option is not selected, each track is interpreted as a separate deterministic event.

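As a rough illustration of the peak picking and frequency matching described above (not TAPESTREA's code), here is a minimal numpy sketch. The window, default thresholds, the greedy nearest-frequency matching, and the track record format are assumptions; the optional grouping of tracks into events is omitted, and frames is assumed to be a list of equal-length signal windows.

<pre>
import numpy as np

def spectral_peaks(frame, sample_rate, n_peaks=10, mag_threshold=1e-3):
    """Return (frequency, magnitude) pairs for the n_peaks strongest local
    maxima above mag_threshold in one windowed analysis frame."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    mag = np.abs(spectrum)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]) & (mag[1:-1] > mag_threshold)
    idx = np.flatnonzero(is_peak) + 1
    idx = idx[np.argsort(mag[idx])[::-1]][:n_peaks]    # keep the strongest peaks
    return [(freqs[i], mag[i]) for i in idx]

def form_tracks(frames, sample_rate, freq_sensitivity=0.03, max_gap=4, min_len=8):
    """Greedily match peaks across frames by frequency, within a relative
    "frequency sensitivity"; a track may stay quiet ("mute") for up to
    max_gap frames, and tracks shorter than min_len frames are discarded."""
    tracks = []   # each track: {'start', 'last', 'freqs', 'mags'}
    for t, frame in enumerate(frames):
        for freq, mag in spectral_peaks(frame, sample_rate):
            best = None
            for tr in tracks:
                if t - tr['last'] > max_gap:
                    continue                           # quiet for too many frames
                if abs(tr['freqs'][-1] - freq) <= freq_sensitivity * freq:
                    if best is None or abs(tr['freqs'][-1] - freq) < abs(best['freqs'][-1] - freq):
                        best = tr
            if best is None:                           # no match: start a new track
                best = {'start': t, 'last': t, 'freqs': [], 'mags': []}
                tracks.append(best)
            best['freqs'].append(freq)
            best['mags'].append(mag)
            best['last'] = t
    return [tr for tr in tracks if len(tr['freqs']) >= min_len]
</pre>
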
Transient events, or brief noisy foreground events, are usually detected in the time domain by observing changes in signal energy over time [cite Verma and Meng, Bello et al.?]. TAPESTREA analyzes the recorded sound using a non-linear one-pole envelope follower filter with a sharp attack and slow decay, and finds points where the derivative of the envelope is above a threshold. These points mark sudden increases in energy and are interpreted as transient onsets. A transient event is considered to last for up to half a second from its onset; its exact length can be controlled within that range.

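A minimal sketch of such an onset detector, assuming a signal normalized to [-1, 1]; the attack and decay coefficients and the derivative threshold below are illustrative values, not TAPESTREA's.

<pre>
import numpy as np

def detect_transients(x, sample_rate, attack=0.005, decay=0.9995,
                      deriv_threshold=0.005, max_len_sec=0.5):
    """Mark transient onsets where the derivative of a sharp-attack,
    slow-decay envelope exceeds a threshold; each event is taken to last
    up to max_len_sec from its onset."""
    env = np.empty(len(x))
    e = 0.0
    for n, s in enumerate(np.abs(x)):
        if s > e:
            e += (1.0 - attack) * (s - e)   # sharp attack: jump most of the way up
        else:
            e *= decay                      # slow decay
        env[n] = e

    max_len = int(max_len_sec * sample_rate)
    deriv = np.diff(env, prepend=env[0])    # per-sample envelope derivative
    onsets, last = [], -max_len - 1
    for n in range(len(x)):
        # a sudden rise in energy, outside the span of the previous transient
        if deriv[n] > deriv_threshold and n - last > max_len:
            onsets.append(n)
            last = n
    return [(n, min(n + max_len, len(x))) for n in onsets]
</pre>
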
The stochastic background represents parts of the recording that constitute background noise, and is obtained by removing the detected deterministic and transient events from the original sound. Deterministic events are removed by eliminating the peaks of each sinusoidal track from the corresponding spectral frames. To eliminate a peak, the magnitudes of the bins beneath the peak are smoothed down, while the phase in these bins is randomized. Transient events, in turn, are removed in the time domain by applying wavelet tree learning [cite Dubnov] to generate a sound clip that resembles nearby transient-free segments of the original recording. This synthesized "clean" background replaces the samples containing the transient event to be removed.

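The peak elimination step might look roughly like this sketch for a single peak of a single complex spectral frame; the peak width and the linear interpolation used for smoothing are assumptions. In the full analysis it would be applied to every peak of every sinusoidal track before the inverse FFT and overlap-add.

<pre>
import numpy as np

def remove_peak(spectrum, peak_bin, width=4, rng=None):
    """Eliminate one sinusoidal peak from a complex spectral frame: smooth the
    magnitudes of the bins beneath the peak down to the level of their
    neighbours, and randomize the phase in those bins."""
    if rng is None:
        rng = np.random.default_rng()
    lo = max(peak_bin - width, 0)
    hi = min(peak_bin + width, len(spectrum) - 1)
    mags = np.abs(spectrum)
    # interpolate between the bins just outside the peak region
    smoothed = np.linspace(mags[lo], mags[hi], hi - lo + 1)
    phases = rng.uniform(-np.pi, np.pi, hi - lo + 1)
    out = spectrum.copy()
    out[lo:hi + 1] = smoothed * np.exp(1j * phases)
    return out
</pre>
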
Separating a sound into components in this way has several advantages. The distinction between foreground and background components is semantically clear to humans, who can therefore work within the framework with a concrete understanding of what each component represents. The different types of components are also stored and processed differently according to their defining characteristics, thus allowing flexible transformations on individual components. Each transformed component can be saved as a template and later reloaded, reused, copied, further transformed, or otherwise treated as a single object. In addition, the act of separating a sound into smaller sounds makes it possible to "compose" a variety of pieces by combining these constituents in diverse ways.

Describe analysis user interface?

= SYNTHESIS PHASE =

* Templates are synthesized individually with transformations.
* Gain / pan control exists.

== Deterministic Events ==

* Sinusoidal resynthesis.
* Pitch and time transformations, real-time (a rough sketch follows this list).

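As a rough sketch of what sinusoidal resynthesis with independent pitch and time transformation involves (an offline oscillator-bank version, not the real-time implementation), assuming each track is stored as per-frame frequency and magnitude values:

<pre>
import numpy as np

def resynthesize(tracks, sample_rate, hop=512, pitch_factor=1.0, time_factor=1.0):
    """Oscillator-bank resynthesis of sinusoidal tracks.

    Each track is (start_frame, freqs, mags), one frequency/magnitude pair per
    analysis frame.  pitch_factor scales every frequency and time_factor
    stretches the synthesis hop, so pitch and duration are independent."""
    hop_out = max(int(hop * time_factor), 1)
    n_frames = max(start + len(freqs) for start, freqs, mags in tracks)
    out = np.zeros(n_frames * hop_out + hop_out)
    for start, freqs, mags in tracks:
        phase = 0.0
        for k in range(len(freqs) - 1):
            # interpolate frequency and amplitude linearly across the hop
            f = np.linspace(freqs[k], freqs[k + 1], hop_out) * pitch_factor
            a = np.linspace(mags[k], mags[k + 1], hop_out)
            phases = phase + np.cumsum(2 * np.pi * f / sample_rate)
            n0 = (start + k) * hop_out
            out[n0:n0 + hop_out] += a * np.sin(phases)
            phase = phases[-1]          # keep each oscillator's phase continuous
    return out

# Example: a two-partial test event, resynthesized a fifth higher (ratio 3/2)
# and stretched to twice its original length.
tracks = [(0, [440.0] * 40, [0.5] * 40), (0, [880.0] * 40, [0.25] * 40)]
y = resynthesize(tracks, 44100, pitch_factor=1.5, time_factor=2.0)
</pre>
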
== Transient Events ==

* Phase vocoder is not stable but exists.

== Stochastic Background ==

* Wavelet tree learning.
* Improvements can go here.

== Event Loops ==

* For repeating a single event.
* Random frequency and time transformations on each instance.
* Periodicity of event distribution: Gaussian.
* Can also be replaced by other distributions such as Poisson or your own invention.
* Density of events (how frequently they recur) relates to granular synthesis for transients; a rough scheduling sketch follows this list.

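A sketch of how looped instances might be scheduled, with Gaussian or Poisson spacing and a random frequency/time transformation drawn per instance; the function, parameter names, and ranges are invented for illustration, not taken from TAPESTREA.

<pre>
import numpy as np

def schedule_loop(duration, period, rng, jitter=0.2, distribution="gaussian",
                  pitch_range=(0.9, 1.1), stretch_range=(0.8, 1.25)):
    """Return (onset_time, pitch_factor, time_stretch) tuples for one looping
    template.  'period' is the mean gap between instances (density is 1/period).
    Gaussian spacing jitters each gap around the period; the Poisson
    alternative draws exponentially distributed gaps instead."""
    events, t = [], 0.0
    while True:
        if distribution == "gaussian":
            gap = max(rng.normal(period, jitter * period), 0.01 * period)
        else:  # "poisson": exponential inter-onset gaps give a Poisson process
            gap = rng.exponential(period)
        t += gap
        if t >= duration:
            break
        pitch = rng.uniform(*pitch_range)      # random frequency transformation
        stretch = rng.uniform(*stretch_range)  # random time transformation
        events.append((t, pitch, stretch))
    return events

# Example: a chirp template recurring roughly twice per second for 30 seconds.
rng = np.random.default_rng(1)
instances = schedule_loop(duration=30.0, period=0.5, rng=rng)
</pre>

Shrinking the period toward grain scale for transient templates is where this starts to behave like granular synthesis, as noted in the last item above.
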
== Timelines ==

* Explicitly state when a template is played in relation to other templates.
* Timelines within timelines... multiresolution synthesis? (A small data-structure sketch follows this list.)

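One plausible way to represent this, as a sketch rather than the actual implementation: if a timeline is itself a template, timelines nest to arbitrary depth and multiresolution placement falls out of the recursion. The class and method names below are invented for illustration.

<pre>
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Template:
    """Anything that can be synthesized: an event, a loop, or a timeline."""
    name: str

    def render(self, start: float = 0.0) -> List[Tuple[float, str]]:
        # a leaf template simply plays itself at the given start time
        return [(start, self.name)]

@dataclass
class Timeline(Template):
    """A template that places other templates at explicit offsets.  Because a
    Timeline is itself a Template, timelines nest to arbitrary depth."""
    entries: List[Tuple[float, Template]] = field(default_factory=list)

    def place(self, offset: float, template: Template) -> None:
        self.entries.append((offset, template))

    def render(self, start: float = 0.0) -> List[Tuple[float, str]]:
        out: List[Tuple[float, str]] = []
        for offset, template in self.entries:
            out.extend(template.render(start + offset))
        return sorted(out)

# Usage: an inner "phrase" timeline reused twice on an outer timeline.
inner = Timeline("phrase")
inner.place(0.0, Template("chirp"))
inner.place(1.5, Template("thunder"))
outer = Timeline("piece")
outer.place(0.0, inner)
outer.place(10.0, inner)
print(outer.render())  # [(0.0, 'chirp'), (1.5, 'thunder'), (10.0, 'chirp'), (11.5, 'thunder')]
</pre>
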
== Mixed Bags ==

* Controlling relative density of multiple templates... (see the sketch after this list).
* Combines features of timelines and loops.

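One possible reading of relative density control, sketched below with invented names: draw which template to instantiate next with probability proportional to its density.

<pre>
import numpy as np

def draw_templates(names, densities, n_draws, rng):
    """Pick which template to instantiate next, with probability proportional
    to its relative density, so denser templates recur more often."""
    p = np.asarray(densities, dtype=float)
    p /= p.sum()
    picks = rng.choice(len(names), size=n_draws, p=p)
    return [names[i] for i in picks]

rng = np.random.default_rng(2)
print(draw_templates(["bird", "crowd", "bell"], [4.0, 1.0, 0.5], 10, rng))
</pre>
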
== Score Language ==

* ChucK
* Precise control beyond what sliders can provide.
* Any other specific features?

== Pitch and Time Quantizations ==

* Quantize pitch to a scale.
* A few pre-programmed pitch tables or one that's user programmable.
* The ability to change pitch table on a timeline would be good (hmm).
* Quantizing time to a time grid for rhythm.

These could probably be done through ChucK, but maybe there's a clean way to do it through TAPESTREA itself (a rough sketch of the quantization step follows below). There is also a secret (not-so-secret) reverb face. Include?

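The quantization step itself could look like the sketch below, whether it ends up living in ChucK or in TAPESTREA; the pentatonic table, base frequency, and grid size stand in for the pre-programmed or user-programmable tables mentioned above.

<pre>
import numpy as np

# One octave of a pentatonic pitch table, as ratios relative to a base
# frequency; any user-programmable table could be substituted.
PENTATONIC = np.array([1.0, 9/8, 5/4, 3/2, 5/3])

def quantize_pitch(freq, base_freq=220.0, table=PENTATONIC):
    """Snap freq to the nearest pitch in 'table', searched across octaves."""
    candidates = np.concatenate([base_freq * table * 2.0 ** k for k in range(-3, 4)])
    return candidates[np.argmin(np.abs(np.log2(candidates / freq)))]

def quantize_time(t, grid=0.125):
    """Snap an onset time to the nearest point on a rhythmic grid (seconds)."""
    return round(t / grid) * grid

print(quantize_pitch(470.0))   # -> 495.0 (nearest pentatonic pitch)
print(quantize_time(1.31))     # -> 1.25
</pre>
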
= Discussions/Contributions/Sound samples =

* what?

= CONCLUSIONS =

* Real references to be added.

= REFERENCES =
