Sound Synthesis Envelopes and Articulation

Introduction

In sound synthesis, the word “envelope” applies when the sound characteristics of a note evolve dynamically over the note's duration. Envelopes are also known as “control signals” or “low-frequency signals”. Envelope generators are examples of signal “modulators”. This particular page focuses on amplitude envelopes and on how they contribute to articulation and phrasing. Sound qualities other than amplitude are considered here only where they contribute to articulation and phrasing. Standing in contrast to amplitude envelopes are timbral envelopes, or more correctly timbral modulations. The latter contribute particularly to speech sounds, to which I have devoted a whole series of pages.

Suggested reading for this page includes “The Onset Behavior of Sound”, chapter III of Fritz Winckel's Music, Sound, and Sensation: A Modern Exposition, pp. 24-57. The segments of an envelope are nicely explained by Francis Prève's tutorial, Understanding Envelopes, Part 1 and Part 2. More advanced readers should take a look at John Chowning's famous 1973 article, “The Synthesis of Complex Audio Spectra by Means of Frequency Modulation”. Although the central topic of his article is frequency-modulation synthesis, Chowning makes very good use of envelopes to control how the spectral content of a tone develops over the tone's duration.

Envelopes contribute fundamentally to instrumental timbre, but they also contribute greatly to musical function; that is, how the notes in a musical passage relate to one another. Thus thinking about envelopes is intimately bound with thinking about articulation. Articulation covers both how notes are spaced and how notes are accented.

Within the Sound engine, parameter #5 in any note statement indicates the start time, while parameter #6 gives the duration. Derived from these two parameters is a third fundamental property, the end time. Although start time and end time define the boundaries of the note's envelope(s), what happens outside these boundaries matters very much to the note's articulation.

Many of the ideas about articulation that appear on this page were expressed in a paper on rhythm which I submitted to the Computer Music Journal around 1979. The paper had three parts. The first part dealt with the formation of notes into phrases and was based on Gestalt factors of organization first enumerated by Max Wertheimer and subsequently applied to music by James Tenney. The second part dealt with perceptual factors that help identify the tonic accent of a phrase. The third part dealt with how phrases divide around the tonic accent into two phases: the anacrusis and the thesis.

My paper on rhythm was rejected by the Computer Music Journal. There were valid reasons for rejection: first, that the paper had a very abrasive tone; second, that the paper didn't contain any actual computer-music examples. These however were not the reasons given for rejection; rather, the reviewer asserted that while my paper may have been relevant to Brahms, it had no application either to modern music in general or to computer music in particular.

I revisit this issue here because it all comes down to this: Encoding musical scores as note lists eliminates the performer from the process of music-making. When that happens it falls to the encoder to do the things that performers would otherwise do to make the music speak. And the places where performers exert the greatest influence are the interior space of each note (i.e. its envelope) and how a note connects (or not) to the notes around it.

For an early example of what happens when scores are transcribed literally for automated performance, get ahold of the two albums entitled Unplayed by Human Hands, which contained renditions of a number of pieces very literally transcribed for a PDP-8 controlling a pipe organ. (I have an after-the-fact connection with Prentiss Knowlton, who produced these two albums. I plan to relate more details about this connection on this site in the future.)

Envelope-Generating Units

The following Sound units are specifically purposed to generate control signals (a small sketch of the first two profiles appears after the list):

EnvSust — Sustained envelope.
Segments: Attack, Steady-State, Release
Arguments: Attack Duration, Steady-State Amplitude, Release Duration

Figure 1 (a): Contour of envelope generated using the EnvSust unit.
EnvExp — Percussive envelope.
Segments: Attack, Hold, Decay, Release
Arguments: Attack Duration, Attack Amplitude, Hold Duration, Decay Rate, Release Duration

Figure 1 (b): Contour of envelope generated using the EnvExp unit.
EnvAHDSR — Multi-purpose envelope.
Segments: Attack, Hold, Decay, Steady-State, Release
Arguments: Attack Duration, Attack Amplitude, Hold Duration, Decay Duration, Steady-State Amplitude, Release Duration

Figure 1 (c): Contour of envelope generated using the EnvAHDSR unit.
Line — Single-Segment Linear Interpolation
1 Segment
Arguments: Origin, Segment Duration, Goal

Figure 1 (d): Contour of envelope generated using the Line unit.
Line3 — Three-Segment Linear Interpolation
3 Segments
Arguments: Origin, Duration1, Goal1, Duration2, Goal2, Duration3, Goal3

Figure 1 (e): Contour of envelope generated using the Line3 unit.
Growth — Single-Segment Exponential Interpolation
1 Segment
Arguments: Origin, Segment Duration, Goal

Figure 1 (f): Contour of envelope generated using the Growth unit.
Growth3 — Three-Segment Exponential Interpolation
3 Segments
Arguments: Origin, Duration1, Goal1, Duration2, Goal2, Duration3, Goal3

Figure 1 (g): Contour of envelope generated using the Growth3 unit.
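
To make the segment descriptions above concrete, here is a minimal Python sketch of the first two profiles, computed as amplitude contours over a half-second note. The function names and the linear-attack/exponential-decay shapes are illustrative assumptions, not the Sound engine's actual code.

def env_sust(t, dur, attack, steady_amp, release):
    """Sustained profile: linear attack, steady state, linear release."""
    if t < attack:
        return steady_amp * t / attack
    if t > dur - release:
        return steady_amp * max(0.0, (dur - t) / release)
    return steady_amp

def env_exp(t, dur, attack, attack_amp, hold, decay_rate, release):
    """Percussive profile: attack, hold, exponential decay (dB/sec), release ramp."""
    if t < attack:
        return attack_amp * t / attack
    if t < attack + hold:
        amp = attack_amp
    else:
        db_down = decay_rate * (t - attack - hold)      # decay_rate in dB/sec
        amp = attack_amp * 10.0 ** (-db_down / 20.0)
    if t > dur - release:                               # ramp whatever remains down to zero
        amp *= max(0.0, (dur - t) / release)
    return amp

# Print a coarse contour for a half-second note, sampled every 5 msec.
for i in range(101):
    t = 0.5 * i / 100
    print(f"{t:5.3f}  sust={env_sust(t, 0.5, 0.1, 24000, 0.1):8.1f}"
          f"  perc={env_exp(t, 0.5, 0.02, 24000, 0.0, 40, 0.05):8.1f}")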

Contours are another approach to generating control signals. Contours have two calculation methods.

One advantage of contours is that they don't limit an instrument design to a specific number of segments over the duration of a single note. Rather any number of segments can be defined using ramp statements in the note list. Another advantage of contours is that their information is available to any number of notes, so long as each note's voice ID matches the contour's voice ID.
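
To illustrate the breakpoint idea behind contours — though not the Sound engine's actual ramp-statement syntax — here is a small sketch in which a contour is simply a list of (time, value) pairs keyed by a voice ID, and any note whose voice ID matches reads interpolated values from it.

from bisect import bisect_right

# Hypothetical contour keyed by voice ID: a list of (time, value) breakpoints,
# analogous to what a series of ramp statements would build up in a note list.
contours = {
    1: [(0.0, 0.0), (0.5, 1.0), (2.0, 1.0), (2.5, 0.2), (5.5, 0.0)],
}

def contour_value(voice_id, t):
    """Linearly interpolate the contour for the given voice at time t."""
    points = contours[voice_id]
    times = [p[0] for p in points]
    i = bisect_right(times, t)
    if i == 0:
        return points[0][1]
    if i == len(points):
        return points[-1][1]
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Any number of notes can read the same contour, so long as the voice IDs match.
for t in (0.25, 1.0, 2.25, 4.0):
    print(t, round(contour_value(1, t), 3))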

Orchestra

EnvelopeOrch.xml implements a Sound orchestra which has been designed to accompany this page. Sound examples for the remainder of this page are provided in the form of note lists referencing EnvelopeOrch.xml. Each note list is supported by an audio realization. Do not expect musicality from these examples. Their purpose is to demonstrate how specific parameter adjustments affect the resulting sound.

My original intent was that visitors would generate these examples themselves using a Java Applet implementation of Sound. Since the internet powers that be have repudiated applet technology, I have been obliged to fall back on a desktop application. If you have access to this application, you may download this file into your working directory and paste the note lists into Sound's note-list editor.

The instruments in EnvelopeOrch.xml will be detailed shortly, but all three instruments share a similar design. Each instrument uses some kind of envelope generator to amplitude-modulate an Oscillator. Which waveform the oscillator should employ is controlled by a note parameter. The orchestra employs the waveform family generated using the Negative Binomial formula to concentrate spectral energy at specific harmonics. (The approach is explained in the Waveform Reference.) Waveforms are generally selected to concentrate spectral energy in the region around 440 Hz.

Amplitude Envelopes

Pretty much every instrument design employs an amplitude envelope, if for no other reason than this: if one does not take steps to ramp down the amplitude at the end of a note, a click will result.
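
The click is easy to demonstrate numerically: truncate a sine tone mid-cycle and the waveform jumps abruptly back to silence, whereas a short release ramp brings it down to zero. A sketch (plain Python, 44.1 kHz and a 10-msec ramp assumed):

import math

RATE = 44100
FREQ = 349.2                   # the F4 used in several listings below
DUR = 0.23                     # note length chosen so the tone ends mid-cycle
RELEASE = 0.01                 # 10 msec release ramp

n = int(RATE * DUR)
raw = [math.sin(2 * math.pi * FREQ * i / RATE) for i in range(n)]

ramped = list(raw)
r = int(RATE * RELEASE)
for i in range(r):             # linear ramp over the last 10 msec
    ramped[n - r + i] *= 1.0 - i / r

# The truncated tone ends at an arbitrary point in its cycle; the jump back to
# silence is heard as a click.  The ramped tone ends essentially at zero.
print("last raw sample:   ", round(raw[-1], 4))
print("last ramped sample:", round(ramped[-1], 4))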

Traditional instruments divide roughly into two categories, based on their envelopes. These are the sustained and percussive categories. A third category, AHDSR, blends the previous two categories into a single generalized approach.



Figure 2 (a): Instrument #1: Sustained
orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeSustained

note 1 0 1 0 0.5 0.5 24000 24000 138.6 138.6 803 0.1 0.1 // C#3
note 2 0 1 0 1.0 0.5 24000 24000 246.9 246.9 801 0.1 0.1 // B3
note 3 0 1 0 1.5 0.5 24000 24000 349.2 349.2 801 0.1 0.1 // F4
note 4 0 1 0 2.0 0.5 24000 24000 311.1 311.1 801 0.1 0.1 // E!4

note 5 0 1 0 3.0 0.5 24000 24000 415.3 415.3 801 0.1 0.1 // A!4
note 6 0 1 0 3.5 0.5 24000 24000 233.1 233.1 801 0.1 0.1 // B!3
note 7 0 1 0 4.0 0.5 24000 24000 146.8 146.8 802 0.1 0.1 // D3
note 8 0 1 0 4.5 0.5 24000 24000 92.50 92.50 804 0.1 0.1 // F#2

end 5.5
Listing 1 (a): Pitch sequence played by Instrument #1: Sustained. To hear a realization, click here.

Sustained Envelopes

Sustained envelopes can be generated by the EnvSust unit, which produces the profile shown in Figure 1 (a). Sustained envelopes are characteristic of instruments whose resonances are sustained by continuous input of energy (e.g., through breath or bowing). EnvelopeOrch.xml employs the EnvSust envelope generator in Instrument #1: Sustained, shown in Figure 2 (a).

Instrument #1: Sustained is more complicated in two ways than the other two instruments in EnvelopeOrch.xml.



Figure 2 (b): Instrument #2: Percussive
orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopePercussive

note 1 0 2 0 0.5 0.5 24000 138.6 803 0.02 0.0 40 0.05 // C#3
note 2 0 2 0 1.0 0.5 24000 246.9 801 0.02 0.0 40 0.05 // B3
note 3 0 2 0 1.5 0.5 24000 349.2 801 0.02 0.0 40 0.05 // F4
note 4 0 2 0 2.0 0.5 24000 311.1 801 0.02 0.0 40 0.05 // E!4

note 5 0 2 0 3.0 0.5 24000 415.3 801 0.02 0.0 40 0.05 // A!4
note 6 0 2 0 3.5 0.5 24000 233.1 801 0.02 0.0 40 0.05 // B!3
note 7 0 2 0 4.0 0.5 24000 146.8 802 0.02 0.0 40 0.05 // D3
note 8 0 2 0 4.5 0.5 24000 92.50 804 0.02 0.0 40 0.05 // F#2

end 5.5
Listing 1 (b): Pitch sequence played by Instrument #2: Percussive with a decay rate of 40 dB/sec. To hear a realization, click here.

Percussive Envelopes

Percussive envelopes can be generated by the EnvExp unit, which produces the profile shown in Figure 1 (b). Percussive envelopes are characteristic of instruments whose resonances are excited by an initial input of energy (e.g., through striking or plucking), after which the energy is allowed to dissipate. EnvelopeOrch.xml employs the EnvExp envelope generator in Instrument #2: Percussive, shown in Figure 2 (b).



Figure 2 (c): Instrument #3: AHDSR
orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeAHDSR

note 1 0 3 0 0.5 0.5 24000 0.33 138.6 803 0.02 0.0 0.05 0.05 // C#3
note 2 0 3 0 1.0 0.5 24000 0.33 246.9 801 0.02 0.0 0.05 0.05 // B3
note 3 0 3 0 1.5 0.5 24000 0.33 349.2 801 0.02 0.0 0.05 0.05 // F4
note 4 0 3 0 2.0 0.5 24000 0.33 311.1 801 0.02 0.0 0.05 0.05 // E!4

note 5 0 3 0 3.0 0.5 24000 0.33 415.3 801 0.02 0.0 0.05 0.05 // A!4
note 6 0 3 0 3.5 0.5 24000 0.33 233.1 801 0.02 0.0 0.05 0.05 // B!3
note 7 0 3 0 4.0 0.5 24000 0.33 146.8 802 0.02 0.0 0.05 0.05 // D3
note 8 0 3 0 4.5 0.5 24000 0.33 92.50 804 0.02 0.0 0.05 0.05 // F#2

end 5.5
Listing 1 (c): Pitch sequence played by Instrument #3: AHDSR. To hear a realization, click here.

AHDSR Envelopes

AHDSR envelopes are typically used to create sustained tones initiated by sforzandi. AHDSR envelopes can be generated by the EnvAHDSR unit, which produces the profile shown in Figure 1 (c). This five-segment generator carries forward a tradition from the days of analog synthesizers. Like the EnvSust unit, EnvAHDSR generates a sustained tone. However the AHDSR envelope begins with an amplitude peak which, like the EnvExp unit, dies back exponentially. In fact the EnvSust and EnvExp units can be understood as special cases of a more general AHDSR model: with the hold and decay durations set to zero and the steady-state amplitude equal to the attack amplitude, the envelope reduces to EnvSust; with the steady-state amplitude set to zero, it behaves like EnvExp (which specifies a decay rate rather than a decay duration).
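
A minimal sketch of the five-segment idea, using illustrative linear shapes rather than the Sound engine's actual code, makes these special cases explicit:

def env_ahdsr(t, dur, attack, attack_amp, hold, decay, steady_amp, release):
    """Attack, Hold, Decay, Steady-State, Release (all times in seconds)."""
    if t < attack:
        return attack_amp * t / attack if attack > 0 else attack_amp
    t2 = t - attack
    if t2 < hold:
        amp = attack_amp
    elif t2 < hold + decay:
        amp = attack_amp + (steady_amp - attack_amp) * (t2 - hold) / decay
    else:
        amp = steady_amp
    if t > dur - release:                       # final release ramp to zero
        amp *= max(0.0, (dur - t) / release)
    return amp

# Special case 1: no hold, no decay, steady-state equal to attack amplitude
# -> the EnvSust shape of Figure 1 (a).
sustained = [env_ahdsr(t / 100, 0.5, 0.1, 24000, 0.0, 0.0, 24000, 0.1)
             for t in range(51)]
# Special case 2: steady-state amplitude of zero -> a percussive shape
# (EnvExp proper uses a decay *rate* rather than a decay duration).
percussive = [env_ahdsr(t / 100, 0.5, 0.02, 24000, 0.03, 0.3, 0.0, 0.05)
              for t in range(51)]
print(max(sustained), max(percussive))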

EnvelopeOrch.xml employs the EnvAHDSR envelope generator in Instrument #3: AHDSR, shown in Figure 2 (c).

Accentuation

We now consider methods of emphasizing one note over its neighbors, that is, methods of accentuation. We start by considering dynamic accents, then consider methods which do not directly involve peak sample magnitudes. Several accentuation methods rely on the concept of the agogic accent, a concept singled out for explanation in the Wikipedia entry on Accent. Agogic accents are an essential technique for the “organ and harpsichord (which don't afford the use of dynamic accents)”. Wikipedia describes several varieties of agogic accent, most based on the principle that longer notes have greater opportunity to speak than shorter notes. The term “agogic” seems to have originated with Hugo Riemann, a prominent 19th-century German music theorist.

Dynamic Accents

In my own early experience with digital sound synthesis, I had the inclination to accent notes by making them louder. Upon listening to the generated wave file, I was astonished to discover that the accented note didn't sound louder per se. Rather, the note sounded closer, with an unanticipated side effect: suddenly bringing the note forward strongly disrupted the musical line. This effect is an artifact of digital sound synthesis, where changing the overall amplitude of a tone only slightly affects the tone's timbral characteristics.

The Silence Before the Note

orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeSpacing

// slur
note 1 0 1 0 0.5 0.5 24000 24000 349.2 311.1 801 0.03 0.05 // F4
note 2 0 1 1 1.0 0.5 24000 24000 311.1 311.1 801 0.03 0.1 // E!4

// legato
note 3 0 1 0 2.0 0.5 24000 24000 349.2 349.2 801 0.03 0.05 // F4
note 4 0 1 0 2.5 0.5 24000 24000 311.1 311.1 801 0.03 0.1 // E!4

// Detached by 40 msec
note 5 0 1 0 3.5 0.46 24000 24000 349.2 349.2 801 0.03 0.05 // F4
note 6 0 1 0 4.0 0.5 24000 24000 311.1 311.1 801 0.03 0.1 // E!4

// Detached by 80 msec
note 7 0 1 0 5.0 0.42 24000 24000 349.2 349.2 801 0.03 0.05 // F4
note 8 0 1 0 5.5 0.5 24000 24000 311.1 311.1 801 0.03 0.1 // E!4

// Detached by 120 msec
note 9 0 1 0 6.5 0.38 24000 24000 349.2 349.2 801 0.03 0.05 // F4
note 10 0 1 0 7.0 0.5 24000 24000 311.1 311.1 801 0.03 0.1 // E!4

end 8.5
Listing 2: Comparison of articulations played by Instrument #1: Sustained. To hear a realization, click here.

Back when I played the trumpet, I brought an etude I was learning into a lesson. The etude began with three accented notes, so naturally I punched these notes dynamically. My teacher David Kuehn, who at the time was principal in the Buffalo Philharmonic, stopped me. “Don't punch them, space them.” Sure enough, it worked much better, and the insight has remained with me ever since. Think of it like this: If the attack of a note is preceded by a moment of silence, then the listener experiences the full onslaught of the attack. No silence, and one hears a continuation of what came before.

Spacing is intrinsic to stop consonants, which are the most complex type of phoneme encountered in speech synthesis. Around 50 msec. of “closure” (near silence) precedes each word-initiating stop consonant. When a stop consonant occurs in the midst of a word, it is still necessary to ramp the envelope of the preceding syllable down to zero, providing a fresh attack for the plosive sound.

Bear in mind that silence between notes can serve musical purposes other than accent; for instance: (a) spacing contributes greatly to phrasing, and (b) spacing serves a textural purpose when the notes in a passage are played staccato.

The opposite of detached spacing is slurring. Within the Sound engine, slurring means skipping both the release segment of the slur-from note and the attack segment of the slur-to note. It also means that the slur-to note should take over state values (e.g. the oscillator phase) from the slur-from note.

Listing 2 generates a sound example containing five pairs of tones. The first pair is slurred, the second pair legato, and the remaining three pairs are increasingly detached. I think 80 msec. gives the best effect among the detached pairs. 40 msec. contrasts only subtly with its two neighbors, while silences longer than 80 msec. add no further enhancement.
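
The silences in Listing 2 come entirely from the start-time and duration parameters: the gap before a note is its start time minus the end time (start plus duration) of its predecessor. A quick check, with values copied from Listing 2:

# (start, duration) pairs for the first and second notes of each pair in Listing 2.
pairs = {
    "legato":            ((2.0, 0.50), (2.5, 0.5)),
    "detached 40 msec":  ((3.5, 0.46), (4.0, 0.5)),
    "detached 80 msec":  ((5.0, 0.42), (5.5, 0.5)),
    "detached 120 msec": ((6.5, 0.38), (7.0, 0.5)),
}

for label, ((s1, d1), (s2, _)) in pairs.items():
    gap_msec = round((s2 - (s1 + d1)) * 1000)
    print(f"{label:18s} gap = {gap_msec} msec")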

Attack Durations

orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeAttacks

// 50 msec. attack
note 1 0 1 0 0.5 0.4 24000 24000 349.2 349.2 801 0.1 0.05 // F4
note 2 0 1 0 1.0 0.5 24000 24000 311.1 311.1 801 0.05 0.1 // E!4

// 20 msec. attack
note 3 0 1 0 2.0 0.4 24000 24000 349.2 349.2 801 0.1 0.05 // F4
note 4 0 1 0 2.5 0.5 24000 24000 311.1 311.1 801 0.02 0.1 // E!4

// 10 msec. attack
note 5 0 1 0 3.5 0.4 24000 24000 349.2 349.2 801 0.1 0.05 // F4
note 6 0 1 0 4.0 0.5 24000 24000 311.1 311.1 801 0.01 0.1 // E!4

// 5 msec. attack
note 7 0 1 0 5.0 0.4 24000 24000 349.2 349.2 801 0.1 0.05 // F4
note 8 0 1 0 5.5 0.5 24000 24000 311.1 311.1 801 0.005 0.1 // E!4

// 0 msec. attack
note 9 0 1 0 6.5 0.4 24000 24000 349.2 349.2 801 0.1 0.05 // F4
note 10 0 1 0 7.0 0.5 24000 24000 311.1 311.1 801 0.0 0.1 // E!4

end 8.5
Listing 3: Comparison of attack durations played by Instrument #1: Sustained. To hear a realization, click here.

The attack duration, which is the time taken by the envelope to rise from zero to full initial amplitude, can also contribute to accentuation. Understand that with live instruments the attack duration is strictly a timbral feature; only in a synthetic context can instrumental attack durations vary from one note to another.

Low frequency sine tones, or other tones with limited harmonic content, effectively have slow attack times even when the attack-duration parameter is nominally set to zero.

Listing 3 generates a sound example containing five pairs of tones. The attack duration for the first tone in each pair holds at 100 msec. The only thing that changes from pair to pair is the attack duration of the second tone. This value ranges from 50 msec. down to an instantaneous attack (no duration). Attack durations of 50 msec. and above are very soft, corresponding by analogy to a live percussion instrument beaten with a very soft mallet. Continuing this analogy, shorter attack durations correspond to harder mallets. The limiting case, an instantaneous attack, produces a click which one might charitably compare with exchanging a mallet for a stick. Had EnvelopeOrch.xml employed a cosine wave rather than a sine wave — thus initiating the tone with a very dramatic discontinuity — the resulting click would be much worse.
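
The closing remark about sine versus cosine can be checked by inspecting the first few samples of each waveform under an instantaneous attack: the sine begins at zero and rises gradually, while a cosine would jump from silence to full amplitude in a single sample. A sketch:

import math

RATE = 44100
FREQ = 311.1                   # the E!4 used for the second note of each pair
AMP = 24000

# First few samples of the tone with an instantaneous (zero-duration) attack.
sine   = [AMP * math.sin(2 * math.pi * FREQ * i / RATE) for i in range(4)]
cosine = [AMP * math.cos(2 * math.pi * FREQ * i / RATE) for i in range(4)]

print("sine  :", [round(x) for x in sine])    # starts at 0, rises gradually
print("cosine:", [round(x) for x in cosine])  # jumps from silence to 24000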

Hold Durations

orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeHolds

// No hold
note 1 0 2 0 0.5 0.5 24000 349.2 801 0.02 0.00 40 0.05 // F4
note 2 0 2 0 1.0 0.5 24000 311.1 801 0.02 0.00 40 0.05 // E!4

// 30 msec. (Moog) hold
note 3 0 2 0 2.0 0.5 24000 349.2 801 0.02 0.00 40 0.05 // F4
note 4 0 2 0 2.5 0.5 24000 311.1 801 0.02 0.03 40 0.05 // E!4

// 60 msec. hold
note 5 0 2 0 3.5 0.5 24000 349.2 801 0.02 0.00 40 0.05 // F4
note 6 0 2 0 4.0 0.5 24000 311.1 801 0.02 0.06 40 0.05 // E!4

// 90 msec. hold
note 7 0 2 0 5.0 0.5 24000 349.2 801 0.02 0.00 40 0.05 // F4
note 8 0 2 0 5.5 0.5 24000 311.1 801 0.02 0.09 40 0.05 // E!4

// 120 millisecond hold
note  9 0 2 0 6.5 0.4 24000 349.2 801 0.02 0.00 40 0.05 // F4
note 10 0 2 0 7.0 0.5 24000 311.1 801 0.02 0.12 40 0.05 // E!4

end 8.5
Listing 4: Comparison of hold durations played by Instrument #2: Percussive. To hear a realization, click here.

I decided to include “hold” segments in the Sound engine's EnvExp and EnvAHDSR units after reading Francis Prève's tutorial on envelopes (Part 1). Hold segments intervene between the attack and decay segments. What a hold segment does is sustain the attack amplitude for the indicated period of time. According to Prève, this is “quite useful for adding a bit of punch to an envelope.” The principle is that of the agogic accent. Holding on to the attack amplitude in effect increases the note's RMS amplitude while leaving the note's peak amplitude unaltered. Holding for too long compromises the transient quality of the attack and decay. Prève's further comment makes an implicit recommendation:

According to legend, early Moog synths had a non-adjustable hold time of approximately 30 milliseconds immediately following the attack. This is often associated with the overall impact that characterized vintage Minimoog sounds.

Listing 4 demonstrates the effect of five different hold durations. The first tone in each pair maintains a hold duration of zero for a baseline.
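
That claim — RMS amplitude up, peak amplitude unchanged — can be verified numerically. The sketch below uses the same illustrative linear-attack/exponential-decay shape as earlier, not the Sound engine's actual EnvExp code, and mirrors the hold durations of Listing 4:

import math

RATE = 44100

def percussive(t, attack, attack_amp, hold, decay_rate):
    """Linear attack, hold at peak, then exponential decay (dB/sec)."""
    if t < attack:
        return attack_amp * t / attack
    if t < attack + hold:
        return attack_amp
    return attack_amp * 10.0 ** (-decay_rate * (t - attack - hold) / 20.0)

def peak_and_rms(hold):
    samples = [percussive(i / RATE, 0.02, 24000, hold, 40)
               for i in range(int(RATE * 0.5))]
    peak = max(samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return round(peak), round(rms)

for hold in (0.0, 0.03, 0.06, 0.09, 0.12):
    print(f"hold {int(hold * 1000):3d} msec  (peak, rms) = {peak_and_rms(hold)}")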

Decay Rates

orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeDecay

note 1 0 2 0 0.5 0.5 24000 138.6 803 0.02 0.0 40 0.05 // C#3
note 2 0 2 0 1.0 0.5 24000 246.9 801 0.02 0.0 40 0.05 // B3
note 3 0 2 0 1.5 0.5 24000 349.2 801 0.02 0.0 10 0.05 // F4 ^
note 4 0 2 0 2.0 0.5 24000 311.1 801 0.02 0.0 40 0.05 // E!4

note 5 0 2 0 3.0 0.5 24000 415.3 801 0.02 0.0 40 0.05 // A!4
note 6 0 2 0 3.5 0.5 24000 233.1 801 0.02 0.0 40 0.05 // B!3
note 7 0 2 0 4.0 0.5 24000 146.8 802 0.02 0.0 40 0.05 // D3
note 8 0 2 0 4.5 0.5 24000 92.50 804 0.02 0.0 10 0.05 // F#2 ^

end 5.5
Listing 5: Pitch sequence played by Instrument #2: Percussive,
accenting note #3 and note #8 with more gradual decay rates. To hear a realization, click here.

Like the attack duration, the decay rate is a fixed timbral characteristic of acoustic instruments which admits of synthetic manipulation. Within the Sound engine, decay segments are available from the EnvExp and EnvAHDSR units but not from EnvSust. The decay rate quantifies how quickly the sound dies away in decibels per second. Winckel, 1967 (p. 47) gives 45 decibels as a threshold, meaning that once the amplitude of a percussive envelope has gone this far down from its starting value, the sound can be considered to have dwindled away to nothing. Winckel also claims that “when the pedal is employed”, the decay rate for a piano tone “has been measured” as 45 dB in 60 seconds, or 0.75 dB/sec. Chowning, 1973 gives effective durations for three percussion sounds: 15 seconds for a bell (3 dB/sec), 2 seconds for a wood drum (22.5 dB/sec), and 0.2 seconds (seems short to me) for a skin drum (225 dB/sec).
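
Winckel's 45 dB threshold converts a decay rate directly into an effective sounding duration, and a rate in dB/sec converts into a per-sample amplitude multiplier in the obvious way. A sketch of both conversions (44.1 kHz assumed; the figures reproduce the ones quoted above):

RATE = 44100
THRESHOLD_DB = 45.0            # Winckel's "dwindled away to nothing" threshold

def effective_duration(decay_rate_db_per_sec):
    """Seconds until the amplitude has fallen THRESHOLD_DB below its start."""
    return THRESHOLD_DB / decay_rate_db_per_sec

def per_sample_multiplier(decay_rate_db_per_sec):
    """Constant factor applied once per sample to realize the decay."""
    return 10.0 ** (-decay_rate_db_per_sec / (20.0 * RATE))

for label, rate in (("piano, pedal down", 0.75), ("bell", 3.0),
                    ("wood drum", 22.5), ("skin drum", 225.0),
                    ("Listing 1 (b)", 40.0)):
    print(f"{label:18s} {rate:7.2f} dB/sec -> {effective_duration(rate):6.2f} sec,"
          f" multiplier {per_sample_multiplier(rate):.8f}")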

Listing 5 reprises the pitch sequence introduced in Listing 1 (a) through Listing 1 (c). This time around, I have chosen to accent note #3 and note #8 by manipulating their decay rates. These choices are indicated using a circumflex (^) in the note statement's comment area.

Overall Note Durations


Figure 3: Durational accents.
orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeDuration

note 1 0 1 0 0.5 0.17 24000 24000 138.6 138.6 803 0.03 0.03 // C#3
note 2 0 1 0 1.0 0.17 24000 24000 246.9 246.9 801 0.03 0.03 // B3
note 3 0 1 0 1.5 0.40 24000 24000 349.2 349.2 801 0.03 0.03 // F4 ^
note 4 0 1 0 2.0 0.17 24000 24000 311.1 311.1 801 0.03 0.03 // E!4

note 5 0 1 0 3.0 0.17 24000 24000 415.3 415.3 801 0.03 0.03 // A!4
note 6 0 1 0 3.5 0.17 24000 24000 233.1 233.1 801 0.03 0.03 // B!3
note 7 0 1 0 4.0 0.17 24000 24000 146.8 146.8 802 0.03 0.03 // D3
note 8 0 1 0 4.5 0.40 24000 24000 92.50 92.50 804 0.03 0.03 // F#2 ^

end 5.5
Listing 6: Pitch sequence played by Instrument #1: Sustained,
accenting note #3 and note #8 durationally. To hear a realization, click here.

The idea of the agogic accent generally means having the duration of the accented note last longer than the note durations around it. Composers generally understand this when they choose to employ half notes rather than quarter notes. Agogic accents can also be created through articulation, as when the Wikipedia entry describes a player who can “emphasize one of a sequence of staccato quarter notes by making it less staccato.” An example of this performance practice is provided in Figure 3, in which the upper line shows the rhythm as notated, while the lower line shows the rhythm as played.

Listing 6 applies the rhythm as played from Figure 3 to the pitch sequence introduced in Listing 1 (a) through Listing 1 (c). The accents in Figure 3 happen to coincide with the accents in Listing 5; accented notes in Listing 6 are once again indicated using a circumflex (^) in the note statement's comment area.

Notice in Listing 6 that the note durations do not conform precisely to the played durations indicated in Figure 3. At 0.17 seconds the unaccented durations are shorter than an eighth note (0.25) while at 0.40 seconds the accented durations are longer than a dotted eighth note (0.375). These adjustments were made after originally using the played durations from Figure 3 and discovering that the accented / unaccented ratio of 3:2 was unconvincing. The accented / unaccented ratio in Listing 6 is around 7:3.

Tonic Accents

The Wikipedia entry on Accent lists tonic accent as one of three fundamental accent types (the other two being dynamic and agogic; the entry neglects accents due to spacing) — based on a definition which I consider completely wrong. By Wikipedia's definition, a note with the highest pitch in a phrase will somehow be more accented than the other notes. This is nonsense; depending on how the phrase turns, one could equally assert that the note with the lowest pitch is the accented note. Beyond this, Wikipedia's musical definition conflicts with the definition of a tonic accent in prosody, that is, the stressed syllable in a word. Anybody who has ever set text to music knows that if you don't align the tonic accents of the words with the strong beats of the meter, you're asking for trouble. (Here's a trick from Cantor Barbara: Try singing “Take Me Out to the Ball Game” with the words displaced by one note.)

In my world, the term tonic accent indicates an emphasized note in a musical phrase, relative to which other notes derive their function. Within a metric context, tonic accents inevitably align with the strong beats. For example in the Cybernetic Composer “cadence tones” and “chord tones” generally serve as tonic accents while “ornamental tones” do not. However appoggiaturas (on-beat non-chord tones) are exceptional. In an appoggiatura, the tonic accent falls on an “ornamental tone”, while the resolution (a “chord tone”) is unaccented.

Crescendi and Diminuendi

orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeDynamics

note 1 0 1 0 0.5 0.5 6000 12000 138.6 138.6 803 0.03 0.03 // C#3
note 2 0 1 0 1.0 0.5 12000 24000 246.9 246.9 801 0.03 0.03 // B3
note 3 0 1 0 1.5 0.5 24000 12000 349.2 349.2 801 0.03 0.03 // F4 ^
note 4 0 1 0 2.0 0.5 12000 6000 311.1 311.1 801 0.03 0.03 // E!4

note 5 0 1 0 3.0 0.5 3000 6000 415.3 415.3 801 0.03 0.03 // A!4
note 6 0 1 0 3.5 0.5 6000 12000 233.1 233.1 801 0.03 0.03 // B!3
note 7 0 1 0 4.0 0.5 12000 24000 146.8 146.8 802 0.03 0.03 // D3
note 8 0 1 0 4.5 0.5 24000 6000 92.50 92.50 804 0.03 0.03 // F#2 ^

end 5.5
Listing 7 (a): Pitch sequence played by Instrument #1: Sustained, using
crescendi and diminuendi to highlight note #3 and note #8. To hear a realization, click here.
orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeCombined

note 1 0 1 0 0.5 0.5 6000 12000 138.6 246.9 803 0.03 0.03 // C#3
note 2 0 1 1 1.0 0.4 12000 24000 246.9 246.9 801 0.03 0.03 // B3
note 3 0 1 0 1.5 0.5 24000 12000 349.2 311.1 801 0.01 0.03 // F4 ^
note 4 0 1 3 2.0 0.5 12000 6000 311.1 311.1 801 0.03 0.03 // E!4

note 5 0 1 0 3.0 0.5 3000 6000 415.3 233.1 801 0.03 0.03 // A!4
note 6 0 1 5 3.5 0.5 6000 12000 233.1 146.8 801 0.03 0.03 // B!3
note 7 0 1 6 4.0 0.4 12000 24000 146.8 146.8 802 0.03 0.03 // D3
note 8 0 1 0 4.5 0.5 24000 6000 92.50 92.50 804 0.01 0.03 // F#2 ^

end 5.5
Listing 7 (b): Pitch sequence played by Instrument #1: Sustained, using
crescendi, diminuendi, articulation and attack durations to accent note #3
and note #8. To hear a realization, click here.

Crescendi and diminuendi are properly discussed in the context of phrasing, specifically regarding how the secondary notes of a phrase relate to the phrase's tonic accents. The most fundamental consideration in deciding if two sequential notes share membership in a phrase is whether the release of the first note coincides (more or less) with the onset of the second note. Sustaining the first note right up to that onset is itself grounds for inferring that the first note has an upcoming successor, but this perception is heightened if the first note has a crescendo. Contrariwise, if a note has a diminuendo, that implies that a climax (read: tonic accent) has been reached and what follows (if anything) is an afterthought.

Consider the pitch sequence previously introduced in Listing 1 (a) through Listing 1 (c). This sequence contains one phrase with the pitches C#3, B3, F4, E!4 and a second phrase with A!4, B!3, D3, F#2. We know these are separate phrases because of the rest that separates them. At this point there is no reason to regard any one note as accented relative to the other members of its phrase. However in Listing 5 and Listing 6, I reveal my inclination to emphasize F4 in phrase #1 and F#2 in phrase #2.

Listing 7 (a) goes about emphasizing F4 and F#2 in a third way, using crescendi and diminuendi to ‘point out’ these notes as tonic accents of their respective phrases. The slurring structure of Listing 1 (a) is reprised. Each accented note is indicated using a circumflex (^) in the note statement's comment area.

Listing 7 (b) reinforces the accentuation of F4 and F#2 by combining the crescendi and diminuendi of Listing 7 (a) with techniques described previously: silences before notes (which break up the slurring structure) and shorter, harder attacks.

Tempo Suspension

orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeAgogic

note 1 0 1 0 0.50 0.50 24000 24000 138.6 138.6 803 0.03 0.03 // C#3
note 2 0 1 0 1.00 0.50 24000 24000 246.9 246.9 801 0.03 0.03 // B3
note 3 0 1 0 1.50 0.75 24000 24000 349.2 349.2 801 0.03 0.03 // F4 ^
note 4 0 1 0 2.25 0.50 24000 24000 311.1 311.1 801 0.03 0.03 // E!4

note 5 0 1 0 3.25 0.50 24000 24000 415.3 415.3 801 0.03 0.03 // A!4
note 6 0 1 0 3.75 0.50 24000 24000 233.1 233.1 801 0.03 0.03 // B!3
note 7 0 1 0 4.25 0.50 24000 24000 146.8 146.8 802 0.03 0.03 // D3
note 8 0 1 0 4.75 0.75 24000 24000 92.50 92.50 804 0.03 0.03 // F#2 ^

end 6.00
Listing 8 (a): Pitch sequence played by Instrument #1: Sustained, using
elongated durations to highlight note #3 and note #8. To hear a realization, click here.
orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopePause

note 1 0 1 0 0.50 0.50 24000 24000 138.6 138.6 803 0.03 0.03 // C#3
note 2 0 1 0 1.00 0.50 24000 24000 246.9 246.9 801 0.03 0.03 // B3
note 3 0 1 0 1.63 0.50 24000 24000 349.2 349.2 801 0.03 0.03 // F4 ^
note 4 0 1 0 2.13 0.50 24000 24000 311.1 311.1 801 0.03 0.03 // E!4

note 5 0 1 0 3.13 0.50 24000 24000 415.3 415.3 801 0.03 0.03 // A!4
note 6 0 1 0 3.63 0.50 24000 24000 233.1 233.1 801 0.03 0.03 // B!3
note 7 0 1 0 4.13 0.50 24000 24000 146.8 146.8 802 0.03 0.03 // D3
note 8 0 1 0 4.76 0.50 24000 24000 92.50 92.50 804 0.03 0.03 // F#2 ^

end 5.76
Listing 8 (b): Pitch sequence played by Instrument #1: Sustained, using
interpolated thirty-second rests to accent note #3 and note #8. To hear a realization, click here.

The Wikipedia entry on Accent lists two methods of producing “agogic” accents by momentarily suspending the tempo.

The first method extends the “duration of a note with the effect of temporarily slowing down the tempo.” This action is very similar to applying a fermata, except less dramatic. Listing 8 (a) illustrates accents produced by suspending tempo during the accented note's duration.

The second type of tempo suspension is achieved through the “Delayed onset of a note.” This action introduces a hiccup into the temporal flow. Listing 8 (b) illustrates accents produced by suspending tempo just before the accented note begins. Notice that the accentuation is reinforced in this example by allowing moments of silence before the accented notes.
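
A delayed onset is nothing more than pushing the accented note, and everything after it, later by a small amount, which both suspends the tempo and opens a moment of silence before the attack. The sketch below derives the start times of Listing 8 (b) from the evenly spaced starts of Listing 1 (a); the 0.13-second delay is copied from the listing:

base_starts = [0.50, 1.00, 1.50, 2.00, 3.00, 3.50, 4.00, 4.50]   # Listing 1 (a)
accented = {3, 8}                 # note numbers to accent (1-based)
DELAY = 0.13                      # delay inserted before each accented note

shift = 0.0
shifted = []
for number, start in enumerate(base_starts, start=1):
    if number in accented:
        shift += DELAY            # the hiccup: everything from here on is later
    shifted.append(round(start + shift, 2))

print(shifted)   # [0.5, 1.0, 1.63, 2.13, 3.13, 3.63, 4.13, 4.76] as in Listing 8 (b)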

Igor Kipnis makes very expressive use of delayed onsets in his recordings of J.S. Bach's Partitas, specifically to emphasize downbeats in the slower allemandes and sarabands.

Spectral Accent

orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeSpectra

note 1 0 1 0 0.5 0.5 24000 24000 138.6 138.6 803 0.03 0.03 // C#3
note 2 0 1 0 1.0 0.5 24000 24000 246.9 246.9 801 0.03 0.03 // B3
note 3 0 1 0 1.5 0.5 24000 24000 349.2 349.2 803 0.03 0.03 // F4 ^
note 4 0 1 0 2.0 0.5 24000 24000 311.1 311.1 801 0.03 0.03 // E!4

note 5 0 1 0 3.0 0.5 24000 24000 415.3 415.3 801 0.03 0.03 // A!4
note 6 0 1 0 3.5 0.5 24000 24000 233.1 233.1 801 0.03 0.03 // B!3
note 7 0 1 0 4.0 0.5 24000 24000 146.8 146.8 802 0.03 0.03 // D3
note 8 0 1 0 4.5 0.5 24000 24000 92.50 92.50 806 0.03 0.03 // F#2 ^

end 5.5
Listing 9 (a): Pitch sequence played by Instrument #1: Sustained, using
enriched spectra to highlight note #3 and note #8. To hear a realization, click here.
orch /Users/charlesames/Scratch/EnvelopeOrch.xml
set rate 44100
set bits 16
set norm 1
name EnvelopeDurationSpectrum

note 1 0 1 0 0.5 0.17 24000 24000 138.6 138.6 803 0.03 0.03 // C#3
note 2 0 1 0 1.0 0.17 24000 24000 246.9 246.9 801 0.03 0.03 // B3
note 3 0 1 0 1.5 0.40 24000 24000 349.2 349.2 804 0.03 0.03 // F4 ^
note 4 0 1 0 2.0 0.17 24000 24000 311.1 311.1 801 0.03 0.03 // E!4

note 5 0 1 0 3.0 0.17 24000 24000 415.3 415.3 801 0.03 0.03 // A!4
note 6 0 1 0 3.5 0.17 24000 24000 233.1 233.1 801 0.03 0.03 // B!3
note 7 0 1 0 4.0 0.17 24000 24000 146.8 146.8 802 0.03 0.03 // D3
note 8 0 1 0 4.5 0.40 24000 24000 92.50 92.50 806 0.03 0.03 // F#2 ^

end 5.5
Listing 9 (b): Pitch sequence played by Instrument #1: Sustained, using
both enriched spectra and elongated durations to highlight note #3 and note #8. To hear a realization, click here.

The following quote comes from Winckel, 1967, page 47:

A definitive explanation of the influence of the piano hammer stroke on the timbre as a whole has never [as of 1967] been given. We know simply that the tone color is altered by a change in the force of the stroke, as shown by the spectrum in Figure 37. A light stroke results in a softer sound with fewer overtones, a medium stroke makes the sound harsher, while a strong stroke gives the sound a bright, almost wind instrument-like timbre.

There are lots of ways one can go about designing instruments which render richness of spectrum parametrically controllable. Indeed, spectral richness can be made to evolve over the duration of a note — this is something explored in John Chowning's iconic 1973 article on frequency modulation synthesis.

The Kurzweil 250 synthesizer had its saxophone patch set up so that above a certain key velocity threshold, the K250 would substitute a growl sound for the default saxophone sound.

EnvelopeOrch.xml supports spectral accents through the direct expedient of using note parameters to indicate which waveform an oscillator should employ. Most of the examples on this page present note #3 (F4) using waveform #801, whose spectrum peaks at the fundamental, 349 Hz. However Listing 9 (a) and Listing 9 (b) increase spectral richness by substituting richer waveforms, shifting the spectral peak up to the fourth harmonic (F6). Likewise the waveform used for F#2, normally waveform #804, has been swapped for waveform #806.

Listing 9 (a) relies solely upon spectral richness for accent. Listing 9 (b) reinforces the spectral effect by applying the durational effect previously employed in Listing 6.

Vibrato

The topic of vibrato requires attention because it is so essential to the performance practice of live players, and especially of singers.

Instrument #1: Sustained in EnvelopeOrch.xml does not support vibrato; however the page named “Implementing a Sound-Synthesis Instrument” provides instructions for an instrument which modulates the pitch of an audio-frequency tone using a low-frequency sine wave. Using this instrument, one can achieve vibrato accents by selectively kicking up Parameter #9: Vibrato Rate and Parameter #10: Vibrato Depth.

The instrument just described admittedly sounds like an old-time electric organ. The next level of synthetic vibrato would be to modulate the rate and depth of vibrato using contours. Consider the way Billie Holiday uses vibrato to prevent a sustained note from going stale. The note starts out unmodulated; a slow and shallow vibrato is introduced, then finally near the end of the note, a few rapid and deep cycles.
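
Here is a hedged sketch of that next level: a sine LFO whose rate and depth are themselves shaped by breakpoint contours over the note's duration, so the tone starts straight, acquires a slow shallow vibrato, and finishes with a few faster, deeper cycles. The contour values and names are invented for illustration; this is not the instrument described on the “Implementing a Sound-Synthesis Instrument” page.

import math

RATE = 44100
DUR = 3.0
BASE_FREQ = 440.0

def ramp(t, points):
    """Piecewise-linear contour through (time, value) breakpoints."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]

depth_contour = [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0), (3.0, 12.0)]   # Hz of deviation
rate_contour  = [(0.0, 4.0), (2.0, 5.0), (3.0, 7.0)]                # vibrato cycles/sec

phase = 0.0         # running phase of the audio oscillator
vib_phase = 0.0     # running phase of the vibrato LFO
samples = []
for i in range(int(RATE * DUR)):
    t = i / RATE
    vib_phase += 2 * math.pi * ramp(t, rate_contour) / RATE
    freq = BASE_FREQ + ramp(t, depth_contour) * math.sin(vib_phase)
    phase += 2 * math.pi * freq / RATE
    samples.append(math.sin(phase))

print(len(samples), "samples generated")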

Encoding Articulation

Several disadvantages of the note-list format were enumerated in this site's instructions for the note-list editor. Here we focus on the particular challenges faced by note-list preprocessing tools when articulation is taken into account.

The two score-transcription approaches known to me during the 1970's were Leland Smith's SCORE and Alan Ashton's linear music code.

Both SCORE and the linear music code treated articulation as something textural rather than something functional. Smith's duty-factor approach has an engineering appeal. It is clearly more flexible than Ashton's approach, for it allows such effects as a gradual transition between staccato (short duty factor) and legato (full duty factor). However, the discussion above under The Silence Before the Note suggests that articulation just doesn't work along a percentage scale. Among the terms that we use to describe articulation, only staccato severely cuts down the note's sounding duration. Detached articulation indicates just enough space between notes to permit the second note's attack to speak — around 80 milliseconds. Legato and slurred articulations would both measure out with a 100% duty factor, yet there is a world of difference between the two: legato notes have separate envelopes while slurred notes happen under a single envelope.
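
To make the contrast concrete, the sketch below places the two encodings side by side, assuming a quarter note of 0.5 seconds: a duty factor scales the sounding duration by a percentage, while a categorical scheme treats detached spacing as a fixed 80 msec of silence and distinguishes legato from slurred even though both fill the notated duration. The 50% staccato proportion is an arbitrary illustration.

QUARTER = 0.5                    # seconds per notated quarter note
DETACH_GAP = 0.08                # fixed silence before the next attack

def duty_factor_duration(notated, duty):
    """Smith-style encoding: sounding duration is a percentage of notated."""
    return notated * duty

def categorical_duration(notated, articulation):
    """Categorical encoding: staccato cuts the note short, detached leaves a
    fixed gap, legato and slurred both fill the notated duration (the slur is
    distinguished elsewhere, by suppressing release and attack segments)."""
    if articulation == "staccato":
        return notated * 0.5
    if articulation == "detached":
        return notated - DETACH_GAP      # 0.42 sec, as in the 80-msec pair of Listing 2
    return notated                       # "legato" or "slurred"

for art in ("staccato", "detached", "legato", "slurred"):
    print(f"{art:9s} -> {categorical_duration(QUARTER, art):.3f} sec sounding")
print("50% duty  ->", duty_factor_duration(QUARTER, 0.5), "sec sounding")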

When I adapted the linear music code to sound synthesis, I introduced the comma as the encoding character for detached articulation. For example in “QC4C,D!D” the comma interposes a thirty-second rest, reducing the second middle C's sounding duration to a double-dotted eighth. However, I didn't get much time to explore subtleties of articulation with my ASHTON program. I had hardly got my program running when a basement flood in the UB music building wrecked our digital-to-analog converter.

Articulation and Tempo

As this page wraps up I would like to draw your attention to the way it times various segments of the envelope in milliseconds rather than in fractions of the quarter note. Such is the case for the silences which precede detached notes, for attack durations, for hold durations, for decay rates, and for the final release. Consider a slowing tempo, which expands note durations. For a sustained envelope, the expansion affects only the steady-state segment, neither the attack nor the release. For a percussive envelope the expanded duration doesn't mean that the energy dissipates any less rapidly; rather it means that the amplitude grows ever fainter until finally damped off by the release. Expressed in physical terms, having these envelope timings expand with slower tempi would be like saying that when the music slows down a human percussionist should exchange hard mallets for softer mallets (slower attack time) or switch from xylophone to tubular bells (slower decay rate).

The way software sound-synthesis instruments take in data from note parameters renders most envelope settings independent of tempo. The only exception to this is the moment of silence in a detached articulation, since the duration of this moment depends on the end time of the predecessor note and the start time of the successor note. So much for articulation and tempo. However we're not through with the dichotomy between absolute and relative timings. This issue will come up again in reference to speech synthesis.

© Charles Ames Page created: 2014-02-20 Last updated: 2017-06-24