Speech Synthesis:
Melody of “Daisy Bell”

orch /Users/charlesames/Scratch/SpeechOrch.xml
set rate 44100
set bits 16
set norm 1
name Daisy1

ramp 1 1 0.0 96.0 6000 6000 // Amplitude
ramp 1 3 0.0 96.0 414 414 // Formant 1
ramp 1 4 0.0 96.0 1516 1516 // Formant 2
ramp 1 5 0.0 96.0 2500 2500 // Formant 3

// Dai-
ramp 1 2 0.00 2.98 293.7 293.7 // D4
ramp 1 2 2.98 0.02 293.7 246.9 // D4-B3
note 1 1 101 0 0.00 3.00 1 0.03 // Buzz1
// -sy,
ramp 1 2 3.00 3.00 246.9 246.9 // B3
note 2 1 101 1 3.00 2.00 1 0 // Buzz1
note 3 1 119 0 0.00 5.00 0.1 // RMS1
note 4 1 121 0 0.00 5.00 // Mouth1
note 5 1 199 0 0.00 5.00 // Rebalance1

// (rest)

// Dai-
ramp 1 2 6.00 2.98 196 196 // G3
ramp 1 2 8.98 0.02 196 146.8 // G3-D3
note 6 1 101 0 6.00 3.00 1 0.03 // Buzz1
// -sy.
ramp 1 2 9.00 3.00 146.8 146.8 // D3
note 7 1 101 6 9.00 2.00 1 0 // Buzz1
note 8 1 119 0 6.00 5.00 0.1 // RMS1
note 9 1 121 0 6.00 5.00 // Mouth1
note 10 1 199 0 6.00 5.00 // Rebalance1

// (rest)

// Give
ramp 1 2 12.00 1.00 164.8 164.8 // E3
note 11 1 101 0 12.00 0.90 1 0.03 // Buzz1
note 12 1 119 0 12.00 0.90 0.1 // RMS1
note 13 1 121 0 12.00 0.90 // Mouth1
note 14 1 199 0 12.00 0.90 // Rebalance1

// me
ramp 1 2 13.00 1.00 185 185 // F#3
note 15 1 101 0 13.00 0.90 1 0.03 // Buzz1
note 16 1 119 0 13.00 0.90 // RMS1
note 17 1 121 0 13.00 0.90 // Mouth1
note 18 1 199 0 13.00 0.90 // Rebalance1

// your
ramp 1 2 14.00 1.00 196 196 // G3
note 19 1 101 0 14.00 0.90 1 0.03 // Buzz1
note 20 1 119 0 14.00 0.90 0.1 // RMS1
note 21 1 121 0 14.00 0.90 // Mouth1
note 22 1 199 0 14.00 0.90 // Rebalance1

// an-
ramp 1 2 15.00 1.98 164.8 164.8 // E3
ramp 1 2 16.98 0.02 164.8 196 // E3-G3
note 23 1 101 0 15.00 2.00 1 0.03 // Buzz1
// -swer
ramp 1 2 17.00 1.00 196 196 // G3
note 24 1 101 23 17.00 0.90 1 0 // Buzz1
note 25 1 119 0 15.00 2.90 // RMS1
note 26 1 121 0 15.00 2.90 // Mouth1
note 27 1 199 0 15.00 2.90 // Rebalance1

// do.
ramp 1 2 18.00 6.00 146.8 146.8 // D3
note 28 1 101 0 18.00 3.00 1 0.03 // Buzz1
note 29 1 119 0 18.00 3.00 0.1 // RMS1
note 30 1 121 0 18.00 3.00 // Mouth1
note 31 1 199 0 18.00 3.00 // Rebalance1

// (rest)

// I'm
ramp 1 2 24.00 3.00 220 220 // A3
note 32 1 101 0 24.00 2.90 1 0.03 // Buzz1
note 33 1 119 0 24.00 2.90 // RMS1
note 34 1 121 0 24.00 2.90 // Mouth1
note 35 1 199 0 24.00 2.90 // Rebalance1

// half
ramp 1 2 27.00 3.00 293.7 293.7 // D4
note 36 1 101 0 27.00 3.00 1 0.03 // Buzz1
note 37 1 119 0 27.00 3.00 0.1 // RMS1
note 38 1 121 0 27.00 3.00 // Mouth1
note 39 1 199 0 27.00 3.00 // Rebalance1

// cra-
ramp 1 2 30.00 1.98 246.9 246.9 // B3
ramp 1 2 31.98 0.02 246.9 220 // B3-A3
ramp 1 2 32.00 0.98 220 220 // A3
ramp 1 2 32.98 0.02 220 196 // A3-G3
note 40 1 101 0 30.00 3.00 1 0.03 // Buzz1
// -zy,
ramp 1 2 33.00 2.00 196 196 // G3
note 41 1 101 40 33.00 1.90 1 0 // Buzz1
note 42 1 119 0 30.00 4.90 0.1 // RMS1
note 43 1 121 0 30.00 4.90 // Mouth1
note 44 1 199 0 30.00 4.90 // Rebalance1

// and
ramp 1 2 35.00 1.00 185 185 // F#3
note 45 1 101 0 35.00 0.90 1 0.03 // Buzz1
note 46 1 119 0 35.00 0.90 // RMS1
note 47 1 121 0 35.00 0.90 // Mouth1
note 48 1 199 0 35.00 0.90 // Rebalance1

// all
ramp 1 2 36.00 1.00 164.8 164.8 // E3
note 49 1 101 0 36.00 0.90 1 0.03 // Buzz1
note 50 1 119 0 36.00 0.90 0.1 // RMS1
note 51 1 121 0 36.00 0.90 // Mouth1
note 52 1 199 0 36.00 0.90 // Rebalance1

// for
ramp 1 2 37.00 1.00 185 185 // F#3
note 53 1 101 0 37.00 0.90 1 0.1 // Buzz1
note 54 1 119 0 37.00 0.90 0.1 // RMS1
note 55 1 121 0 37.00 0.90 // Mouth1
note 56 1 199 0 37.00 0.90 // Rebalance1

// the
ramp 1 2 38.00 1.00 196 196 // G3
note 57 1 101 0 38.00 0.90 1 0.03 // Buzz1
note 58 1 119 0 38.00 0.90 0.1 // RMS1
note 59 1 121 0 38.00 0.90 // Mouth1
note 60 1 199 0 38.00 0.90 // Rebalance1

// love
ramp 1 2 39.00 2.00 220 220 // A3
note 61 1 101 0 39.00 1.90 1 0.03 // Buzz1
note 62 1 119 0 39.00 1.90 0.1 // RMS1
note 63 1 121 0 39.00 1.90 // Mouth1
note 64 1 199 0 39.00 1.90 // Rebalance1

// of
ramp 1 2 41.00 1.00 246.9 246.9 // B3
note 65 1 101 0 41.00 0.90 1 0.03 // Buzz1
note 66 1 119 0 41.00 0.90 0.1 // RMS1
note 67 1 121 0 41.00 0.90 // Mouth1
note 68 1 199 0 41.00 0.90 // Rebalance1

// you.
ramp 1 2 42.00 5.00 220 220 // A3
note 69 1 101 0 42.00 2.90 1 0.03 // Buzz1
note 70 1 119 0 42.00 2.90 0.1 // RMS1
note 71 1 121 0 42.00 2.90 // Mouth1
note 72 1 199 0 42.00 2.90 // Rebalance1

// (rest)

// It
ramp 1 2 47.00 1.00 246.9 246.9 // B3
note 73 1 101 0 47.00 0.50 1 0.03 // Buzz1
note 74 1 119 0 47.00 0.90 0.1 // RMS1
note 75 1 121 0 47.00 0.90 // Mouth1
note 76 1 199 0 47.00 0.90 // Rebalance1

// won't
ramp 1 2 48.00 1.00 261.6 261.6 // C4
note 77 1 101 0 48.00 0.90 1 0.03 // Buzz1
note 78 1 119 0 48.00 0.90 // RMS1
note 79 1 121 0 48.00 0.90 // Mouth1
note 80 1 199 0 48.00 0.90 // Rebalance1

// be
ramp 1 2 49.00 1.00 246.9 246.9 // B3
note 81 1 101 0 49.00 0.90 1 0.03 // Buzz1
note 82 1 119 0 49.00 0.90 0.1 // RMS1
note 83 1 121 0 49.00 0.90 // Mouth1
note 84 1 199 0 49.00 0.90 // Rebalance1

// a
ramp 1 2 50.00 1.00 220 220 // A3
note 85 1 101 0 50.00 0.90 1 0.03 // Buzz1
note 86 1 119 0 50.00 0.90 0.1 // RMS1
note 87 1 121 0 50.00 0.90 // Mouth1
note 88 1 199 0 50.00 0.90 // Rebalance1

// sty-
ramp 1 2 51.00 1.98 293.7 293.7 // D4
ramp 1 2 52.98 0.02 293.7 246.9 // D4-B3
note 89 1 101 0 51.00 2.00 1 0.03 // Buzz1
// -lish
ramp 1 2 53.00 1.00 246.9 246.9 // B3
note 90 1 101 89 53.00 0.90 1 0 // Buzz1
note 91 1 119 0 51.00 2.90 0.1 // RMS1
note 92 1 121 0 51.00 2.90 // Mouth1
note 93 1 199 0 51.00 2.90 // Rebalance1

// mar-
ramp 1 2 54.00 0.98 220 220 // A3
ramp 1 2 54.98 0.02 220 196 // A3-G3
note 94 1 101 0 54.00 1.00 1 0.03 // Buzz1
// -riage.
ramp 1 2 55.00 4.00 196 196 // G3
note 95 1 101 94 55.00 1.90 1 0 // Buzz1
note 96 1 119 0 54.00 2.90 // RMS1
note 97 1 121 0 54.00 2.90 // Mouth1
note 98 1 199 0 54.00 2.90 // Rebalance1

// (rest)

// I
ramp 1 2 59.00 1.00 220 220 // A3
note 99 1 101 0 59.00 0.90 1 0.03 // Buzz1
note 100 1 119 0 59.00 0.90 0.1 // RMS1
note 101 1 121 0 59.00 0.90 // Mouth1
note 102 1 199 0 59.00 0.90 // Rebalance1

// can't
ramp 1 2 60.00 2.00 246.9 246.9 // B3
note 103 1 101 0 60.00 1.90 1 0.03 // Buzz1
note 104 1 119 0 60.00 1.90 // RMS1
note 105 1 121 0 60.00 1.90 // Mouth1
note 106 1 199 0 60.00 1.90 // Rebalance1

// af-
ramp 1 2 62.00 0.98 196 196 // G3
ramp 1 2 62.98 0.02 196 164.8 // G3-E3
note 107 1 101 0 62.00 1.00 1 0.03 // Buzz1
// -ford
ramp 1 2 63.00 2.00 164.8 164.8 // E3
note 108 1 101 107 63.00 1.90 1 0 // Buzz1
note 109 1 119 0 62.00 2.90 0.1 // RMS1
note 110 1 121 0 62.00 2.90 // Mouth1
note 111 1 199 0 62.00 2.90 // Rebalance1

// a
ramp 1 2 65.00 1.00 196 196 // G3
note 112 1 101 0 65.00 0.90 1 0.03 // Buzz1
note 113 1 119 0 65.00 0.90 0.1 // RMS1
note 114 1 121 0 65.00 0.90 // Mouth1
note 115 1 199 0 65.00 0.90 // Rebalance1

// car-
ramp 1 2 66.00 0.98 164.8 164.8 // E3
ramp 1 2 66.98 0.02 164.8 146.8 // E3-D3
note 116 1 101 0 66.00 1.00 1 0.03 // Buzz1
// -riage.
ramp 1 2 67.00 4.00 146.8 146.8 // D3
note 117 1 101 116 67.00 2.00 1 0 // Buzz1
note 118 1 119 0 66.00 3.00 0.1 // RMS1
note 119 1 121 0 66.00 3.00 // Mouth1
note 120 1 199 0 66.00 3.00 // Rebalance1

// (rest)

// But
ramp 1 2 71.00 1.00 146.8 146.8 // D3
note 121 1 101 0 71.00 0.90 1 0.03 // Buzz1
note 122 1 119 0 71.00 0.90 0.1 // RMS1
note 123 1 121 0 71.00 0.90 // Mouth1
note 124 1 199 0 71.00 0.90 // Rebalance1

// you'll
ramp 1 2 72.00 2.00 196 196 // G3
note 125 1 101 0 72.00 1.90 1 0.03 // Buzz1
note 126 1 119 0 72.00 1.90 0.1 // RMS1
note 127 1 121 0 72.00 1.90 // Mouth1
note 128 1 199 0 72.00 1.90 // Rebalance1

// look
ramp 1 2 74.00 1.00 246.9 246.9 // B3
note 129 1 101 0 74.00 0.90 1 0.03 // Buzz1
note 130 1 119 0 74.00 0.90 0.1 // RMS1
note 131 1 121 0 74.00 0.90 // Mouth1
note 132 1 199 0 74.00 0.90 // Rebalance1

// sweet
ramp 1 2 75.00 2.00 220 220 // A3
note 133 1 101 0 75.00 1.90 1 0.03 // Buzz1
note 134 1 119 0 75.00 1.90 0.1 // RMS1
note 135 1 121 0 75.00 1.90 // Mouth1
note 136 1 199 0 75.00 1.90 // Rebalance1

// u-
ramp 1 2 77.00 0.98 146.8 146.8 // D3
ramp 1 2 77.98 0.02 146.8 196 // D3-G3
note 137 1 101 0 77.00 1.00 1 0.03 // Buzz1
// -pon
ramp 1 2 78.00 2.00 196 196 // G3
note 138 1 101 137 78.00 1.90 1 0 // Buzz1
note 139 1 119 0 77.00 2.90 // RMS1
note 140 1 121 0 77.00 2.90 // Mouth1
note 141 1 199 0 77.00 2.90 // Rebalance1

// the
ramp 1 2 80.00 1.00 246.9 246.9 // B3
note 142 1 101 0 80.00 0.90 1 0.03 // Buzz1
note 143 1 119 0 80.00 0.90 0.1 // RMS1
note 144 1 121 0 80.00 0.90 // Mouth1
note 145 1 199 0 80.00 0.90 // Rebalance1

// seat
ramp 1 2 81.00 1.00 220 220 // A3
note 146 1 101 0 81.00 0.90 1 0.03 // Buzz1
note 147 1 119 0 81.00 0.90 0.1 // RMS1
note 148 1 121 0 81.00 0.90 // Mouth1
note 149 1 199 0 81.00 0.90 // Rebalance1

// of
ramp 1 2 82.00 1.00 246.9 246.9 // B3
note 150 1 101 0 82.00 0.90 1 0.03 // Buzz1
note 151 1 119 0 82.00 0.90 0.1 // RMS1
note 152 1 121 0 82.00 0.90 // Mouth1
note 153 1 199 0 82.00 0.90 // Rebalance1

// a
ramp 1 2 83.00 1.00 261.6 261.6 // C4
note 154 1 101 0 83.00 0.90 1 0.03 // Buzz1
note 155 1 119 0 83.00 0.90 0.1 // RMS1
note 156 1 121 0 83.00 0.90 // Mouth1
note 157 1 199 0 83.00 0.90 // Rebalance1

// bi-
ramp 1 2 84.00 0.98 293.7 293.7 // D4
ramp 1 2 84.98 0.02 293.7 246.9 // D4-B3
note 158 1 101 0 84.00 1.00 1 0.03 // Buzz1
// -cy-
ramp 1 2 85.00 0.98 246.9 246.9 // B3
ramp 1 2 85.98 0.02 246.9 196 // B3-G3
note 159 1 101 158 85.00 1.00 1 0 // Buzz1
// -cle
ramp 1 2 86.00 1.00 196 196 // G3
note 160 1 101 159 86.00 0.90 1 0 // Buzz1
note 161 1 119 0 84.00 2.90 0.1 // RMS1
note 162 1 121 0 84.00 2.90 // Mouth1
note 163 1 199 0 84.00 2.90 // Rebalance1

// built
ramp 1 2 87.00 2.00 220 220 // A3
note 164 1 101 0 87.00 1.90 1 0.03 // Buzz1
note 165 1 119 0 87.00 1.90 0.1 // RMS1
note 166 1 121 0 87.00 1.90 // Mouth1
note 167 1 199 0 87.00 1.90 // Rebalance1

// for
ramp 1 2 89.00 1.00 146.8 146.8 // D3
note 168 1 101 0 89.00 0.90 1 0.03 // Buzz1
note 169 1 119 0 89.00 0.90 0.1 // RMS1
note 170 1 121 0 89.00 0.90 // Mouth1
note 171 1 199 0 89.00 0.90 // Rebalance1

// two
ramp 1 2 90.00 6.00 196 196 // G3
note 172 1 101 0 90.00 3.00 1 0.03 // Buzz1
note 173 1 119 0 90.00 3.00 0.1 // RMS1
note 174 1 121 0 90.00 3.00 // Mouth1
note 175 1 199 0 90.00 3.00 // Rebalance1

// (rest)

end 96.0
Listing 2: Note-list body for “Daisy Bell” by Henry Dacre, Iteration #1, establishing durations and pitches. To hear a realization, click here.

“Daisy Bell”, Iteration 1

Listing 2 presents the first iteration of the “Daisy Bell” synthesis. The tempo is one quarter note per second. To clarify how the note list works, we now explain each statement in turn. Rest assured, however, that this blow-by-blow description extends only through the first word of the song.

The listing begins with the note-list header previously provided in Listing 1. Notice, however, the additional name statement, which directs the synthesized result to a file named Daisy1.wav. This file will reside in the same directory as the note list.

The listing continues with four ramp statements. The six parameters of any ramp statement have the following roles:

  1. Voice identifier,
  2. Contour number,
  3. Start time in seconds,
  4. Duration in seconds,
  5. Origin value, and
  6. Goal value.
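The six-parameter layout above maps directly onto a small record type. The Python sketch below is illustrative only: `Ramp` and `parse_ramp` are hypothetical names rather than part of the synthesis software, and the straight-line interpolation in `value_at` is an assumption about how a contour moves from its origin value to its goal value.

```python
from dataclasses import dataclass


@dataclass
class Ramp:
    """One ramp statement from the note list (six parameters, in order)."""
    voice: int        # 1. Voice identifier
    contour: int      # 2. Contour number
    start: float      # 3. Start time in seconds
    duration: float   # 4. Duration in seconds
    origin: float     # 5. Origin value
    goal: float       # 6. Goal value

    def value_at(self, t: float) -> float:
        """Contour value at time t.

        ASSUMPTION: straight-line interpolation from origin to goal; the
        synthesis engine's actual interpolation may differ.
        """
        if self.duration <= 0:
            return self.goal
        frac = (t - self.start) / self.duration
        frac = min(max(frac, 0.0), 1.0)  # clamp to the ramp's time span
        return self.origin + frac * (self.goal - self.origin)


def parse_ramp(line: str) -> Ramp:
    """Parse a statement such as 'ramp 1 3 0.0 96.0 414 414 // Formant 1'."""
    fields = line.split("//", 1)[0].split()  # drop the trailing comment
    if len(fields) != 7 or fields[0] != "ramp":
        raise ValueError(f"not a ramp statement: {line!r}")
    v, c, s, d, o, g = fields[1:]
    return Ramp(int(v), int(c), float(s), float(d), float(o), float(g))


formant1 = parse_ramp("ramp 1 3 0.0 96.0 414 414 // Formant 1")
print(formant1.value_at(50.0))  # steady state: prints 414.0

glide = parse_ramp("ramp 1 2 2.98 0.02 293.7 246.9 // D4-B3")
print(glide.value_at(2.99))     # halfway through the 20 msec D4-B3 glide
```

A steady-state ramp (origin equal to goal), like the four at the top of Listing 2, yields the same value at every point in time.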

These particular ramp statements each set their corresponding contour to a value which, for the purposes of this first iteration, holds steady through the duration of the note list.

For the record, the formant frequencies 414, 1516, and 2500 Hz together characterize the neutral vowel, or schwa.

Next in the listing come three statements specifically affecting the initial syllable of “Daisy”, which sustains for 3 seconds (a dotted half note).
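Because the tempo is one quarter note per second, a duration in beats is also a duration in seconds, so syllable start times are simply running sums of beat counts. A minimal Python sketch (the function `onsets` is hypothetical), counting each syllable's span as including any rest that follows it, as the pitch ramps in Listing 2 do:

```python
from itertools import accumulate

SECONDS_PER_QUARTER = 1.0  # Listing 2's tempo: one quarter note per second


def onsets(beats: list[float],
           seconds_per_quarter: float = SECONDS_PER_QUARTER) -> list[float]:
    """Start time in seconds of each syllable, given durations in quarter notes."""
    if not beats:
        return []
    # Running sum of the durations of all preceding syllables, starting at 0.
    return [b * seconds_per_quarter for b in accumulate([0.0] + beats[:-1])]


# "Dai- -sy, Dai- -sy. Give me your": four dotted halves, then three quarters.
syllable_beats = [3, 3, 3, 3, 1, 1, 1]
print(onsets(syllable_beats))  # [0.0, 3.0, 6.0, 9.0, 12.0, 13.0, 14.0]
```

The resulting times match the start times of the ramp statements for “Dai-”, “-sy”, “Dai-”, “-sy”, “Give”, “me”, and “your”.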

The next five statements complete the synthesis of the initial word, “Daisy”:

Slurs

Notice that Listing 2 always slurs between consecutive syllables of a single word. This is the convention when setting text to music, and a sensible one, since each word is a single utterance whose syllables generally ought to be connected. The slur-to notes in Listing 2 are #2 (Dai-sy), #7 (Dai-sy), #24 (an-swer), #41 (cra-zy), #90 (sty-lish), #95 (mar-riage), #108 (af-ford), #117 (car-riage), #138 (u-pon), #159 (bi-cy-cle), and #160 (bi-cy-cle).

A slur is indicated by placing the slur-from note's id in parameter #4 of the slur-to note statement. This causes the amplitude envelope to skip the release phase of the slur-from note and the attack phase of the slur-to note. It also causes the oscillator of Instrument #101: Buzz to carry over its waveform position pointer. Slurs are permitted only when both notes reference the same instrument and the end time of the slur-from note matches the start time of the slur-to note.
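The two slur rules just stated can be checked mechanically. The Python sketch below is hypothetical tooling, not part of the synthesis software. It takes parameter #4 as the slur-from id (0 meaning no slur), as the text states, and assumes from the listing that parameters #1, #3, #5, and #6 are the note id, instrument, start time, and duration; treating parameter #2 as a voice identifier is a further assumption.

```python
from dataclasses import dataclass


@dataclass
class Note:
    note_id: int      # parameter #1
    voice: int        # parameter #2 (ASSUMED to identify the voice)
    instrument: int   # parameter #3, e.g. 101 = Buzz
    slur_from: int    # parameter #4: slur-from note id, 0 = no slur
    start: float      # parameter #5, seconds
    duration: float   # parameter #6, seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


def parse_note(line: str) -> Note:
    """Parse the first six parameters of a note statement."""
    fields = line.split("//", 1)[0].split()  # drop the trailing comment
    if len(fields) < 7 or fields[0] != "note":
        raise ValueError(f"not a note statement: {line!r}")
    return Note(int(fields[1]), int(fields[2]), int(fields[3]),
                int(fields[4]), float(fields[5]), float(fields[6]))


def slur_errors(notes: list[Note], tol: float = 1e-6) -> list[str]:
    """Report slurs that break the same-instrument or end-equals-start rule."""
    by_id = {n.note_id: n for n in notes}
    errors = []
    for n in notes:
        if n.slur_from == 0:
            continue
        prev = by_id.get(n.slur_from)
        if prev is None:
            errors.append(f"note {n.note_id}: unknown slur-from note {n.slur_from}")
            continue
        if prev.instrument != n.instrument:
            errors.append(f"note {n.note_id}: instrument {n.instrument} "
                          f"differs from {prev.instrument}")
        if abs(prev.end - n.start) > tol:
            errors.append(f"note {n.note_id}: starts at {n.start}, "
                          f"but note {prev.note_id} ends at {prev.end}")
    return errors


listing = """
note 23 1 101 0 15.00 2.00 1 0.03 // Buzz1
note 24 1 101 23 17.00 0.90 1 0 // Buzz1
note 25 1 119 0 15.00 2.90 // RMS1
"""
notes = [parse_note(line) for line in listing.strip().splitlines()]
print(slur_errors(notes))  # [] : note 24 ("-swer") satisfies both rules
```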

However, it turns out that letting an oscillator's frequency change instantaneously from the slur-from note to the slur-to note generates a pop. The pop happens even though a view of the signal shows no actual discontinuity where the notes change over. The remedy adopted in SpeechOrch.xml and in Listing 2 is to implement the frequency input to Instrument #101: Buzz not as a discrete parameter but as a contour that evolves over time. For the most part, Listing 2 describes frequencies using steady-state ramp statements, “steady-state” meaning that the origin frequency and the goal frequency are the same. When notes are slurred, however, a transitional ramp lasting a very brief 20 msec connects the two pitches.
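The ramp pattern described above is regular enough to generate. This Python sketch uses a hypothetical helper, `pitch_ramps`, that is not part of the synthesis software. Given the pitch and duration of each segment of a slurred word, it emits contour-2 ramp statements in the form Listing 2 uses: a steady state, then a 20 msec transition into the next pitch, with the final segment held steady throughout.

```python
TRANSITION = 0.02  # seconds: the 20 msec glide between slurred pitches


def pitch_ramps(start: float, segments: list[tuple[float, float]],
                voice: int = 1, contour: int = 2) -> list[str]:
    """Emit frequency ramp statements for one slurred word.

    `segments` holds (frequency_hz, duration_s) pairs. Every segment but
    the last holds steady, then glides to the next pitch during its final
    TRANSITION seconds; the last segment holds steady for its whole span.
    """
    lines = []
    t = start
    for i, (freq, dur) in enumerate(segments):
        if i + 1 < len(segments):
            nxt = segments[i + 1][0]
            steady = dur - TRANSITION
            lines.append(f"ramp {voice} {contour} {t:.2f} {steady:.2f} {freq:g} {freq:g}")
            lines.append(f"ramp {voice} {contour} {t + steady:.2f} "
                         f"{TRANSITION:.2f} {freq:g} {nxt:g}")
        else:
            lines.append(f"ramp {voice} {contour} {t:.2f} {dur:.2f} {freq:g} {freq:g}")
        t += dur
    return lines


# "Dai-sy": D4 for 3 seconds, then B3 for 3 seconds.
for line in pitch_ramps(0.0, [(293.7, 3.0), (246.9, 3.0)]):
    print(line)
```

The three lines printed are identical to the first three pitch ramps of Listing 2. Passing B3, A3, and G3 segments of 2, 1, and 2 seconds starting at 30.00 likewise reproduces the five pitch ramps of “cra-zy”.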

The 20 msec frequency transitions are too short to be heard as portamento; all they do is eliminate the pop. The transition becomes audible as portamento once the transition time reaches 50 msec or more. I tried 50 msec for “Daisy Bell” and found the portamenti irritating, at least when applied as a matter of policy. I have nonetheless indulged myself by using audible (100 msec) portamento on the first syllable of “cra-zy” (note #40). Even so, sliding by a full major third (B3 to G3) proved too much, so I stepped the pitch down by inserting an A3 on the third beat.
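The size of the “cra-zy” slide can be checked with the standard equal-temperament relation: the interval in semitones from frequency f1 to frequency f2 is 12·log2(f2/f1). A short Python sketch using the listing's frequencies for B3, A3, and G3 (the helper name `semitones` is illustrative):

```python
import math


def semitones(f_from: float, f_to: float) -> float:
    """Equal-tempered interval from f_from to f_to in semitones (negative = down)."""
    return 12.0 * math.log2(f_to / f_from)


B3, A3, G3 = 246.9, 220.0, 196.0  # frequencies used in Listing 2

print(round(semitones(B3, G3), 2))  # whole slide: about -4 semitones (a major third)
print(round(semitones(B3, A3), 2))  # first step after inserting A3: about -2
print(round(semitones(A3, G3), 2))  # second step: about -2
```

Inserting the A3 thus splits one four-semitone slide into two whole-tone steps.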

Later iterations will alternate between vowel-like sounds and noisy sounds. Specifically, notes #2, #7, and #41 begin with voiced fricatives (z sounds); notes #24 and #108 begin with unvoiced fricatives (s and f, respectively); and notes #138 and #160 begin with stop consonants (p and k). In all of these cases the slurs will have to be backed out. For the fricatives the reason is technical: the orchestra in SpeechOrch.xml does not permit slurring between voices. For the stop consonants the reason is practical: such consonants involve an actual cessation of sound prior to the “plosive” burst.

Articulation

A second policy observed in Listing 2 is that whenever two consecutive syllables belong to different words, at least 100 msec of silence should separate them. The complement of this policy is that little or no silence should separate syllables within the same word. For now, and continuing through Iteration #4, this no-separation policy is masked by the must-slur policy. However, articulation policy will become an issue again in Iteration #5 and Iteration #6.
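The articulation policy can be checked against note start times and durations. In this Python sketch the function `articulation_problems` is hypothetical, and reading “little or no silence” within a word as a maximum gap of zero is an assumption (adjust `MAX_INNER_GAP_MS` to taste). Gaps are rounded to whole milliseconds so that floating-point noise, such as 18.0 - 17.9 not being exactly 0.1, does not cause false alarms.

```python
MIN_WORD_GAP_MS = 100  # between words: at least 100 msec of silence
MAX_INNER_GAP_MS = 0   # within a word: ASSUMED zero ("little or no" silence)


def articulation_problems(notes: list[tuple[int, str, float, float]]) -> list[str]:
    """Check successive Buzz notes against the articulation policy.

    Each entry is (word_number, syllable, start_s, duration_s). Word numbers,
    not spellings, identify words, so "Dai-sy, Dai-sy" counts as two words.
    """
    problems = []
    for (w1, syl1, s1, d1), (w2, syl2, s2, _) in zip(notes, notes[1:]):
        gap_ms = round((s2 - (s1 + d1)) * 1000)  # silence between the notes
        if w1 != w2 and gap_ms < MIN_WORD_GAP_MS:
            problems.append(f"{syl1} -> {syl2}: only {gap_ms} ms between words")
        elif w1 == w2 and gap_ms > MAX_INNER_GAP_MS:
            problems.append(f"{syl1} -> {syl2}: {gap_ms} ms inside a word")
    return problems


# Buzz notes #23, #24, and #28 ("an-swer do.") from Listing 2.
phrase = [(1, "an-", 15.00, 2.00), (1, "-swer", 17.00, 0.90), (2, "do.", 18.00, 3.00)]
print(articulation_problems(phrase))  # [] : both rules are satisfied
```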

Next topic: Pronunciation

© Charles Ames Page created: 2014-02-20 Last updated: 2017-06-12