49 Cards in this Set
- Front
- Back
Three systems that make up the speech apparatus
|
Respiratory
Phonatory
Articulatory |
|
Respiratory System
|
Lungs, rib cage, diaphragm, tissues, and intercostal muscles.
An air pump that provides the aerodynamic energy for the laryngeal and articulatory systems. Largely an elastic process: usually passive, but it can be active, e.g., taking a large breath before a long sentence. |
|
Laryngeal System
|
Job: produce voicing; vibration of the vocal folds turns the otherwise voiceless airstream into voiced sound.
The larynx sits at the top of the trachea and consists of a number of cartilages and muscles. Just lateral to the vocal ligament is the internal thyroarytenoid muscle; lateral to that is the external thyroarytenoid muscle. |
|
Cartilages, muscles and bones of the laryngeal system
|
Cartilages: cricoid, thyroid (largest), arytenoids (on top of the cricoid).
Muscles: internal and external thyroarytenoids.
Hyoid bone: the only free-floating bone (it does not articulate with any other bone). |
|
Glottis
|
Opening between the vocal folds.
|
|
Lateral cricoarytenoid muscles
|
Contract to adduct the vocal ligaments
|
|
Transverse arytenoid muscle
|
The only unpaired intrinsic laryngeal muscle. Adducts the vocal ligaments.
|
|
Oblique arytenoid muscles
|
Cross each other (forming an X) behind the transverse arytenoid muscle. Assist adduction.
|
|
Muscles of vocal fold adduction
|
Lateral cricoarytenoids, transverse arytenoid, oblique arytenoids.
|
|
Muscle of vocal fold abduction
|
Posterior cricoarytenoids
|
|
Articulatory system
|
Tongue, lips, jaw, and velum.
The shape of the system determines its resonance properties. |
|
Vowels
|
Voiced sounds produced with a relatively open vocal tract whose shape creates specific resonances. Steady-state articulatory configuration and acoustic pattern. Vowels have inherent differences in duration.
|
|
Fricatives
|
Produced with a narrow constriction in the vocal tract that creates turbulent, broadband noise. Voiced fricatives have extra low-frequency energy from vocal fold vibration.
|
|
Stops
|
A brief closure followed by a burst of noise, then movement toward another vocal tract configuration. Fastest sound in connected speech: about 10-15 ms (about 50 ms in isolation).
|
|
Nasals
|
Produced with the velopharynx open. Sound passes through both the nasal and oral tracts or through the nasal tract alone. The formants of the nasal cavity depend on the length of the cavity from the uvula to the nostrils.
Nasal formant (murmur): a band of energy at 200-300 Hz. |
|
Average man's voice
|
Fundamental frequency of about 120 Hz, with spectral energy at 120, 240, 360, 480 Hz, and so on, in harmonic steps.
|
|
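The harmonic series on the card above is simply the integer multiples of F0, so harmonics are spaced F0 apart. A minimal sketch (the function name `harmonics` is hypothetical, chosen for illustration) lists the first few harmonics for the 120 Hz male value above and the roughly 225 Hz female value on the next card:

```python
def harmonics(f0_hz: float, count: int) -> list[float]:
    """Return the first `count` harmonic frequencies (Hz) of a voice with fundamental f0_hz.

    The nth harmonic is n * F0, so the spacing between adjacent harmonics equals F0.
    """
    return [n * f0_hz for n in range(1, count + 1)]


# Average male voice (F0 about 120 Hz): energy at 120, 240, 360, 480 Hz, ...
print(harmonics(120, 4))  # [120, 240, 360, 480]
# Average female voice (F0 about 225 Hz): harmonics spaced 225 Hz apart.
print(harmonics(225, 4))  # [225, 450, 675, 900]
```

A higher F0 spreads the harmonics farther apart, which is why formant peaks are sampled more sparsely in higher voices.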
Average female voice
|
Fundamental frequency of about 225 Hz
|
|
How are we able to produce intelligible speech with a variety of energy sources?
|
The independence of the source and the filter.
|
|
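Source-filter independence can be illustrated by running two different sources through one fixed filter. The sketch below assumes a flat harmonic source (every harmonic has amplitude 1) and models the vocal tract as three second-order resonances; the formant frequencies and bandwidths in `FORMANTS` are hypothetical neutral-vowel values, not figures from these cards:

```python
import math

# Hypothetical (formant Hz, bandwidth Hz) pairs for a neutral vowel; illustrative only.
FORMANTS = [(500, 60), (1500, 90), (2500, 120)]


def resonance_gain(f: float, formant: float, bandwidth: float) -> float:
    """Magnitude of one second-order resonance at frequency f (gain 1 at 0 Hz)."""
    return formant**2 / math.hypot(formant**2 - f**2, f * bandwidth)


def filter_gain(f: float) -> float:
    """Vocal tract 'filter': product of the individual formant resonances."""
    gain = 1.0
    for formant, bandwidth in FORMANTS:
        gain *= resonance_gain(f, formant, bandwidth)
    return gain


def output_spectrum(f0: float, max_freq: float) -> dict[float, float]:
    """Source x filter: each harmonic (amplitude 1, an assumption) times the filter gain."""
    n_harmonics = int(max_freq // f0)
    return {n * f0: 1.0 * filter_gain(n * f0) for n in range(1, n_harmonics + 1)}


for f0 in (120, 225):  # the male and female F0 values from the cards
    spectrum = output_spectrum(f0, 1000)
    strongest = max(spectrum, key=spectrum.get)
    print(f0, strongest)  # the strongest harmonic is the one nearest F1 = 500 Hz
```

Changing F0 changes which harmonics exist, but the filter (and so the formant peak near 500 Hz) stays the same, which is the independence the card describes.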
Formants
|
The natural modes of vibration (resonances) of the vocal tract. Together they make up the transfer function of the vocal tract (an input-output relation, like that of a filter).
|
|
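A common textbook idealization treats the neutral vocal tract as a uniform tube closed at the glottis and open at the lips; its natural modes fall at odd multiples of a quarter wavelength, F_n = (2n - 1)c / 4L. This sketch assumes a tract length of 17.5 cm and a speed of sound of 35,000 cm/s (illustrative values, not from the cards):

```python
def tube_formants(length_cm: float, count: int = 3, speed_cm_s: float = 35_000.0) -> list[float]:
    """Resonances of a uniform tube closed at one end (glottis) and open at the other (lips).

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_cm_s / (4 * length_cm) for n in range(1, count + 1)]


print(tube_formants(17.5))  # [500.0, 1500.0, 2500.0]
# A longer tube (e.g., lengthened by lip rounding) lowers every resonance.
print(tube_formants(19.0))
```

Real vocal tracts are not uniform tubes, so measured formants depart from these values; the model is only a reference point for the neutral vowel.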
Radiation characteristic
|
The filtering effect that occurs when sound leaves the mouth and radiates into space. It relates the energy at the lips to the energy measured away from the mouth, boosting higher frequencies (about +6 dB per octave).
|
|
Articulatory-acoustic relationship
|
Front = high F2
Back = low F2
High = low F1
Low = high F1 |
|
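The four rules on the card above can be written as a rough two-way classifier. This is a simplified sketch that assumes a neutral reference of F1 = 500 Hz and F2 = 1500 Hz as the dividing lines (an assumption, not from the cards) and ignores mid vowels and speaker differences; the example formant values are illustrative:

```python
# Assumed neutral (schwa-like) reference values; not taken from the cards.
NEUTRAL_F1_HZ = 500.0
NEUTRAL_F2_HZ = 1500.0


def describe_vowel(f1_hz: float, f2_hz: float) -> tuple[str, str]:
    """Rule of thumb: High = low F1, Low = high F1; Front = high F2, Back = low F2."""
    height = "high" if f1_hz < NEUTRAL_F1_HZ else "low"
    backness = "front" if f2_hz > NEUTRAL_F2_HZ else "back"
    return height, backness


print(describe_vowel(300, 2300))  # ('high', 'front'), roughly /i/-like
print(describe_vowel(300, 900))   # ('high', 'back'), roughly /u/-like
print(describe_vowel(750, 1100))  # ('low', 'back'), roughly /a/-like
```

Because the thresholds are fixed, the same formant values would be labeled differently for speakers with shorter or longer vocal tracts, one reason vowel normalization is needed.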
Lip rounding effect on formants
|
Lip rounding occurs for some back and central vowels. It lengthens the vocal tract, lowering all formant frequencies.
|
|
Affricates
|
A combination of a stop and a fricative. The friction segment is intermediate in duration between the burst of a stop and the friction interval of a fricative.
|
|
Liquids
|
The lateral liquid /l/ has formants similar to those of nasal consonants. The rhotic /r/ has a very low F3 compared with /l/. Laterals involve splitting the vocal tract around a midline constriction.
|
|
Diphthongs
|
Combinations of vowels, produced by movement of the articulators from one vowel configuration to another. Like vowels b/c of the relatively open vocal tract and well-defined formant structure. Unlike vowels, they cannot be described by steady-state acoustic features.
|
|
Glides
|
Show movement (formant transitions) rather than a steady state. Examples: /w/ and /j/.
|
|
Vertical striations
|
Each vertical striation on a spectrogram corresponds to one cycle of vocal fold (VF) vibration.
|
|
Does bandwidth contribute to intelligibility?
|
No
|
|
Limitations of Simple Vowel Target Model
|
Does not account for speaker variation or for temporal and dynamic variation.
Cannot account for target undershoot: F2 in a CVC syllable does not reach the target value determined by the isolated vowel because of coarticulation. |
|
Elaborated Target Model
|
The Bark transform is designed to model the normalization of acoustic data performed by the auditory system. Its output must be nonlinear b/c the cochlea is a nonlinear structure.
The question it addresses: is there a key property that helps us identify vowels? |
|
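The cards do not say which Bark formula is meant; one widely used approximation is Traunmüller's (1990), z = 26.81f / (1960 + f) - 0.53. The sketch below uses that formula (a swapped-in choice, not necessarily the one in the source material) to show the nonlinearity: equal steps in Hz shrink on the Bark scale as frequency rises.

```python
def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) using Traunmüller's (1990) formula.

    The mapping is nonlinear: nearly proportional at low frequencies and
    compressive at high frequencies, mimicking cochlear frequency resolution.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Two 500 Hz steps: one low in the spectrum, one high.
low_step = hz_to_bark(1000) - hz_to_bark(500)
high_step = hz_to_bark(3000) - hz_to_bark(2500)
print(round(low_step, 2), round(high_step, 2))
```

Formant values converted to Bark (for example F1 and F2) can then be compared as auditory distances rather than raw Hz differences.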
Dynamic Specification Model
|
Temporal or dynamic information is used to identify vowels. These cues are the formant transitions into and out of a vowel steady state and the duration of the steady state. Timing is not specific to vowels!
|
|
Vowel perception
|
Constructed patterns (multiple speakers).
Templates (single speaker): other speakers' vowels are pulled into that template. |
|
How we differentiate vowels
|
Formant pattern (the only one we can see on a spectrogram)
Spectrum
Duration
Fundamental frequency |
|
Optimal octaves for /i/
|
1250-2500 Hz, 2500-5000 Hz, 5000-10000 Hz
|
|
Optimal octaves for /u/
|
80-160 Hz, 160-315 Hz
|
|
Optimal octaves for /a/
|
630-1250 Hz, 1250-2500 Hz
|
|
Does fundamental frequency vary with vowel height?
|
Yes. Higher vowels have higher F0.
F0 is secondary to F2 for identifying vowels. |
|
Formant bandwidth
|
Increases with damping and with formant number. Wider bandwidths dull (flatten) the formant peaks in the spectrum.
|
|
Relationship between formant bandwidth and amplitude
|
Increasing a formant's bandwidth reduces its peak amplitude.
|
|
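The bandwidth-amplitude trade-off can be checked with a single formant modeled as a second-order resonance (a modeling assumption; the cards do not specify one). At the formant frequency the gain reduces to F / B, so doubling the bandwidth halves the peak:

```python
import math


def resonance_gain(f_hz: float, formant_hz: float, bandwidth_hz: float) -> float:
    """Magnitude of a single second-order resonance, normalized to gain 1 at 0 Hz.

    |H(f)| = F^2 / sqrt((F^2 - f^2)^2 + (f * B)^2); at f = F this reduces to F / B.
    """
    return formant_hz**2 / math.hypot(formant_hz**2 - f_hz**2, f_hz * bandwidth_hz)


# Same 500 Hz formant with two bandwidths: the wider one has half the peak gain.
narrow_peak = resonance_gain(500, 500, 50)
wide_peak = resonance_gain(500, 500, 100)
print(narrow_peak, wide_peak)  # 10.0 5.0
```

More damping widens the bandwidth, which is why a heavily damped formant shows up as a lower, broader peak.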
Compare diphthongs and vowels
|
Similarities: voicing and an open vocal tract.
Differences: diphthongs lack steady-state information. |
|
Consonants involve
|
1. Noise generation
2. A period of complete obstruction
3. A narrowing of the vocal tract
4. Strictly oral
5. Nasal
6. Voiced vs. voiceless |
|
Acoustic properties of stop consonants
|
1. Stop gap
2. Release burst
3. Formant transition (out of the stop into the vowel)
4. Voicing |
|
Stop gap
|
The acoustic interval corresponding to the articulatory occlusion, typically 50-100 msec. If nothing precedes a voiceless stop (e.g., at the start of an utterance), you can't tell where the stop gap begins.
|
|
The three places of occlusion for stop consonants
|
Bilabial
Alveolar
Velar |
|
Stop consonant release burst
|
The transient that is produced on release of the occlusion and is no more than 40 msec in duration. Fastest acoustic event in speech production.
|
|
Stop identification using a simplified burst cue and the following vowel.
|
1. Bursts with a center frequency lower than the vowel F2 were identified as /p/ (bilabial).
2. Bursts with a center frequency close to F2 were identified as /k/ (velar).
3. Bursts with a center frequency higher than the vowel F2 were identified as /t/ (alveolar). |
|
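The three identification rules above translate directly into a decision procedure. The card does not say how close counts as "close to F2", so the `tolerance_hz` parameter and its 300 Hz default are hypothetical, as are the example frequencies:

```python
def classify_stop_by_burst(burst_center_hz: float, vowel_f2_hz: float,
                           tolerance_hz: float = 300.0) -> str:
    """Label a stop from its burst center frequency relative to the following vowel's F2.

    tolerance_hz sets what counts as 'close to F2'; the 300 Hz default is a
    hypothetical choice, not a value from the study summarized on the card.
    """
    if abs(burst_center_hz - vowel_f2_hz) <= tolerance_hz:
        return "/k/ (velar)"
    if burst_center_hz < vowel_f2_hz:
        return "/p/ (bilabial)"
    return "/t/ (alveolar)"


print(classify_stop_by_burst(700, 1800))   # /p/ (bilabial)
print(classify_stop_by_burst(1900, 1800))  # /k/ (velar)
print(classify_stop_by_burst(4000, 1800))  # /t/ (alveolar)
```

The velar check comes first so that a burst just above or below F2 is treated as "close" rather than falling into the /p/ or /t/ branch.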
Stop aspiration
|
Voiceless stops have aspirated releases except when they follow /s/.
|
|
VOT
|
Voice onset time: the interval between the articulatory release of the stop and the onset of vocal fold vibration. For voiced stops, VOT ranges from -20 msec to +20 msec; for voiceless stops, from 25 msec to 100 msec.
|
|
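The VOT ranges on the card above can be applied as a simple lookup. This sketch uses only those two ranges; the `"unclassified"` label for values between or outside them (for example 22 msec) is an illustrative choice, not part of the source:

```python
def classify_vot(vot_ms: float) -> str:
    """Map a voice onset time (msec) onto the voiced/voiceless ranges from the card.

    Negative VOT means voicing begins before the stop release (prevoicing).
    Values between the two ranges, or outside both, are returned as 'unclassified'.
    """
    if -20 <= vot_ms <= 20:
        return "voiced"
    if 25 <= vot_ms <= 100:
        return "voiceless"
    return "unclassified"


print(classify_vot(-10))  # voiced
print(classify_vot(60))   # voiceless
print(classify_vot(22))   # unclassified
```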
Problems with the simple vowel target
|
1. Assumes that the vowel is invariant across phonetic contexts and defined by a static vocal tract shape or by a point in the F1-F2 plane.
2. Inability to account for target undershoot of F2 in a CVC syllable.
3. Cannot account for temporal or dynamic variations. |