Contents
The aural rendering of a document, already commonly used by the blind and print-impaired communities, combines speech synthesis and "auditory icons." Often such aural presentation occurs by converting the document to plain text and feeding this to a screen reader -- software or hardware that simply reads all the characters on the screen. This results in less effective presentation than would be the case if the document structure were retained. Style sheet properties for aural presentation may be used together with visual properties (mixed media) or as an aural alternative to visual presentation.
Besides the obvious accessibility advantages, there are other large markets for listening to information, including in-car use, industrial and medical documentation systems (intranets), home entertainment, and to help users learning to read or who have difficulty reading.
When using aural properties, the canvas consists of a three-dimensional physical space (sound surrounds) and a temporal space (one may specify sounds before, during, and after other sounds). The CSS properties also allow authors to vary the quality of synthesized speech (voice type, frequency, inflection, etc.).
H1, H2, H3, H4, H5, H6 { voice-family: paul; stress: 20; richness: 90; cue-before: url("ping.au") } P.heidi { azimuth: center-left } P.peter { azimuth: right } P.goat { volume: x-soft }
This will direct the speech synthesizer to speak headers in a voice (a kind of "audio font") called "paul", on a flat tone, but in a very rich voice. Before speaking the headers, a sound sample will be played from the given URL. Paragraphs with class "heidi" will appear to come from front left (if the sound system is capable of spatial audio), and paragraphs of class "peter" from the right. Paragraphs with class "goat" will be very soft.
Value: | <number> | <percentage> | silent | x-soft | soft | medium | loud | x-loud | inherit |
Initial: | medium |
Applies to: | all elements |
Inherited: | yes |
Percentages: | refer to inherited value |
Media: | aural |
Volume refers to the median volume of the waveform. In other words, a highly inflected voice at a volume of 50 might peak well above that. The overall values are likely to be human adjustable for comfort, for example with a physical volume control (which would increase both the 0 and 100 values proportionately); what this property does is adjust the dynamic range.
Values have the following meanings:
User agents should allow the values corresponding to '0' and '100' to be set by the listener. No one setting is universally applicable; suitable values depend on the equipment in use (speakers, headphones), the environment (in car, home theater, library) and personal preferences. Some examples:
The same author style sheet could be used in all cases, simply by mapping the '0' and '100' points suitably at the client side.
This property specifies whether text will be rendered aurally and if so, in what manner (somewhat analogous to the 'display' property). The possible values are:
Note the difference between an element whose 'volume' property has a value of 'silent' and an element whose 'speak' property has the value 'none'. The former takes up the same time as if it had been spoken, including any pause before and after the element, but no sound is generated. The latter requires no time and is not rendered (though its descendants may be).
Value: | <time> | <percentage> | inherit |
Initial: | depends on user agent |
Applies to: | all elements |
Inherited: | no |
Percentages: | see prose |
Media: | aural |
Value: | <time> | <percentage> | inherit |
Initial: | depends on user agent |
Applies to: | all elements |
Inherited: | no |
Percentages: | see prose |
Media: | aural |
These properties specify a pause to be observed before (or after) speaking an element's content. Values have the following meanings:
The pause is inserted between the element's content and any 'cue-before' or 'cue-after' content.
Authors should use relative units to create more robust style sheets in the face of large changes in speech-rate.
Value: | [ [<time> | <percentage>]{1,2} ] | inherit |
Initial: | depends on user agent |
Applies to: | all elements |
Inherited: | no |
Percentages: | see descriptions of 'pause-before' and 'pause-after' |
Media: | aural |
The 'pause' property is a shorthand for setting 'pause-before' and 'pause-after'. If two values are given, the first value is 'pause-before' and the second is 'pause-after'. If only one value is given, it applies to both properties.
H1 { pause: 20ms } /* pause-before: 20ms; pause-after: 20ms */ H2 { pause: 30ms 40ms } /* pause-before: 30ms; pause-after: 40ms */ H3 { pause-after: 10ms } /* pause-before: ?; pause-after: 10ms */
Value: | <uri> | none | inherit |
Initial: | none |
Applies to: | all elements |
Inherited: | no |
Percentages: | N/A |
Media: | aural |
Value: | <uri> | none | inherit |
Initial: | none |
Applies to: | all elements |
Inherited: | no |
Percentages: | N/A |
Media: | aural |
Auditory icons are another way to distinguish semantic elements. Sounds may be played before and/or after the element to delimit it. Values have the following meanings:
A {cue-before: url("bell.aiff"); cue-after: url("dong.wav") } H1 {cue-before: url("pop.au"); cue-after: url("pop.au") }
Value: | [ <'cue-before'> || <'cue-after'> ] | inherit |
Initial: | not defined for shorthand properties |
Applies to: | all elements |
Inherited: | no |
Percentages: | N/A |
Media: | aural |
The 'cue' property is a shorthand for setting 'cue-before' and 'cue-after'. If two values are given, the first value is 'cue-before' and the second is 'cue-after'. If only one value is given, it applies to both properties.
The following two rules are equivalent:
H1 {cue-before: url("pop.au"); cue-after: url("pop.au") } H1 {cue: url("pop.au") }
If a user agent cannot render an auditory icon (e.g., the user's environment does not permit it), we recommend that it produce an alternative cue (e.g., popping up a warning, emitting a warning sound, etc.)
Please see the sections on the :before and :after pseudo-elements for information on other content generation techniques.
Value: | <uri> mix? repeat? | auto | none | inherit |
Initial: | auto |
Applies to: | all elements |
Inherited: | no |
Percentages: | N/A |
Media: | aural |
Similar to the 'cue-before' and 'cue-after' properties, this property specifies a sound to be played as a background while an element's content is spoken. Values have the following meanings:
BLOCKQUOTE.sad { play-during: url("violins.aiff") } BLOCKQUOTE Q { play-during: url("harp.wav") mix } SPAN.quiet { play-during: none }
Spatial audio is an important stylistic property for aural presentation. It provides a natural way to tell several voices apart, as in real life (people rarely all stand in the same spot in a room). Stereo speakers produce a lateral sound stage. Binaural headphones or the increasingly popular 5-speaker home theater setups can generate full surround sound, and multi-speaker setups can create a true three-dimensional sound stage. VRML 2.0 also includes spatial audio, which implies that in time consumer-priced spatial audio hardware will become more widely available.
Values have the following meanings:
This property is most likely to be implemented by mixing the same signal into different channels at differing volumes. It might also use phase shifting, digital delay, and other such techniques to provide the illusion of a sound stage. The precise means used to achieve this effect and the number of speakers used to do so are user agent-dependent; this property merely identifies the desired end result.
H1 { azimuth: 30deg } TD.a { azimuth: far-right } /* 60deg */ #12 { azimuth: behind far-right } /* 120deg */ P.comment { azimuth: behind } /* 180deg */
If spatial-azimuth is specified and the output device cannot produce sounds behind the listening position, user agents should convert values in the rearwards hemisphere to forwards hemisphere values. One method is as follows:
Value: | <angle> | below | level | above | higher | lower | inherit |
Initial: | level |
Applies to: | all elements |
Inherited: | yes |
Percentages: | N/A |
Media: | aural |
Values of this property have the following meanings:
The precise means used to achieve this effect and the number of speakers used to do so are undefined. This property merely identifies the desired end result.
H1 { elevation: above } TR.a { elevation: 60deg } TR.b { elevation: 30deg } TR.c { elevation: level }
Value: | <number> | x-slow | slow | medium | fast | x-fast | faster | slower | inherit |
Initial: | medium |
Applies to: | all elements |
Inherited: | yes |
Percentages: | N/A |
Media: | aural |
This property specifies the speaking rate. Note that both absolute and relative keyword values are allowed (compare with 'font-size'). Values have the following meanings:
Value: | [[<specific-voice> | <generic-voice> ],]* [<specific-voice> | <generic-voice> ] | inherit |
Initial: | depends on user agent |
Applies to: | all elements |
Inherited: | yes |
Percentages: | N/A |
Media: | aural |
The value is a comma-separated, prioritized list of voice family names (compare with 'font-family'). Values have the following meanings:
H1 { voice-family: announcer, male } P.part.romeo { voice-family: romeo, male } P.part.juliet { voice-family: juliet, female }
Names of specific voices may be quoted, and indeed must be quoted if any of the words that make up the name does not conform to the syntax rules for identifiers. It is also recommended to quote specific voices with a name consisting of more than one word. If quoting is omitted, any whitespace characters before and after the font name are ignored and any sequence of whitespace characters inside the font name is converted to a single space.
Value: | <frequency> | x-low | low | medium | high | x-high | inherit |
Initial: | medium |
Applies to: | all elements |
Inherited: | yes |
Percentages: | N/A |
Media: | aural |
Specifies the average pitch (a frequency) of the speaking voice. The average pitch of a voice depends on the voice family. For example, the average pitch for a standard male voice is around 120Hz, but for a female voice, it's around 210Hz.
Values have the following meanings:
Value: | <number> | inherit |
Initial: | 50 |
Applies to: | all elements |
Inherited: | yes |
Percentages: | N/A |
Media: | aural |
Specifies variation in average pitch. The perceived pitch of a human voice is determined by the fundamental frequency and typically has a value of 120Hz for a male voice and 210Hz for a female voice. Human languages are spoken with varying inflection and pitch; these variations convey additional meaning and emphasis. Thus, a highly animated voice, i.e., one that is heavily inflected, displays a high pitch range. This property specifies the range over which these variations occur, i.e., how much the fundamental frequency may deviate from the average pitch.
Values have the following meanings:
Specifies the height of "local peaks" in the intonation contour of a voice. For example, English is a stressed language, and different parts of a sentence are assigned primary, secondary, or tertiary stress. The value of 'stress' controls the amount of inflection that results from these stress markers. This property is a companion to the 'pitch-range' property and is provided to allow developers to exploit higher-end auditory displays.
Values have the following meanings:
Value: | <number> | inherit |
Initial: | 50 |
Applies to: | all elements |
Inherited: | yes |
Percentages: | N/A |
Media: | aural |
Specifies the richness, or brightness, of the speaking voice. A rich voice will "carry" in a large room, a smooth voice will not. (The term "smooth" refers to how the wave form looks when drawn.)
Values have the following meanings:
An additional speech property, speak-header, is described in the chapter on tables
Value: | code | none | inherit |
Initial: | none |
Applies to: | all elements |
Inherited: | yes |
Percentages: | N/A |
Media: | aural |
This property specifies how punctuation is spoken. Values have the following meanings:
Value: | digits | continuous | inherit |
Initial: | continuous |
Applies to: | all elements |
Inherited: | yes |
Percentages: | N/A |
Media: | aural |
This property controls how numerals are spoken. Values have the following meanings: