XHTML+Voice in Style

By Jonny Axelsson

July 27th 2011: Please note that Voice only works in Opera on Windows 2000/XP, and we no longer officially support it.

This article builds upon topics in the XHTML+Voice by Example article. A knowledge of CSS is also assumed.

Hello World! Revisited

In the Hello World! example it wasn't really the block element that did the talking:

<block>Hello World!</block>

There is an implicit 'prompt' element hidden inside the 'block' element. In other words, the code above is equivalent to this:

<block>
<prompt>Hello World!</prompt>
</block>

In X+V the 'prompt' element can have a 'src' attribute (written xv:src in the examples, where xv is the X+V namespace prefix). That attribute can point to any element that is supposed to be spoken.

The following document has two different style sheets attached. By default the "Masculine" style sheet is used, but the user may elect to use the "Feminine" style sheet instead. One way to choose an alternate style sheet in Opera is the View > Style submenu. In this case you should see the "Masculine" and "Feminine" options at the bottom of the menu.

<!DOCTYPE html PUBLIC "-//VoiceXML Forum//DTD XHTML+Voice 1.2//EN"
"http://www.voicexml.org/specs/multimodal/x+v/12/dtd/xhtml+voice12.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
      xmlns:ev="http://www.w3.org/2001/xml-events">
<head>
  <link rel="stylesheet" href="male.css" title="Masculine"/> <!-- [1] -->
  <link rel="alternate stylesheet" href="female.css" title="Feminine"/> <!-- [2] -->
  <form xmlns="http://www.w3.org/2001/vxml" id="speak">
    <block>
      <prompt xv:src="#greetings">I failed you.</prompt> <!-- [3] -->
    </block>
  </form>
</head>
<body>
  <p id="greetings">Hello world!</p> <!-- [4] -->
  <input type="button" value="Greet by gender" ev:event="click" ev:handler="#speak"/>
</body>
</html>

[1] Any style sheet link with a 'title' attribute (in this case "Masculine") is a preferred style sheet. Unlike persistent style sheets (those with no 'title' attribute), preferred style sheets are turned off when an alternate style sheet is selected. There can only be one set of preferred style sheets.
[2] Any style sheet link with a 'title' attribute and both "alternate" and "stylesheet" relations is an alternate style sheet. An alternate style sheet can be turned on by user interaction such as selecting from a menu. Only one set of alternate or preferred style sheets can be active at a time. There can be any number of alternate style sheets.
[3] The spoken text of the 'prompt' element is given by reference (xv:src="#greetings"). The content ("I failed you.") is alternate text that will only be spoken if the #greetings element cannot be spoken for any reason. It is also what a VoiceXML browser, or an X+V browser before version 1.2, will speak.
[4] The #greetings element. It will be displayed on screen as well as spoken when the button is clicked. Like any other HTML element, it can be styled using CSS. With speech CSS you can style not only how this paragraph looks, but also how it sounds.

The "Masculine" style sheet looks like this, giving a male voice and a masculine cyan background color:

#greetings {voice-family: male; background: cyan;}

And the "Feminine" style sheet like this:

#greetings {voice-family: female; background: pink;}

Try this in action.
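Because there can be any number of alternate style sheets, a third voice could be offered in the same way. The following is only a sketch: the file name child.css, the title "Childlike" and the yellow background are made up for illustration, and "child male" is simply a generic voice family made of an age and a gender. Whether a given browser has a child voice available is not guaranteed.

```css
/* child.css: a hypothetical third style sheet, not part of the example above.
   It would be linked from the head next to the other two, for instance with
   <link rel="alternate stylesheet" href="child.css" title="Childlike"/>   */
#greetings {voice-family: child male; background: yellow;}
```

With such a link in place, a "Childlike" entry should show up next to "Masculine" and "Feminine" wherever the browser lists alternate styles.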

Look who's talking

There is much more to styling speech than male and female voices.

<!DOCTYPE html PUBLIC "-//VoiceXML Forum//DTD XHTML+Voice 1.2//EN"
"http://www.voicexml.org/specs/multimodal/x+v/12/dtd/xhtml+voice12.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
<head>
  <title>Example 2: Listen to me</title>
  <form xmlns="http://www.w3.org/2001/vxml" id="speak-now">
   <block><prompt xv:src="#spoken"/></block>
  </form>
  <style type="text/css">
  #spoken {background: #FFD}
  h1, h2, h3, h4 {
    voice-family: child female; /* [1] */
    -xv-voice-volume: loud;     /* [2] */
    pause: 1s;
  }
  p {
    voice-family: male 1;       /* [3] */
    -xv-voice-volume: soft;
  }
  .surprise {
    pause-before: 1.5s;
    -xv-voice-volume: x-loud;
  }
  .disclaimer {
    -xv-voice-volume: x-soft;
    -xv-voice-rate: 280;        /* [4] */
  }
  </style>
</head>

<body>
  <h1>Example 2:  Listen to me</h1>
  <p>Either use an Opera command to speak the following 
  paragraphs <button ev:event="focus" ev:handler="#speak-now">or <!-- [5] -->
  click this button</button></p>
<div id="spoken"> 
<h2>Opera releases voice browser</h2>
<p>Opera Software has released a browser to fulfil all 
your conversational needs. Like any good partner it can listen 
to you, ignore you, and talk back <span class="surprise">at 
the least expected moment.</span></p>
<h3 class="disclaimer">Disclaimer</h3>
<p class="disclaimer">Void when used in the vicinity of 
person or persons of human or inhuman origin.</p>
</div>
</body>
</html>

[1] This is a generic voice family ("age gender").
[2] voice-volume is a new property in CSS3. In Opera it is written -xv-voice-volume (with the -xv- prefix), because CSS3 Speech is still at the draft stage at the W3C.
[3] This is a numbered generic voice family. You cannot know what "male 1" will sound like in a given browser (apart from being male), but if there are two different male voices they will sound different.
[4] As the spoken equivalent of small print, the disclaimer is read at 280 words per minute, which is very fast. This is also an example of why the -xv- prefix is necessary: a more recent draft has changed this value from a number (words per minute) to a percentage (of normal speed).
[5] This button uses a focus event instead of click. A mouse click will give the button focus, and so will the keyboard Tab key.

Try this in action.
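Numbered voice families are mostly useful for telling speakers apart. As a rough sketch (the class names .narrator and .quote are invented here and do not appear in the example above), two male voices could be assigned like this. What each one actually sounds like is up to the browser and the voices it has available.

```css
/* Hypothetical class names; the property names and value patterns follow Example 2. */
.narrator {
  voice-family: male 1;        /* first generic male voice */
  -xv-voice-volume: soft;
}
.quote {
  voice-family: male 2;        /* a second male voice, meant to sound different from "male 1" */
  pause-before: 1.5s;          /* silence before the quotation starts */
}
```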

Styling speech with CSS

Styling of speech was first specified with Aural CSS in CSS2, and then with the CSS3 Speech Module. Opera's speech support is based upon the latter.

Since the CSS3 Speech specification is at an early (Working Draft) stage, the properties must carry a prefix so that the final version can work differently from the current, experimental one. All new Speech properties have an -xv- prefix, which will be dropped when CSS3 Speech becomes a Candidate Recommendation.
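Going by the Example 2 style sheet, the split looks roughly like this: properties carried over from CSS2 Aural (voice-family, pause-before) are written without a prefix, while the new ones (voice-volume, voice-rate) take the -xv- prefix. The .smallprint selector below is made up for illustration.

```css
/* Hypothetical selector, for illustration only. */
.smallprint {
  /* Already defined in CSS2 Aural: no prefix */
  voice-family: female;
  pause-before: 1s;

  /* New in the CSS3 Speech draft: -xv- prefix in Opera */
  -xv-voice-volume: x-soft;
  -xv-voice-rate: 280;   /* words per minute here; a more recent draft uses a percentage */
}
```

Because values can change along with names, as they did for voice-rate, simply dropping the prefix later may not be enough. The values should be checked against the final specification as well.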

New speech properties in CSS3

CSS2 Aural properties also in CSS3 Speech

CSS2 Aural properties not in CSS3 Speech

These properties are effectively deprecated, though a couple may become a part of an expected CSS3 Audio Module. Broadly speaking all the functionality from CSS2 should be in either CSS3 Speech Module or CSS3 Audio Module, but likely under another name.

Be heard, not seen

Many browsers, including Opera, can display the content of the head element. This is not what you want for voice forms, but it is easy to fix:

head {display:none}

If you have normal, valid web pages, you can use this rule in your site-wide style sheet. There is rarely any need to display head content. When there is, you can just override it on those pages:

head {display:block}

Implementation notes

CSS3 Speech is at the Working Draft stage at the W3C. Consider it experimental: any speech style sheet you write now will have to be updated in the future.

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.