Supplementing text with multi-modal content

This draft is out of date; there may be new research and new technologies that are relevant to this topic.

Introduction

Textual content can be made easier for people with cognitive disabilities to understand when it is delivered in more than one mode. These modes can include the addition of:

speech, via text-to-speech (TTS);
video;
contextually-relevant images;
consistent icons and graphics; and/or
text replaced or augmented by symbol sets.

Challenges of text for people with cognitive disabilities

The difficulty that people with cognitive disabilities have in comprehending text ranges from minimal to extreme. They may comprehend most of a web page's textual content, or none at all. These difficulties can affect people with impairments of:

memory;
executive function;
attention;
language;
literacy;
perception processing;
knowledge.

Memory

People with cognitive disabilities may have to:

read text several times to aid comprehension;
repeat aloud or otherwise reiterate text multiple times to retain it; and/or
return to earlier sections because the content took so long to read that they could not retain the information.

Issues with working memory may affect the ability to multi-task, so a multi-modal approach needs to be used judiciously, with the user able to choose modes depending on the task at hand and the setting.

“Good cues for individuals without EMI (episodic memory impairment) can be more subtle and less central to the experience, whereas good cues for those with memory impairment need to cover the important highlights of the experience so that they can re-learn and re-construct the forgotten experience […] Individuals with EMI are more easily cognitively overloaded, which leads to a need for systems to present a smaller number of only the most powerful cues.”

Executive function

People with cognitive disabilities may not:

sufficiently process or understand text as they read it;
understand text because they did not understand the text that preceded it; and/or
be able to plan which text to read next despite clues such as headings or numbering.

Attention-related limitations

People with cognitive disabilities:

may not attend to important concepts and relevant details; and/or
may be significantly distracted by extraneous text.

Language-related functions

People with cognitive disabilities may not understand text because they:

are unable to sound out letters or words;
are confused by text written in their own language but using vocabulary from a different culture;
are stymied by overly complex text written in their native language; and/or
have comprehension problems that are exacerbated by text or instructions presented in a non-native language.

Literacy-related functions

Some people with cognitive disabilities may not:

cope with the sounds or syllables that make up words;
understand text that is not literal and plainly written; and/or
comprehend text-only instructions in order to adequately follow them.

Perception-processing limitations

Many people with cognitive disabilities may not:

comprehend text that cannot be enlarged without distortion; and/or
recognize characters if they do not form words, or are shown in different fonts or styles, e.g., italics.

Reduced knowledge

Some people with cognitive disabilities may not comprehend text because:

they do not have relevant background knowledge; and/or
background concepts are not explained simply.

Use cases for multi-modal content

Use cases include:

Jumping to the relevant part of content. This is typically not supported, making content less usable.
Finding pieces in the content once focus is lost.
Going back a step when something was not understood.
Going back and forth between where a term was explained and the content of focus.

Ways to enhance text with multi-modal content

Text is written communication.

Textual content can be provided in a variety of alternative modes or formats, as described below. Ideally, people with cognitive disabilities should be able to choose to have content delivered in the mode they comprehend best. This is an important component of the proposed Global Public Inclusive Infrastructure.

Text-to-speech

Text-to-speech (TTS) is hardware and/or software that produces synthesized human speech on a device such as a computer. Most TTS reads text aloud in a synthesized voice. Other TTS converts symbols, such as those employed by augmentative and alternative communication (AAC), into speech.

Many people with cognitive disabilities, such as dyslexia, may have the capacity to use a screen reader for TTS. However, people with severe cognitive disabilities, such as intellectual disabilities, may require simpler TTS delivery.

A common example is a TTS widget embedded in a website. An alternative is the CSS Speech Module, as proposed by the W3C. One advantage of both is that there is nothing to download and install. Another is that learning how to use a TTS widget or a CSS speech module is dramatically simpler than learning how to use a screen reader.

The TTS should be limited to relevant content and should exclude text such as that found in menus, footers, and advertisements. Another helpful feature is the visual highlighting of text as it is read aloud. Such features may help people with cognitive disabilities who are overwhelmed even by simple TTS delivery.
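The two features just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular TTS widget: it assumes that menus, footers, and advertisements are marked up with the nav, footer, and aside elements (real pages vary), and the names SKIPPED_TAGS, text_for_speech, and word_spans are hypothetical. The first function gathers only the text a TTS engine should read; the second returns the character range of each word, which a page could use to highlight the word currently being spoken.

```python
import re
from html.parser import HTMLParser

# Assumption: menus, footers, and advertisements live in these elements.
# Script and style content is never meant to be read aloud either.
SKIPPED_TAGS = {"nav", "footer", "aside", "script", "style"}


class ReadableTextExtractor(HTMLParser):
    """Collects text that lies outside every skipped element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # number of skipped elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def text_for_speech(html):
    """Return only the text that should be passed to the TTS engine."""
    parser = ReadableTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def word_spans(text):
    """Return (start, end) character offsets of each word, for highlighting."""
    return [match.span() for match in re.finditer(r"\S+", text)]
```

For example, given a page whose nav and footer surround a heading and a paragraph, text_for_speech returns only the heading and paragraph text, and word_spans on that result gives the ranges to highlight one word at a time.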

Video

Video is a short film clip of moving visual images with or without audio.

To aid comprehension, video with audio should be captioned and/or have audio description, which provides important information not described or spoken in the main sound track. For example, see "Autistic spectrum, captions and audio description".

Further, video and audio should be navigable, for example by:

structuring the content so that each part is clearly identified or signposted (e.g., with a slide that says "step two - remove the old washer" or "step three - put on the new washer");
making the structure navigable (e.g., a person can jump directly to step two);
identifying keywords that can be jumped to directly; and/or
enabling bookmarks and annotations (that can be navigated).
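One way to picture navigable structure is as a list of signposted chapters with start times. The sketch below is a minimal illustration, not a prescribed implementation: the chapter list, its timings, the "step one" title, and the function names are all hypothetical. It shows how a player could find the current step, go back a step, and export the chapters as a WebVTT file, a format that browsers can load as a chapters track.

```python
from bisect import bisect_right

# Hypothetical chapter list: (start time in seconds, signpost title).
CHAPTERS = [
    (0, "Step one - turn off the water"),
    (40, "Step two - remove the old washer"),
    (95, "Step three - put on the new washer"),
]


def chapter_at(chapters, seconds):
    """Return the index of the chapter playing at the given time."""
    starts = [start for start, _ in chapters]
    return max(bisect_right(starts, seconds) - 1, 0)


def previous_chapter_start(chapters, seconds):
    """Return the time to seek to when the viewer asks to go back a step."""
    index = chapter_at(chapters, seconds)
    return chapters[max(index - 1, 0)][0]


def timestamp(seconds):
    """Format whole seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.000"


def to_webvtt(chapters, duration):
    """Render the chapters as WebVTT text, one cue per chapter."""
    lines = ["WEBVTT", ""]
    # Each chapter ends where the next one starts; the last ends with the video.
    ends = [start for start, _ in chapters[1:]] + [duration]
    for (start, title), end in zip(chapters, ends):
        lines += [f"{timestamp(start)} --> {timestamp(end)}", title, ""]
    return "\n".join(lines)
```

Keeping the chapter list as plain data means the same structure can drive a visible table of contents, "next step" and "previous step" buttons, and the exported chapter track.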

WCAG 2.0 Success Criterion References:

1.2.2 Captions (Prerecorded): Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such. (Level A)
1.2.5 Audio Description (Prerecorded): Audio description is provided for all prerecorded video content in synchronized media. (Level AA)
1.2.7 Extended Audio Description (Prerecorded): Where pauses in foreground audio are insufficient to allow audio descriptions to convey the sense of the video, extended audio description is provided for all prerecorded video content in synchronized media. (Level AAA)

Supplement with contextually-relevant images

An image is a picture, a representation of a visual perception.

User research has shown that text comprehension is significantly enhanced when the text is accompanied by contextually-relevant images. A picture of an object may be easier to recognize than a textual description of it.

Diagrams and charts, as visual representations, could be helpful alongside textual descriptions of processes or flows. With HTML Canvas, as proposed by the W3C, diagrams and charts could be made interactive and could carry additional descriptions of their parts to aid comprehension.

Supplement with consistent icons and graphics

An icon is a small image or drawing that commonly represents a function. A graphic is a drawing of a visual perception or an abstract concept, or is otherwise a representation of an object or an idea.

Text accompanied by consistent iconography helps convey meaning, such as by associating discrete textual passages with each other. Similarly, a pie-chart graphic may convey meaning in a way that is easier to comprehend than a table of statistics.

What is "consistent" in this context?

Replace or augment by symbol sets

A symbol is a sign that represents or suggests an idea, an object, an action, or a belief.

Symbol sets can be used for augmentative and alternative communication (AAC) to support people with cognitive disabilities who have severe speech and/or language difficulties. This can include those who may understand speech, but who are unable to express what they wish to say, perhaps because of a physical disability. (It is common for people with cognitive disabilities to also have physical disabilities.) Ideally, interoperable symbol sets could be used to replace or to augment web-based text.
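The difference between replacing and augmenting text can be shown with a small sketch. Everything here is an assumption made for illustration: the SYMBOLS mapping and its identifiers are placeholders rather than entries from any real symbol set, and the bracketed output simply marks where a page would render a symbol image.

```python
import re

# Hypothetical mapping from words to symbol identifiers. A real deployment
# would draw on an established symbol set; these entries are placeholders.
SYMBOLS = {"water": "symbol-water", "tap": "symbol-tap", "off": "symbol-off"}


def augment_with_symbols(text, symbols, replace=False):
    """Pair each known word with its symbol, or substitute the symbol for it."""

    def annotate(match):
        word = match.group(0)
        symbol = symbols.get(word.lower())
        if symbol is None:
            return word  # no symbol known: leave the word untouched
        return f"[{symbol}]" if replace else f"{word} [{symbol}]"

    return re.sub(r"[A-Za-z']+", annotate, text)
```

With replace left at False the original words stay visible next to their symbols, which suits readers who use symbols as a support; with replace set to True the symbols stand in for the words.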

Ease-of-use ideas

Text should be written clearly and simply using the following attributes:

plain-language standards relevant to language and culture; examples for English include:
- literal explanations, e.g., without jargon, slang, and metaphors;
- active voice, not passive voice; and
- no or minimal use of acronyms and abbreviations;
visual and organizational structures, e.g., headings and bulleted lists;
large font size; and
sans-serif font.

Plain language and clear structure will also aid comprehension for people who use text-to-speech.