W3C

CSS3 Text Layout Module

Editor's Draft

This version:
http://dev.w3.org/csswg/css3-text-layout/
Latest version:
http://www.w3.org/tr/css3-text-layout/
Previous version:
http://www.w3.org/tr/2003/CR-css3-text-20030514/
Editor:
Paul Nelson (Microsoft)
Previous editor:
Michel Suignard (Microsoft)

Abstract

This module specifies the text layout model in CSS and the properties that control it. It covers bidirectionality, vertical text, grid layout, and other special inline text layout effects.

Status of this document

This document is a Working Draft. This draft has not been approved or endorsed by the W3C or the CSS Working Group in any way and you may not use it as a reference or cite it other than as a work in progress. It may be updated, replaced or rendered obsolete at any time by subsequent publications.

1. Text layout introduction

This section describes the text layout features supported by CSS, including support for various international writing directions, such as left-to-right (e.g., Latin scripts), right-to-left (e.g., Hebrew or Arabic), bidirectional (e.g., mixing Latin with Arabic) and vertical (e.g., Asian scripts). In addition, this section defines a set of properties that provide a document grid layout system of the kind commonly used in East Asian text layout.

2. Text flow

Text flow is defined in terms of inline progression and block progression. Inline progression is the way elements flow on the line and defines on which end of the line the "start" and "end" are located. The 'direction' and 'unicode-bidi' properties determine the inline-progression and bidirectional layout. The 'block-progression' property determines the line progression, or way lines stack, in a block. The 'writing-mode' shorthand combines inline and block progression together.

Some examples of text flow are:

2.1. Inline progression: the 'direction' and 'unicode-bidi' properties

Conforming user agents that do not support bidirectional text may ignore the 'direction' and 'unicode-bidi' properties described in this section. This exception includes UAs that render right-to-left characters simply because a font on the system contains them but do not support the concept of right-to-left text direction.

The characters in certain scripts are written from right to left. In some documents, in particular those written with the Arabic or Hebrew script, and in some mixed-language contexts, text in a single (visually displayed) block may appear with mixed directionality. This phenomenon is called bidirectionality, or "bidi" for short.

The Unicode standard (Unicode Standard Annex #9) defines a complex algorithm for determining the proper directionality of text. The algorithm consists of an implicit part based on character properties, as well as explicit controls for embeddings and overrides. CSS relies on this algorithm to achieve proper bidirectional rendering. The 'direction' and 'unicode-bidi' properties allow authors to specify how the elements and attributes of a document language map to this algorithm.

User agents that support bidirectional text must apply the Unicode bidirectional algorithm to every sequence of inline boxes uninterrupted by a forced line break or block boundary. This sequence forms the "paragraph" unit in the bidirectional algorithm. The paragraph embedding level is set according to the value of the 'direction' property of the containing block rather than by the heuristic given in steps P2 and P3 of the Unicode algorithm.

Because the directionality of a text depends on the structure and semantics of the document language, these properties should in most cases be used only by designers of document type descriptions (DTDs), or authors of special documents. If a default style sheet specifies these properties, authors and users should not specify rules to override them.

The HTML 4 specification ([HTML4], section 8.2) defines bidirectionality behavior for HTML elements. The style sheet rules that would achieve the bidi behaviors specified in [HTML4] are given below. The HTML 4 specification also contains more information on bidirectionality issues.

*[dir="ltr"]    { direction: ltr; unicode-bidi: embed; }
*[dir="rtl"]    { direction: rtl; unicode-bidi: embed; }

bdo[dir="ltr"]  { direction: ltr; unicode-bidi: bidi-override; }
bdo[dir="rtl"]  { direction: rtl; unicode-bidi: bidi-override; }
  

Note: Because HTML UAs can turn off CSS styling, we advise HTML authors to use the HTML 'dir' attribute and <bdo> element to ensure correct bidirectional layout in the absence of a style sheet.

'direction'
Value:   ltr | rtl | inherit
Initial:   ltr
Applies to:   all elements, but see prose
Inherited:   yes
Percentages:   N/A
Media:   visual
Computed value:   as specified

This property specifies the inline progression of text and elements on a line, and the direction of embeddings and overrides (see 'unicode-bidi') for the Unicode bidirectional algorithm. In addition, it specifies the direction of table column layout, the direction of horizontal overflow, and the position of an incomplete last line in a block in case of 'text-align: justify'.

Values for this property have the following meanings:

ltr
Left-to-right direction.
rtl
Right-to-left direction.

For the 'direction' property to affect reordering in inline-level elements, the 'unicode-bidi' property's value must be 'embed' or 'bidi-override'.

Note: The 'direction' property, when specified for table column elements, is not inherited by cells in the column since columns are not the ancestors of the cells in the document tree. Thus, CSS cannot easily capture the "dir" attribute inheritance rules described in [HTML4], section 11.3.2.1.
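
As a rough illustration of the points above, the following sketch sets the base direction on a block and on a table. The class names 'arabic' and 'rtl-layout' are hypothetical and do not come from any document language; only the 'direction' property and its values are taken from this section.

```css
/* Hypothetical class names, for illustration only. */

/* Right-to-left inline progression for the block's contents. */
blockquote.arabic { direction: rtl; }

/* Set on the table itself: columns are laid out right to left, and
   the cells inherit the value because they are descendants of the table. */
table.rtl-layout { direction: rtl; }
```

Setting 'direction' on the table element rather than on a column element sidesteps the inheritance limitation described in the note.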

'unicode-bidi'
Value:   normal | embed | bidi-override
Initial:   normal
Applies to:   all elements and generated content, but see prose
Inherited:   no
Percentages:   N/A
Media:   visual
Computed value:   specified (except for initial and inherit)

Values for this property have the following meanings:

normal
The element does not open an additional level of embedding with respect to the bidirectional algorithm. For inline-level elements, implicit reordering works across element boundaries.
embed
If the element is inline-level, this value opens an additional level of embedding with respect to the bidirectional algorithm. The direction of this embedding level is given by the 'direction' property. Inside the element, reordering is done implicitly. This corresponds to adding a LRE (U+202A; for 'direction: ltr') or RLE (U+202B; for 'direction: rtl') at the start of the element and a PDF (U+202C) at the end of the element.
bidi-override
For inline-level elements this creates an override. For block-level, table-cell, table-caption, or inline-block elements this creates an override for inline-level descendants not within another block-level, table-cell, table-caption, or inline-block element. This means that inside the element, reordering is strictly in sequence according to the 'direction' property; the implicit part of the bidirectional algorithm is ignored. This corresponds to adding a LRO (U+202D; for 'direction: ltr') or RLO (U+202E; for 'direction: rtl') at the start of the element and a PDF (U+202C) at the end of the element.
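
A minimal sketch of the two non-initial values applied to inline-level elements is shown here. The selectors 'span.he-quote' and 'span.part-number' are hypothetical; the declarations follow the value definitions above.

```css
/* Hypothetical selectors, for illustration only. */

/* Opens a right-to-left embedding level; the text inside is still
   reordered implicitly. Comparable to wrapping the content in
   RLE (U+202B) ... PDF (U+202C). */
span.he-quote { direction: rtl; unicode-bidi: embed; }

/* Forces strictly left-to-right ordering and ignores the implicit part
   of the algorithm. Comparable to LRO (U+202D) ... PDF (U+202C). */
span.part-number { direction: ltr; unicode-bidi: bidi-override; }
```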

The final order of characters in each block-level element is the same as if the bidi control codes had been added as described above, markup had been stripped, and the resulting character sequence had been passed to an implementation of the Unicode bidirectional algorithm for plain text that produced the same line-breaks as the styled text. In this process, non-textual entities such as images are treated as neutral characters, unless their 'unicode-bidi' property has a value other than 'normal', in which case they are treated as strong characters in the 'direction' specified for the element.

Please note that in order to be able to flow inline boxes in a uniform direction (either entirely left-to-right or entirely right-to-left), more inline boxes (including anonymous inline boxes) may have to be created, and some inline boxes may have to be split up and reordered before flowing.

Because the Unicode algorithm has a limit of 61 levels of embedding, care should be taken not to use 'unicode-bidi' with a value other than 'normal' unless appropriate. In particular, a value of 'inherit' should be used with extreme caution. However, for elements that are, in general, intended to be displayed as blocks, a setting of 'unicode-bidi: embed' is preferred to keep the element together in case display is changed to inline (see example below).

The following example shows an XML document with bidirectional text. It illustrates an important design principle: DTD designers should take bidi into account both in the language proper (elements and attributes) and in any accompanying style sheets. The style sheets should be designed so that bidi rules are separate from other style rules. The bidi rules should not be overridden by other style sheets so that the document language's or DTD's bidi behavior is preserved.

In this example, lowercase letters stand for inherently left-to-right characters and uppercase letters represent inherently right-to-left characters. The text stream is shown in logical backing store order.


<HEBREW>
  <PAR>HEBREW1 HEBREW2 english3 HEBREW4 HEBREW5</PAR>
  <PAR>HEBREW6 <EMPH>HEBREW7</EMPH> HEBREW8</PAR>
</HEBREW>
<ENGLISH>
  <PAR>english9 english10 english11 HEBREW12 HEBREW13</PAR>
  <PAR>english14 english15 english16</PAR>
  <PAR>english17 <HE-QUO>HEBREW18 english19 HEBREW20</HE-QUO></PAR>
</ENGLISH>
    

Since this is XML, the style sheet is responsible for setting the writing direction. This is the style sheet:

/* Rules for bidi */
HEBREW, HE-QUO  {direction: rtl; unicode-bidi: embed;}
ENGLISH         {direction: ltr; unicode-bidi: embed;} 

/* Rules for presentation */
HEBREW, ENGLISH, PAR  {display: block;}
EMPH                  {font-weight: bold;}
    

The HEBREW element is a block with a right-to-left base direction; the ENGLISH element is a block with a left-to-right base direction. The PARs are blocks that inherit the base direction from their parents. Thus, the first two PARs are read starting at the top right; the final three are read starting at the top left. Please note that HEBREW and ENGLISH are chosen as element names for explicitness only; in general, element names should convey structure without reference to language.

The EMPH element is inline-level, and since its value for 'unicode-bidi' is 'normal' (the initial value), it has no effect on the ordering of the text. The HE-QUO element, on the other hand, creates an embedding.

The formatting of this text might look like this if the line length is long:

               5WERBEH 4WERBEH english3 2WERBEH 1WERBEH

                                8WERBEH 7WERBEH 6WERBEH

english9 english10 english11 13WERBEH 12WERBEH

english14 english15 english16

english17 20WERBEH english19 18WERBEH
    

Note that the HE-QUO embedding causes HEBREW18 to be to the right of english19.

If lines have to be broken, it might be more like this:

       2WERBEH 1WERBEH
  -EH 4WERBEH english3
                 5WERB

   -EH 7WERBEH 6WERBEH
                 8WERB

english9 english10 en-
glish11 12WERBEH
13WERBEH

english14 english15
english16

english17 18WERBEH
20WERBEH english19
    

Because HEBREW18 must be read before english19, it is on the line above english19. Just breaking the long line from the earlier formatting would not have worked. Note also that the first syllable from english19 might have fit on the previous line, but hyphenation of left-to-right words in a right-to-left context, and vice versa, is usually suppressed to avoid having to display a hyphen in the middle of a line.

2.2. Block progression: the 'block-progression' property

Name: block-progression
Value: tb | rl | lr
Initial: tb
Applies to: ??
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value

Should we change the name to something more understandable? "block-flow"? "block-direction"? We need to make sure terminology for inline progression and block progression ("layout" vs "flow" vs "orientation" vs "direction", "horizontal" vs "vertical") is consistent throughout the document as well.

This property sets the block-progression value and the layout orientation. Possible values:

tb
Top-to-bottom block flow. The layout orientation is horizontal.
rl
Right-to-left block flow. The layout orientation is vertical.
lr
Left-to-right block flow. The layout orientation is vertical.
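
The three values can be sketched as follows. The class names are hypothetical and chosen only to describe the resulting flow; the comments restate the value definitions above.

```css
/* Hypothetical class names, for illustration only. */
div.horizontal  { block-progression: tb; } /* lines stack top to bottom; horizontal orientation */
div.vertical-rl { block-progression: rl; } /* lines stack right to left; vertical orientation */
div.vertical-lr { block-progression: lr; } /* lines stack left to right; vertical orientation */
```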

An inline-level element that has a different 'block-progression' from its containing block becomes an 'inline-block' element [CSS3-box]. (Editorial note: switch to CSS2.1.)

Details

Box layout in vertical orientations is exactly analogous to layout in the horizontal orientation.

When an element has a different 'block-progression' from its containing block two cases are possible:

These descriptions need work

Example

In the following example, two block elements (1 and 3) separated by an image (2) are presented in various flow orientations.

Here is a diagram of horizontal layout (block-progression: tb):

Diagram of horizontal layout: blocks 1, 2, and 3 are stacked top-to-bottom

Here is a diagram for the right-to-left vertical layout commonly used in East Asia (block-progression: rl):

Diagram of a right-to-left vertical layout: blocks 1, 2, and 3 are arranged side by side from right to left

And finally, here is a diagram for the left-to-right vertical layout used for Uighur and Mongolian (block-progression: lr):

Diagram of left-to-right vertical layout: blocks 1, 2, and 3 are arranged side by side from left to right

2.3. Text flow short cut: the 'writing-mode' property

Name: writing-mode
Value: lr-tb | rl-tb | tb-rl | bt-rl | tb-lr | bt-lr
Initial: not defined for shorthand properties
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: see individual properties

The 'writing-mode' property is a shorthand property for the 'direction' property and the 'block-progression' property. Although, strictly speaking, the property has no initial value, its initial behavior is equivalent to 'lr-tb'. The property values are defined by the following table, which shows the settings of the constituent properties and examples of common usage.

writing-mode   direction   block-progression   Common usage
lr-tb          ltr         tb                  Latin-based, Greek, Cyrillic writing systems (and many others)
rl-tb          rtl         tb                  Arabic, Hebrew writing systems
tb-rl          ltr         rl                  East Asian writing systems in vertical mode
bt-rl          rtl         rl                  Arabic script block quote embedded in East Asian vertical text
tb-lr          ltr         lr                  Mongolian script writing system
bt-lr          rtl         lr                  Arabic script block quote embedded in Mongolian script document
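
To make the shorthand relationship concrete, the sketch below pairs two 'writing-mode' values with the longhand declarations listed for them in the table. The class names are hypothetical.

```css
/* Hypothetical class names, for illustration only. */

/* Shorthand form. */
div.hebrew   { writing-mode: rl-tb; }
div.vertical { writing-mode: tb-rl; }

/* Longhand form with the same effect, per the table above. */
div.hebrew   { direction: rtl; block-progression: tb; }
div.vertical { direction: ltr; block-progression: rl; }
```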

2.4. Mixing the 'writing-mode' in normal usage

In East Asian documents, it is often preferred to display certain Latin-based strings, such as numerals in a year, always in a horizontal layout orientation regardless of the flow orientation of the line of text these strings appear in, as in:

Diagram of Tate Naka Yoko, showing a group of glyphs appearing horizontally in a vertical column of text

Example of Tate Naka Yoko, showing the year 1996 appearing horizontally in a column of vertical text

Horizontal in vertical ("Tate-chu-yoko")

In Japanese, this effect is known as "Tate-chu-yoko". In order to achieve it in a CSS document, the Latin string should be enclosed in an inline element with a horizontal layout orientation, as in:

.date {block-progression: tb;}
<span class="date">1996</span>

3. Document grid

3.1. Document grid introduction

Documents written in East Asian languages, such as Chinese or Japanese, are commonly laid out on the page according to a specified one- or two-dimensional grid. The concept of grid can also be used with other non-ideographic contexts such as Braille script or monospaced layout.

The diagram below represents a fragment of horizontal text on a page with mixed wide-cell and narrow-cell glyphs that a Japanese content author intended to be laid out on a grid system that has 9 glyphs per line (gray grid lines are shown for clarity):

Example of strict grid layout (genko) applied to mixed Japanese and English in horizontal layout.
'Genko' grid applied to mixed text

The grid behavior can be set on the inline-progression, on the block-progression or both. The grid on the block-progression dimension is determined by the following properties:

The block-progression grid is not described in this section as it can be achieved simply by using the properties mentioned above and described in the CSS3 Line module.

The grid on the inline-progression dimension is obtained by altering the glyph advance width (or inline-progression value) of inline elements. There are several modes:

Two properties control this advance width modification: 'line-grid-mode' enables it and 'line-grid-progression' determines its value. The shorthand 'line-grid' allows setting both together.

3.2. Line grid mode: the 'line-grid-mode' property

Name: line-grid-mode
Value: none | ideograph | all
Initial: none
Applies to: block-level and inline-block elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

Specifies the line grid behavior. Each line grid mode value entails a different set of rules for rendering inline contents (the term 'horizontal' is used in the following description to indicate the inline-progression dimension). Possible values:

none

No line grid. Standard text alignments apply to the block element.

ideograph
Content is divided into units that we will call strips. Each strip is horizontally centered within the smallest number of grid spaces that contain the strip. The width of the grid space is determined by the 'line-grid-progression' setting.

Each grapheme cluster with a wide base character is a strip. Each grapheme cluster with a narrow kana character as its base is a strip. Each non-breakable object (e.g. an image) is a strip. Other grapheme clusters are treated as a single strip bounded by the strips described above. That single strip may be decomposed into several strips if line breaking occurs within it.

The strips are arranged in the grid as follows:

Mixed glyph layout in strict grid

Object layout in strict grid. Large rectangular object is centered horizontally within 2 grid spaces

Object layout in strict grid

The 'ideograph' mode disables all special text justification and glyph width adjustment normally applied to the contents of the block element.

If a line break opportunity cannot be found in a text run going over the line boundary, then that text run will be pushed down to the next line and the last part of the previous line will be left blank.

Here is an example of mixed text in 'ideograph' grid mode:

Example of strict grid applied to mixed Japanese and English
text in horizontal layout

Strict grid applied to mixed text

all
This type of grid can be used to achieve mono-spaced layout. As with 'ideograph', content is divided into strips and each strip is horizontally centered within the smallest number of grid spaces that can contain the strip. The rules for determining strips differ.

Each grapheme cluster with a non-joining base character is a strip. Each non-breakable object (e.g. an image) is a strip. Each run of grapheme clusters with joining base characters that join to each other is a strip.

Layout in fixed grid mode. All glyphs equally spread out.

Mixed glyph layout in fixed grid

For example:

Example of fixed grid mode in mixed Japanese and English text in
horizontal layout

Fixed grid applied to mixed text

The 'letter-spacing' property does not apply to characters in a grid but does apply to all characters not in the grid (i.e. all characters for 'line-grid-mode: none', non-ideographs for 'line-grid-mode: ideograph' and all non-connected glyphs for 'line-grid-mode: all').
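
The following sketch shows the two grid modes in use, with a worked example of how a strip is fitted to grid spaces. The class names and the strip widths in the comments are assumptions made for illustration; the centering rule is the one stated for 'ideograph' above.

```css
/* Hypothetical class names, for illustration only. */

/* Ideographic grid with 20pt grid spaces. */
div.genko {
  line-grid-mode: ideograph;
  line-grid-progression: 20pt;
}
/* Worked example under these settings (strip widths are assumed):
   - a 20pt-wide ideograph fills exactly 1 grid space;
   - a 30pt-wide strip of Latin text does not fit in 1 space (20pt),
     so it takes 2 spaces (40pt) and is centered with
     (40pt - 30pt) / 2 = 5pt of space on each side. */

/* Grid applied to every strip, giving a mono-spaced appearance. */
pre.mono { line-grid-mode: all; }
```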

3.3. Line grid progression: the 'line-grid-progression' property

Name: line-grid-progression
Value: text-height | line-height | <length>
Initial: text-height
Applies to: block-level and inline-block elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: <length>

This property affects the inline-progression dimension of characters which are subject to the fixed advance width as determined by the 'line-grid-mode' property.

Possible values:

text-height
The computed value of the block element's 'text-height' [CSS3-line] is used.
line-height
The computed value of the block element's 'line-height' [CSS3-line] is used.
<length>
The inline-progression dimension of the line grid's unit space.

For example:

div.section1 { line-grid-progression: .5in }

The rule set above would make grid spaces 0.5 inches long in a div element in the section1 class. If the element has horizontal flow, it would look like the following (without the grid lines, which are shown for clarity).

Example of a line-grid-char setting applied to mixed Japanese
and English text in horizontal layout

Enlarged grid applied to mixed text in horizontal layout

If the element has vertical flow, then 0.5in is the vertical measure of each grid space:

Example of a line-grid-char setting applied to mixed Japanese
and English text in vertical-ideographic layout

Enlarged grid applied to mixed text in vertical-ideographic layout
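
The keyword values can be used in place of an explicit length. The sketch below assumes a hypothetical 'body' class and ties the grid space to the element's line height.

```css
/* Hypothetical class name, for illustration only. */
div.body {
  line-height: 18pt;
  line-grid-mode: ideograph;
  /* Grid spaces take the computed 'line-height', here 18pt. */
  line-grid-progression: line-height;
}
```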

3.4. Line grid: the 'line-grid' shorthand property

Name: line-grid
Value: <'line-grid-mode'> || <'line-grid-progression'>
Initial: not defined for shorthand properties
Applies to: block-level and inline-block elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: see individual properties

The 'line-grid' property is a shorthand property for setting 'line-grid-mode' and 'line-grid-progression'.

The following is an example of setting the grid in both progressions:

div.grid { line-height:20pt;
           text-height: max-size;
           line-stacking-strategy: grid-height;
           line-grid: ideograph line-height; }

This gives the div element a grid with 20pt inline-progression and block-progression dimensions. All ideographs will be set in cells sized in multiples of 20pt in both directions.
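
As a further sketch of the shorthand, the two rule sets below should have the same effect, given the value syntax shown above. The class name 'mono' and the 12pt length are hypothetical.

```css
/* Hypothetical class name, for illustration only. */

/* Shorthand form. */
div.mono { line-grid: all 12pt; }

/* Longhand form with the same effect. */
div.mono {
  line-grid-mode: all;
  line-grid-progression: 12pt;
}
```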