Changes to the Encoding Standard to support the Streams Standard

On GitHub:
Repository (open issues)

In 8.1, Interface TextDecoder:

In the IDL, add

  [SameObject] readonly attribute ReadableStream readable;
  [SameObject] readonly attribute WritableStream writable;

to the TextDecoder interface.

In the paragraph following the IDL, insert ", and transform." at the end of the sentence.

To the DOM intro note, add

decoder . readable

Returns a readable stream whose chunks are strings resulting from running encoding’s decoder on the chunks written to writable.

decoder . writable

Returns a writable stream which accepts BufferSource chunks and runs them through encoding’s decoder before making them available to readable.

Typically this will be used via the pipeThrough() method on a ReadableStream source.

var decoder = new TextDecoder(encoding);
byteReadable
  .pipeThrough(decoder)
  .pipeTo(textWritable);

If the error mode is "fatal" and encoding’s decoder returns error, both readable and writable will be errored with a TypeError.
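
The proposed readable and writable attributes are not assumed to exist in any particular runtime, so the following sketch emulates them with a TransformStream wrapped around an ordinary TextDecoder. The names makeDecoderPair and decodeAll are hypothetical and introduced only for illustration. The sketch relies on pipeThrough() accepting any object that exposes readable and writable properties, and it shows a multi-byte sequence split across two chunks still decoding to a single character.

```javascript
// Hypothetical stand-in for the proposed decoder.readable / decoder.writable pair.
function makeDecoderPair(label = "utf-8", options = {}) {
  const decoder = new TextDecoder(label, options);
  const { readable, writable } = new TransformStream({
    transform(chunk, controller) {
      // { stream: true } keeps an incomplete byte sequence buffered for the next chunk.
      controller.enqueue(decoder.decode(chunk, { stream: true }));
    },
    flush(controller) {
      // Calling decode() with no input flushes whatever is still buffered.
      controller.enqueue(decoder.decode());
    },
  });
  return { readable, writable };
}

// Hypothetical helper: pipes byte chunks through the pair and collects the text.
async function decodeAll(byteChunks, label, options) {
  const byteReadable = new ReadableStream({
    start(controller) {
      for (const chunk of byteChunks) controller.enqueue(chunk);
      controller.close();
    },
  });
  let text = "";
  const textWritable = new WritableStream({
    write(chunk) {
      text += chunk;
    },
  });
  await byteReadable
    .pipeThrough(makeDecoderPair(label, options))
    .pipeTo(textWritable);
  return text;
}

// U+20AC (EURO SIGN) is 0xE2 0x82 0xAC in UTF-8; here it is split across two chunks.
const euroChunks = [new Uint8Array([0xe2, 0x82]), new Uint8Array([0xac])];
const decodedEuro = decodeAll(euroChunks, "utf-8");
decodedEuro.then((text) => console.log(text)); // logs "€"
```

With the attributes proposed here, makeDecoderPair(label) would be replaced by the TextDecoder object itself.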

To the TextDecoder(label, options) constructor steps, before the final step, add:

  1. Let decForTransform be a new TextDecoder object.

  2. Set decForTransform’s encoding to encoding.

  3. Set decForTransform’s error mode to dec’s error mode.

  4. Set decForTransform’s ignore BOM flag to dec’s ignore BOM flag.

  5. Set decForTransform’s decoder to a new decoder for decForTransform’s encoding, and set decForTransform’s stream to a new stream.

    For simplicity, dec and decForTransform have redundant members. However, the BOM seen flag, do not flush flag, and transform are unused in decForTransform, and its encoding, ignore BOM flag, and error mode are identical to dec’s. Implementations need not duplicate these member fields.

  6. Let startAlgorithm be an algorithm that takes no arguments and returns nothing.

  7. Let transformAlgorithm be an algorithm which takes a chunk argument and runs the decode and enqueue a chunk algorithm with decForTransform and chunk.

  8. Let flushAlgorithm be an algorithm which takes no arguments and runs the flush and enqueue algorithm with decForTransform.

  9. Let transform be the result of calling CreateTransformStream(startAlgorithm, transformAlgorithm, flushAlgorithm).

  10. Set dec’s transform to transform.

After the description of the ignoreBOM attribute, add:

The readable attribute’s getter must return transform.[[readable]].

The writable attribute’s getter must return transform.[[writable]].

After the definition of the decode() method, add:

The decode and enqueue a chunk algorithm, given a TextDecoder decForTransform and a chunk, runs these steps:

  1. If Type(chunk) is not Object, or chunk does not have an [[ArrayBufferData]] internal slot, or IsDetachedBuffer(chunk) is true, or IsSharedArrayBuffer(chunk) is true, then return a new promise rejected with a TypeError exception.

  2. Push a copy of chunk to decForTransform’s stream.

  3. Let controller be decForTransform’s transform.[[transformStreamController]].

  4. Let output be a new stream.

  5. While true, run these steps:

    1. Let token be the result of reading from decForTransform’s stream.

    2. If token is end-of-stream, then run these steps:

      1. Let outputChunk be output, serialized.

      2. Call TransformStreamDefaultControllerEnqueue(controller, outputChunk).

      3. Return a new promise resolved with undefined.

    3. Let result be the result of processing token for decForTransform’s decoder, decForTransform’s stream, output, and decForTransform’s error mode.

    4. If result is error, then return a new promise rejected with a TypeError exception.

The flush and enqueue algorithm, which handles the end of data from the input ReadableStream, given a TextDecoder decForTransform, runs these steps:

  1. Let output be a new stream.

  2. Let result be the result of processing end-of-stream for decForTransform’s decoder, decForTransform’s stream, output, and decForTransform’s error mode.

  3. If result is finished, then run these steps:

    1. Let outputChunk be output, serialized.

    2. Let controller be decForTransform’s transform.[[transformStreamController]].

    3. Call TransformStreamDefaultControllerEnqueue(controller, outputChunk).

    4. Return a new promise resolved with undefined.

  4. Otherwise, return a new promise rejected with a TypeError exception.
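
The two algorithms above can be approximated, as a rough synchronous model, with the existing decode() method in streaming mode. This is a sketch under stated assumptions rather than the specified behavior: createDecodeTransformer and collectingController are hypothetical names, a thrown TypeError stands in for "return a new promise rejected with a TypeError exception", and details such as byte order mark handling follow decode() rather than the steps above.

```javascript
// Hypothetical model of decForTransform built on an ordinary TextDecoder.
function createDecodeTransformer(label = "utf-8", options = {}) {
  const decoder = new TextDecoder(label, options);
  return {
    // Models "decode and enqueue a chunk": decode what is complete, buffer the rest.
    transform(chunk, controller) {
      controller.enqueue(decoder.decode(chunk, { stream: true }));
    },
    // Models "flush and enqueue": process end-of-stream for any buffered bytes.
    flush(controller) {
      controller.enqueue(decoder.decode());
    },
  };
}

// Minimal stand-in for a TransformStreamDefaultController that records chunks.
function collectingController() {
  const chunks = [];
  return {
    chunks,
    enqueue(chunk) {
      chunks.push(chunk);
    },
  };
}

// Replacement mode: a truncated sequence at end of input becomes U+FFFD on flush.
const controller = collectingController();
const replacing = createDecodeTransformer("utf-8");
replacing.transform(new Uint8Array([0xe2, 0x82]), controller); // incomplete sequence
replacing.flush(controller);

// Fatal mode: malformed input surfaces as a TypeError.
let fatalError = null;
try {
  const fatal = createDecodeTransformer("utf-8", { fatal: true });
  fatal.transform(new Uint8Array([0xff]), collectingController());
} catch (error) {
  fatalError = error;
}
```

Here controller.chunks ends up holding an empty string followed by "\uFFFD", and fatalError is a TypeError. In a real TransformStream, an exception thrown from transform or flush errors both sides of the stream.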

In 8.2, Interface TextEncoder:

In the IDL, add

  [SameObject] readonly attribute ReadableStream readable;
  [SameObject] readonly attribute WritableStream writable;

to the TextEncoder interface.

In the paragraph following the IDL, insert " and transform and pending high surrogate (initially null)" at the end of the sentence.

To the DOM intro note, add

encoder . readable

Returns a readable stream whose chunks are Uint8Arrays resulting from running UTF-8’s encoder on the chunks written to writable.

encoder . writable

Returns a writable stream which accepts string chunks and runs them through UTF-8’s encoder before making them available to readable.

Typically this will be used via the pipeThrough() method on a ReadableStream source.

textReadable
  .pipeThrough(new TextEncoder())
  .pipeTo(byteWritable);

To the TextEncoder() constructor steps, before the final step, add:

  1. Let encForTransform be a new TextEncoder object.

  2. Set encForTransform’s encoder to UTF-8’s encoder.

    For simplicity, enc and encForTransform have the same members. However, transform is not used by encForTransform and pending high surrogate is not used by enc. It is not necessary for implementations to store these unused member fields.

  3. Let startAlgorithm be an algorithm that takes no arguments and returns nothing.

  4. Let transformAlgorithm be an algorithm which takes a chunk argument and runs the encode and enqueue a chunk algorithm with encForTransform and chunk.

  5. Let flushAlgorithm be an algorithm which takes no arguments and runs the encode and flush algorithm with encForTransform.

  6. Let transform be the result of calling CreateTransformStream(startAlgorithm, transformAlgorithm, flushAlgorithm).

  7. Set enc’s transform to transform.

After the description of the encoding attribute, add:

The readable attribute’s getter must return transform.[[readable]].

The writable attribute’s getter must return transform.[[writable]].

After the definition of the encode() method, add:

The encode and enqueue a chunk algorithm, given a TextEncoder encForTransform and a chunk, runs these steps:

  1. Let input be the result of converting chunk to a DOMString. If this throws an exception, then return a promise rejected with that exception.

  2. Convert input to a stream.

  3. Let output be a new stream.

  4. Let controller be encForTransform’s transform.[[transformStreamController]].

  5. While true, run these steps:

    1. Let token be the result of reading from input.

    2. If token is end-of-stream, then run these steps:

      1. Convert output into a byte sequence.

      2. Let chunk be a Uint8Array object wrapping an ArrayBuffer containing output.

      3. Call TransformStreamDefaultControllerEnqueue(controller, chunk).

      4. Return a new promise resolved with undefined.

    3. Let result be the result of executing the convert code unit to scalar value algorithm with encForTransform, token and input.

    4. If result is not continue, then process result for encForTransform’s encoder, input, and output.

The convert code unit to scalar value algorithm, given a TextEncoder encForTransform, token and input stream, runs these steps:

  1. If encForTransform’s pending high surrogate is non-null, then run these steps:

    1. Let high surrogate be encForTransform’s pending high surrogate.

    2. Set encForTransform’s pending high surrogate to null.

    3. If token is in the range U+DC00 to U+DFFF, inclusive, then return a code point whose value is 0x10000 + ((high surrogate − 0xD800) << 10) + (token − 0xDC00).

    4. Prepend token to input.

    5. Return U+FFFD.

  2. If token is in the range U+D800 to U+DBFF, inclusive, then set encForTransform’s pending high surrogate to token and return continue.

  3. If token is in the range U+DC00 to U+DFFF, inclusive, then return U+FFFD.

  4. Return token.

This is equivalent to the "convert a JavaScript string into a scalar value string" algorithm from the Infra Standard, but allows for surrogate pairs that are split between strings.
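
The convert code unit to scalar value steps translate fairly directly into code. The sketch below is illustrative only: the state object, the CONTINUE sentinel, and the toScalarValues driver are hypothetical names, tokens are modelled as numeric UTF-16 code units, and input is modelled as an array of the remaining code units.

```javascript
// Sentinel standing in for the algorithm's "continue" result.
const CONTINUE = Symbol("continue");

// state.pendingHighSurrogate models encForTransform's pending high surrogate.
function convertCodeUnitToScalarValue(state, token, input) {
  if (state.pendingHighSurrogate !== null) {
    const highSurrogate = state.pendingHighSurrogate;
    state.pendingHighSurrogate = null;
    if (token >= 0xdc00 && token <= 0xdfff) {
      // Combine the held high surrogate with this low surrogate.
      return 0x10000 + ((highSurrogate - 0xd800) << 10) + (token - 0xdc00);
    }
    // Not a low surrogate: put the token back and replace the lone high surrogate.
    input.unshift(token);
    return 0xfffd;
  }
  if (token >= 0xd800 && token <= 0xdbff) {
    // Hold the high surrogate until the next code unit arrives.
    state.pendingHighSurrogate = token;
    return CONTINUE;
  }
  if (token >= 0xdc00 && token <= 0xdfff) {
    return 0xfffd; // lone low surrogate
  }
  return token;
}

// Hypothetical driver: runs one string chunk through the algorithm.
function toScalarValues(state, string) {
  const input = [];
  for (let i = 0; i < string.length; i++) input.push(string.charCodeAt(i));
  const output = [];
  while (input.length > 0) {
    const token = input.shift();
    const result = convertCodeUnitToScalarValue(state, token, input);
    if (result !== CONTINUE) output.push(result);
  }
  return output;
}

// U+1F600 is the surrogate pair 0xD83D 0xDE00, split here across two chunks.
const state = { pendingHighSurrogate: null };
const first = toScalarValues(state, "a\uD83D"); // [0x61]; 0xD83D is held back
const second = toScalarValues(state, "\uDE00b"); // [0x1F600, 0x62]
```

Working the arithmetic for this pair: 0x10000 + ((0xD83D − 0xD800) << 10) + (0xDE00 − 0xDC00) = 0x10000 + 0xF400 + 0x200 = 0x1F600.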

The encode and flush algorithm, given a TextEncoder encForTransform, runs these steps:

  1. If encForTransform’s pending high surrogate is non-null, then run these steps:

    1. Let controller be encForTransform’s transform.[[transformStreamController]].

    2. Let output be the byte sequence 0xEF 0xBF 0xBD.

      This is the replacement character U+FFFD encoded as UTF-8.

    3. Let chunk be a Uint8Array object wrapping an ArrayBuffer containing output.

    4. Call TransformStreamDefaultControllerEnqueue(controller, chunk).

  2. Return a new promise resolved with undefined.
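
As a rough model of how the encode and enqueue a chunk and encode and flush algorithms fit together, the sketch below holds back a trailing high surrogate at the string level and leaves the actual encoding to the existing encode() method. This is a swapped-in technique rather than the token-by-token steps above, chosen because it should produce the same bytes with less code. createEncodeTransformer and collectingController are hypothetical names.

```javascript
// Hypothetical model of encForTransform built on an ordinary TextEncoder.
function createEncodeTransformer() {
  const encoder = new TextEncoder();
  let pendingHighSurrogate = null; // a one-code-unit string, or null

  return {
    // Models "encode and enqueue a chunk".
    transform(chunk, controller) {
      let input = String(chunk);
      if (pendingHighSurrogate !== null) {
        // Re-attach the surrogate held back from the previous chunk.
        input = pendingHighSurrogate + input;
        pendingHighSurrogate = null;
      }
      const last = input.charCodeAt(input.length - 1); // NaN for an empty string
      if (last >= 0xd800 && last <= 0xdbff) {
        // A trailing high surrogate may be completed by the next chunk.
        pendingHighSurrogate = input.slice(-1);
        input = input.slice(0, -1);
      }
      // encode() replaces any remaining lone surrogate with U+FFFD.
      controller.enqueue(encoder.encode(input));
    },
    // Models "encode and flush".
    flush(controller) {
      if (pendingHighSurrogate !== null) {
        pendingHighSurrogate = null;
        // U+FFFD encoded as UTF-8.
        controller.enqueue(new Uint8Array([0xef, 0xbf, 0xbd]));
      }
    },
  };
}

// Minimal stand-in for a TransformStreamDefaultController that records chunks.
function collectingController() {
  const chunks = [];
  return {
    chunks,
    enqueue(chunk) {
      chunks.push(Array.from(chunk));
    },
  };
}

// A surrogate pair split across two chunks is encoded as one 4-byte sequence.
const paired = collectingController();
const pairedEncoder = createEncodeTransformer();
pairedEncoder.transform("a\uD83D", paired); // enqueues [0x61]
pairedEncoder.transform("\uDE00", paired); // enqueues [0xF0, 0x9F, 0x98, 0x80]
pairedEncoder.flush(paired); // nothing pending, nothing enqueued

// A high surrogate left pending at the end of input is flushed as U+FFFD.
const dangling = collectingController();
const danglingEncoder = createEncodeTransformer();
danglingEncoder.transform("\uD83D", dangling); // enqueues an empty chunk
danglingEncoder.flush(dangling); // enqueues [0xEF, 0xBF, 0xBD]
```

Passing an object like this to the TransformStream constructor would give a readable and writable pair comparable to the one described for encoder.readable and encoder.writable.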