Encrypted Media Extensions v0.1

Draft Proposal

Editors: David Dorwin, Google, Inc.; Adrian Bateman, Microsoft Corporation; Mark Watson, Netflix, Inc.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was submitted to the HTML Working Group as an Unofficial Draft. If you wish to make comments regarding this document, please send them to public-html@w3.org (subscribe, archives). You may send feedback to public-html-comments@w3.org (subscribe, archives) without joining the working group. All feedback is welcome.

Publication as an Unofficial Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Abstract

This proposal extends HTMLMediaElement to enable playback of protected content. The proposed API supports use cases ranging from simple clear key decryption to high value video (given an appropriate user agent implementation). License/key exchange is controlled by the application, facilitating the development of robust playback applications supporting a range of content decryption and protection technologies. No "DRM" is added to the HTML5 specification, and only simple clear key decryption is required as a common baseline.

1. Introduction

This section is non-normative.

This proposal allows JavaScript to select content protection mechanisms, control license/key exchange, and implement custom license management algorithms. It supports a wide range of use cases without requiring client-side modifications in each user agent for each use case. This also enables content providers to develop a single application solution for all devices. A generic stack implemented using the proposed APIs is shown below. This is just an example flow and is not intended to show all possible communication or uses.

Figure: A generic stack implemented using the proposed APIs

1.1 Goals

This section is non-normative.

This proposal was designed with the following goals in mind:

Support simple decryption without the need for DRM servers, etc.
Support a wide range of media containers and codecs.
Stream reusability - the actual encrypted content stream/file for a given container/codec should be identical regardless of the user agent and content decryption and protection mechanism.
Support a wide range of use cases.
Flexibility (and control) for applications and content providers without requiring client/user agent updates.
Minimize additions to HTMLMediaElement and new capabilities added to the user agent.
- Defer all information and algorithms about the content decryption and protection solution to the application/server and client content decryption module. The user agent should just pass information.
- The user agent should not be responsible for communication with license servers.
- The user agent should not select among content decryption and protection options. The application should make this decision.
- Note: Applications are already capable of everything required except secure decryption and decode.
Compatible with adaptive streaming.
Usability.

1.2. Definitions

1.2.1. Content Decryption Module (CDM)

This section is non-normative.

The Content Decryption Module (CDM) is a generic term for a part of or add-on to the user agent that provides functionality for one or more Key Systems. Implementations may or may not separate the implementations of CDMs and may or may not treat them as separate from the user agent. This is transparent to the API and application. A user agent may support one or more CDMs.

1.2.2. Key System

A Key System is a generic term for a decryption mechanism and/or content protection provider. Key System strings provide unique identification of a Key System. They are used by the user agent to select the Content Decryption Modules and identify the source of a key-related event. Simple Decryption Key Systems are supported by all user agents. User agents may also provide additional CDMs with corresponding Key System strings.

Key System strings are always a reverse domain name. For example, "com.example.somesystem". Within a given system ("somesystem" in the example), subsystems may be defined as determined by the key system provider. For example, "com.example.somesystem.1" and "com.example.somesystem.1_5". Key system providers should keep in mind that these will be used for comparison and discovery, so they should be easy to compare and the structure should remain reasonably simple.

It may make sense to provide informal guidelines to avoid these diverging too much. There are probably best practices too. Should platform-specific or protection capability information be contained in these strings?

If a user agent returns "maybe" or "probably" for any subsystem string, it must return "maybe" when a parent system string is passed to canPlayType(). For example, if a user agent returns "maybe" or "probably" for "com.example.somesystem.1_5", it must return "maybe" for "com.example.somesystem".
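The parent/subsystem rule can be sketched from the user agent's side as follows. This is a minimal illustration, not part of the proposal: the helper names isSubsystemOf and answerForParent are hypothetical, as is holding per-string answers in a plain object. It also assumes that a parent is any dot-delimited prefix of a subsystem string.

```javascript
// Hypothetical helper: true if `child` is a subsystem of `parent`,
// i.e. `child` starts with `parent` followed by a dot.
function isSubsystemOf(child, parent) {
  return child.startsWith(parent + ".");
}

// Hypothetical helper: given the answers a user agent would return for
// concrete Key System strings ("", "maybe" or "probably"), compute the
// answer it must return for a parent string.
function answerForParent(parent, answers) {
  const anySubsystemSupported = Object.keys(answers).some(
    (name) => isSubsystemOf(name, parent) && answers[name] !== ""
  );
  // The proposal requires "maybe" (never "probably") for the parent.
  return anySubsystemSupported ? "maybe" : "";
}
```

For example, if "com.example.somesystem.1_5" maps to "probably", answerForParent("com.example.somesystem", answers) yields "maybe".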

1.2.3. Session ID

A session ID is an optional string ID used to associate calls related to a key/license lifetime, starting with the request. It is a local binding between a request and key/license. It does not associate keys or licenses for different streams (i.e. audio and video). If supported by the Key System, it is generated by the user agent/CDM and provided to the application in the keymessage event. (Session IDs need not necessarily be supported by the underlying content protection client or server.)

Each successful call to generateKeyRequest() generates a new Session ID (returned in the keymessage event).

Applications should always provide the session ID from an event in subsequent calls for this key or license. (This is a best practice, even if the current Key System does not support session IDs.) This may mean that the application must associate a server response with the session ID and provide them both to addKey().

If Session IDs are supported, a new one will be created each time generateKeyRequest() is called. The user agent/CDM manage the lifetime of Session IDs. All Session IDs are cleared from the media element when a load occurs, although the CDM may retain them for longer.

NOTE: The key acquisition process (calling generateKeyRequest()/addKey()) may be executed multiple times for different sessions (each identified by a sessionId).

The current proposal does not support a mechanism to release keys. It is expected that the User Agent and CDM will release keys that are no longer needed as necessary to free resources. No use case for triggering this release from JavaScript has been identified.

1.2.4. Initialization Data

This section is non-normative.

Initialization Data is a generic term for container-specific data that is used by Content Decryption Modules to generate a key request. It should always allow unique identification of the key or keys needed to decrypt the content, possibly after being parsed by a CDM or server.

Key Systems usually require a block of initialization data containing information about the stream to be decrypted before they can construct a key request message. This block could be as simple as a key or content ID to send to a server or as complex as an opaque Key System-specific collection of data. This initialization information may be obtained in some application-specific way or may be stored with the media data. Container formats may provide for storage of such information, possibly for multiple Key Systems in a single media file.

Initialization data found in the media data is provided to the application in the initData attribute of the needkey event. This data has a container-specific format and is assumed to contain one or more generic or Key System-specific sets of initialization information.

Initialization Data - generic or containing information for the selected Key System - must be provided, in the same format, in the first media element method call that specifies a keySystem.

2. Media Element Extensions

We extend media element to allow decryption key acquisition to be handled in JavaScript. We also extend canPlayType() to provide basic information about the Key Systems supported by the user agent.

Note: For some CDMs, "key" and "key request" correspond to "license" and "license request", respectively.

partial interface HTMLMediaElement {
  // No API changes. 'type' string is extended.
  DOMString canPlayType(in DOMString type, in DOMString? keySystem);

  void generateKeyRequest(in DOMString keySystem, in Uint8Array? initData);
  void addKey(in DOMString keySystem, in Uint8Array key, in Uint8Array? initData, in DOMString? sessionId);
  void cancelKeyRequest(in DOMString keySystem, in DOMString? sessionId);
};

partial interface HTMLSourceElement {
             attribute DOMString keySystem;
};

The canPlayType(type, keySystem) method is modified to add an optional second parameter to specify the Key System.

The following list shows some examples of how to use the keySystem parameter in canPlayType() calls.

Returns whether the Some System Key System is supported. Specific containers and codecs may or may not be supported with Some System.

video.canPlayType(null, "com.example.somesystem")

Returns whether version 1.5 of the Some System Key System is supported. Specific containers and codecs may or may not be supported with Some System 1.5.

video.canPlayType(null, "com.example.somesystem.1_5")

Returns whether the Some System Key System is present and supports the container and codec(s) specified by mimeType.

video.canPlayType(mimeType, "com.example.somesystem")

Returns whether the user agent supports Clear Key Simple Decryption of the container and codec(s) specified by mimeType.

video.canPlayType(mimeType, "org.w3.clearkey")

Returns whether the user agent supports the container and codec(s) specified by mimeType but not whether encrypted streams can be decrypted. This is no different from the current specification.

video.canPlayType(mimeType)

video.canPlayType(mimeType, null)

video.canPlayType(mimeType, "")

The canPlayType() method provides a simple capability detection mechanism for Key System capabilities. If multiple versions of a protection system exist with different capabilities, these can be allocated distinct identifiers by the owner of that Key System. This can be extended even to feature discovery, for example "com.example.somesystem.ignite" and "com.example.somesystem.explode" might identify features of the "com.example.somesystem" Key System. It is an open question whether this usage is desirable or sufficient or whether more detailed capability detection mechanisms are needed.
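An application might use this capability detection to choose among several Key Systems in order of preference. The following is a sketch only: selectKeySystem is a hypothetical helper name, and video stands for any object that exposes the extended canPlayType(type, keySystem).

```javascript
// Returns the first Key System in `candidates` for which the media element
// reports that it may be able to play `mimeType`, or null if none qualifies.
function selectKeySystem(video, mimeType, candidates) {
  for (const keySystem of candidates) {
    // canPlayType() returns "", "maybe" or "probably".
    if (video.canPlayType(mimeType, keySystem) !== "") {
      return keySystem;
    }
  }
  return null;
}
```

A page could call selectKeySystem(video, mimeType, ["com.example.somesystem", "org.w3.clearkey"]) and fall back to unencrypted content when the result is null.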

In addition to the steps in the current specification, this method must run the following steps:

Check whether the Key System is supported with the specified container and codec type(s) by following the steps for the first matching condition from the following list:

If keySystem is null

Continue to the next step.

If keySystem contains an unrecognized or unsupported Key System

Return the empty string.

If the Key System specified by keySystem does not support decrypting the container and/or codec specified in the rest of the type string.

Return the empty string.
Return "maybe" or "probably" as appropriate per the existing specification of canPlayType().
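The additional steps above can be sketched as a pure function. This models the user agent's side and is illustrative only: the userAgent object, its baseCanPlayType() function and its keySystems map are hypothetical stand-ins for internal user agent state. Parent Key System strings are not modelled here.

```javascript
// `userAgent` is a hypothetical model with:
//   baseCanPlayType(type) - the answer per the existing specification
//   keySystems            - Map from Key System string to a function
//                           (type) => boolean reporting whether that
//                           Key System can decrypt `type`
function canPlayTypeWithKeySystem(userAgent, type, keySystem) {
  // If keySystem is null, continue to the next step. undefined and ""
  // are treated the same way, matching the canPlayType() examples above.
  if (keySystem !== null && keySystem !== undefined && keySystem !== "") {
    const supportsType = userAgent.keySystems.get(keySystem);
    // Unrecognized or unsupported Key System.
    if (supportsType === undefined) return "";
    // The Key System cannot decrypt this container and/or codec.
    if (!supportsType(type)) return "";
  }
  // "maybe" or "probably" (or "") per the existing specification.
  return userAgent.baseCanPlayType(type);
}
```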

The generateKeyRequest(keySystem, initData) method must run the following steps:

Note: The contents of initData are container-specific Initialization Data.

If the first argument is null, throw a SYNTAX_ERR.
If networkState is NETWORK_EMPTY, throw an INVALID_STATE_ERR.

In general, applications should wait for an event named needkey or loadstart (per the resource fetch algorithm) before calling this method.
Initialize handler by following the steps for the first matching condition from the following list:

If keySystem is one of the user agent's supported Key Systems

Let handler be the content decryption module corresponding to keySystem.

Otherwise

Throw a NOT_SUPPORTED_ERR.
Schedule a task to handle the call, providing initData.

The user agent will asynchronously execute the following steps in the task:
1. Load handler if necessary.
2. Let defaultUrl be null.
3. Use handler to generate a key request and follow the steps for the first matching condition from the following list:
  If a request is successfully generated
  1. Let key request be a key request generated by the CDM using initData, if provided.
    
    Note: handler must not use any data, including media data, not provided via initData.
  2. If initData is not null and contains a default URL for keySystem, let defaultUrl be that URL.
  Otherwise
  queue a task to fire a simple event named keyerror at the media element and abort the task.
  
  The event is of type MediaKeyErrorEvent and has:
  - keySystem = keySystem
  - sessionId = null
  - errorCode = the appropriate MediaKeyError code
  - systemCode = a Key System-specific value, if provided, and 0 otherwise
4. Let sessionId be a unique Session ID string. It may be generated by handler.
5. queue a task to fire a simple event named keymessage at the media element
  
  The event is of type MediaKeyMessageEvent and has:
  - keySystem = keySystem
  - sessionId = sessionId
  - message = key request
  - defaultUrl = defaultUrl
  Note: message may be a request for multiple keys, depending on the keySystem and/or initData. This is transparent to the application.
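The generateKeyRequest() steps can be modelled roughly as below. This is a simplified sketch of user agent behaviour, not a normative implementation. The element model (cdms, tasks, nextSessionId, fire) and the domError helper are hypothetical. Events are delivered directly from the task rather than through a second queued task, and the error code is fixed to MEDIA_KEYERR_UNKNOWN for brevity.

```javascript
const NETWORK_EMPTY = 0; // value of HTMLMediaElement.NETWORK_EMPTY

// Hypothetical helper: an Error whose name is the DOM exception name.
function domError(name) {
  const error = new Error(name);
  error.name = name;
  return error;
}

// `element` is a hypothetical model of a media element with:
//   networkState       - number
//   cdms               - Map from Key System string to a CDM model exposing
//                        generateRequest(initData) -> { message, defaultUrl }
//   tasks              - array standing in for the task queue
//   nextSessionId      - counter used to mint unique Session ID strings
//   fire(name, fields) - delivers an event to the application
function generateKeyRequest(element, keySystem, initData) {
  if (keySystem === null) throw domError("SYNTAX_ERR");
  if (element.networkState === NETWORK_EMPTY) throw domError("INVALID_STATE_ERR");
  const handler = element.cdms.get(keySystem);
  if (handler === undefined) throw domError("NOT_SUPPORTED_ERR");

  // The remaining steps run asynchronously in a task.
  element.tasks.push(() => {
    let request;
    try {
      request = handler.generateRequest(initData);
    } catch (e) {
      element.fire("keyerror", {
        keySystem,
        sessionId: null,
        errorCode: 1, // MEDIA_KEYERR_UNKNOWN; a real CDM would pick the code
        systemCode: 0,
      });
      return;
    }
    const sessionId = String(element.nextSessionId++);
    element.fire("keymessage", {
      keySystem,
      sessionId,
      message: request.message,
      defaultUrl: request.defaultUrl,
    });
  });
}
```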

The addKey(keySystem, key, initData, sessionId) method must run the following steps:

Note: The contents of key are keySystem-specific. It may be a raw key or a license containing a key. The contents may also vary depending on the container, key length, etc.

Note: The contents of initData are container-specific Initialization Data and should be the same format as the same parameter in generateKeyRequest(). It may be null.

The proposal currently allows addKey() to be called without calling generateKeyRequest(). The advantage is that simple use cases, especially Clear Key Simple Decryption, remain straightforward. The disadvantages are that user agents need to support multiple flows and that applications written for the simple case differ from those written for the more general case. In addition, some container formats may not support the simple case (e.g. if initData is not easily parsable to obtain a key ID).

It is an open question whether allowing the simple solutions is worth the effects. See this example for an illustration of the impact on simple applications.

It has been proposed that the initData parameter, which would most likely contain information identifying the key or keys needed, be removed from addKey() because any association can be done within the CDM using sessionId. (However, see Session Correlation.) Such a change depends on requiring that generateKeyRequest() always be called before addKey(). Assuming that change is made, removing the parameter simplifies the API but hides all association between a key identifier and key. See this example for an illustration of the impact of this change.

If the first argument is null, throw a SYNTAX_ERR.
If networkState is NETWORK_EMPTY, throw an INVALID_STATE_ERR.

In general, applications should wait for an event named needkey or loadstart (per the resource fetch algorithm) before calling this method.
Initialize handler by following the steps for the first matching condition from the following list:

If keySystem is one of the user agent's supported Key Systems

Let handler be the content decryption module corresponding to keySystem.

Otherwise

Throw a NOT_SUPPORTED_ERR.
If sessionId is not null and is unrecognized, throw an INVALID_ACCESS_ERR.

Should this be handled here or in the task scheduled in the next step? The advantage of handling it here is that what is likely a programming error is immediately and simply reported via an exception. The disadvantage is that the user agent must store session IDs (and track when they are released) for each Key System rather than letting the CDM manage them. This is inconsistent with the goal that the user agent should just pass information. This same issue also applies to cancelKeyRequest().
Schedule a task to handle the call, providing key, initData, and sessionId.

The user agent will asynchronously execute the following steps in the task:
1. Load handler if necessary.
2. Let key stored be false.
3. Let next message be null.
4. Use handler to handle key.
  1. Process key.
  2. If key contains a key or license, store the key.
    1. Let key ID be null.
    2. If sessionId is not null and refers to a session with Initialization Data that contains a key ID, let key ID be that ID.
    3. If key is not null and contains a key ID, let key ID be that ID.
    4. If initData is not null and contains a key ID, let key ID be that ID.
    5. Store the key by following the steps for the first matching condition from the following list:
      If key ID is not null
      
      Clear any key not associated with a key ID.
      
      If a key already exists for key ID, delete that element.
      
      Store the key and/or license in key indexed by key ID. The replacement algorithm is Key System-dependent.
      
      Otherwise
      
      Clear all stored keys.
      
      Store the key and/or license in key with no associated key ID.
      At most one key may be stored if key IDs are not used.
      
      Clearing keys avoids needing to handle a mixture of keys with and without IDs in the Encrypted Block Encountered algorithm.
      
      Note: It is recommended that CDM providers support a standard and reasonably high minimum number of cached keys/licenses (with IDs) per media element as well as a standard replacement algorithm. This enables a reasonable number of key rotation algorithms to be implemented across user agents and may reduce the likelihood of playback interruptions in use cases that involve various streams in the same element (i.e. adaptive streams, various audio and video tracks) using different keys.
    6. Let key stored be true.
  3. If another message needs to be sent to the server, let next message be that message.
5. If key stored is true and the media element is waiting for a key, queue a task to attempt to resume playback.
  In other words, resume playback if the necessary key is provided.
6. Fire the appropriate event by following the steps for the first matching condition from the following list:
  If next message is null
  queue a task to fire a simple event named keyadded at the media element
  
  The event is of type MediaKeyCompleteEvent and has:
  - keySystem = keySystem
  - sessionId = sessionId
  Otherwise
  queue a task to fire a simple event named keymessage at the media element
  
  The event is of type MediaKeyMessageEvent and has:
  - keySystem = keySystem
  - sessionId = sessionId
  - message = next message
  - defaultUrl = null
    Is there a reason that this cannot be null?
  If any of the preceding steps in the task failed, queue a task to fire a simple event named keyerror at the media element.
  
  The event is of type MediaKeyErrorEvent and has:
  - keySystem = keySystem
  - sessionId = sessionId
  - errorCode = the appropriate MediaKeyError code
  - systemCode = a Key System-specific value, if provided, and 0 otherwise
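The key ID resolution and storage rules in step 4.2 above can be sketched as a small data structure. This is an illustration under assumptions: resolveKeyId and KeyStore are hypothetical names, and key IDs are represented as strings so they can be used as Map keys. The Key System-dependent replacement algorithm and any cache size limit are not modelled.

```javascript
// Mirrors the key ID resolution order: each later source, when present,
// overrides the earlier ones (session init data, then key, then initData).
function resolveKeyId(sessionKeyId, keyKeyId, initDataKeyId) {
  let keyId = null;
  if (sessionKeyId !== null) keyId = sessionKeyId;
  if (keyKeyId !== null) keyId = keyKeyId;
  if (initDataKeyId !== null) keyId = initDataKeyId;
  return keyId;
}

// Hypothetical model of the keys stored for one media element.
class KeyStore {
  constructor() {
    this.keysById = new Map(); // key ID -> key and/or license
    this.keyWithoutId = null;  // at most one key stored without an ID
  }

  store(keyId, key) {
    if (keyId !== null) {
      this.keyWithoutId = null;    // clear any key not associated with a key ID
      this.keysById.delete(keyId); // delete an existing key for this ID
      this.keysById.set(keyId, key);
    } else {
      this.keysById.clear();       // clear all stored keys
      this.keyWithoutId = key;
    }
  }
}
```

Because each branch clears the other kind of key, the store never holds a mixture of keys with and without IDs.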

The key acquisition process may involve the web page handling keymessage events, sending the message to a Key System-specific service, and calling addKey() with the response message. This continues until the keyadded event is fired. During the process, the web page may wish to cancel the acquisition process. For example, if the page cannot contact the license service because of network issues, it may wish to fall back to an alternative key system. The page calls cancelKeyRequest() to cancel the key acquisition and return the media element to a state where generateKeyRequest() may be called again.
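From the application's side, this exchange might be wired up as in the sketch below. It is illustrative only: setUpKeyExchange is a hypothetical helper, and sendToServer(url, message, onResponse) is an assumed application-supplied transport (for example, a wrapper around XMLHttpRequest) that calls onResponse with the server's reply as a Uint8Array. The licenseUrl parameter is likewise an assumption.

```javascript
// Wires a media element to a license service for one Key System.
// `video` exposes addEventListener(), generateKeyRequest() and addKey()
// as proposed in this document.
function setUpKeyExchange(video, keySystem, licenseUrl, sendToServer) {
  video.addEventListener("needkey", (event) => {
    // Ask the CDM to build a key request from the Initialization Data.
    video.generateKeyRequest(keySystem, event.initData);
  });

  video.addEventListener("keymessage", (event) => {
    // Prefer the application's URL; otherwise use the one from the media data.
    const url = licenseUrl || event.defaultUrl;
    sendToServer(url, event.message, (response) => {
      // Pass the session ID from the event back together with the response.
      video.addKey(keySystem, response, null, event.sessionId);
    });
  });

  video.addEventListener("keyerror", (event) => {
    console.error("Key error", event.errorCode, event.systemCode);
  });
}
```

The keymessage handler runs again if addKey() results in another message, so multi-step exchanges need no extra code in this sketch.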

The cancelKeyRequest(keySystem, sessionId) method must run the following steps:

If the first argument is null, throw a SYNTAX_ERR.
If sessionId is not null and is unrecognized or not mapped to the keySystem, throw an INVALID_ACCESS_ERR.
If a keyadded event has already been fired for this sessionId, throw an INVALID_STATE_ERR.
Clear any internal state associated with the sessionId (or, if sessionId is null, with the keySystem for this media element). This sessionId will now be unrecognized.
Can this step be done synchronously or should a task be queued to do it in the background and a needkey event fired when done?
It is an open question what exactly should happen here. The state of the media element is unknown and it may not have even triggered the original generateKeyRequest() call. Should a needkey event be fired regardless of the state? What if the media element is not waiting for a key? Should the media element attempt to resume playback if it is waiting for a key, causing an event if appropriate? Should the application be responsible for calling generateKeyRequest() without an event?
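Leaving the open questions aside, the synchronous checks of cancelKeyRequest() can be sketched as follows. This is a model under assumptions, not the specified behaviour: the sessions map and the domError helper are hypothetical, and when sessionId is null the sketch simply discards every session held for the Key System, which the text above does not settle.

```javascript
// Hypothetical helper: an Error whose name is the DOM exception name.
function domError(name) {
  const error = new Error(name);
  error.name = name;
  return error;
}

// `sessions` is a hypothetical Map from Session ID to
//   { keySystem: string, keyAdded: boolean }
function cancelKeyRequest(sessions, keySystem, sessionId) {
  if (keySystem === null) throw domError("SYNTAX_ERR");

  if (sessionId !== null) {
    const session = sessions.get(sessionId);
    if (session === undefined || session.keySystem !== keySystem) {
      throw domError("INVALID_ACCESS_ERR"); // unrecognized or wrong Key System
    }
    if (session.keyAdded) {
      throw domError("INVALID_STATE_ERR"); // keyadded has already fired
    }
    sessions.delete(sessionId); // this Session ID is now unrecognized
    return;
  }

  // Assumption: with no Session ID, clear all state for this Key System.
  for (const [id, session] of sessions) {
    if (session.keySystem === keySystem) sessions.delete(id);
  }
}
```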

The keySystem attribute of HTMLSourceElement specifies the Key System to be used with the media resource. The resource selection algorithm is modified to check the keySystem attribute after the existing step 5 of the Otherwise branch of step 6:

⌛ If candidate has a keySystem attribute whose value represents a Key System that the user agent knows it cannot use with type, then end the synchronous section, and jump down to the failed step below.

A media element is said to have a selected Key System when one of the following has occurred:

The media source was selected from an HTMLSourceElement.
In this case, the selected key system is the keySystem attribute of the selected HTMLSourceElement.
One of the new methods has been called successfully (asynchronous steps may not have completed)
In this case, the selected key system is the keySystem parameter for the last successful call.

2.1. Error Codes

MediaError is extended, and a new error type is added.

partial interface MediaError {
  const unsigned short MEDIA_ERR_ENCRYPTED = 5;
};

interface MediaKeyError {
  const unsigned short MEDIA_KEYERR_UNKNOWN = 1;
  const unsigned short MEDIA_KEYERR_CLIENT = 2;
  const unsigned short MEDIA_KEYERR_SERVICE = 3;
  const unsigned short MEDIA_KEYERR_OUTPUT = 4;
  const unsigned short MEDIA_KEYERR_HARDWARECHANGE = 5;
  const unsigned short MEDIA_KEYERR_DOMAIN = 6;
};

The code attribute of a MediaError may additionally return the following:

MEDIA_ERR_ENCRYPTED (numeric value 5)

The stream could not be played because it is encrypted and one of the following is true:

No key was provided and no needkey handler was provided
The provided key could not be successfully applied
The user agent does not support decryption of this media data

It has been suggested that there be a separate error for each of the above cases. This is an option if the community feels that being able to differentiate among them is worthwhile. A single error is consistent with the current broad error codes, though that may be something that should be improved in general. It seems that except for #1, which should only occur in applications that do not support encrypted media, these are all application bugs and not something that would improve the user experience. Any unique handling of the error codes by an application would essentially be describing a bug type. Unique codes might be helpful in tracking down the cause of the bug, but there are probably other options. It is also possible that some of these cases should be reported via MediaKeyErrorEventInit.

A MediaKeyError may be one of the following:

MEDIA_KEYERR_UNKNOWN (numeric value 1): An unspecified error occurred. This value is used for errors that don't match any of the following codes.
MEDIA_KEYERR_CLIENT (numeric value 2): The Key System could not be installed or updated.
Should this be two separate errors?
MEDIA_KEYERR_SERVICE (numeric value 3): The message passed into addKey indicated an error from the license service.
MEDIA_KEYERR_OUTPUT (numeric value 4): There is no available output device with the required characteristics for the content protection system.
MEDIA_KEYERR_HARDWARECHANGE (numeric value 5): A hardware configuration change caused a content protection error.
MEDIA_KEYERR_DOMAIN (numeric value 6): An error occurred in a multi-device domain licensing configuration. The most common error is a failure to join the domain.
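An application's keyerror handler might translate these codes into readable names for logging, as in this sketch. The describeKeyError helper and the lookup table are hypothetical, and the sketch assumes the error is available as one of the numeric constants listed above.

```javascript
// Numeric values copied from the MediaKeyError interface in this proposal.
const MEDIA_KEY_ERROR_NAMES = {
  1: "MEDIA_KEYERR_UNKNOWN",
  2: "MEDIA_KEYERR_CLIENT",
  3: "MEDIA_KEYERR_SERVICE",
  4: "MEDIA_KEYERR_OUTPUT",
  5: "MEDIA_KEYERR_HARDWARECHANGE",
  6: "MEDIA_KEYERR_DOMAIN",
};

// Returns a log-friendly description. Unlisted codes are reported as
// MEDIA_KEYERR_UNKNOWN; a systemCode of 0 means "no system code".
function describeKeyError(errorCode, systemCode) {
  const name = MEDIA_KEY_ERROR_NAMES[errorCode] || "MEDIA_KEYERR_UNKNOWN";
  return systemCode ? `${name} (system code ${systemCode})` : name;
}
```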

3. Events

3.1. Event Definitions

[Constructor(DOMString type, optional MediaKeyNeededEventInit eventInitDict)]
interface MediaKeyNeededEvent : Event {
  readonly attribute DOMString? keySystem;
  readonly attribute DOMString? sessionId;
  readonly attribute Uint8Array? initData;
};

dictionary MediaKeyNeededEventInit : EventInit {
  DOMString? keySystem;
  DOMString? sessionId;
  Uint8Array? initData;
};

[Constructor(DOMString type, optional MediaKeyMessageEventInit eventInitDict)]
interface MediaKeyMessageEvent : Event {
  readonly attribute DOMString keySystem;
  readonly attribute DOMString? sessionId;
  readonly attribute Uint8Array message;
  readonly attribute DOMString? defaultUrl;
};

dictionary MediaKeyMessageEventInit : EventInit {
  DOMString keySystem;
  DOMString? sessionId;
  Uint8Array message;
  DOMString? defaultUrl;
};

[Constructor(DOMString type, optional MediaKeyCompleteEventInit eventInitDict)]
interface MediaKeyCompleteEvent : Event {
  readonly attribute DOMString keySystem;
  readonly attribute DOMString? sessionId;
};

dictionary MediaKeyCompleteEventInit : EventInit {
  DOMString keySystem;
  DOMString? sessionId;
};

[Constructor(DOMString type, optional MediaKeyErrorEventInit eventInitDict)]
interface MediaKeyErrorEvent : Event {
  readonly attribute DOMString keySystem;
  readonly attribute DOMString? sessionId;
  readonly attribute MediaKeyError errorCode;
  readonly attribute unsigned short systemCode;
};

dictionary MediaKeyErrorEventInit : EventInit {
  DOMString keySystem;
  DOMString? sessionId;
  MediaKeyError errorCode;
  unsigned short systemCode;
};

event . keySystem: Returns the name of the Key System that generated the event.
event . sessionId: Returns the Session ID the event is related to, if applicable.
event . initData: Returns the Initialization Data related to the event.
event . message: Returns the message (i.e. key request) to send.
event . defaultUrl: Returns the default key exchange URL.
event . errorCode: Returns the MediaKeyError for the error that occurred.
event . systemCode: Returns a Key System-dependent status code for the error that occurred.

The keySystem attribute is an identifier for the Key System that generated the event. It may be null in the needkey event if the media element does not have a selected Key System.

The sessionId attribute is the Session ID for the key or license that this event refers to. It may be null.

The initData attribute contains Initialization Data specific to the event.

The message attribute contains a message from the CDM. Messages are Key System-specific. In most cases, it should be sent to a key server.

The defaultUrl is the default URL to send the key request to as provided by the media data. It may be null.

The errorCode attribute contains the MediaKeyError code for the error that occurred.

The systemCode attribute contains a Key System-dependent status code for the error that occurred. This allows a more granular status to be returned than the more general errorCode. It should be 0 if there is no associated status code or such status codes are not supported by the Key System.

If a response (i.e. a license) is necessary, applications should use one of the new methods to provide the response.

3.2. Event Summary

Event name	Interface	Dispatched when...	Preconditions
`keyadded`	`MediaKeyCompleteEvent`	A key has been added as the result of an `addKey()` call.
`keyerror`	`MediaKeyErrorEvent`	An error occurs in one of the new methods or CDM.
`keymessage`	`MediaKeyMessageEvent`	A message has been generated (and likely needs to be sent to a key server). For example, a key request has been generated as the result of a `generateKeyRequest()` call or another message must be sent in response to an `addKey()` call.
`needkey`	`MediaKeyNeededEvent`	The user agent needs a key or license to begin or continue playback. It may have encountered media data that may/does require decryption to load or play OR need a new key/license to continue playback.	`readyState` is equal to `HAVE_METADATA` or greater. It is possible that the element is playing or has played.

It has been proposed that needkey be a simple event. In this case, it would not provide any indication of the key that is needed and the application would need to call generateKeyRequest() to get an appropriate message or identifier, including for the Clear Key case. Such a change assumes that the consistent flow option is selected. See this example for an illustration of the impact of this change.

(Because sessionId is not included in needkey and is not generated until generateKeyRequest() generates a keymessage event, this change would not result in the loss of any correlation. See Session Correlation for a discussion of the general lack of correlation.)

4. Key Release

4.1. Introduction

This section is non-normative.

The above sections provide for delivery of key/license information to a Content Decryption Module. This section provides for management of the entire key/license lifecycle, that is, secure proof of key release. Use cases for such proof include any service where it is necessary for the service to know, reliably, which granted keys/licenses are still available for use by the user and which have been deleted. Examples include a service with restrictions on the number of concurrent streams available to a user or a service where content is available on a rental basis, for use offline.

Secure proof of key release must necessarily involve the CDM due to the relative ease with which scripts may be modified. The CDM must provide a message asserting, in a CDM-specific form, that a specific key or license has been destroyed. Such messages must be cached in the CDM until acknowledgement of their delivery to the service has been received. This acknowledgement must also be in the form of a CDM-specific message.

The mechanism for secure proof of key release operates outside the scope of any media element. This is because proof-of-release messages may be cached in CDMs after the associated media elements have been destroyed. Proof-of-key-release messages are subject to the same origin policy: they shall only be delivered to scripts with the same origin as the script which created the media element that provided the key/license.

4.2. Key Release Manager

The following interface is defined for management of key release messages:

    [Constructor()]
    interface KeyReleaseManager : EventTarget {
        void getKeyReleases(in DOMString keySystem);
        void addKeyReleaseCommit(in DOMString keySystem, in DOMString sessionId, in Uint8Array message);
    };

The getKeyReleases(keySystem) method must run the following steps:

If the first argument is null, throw a SYNTAX_ERR.
Initialize handler by following the steps for the first matching condition from the following list:

If keySystem is one of the user agent's supported Key Systems

Let handler be the content decryption module corresponding to keySystem.

Otherwise

Throw a NOT_SUPPORTED_ERR.
Schedule a task to handle the call.

The user agent will asynchronously execute the following steps in the task:
1. Load handler if necessary.
2. Use handler to generate one or more key release messages, if supported. handler should follow the steps for the first matching condition from the following list:
  
  If generating a key release message is not supported
  
  Let key release messages be null
  
  Otherwise
  
  Let key release messages be a set of key release messages generated by the CDM for the current origin.
3. For each key release message in key release messages, queue a task to fire a simple event named keyrelease at the key release manager.
  
  The event is of type MediaKeyMessageEvent and has:
  - keySystem = keySystem
  - sessionId = the sessionId originally associated with the provision of the key
  - message = key release message
  - defaultUrl = value of the default URL, if stored by the CDM.

The addKeyReleaseCommit(keySystem, sessionId, message) method must run the following steps:

If the first argument is null, throw a SYNTAX_ERR.
Initialize handler by following the steps for the first matching condition from the following list:

If keySystem is one of the user agent's supported Key Systems

Let handler be the content decryption module corresponding to keySystem.

Otherwise

Throw a NOT_SUPPORTED_ERR.
Schedule a task to handle the call, providing sessionId and message.

The user agent will asynchronously execute the following steps in the task:
1. Load handler if necessary.
2. Use handler to commit the message. handler should follow the steps for the first matching condition from the following list:
  If committing a key release message is supported and the message is valid:
  queue a task to fire a simple event named keyreleasecommitted at the key release manager.
  
  The event is of type MediaKeyCompleteEvent and has:
  - keySystem = keySystem
  - sessionId = sessionId
  Otherwise
  queue a task to fire a simple event named keyerror at the key release manager.
  
  The event is of type MediaKeyErrorEvent and has:
  - keySystem = keySystem
  - sessionId = sessionId
  - errorCode = the appropriate MediaKeyError code
  - systemCode = a Key System-specific value, if provided, and 0 otherwise
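
As a non-normative illustration, an application might drain pending key release messages roughly as sketched below. The helper name reportKeyReleases and the sendToServer callback are hypothetical, and the sketch assumes the server's acknowledgement can be passed unchanged to addKeyReleaseCommit(). Only the KeyReleaseManager methods and the event names come from the interface above.

```javascript
// Sketch only: `manager` is assumed to behave like the KeyReleaseManager above.
// `sendToServer(message, callback)` is a hypothetical application function that
// delivers a key release message and calls back with the server's
// CDM-specific acknowledgement (a Uint8Array).
function reportKeyReleases(manager, keySystem, sendToServer) {
  manager.addEventListener("keyrelease", function (event) {
    sendToServer(event.message, function (ack) {
      // Hand the acknowledgement back to the CDM so it can drop the cached message.
      manager.addKeyReleaseCommit(event.keySystem, event.sessionId, ack);
    });
  });
  manager.addEventListener("keyreleasecommitted", function (event) {
    console.log("Key release committed for session " + event.sessionId);
  });
  manager.addEventListener("keyerror", function (event) {
    console.log("Key release commit failed with code " + event.errorCode);
  });
  // Failures in the synchronous steps (SYNTAX_ERR, NOT_SUPPORTED_ERR) are thrown here.
  manager.getKeyReleases(keySystem);
}
```

Because the mechanism operates outside the scope of any media element, a helper like this could run at page load, before any media element exists.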

5. Algorithms

5.1. Encrypted Block Encountered

The following steps are run when the media element encounters a block (i.e. frame) of encrypted media data during the resource fetch algorithm:

Let key system be null.
Let handler be null.
Let block initData be null.
Let block key be null.
If the block (or its parent entity) has Initialization Data, let block initData be that initialization data.
Select the key system and handler by following the steps for the first matching condition from the following list:
If the media element has a selected Key System
Run the following steps:
1. Let key system be the selected Key System.
2. Let handler be the content decryption module corresponding to key system.
Otherwise

Jump to the Key Presence step below.
Load handler if necessary.
Use handler to select the key:
1. Let block key ID be null.
2. If block initData is not null and contains a key ID, let block key ID be that ID.
3. Select the key by following the steps for the first matching condition from the following list:
  
  If block key ID is not null
  
  Select the key by using handler to follow the steps for the first matching condition from the following list:
  
  If handler has a key cached for block key ID
  
  Let block key be the matching cached key.
  
  If handler has a key cached with no ID (there can be one at most)
  
  Let block key be the single cached key.
  
  Otherwise (handler has no keys cached OR has one or more keys cached, none of which have a matching key ID)
  
  Jump to the Key Presence step below.
  
  Otherwise
  
  Select the key by using handler to follow the steps for the first matching condition from the following list:
  
  If handler has a single key cached (with or without a key ID)
  
  Let block key be the single cached key.
  
  If handler has more than one key cached (all would have IDs)
  
  Abort media element's resource fetch algorithm and run the steps to report a MEDIA_ERR_ENCRYPTED error.
  
  Otherwise
  
  Jump to the Key Presence step below.
Key Presence: Handle the presence of a key by following the steps for the first matching condition from the following list:
If handler is not null and block key is not null.

Use handler to Decrypt the block using block key by following the steps for the first matching condition from the following list:

If decryption fails

Abort media element's resource fetch algorithm and run the steps to report a MEDIA_ERR_ENCRYPTED error.

Otherwise

Continue.

Note: Not all decryption problems (e.g. using the wrong key) will result in a decryption failure. In such cases, no error is fired here but one may be fired during decode.

If there is an event handler for needkey
queue a task to fire a simple event named needkey at the media element.
The event is of type MediaKeyNeededEvent and has:
- keySystem = key system
- sessionId = null
- initData = block initData
The media element is said to be potentially playing unless playback stops because the stream cannot be decrypted, in which case the media element is said to be waiting for a key.
Otherwise

Abort media element's resource fetch algorithm and run the steps to report a MEDIA_ERR_ENCRYPTED error.

For frame-based encryption, this may be implemented as follows when the media element attempts to decode a frame as part of the resource fetch algorithm:

Let encrypted be false.
Detect whether the frame is encrypted.

If the frame is encrypted

Run the steps above.

Otherwise

Continue.
Decode the frame.
Provide the frame for rendering.
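
The key selection part of the Encrypted Block Encountered steps can be sketched as a pure function. This is a non-normative illustration: the function name, the shape of the cachedKeys records, and the outcome labels are assumptions made for the sketch, not part of the proposal.

```javascript
// Sketch only. cachedKeys is a hypothetical array of { id, key } records held
// by the handler, where id is a string or null. blockKeyId is a string or null.
// Returns { outcome: "key", key }, { outcome: "needkey" } (jump to the
// Key Presence step without a key) or { outcome: "error" } (MEDIA_ERR_ENCRYPTED).
function selectBlockKey(cachedKeys, blockKeyId) {
  if (blockKeyId !== null) {
    // Prefer a key cached for this exact key ID.
    var match = cachedKeys.find(function (k) { return k.id === blockKeyId; });
    if (match) return { outcome: "key", key: match.key };
    // Fall back to the single key cached with no ID, if there is one.
    var noId = cachedKeys.find(function (k) { return k.id === null; });
    if (noId) return { outcome: "key", key: noId.key };
    return { outcome: "needkey" };
  }
  // No key ID in the block: only an unambiguous single cached key can be used.
  if (cachedKeys.length === 1) return { outcome: "key", key: cachedKeys[0].key };
  if (cachedKeys.length > 1) return { outcome: "error" };
  return { outcome: "needkey" };
}
```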

The following paragraph is added to Playing the media resource.

A media element is said to be waiting for a key when it would be potentially playing but the user agent has reached a point in the media resource that must be decrypted for the resource to continue and the CDM does not have the necessary key.
The media element leaves this state when seeking but could re-enter it if the same conditions exist.

5.2. Potentially Encrypted Stream Encountered

The following steps are run when the media element encounters a source that may contain encrypted blocks or streams during the resource fetch algorithm:

Let key system be null.
Let handler be null.
Let initData be null.
If Initialization Data was encountered, let initData be that initialization data.
Select the key system and handler by following the steps for the first matching condition from the following list:
If the media element has a selected Key System
Run the following steps:
1. Let key system be the selected Key System.
2. Let handler be the content decryption module corresponding to key system.
Otherwise

Jump to the Need Key step below.
Load handler if necessary.
Use handler to determine whether the key is known:
1. Let key ID be null.
2. If a key ID for the source is known at this time, let key ID be that ID.
3. If initData is not null and contains a key ID, let key ID be that ID.
4. Determine whether the key is already known by following the steps for the first matching condition from the following list:
  
  If key ID is not null
  
  Determine whether the key is known by following the steps for the first matching condition from the following list:
  
  If there is a key cached for key ID
  
  Jump to the Continue Normal Flow step below.
  
  Otherwise
  
  Jump to the Need Key step below.
  
  Otherwise
  
  Determine whether the key is known by following the steps for the first matching condition from the following list:
  
  If there is a single key cached (with or without a key ID)
  
  Jump to the Continue Normal Flow step below.
  
  Otherwise
  
  Jump to the Need Key step below.
Need Key: queue a task to fire a simple event named needkey at the media element.

The event is of type MediaKeyNeededEvent and has:
- keySystem = key system
- sessionId = null
- initData = initData
Firing this event allows the application to begin the key acquisition process before the key is needed.

This could be skipped if the key has already been set, but always sending the event seems easier.

Note that readyState is not changed and no algorithms are aborted. This algorithm is merely informative.
Continue Normal Flow: Continue with the existing media element's resource fetch algorithm.
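
The "is the key already known" decision in this algorithm is stricter than the key selection in Encrypted Block Encountered: a known key ID must match exactly. A non-normative sketch, using a hypothetical function name and a hypothetical array of cached key records:

```javascript
// Sketch only. cachedKeys is a hypothetical array of { id } records, where id
// is a string or null. keyId is the key ID for the source, or null if unknown.
// Returns true for "Continue Normal Flow" and false for "Need Key".
function isKeyKnown(cachedKeys, keyId) {
  if (keyId !== null) {
    // A key ID is known: only a key cached for that ID counts.
    return cachedKeys.some(function (k) { return k.id === keyId; });
  }
  // No key ID: the key is considered known only if exactly one key is cached.
  return cachedKeys.length === 1;
}
```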

5.3. Addition to Media Element Load Algorithm

The following step is added to the existing media element load algorithm:

Clear all cached keys for this media element.

This also means the keys will be cleared when the src attribute is set or changed per Location of the media resource.

6. Simple Decryption

All user agents must support the simple decryption capabilities described in this section regardless of whether they support a more advanced CDM. This ensures that there is a common baseline level of protection that is guaranteed to be supported in all user agents, including those that are entirely open source. Thus, content providers that need only basic protection can build simple applications that will work on all platforms without needing to work with any content protection providers.

6.1. Clear Key

The "org.w3.clearkey" Key System indicates a plain-text clear (unencrypted) key will be used to decrypt the source. No additional client-side content protection is required. Use of this Key System is described below.

The keySystem parameter and keySystem attributes are always "org.w3.clearkey" with the exception of events before the Key System has been selected. All events except needkey have a valid sessionId string, which is numerical.

The initData attribute of the needkey event and the initData parameters of generateKeyRequest() and addKey() are the same container-specific Initialization Data format and values. If supported, these values should provide some type of identification of the content or key that could be used to look up the key (since there is no defined logic for parsing it). For containers that support a simple key ID, it should be a binary array containing the raw key ID. For other containers, it may be some other opaque blob or null.

generateKeyRequest() may optionally be called. The resulting MediaKeyMessageEvent has:

keySystem = "org.w3.clearkey"
sessionId = a unique numerical string
message = a container-specific unique key identifier extracted from the initData parameter (if initData was not null and one could be extracted; otherwise null)
defaultUrl = value of the default URL if present in the media data and null otherwise.

To provide a key using this Key System, pass the following to addKey():

keySystem: "org.w3.clearkey"
key: An array of bytes containing the key
initData:

If generateKeyRequest() was called:

The value of the message attribute of the resulting MediaKeyMessageEvent

Otherwise

Initialization Data corresponding to the key or null.
sessionId:

If generateKeyRequest() was called:

The value of the sessionId attribute of the resulting MediaKeyMessageEvent

Otherwise

null
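
The two cases above can be sketched as a small helper that builds the addKey() argument list. This is a non-normative illustration; the helper name and its parameters are hypothetical.

```javascript
// Sketch only. messageEvent is the MediaKeyMessageEvent produced by
// generateKeyRequest(), or null if generateKeyRequest() was not called.
// Returns the arguments in addKey() order: keySystem, key, initData, sessionId.
function clearKeyAddKeyArgs(key, initData, messageEvent) {
  if (messageEvent) {
    // Reuse the message and sessionId reported for the key request.
    return ["org.w3.clearkey", key, messageEvent.message, messageEvent.sessionId];
  }
  // No key request was generated: pass the Initialization Data (or null) and no session.
  return ["org.w3.clearkey", key, initData, null];
}
```

For example, video.addKey.apply(video, clearKeyAddKeyArgs(key, initData, null)) would cover the case where generateKeyRequest() was not called.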

7. Examples

This section and its subsections are non-normative.

This section contains example solutions for various use cases using the proposed extensions. These are not the only solutions to these use cases. Video elements are used in the examples, but the same would apply to all media elements. In some cases, such as using synchronous XHR, the examples are simplified to keep the focus on the extensions.

7.1. Source and Key Known at Page Load (Clear Key Encryption)

In this simple example, the source file and clear-text key are hard-coded in the page.

This example is very simple because it does not care when the key has been added and does not associate that event with the addKey() call. It also does not handle errors.

<script>
  function load() {
    var video = document.getElementById("video");
    var key = new Uint8Array([ 0xaa, 0xbb, 0xcc, ... ]);
    video.addKey("org.w3.clearkey", key, null);
  }
</script>

<body onload="load()">
  <video src="foo.webm" autoplay id="video"></video>
</body>

The solution below shows what the simple solution above would become if we choose to require a consistent flow for all applications. In this scenario, the serial solution above becomes the event-based solution shown below. The next example also illustrates the impact.

<script>
  function load() {
    var video = document.getElementById("video");

    video.generateKeyRequest("org.w3.clearkey", null);
  }

  function handleMessage(event) {
    if (event.keySystem != "org.w3.clearkey")
      throw "Unhandled keySystem in event";
    var video = event.target;

    var key = new Uint8Array([ 0xaa, 0xbb, 0xcc, ... ]);
    video.addKey("org.w3.clearkey", key, null, event.sessionId);
  }
</script>

<body onload="load()">
  <video src="foo.webm" autoplay id="video" onkeymessage="handleMessage(event)"></video>
</body>

7.2. Source Known but Key Not Known at Page Load

In this case, the Initialization Data is contained in the media data. If this were not the case, handleKeyNeeded() could obtain and provide it instead of getting it from the event.

If any asynchronous operation is required to get the key in handleKeyNeeded(), it could be called a second time if the stream is detected as potentially encrypted before an encrypted block/frame is encountered. In this case, applications may want to handle subsequent calls specially to avoid redundant license requests. This is not shown in the examples below.
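
One possible way to avoid redundant license requests is to remember which Initialization Data values have already triggered a request. The sketch below is non-normative, its names are hypothetical, and it assumes initData is either null or a Uint8Array.

```javascript
// Sketch only: returns a function that reports whether a key request should
// be started for the given Initialization Data. It returns true the first
// time a value is seen and false for any repeat of the same bytes.
function createNeedKeyFilter() {
  var seen = Object.create(null);
  function toKey(initData) {
    if (initData === null) return "(null)";
    // Hex-encode the bytes so that equal contents map to the same string.
    return Array.prototype.map.call(initData, function (b) {
      return (b < 16 ? "0" : "") + b.toString(16);
    }).join("");
  }
  return function shouldRequestKey(initData) {
    var k = toKey(initData);
    if (seen[k]) return false;
    seen[k] = true;
    return true;
  };
}
```

A handleKeyNeeded() implementation could call the returned function with event.initData and return early when it yields false.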

7.2.1. Clear Key Encryption

This solution uses the Clear Key Simple Decryption.

As with the previous example, this one is very simple because it does not care when the key has been added or handle errors.

<script>
  function handleKeyNeeded(event) {
    if (event.keySystem && event.keySystem != "org.w3.clearkey")
      throw "Unhandled keySystem in event";
    var initData = event.initData;
    var video = event.target;

    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", "http://.../getkey", false);
    xmlhttp.send(initData);
    var key = new Uint8Array(xmlhttp.response);
    video.addKey("org.w3.clearkey", key, initData, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)"></video>

The solution below shows what the solution above would become if we choose to require a consistent flow, make needkey a simple event, and remove the initData parameter from addKey().

<script>
  function handleKeyNeeded(event) {
    if (event.keySystem && event.keySystem != "org.w3.clearkey")
      throw "Unhandled keySystem in event";
    var video = event.target;

    // Note: The CDM will generate a request for whatever Initialization Data it chooses since there is no association with the current event. 
    video.generateKeyRequest("org.w3.clearkey", null);
  }
  
  function handleMessage(event) {
    if (event.keySystem != "org.w3.clearkey")
      throw "Unhandled keySystem in event";
    var message = event.message;
    var video = event.target;

    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", "http://.../getkey", false);
    xmlhttp.send(message);
    var key = new Uint8Array(xmlhttp.response);
    // Note: The CDM will find the Initialization Data based on the sessionId.
    video.addKey("org.w3.clearkey", key, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)" onkeymessage="handleMessage(event)"></video>

Some differences of note:

The example is longer and involves an additional event handler. (This is primarily due to requiring applications to use a consistent flow.)
There is no association between the needkey event and the method calls since the Session ID is not created until generateKeyRequest() completes. This is a general problem that the first solution only works around. See Session Correlation.
The Initialization Data is handled behind the scenes by the CDM and never exposed to the application.
The example is much closer to the Other Content Decryption Module version below.

7.2.2. Other Content Decryption Module

This solution uses more advanced decryption from a fictitious content decryption module called Some System.

<script>
  function handleKeyNeeded(event) {
    if (event.keySystem && event.keySystem != "com.example.somesystem.1_0")
      throw "Unhandled keySystem in event";
    var initData = event.initData;
    var video = event.target;

    video.generateKeyRequest("com.example.somesystem.1_0", initData);
  }

  function licenseRequestReady(event) {
    if (event.keySystem != "com.example.somesystem.1_0")
      throw "Unhandled keySystem in event";
    var request = event.message;
    if (!request)
      throw "Could not create license request";

    var video = event.target;

    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", "http://.../getkey", false);
    xmlhttp.send(request);
    var license = new Uint8Array(xmlhttp.response);
    video.addKey("com.example.somesystem.1_0", license, null, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)" onkeymessage="licenseRequestReady(event)"></video>

7.3. Selecting a Supported Key System

Below is an example of detecting supported Key Systems using canPlayType() and selecting one.

<script>
  var keySystem;
  var licenseUrl;

  function selectKeySystem(video) {
    if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "com.example.somesystem") != "") {
      licenseUrl = "https://license.example.com/getkey"; // OR "https://example.<My Video Site domain>"
      if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "com.example.somesystem.2_0") != "") {
        keySystem = "com.example.somesystem.2_0";
      } else if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "com.example.somesystem.1_5") != "") {
        keySystem = "com.example.somesystem.1_5";
      }
    } else if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "foobar") != "") {
      licenseUrl = "https://license.foobar.com/request";
      keySystem = "foobar";
    } else {
      throw "Key System not supported";
    }
    }
  }

  function handleKeyNeeded(event) {
    // Get the element first; selectKeySystem() needs it.
    var video = event.target;
    var initData = event.initData;
    var targetKeySystem = event.keySystem;
    if (targetKeySystem == null) {
      selectKeySystem(video);  // Defined above.
      targetKeySystem = keySystem;
    }

    video.generateKeyRequest(targetKeySystem, initData);
  }

  function licenseRequestReady(event) {
    if (event.keySystem != keySystem)
      throw "Message from unexpected Key System";
    var request = event.message;
    if (!request)
      throw "Could not create license request";

    var video = event.target;
    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", licenseUrl, false);
    xmlhttp.send(request);
    var license = new Uint8Array(xmlhttp.response);
    video.addKey(keySystem, license, null, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)" onkeymessage="licenseRequestReady(event)"></video>

7.4. Using All Events

This is a more complete example showing all events being used along with asynchronous XHR.

Note that handleKeyMessage could be called multiple times, including in response to the addKey() call if multiple round trips are required and for any other reason the Key System might need to send a message.

<script>
  var keySystem;
  var licenseUrl;

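  function handleMessageResponse() {
    // 'this' is the XMLHttpRequest; wait until the response is complete.
    if (this.readyState != 4)
      return;
    var license = new Uint8Array(this.response);
    var video = document.getElementById("video");
    video.addKey(keySystem, license, null, this.sessionId);
  }
  
  function sendMessage(message, sessionId) {
    var xmlhttp = new XMLHttpRequest();
    xmlhttp.sessionId = sessionId;
    xmlhttp.onreadystatechange = handleMessageResponse;
    xmlhttp.open("POST", licenseUrl, true);
    // Request binary data so the response can be wrapped in a Uint8Array.
    xmlhttp.responseType = "arraybuffer";
    xmlhttp.send(message);
  }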

  function handleKeyNeeded(event) {
    // Get the element first; selectKeySystem() needs it.
    var video = event.target;
    var initData = event.initData;
    var targetKeySystem = event.keySystem;
    if (targetKeySystem == null) {
      selectKeySystem(video);  // See previous example for implementation.
      targetKeySystem = keySystem;
    }

    video.generateKeyRequest(targetKeySystem, initData);
  }

  function handleKeyMessage(event) {
    if (event.keySystem != keySystem)
      throw "Message from unexpected Key System";
    var message = event.message;
    if (!message)
      throw "Invalid key message";
  
    sendMessage(message, event.sessionId);
  }

  function handleKeyComplete(event) {
    // Do some bookkeeping with event.sessionId if necessary.
  }

  function handleKeyError(event) {
    // Report event.errorCode and do some bookkeeping with event.sessionId if necessary.
  }
</script>

<video src="foo.webm" autoplay id="video" onneedkey="handleKeyNeeded(event)" onkeymessage="handleKeyMessage(event)" onkeyadded="handleKeyComplete(event)" onkeyerror="handleKeyError(event)"></video>

8. FAQ

This section and its subsections are non-normative.

8.1. Use Cases

What use cases does this support?

Everything from user-generated content to be shared with family (user is not an adversary) to online radio to feature-length movies.

Is this proposal compatible with adaptive streaming?

Yes, this proposal is compatible with both "Type 1" and "Type 3" adaptive streaming modes as defined by the W3C Web & TV Interest Group.

If adaptive streaming is handled within the user agent (Type 1), the adaptive implementation can expose the events and methods needed to provide key(s) for the streams via the APIs in this proposal.
If adaptive streaming is handled in the application/JavaScript (Type 3), behavior should be very similar to the non-adaptive case. For example, the proposed MediaSource Extensions allow the application to provide a dynamic src, which could be encrypted and handled just like a normal stream.

Is key rotation supported?

Yes.

Can I encrypt captions / <track> elements?

No, this proposal only supports decrypting audio and video that are part of the media data.

Can I let the user agent select the appropriate CDM using `<source>` elements?

Yes, using the keySystem attribute of the HTMLSourceElement. When used with the type attribute, this will select the first <source> element (container, codec, and Key System) that the user agent might support. The selected CDM will not be reported to the application until an event is fired.

Is a heartbeat supported?

Yes.

Heartbeat is a mode of operation where the Content Decryption Module must receive an explicit heartbeat message from its server on a regular basis; otherwise decryption is blocked. This enables use cases requiring strict online control of access to the content. Heartbeat must be supported by the CDM and is implemented in this model by supplying an expiration time or valid duration in the license provided to the CDM. Before expiry of this license, the CDM must trigger a new message exchange to obtain an updated license.

It is an open question whether CDMs should:

Use keymessage to continue the current session, or
Start a new message exchange, which proceeds in exactly the same way as the initial message exchange, with the exception that the Key System and Session ID are known when the needkey event is sent.

The latter option may impact the MediaKeyNeededEvent definition. See the related issue.

8.2. Use

Can I send a token for the signed-in user with the license request?

Yes. The application can add this to the license request (sent via XMLHttpRequest in the examples) or send it to the CDM via generateKeyRequest() to be included in the license request.

How do I resume playback after receiving the `needkey` event in the Encrypted Block Encountered algorithm?

Assuming there are no other issues, playback will resume when the needed key is provided by addKey() and processed.

Can an application use multiple content protection providers / Key Systems?

Yes, this will likely be necessary to support all or a majority of user agents. An application could also use different Key Systems on a single user agent for different purposes.

How do I add support for a CDM to my application?

We envision CDM providers creating JavaScript libraries that application developers can include. canPlayType() can then be used to select from supported libraries.

How do I determine if the UA supports specific capabilities for a given provider?

This is vendor-/Key System-specific.

Obtaining this information could take time and is open-ended, so it is not appropriate for canPlayType(). There is also no way for canPlayType() to attest to capabilities anyway. Some basic Key System feature detection may be available via canPlayType().

How should an application handle a `needkey` event with a null `keySystem` attribute?

This is a very common scenario because it happens when the user agent encounters encrypted media and does not have an appropriate key. If the application does not already know which Key System to use, it should use canPlayType() to select an appropriate one. When the keySystem attribute is null, the initData attribute is always independent of the Key System.

What is a license URL (`licenseUrl`) in the examples?

This is the URL for a server capable of providing the key for the stream, usually using the Initialization Data and often after verifying the requesting user. The URL is application- and/or Key System-specific and may be a content provider or a Key System provider depending on the solution.

This is too complex and hard to use.

That's not a question, but we'll try to address it anyway. As shown in the examples, the basic use cases are reasonably simple and only require a little setup to get the key and provide it to the user agent. We believe most small content sites can add basic protection to their applications with minimal effort.

The more complex cases, such as fast time to first frame and various license management algorithms, require more complex code, but professional-strength content protection is complex and that is to be expected. Professional-strength content protection requires server components and working with one or more content protection vendors, so this isn’t really any more complex. In fact, if you implement a few solutions, it will work on any browser-based platform, avoiding the need for per-platform solutions on both the server and client. The fixed set of interfaces may even lead to more consistent patterns and behavior across various solutions. It is generally the large content providers that have more complex requirements, and we believe they will have the appropriate resources to implement applications that meet their requirements.

Providers of content decryption modules will need to provide detailed specifications for actions and events to guide content providers in designing the algorithms in their applications. They can also provide JavaScript libraries for their solutions that can be integrated into any application. An application would then basically select a solution and delegate a lot of the work to the appropriate library.

8.3. API

How is the decryption algorithm specified?

This is container-specific. A container may standardize on a specific algorithm (e.g. AES-128) and/or allow it to be specified. The user agent must know and/or detect the appropriate algorithm to use with the key provided by this API.

What are the advantages of doing license/key exchange in the application?

Advantages include:

Simple Decryption works in the same way as more advanced solutions and without additional APIs.
The user agent is not responsible for deciding which decryption mechanism to use.
The application has full control (e.g. in deciding which streams to offer) and does not need to rely on errors from (or a detailed API to) the CDM.
The application can manage its own license protocol, authentication refresh, key rotation etc. without relying on changes to or specific use cases being implemented in each user agent's CDM (as long as the appropriate primitives are available in the CDM).
Error handling can be tailored to the application without needing to expose status information and failure conditions from the CDM through the API.
The content provider can decide whether and what to proxy to the (potentially third-party) license server without client modifications.
Reduces the complexity and size of the CDM.

Why does `canPlayType()` need to be modified? Why doesn't it provide more information?

The modifications allow applications to detect whether the user agent is capable of supporting the application's encrypted content (at any level of protection) and to allow the application to branch to the appropriate code and/or select a CDM library.

At the same time, we do not want to put too much burden on canPlayType() and it must remain a synchronous method that can be processed from static data. See the related question.

Why does `canPlayType()` need a second parameter? Why not just add Key System to the `type` parameter string (or MIME type)?

This could have gone either way, and the behavior of both existing user agents and those that implement these extensions would be the same. (Existing user agents ignore it in both cases.) The main reason for using a separate parameter is that the Key System is not part of the MIME type (see the related question), and the type parameter is generally used interchangeably with the MIME type. Separating the Key System from the MIME type should avoid confusion.

The downside is that the same type string cannot be used for both canPlayType() and the source element's type attribute. Instead, the Key System is passed as a second parameter to canPlayType() and as a separate attribute to the source element.

Will I be informed if a call to one of the new methods fails?

Errors that occur during the synchronous portion of the algorithms will be thrown. For asynchronous portions (i.e. when a task is scheduled), a MediaKeyErrorEvent will be fired.
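
An application that wants a single error path could combine both mechanisms roughly as follows. This is a non-normative sketch: the function name and the onError callback are hypothetical, and video is assumed to implement the extensions in this proposal.

```javascript
// Sketch only. onError(kind, detail) is a hypothetical application callback;
// kind is "sync" for thrown errors and "async" for keyerror events.
function requestKeyWithErrorHandling(video, keySystem, initData, onError) {
  video.addEventListener("keyerror", function (event) {
    // Failures in the asynchronous portion arrive as a MediaKeyErrorEvent.
    onError("async", event.errorCode);
  });
  try {
    video.generateKeyRequest(keySystem, initData);
  } catch (e) {
    // Failures in the synchronous portion are thrown by the method itself.
    onError("sync", e);
  }
}
```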

Why isn't the Key System part of the MIME type (like codecs)?

In many cases (especially the direction the content providers and standards are moving), the stream is not specific to any one Key System or provider. Multiple Key Systems could be used to decrypt the same generic stream. Thus, the Key System is not information about the file and should not be part of the MIME type.

One could argue that the encryption algorithm (e.g. AES-128) and configuration should be in the MIME type. That is not required for this proposal, so it is not addressed here.

Why do we need another event?

While many use cases could be implemented without an additional event (by requiring the app to provide all the information up front), some use cases may be better handled by an event.

Why does the event need multiple attributes?

The keySystem attribute ensures that the application knows which CDM caused the event so it can know how to handle the event. While the application could probably know or discover this in other ways, this makes it simple for the application.

Why do we need a new `MediaError` code?

Without a new error code (MEDIA_ERR_ENCRYPTED), it is not possible for user agents to clearly indicate to an application that playback failed because the content was encrypted and user agents will likely need to fire a MEDIA_ERR_DECODE or MEDIA_ERR_SRC_NOT_SUPPORTED, which would be confusing.

Will adding a new error code to `MediaError` break existing applications?

Applications that are not aware of the new error code (MEDIA_ERR_ENCRYPTED) may not correctly handle it, but they should still be able to detect that an error has occurred because a) an error event is fired and b) media.error is not null.
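
A defensive pattern for such applications is to treat any unrecognized code as a generic failure. The sketch below is non-normative and its function name is hypothetical. It relies only on the four numeric MediaError codes defined by HTML5 and deliberately does not assume a numeric value for MEDIA_ERR_ENCRYPTED.

```javascript
// Sketch only: maps the MediaError codes defined by HTML5 to labels and
// reports any other non-null error, including codes added later, as unknown.
function describePlaybackError(error) {
  if (error === null) return "no error";
  switch (error.code) {
    case 1: return "aborted";              // MEDIA_ERR_ABORTED
    case 2: return "network";              // MEDIA_ERR_NETWORK
    case 3: return "decode";               // MEDIA_ERR_DECODE
    case 4: return "source not supported"; // MEDIA_ERR_SRC_NOT_SUPPORTED
    default: return "unknown error (code " + error.code + ")";
  }
}
```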

Why do we need a new error type (`MediaKeyError`) and event (`MediaKeyErrorEvent`)?

While key/license exchange errors are fatal to the exchange session, most are not fatal to playback. This is especially true if the media element already has a key for the current (and future) frames or, for example, the exchange was for a different stream in an adaptive streaming scenario. The separation allows the media element to continue playback while the application attempts to resolve the exchange problem or until the requested key/license is actually needed.

What happens if a response to the `needkey` event from encountering a potentially encrypted stream is not received before an encrypted block is encountered?

The Encrypted Block Encountered algorithm will proceed as normal. If no appropriate key has been provided, a second needkey event will be fired and decoding will stop.

The same `needkey` event with the same attributes is fired for both Encrypted Block Encountered and Potentially Encrypted Stream Encountered. How can an application distinguish between the two?

The same event was used intentionally to reduce the complexity of applications. Ideally, they would not need to know.

What if a different [supported] Key System is passed to one of the new methods in subsequent calls to the same `HTMLMediaElement`?

(Expanding on the question: this relates to the new methods that modify state, including generateKeyRequest() and addKey(). It does not apply to canPlayType(), which is explicitly intended to be called with multiple Key System strings. For example, what if generateKeyRequest() is called with one Key System and then addKey() is called with another, or if addKey() is called twice with two different Key Systems?)

If a load occurs between calls with different Key Systems, then there is no problem.

Otherwise, the calls will be treated separately. generateKeyRequest() will start a new session with a new Session ID. addKey() will behave as normal unless the sessionId parameter is not null and is unrecognized for the specified keySystem parameter.

What if a key/license for the same Initialization Data (i.e. key ID) is provided more than once to `addKey()`?

Replace it, updating the ordering to reflect that this key ID was most recently added. In other words, simply replacing the existing key data is not sufficient. The exact algorithm is covered in addKey().
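The difference between replacing key data in place and also updating the ordering can be modeled with a small sketch. This is an illustrative model only, not the addKey() algorithm; the class name KeyStore and the string key IDs are hypothetical.

```javascript
// Illustrative model only; the normative steps are those of addKey().
// A Map iterates in insertion order, and set() on an existing key keeps
// its original position, so the entry is deleted first to move it to the
// "most recently added" end.
class KeyStore {
  constructor() {
    this.keys = new Map(); // key ID -> key data
  }

  add(keyId, keyData) {
    this.keys.delete(keyId); // drop any existing entry and its position
    this.keys.set(keyId, keyData);
  }

  // Key IDs from least to most recently added.
  order() {
    return Array.from(this.keys.keys());
  }
}

const store = new KeyStore();
store.add("id-A", "key-1");
store.add("id-B", "key-2");
store.add("id-A", "key-3"); // replaces key-1 and moves id-A to the end
console.log(store.order()); // order is now id-B, id-A
```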

8.4. Source, Containers, and Streams

What containers and codecs are supported?

Containers and codecs are not specified. A user agent may support decryption of whichever container and codec combination(s) it wishes.

If a user agent supports decryption of a container/codec combination (as reported by canPlayType()), it must also support Simple Decryption of that combination.

What if a container/codec does not support indicating the stream or a frame/block is encrypted?

The application must use addKey() to indicate the stream is encrypted and provide the key before decoding starts.

Must the container provide Initialization Data or a content key ID?

This is ideal, but the API would also support the application sending the Initialization Data or ID directly to the server or providing it to the CDM via generateKeyRequest().

What if a container/codec does not support key IDs or bit(s)?

The application will need to use some other mechanism to select the appropriate key for the content. The user agent will only be able to use one key at a time. Key rotation will be much more complex or impossible.

Can I use different keys for each stream (adaptive streaming)?

Yes, though you may want to consider the complexity and performance drawbacks. For the best user experience, you will want to provide keys for the streams to the user agent before the switch.

What elements of the source are encrypted?

This depends on the container/codec being used. This proposal should support all cases, including entirely encrypted streams, individual frames encrypted separately, groups of frames encrypted, and portions of frames encrypted. If not all blocks or frames are encrypted, the user agent should be able to easily detect this, either based on an indication in the container or the block/frame.

Must all blocks/frames in a stream be encrypted?

No, subject to container/codec limitations.

What cipher and parameters should be used for decryption?

The cipher and parameters should be implicit in or specified by the container. If some are optional, the application must know what is supported by the CDM.

What cipher and parameters should be used for Simple Decryption? Which must the user agent support?

As in the above question, these are either implicit in or specified by the container. User agents must support any default or baseline ciphers and parameters in the container specification. Practically, user agents should support all ciphers and parameters commonly used with the container.

8.5. Content and Key Protection

Can I ensure the content key is protected without working with a content protection provider?

No. Protecting the content key would require that the browser's media stack have some secret that cannot easily be obtained. This is the type of thing DRM solutions provide. Establishing a standard mechanism to support this is beyond the scope of HTML5 standards and should be deferred to specific user agent solutions. In addition, it is not something that fully open source browsers could natively support.

Content protected using this proposal without a content protection provider is still more secure and a higher barrier than providing an unencrypted file over HTTP or HTTPS. We would also argue that it is no less secure than encrypted HLS. For long streams, key rotation can provide additional protection.

It is also possible to extend the proposed specification in the future to support a more robust basic case without changing the API.

Can a user agent support multiple content protection providers?

Yes. The application will query the user agent's capabilities and select the Key System to use.
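The query-and-select step could be sketched as follows, assuming the two-argument form of canPlayType() defined by this proposal. The function name selectKeySystem and the convention of listing candidates in order of application preference are illustrative assumptions.

```javascript
// Returns the first candidate Key System that the user agent reports it can
// play for the given type, or null if none is supported.
// Assumes canPlayType(type, keySystem) returns "" when unsupported.
function selectKeySystem(media, type, candidates) {
  for (const keySystem of candidates) {
    if (media.canPlayType(type, keySystem) !== "") {
      return keySystem;
    }
  }
  return null;
}
```

An application would pass its Key Systems in preference order and, when the result is null, fall back to other content or report that playback is not possible.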

Can a user agent protect the compressed content?

Yes, this proposal naturally supports such protection.

Can a user agent protect the rendering path or protect the uncompressed content after decoding?

Yes, a user agent could use platform-specific capabilities to protect the rendering path.

9. Open Issues

This section and its subsections are non-normative.

This section describes some open issues on which comments are requested.

9.1. API Encapsulation

It has been suggested that only a single key manager attribute be added to the HTMLMediaElement itself in order to improve encapsulation. For example:

partial interface HTMLMediaElement {
    attribute MediaKeyManager keymanager;
};

interface MediaKeyManager {
    void generateKeyRequest(in DOMString keySystem, in Uint8Array? initData);
    void addKey(in DOMString keySystem, in Uint8Array key, in Uint8Array? initData, in DOMString? sessionId);
    void cancelKeyRequest(in DOMString keySystem, in DOMString? sessionId);
};

9.2. Object-Oriented API Design

A variant of the API with the same functionality has been suggested in which key exchange 'sessions' are explicitly represented as objects. The methods used to supply a key/license or cancel the session become methods on this object, not the HTMLMediaElement itself.

partial interface HTMLMediaElement {
    MediaKeySession generateKeyRequest(in DOMString keySystem, in Uint8Array? initData);
};

interface MediaKeySession : EventTarget {
    readonly attribute DOMString keySystem;
    readonly attribute DOMString? sessionId;
    
    void addKey(in Uint8Array key);
    void cancel();
};

The following event would be fired at the MediaKeySession when a message is ready to be sent.

[Constructor(DOMString type, optional MediaKeyMessageEventInit eventInitDict)]
interface MediaKeyMessageEvent : Event {
    readonly attribute Uint8Array message;
    readonly attribute DOMString? defaultUrl;
};

dictionary MediaKeyMessageEventInit : EventInit  {
    Uint8Array message;
    DOMString? defaultUrl;
};

Note that in the MediaKeySession interface, sessionId is guaranteed to be initialized only after the first MediaKeyMessageEvent.

The following event would be fired at the MediaKeySession when the transaction is complete. (It could be replaced by a simple event.)

[Constructor(DOMString type)]
interface MediaKeyCompleteEvent : Event {
};

Finally, the following event would be fired at the MediaKeySession if generateKeyRequest() or addKey() results in an error.

[Constructor(DOMString type, optional MediaKeyErrorEventInit eventInitDict)]
interface MediaKeyErrorEvent : Event {
    readonly attribute MediaKeyError errorCode;
    readonly attribute unsigned short systemCode;
};

dictionary MediaKeyErrorEventInit : EventInit  {
    MediaKeyError errorCode;
    unsigned short systemCode;
};
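Application code written against this variant might look like the following sketch. The event type names are not given for this variant; they are assumed here to match the current proposal (keymessage, keyadded, keyerror). The names startKeySession, requestLicense and onDone are hypothetical, with requestLicense standing in for whatever the application uses to contact its license server.

```javascript
// Starts a key exchange using the object-oriented variant and reports the
// outcome of that specific session through onDone(errorEvent, session).
function startKeySession(media, keySystem, initData, requestLicense, onDone) {
  const session = media.generateKeyRequest(keySystem, initData);

  // A message is ready to be sent to the server.
  session.addEventListener("keymessage", function (event) {
    requestLicense(event.message, function (license) {
      // The response goes to the session object, so no Key System or
      // Session ID needs to be passed back.
      session.addKey(license);
    });
  });

  // Assumed event type for MediaKeyCompleteEvent.
  session.addEventListener("keyadded", function () {
    onDone(null, session);
  });

  // Errors are delivered to the session that caused them.
  session.addEventListener("keyerror", function (event) {
    onDone(event, session);
  });

  return session;
}
```

Because every listener is attached to the session returned by a particular generateKeyRequest() call, both success and failure can be attributed to that call.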

9.3. Session Correlation

The current API design allows for multiple parallel key requests to be in flight. Each call to generateKeyRequest() begins a message exchange resulting ultimately in a keyadded or keyerror event. The first keymessage event may contain a Session ID identifying the session. This Session ID is later used to enable correlation between messages conveyed in keymessage events and responses provided via addKey().

However, the current design does not support correlation between specific generateKeyRequest() calls (and the needkey event that might have triggered it) and subsequent sessions. If a page knows it needs two keys, it can call generateKeyRequest() twice but there is no way to know which keymessage or keyerror results from each call. This might be particularly important for the error case. Modifications to the API such as those described in Object-Oriented API Design could address this issue.
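The correlation that the current design does provide, between a keymessage event and the later addKey() call, might look like the following sketch. The event attributes keySystem, sessionId and message are assumed from the proposal's event definitions; handleKeyMessages and requestLicense are hypothetical names, with requestLicense standing in for the application's own license server exchange.

```javascript
// Forwards each keymessage to the license server and returns the response
// to the same session, even when responses arrive out of order.
function handleKeyMessages(media, requestLicense) {
  media.addEventListener("keymessage", function (event) {
    requestLicense(event.message, function (license) {
      // Passing event.sessionId back ties the response to the session that
      // produced the message. No initData is supplied with the response.
      media.addKey(event.keySystem, license, null, event.sessionId);
    });
  });
}
```

Note that nothing in this handler identifies which generateKeyRequest() call started a given session, which is the gap described above.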

9.4. Working with MediaController

HTML5 defines a MediaController that is used to coordinate playback of multiple media elements. The current proposal does not support a scenario where a single key is required for multiple media elements coordinated through a single MediaController. One way to solve this would be to create a new interface that provides the Media Element Extensions and then provide an instance of this interface on both the HTMLMediaElement and on the MediaController interfaces. The changes outlined in section Object-Oriented API Design might be modified to support this approach.

9.5. Multiple Keys in a Stream

It is possible that a different key may be encountered for a given stream after a key request session has been completed. How this should be handled is not explicitly described; it may be up to the Key System and/or application, but that might lead to confusion and inconsistencies.

One option is to fire a keymessage event containing a message to be sent to the server, which would return a new license to provide via addKey(). The same Session ID would be used because generateKeyRequest() is not called again. Note that this means a keymessage event can occur after a keyadded event for the same session.

Another option is to fire a needkey event and follow the same steps as for the first key. In this case, the application should call generateKeyRequest() to generate the message. This would result in the generation of a new Session ID, which is consistent with the first key.

If we select the first option, MediaKeyNeededEvent (the type of the needkey event) can be simplified because the event would never be fired with a known keySystem or sessionId. If we select the second option, keySystem should almost certainly be retained on MediaKeyNeededEvent, and sessionId probably should be retained as well.

This decision should account for other use cases, such as heartbeat. For heartbeat and any other CDM-originated message that isn't requesting a new key, it probably makes sense to use the same Session ID and provide the request directly via a keymessage event. This is essentially the first option above. Selecting the second option for multiple keys does not necessarily mean that heartbeat cannot work differently.

10. Revision History

Version	Comment
0.1	Initial Proposal

