Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document was submitted to the HTML Working Group as an Unofficial Draft. If you wish to make comments regarding this document, please send them to public-html@w3.org (subscribe, archives). You may send feedback to public-html-comments@w3.org (subscribe, archives) without joining the working group. All feedback is welcome.
Publication as an Unofficial Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This proposal extends HTMLMediaElement to enable playback of protected content. The proposed API supports use cases ranging from simple clear key decryption to high value video (given an appropriate user agent implementation). License/key exchange is controlled by the application, facilitating the development of robust playback applications supporting a range of content decryption and protection technologies. No "DRM" is added to the HTML5 specification, and only simple clear key decryption is required as a common baseline.
This section is non-normative.
This proposal allows JavaScript to select content protection mechanisms, control license/key exchange, and implement custom license management algorithms. It supports a wide range of use cases without requiring client-side modifications in each user agent for each use case. This also enables content providers to develop a single application solution for all devices. A generic stack implemented using the proposed APIs is shown below. This is just an example flow and is not intended to show all possible communication or uses.
This section is non-normative.
This proposal was designed with the following goals in mind:
This section is non-normative.
The Content Decryption Module (CDM) is a generic term for a part of or add-on to the user agent that provides functionality for one or more Key Systems. Implementations may or may not separate the implementations of CDMs and may or may not treat them as separate from the user agent. This is transparent to the API and application. A user agent may support one or more CDMs.
A Key System is a generic term for a decryption mechanism and/or content protection provider. Key System strings provide unique identification of a Key System. They are used by the user agent to select the Content Decryption Modules and identify the source of a key-related event. Simple Decryption Key Systems are supported by all user agents. User agents may also provide additional CDMs with corresponding Key System strings.
Key System strings are always a reverse domain name. For example, "com.example.somesystem". Within a given system ("somesystem" in the example), subsystems may be defined as determined by the key system provider. For example, "com.example.somesystem.1" and "com.example.somesystem.1_5". Key system providers should keep in mind that these will be used for comparison and discovery, so they should be easy to compare and the structure should remain reasonably simple.
It may make sense to provide informal guidelines to avoid these diverging too much. There are probably best practices too. Should platform-specific or protection capability information be contained in these strings?
If a user agent returns "maybe" or "probably" for any subsystem string, it must return "maybe" when a parent system string is passed to canPlayType().
For example, if a user agent returns "maybe" or "probably" for "com.example.somesystem.1_5", it must return "maybe" for "com.example.somesystem".
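The parent-string rule above can be modelled with a small non-normative sketch. The helper names `parentKeySystems` and `keySystemSupport`, the assumption that the reverse domain prefix is the first two labels, and the answer table passed in are all hypothetical and used only for illustration; a real user agent would answer through canPlayType().

```javascript
// Hypothetical helper: list the parent system strings of a Key System
// string, stopping before the reverse-domain prefix (assumed here to be
// the first two labels, e.g. "com.example").
function parentKeySystems(keySystem, domainLabels = 2) {
  const labels = keySystem.split(".");
  const parents = [];
  for (let n = labels.length - 1; n > domainLabels; n--) {
    parents.push(labels.slice(0, n).join("."));
  }
  return parents;
}

// Model of the rule: a query must report "maybe" whenever any of its
// subsystem strings reports "maybe" or "probably". `subsystemAnswers` is
// an illustrative table mapping Key System strings to canPlayType()-style
// answers ("", "maybe" or "probably").
function keySystemSupport(query, subsystemAnswers) {
  const hasSupportedChild = Object.keys(subsystemAnswers).some(
    (name) => name.startsWith(query + ".") && subsystemAnswers[name] !== ""
  );
  if (hasSupportedChild) return "maybe";
  return subsystemAnswers[query] || "";
}
```

With a table containing only "com.example.somesystem.1_5" mapped to "probably", a query for "com.example.somesystem" yields "maybe", matching the example above.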
A session ID is an optional string ID used to associate calls related to a key/license lifetime, starting with the request.
It is a local binding between a request and key/license.
It does not associate keys or licenses for different streams (i.e. audio and video).
If supported by the Key System, it is generated by the user agent/CDM and provided to the application in the keymessage event.
(Session IDs need not necessarily be supported by the underlying content protection client or server.)
Each successful call to generateKeyRequest() generates a new Session ID (returned in the keymessage event).
Applications should always provide the session ID from an event in subsequent calls for this key or license.
(This is a best practice, even if the current Key System does not support session IDs.)
This may mean that the application must associate a server response with the session ID and provide them both to addKey().
If Session IDs are supported, a new one will be created each time generateKeyRequest() is called.
The user agent/CDM manages the lifetime of Session IDs.
All Session IDs are cleared from the media element when a load occurs, although the CDM may retain them for longer.
NOTE: The key acquisition process (calling generateKeyRequest()/addKey()) may be executed multiple times for different sessions (each identified by a sessionId).
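The best practice of carrying the session ID from an event into later calls can be sketched as application-side bookkeeping. This is a non-normative illustration; the class `PendingKeyRequests` and its method names are hypothetical and not part of the proposal.

```javascript
// Hypothetical application-side bookkeeping: remember which session each
// outstanding key request belongs to, so that the server response can be
// passed to addKey() together with the same sessionId.
class PendingKeyRequests {
  constructor() {
    this.bySession = new Map();
  }

  // Call from a keymessage handler. sessionId may be null if the Key
  // System does not support session IDs; a Map accepts null as a key.
  track(event) {
    this.bySession.set(event.sessionId, {
      keySystem: event.keySystem,
      message: event.message,
    });
  }

  // Call when the server responds. Returns the argument list for
  // addKey(keySystem, key, initData, sessionId).
  complete(sessionId, key, initData = null) {
    const pending = this.bySession.get(sessionId);
    if (!pending) throw new Error("Unknown session: " + sessionId);
    this.bySession.delete(sessionId);
    return [pending.keySystem, key, initData, sessionId];
  }
}
```

An application could then call `video.addKey(...requests.complete(sessionId, response))` once the license server replies.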
The current proposal does not support a mechanism to release keys. It is expected that the User Agent and CDM will release keys that are no longer needed as necessary to free resources. No use case for triggering this release from JavaScript has been identified.
This section is non-normative.
Initialization Data is a generic term for container-specific data that is used by Content Decryption Modules to generate a key request. It should always allow unique identification of the key or keys needed to decrypt the content, possibly after being parsed by a CDM or server.
Key Systems usually require a block of initialization data containing information about the stream to be decrypted before they can construct a key request message. This block could be as simple as a key or content ID to send to a server or as complex as an opaque Key System-specific collection of data. This initialization information may be obtained in some application-specific way or may be stored with the media data. Container formats may provide for storage of such information, possibly for multiple Key Systems in a single media file.
Initialization data found in the media data is provided to the application in the initData attribute of the needkey event.
This data has a container-specific format and is assumed to contain one or more generic or Key System-specific sets of initialization information.
Initialization Data - generic or containing information for the selected Key System - must be provided, in the same format, in the first media element method call that specifies a keySystem.
We extend media element to allow decryption key acquisition to be handled in JavaScript. We also extend canPlayType() to provide basic information about the Key Systems supported by the user agent.
Note: For some CDMs, "key" and "key request" correspond to "license" and "license request", respectively.
partial interface HTMLMediaElement {
    // No API changes. 'type' string is extended.
    DOMString canPlayType(in DOMString type, in DOMString? keySystem);
    void generateKeyRequest(in DOMString keySystem, in Uint8Array? initData);
    void addKey(in DOMString keySystem, in Uint8Array key, in Uint8Array? initData, in DOMString? sessionId);
    void cancelKeyRequest(in DOMString keySystem, in DOMString? sessionId);
};
partial interface HTMLSourceElement {
    attribute DOMString keySystem;
};
The canPlayType(type, keySystem) method is modified to add an optional second parameter to specify the Key System.
The following list shows some examples of how to use the keySystem parameter in canPlayType() calls.
video.canPlayType(null, "com.example.somesystem")
video.canPlayType(null, "com.example.somesystem.1_5")
video.canPlayType(mimeType, "com.example.somesystem")
video.canPlayType(mimeType, "org.w3.clearkey")
video.canPlayType(mimeType)
video.canPlayType(mimeType, null)
video.canPlayType(mimeType, "")
The canPlayType() method provides a simple capability detection mechanism for Key System capabilities.
If multiple versions of a protection system exist with different capabilities, these can be allocated distinct identifiers by the owner of that Key System.
This can be extended even to feature discovery, for example "com.example.somesystem.ignite" and "com.example.somesystem.explode" might identify features of the "com.example.somesystem" keysystem.
It is an open question whether this usage is desirable or sufficient or whether more detailed capability detection mechanisms are needed.
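As a non-normative illustration of capability detection with the keySystem parameter, the following sketch picks the first usable Key System from an application-defined preference list. The function name `pickKeySystem` is hypothetical, and `media` is assumed to be any object exposing the two-argument canPlayType() proposed here.

```javascript
// Return the first Key System in `candidates` that the media element
// reports as playable ("maybe" or "probably") for `mimeType`, or null
// if none is reported as playable.
function pickKeySystem(media, mimeType, candidates) {
  for (const keySystem of candidates) {
    const answer = media.canPlayType(mimeType, keySystem);
    if (answer === "probably" || answer === "maybe") {
      return keySystem;
    }
  }
  return null;
}
```

For example, an application might pass `["com.example.somesystem.1_5", "com.example.somesystem", "org.w3.clearkey"]` so that the simple decryption baseline is used as a last resort.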
In addition to the steps in the current specification, this method must run the following steps:
Check whether the Key System is supported with the specified container and codec type(s) by following the steps for the first matching condition from the following list:
Return "maybe" or "probably" as appropriate per the existing specification of canPlayType().
The generateKeyRequest(keySystem, initData) method must run the following steps:
Note: The contents of initData are container-specific Initialization Data.
If the first argument is null, throw a SYNTAX_ERR.
If networkState is NETWORK_EMPTY, throw an INVALID_STATE_ERR.
In general, applications should wait for an event named needkey or loadstart (per the resource fetch algorithm) before calling this method.
Initialize handler by following the steps for the first matching condition from the following list:
NOT_SUPPORTED_ERR.
Schedule a task to handle the call, providing initData.
The user agent will asynchronously execute the following steps in the task:
Load handler if necessary.
Let defaultUrl be null.
Use handler to generate a key request and follow the steps for the first matching condition from the following list:
Let key request be a key request generated by the CDM using initData, if provided.
Note: handler must not use any data, including media data, not provided via initData.
If initData is not null and contains a default URL for keySystem, let defaultUrl be that URL.
queue a task to fire a simple event named keyerror at the media element and abort the task.
The event is of type MediaKeyErrorEvent and has:
keySystem = keySystem
sessionId = null
errorCode = the appropriate MediaKeyError code
systemCode = a Key System-specific value, if provided, and 0 otherwise
Let sessionId be a unique Session ID string. It may be generated by handler.
queue a task to fire a simple event named keymessage at the media element.
The event is of type MediaKeyMessageEvent and has:
keySystem = keySystem
sessionId = sessionId
message = key request
defaultUrl = defaultUrl
Note: message may be a request for multiple keys, depending on the keySystem and/or initData. This is transparent to the application.
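The generateKeyRequest() steps above can be approximated by the following simplified, synchronous model. It is a non-normative sketch under several assumptions: `element` is a plain object with a networkState field, `cdms` is a hypothetical table mapping Key System strings to handler objects with a `createKeyRequest(initData)` method, exceptions are plain Error objects named after the DOM error codes, and MEDIA_KEYERR_UNKNOWN stands in for "the appropriate MediaKeyError code". In the proposal the event is fired from an asynchronous task; here the event fields are simply returned.

```javascript
const NETWORK_EMPTY = 0; // value of HTMLMediaElement.NETWORK_EMPTY

let nextSessionId = 1; // illustrative source of unique Session ID strings

function modelGenerateKeyRequest(element, cdms, keySystem, initData) {
  // Synchronous checks performed before the task is scheduled.
  if (keySystem === null) throw new Error("SYNTAX_ERR");
  if (element.networkState === NETWORK_EMPTY) throw new Error("INVALID_STATE_ERR");
  const handler = cdms[keySystem];
  if (!handler) throw new Error("NOT_SUPPORTED_ERR");

  // Steps that the proposal runs asynchronously in a task.
  let request;
  try {
    request = handler.createKeyRequest(initData);
  } catch (e) {
    return {
      type: "keyerror",
      keySystem: keySystem,
      sessionId: null,
      errorCode: 1, // MEDIA_KEYERR_UNKNOWN, used here as a placeholder
      systemCode: 0,
    };
  }
  return {
    type: "keymessage",
    keySystem: keySystem,
    sessionId: String(nextSessionId++),
    message: request.message,
    defaultUrl: request.defaultUrl || null,
  };
}
```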
The addKey(keySystem, key, initData, sessionId) method must run the following steps:
Note: The contents of key are keySystem-specific. It may be a raw key or a license containing a key. The contents may also vary depending on the container, key length, etc.
Note: The contents of initData are container-specific Initialization Data and should be the same format as the same parameter in generateKeyRequest().
It may be null.
The proposal currently allows addKey() to be called without calling generateKeyRequest().
This has the advantage that simple use cases, especially for Clear Key Simple Decryption, are straightforward.
The disadvantages are that user agents need to support multiple flows and applications written for the simple case are different from those written for the more general case.
In addition, some container formats may not support the simple case (e.g. if initData is not easily parsable to obtain a key ID).
It has been proposed that the initData parameter, which would most likely contain information identifying the key or keys needed, be removed from addKey() because any association can be done within the CDM using sessionId.
(However, see Session Correlation.)
Such a change depends on requiring that generateKeyRequest() always be called before addKey().
Assuming that change is made, removing the parameter simplifies the API but hides all association between a key identifier and key.
See this example for an illustration of the impact of this change.
If the first argument is null, throw a SYNTAX_ERR.
If networkState is NETWORK_EMPTY, throw an INVALID_STATE_ERR.
In general, applications should wait for an event named needkey or loadstart (per the resource fetch algorithm) before calling this method.
Initialize handler by following the steps for the first matching condition from the following list:
NOT_SUPPORTED_ERR.
If sessionId is not null and is unrecognized, throw an INVALID_ACCESS_ERR.
Should this be handled here or in the task scheduled in the next step?
The advantage of handling it here is that what is likely a programming error is immediately and simply reported via an exception.
The disadvantage is that the user agent must store session IDs (and track when they are released) for each Key System rather than letting the CDM manage them.
This is inconsistent with the goal that the user agent should just pass information.
This same issue also applies to cancelKeyRequest().
Schedule a task to handle the call, providing key, initData, and sessionId.
The user agent will asynchronously execute the following steps in the task:
Load handler if necessary.
Let key stored be false.
Let next message be null.
Use handler to handle key.
Process key.
If key contains a key or license, store the key.
Let key ID be null.
If sessionId is not null and refers to a session with Initialization Data that contains a key ID, let key ID be that ID.
If key is not null and contains a key ID, let key ID be that ID.
If initData is not null and contains a key ID, let key ID be that ID.
Store the key by following the steps for the first matching condition from the following list:
Clear any key not associated with a key ID.
If a key already exists for key ID, delete that element.
Store the key and/or license in key indexed by key ID. The replacement algorithm is Key System-dependent.
Clear all stored keys.
Store the key and/or license in key with no associated key ID.
At most one key may be stored if key IDs are not used.
Clearing keys avoids needing to handle a mixture of keys with and without IDs in the Encrypted Block Encountered algorithm.
Note: It is recommended that CDM providers support a standard and reasonably high minimum number of cached keys/licenses (with IDs) per media element as well as a standard replacement algorithm. This enables a reasonable number of key rotation algorithms to be implemented across user agents and may reduce the likelihood of playback interruptions in use cases that involve various streams in the same element (i.e. adaptive streams, various audio and video tracks) using different keys.
Let key stored be true.
If another message needs to be sent to the server, let next message be that message.
In other words, resume playback if the necessary key is provided.
Fire the appropriate event by following the steps for the first matching condition from the following list:
queue a task to fire a simple event named keyadded at the media element.
The event is of type MediaKeyCompleteEvent and has:
queue a task to fire a simple event named keymessage at the media element.
The event is of type MediaKeyMessageEvent and has:
keySystem = keySystem
sessionId = sessionId
message = next message
defaultUrl = null
Is there a reason that this cannot be null?
If any of the preceding steps in the task failed, queue a task to fire a simple event named keyerror at the media element.
The event is of type MediaKeyErrorEvent and has:
keySystem = keySystem
sessionId = sessionId
errorCode = the appropriate MediaKeyError code
systemCode = a Key System-specific value, if provided, and 0 otherwise
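The key ID resolution and key storage steps of the addKey() task can be sketched as follows. This is a non-normative model: `resolveKeyId` and `KeyStore` are hypothetical names, the three key ID steps are assumed to apply in order so that later sources override earlier ones, and the Key System-dependent replacement algorithm is reduced to "replace the entry with the same ID".

```javascript
// Key ID resolution, assuming the steps run in order: the session's
// Initialization Data first, then the key itself, then initData.
function resolveKeyId(sessionKeyId, keyKeyId, initDataKeyId) {
  let keyId = null;
  if (sessionKeyId != null) keyId = sessionKeyId;
  if (keyKeyId != null) keyId = keyKeyId;
  if (initDataKeyId != null) keyId = initDataKeyId;
  return keyId;
}

class KeyStore {
  constructor() {
    this.byId = new Map(); // keys stored with a key ID
    this.unidentified = null; // at most one key without a key ID
  }

  store(key, keyId) {
    if (keyId !== null) {
      this.unidentified = null; // clear any key not associated with a key ID
      this.byId.delete(keyId); // delete an existing key for this ID
      this.byId.set(keyId, key);
    } else {
      this.byId.clear(); // clear all stored keys
      this.unidentified = key;
    }
  }

  get size() {
    return this.byId.size + (this.unidentified === null ? 0 : 1);
  }
}
```

Because storing a key without an ID clears every other key, the store never holds a mixture of keys with and without IDs, which is the property the Encrypted Block Encountered algorithm relies on.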
The key acquisition process may involve the web page handling keymessage events, sending the message to a Key System-specific service, and calling addKey with the response message.
This continues until the keyadded event is fired.
During the process, the web page may wish to cancel the acquisition process.
For example, if the page cannot contact the license service because of network issues it may wish to fall back to an alternative key system.
The page calls cancelKeyRequest() to cancel the key acquisition and return the media element to a state where generateKeyRequest() may be called again.
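The acquisition loop and the cancellation path described above might look like the following in an application. This is a non-normative sketch: `attachKeyHandlers`, `requestLicense(url, message, onSuccess, onError)` and `onFallback` are hypothetical names for application-supplied code (the transport could, for example, wrap XMLHttpRequest), and preferring the application's own URL over defaultUrl is an arbitrary choice made for the example.

```javascript
function attachKeyHandlers(video, licenseUrl, requestLicense, onFallback) {
  video.addEventListener("keymessage", (event) => {
    requestLicense(
      licenseUrl || event.defaultUrl,
      event.message,
      // Success: hand the server response to the CDM, echoing the
      // session ID from the event.
      (response) => {
        video.addKey(event.keySystem, response, null, event.sessionId);
      },
      // Failure: cancel this acquisition so generateKeyRequest() may be
      // called again, then let the application try another Key System.
      () => {
        video.cancelKeyRequest(event.keySystem, event.sessionId);
        onFallback(event.keySystem);
      }
    );
  });
}
```

Each additional keymessage fired in response to addKey() passes through the same handler, so the loop continues until the CDM fires keyadded.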
The cancelKeyRequest(keySystem, sessionId) method must run the following steps:
If the first argument is null, throw a SYNTAX_ERR.
If sessionId is not null and is unrecognized or not mapped to the keySystem, throw an INVALID_ACCESS_ERR.
If a keyadded event has already been fired for this sessionId, throw an INVALID_STATE_ERR.
Can this step be done synchronously or should a task be queued to do it in the background and a needkey event fired when done?
It is an open question what exactly should happen here.
The state of the media element is unknown and it may not have even triggered the original generateKeyRequest() call.
Should a needkey event be fired regardless of the state? What if the media element is not waiting for a key?
Should the media element attempt to resume playback if it is waiting for a key, causing an event if appropriate?
Should the application be responsible for calling generateKeyRequest() without an event?
The keySystem attribute of HTMLSourceElement specifies the Key System to be used with the media resource.
The resource selection algorithm is modified to check the keySystem attribute after the existing step 5 of the Otherwise branch of step 6:
⌛ If candidate has a keySystem attribute whose value represents a Key System that the user agent knows it cannot use with type, then end the synchronous section, and jump down to the failed step below.
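The effect of the added check on source selection can be illustrated with a small non-normative model. Here `candidates` are plain objects standing in for source elements, and `cannotUse(keySystem, type)` is a hypothetical predicate standing in for the user agent's knowledge that a Key System cannot be used with a type; the rest of the resource selection algorithm is omitted.

```javascript
// Return the first candidate that is not rejected by the keySystem check,
// or null if every candidate is rejected.
function selectCandidate(candidates, cannotUse) {
  for (const candidate of candidates) {
    if (candidate.keySystem && cannotUse(candidate.keySystem, candidate.type)) {
      continue; // models jumping to the failed step and trying the next source
    }
    return candidate;
  }
  return null;
}
```

A candidate without a keySystem attribute is never rejected by this check, so unencrypted sources continue to be selected as before.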
A media element is said to have a selected Key System when one of the following has occurred:
HTMLSourceElement.
In this case, the selected key system is the keySystem attribute of the selected HTMLSourceElement.
In this case, the selected key system is the keySystem parameter for the last successful call.
MediaError is extended, and a new error type is added.
partial interface MediaError {
    const unsigned short MEDIA_ERR_ENCRYPTED = 5;
};
interface MediaKeyError {
    const unsigned short MEDIA_KEYERR_UNKNOWN = 1;
    const unsigned short MEDIA_KEYERR_CLIENT = 2;
    const unsigned short MEDIA_KEYERR_SERVICE = 3;
    const unsigned short MEDIA_KEYERR_OUTPUT = 4;
    const unsigned short MEDIA_KEYERR_HARDWARECHANGE = 5;
    const unsigned short MEDIA_KEYERR_DOMAIN = 6;
};
The code attribute of a MediaError may additionally return the following:
MEDIA_ERR_ENCRYPTED (numeric value 5)
needkey handler was provided
It has been suggested that there be a separate error for each of the above cases.
This is an option if the community feels that being able to differentiate among them is worthwhile.
A single error is consistent with the current broad error codes, though that may be something that should be improved in general.
It seems that except for #1, which should only occur in applications that do not support encrypted media, these are all application bugs and not something that would improve the user experience.
Any unique handling of the error codes by an application would essentially be describing a bug type.
Unique codes might be helpful in tracking down the cause of the bug, but there are probably other options.
It is also possible that some of these cases should be reported via MediaKeyErrorEventInit.
A MediaKeyError may be one of the following:
MEDIA_KEYERR_UNKNOWN (numeric value 1)
MEDIA_KEYERR_CLIENT (numeric value 2)
Should this be two separate errors?
MEDIA_KEYERR_SERVICE (numeric value 3)
addKey indicated an error from the license service.
MEDIA_KEYERR_OUTPUT (numeric value 4)
MEDIA_KEYERR_HARDWARECHANGE (numeric value 5)
MEDIA_KEYERR_DOMAIN (numeric value 6)
[Constructor(DOMString type, optional MediaKeyNeededEventInit eventInitDict)]
interface MediaKeyNeededEvent : Event {
    readonly attribute DOMString? keySystem;
    readonly attribute DOMString? sessionId;
    readonly attribute Uint8Array? initData;
};
dictionary MediaKeyNeededEventInit : EventInit {
    DOMString? keySystem;
    DOMString? sessionId;
    Uint8Array? initData;
};
[Constructor(DOMString type, optional MediaKeyMessageEventInit eventInitDict)]
interface MediaKeyMessageEvent : Event {
    readonly attribute DOMString keySystem;
    readonly attribute DOMString? sessionId;
    readonly attribute Uint8Array message;
    readonly attribute DOMString? defaultUrl;
};
dictionary MediaKeyMessageEventInit : EventInit {
    DOMString keySystem;
    DOMString? sessionId;
    Uint8Array message;
    DOMString? defaultUrl;
};
[Constructor(DOMString type, optional MediaKeyCompleteEventInit eventInitDict)]
interface MediaKeyCompleteEvent : Event {
    readonly attribute DOMString keySystem;
    readonly attribute DOMString? sessionId;
};
dictionary MediaKeyCompleteEventInit : EventInit {
    DOMString keySystem;
    DOMString? sessionId;
};
[Constructor(DOMString type, optional MediaKeyErrorEventInit eventInitDict)]
interface MediaKeyErrorEvent : Event {
    readonly attribute DOMString keySystem;
    readonly attribute DOMString? sessionId;
    readonly attribute MediaKeyError errorCode;
    readonly attribute unsigned short systemCode;
};
dictionary MediaKeyErrorEventInit : EventInit {
    DOMString keySystem;
    DOMString? sessionId;
    MediaKeyError errorCode;
    unsigned short systemCode;
};
keySystem
Returns the name of the Key System that generated the event.
sessionId
Returns the Session ID the event is related to, if applicable.
initData
Returns the Initialization Data related to the event.
message
Returns the message (i.e. key request) to send.
defaultUrl
Returns the default key exchange URL.
errorCode
Returns the MediaKeyError for the error that occurred.
systemCode
Returns a Key System-dependent status code for the error that occurred.
The keySystem attribute is an identifier for the Key System that generated the event.
It may be null in the needkey event if the media element does not have a selected Key System.
The sessionId attribute is the Session ID for the key or license that this event refers to. It may be null.
The initData attribute contains Initialization Data specific to the event.
The message attribute contains a message from the CDM. Messages are Key System-specific. In most cases, it should be sent to a key server.
The defaultUrl is the default URL to send the key request to as provided by the media data. It may be null.
The errorCode attribute contains the MediaKeyError code for the error that occurred.
The systemCode attribute contains a Key System-dependent status code for the error that occurred.
This allows a more granular status to be returned than the more general errorCode.
It should be 0 if there is no associated status code or such status codes are not supported by the Key System.
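As a non-normative illustration of how an application might report a keyerror event, the following sketch maps errorCode to the constant names defined for MediaKeyError and appends systemCode when it is non-zero. The function name `describeKeyError` is hypothetical, and errorCode is assumed to be exposed as the numeric code.

```javascript
// Names of the MediaKeyError constants, indexed by numeric value.
const MEDIA_KEY_ERROR_NAMES = {
  1: "MEDIA_KEYERR_UNKNOWN",
  2: "MEDIA_KEYERR_CLIENT",
  3: "MEDIA_KEYERR_SERVICE",
  4: "MEDIA_KEYERR_OUTPUT",
  5: "MEDIA_KEYERR_HARDWARECHANGE",
  6: "MEDIA_KEYERR_DOMAIN",
};

// Build a log-friendly description from the fields of a keyerror event.
function describeKeyError(event) {
  const name =
    MEDIA_KEY_ERROR_NAMES[event.errorCode] ||
    "unrecognized code " + event.errorCode;
  // systemCode is 0 when the Key System provides no finer-grained status.
  const detail =
    event.systemCode !== 0 ? " (systemCode " + event.systemCode + ")" : "";
  return event.keySystem + ": " + name + detail;
}
```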
If a response (i.e. a license) is necessary, applications should use one of the new methods to provide the response.
Event name | Interface | Dispatched when... | Preconditions |
---|---|---|---|
keyadded | MediaKeyCompleteEvent | A key has been added as the result of an addKey() call. | |
keyerror | MediaKeyErrorEvent | An error occurs in one of the new methods or CDM. | |
keymessage | MediaKeyMessageEvent | A message has been generated (and likely needs to be sent to a key server). For example, a key request has been generated as the result of a generateKeyRequest() call or another message must be sent in response to an addKey() call. | |
needkey | MediaKeyNeededEvent | The user agent needs a key or license to begin or continue playback. It may have encountered media data that may/does require decryption to load or play OR need a new key/license to continue playback. | readyState is equal to HAVE_METADATA or greater. It is possible that the element is playing or has played. |
It has been proposed that needkey be a simple event.
In this case, it would not provide any indication of the key that is needed and the application would need to call generateKeyRequest() to get an appropriate message or identifier, including for the Clear Key case.
Such a change assumes that the consistent flow option is selected.
See this example for an illustration of the impact of this change.
(Since sessionId is not included in needkey and is not generated until generateKeyRequest() generates a keymessage event, this change would not result in the loss of any correlation.
See Session Correlation for a discussion of the general lack of correlation.)
This section is non-normative.
The above sections provide for delivery of key/license information to a Content Decryption Module. This section provides for management of the entire key/license lifecycle, that is, secure proof of key release. Use cases for such proof include any service where it is necessary for the service to know, reliably, which granted keys/licenses are still available for use by the user and which have been deleted. Examples include a service with restrictions on the number of concurrent streams available to a user or a service where content is available on a rental basis, for use offline.
Secure proof of key release must necessarily involve the CDM due to the relative ease with which scripts may be modified. The CDM must provide a message asserting, in a CDM-specific form, that a specific key or license has been destroyed. Such messages must be cached in the CDM until acknowledgement of their delivery to the service has been received. This acknowledgement must also be in the form of a CDM-specific message.
The mechanism for secure proof of key release operates outside the scope of any media element. This is because proof-of-release messages may be cached in CDMs after the associated media elements have been destroyed. Proof-of-key-release messages are subject to the same origin policy: they shall only be delivered to scripts with the same origin as the script which created the media element that provided the key/license.
The following interface is defined for management of key release messages:
[Constructor()]
interface KeyReleaseManager : EventTarget {
    void getKeyReleases(in DOMString keySystem);
    void addKeyReleaseCommit(in DOMString keySystem, in DOMString sessionId, in Uint8Array message);
};
The getKeyReleases(keySystem) method must run the following steps:
If the first argument is null, throw a SYNTAX_ERR.
Initialize handler by following the steps for the first matching condition from the following list:
NOT_SUPPORTED_ERR.
Schedule a task to handle the call.
The user agent will asynchronously execute the following steps in the task:
Load handler if necessary.
Use handler to generate one or more key release messages, if supported. handler should follow the steps for the first matching condition from the following list:
For each key release message in key release messages, queue a task to fire a simple event named keyrelease at the key release manager.
The event is of type MediaKeyMessageEvent and has:
keySystem = keySystem
sessionId = the sessionId originally associated with the provision of the key
message = key release message
defaultUrl = value of the default URL, if stored by the CDM.
The addKeyReleaseCommit(keySystem, sessionId, message) method must run the following steps:
If the first argument is null, throw a SYNTAX_ERR.
Initialize handler by following the steps for the first matching condition from the following list:
NOT_SUPPORTED_ERR.
Schedule a task to handle the call, providing sessionId and message.
The user agent will asynchronously execute the following steps in the task:
Load handler if necessary.
Use handler to commit the message. handler should follow the steps for the first matching condition from the following list:
queue a task to fire a simple event named keyreleasecommitted at the key release manager.
The event is of type MediaKeyCompleteEvent and has:
queue a task to fire a simple event named keyerror at the key release manager.
The event is of type MediaKeyErrorEvent and has:
keySystem = keySystem
sessionId = sessionId
errorCode = the appropriate MediaKeyError code
systemCode = a Key System-specific value, if provided, and 0 otherwise
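The key release exchange described in this section could be driven by an application along the following lines. This is a non-normative sketch: `manager` is assumed to behave like the proposed KeyReleaseManager, while `deliverKeyReleases`, `serviceUrl` and `postToService(url, message, onAck)` are hypothetical names for application-supplied code that delivers a message to the service and passes back its CDM-specific acknowledgement.

```javascript
function deliverKeyReleases(manager, keySystem, serviceUrl, postToService) {
  manager.addEventListener("keyrelease", (event) => {
    // Forward the CDM's proof-of-release message to the service.
    postToService(event.defaultUrl || serviceUrl, event.message, (ack) => {
      // Hand the acknowledgement back so that the CDM can stop caching
      // the release message for this session.
      manager.addKeyReleaseCommit(event.keySystem, event.sessionId, ack);
    });
  });
  // Ask the CDM for any cached key release messages.
  manager.getKeyReleases(keySystem);
}
```

The listener is registered before getKeyReleases() is called so that no keyrelease event is missed.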
The following steps are run when the media element encounters a block (i.e. frame) of encrypted media data during the resource fetch algorithm:
Let key system be null.
Let handler be null.
Let block initData be null.
Let block key be null.
If the block (or its parent entity) has Initialization Data, let block initData be that initialization data.
Select the key system and handler by following the steps for the first matching condition from the following list:
Let key system be the selected Key System.
Let handler be the content decryption module corresponding to key system.
Load handler if necessary.
Use handler to select the key:
Let block key ID be null.
If block initData is not null and contains a key ID, let block key ID be that ID.
Select the key by following the steps for the first matching condition from the following list:
Select the key by using handler to follow the steps for the first matching condition from the following list:
Select the key by using handler to follow the steps for the first matching condition from the following list:
MEDIA_ERR_ENCRYPTED error.
Key Presence: Handle the presence of a key by following the steps for the first matching condition from the following list:
MEDIA_ERR_ENCRYPTED error.
Note: Not all decryption problems (i.e. using the wrong key) will result in a decryption failure. In such cases, no error is fired here but one may be fired during decode.
needkey
needkey at the media element.
The event is of type MediaKeyNeededEvent and has:
The media element is said to be potentially playing unless playback stops because the stream cannot be decrypted, in which case the media element is said to be waiting for a key.
MEDIA_ERR_ENCRYPTED error.
For frame-based encryption, this may be implemented as follows when the media element attempts to decode a frame as part of the resource fetch algorithm:
Let encrypted be false.
Detect whether the frame is encrypted.
Decode the frame.
Provide the frame for rendering.
The following paragraph is added to Playing the media resource.
A media element is said to be waiting for a key when it would be potentially playing but the user agent has reached a point in the media resource that must be decrypted for the resource to continue and the CDM does not have the necessary key.
The following steps are run when the media element encounters a source that may contain encrypted blocks or streams during the resource fetch algorithm:
Let key system be null.
Let handler be null.
Let initData be null.
If Initialization Data was encountered, let initData be that initialization data.
Select the key system and handler by following the steps for the first matching condition from the following list:
Let key system be the selected Key System.
Let handler be the content decryption module corresponding to key system.
Load handler if necessary.
Use handler to determine whether the key is known:
Let key ID be null.
If a key ID for the source is known at this time, let key ID be that ID.
If initData is not null and contains a key ID, let key ID be that ID.
Determine whether the key is already known by following the steps for the first matching condition from the following list:
Determine whether the key is known by following the steps for the first matching condition from the following list:
Determine whether the key is known by following the steps for the first matching condition from the following list:
Need Key: queue a task to fire a simple event named needkey
at the media element.
The event is of type MediaKeyNeededEvent
and has:
Firing this event allows the application to begin the key acquisition process before the key is needed.
This could be skipped if the key has already been set, but always sending the event seems easier.
Note that readyState
is not changed and no algorithms are aborted. This algorithm is merely informative.
Continue Normal Flow: Continue with the existing media element's resource fetch algorithm.
The following step is added to the existing media element load algorithm:
Clear all cached keys for this media element.
This also means the keys will be cleared when the src
attribute is set or changed per Location of the media resource
All user agents must support the simple decryption capabilities described in this section regardless of whether they support a more advanced CDM. This ensures that there is a common baseline level of protection that is guaranteed to be supported in all user agents, including those that are entirely open source. Thus, content providers that need only basic protection can build simple applications that will work on all platforms without needing to work with any content protection providers.
The "org.w3.clearkey" Key System indicates a plain-text clear (unencrypted) key will be used to decrypt the source. No additional client-side content protection is required. Use of this Key System is described below.
The keySystem parameter and keySystem
attributes are always "org.w3.clearkey"
with the exception of events before the Key System has been selected.
All events except needkey
have a valid sessionId
string, which is numerical.
The initData
attribute of the needkey
event and the initData parameters of generateKeyRequest()
and addKey()
are the same container-specific Initialization Data format and values.
If supported, these values should provide some type of identification of the content or key that could be used to look up the key (since there is no defined logic for parsing it).
For containers that support a simple key ID, it should be a binary array containing the raw key ID.
For other containers, it may be some other opaque blob or null.
generateKeyRequest()
may optionally be called.
The resulting MediaKeyMessageEvent
has:
keySystem = "org.w3.clearkey"
sessionId = a unique numerical string
message = a container-specific unique key identifier extracted from the initData parameter (if initData was not null and one could be extracted; otherwise null)
defaultUrl = value of the default URL if present in the media data and null otherwise.
To provide a key using this Key System, pass the following to addKey():
"org.w3.clearkey"
generateKeyRequest() was called: message attribute of the resulting MediaKeyMessageEvent
generateKeyRequest() was called: sessionId attribute of the resulting MediaKeyMessageEvent
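As a minimal sketch of the Clear Key flow after generateKeyRequest() has been called, the handler below looks up a key by the key identifier carried in the message and passes it to addKey(). The helper names `toHex` and `provideClearKey` and the `keyTable` object are hypothetical. It is also an assumption here that the message attribute is passed as the initData argument and that the application stores its keys indexed by hex-encoded key ID.

```javascript
// Hypothetical helper: hex-encodes a raw key ID so it can index a key table.
function toHex(bytes) {
  var hex = "";
  for (var i = 0; i < bytes.length; i++) {
    var h = bytes[i].toString(16);
    hex += (h.length < 2 ? "0" : "") + h;
  }
  return hex;
}

// Hypothetical keymessage handler for the "org.w3.clearkey" Key System.
// keyTable maps hex-encoded key IDs to Uint8Array keys (an assumption).
function provideClearKey(video, keyTable, event) {
  var keyId = event.message; // Key identifier, or null if none was extracted.
  var key = keyId ? keyTable[toHex(keyId)] : undefined;
  if (!key)
    throw "No key available for this key ID";
  // Assumption: message is passed back as initData, with the event's sessionId.
  video.addKey("org.w3.clearkey", key, event.message, event.sessionId);
}
```

When the message is null, the application would need some other way to choose the key, which this sketch does not attempt.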
This section and its subsections are non-normative.
This section contains example solutions for various use cases using the proposed extensions. These are not the only solutions to these use cases. Video elements are used in the examples, but the same would apply to all media elements. In some cases, such as using synchronous XHR, the examples are simplified to keep the focus on the extensions.
In this simple example, the source file and clear-text key are hard-coded in the page.
This example is very simple because it does not care when the key has been added and does not associate that event with the addKey() call. It also does not handle errors.
<script>
  function load() {
    var video = document.getElementById("video");
    var key = new Uint8Array([ 0xaa, 0xbb, 0xcc, ... ]);
    video.addKey("org.w3.clearkey", key, null);
  }
</script>

<body onload="load()">
  <video src="foo.webm" autoplay id="video"></video>
</body>
The solution below shows what the simple solution above would become if we choose to require a consistent flow for all applications. In this scenario, the serial solution above becomes the event-based solution shown below. The next example also illustrates the impact.
<script>
  function load() {
    var video = document.getElementById("video");
    video.generateKeyRequest("org.w3.clearkey", null);
  }

  function handleMessage(event) {
    if (event.keySystem != "org.w3.clearkey")
      throw "Unhandled keySystem in event";
    var video = event.target;
    var key = new Uint8Array([ 0xaa, 0xbb, 0xcc, ... ]);
    video.addKey("org.w3.clearkey", key, null, event.sessionId);
  }
</script>

<body onload="load()">
  <video src="foo.webm" autoplay id="video" onkeymessage="handleMessage(event)"></video>
</body>
In this case, the Initialization Data is contained in the media data.
If this was not the case, handleKeyNeeded()
could obtain and provide it instead of getting it from the event.
If any asynchronous operation is required to get the key in handleKeyNeeded()
, it could be called a second time if the stream is detected as potentially encrypted before an encrypted block/frame is encountered. In this case, applications may want to handle subsequent calls specially to avoid redundant license requests. This is not shown in the examples below.
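One way an application might avoid redundant license requests is to remember which Initialization Data already has a request in flight. The following is only a sketch under that assumption: `pendingRequests`, `initDataToString`, and `shouldRequestKey` are hypothetical names, and a real application would also need to clear entries when a key is added or an error occurs.

```javascript
// Hypothetical tracker for Initialization Data with a request already started.
var pendingRequests = {};

// Builds a string form of initData so it can be used as a property name.
function initDataToString(initData) {
  if (!initData)
    return "(none)"; // Placeholder entry for events without initData.
  return Array.prototype.join.call(initData, ",");
}

// Returns true the first time a given initData is seen and false on repeats.
function shouldRequestKey(initData) {
  var id = initDataToString(initData);
  if (pendingRequests[id])
    return false;
  pendingRequests[id] = true;
  return true;
}

function handleKeyNeeded(event) {
  if (!shouldRequestKey(event.initData))
    return; // A request for this initData has already been started.
  event.target.generateKeyRequest("org.w3.clearkey", event.initData);
}
```

With this guard, a second needkey event carrying the same Initialization Data returns early instead of starting another request.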
This solution uses the Clear Key Simple Decryption.
As with the previous example, this one is very simple because it does not care when the key has been added or handle errors.
<script>
  function handleKeyNeeded(event) {
    if (event.keySystem && event.keySystem != "org.w3.clearkey")
      throw "Unhandled keySystem in event";
    var initData = event.initData;
    var video = event.target;

    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", "http://.../getkey", false);
    xmlhttp.send(initData);
    var key = new Uint8Array(xmlhttp.response);
    video.addKey("org.w3.clearkey", key, initData, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)"></video>
The solution below shows what the solution above would become if we choose to require a consistent flow, make needkey a simple event, and remove the initData parameter from addKey().
<script>
  function handleKeyNeeded(event) {
    if (event.keySystem && event.keySystem != "org.w3.clearkey")
      throw "Unhandled keySystem in event";
    var video = event.target;
    // Note: The CDM will generate a request for whatever Initialization Data
    // it chooses since there is no association with the current event.
    video.generateKeyRequest("org.w3.clearkey", null);
  }

  function handleMessage(event) {
    if (event.keySystem != "org.w3.clearkey")
      throw "Unhandled keySystem in event";
    var message = event.message;
    var video = event.target;

    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", "http://.../getkey", false);
    xmlhttp.send(message);
    var key = new Uint8Array(xmlhttp.response);
    // Note: The CDM will find the Initialization Data based on the sessionId.
    video.addKey("org.w3.clearkey", key, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)" onkeymessage="handleMessage(event)"></video>
Some differences of note:
needkey
event and the method calls since Session ID is not created until generateKeyRequest()
completes.
This is a general problem that the first solution only works around.
See Session Correlation.
This solution uses more advanced decryption from a fictitious content decryption module called Some System.
<script>
  function handleKeyNeeded(event) {
    if (event.keySystem && event.keySystem != "com.example.somesystem.1_0")
      throw "Unhandled keySystem in event";
    var initData = event.initData;
    var video = event.target;
    video.generateKeyRequest("com.example.somesystem.1_0", initData);
  }

  function licenseRequestReady(event) {
    if (event.keySystem != "com.example.somesystem.1_0")
      throw "Unhandled keySystem in event";
    var request = event.message;
    if (!request)
      throw "Could not create license request";
    var video = event.target;

    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", "http://.../getkey", false);
    xmlhttp.send(request);
    var license = new Uint8Array(xmlhttp.response);
    video.addKey("com.example.somesystem.1_0", license, null, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)" onkeymessage="licenseRequestReady(event)"></video>
Below is an example of detecting supported Key System using canPlayType()
and selecting one.
<script>
  var keySystem;
  var licenseUrl;

  function selectKeySystem(video) {
    if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "com.example.somesystem") != "") {
      licenseUrl = "https://license.example.com/getkey"; // OR "https://example.<My Video Site domain>"
      if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "com.example.somesystem.2_0") != "") {
        keySystem = "com.example.somesystem.2_0";
      } else if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "com.example.somesystem.1_5") != "") {
        keySystem = "com.example.somesystem.1_5";
      }
    } else if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "foobar") != "") {
      licenseUrl = "https://license.foobar.com/request";
      keySystem = "foobar";
    } else {
      throw "Key System not supported";
    }
  }

  function handleKeyNeeded(event) {
    // The video element must be obtained before it is passed to selectKeySystem().
    var video = event.target;
    var targetKeySystem = event.keySystem;
    if (targetKeySystem == null) {
      selectKeySystem(video); // Defined above.
      targetKeySystem = keySystem;
    }
    var initData = event.initData;
    video.generateKeyRequest(targetKeySystem, initData);
  }

  function licenseRequestReady(event) {
    if (event.keySystem != keySystem)
      throw "Message from unexpected Key System";
    var request = event.message;
    if (!request)
      throw "Could not create license request";
    var video = event.target;

    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", licenseUrl, false);
    xmlhttp.send(request);
    var license = new Uint8Array(xmlhttp.response);
    video.addKey(keySystem, license, null, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)" onkeymessage="licenseRequestReady(event)"></video>
This is a more complete example showing all events being used along with asynchronous XHR.
Note that handleKeyMessage
could be called multiple times, including in response to the addKey()
call if multiple round trips are required and for any other reason the Key System might need to send a message.
<script>
  var keySystem;
  var licenseUrl;

  function handleMessageResponse() {
    // "this" is the XMLHttpRequest; wait until the response is complete.
    if (this.readyState != 4)
      return;
    var license = new Uint8Array(this.response);
    var video = document.getElementById("video");
    video.addKey(keySystem, license, null, this.sessionId);
  }

  function sendMessage(message, sessionId) {
    var xmlhttp = new XMLHttpRequest();
    xmlhttp.sessionId = sessionId;
    xmlhttp.onreadystatechange = handleMessageResponse;
    xmlhttp.open("POST", licenseUrl, true);
    xmlhttp.responseType = "arraybuffer"; // Needed to read the response as bytes.
    xmlhttp.send(message);
  }

  function handleKeyNeeded(event) {
    // The video element must be obtained before it is passed to selectKeySystem().
    var video = event.target;
    var targetKeySystem = event.keySystem;
    if (targetKeySystem == null) {
      selectKeySystem(video); // See previous example for implementation.
      targetKeySystem = keySystem;
    }
    var initData = event.initData;
    video.generateKeyRequest(targetKeySystem, initData);
  }

  function handleKeyMessage(event) {
    if (event.keySystem != keySystem)
      throw "Message from unexpected Key System";
    var message = event.message;
    if (!message)
      throw "Invalid key message";
    sendMessage(message, event.sessionId);
  }

  function handleKeyComplete(event) {
    // Do some bookkeeping with event.sessionId if necessary.
  }

  function handleKeyError(event) {
    // Report event.errorCode and do some bookkeeping with event.sessionId if necessary.
  }
</script>

<video src="foo.webm" autoplay id="video" onneedkey="handleKeyNeeded(event)" onkeymessage="handleKeyMessage(event)" onkeyadded="handleKeyComplete(event)" onkeyerror="handleKeyError(event)"></video>
This section and its subsections are non-normative.
Everything from user-generated content to be shared with family (user is not an adversary) to online radio to feature-length movies.
Yes, this proposal is compatible with both "Type 1" and "Type 3" adaptive streaming modes as defined by the W3C Web & TV Interest Group.
src
, which could be encrypted and handled just like a normal stream.
Yes.
No, this proposal only supports decrypting audio and video that are part of the media data.
<source> elements?
Yes, using the keySystem attribute of the HTMLSourceElement.
When used with the type attribute, this will select the first <source> element (container, codec, and Key System) that the user agent might support.
The selected CDM will not be reported to the application until an event is fired.
Yes.
Heartbeat is a mode of operation where the Content Decryption Module must receive an explicit heartbeat message from its server on a regular basis; otherwise, decryption is blocked. This enables use cases requiring strict online control of access to the content. Heartbeat must be supported by the CDM and is implemented in this model by supplying an expiration time or valid duration in the license provided to the CDM. Before expiry of this license, the CDM must trigger a new message exchange to obtain an updated license.
It is an open question whether CDMs should
keymessage
to continue the current session
MediaKeyNeededEvent
definition. See the related issue.
Yes. The application can add this to the license request (sent via XMLHttpRequest
in the examples) or send it to the CDM via generateKeyRequest()
to be included in the license request.
needkey event in the Encrypted Block Encountered algorithm?
Assuming there are no other issues, playback will resume when the needed key is provided by addKey() and processed.
Yes, this will likely be necessary to support all or a majority of user agents. An application could also use different Key Systems on a single user agent for different purposes.
We envision CDM providers creating JavaScript libraries that application developers can include. canPlayType()
can then be used to select from supported libraries.
This is vendor-/Key System-specific.
Obtaining this information could take time and is open-ended, so it is not appropriate for canPlayType()
.
There is also no way for canPlayType()
to attest to capabilities anyway.
Some basic Key System feature detection may be available via canPlayType().
needkey event with a null keySystem attribute?
This is a very common scenario because it happens when the user agent encounters encrypted media and does not have an appropriate key.
If the application does not already know which Key System to use, it should use canPlayType()
to select an appropriate one.
When the keySystem
attribute is null, the initData
attribute is always independent of the Key System.
licenseUrl) in the examples?
This is the URL for a server capable of providing the key for the stream, usually using the Initialization Data and often after verifying the requesting user. The URL is application- and/or Key System-specific and may be a content provider or a Key System provider depending on the solution.
That's not a question, but we'll try to address it anyway. As shown in the examples, the basic use cases are reasonably simple and only require a little setup to get the key and provide it to the user agent. We believe most small content sites can add basic protection to their applications with minimal efforts.
The more complex cases, such as fast time to first frame and various license management algorithms, require more complex code, but professional-strength content protection is complex and that is to be expected. Professional-strength content protection requires server components and working with one or more content protection vendors, so this isn’t really any more complex. In fact, if you implement a few solutions, it will work on any browser-based platform, avoiding the need for per-platform solutions on both the server and client. The fixed set of interfaces may even lead to more consistent patterns and behavior across various solutions. It is generally the large content providers that have more complex requirements, and we believe they will have the appropriate resources to implement applications that meet their requirements.
Providers of content decryption modules will need to provide detailed specifications for actions and events to guide content providers in designing the algorithms in their applications. They can also provide JavaScript libraries for their solution that can be integrated into any application. An application would then basically select a solution and delegate a lot of the work to the appropriate library.
This is container specific. A container may standardize on a specific algorithm (e.g. AES-128) and/or allow it to be specified. The user agent must know and/or detect the appropriate algorithm to use with the key provided by this API.
Advantages include:
canPlayType() need to be modified? Why doesn't it provide more information?
The modifications allow applications to detect whether the user agent is capable of supporting the application's encrypted content (at any level of protection) and to branch to the appropriate code and/or select a CDM library.
At the same time, we do not want to put too much burden on canPlayType()
and it must remain a synchronous method that can be processed from static data. See the related question.
canPlayType() need a second parameter? Why not just add Key System to the type parameter string (or MIME type)?
This could have gone either way, and the behavior of both existing user agents and those that implement these extensions would be the same. (Existing user agents ignore it in both cases.)
The main reason for using a separate parameter is that the Key System is not part of the MIME type (see the related question), and the type
parameter is generally used interchangeably with the MIME type.
Separating the Key System from the MIME type should avoid confusion.
The downside is that the same type
string cannot be used for both canPlayType()
and the source element's type
attribute.
Instead, the Key System is passed as a second parameter to canPlayType()
and as a separate attribute to the source element.
Errors that occur during the synchronous portion of the algorithms will be thrown.
For asynchronous portions (i.e. when a task is scheduled), a MediaKeyErrorEvent
will be fired.
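The two error paths can be handled together in application code. The wrapper below is a sketch only: `requestKeyWithErrorHandling` and the `report` callback are hypothetical names, and it assumes the keyerror event can be observed with addEventListener() in addition to the onkeyerror attribute used in the examples.

```javascript
// Hypothetical wrapper; report(kind, detail) is an application-supplied callback.
function requestKeyWithErrorHandling(video, keySystem, initData, report) {
  // Asynchronous failures arrive later as a keyerror event.
  video.addEventListener("keyerror", function (event) {
    report("asynchronous", event.errorCode);
  });
  try {
    // Synchronous failures are thrown by the call itself.
    video.generateKeyRequest(keySystem, initData);
  } catch (e) {
    report("synchronous", e);
  }
}
```

Because a listener is added on every call, an application that issues many requests would register the listener once instead.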
In many cases (especially the direction the content providers and standards are moving), the stream is not specific to any one Key System or provider. Multiple Key Systems could be used to decrypt the same generic stream. Thus, the Key System is not information about the file and should not be part of the MIME type.
One could argue that the encryption algorithm (e.g. AES-128) and configuration should be in the MIME type. That is not required for this proposal, so it is not addressed here.
While many use cases could be implemented without an additional event (by requiring the app to provide all the information up front), some use cases may be better handled by an event.
The keySystem
attribute ensures that the application knows which CDM caused the event so it can know how to handle the event. While the application could probably know or discover this in other ways, this makes it simple for the application.
MediaError code?
Without a new error code (MEDIA_ERR_ENCRYPTED), it is not possible for user agents to clearly indicate to an application that playback failed because the content was encrypted, and user agents would likely need to fire a MEDIA_ERR_DECODE or MEDIA_ERR_SRC_NOT_SUPPORTED error, which would be confusing.
MediaError break existing applications?
Applications that are not aware of the new error code (MEDIA_ERR_ENCRYPTED) may not correctly handle it, but they should still be able to detect that an error has occurred because a) an error event is fired and b) media.error is not null.
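To illustrate the point, the sketch below shows an error handler that still reports a failure when it meets a code it does not recognize. `describeMediaError` is a hypothetical helper. It assumes the MediaError constants can be read from the error object itself, so MEDIA_ERR_ENCRYPTED is simply undefined in user agents that do not implement these extensions.

```javascript
// Hypothetical helper: summarizes video.error for logging or display.
function describeMediaError(video) {
  var error = video.error;
  if (!error)
    return "no error";
  // MEDIA_ERR_ENCRYPTED is undefined where these extensions are not implemented.
  if (error.MEDIA_ERR_ENCRYPTED !== undefined && error.code === error.MEDIA_ERR_ENCRYPTED)
    return "encrypted content could not be played";
  if (error.code === error.MEDIA_ERR_DECODE)
    return "decode error";
  // Unknown codes are still reported because video.error is not null.
  return "unrecognized error code " + error.code;
}
```

An application written before the new code existed would take the final branch, which is still enough to tell the user that playback failed.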
MediaKeyError) and event (MediaKeyErrorEvent)?
While key/license exchange errors are fatal to the exchange session, most are not fatal to playback. This is especially true if the media element already has a key for the current (and future) frames or, for example, the exchange was for a different stream in an adaptive streaming scenario. The separation allows the media element to continue playback while the application attempts to resolve the exchange problem or until the requested key/license is actually needed.
needkey event from encountering a potentially encrypted stream is not received before encountering an encrypted block?
The Encrypted Block Encountered algorithm will proceed as normal.
If no appropriate key has been provided, a second needkey
event will be fired and decoding will stop.
needkey event with the same attributes is fired for both Encrypted Block Encountered and Potentially Encrypted Stream Encountered. How can an application distinguish between the two?
The same event was used intentionally to reduce the complexity of applications. Ideally, they would not need to know.
HTMLMediaElement?
(Expanding on the question, this relates to the new methods, including generateKeyRequest()
and addKey()
, that modify state and does not apply to canPlayType()
, which is explicitly intended to be called with multiple Key System strings.
For example, what if generateKeyRequest()
is called with one Key System then addKey()
is called with another; or if addKey()
is called twice with two different Key Systems.)
If a load occurs between calls with different Key Systems, then there is no problem.
Otherwise, the calls will be treated separately.
generateKeyRequest()
will start a new session with a new Session ID.
addKey()
will behave as normal unless the sessionId parameter is not null and is unrecognized for the specified keySystem parameter.
addKey()?
Replace it, updating the ordering to reflect that this key ID was most recently added.
In other words, simply replacing the existing key data is not sufficient.
The exact algorithm is covered in addKey()
.
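The difference between replacing a key in place and replacing it while updating the ordering can be sketched with a small cache. This is an illustration under stated assumptions, not the addKey() algorithm itself: `KeyCache` and its methods are hypothetical, and key IDs are modeled as strings for simplicity.

```javascript
// Hypothetical key cache that keeps entries ordered from oldest to newest.
function KeyCache() {
  this.entries = [];
}

// Adds a key; an existing entry for the same key ID is removed first so that
// the new entry becomes the most recently added one.
KeyCache.prototype.add = function (keyId, key) {
  for (var i = 0; i < this.entries.length; i++) {
    if (this.entries[i].keyId === keyId) {
      this.entries.splice(i, 1);
      break;
    }
  }
  this.entries.push({ keyId: keyId, key: key });
};

// Returns the key IDs from oldest to most recently added.
KeyCache.prototype.ids = function () {
  return this.entries.map(function (entry) { return entry.keyId; });
};
```

Adding "a", then "b", then "a" again leaves the order as "b", "a", whereas overwriting the stored key data in place would have left "a" first.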
Containers and codecs are not specified. A user agent may support decryption of whichever container and codec combination(s) it wishes.
If a user agent supports decryption of a container/codec combination (as reported by canPlayType()
), it must also support Simple Decryption of that combination.
The application must use addKey()
to indicate the stream is encrypted and provide the key before decoding starts.
This is ideal, but the API would also support the application sending the Initialization Data or ID directly to the server or providing it to the CDM via generateKeyRequest()
.
The application will need to use some other mechanism to select the appropriate key for the content. The user agent will only be able to use one key at a time. Key rotation will be much more complex or impossible.
Yes, though you may want to consider the complexity and performance drawbacks. For the best user experience, you will want to provide keys for the streams to the user agent before the switch.
This depends on the container/codec being used. This proposal should support all cases, including entirely encrypted streams, individual frames encrypted separately, groups of frames encrypted, and portions of frames encrypted. If not all blocks or frames are encrypted, the user agent should be able to easily detect this, either based on an indication in the container or the block/frame.
No, subject to container/codec limitations.
The cipher and parameters should be implicit in or specified by the container. If some are optional, the application must know what is supported by the CDM.
As in the above question, these are either implicit in or specified by the container. User agents must support any default or baseline ciphers and parameters in the container specification. Practically, user agents should support all ciphers and parameters commonly used with the container.
No. Protecting the content key would require that the browser's media stack have some secret that cannot easily be obtained. This is the type of thing DRM solutions provide. Establishing a standard mechanism to support this is beyond the scope of HTML5 standards and should be deferred to specific user agent solutions. In addition, it is not something that fully open source browsers could natively support.
Content protected using this proposal without a content protection provider is still more secure and a higher barrier than providing an unencrypted file over HTTP or HTTPS. We would also argue that it is no less secure than encrypted HLS. For long streams, key rotation can provide additional protection.
It is also possible to extend the proposed specification in the future to support a more robust basic case without changing the API.
Yes. The application will query the user agent's capabilities and select the Key System to use.
Yes, this proposal naturally supports such protection.
Yes, a user agent could use platform-specific capabilities to protect the rendering path.
This section and its subsections are non-normative.
This section describes some open issues on which comments are requested.
It has been suggested that only a single key manager attribute be added to the HTMLMediaElement itself in order to improve encapsulation. For example:
partial interface HTMLMediaElement {
  attribute MediaKeyManager keymanager;
};

interface MediaKeyManager {
  void generateKeyRequest(in DOMString keySystem, in Uint8Array? initData);
  void addKey(in DOMString keySystem, in Uint8Array key, in Uint8Array? initData, in DOMString? sessionId);
  void cancelKeyRequest(in DOMString keySystem, in DOMString? sessionId);
};
A variant of the API with the same functionality has been suggested in which key exchange 'sessions' are explicitly represented as objects.
The methods used to supply a key/license or cancel the session become methods on this object, not the HTMLMediaElement
itself.
partial interface HTMLMediaElement {
  MediaKeySession generateKeyRequest(in DOMString keySystem, in Uint8Array? initData);
};

interface MediaKeySession : EventTarget {
  readonly attribute DOMString keySystem;
  readonly attribute DOMString? sessionId;
  void addKey(in Uint8Array key);
  void cancel();
};
The following event would be fired at the MediaKeySession
when a message is ready to be sent.
[Constructor(DOMString type, optional MediaKeyMessageEventInit eventInitDict)]
interface MediaKeyMessageEvent : Event {
  readonly attribute Uint8Array message;
  readonly attribute DOMString? defaultUrl;
};

dictionary MediaKeyMessageEventInit : EventInit {
  Uint8Array message;
  DOMString? defaultUrl;
};
Note that in the MediaKeySession
interface, sessionId
is guaranteed to be initialized only after the first MediaKeyMessageEvent
.
The following event would be fired at the MediaKeySession
when the transaction is complete. (It could be replaced by a simple event.)
[Constructor(DOMString type)] interface MediaKeyCompleteEvent : Event { };
Finally, the following event would be fired at MediaKeySession
if generateKeyRequest()
or addKey()
results in an error.
[Constructor(DOMString type, optional MediaKeyErrorEventInit eventInitDict)]
interface MediaKeyErrorEvent : Event {
  readonly attribute MediaKeyError errorCode;
  readonly attribute unsigned short systemCode;
};

dictionary MediaKeyErrorEventInit : EventInit {
  MediaKeyError errorCode;
  unsigned short systemCode;
};
The current API design allows for multiple parallel key requests to be in flight. Each call to generateKeyRequest()
begins a message exchange resulting ultimately in a keyadded
or keyerror
event.
The first keymessage
event may contain a Session ID identifying the session.
This session ID is later used to enable correlation between messages conveyed in keymessage
and responses added in addKey
.
However, the current design does not support correlation between specific generateKeyRequest()
calls (and the needkey
event that might have triggered it) and subsequent sessions.
If a page knows it needs two keys, it can call generateKeyRequest()
twice but there is no way to know which keymessage
or keyerror
results from each call.
This might be particularly important for the error case. Modifications to the API such as those described in Object-Oriented API Design could address this issue.
HTML5 defines a MediaController
that is used to coordinate playback of multiple media elements.
The current proposal does not support a scenario where a single key is required for multiple media elements coordinated through a single MediaController
.
One way to solve this would be to create a new interface that provides the Media Element Extensions and then provide an instance of this interface on both the HTMLMediaElement
and on the MediaController
interfaces.
The changes outlined in section Object-Oriented API Design might be modified to support this approach.
It is possible that a different key may be needed for a given stream after a key request session has been completed. How this should be handled is not explicitly described; it may be up to the Key System and/or application, but that might lead to confusion and inconsistencies.
One option is to fire a keymessage
to be sent to the server, which would return a new license to provide via addKey()
.
The same Session ID would be used because generateKeyRequest()
is not called again.
Note that this means a keymessage
event can occur after a keyadded
event for the same session.
Another option is to fire a needkey
event and follow the same steps as for the first key.
In this case, the application should call generateKeyRequest()
to generate the message.
This would result in the generation of a new Session ID, which is consistent with the first key.
If we select the first option, MediaKeyNeededEvent
, the type of the needkey
event can be simplified because it would never be called with a known keySystem
or sessionId
.
If we select the second option, keySystem
should almost certainly be retained on MediaKeyNeededEvent
and sessionId
probably should be retained.
This decision should account for other use cases, such as heartbeat.
For heartbeat and any other CDM-originated message that isn't requesting a new key, it probably makes sense to use the same Session ID and provide the request directly via a keymessage
event.
This is essentially the first option above.
Selecting the second option for multiple keys does not necessarily mean that heartbeat cannot work differently.
Version | Comment |
---|---|
0.1 | Initial Proposal |