Media Source Extensions

The working group maintains a list of all bug reports that the editors have not yet tried to address; there may also be open bugs in the previous bug tracker. This draft highlights some of the pending issues that are still to be discussed in the working group. No decision has been taken on the outcome of these issues including whether they are valid.

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should join the mailing list mentioned below and take part in the discussions.

For this specification to exit the Candidate Recommendation stage, two independent implementations as detailed in the CR Exit Criteria (Public Permissive version 3) document will be required to pass each test in the MSE test suite to be developed by the HTML Media Extensions WG.

This specification extends HTMLMediaElement [[!HTML51]] to allow JavaScript to generate media streams for playback. Allowing JavaScript to generate streams facilitates a variety of use cases like adaptive streaming and time shifting live streams.

If you wish to make comments or file bugs regarding this document in a manner that is tracked by the editors, please submit them via our public bug database.

Goals

This specification was designed with the following goals in mind:

Allow JavaScript to construct media streams independent of how the media is fetched.
Define a splicing and buffering model that facilitates use cases like adaptive streaming, ad-insertion, time-shifting, and video editing.
Minimize the need for media parsing in JavaScript.
Leverage the browser cache as much as possible.
Provide requirements for byte stream format specifications.
Not require support for any particular media format or codec.

This specification defines:

Normative behavior for user agents to enable interoperability between user agents and web applications when processing media data.
Normative requirements to enable other specifications to define media formats to be used within this specification.

Definitions

Active Track Buffers

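The track buffers that provide coded frames for the enabled audioTracks, the selected videoTracks, and the "showing" or "hidden" textTracks. All these tracks are associated with SourceBuffer objects in the activeSourceBuffers list.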

Append Window

A presentation timestamp range used to filter out coded frames while appending. The append window represents a single continuous time range with a single start time and end time. Coded frames with a presentation timestamp within this range are allowed to be appended to the SourceBuffer while coded frames outside this range are filtered out. The append window start and end times are controlled by the appendWindowStart and appendWindowEnd attributes respectively.

Coded Frame

A unit of media data that has a presentation timestamp, a decode timestamp, and a coded frame duration.

Coded Frame Duration

The duration of a coded frame. For video and text, the duration indicates how long the video frame or text SHOULD be displayed. For audio, the duration represents the sum of all the samples contained within the coded frame. For example, if an audio frame contained 441 samples at 44100 Hz, the frame duration would be 10 milliseconds.
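
The audio example above reduces to one division. A minimal sketch, assuming a hypothetical helper named audioFrameDurationSeconds that is not defined by this specification:

```typescript
// Hypothetical helper, not defined by this specification.
// Returns the duration in seconds of an audio coded frame that holds
// `sampleCount` samples at a sampling rate of `sampleRateHz`.
function audioFrameDurationSeconds(sampleCount: number, sampleRateHz: number): number {
  if (!(sampleRateHz > 0)) {
    throw new RangeError("sampleRateHz must be a positive number");
  }
  if (!(sampleCount >= 0)) {
    throw new RangeError("sampleCount must be a non-negative number");
  }
  return sampleCount / sampleRateHz;
}

// The example from the text: 441 samples at 44100 Hz.
const exampleAudioFrameSeconds = audioFrameDurationSeconds(441, 44100);
```

Multiplying exampleAudioFrameSeconds by 1000 gives the 10 millisecond figure quoted in the definition.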

Coded Frame End Timestamp

The sum of a coded frame presentation timestamp and its coded frame duration. It represents the presentation timestamp that immediately follows the coded frame.

Coded Frame Group

A group of coded frames that are adjacent and have monotonically increasing decode timestamps without any gaps. Discontinuities detected by the coded frame processing algorithm and abort() calls trigger the start of a new coded frame group.

Decode Timestamp

The decode timestamp indicates the latest time at which the frame needs to be decoded assuming instantaneous decoding and rendering of this and any dependent frames (this is equal to the presentation timestamp of the earliest frame, in presentation order, that is dependent on this frame). If frames can be decoded out of presentation order, then the decode timestamp MUST be present in or derivable from the byte stream. The user agent MUST run the append error algorithm if this is not the case. If frames cannot be decoded out of presentation order and a decode timestamp is not present in the byte stream, then the decode timestamp is equal to the presentation timestamp.

Initialization Segment

A sequence of bytes that contain all of the initialization information required to decode a sequence of media segments. This includes codec initialization data, Track ID mappings for multiplexed segments, and timestamp offsets (e.g., edit lists).

The byte stream format specifications in the byte stream format registry [[MSE-REGISTRY]] contain format specific examples.

Media Segment

A sequence of bytes that contain packetized and timestamped media data for a portion of the media timeline. Media segments are always associated with the most recently appended initialization segment.

The byte stream format specifications in the byte stream format registry [[MSE-REGISTRY]] contain format specific examples.

MediaSource object URL

A MediaSource object URL is a unique Blob URI [[!FILE-API]] created by createObjectURL(). It is used to attach a MediaSource object to an HTMLMediaElement.

These URLs are the same as a Blob URI, except that anything in the definition of that feature that refers to File and Blob objects is hereby extended to also apply to MediaSource objects.

The origin of the MediaSource object URL is the relevant settings object of this during the call to createObjectURL().

For example, the origin of the MediaSource object URL affects the way that the media element is consumed by canvas.

Parent Media Source

The parent media source of a SourceBuffer object is the MediaSource object that created it.

Presentation Start Time

The presentation start time is the earliest time point in the presentation and specifies the initial playback position and earliest possible position. All presentations created using this specification have a presentation start time of 0.

For the purposes of determining if HTMLMediaElement.buffered contains a TimeRange that includes the current playback position, implementations MAY choose to allow a current playback position at or after presentation start time and before the first TimeRange to play the first TimeRange if that TimeRange starts within a reasonably short time, like 1 second, after presentation start time. This allowance accommodates the reality that muxed streams commonly do not begin all tracks precisely at presentation start time. Implementations MUST report the actual buffered range, regardless of this allowance.

Presentation Interval

The presentation interval of a coded frame is the time interval from its presentation timestamp to the presentation timestamp plus the coded frame's duration. For example, if a coded frame has a presentation timestamp of 10 seconds and a coded frame duration of 100 milliseconds, then the presentation interval would be [10-10.1). Note that the start of the range is inclusive, but the end of the range is exclusive.
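
The half-open interval can be sketched as a membership test. The type and function names below (FrameTiming, frameEndTimestamp, inPresentationInterval) are illustrative assumptions, not names from this specification; all times are in seconds.

```typescript
// Illustrative timing record for one coded frame (times in seconds).
interface FrameTiming {
  presentationTimestamp: number;
  duration: number;
}

// Coded frame end timestamp: presentation timestamp plus duration.
function frameEndTimestamp(frame: FrameTiming): number {
  return frame.presentationTimestamp + frame.duration;
}

// True when `time` lies in [presentationTimestamp, end timestamp):
// the start is inclusive and the end is exclusive.
function inPresentationInterval(frame: FrameTiming, time: number): boolean {
  return time >= frame.presentationTimestamp && time < frameEndTimestamp(frame);
}

// The example from the text: a 100 millisecond frame starting at 10 seconds.
const exampleFrameTiming: FrameTiming = { presentationTimestamp: 10, duration: 0.1 };
```

With exampleFrameTiming, a time of 10 is inside the interval, while the frame's own end timestamp is not, because it belongs to whatever frame follows.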

Presentation Order

The order that coded frames are rendered in the presentation. The presentation order is achieved by ordering coded frames in monotonically increasing order by their presentation timestamps.

Presentation Timestamp

A reference to a specific time in the presentation. The presentation timestamp in a coded frame indicates when the frame SHOULD be rendered.

Random Access Point

A position in a media segment where decoding and continuous playback can begin without relying on any previous data in the segment. For video this tends to be the location of I-frames. In the case of audio, most audio frames can be treated as a random access point. Since video tracks tend to have a more sparse distribution of random access points, the locations of these points are usually treated as the random access points for multiplexed streams.

SourceBuffer byte stream format specification

The specific byte stream format specification that describes the format of the byte stream accepted by a SourceBuffer instance. The byte stream format specification, for a SourceBuffer object, is selected based on the type passed to the addSourceBuffer() call that created the object.

SourceBuffer configuration

A specific set of tracks distributed across one or more SourceBuffer objects owned by a single MediaSource instance.

Implementations MUST support at least 1 MediaSource object with the following configurations:

A single SourceBuffer with 1 audio track and/or 1 video track.
Two SourceBuffers with one handling a single audio track and the other handling a single video track.

MediaSource objects MUST support each of the configurations above, but they are only required to support one configuration at a time. Supporting multiple configurations at once or additional configurations is a quality of implementation issue.

Track Description

A byte stream format specific structure that provides the Track ID, codec configuration, and other metadata for a single track. Each track description inside a single initialization segment has a unique Track ID. The user agent MUST run the append error algorithm if the Track ID is not unique within the initialization segment.

Track ID

A Track ID is a byte stream format specific identifier that marks sections of the byte stream as being part of a specific track. The Track ID in a track description identifies which sections of a media segment belong to that track.

MediaSource Object

The MediaSource object represents a source of media data for an HTMLMediaElement. It keeps track of the readyState for this source as well as a list of SourceBuffer objects that can be used to add media data to the presentation. MediaSource objects are created by the web application and then attached to an HTMLMediaElement. The application uses the SourceBuffer objects in sourceBuffers to add media data to this source. The HTMLMediaElement fetches this media data from the MediaSource object when it is needed during playback.

Each MediaSource object has a live seekable range variable that stores a normalized TimeRanges object. This variable is initialized to an empty TimeRanges object when the MediaSource object is created, is maintained by setLiveSeekableRange() and clearLiveSeekableRange(), and is used in HTMLMediaElement Extensions to modify HTMLMediaElement.seekable behavior.

enum ReadyState {
    "closed",
    "open",
    "ended"
};

Enumeration description
`closed`	Indicates the source is not currently attached to a media element.
`open`	The source has been opened by a media element and is ready for data to be appended to the SourceBuffer objects in sourceBuffers.
`ended`	The source is still attached to a media element, but endOfStream() has been called.
enum EndOfStreamError {
    "network",
    "decode"
};

Enumeration description
`network`	Terminates playback and signals that a network error has occurred. JavaScript applications SHOULD use this status code to terminate playback with a network error. For example, if a network error occurs while fetching media data.
`decode`	Terminates playback and signals that a decoding error has occurred. JavaScript applications SHOULD use this status code to terminate playback with a decode error. For example, if a parsing error occurs while processing out-of-band media data.

[Constructor]
interface MediaSource : EventTarget {
    readonly        attribute SourceBufferList    sourceBuffers;
    readonly        attribute SourceBufferList    activeSourceBuffers;
    readonly        attribute ReadyState          readyState;
                    attribute unrestricted double duration;
                    attribute EventHandler        onsourceopen;
                    attribute EventHandler        onsourceended;
                    attribute EventHandler        onsourceclose;
    SourceBuffer   addSourceBuffer (DOMString type);
    void           removeSourceBuffer (SourceBuffer sourceBuffer);
    void           endOfStream (optional EndOfStreamError error);
    void           setLiveSeekableRange (double start, double end);
    void           clearLiveSeekableRange ();
    static boolean isTypeSupported (DOMString type);
};

Attributes

sourceBuffers of type SourceBufferList, readonly

Contains the list of SourceBuffer objects associated with this MediaSource. When readyState equals "closed" this list will be empty. Once readyState transitions to "open", SourceBuffer objects can be added to this list by using addSourceBuffer().

activeSourceBuffers of type SourceBufferList, readonly

Contains the subset of sourceBuffers that are providing the selected video track, the enabled audio track(s), and the "showing" or "hidden" text track(s).

SourceBuffer objects in this list MUST appear in the same order as they appear in the sourceBuffers attribute; e.g., if only sourceBuffers[0] and sourceBuffers[3] are in activeSourceBuffers, then activeSourceBuffers[0] MUST equal sourceBuffers[0] and activeSourceBuffers[1] MUST equal sourceBuffers[3].
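
The ordering requirement amounts to filtering sourceBuffers rather than appending buffers in the order they became active. A minimal sketch, assuming a hypothetical helper named orderActiveSourceBuffers and plain strings standing in for SourceBuffer objects:

```typescript
// Hypothetical helper, not defined by this specification.
// Keeps the entries of `sourceBuffers` that are also in `active`,
// preserving the order in which they appear in `sourceBuffers`.
function orderActiveSourceBuffers<T>(sourceBuffers: readonly T[], active: readonly T[]): T[] {
  return sourceBuffers.filter((buffer) => active.indexOf(buffer) !== -1);
}

// Strings stand in for SourceBuffer objects. "sb3" became active before
// "sb0", yet "sb0" still comes first in the resulting list.
const modelSourceBufferList = ["sb0", "sb1", "sb2", "sb3"];
const modelActiveList = orderActiveSourceBuffers(modelSourceBufferList, ["sb3", "sb0"]);
```

Here modelActiveList[0] is modelSourceBufferList[0] and modelActiveList[1] is modelSourceBufferList[3], matching the example in the text.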

The Changes to selected/enabled track state section describes how this attribute gets updated.

readyState of type ReadyState, readonly

Indicates the current state of the MediaSource object. When the MediaSource is created, readyState MUST be set to "closed".

duration of type unrestricted double

Allows the web application to set the presentation duration. The duration is initially set to NaN when the MediaSource object is created.

On getting, run the following steps:

If the readyState attribute is "closed" then return NaN and abort these steps.
Return the current value of the attribute.

On setting, run the following steps:

If the value being set is negative or NaN then throw a TypeError exception and abort these steps.
If the readyState attribute is not "open" then throw an InvalidStateError exception and abort these steps.
If the updating attribute equals true on any SourceBuffer in sourceBuffers, then throw an InvalidStateError exception and abort these steps.
Run the duration change algorithm with new duration set to the value being assigned to this attribute.
The duration change algorithm will adjust new duration higher if there is any currently buffered coded frame with a higher end time.

appendBuffer() and endOfStream() can update the duration under certain circumstances.

onsourceopen of type EventHandler

The event handler for the sourceopen event.

onsourceended of type EventHandler

The event handler for the sourceended event.

onsourceclose of type EventHandler

The event handler for the sourceclose event.

Methods

addSourceBuffer

Adds a new SourceBuffer to sourceBuffers.

If type is an empty string then throw a TypeError exception and abort these steps.
If type contains a MIME type that is not supported or contains a MIME type that is not supported with the types specified for the other SourceBuffer objects in sourceBuffers, then throw a NotSupportedError exception and abort these steps.
If the user agent can't handle any more SourceBuffer objects or if creating a SourceBuffer based on type would result in an unsupported SourceBuffer configuration, then throw a QuotaExceededError exception and abort these steps.
For example, a user agent MAY throw a QuotaExceededError exception if the media element has reached the HAVE_METADATA readyState. This can occur if the user agent's media engine does not support adding more tracks during playback.
If the readyState attribute is not in the "open" state then throw an InvalidStateError exception and abort these steps.
Create a new SourceBuffer object and associated resources.
Set the generate timestamps flag on the new object to the value in the "Generate Timestamps Flag" column of the byte stream format registry [[MSE-REGISTRY]] entry that is associated with type.
If the generate timestamps flag equals true:

Set the mode attribute on the new object to "sequence".

Otherwise:

Set the mode attribute on the new object to "segments".
Add the new object to sourceBuffers and queue a task to fire a simple event named addsourcebuffer at sourceBuffers.
Return the new object.

Parameter	Type	Nullable	Optional	Description
type	`DOMString`	✘	✘

Return type: SourceBuffer

removeSourceBuffer

Removes a SourceBuffer from sourceBuffers.

If sourceBuffer specifies an object that is not in sourceBuffers then throw a NotFoundError exception and abort these steps.
If the sourceBuffer.updating attribute equals true, then run the following steps:
1. Abort the buffer append algorithm if it is running.
2. Set the sourceBuffer.updating attribute to false.
3. Queue a task to fire a simple event named abort at sourceBuffer.
4. Queue a task to fire a simple event named updateend at sourceBuffer.
Let SourceBuffer audioTracks list equal the AudioTrackList object returned by sourceBuffer.audioTracks.
If the SourceBuffer audioTracks list is not empty, then run the following steps:
1. Let HTMLMediaElement audioTracks list equal the AudioTrackList object returned by the audioTracks attribute on the HTMLMediaElement.
2. For each AudioTrack object in the SourceBuffer audioTracks list, run the following steps:
  1. Set the sourceBuffer attribute on the AudioTrack object to null.
  2. Remove the AudioTrack object from the HTMLMediaElement audioTracks list.
  3. Remove the AudioTrack object from the SourceBuffer audioTracks list.
Let SourceBuffer videoTracks list equal the VideoTrackList object returned by sourceBuffer.videoTracks.
If the SourceBuffer videoTracks list is not empty, then run the following steps:
1. Let HTMLMediaElement videoTracks list equal the VideoTrackList object returned by the videoTracks attribute on the HTMLMediaElement.
2. For each VideoTrack object in the SourceBuffer videoTracks list, run the following steps:
  1. Set the sourceBuffer attribute on the VideoTrack object to null.
  2. Remove the VideoTrack object from the HTMLMediaElement videoTracks list.
  3. Remove the VideoTrack object from the SourceBuffer videoTracks list.
Let SourceBuffer textTracks list equal the TextTrackList object returned by sourceBuffer.textTracks.
If the SourceBuffer textTracks list is not empty, then run the following steps:
1. Let HTMLMediaElement textTracks list equal the TextTrackList object returned by the textTracks attribute on the HTMLMediaElement.
2. For each TextTrack object in the SourceBuffer textTracks list, run the following steps:
  1. Set the sourceBuffer attribute on the TextTrack object to null.
  2. Remove the TextTrack object from the HTMLMediaElement textTracks list.
  3. Remove the TextTrack object from the SourceBuffer textTracks list.
If sourceBuffer is in activeSourceBuffers, then remove sourceBuffer from activeSourceBuffers and queue a task to fire a simple event named removesourcebuffer at the SourceBufferList returned by activeSourceBuffers.
Remove sourceBuffer from sourceBuffers and queue a task to fire a simple event named removesourcebuffer at the SourceBufferList returned by sourceBuffers.
Destroy all resources for sourceBuffer.

Parameter	Type	Nullable	Optional	Description
sourceBuffer	`SourceBuffer`	✘	✘

Return type: void

endOfStream

Signals the end of the stream.

If the readyState attribute is not in the "open" state then throw an InvalidStateError exception and abort these steps.
If the updating attribute equals true on any SourceBuffer in sourceBuffers, then throw an InvalidStateError exception and abort these steps.
Run the end of stream algorithm with the error parameter set to error.

Parameter	Type	Nullable	Optional	Description
error	`EndOfStreamError`	✘	✔

Return type: void

setLiveSeekableRange

Updates the live seekable range variable used in HTMLMediaElement Extensions to modify HTMLMediaElement.seekable behavior.

If the readyState attribute is not "open" then throw an InvalidStateError exception and abort these steps.
If start is negative or greater than end, then throw a TypeError exception and abort these steps.
Set live seekable range to be a new normalized TimeRanges object containing a single range whose start position is start and end position is end.

Parameter	Type	Nullable	Optional	Description
start	`double`	✘	✘	The start of the range, in seconds measured from presentation start time. While set, and if duration equals positive Infinity, HTMLMediaElement.seekable will return a non-empty TimeRanges object with a lowest range start timestamp no greater than `start`.
end	`double`	✘	✘	The end of the range, in seconds measured from presentation start time. While set, and if duration equals positive Infinity, HTMLMediaElement.seekable will return a non-empty TimeRanges object with a highest range end timestamp no less than `end`.

Return type: void

clearLiveSeekableRange

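Updates the live seekable range variable used in HTMLMediaElement Extensions to modify HTMLMediaElement.seekable behavior.

If the readyState attribute is not "open" then throw an InvalidStateError exception and abort these steps.
If live seekable range contains a range, then set live seekable range to be a new empty TimeRanges object.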

No parameters.

Return type: void
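
The two live seekable range methods can be modelled without a browser. The sketch below is an assumption-laden illustration, not the user agent's implementation: LiveSeekableRangeModel, LiveModelRange and liveModelError are hypothetical names, a plain array stands in for the normalized TimeRanges object, and the exception names (InvalidStateError, TypeError) are assumed from the method steps.

```typescript
type LiveModelState = "closed" | "open" | "ended";

interface LiveModelRange {
  start: number;
  end: number;
}

// Builds an Error whose `name` mimics a DOMException name (illustrative only).
function liveModelError(errorName: string, message: string): Error {
  const err = new Error(message);
  err.name = errorName;
  return err;
}

class LiveSeekableRangeModel {
  readyState: LiveModelState = "closed";
  // Stand-in for the live seekable range variable: empty, or one range.
  liveSeekableRange: LiveModelRange[] = [];

  setLiveSeekableRange(start: number, end: number): void {
    // Web IDL `double` arguments reject NaN and infinities with a TypeError
    // before the method steps run.
    if (!isFinite(start) || !isFinite(end)) {
      throw new TypeError("start and end must be finite numbers");
    }
    if (this.readyState !== "open") {
      throw liveModelError("InvalidStateError", "readyState is not open");
    }
    if (start < 0 || start > end) {
      throw new TypeError("start is negative or greater than end");
    }
    this.liveSeekableRange = [{ start, end }];
  }

  clearLiveSeekableRange(): void {
    if (this.readyState !== "open") {
      throw liveModelError("InvalidStateError", "readyState is not open");
    }
    if (this.liveSeekableRange.length > 0) {
      this.liveSeekableRange = [];
    }
  }
}
```

A live player would typically call setLiveSeekableRange() again each time its seekable window moves, since every call replaces the previous range.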

isTypeSupported, static

Check to see whether the MediaSource is capable of creating SourceBuffer objects for the specified MIME type.

If type is an empty string, then return false.
If type does not contain a valid MIME type string, then return false.
If type contains a media type or media subtype that the MediaSource does not support, then return false.
If type contains a codec that the MediaSource does not support, then return false.
If the MediaSource does not support the specified combination of media type, media subtype, and codecs then return false.
Return true.

If true is returned from this method, it only indicates that the MediaSource implementation is capable of creating SourceBuffer objects for the specified MIME type. An addSourceBuffer() call SHOULD still fail if sufficient resources are not available to support the addition of a new SourceBuffer.

This method returning true implies that HTMLMediaElement.canPlayType() will return "maybe" or "probably" since it does not make sense for a MediaSource to support a type the HTMLMediaElement knows it cannot play.

Parameter	Type	Nullable	Optional	Description
type	`DOMString`	✘	✘

Return type: boolean

Event Summary

Event name	Interface	Dispatched when...
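sourceopen	`Event`	readyState transitions from "closed" to "open" or from "ended" to "open".
sourceended	`Event`	readyState transitions from "open" to "ended".
sourceclose	`Event`	readyState transitions from "open" to "closed" or "ended" to "closed".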
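
The event summary can be read as a mapping from a readyState transition to an event name: "closed" to "open" and "ended" to "open" dispatch sourceopen, "open" to "ended" dispatches sourceended, and "open" or "ended" to "closed" dispatches sourceclose. A minimal sketch, assuming a hypothetical function named eventForReadyStateTransition:

```typescript
type SourceState = "closed" | "open" | "ended";
type SourceEventName = "sourceopen" | "sourceended" | "sourceclose";

// Hypothetical helper, not defined by this specification. Returns the event
// dispatched for a readyState transition, or null when the transition is not
// one of those listed in the event summary.
function eventForReadyStateTransition(from: SourceState, to: SourceState): SourceEventName | null {
  if (from === to) {
    return null; // no transition, no event
  }
  if (to === "open") {
    return "sourceopen"; // "closed" -> "open" or "ended" -> "open"
  }
  if (to === "ended") {
    return from === "open" ? "sourceended" : null; // only "open" -> "ended"
  }
  return "sourceclose"; // "open" -> "closed" or "ended" -> "closed"
}
```

Note that "closed" to "ended" is not a listed transition, so the sketch returns null for it rather than guessing at an event.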

Algorithms

Attaching to a media element

A MediaSource object can be attached to a media element by assigning a MediaSource object URL to the media element src attribute or the src attribute of a <source> inside a media element. A MediaSource object URL is created by passing a MediaSource object to createObjectURL().

If the resource fetch algorithm was invoked with a media provider object that is a MediaSource object or a URL record whose object is a MediaSource object, then let mode be local, skip the first step in the resource fetch algorithm (which may otherwise set mode to remote) and add the steps and clarifications below to the "Otherwise (mode is local)" section of the resource fetch algorithm.

The resource fetch algorithm's first step is expected to eventually align with selecting local mode for URL records whose objects are media provider objects. The intent is that if the HTMLMediaElement's src attribute or selected child <source>'s src attribute is a blob: URL matching a MediaSource object URL when the respective src attribute was last changed, then that MediaSource object is used as the media provider object and current media resource in the local mode logic in the resource fetch algorithm. This also means that the remote mode logic that includes observance of any preload attribute is skipped when a MediaSource object is attached. Even with that eventual change to [[HTML51]], the execution of the following steps at the beginning of the local mode logic is still required when the current media resource is a MediaSource object.

Relative to the action which triggered the media element's resource selection algorithm, these steps are asynchronous. The resource fetch algorithm is run after the task that invoked the resource selection algorithm is allowed to continue and a stable state is reached. Implementations may delay the steps in the "Otherwise" clause, below, until the MediaSource object is ready for use.

If readyState is NOT set to "closed"

Run the "If the media data cannot be fetched at all, due to network errors, causing the user agent to give up trying to fetch the resource" steps of the resource fetch algorithm's media data processing steps list.

Otherwise

Set the media element's delaying-the-load-event-flag to false.
Set the readyState attribute to "open".
Queue a task to fire a simple event named sourceopen at the MediaSource.
Continue the resource fetch algorithm by running the remaining "Otherwise (mode is local)" steps, with these clarifications:
1. Text in the resource fetch algorithm or the media data processing steps list that refers to "the download", "bytes received", or "whenever new data for the current media resource becomes available" refers to data passed in via appendBuffer().
2. References to HTTP in the resource fetch algorithm and the media data processing steps list do not apply because the HTMLMediaElement does not fetch media data via HTTP when a MediaSource is attached.

An attached MediaSource does not use the remote mode steps in the resource fetch algorithm, so the media element will not fire "suspend" events. Though future versions of this specification will likely remove "progress" and "stalled" events from a media element with an attached MediaSource, user agents conforming to this version of the specification may still fire these two events as these [[HTML51]] references changed after implementations of this specification stabilized.

Detaching from a media element

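The following steps are run in any case where the media element is going to transition to NETWORK_EMPTY and queue a task to fire a simple event named emptied at the media element. These steps SHOULD be run right before the transition.

Set the readyState attribute to "closed".
Update duration to NaN.
Remove all the SourceBuffer objects from activeSourceBuffers.
Queue a task to fire a simple event named removesourcebuffer at activeSourceBuffers.
Remove all the SourceBuffer objects from sourceBuffers.
Queue a task to fire a simple event named removesourcebuffer at sourceBuffers.
Queue a task to fire a simple event named sourceclose at the MediaSource.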

Going forward, this algorithm is intended to be externally called and run in any case where the attached MediaSource, if any, must be detached from the media element. It MAY be called on HTMLMediaElement [[HTML51]] operations like load() and resource fetch algorithm failures in addition to, or in place of, when the media element transitions to NETWORK_EMPTY. Resource fetch algorithm failures are those which abort either the resource fetch algorithm or the resource selection algorithm, with the exception that the "Final step" [[HTML51]] is not considered a failure that triggers detachment.

Seeking

Run the following steps as part of the "Wait until the user agent has established whether or not the media data for the new playback position is available, and, if it is, until it has decoded enough data to play back that position" step of the seek algorithm:

The media element looks for media segments containing the new playback position in each SourceBuffer object in activeSourceBuffers. Any position within a TimeRange in the current value of the HTMLMediaElement.buffered attribute has all necessary media segments buffered for that position.

If new playback position is not in any TimeRange of HTMLMediaElement.buffered

If the HTMLMediaElement.readyState attribute is greater than HAVE_METADATA, then set the HTMLMediaElement.readyState attribute to HAVE_METADATA.

Per HTMLMediaElement ready states [[HTML51]] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

The media element waits until an appendBuffer() call causes the coded frame processing algorithm to set the HTMLMediaElement.readyState attribute to a value greater than HAVE_METADATA.
The web application can use buffered and HTMLMediaElement.buffered to determine what the media element needs to resume playback.

Otherwise

Continue

If the readyState attribute is "ended" and the new playback position is within a TimeRange currently in HTMLMediaElement.buffered, then the seek operation must continue to completion here even if one or more currently selected or enabled track buffers has a largest range end timestamp less than new playback position. This condition should only occur due to logic in buffered when readyState is "ended".

The media element resets all decoders and initializes each one with data from the appropriate initialization segment.
The media element feeds coded frames from the active track buffers into the decoders starting with the closest random access point before the new playback position.
Resume the seek algorithm at the "Await a stable state" step.

SourceBuffer Monitoring

The following steps are periodically run during playback to make sure that all of the SourceBuffer objects in activeSourceBuffers have enough data to ensure uninterrupted playback. Changes to activeSourceBuffers also cause these steps to run because they affect the conditions that trigger state transitions.

Having enough data to ensure uninterrupted playback is an implementation specific condition where the user agent determines that it currently has enough data to play the presentation without stalling for a meaningful period of time. This condition is constantly evaluated to determine when to transition the media element into and out of the HAVE_ENOUGH_DATA ready state. These transitions indicate when the user agent believes it has enough data buffered or it needs more data respectively.

An implementation MAY choose to use bytes buffered, time buffered, the append rate, or any other metric it sees fit to determine when it has enough data. The metrics used MAY change during playback so web applications SHOULD only rely on the value of HTMLMediaElement.readyState to determine whether more data is needed or not.

When the media element needs more data, the user agent SHOULD transition it from HAVE_ENOUGH_DATA to HAVE_FUTURE_DATA early enough for a web application to be able to respond without causing an interruption in playback. For example, transitioning when the current playback position is 500ms before the end of the buffered data gives the application roughly 500ms to append more data before playback stalls.

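If the HTMLMediaElement.readyState attribute equals HAVE_NOTHING:

Abort these steps.

If HTMLMediaElement.buffered does not contain a TimeRange for the current playback position:

Set the HTMLMediaElement.readyState attribute to HAVE_METADATA.

Per HTMLMediaElement ready states [[HTML51]] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

Abort these steps.

If HTMLMediaElement.buffered contains a TimeRange that includes the current playback position and enough data to ensure uninterrupted playback:

Set the HTMLMediaElement.readyState attribute to HAVE_ENOUGH_DATA.

Per HTMLMediaElement ready states [[HTML51]] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

Playback may resume at this point if it was previously suspended by a transition to HAVE_CURRENT_DATA.
Abort these steps.

If HTMLMediaElement.buffered contains a TimeRange that includes the current playback position and some time beyond the current playback position, then run the following steps:

Set the HTMLMediaElement.readyState attribute to HAVE_FUTURE_DATA.

Per HTMLMediaElement ready states [[HTML51]] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

Playback may resume at this point if it was previously suspended by a transition to HAVE_CURRENT_DATA.
Abort these steps.

If HTMLMediaElement.buffered contains a TimeRange that ends at the current playback position and does not have a range covering the time immediately after the current position:

Set the HTMLMediaElement.readyState attribute to HAVE_CURRENT_DATA.

Per HTMLMediaElement ready states [[HTML51]] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

Playback is suspended at this point since the media element doesn't have enough data to advance the media timeline.
Abort these steps.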
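
The monitoring conditions can be sketched as a pure function from buffered ranges and playback position to a media element ready state. This is an illustrative model under stated assumptions, not user agent code: the names ElementReadyLabel, BufferedSpan and monitorReadyState are hypothetical, the ready states are assumed to be HAVE_NOTHING through HAVE_ENOUGH_DATA, a range is taken to "include" the position when start <= position < end, and the implementation specific "enough data" decision is passed in as a boolean.

```typescript
type ElementReadyLabel =
  | "HAVE_NOTHING"
  | "HAVE_METADATA"
  | "HAVE_CURRENT_DATA"
  | "HAVE_FUTURE_DATA"
  | "HAVE_ENOUGH_DATA";

// One buffered time range in seconds; ranges are assumed normalized
// (sorted, non-overlapping, non-adjacent).
interface BufferedSpan {
  start: number;
  end: number;
}

function monitorReadyState(
  current: ElementReadyLabel,
  buffered: readonly BufferedSpan[],
  position: number,
  hasEnoughData: boolean
): ElementReadyLabel {
  if (current === "HAVE_NOTHING") {
    return current; // the steps abort without changing anything
  }
  let endsAtPosition = false;
  for (let i = 0; i < buffered.length; i++) {
    const span = buffered[i];
    if (position >= span.start && position < span.end) {
      // Data exists at and beyond the position.
      return hasEnoughData ? "HAVE_ENOUGH_DATA" : "HAVE_FUTURE_DATA";
    }
    if (position === span.end) {
      endsAtPosition = true; // a range stops exactly here
    }
  }
  // Either a range ends at the position with nothing after it, or no
  // range relates to the position at all.
  return endsAtPosition ? "HAVE_CURRENT_DATA" : "HAVE_METADATA";
}
```

For instance, with a single range from 0 to 10 seconds, a position of 5 yields HAVE_ENOUGH_DATA or HAVE_FUTURE_DATA depending on the flag, a position of 10 yields HAVE_CURRENT_DATA, and a position of 12 yields HAVE_METADATA.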

Changes to selected/enabled track state

During playback activeSourceBuffers needs to be updated if the selected video track, the enabled audio track(s), or a text track mode changes. When one or more of these changes occur the following steps need to be followed.

If the selected video track changes, then run the following steps:

If the SourceBuffer associated with the previously selected video track is not associated with any other enabled tracks, run the following steps:
1. Remove the SourceBuffer from activeSourceBuffers.
2. Queue a task to fire a simple event named removesourcebuffer at activeSourceBuffers
If the SourceBuffer associated with the newly selected video track is not already in activeSourceBuffers, run the following steps:
1. Add the SourceBuffer to activeSourceBuffers.
2. Queue a task to fire a simple event named addsourcebuffer at activeSourceBuffers

If an audio track becomes disabled and the SourceBuffer associated with this track is not associated with any other enabled or selected track, then run the following steps:

Remove the SourceBuffer associated with the audio track from activeSourceBuffers
Queue a task to fire a simple event named removesourcebuffer at activeSourceBuffers

If an audio track becomes enabled and the SourceBuffer associated with this track is not already in activeSourceBuffers, then run the following steps:

Add the SourceBuffer associated with the audio track to activeSourceBuffers
Queue a task to fire a simple event named addsourcebuffer at activeSourceBuffers

If a text track mode becomes "disabled" and the SourceBuffer associated with this track is not associated with any other enabled or selected track, then run the following steps:

Remove the SourceBuffer associated with the text track from activeSourceBuffers
Queue a task to fire a simple event named removesourcebuffer at activeSourceBuffers

If a text track mode becomes "showing" or "hidden" and the SourceBuffer associated with this track is not already in activeSourceBuffers, then run the following steps:

Add the SourceBuffer associated with the text track to activeSourceBuffers
Queue a task to fire a simple event named addsourcebuffer at activeSourceBuffers

Duration change

Follow these steps when duration needs to change to a new duration.

If the current value of duration is equal to new duration, then return.
If new duration is less than the highest presentation timestamp of any buffered coded frames for all SourceBuffer objects in sourceBuffers, then throw an InvalidStateError exception and abort these steps.

Duration reductions that would truncate currently buffered media are disallowed. When truncation is necessary, use remove() to reduce the buffered range before updating duration.

Let highest end time be the largest track buffer ranges end time across all the track buffers across all SourceBuffer objects in sourceBuffers.
If new duration is less than highest end time, then
1. Update new duration to equal highest end time.
Update duration to new duration.
Update the media duration to new duration and run the HTMLMediaElement duration change algorithm.
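
The duration change steps can be sketched as a function that either rejects the new duration or returns the value that ends up stored. This is a simplified model under stated assumptions: DurationFrame, durationModelError and runDurationChange are hypothetical names, each inner array stands for the buffered coded frames of one SourceBuffer, the highest end time is approximated by the largest coded frame end timestamp, and the thrown exception is assumed to be an InvalidStateError.

```typescript
// One buffered coded frame (times in seconds).
interface DurationFrame {
  presentationTimestamp: number;
  duration: number;
}

// Builds an Error whose `name` mimics a DOMException name (illustrative only).
function durationModelError(errorName: string, message: string): Error {
  const err = new Error(message);
  err.name = errorName;
  return err;
}

// Returns the duration in effect after the algorithm runs.
function runDurationChange(
  currentDuration: number,
  requestedDuration: number,
  buffers: readonly (readonly DurationFrame[])[]
): number {
  if (currentDuration === requestedDuration) {
    return currentDuration; // nothing to do
  }
  let highestTimestamp = -Infinity;
  let highestEndTime = -Infinity;
  for (const frames of buffers) {
    for (const frame of frames) {
      highestTimestamp = Math.max(highestTimestamp, frame.presentationTimestamp);
      highestEndTime = Math.max(highestEndTime, frame.presentationTimestamp + frame.duration);
    }
  }
  // Truncating buffered media is disallowed.
  if (requestedDuration < highestTimestamp) {
    throw durationModelError("InvalidStateError", "new duration would truncate buffered media");
  }
  // A duration that cuts into the last frame is raised to that frame's end.
  return requestedDuration < highestEndTime ? highestEndTime : requestedDuration;
}
```

With one frame at 9.5 seconds lasting 1 second, requesting a duration of 10 yields 10.5, requesting 20 yields 20, and requesting 9 is rejected.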

End of stream algorithm

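This algorithm gets called when the application signals the end of stream via an endOfStream() call or an algorithm needs to signal a decode error. This algorithm takes an error parameter that indicates whether an error will be signalled.

Change the readyState attribute value to "ended".
Queue a task to fire a simple event named sourceended at the MediaSource.
If error is not set
1. Run the duration change algorithm with new duration set to the largest track buffer ranges end time across all the track buffers across all SourceBuffer objects in sourceBuffers.
2. Notify the media element that it now has all of the media data.
If error is set to "network"

If the HTMLMediaElement.readyState attribute equals HAVE_NOTHING

Run the "If the media data cannot be fetched at all, due to network errors, causing the user agent to give up trying to fetch the resource" steps of the resource fetch algorithm's media data processing steps list.

If the HTMLMediaElement.readyState attribute is greater than HAVE_NOTHING

Run the "If the connection is interrupted after some media data has been received, causing the user agent to give up trying to fetch the resource" steps of the resource fetch algorithm's media data processing steps list.

If error is set to "decode"

If the HTMLMediaElement.readyState attribute equals HAVE_NOTHING

Run the "If the media data can be fetched but is found by inspection to be in an unsupported format, or can otherwise not be rendered at all" steps of the resource fetch algorithm's media data processing steps list.

If the HTMLMediaElement.readyState attribute is greater than HAVE_NOTHING

Run the media data is corrupted steps of the resource fetch algorithm's media data processing steps list.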

SourceBuffer Object

enum AppendMode {
    "segments",
    "sequence"
};

Enumeration description
`segments`	The timestamps in the media segment determine where the coded frames are placed in the presentation. Media segments can be appended in any order.
`sequence`	Media segments will be treated as adjacent in time independent of the timestamps in the media segment. Coded frames in a new media segment will be placed immediately after the coded frames in the previous media segment. The timestampOffset attribute will be updated if a new offset is needed to make the new media segments adjacent to the previous media segment. Setting the timestampOffset attribute in "sequence" mode allows a media segment to be placed at a specific position in the timeline without any knowledge of the timestamps in the media segment.

interface SourceBuffer : EventTarget {
                    attribute AppendMode          mode;
    readonly        attribute boolean             updating;
    readonly        attribute TimeRanges          buffered;
                    attribute double              timestampOffset;
    readonly        attribute AudioTrackList      audioTracks;
    readonly        attribute VideoTrackList      videoTracks;
    readonly        attribute TextTrackList       textTracks;
                    attribute double              appendWindowStart;
                    attribute unrestricted double appendWindowEnd;
                    attribute EventHandler        onupdatestart;
                    attribute EventHandler        onupdate;
                    attribute EventHandler        onupdateend;
                    attribute EventHandler        onerror;
                    attribute EventHandler        onabort;
    void appendBuffer (BufferSource data);
    void abort ();
    void remove (double start, unrestricted double end);
};

Attributes

mode of type AppendMode

Controls how a sequence of media segments is handled. This attribute is initially set by addSourceBuffer() after the object is created.

On getting, return the initial value or the last value that was successfully set.

On setting, run the following steps:

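If this object has been removed from the sourceBuffers attribute of the parent media source, then throw an InvalidStateError exception and abort these steps.
If the updating attribute equals true, then throw an InvalidStateError exception and abort these steps.
Let new mode equal the new value being assigned to this attribute.
If generate timestamps flag equals true and new mode equals "segments", then throw a TypeError exception and abort these steps.
If the readyState attribute of the parent media source is in the "ended" state then run the following steps:
1. Set the readyState attribute of the parent media source to "open".
2. Queue a task to fire a simple event named sourceopen at the parent media source.
If the append state equals PARSING_MEDIA_SEGMENT, then throw an InvalidStateError and abort these steps.
If the new mode equals "sequence", then set the group start timestamp to the group end timestamp.
Update the attribute to new mode.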

updating of type boolean, readonly

Indicates whether the asynchronous continuation of an appendBuffer() or remove() operation is still being processed. This attribute is initially set to false when the object is created.

buffered of type TimeRanges, readonly

Indicates what TimeRanges are buffered in the SourceBuffer. This attribute is initially set to an empty TimeRanges object when the object is created.

When the attribute is read the following steps MUST occur:

If this object has been removed from the sourceBuffers attribute of the parent media source then throw an InvalidStateError exception and abort these steps.
Let highest end time be the largest track buffer ranges end time across all the track buffers managed by this SourceBuffer object.
Let intersection ranges equal a TimeRanges object containing a single range from 0 to highest end time.
For each audio and video track buffer managed by this SourceBuffer, run the following steps:
1. Let track ranges equal the track buffer ranges for the current track buffer.
2. If readyState is "ended", then set the end time on the last range in track ranges to highest end time.
3. Let new intersection ranges equal the intersection between the intersection ranges and the track ranges.
4. Replace the ranges in intersection ranges with the new intersection ranges.
If intersection ranges does not contain the exact same range information as the current value of this attribute, then update the current value of this attribute to intersection ranges.
Return the current value of this attribute.
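
The steps above can be sketched in plain JavaScript. This is a minimal, non-normative illustration: ranges are modelled as sorted arrays of `[start, end]` pairs rather than TimeRanges objects, the function names `intersectRanges` and `computeBuffered` are hypothetical, and returning an empty list when nothing is buffered is an assumption of the sketch.

```javascript
// Intersect two sorted, non-overlapping lists of [start, end] ranges.
function intersectRanges(a, b) {
  const result = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) result.push([start, end]);
    // Advance whichever range ends first.
    if (a[i][1] < b[j][1]) i++;
    else j++;
  }
  return result;
}

// trackRangesList: one range list per audio/video track buffer.
// ended: true when the parent media source is in the "ended" state.
function computeBuffered(trackRangesList, ended) {
  const allEnds = trackRangesList.flatMap((ranges) => ranges.map((r) => r[1]));
  if (allEnds.length === 0) return []; // assumption: nothing buffered yet
  const highestEndTime = Math.max(...allEnds);
  let intersection = [[0, highestEndTime]];
  for (const ranges of trackRangesList) {
    // Copy so the caller's data is not mutated.
    const trackRanges = ranges.map((r) => [r[0], r[1]]);
    if (ended && trackRanges.length > 0) {
      trackRanges[trackRanges.length - 1][1] = highestEndTime;
    }
    intersection = intersectRanges(intersection, trackRanges);
  }
  return intersection;
}
```

With audio buffered over `[0, 10]` and video over `[0, 4]` and `[6, 9]`, the sketch reports `[0, 4]` and `[6, 9]` while the stream is open, and extends the last range to 10 once the stream has ended.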

timestampOffset of type double

Controls the offset applied to timestamps inside subsequent media segments that are appended to this SourceBuffer. The timestampOffset is initially set to 0 which indicates that no offset is being applied.

On getting, return the initial value or the last value that was successfully set.

On setting, run the following steps:

Let new timestamp offset equal the new value being assigned to this attribute.
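If this object has been removed from the sourceBuffers attribute of the parent media source, then throw an InvalidStateError exception and abort these steps.
If the updating attribute equals true, then throw an InvalidStateError exception and abort these steps.
If the readyState attribute of the parent media source is in the "ended" state then run the following steps:
1. Set the readyState attribute of the parent media source to "open".
2. Queue a task to fire a simple event named sourceopen at the parent media source.
If the append state equals PARSING_MEDIA_SEGMENT, then throw an InvalidStateError and abort these steps.
If the mode attribute equals "sequence", then set the group start timestamp to new timestamp offset.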
Update the attribute to new timestamp offset.

audioTracks of type AudioTrackList, readonly

The list of AudioTrack objects created by this object.

videoTracks of type VideoTrackList, readonly

The list of VideoTrack objects created by this object.

textTracks of type TextTrackList, readonly

The list of TextTrack objects created by this object.

appendWindowStart of type double

The presentation timestamp for the start of the append window. This attribute is initially set to the presentation start time.

On getting, return the initial value or the last value that was successfully set.

On setting, run the following steps:

If this object has been removed from the sourceBuffers attribute of the parent media source, then throw an InvalidStateError exception and abort these steps.
If the updating attribute equals true, then throw an InvalidStateError exception and abort these steps.
If the new value is less than 0 or greater than or equal to appendWindowEnd then throw a TypeError exception and abort these steps.
Update the attribute to the new value.

appendWindowEnd of type unrestricted double

The presentation timestamp for the end of the append window. This attribute is initially set to positive Infinity.

On getting, return the initial value or the last value that was successfully set.

On setting, run the following steps:

If this object has been removed from the sourceBuffers attribute of the parent media source, then throw an InvalidStateError exception and abort these steps.
If the updating attribute equals true, then throw an InvalidStateError exception and abort these steps.
If the new value equals NaN, then throw a TypeError and abort these steps.
If the new value is less than or equal to appendWindowStart then throw a TypeError exception and abort these steps.
Update the attribute to the new value.
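
The range checks in the two append window setters can be sketched as follows. This is a non-normative illustration: the helper names are hypothetical, the checks on removal from the parent media source and on the updating attribute are omitted, and the use of TypeError is an assumption about the exception type thrown by these steps.

```javascript
// Validate a candidate appendWindowStart against the current appendWindowEnd.
function checkAppendWindowStart(newValue, appendWindowEnd) {
  if (newValue < 0 || newValue >= appendWindowEnd) {
    // Assumed exception type for this sketch.
    throw new TypeError("appendWindowStart must be in [0, appendWindowEnd)");
  }
  return newValue;
}

// Validate a candidate appendWindowEnd against the current appendWindowStart.
function checkAppendWindowEnd(newValue, appendWindowStart) {
  if (Number.isNaN(newValue)) {
    throw new TypeError("appendWindowEnd must not be NaN");
  }
  if (newValue <= appendWindowStart) {
    throw new TypeError("appendWindowEnd must be greater than appendWindowStart");
  }
  return newValue;
}
```

Because the end of the window is an unrestricted double, positive Infinity is accepted as an end value, which matches the attribute's initial value.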

onupdatestart of type EventHandler

The event handler for the updatestart event.

onupdate of type EventHandler

The event handler for the update event.

onupdateend of type EventHandler

The event handler for the updateend event.

onerror of type EventHandler

The event handler for the error event.

onabort of type EventHandler

The event handler for the abort event.

Methods

appendBuffer

Appends the segment data in a BufferSource[[!WEBIDL]] to the source buffer.

Run the prepare append algorithm.
Add data to the end of the input buffer.
Set the updating attribute to true.
Queue a task to fire a simple event named updatestart at this SourceBuffer object.
Asynchronously run the buffer append algorithm.

Parameter	Type	Nullable	Optional	Description
data	`BufferSource`	✘	✘

Return type: void

abort

Aborts the current segment and resets the segment parser.

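If this object has been removed from the sourceBuffers attribute of the parent media source then throw an InvalidStateError exception and abort these steps.
If the readyState attribute of the parent media source is not in the "open" state then throw an InvalidStateError exception and abort these steps.
If the range removal algorithm is running, then throw an InvalidStateError exception and abort these steps.
If the updating attribute equals true, then run the following steps:
1. Abort the buffer append algorithm if it is running.
2. Set the updating attribute to false.
3. Queue a task to fire a simple event named abort at this SourceBuffer object.
4. Queue a task to fire a simple event named updateend at this SourceBuffer object.
Run the reset parser state algorithm.
Set appendWindowStart to the presentation start time.
Set appendWindowEnd to positive Infinity.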

No parameters.

Return type: void

remove

Removes media for a specific time range.

If this object has been removed from the sourceBuffers attribute of the parent media source then throw an InvalidStateError exception and abort these steps.
If the updating attribute equals true, then throw an InvalidStateError exception and abort these steps.
If duration equals NaN, then throw a TypeError exception and abort these steps.
If start is negative or greater than duration, then throw a TypeError exception and abort these steps.
If end is less than or equal to start or end equals NaN, then throw a TypeError exception and abort these steps.
If the readyState attribute of the parent media source is in the "ended" state then run the following steps:
1. Set the readyState attribute of the parent media source to "open".
2. Queue a task to fire a simple event named sourceopen at the parent media source.
Run the range removal algorithm with start and end as the start and end of the removal range.

Parameter	Type	Nullable	Optional	Description
start	`double`	✘	✘	The start of the removal range, in seconds measured from presentation start time.
end	`unrestricted double`	✘	✘	The end of the removal range, in seconds measured from presentation start time.

Return type: void

Track Buffers

A track buffer stores the track descriptions and coded frames for an individual track. The track buffer is updated as initialization segments and media segments are appended to the SourceBuffer.

Each track buffer has a last decode timestamp variable that stores the decode timestamp of the last coded frame appended in the current coded frame group. The variable is initially unset to indicate that no coded frames have been appended yet.

Each track buffer has a last frame duration variable that stores the coded frame duration of the last coded frame appended in the current coded frame group. The variable is initially unset to indicate that no coded frames have been appended yet.

Each track buffer has a highest end timestamp variable that stores the highest coded frame end timestamp across all coded frames in the current coded frame group that were appended to this track buffer. The variable is initially unset to indicate that no coded frames have been appended yet.

Each track buffer has a need random access point flag variable that keeps track of whether the track buffer is waiting for a random access point coded frame. The variable is initially set to true to indicate that a random access point coded frame is needed before anything can be added to the track buffer.

Each track buffer has a track buffer ranges variable that represents the presentation time ranges occupied by the coded frames currently stored in the track buffer.

For track buffer ranges, these presentation time ranges are based on presentation timestamps, frame durations, and potentially coded frame group start times for coded frame groups across track buffers in a muxed SourceBuffer.

For specification purposes, this information is treated as if it were stored in a normalized TimeRanges object. Intersected track buffer ranges are used to report HTMLMediaElement.buffered, and MUST therefore support uninterrupted playback within each range of HTMLMediaElement.buffered.

These coded frame group start times differ slightly from those mentioned in the coded frame processing algorithm in that they are the earliest presentation timestamp across all track buffers following a discontinuity. Discontinuities can occur within the coded frame processing algorithm or result from the coded frame removal algorithm, regardless of mode. The threshold for determining disjointness of track buffer ranges is implementation-specific. For example, to reduce unexpected playback stalls, implementations MAY approximate the coded frame processing algorithm's discontinuity detection logic by coalescing adjacent ranges separated by a gap smaller than 2 times the maximum frame duration buffered so far in this track buffer. Implementations MAY also use coded frame group start times as range start times across track buffers in a muxed SourceBuffer to further reduce unexpected playback stalls.
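
The coalescing approach mentioned as an example above can be sketched as follows. This is only one possible, non-normative reading: the function name `coalesceRanges` is hypothetical, ranges are plain `[start, end]` pairs in seconds, and the threshold of 2 times the maximum frame duration is the example value from the text, not a requirement.

```javascript
// Merge ranges whose gap is smaller than 2 * maxFrameDuration.
function coalesceRanges(ranges, maxFrameDuration) {
  const threshold = 2 * maxFrameDuration;
  // Work on a sorted copy so the input is left untouched.
  const sorted = ranges.map((r) => [r[0], r[1]]).sort((x, y) => x[0] - y[0]);
  const out = [];
  for (const range of sorted) {
    const last = out[out.length - 1];
    if (last && range[0] - last[1] < threshold) {
      // Small gap: treat the two ranges as one continuous range.
      last[1] = Math.max(last[1], range[1]);
    } else {
      out.push(range);
    }
  }
  return out;
}
```

For 40 ms frames, a 30 ms gap between two ranges is absorbed, while a 500 ms gap remains a discontinuity.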

Event Summary

Event name	Interface	Dispatched when...
updatestart	`Event`	updating transitions from false to true.
update	`Event`	The append or remove has successfully completed. updating transitions from true to false.
updateend	`Event`	The append or remove has ended.
error	`Event`	An error occurred during the append. updating transitions from true to false.
abort	`Event`	The append or remove was aborted by an abort() call. updating transitions from true to false.
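
Because appendBuffer() throws while updating is true, applications typically serialize appends on the updateend event. The following is a minimal sketch of that pattern, not part of this specification; `AppendQueue` is a hypothetical helper, and the only assumptions about its argument are the `updating` attribute, the `appendBuffer()` method, and `addEventListener()`.

```javascript
// Serializes appends so appendBuffer() is never called while updating is true.
class AppendQueue {
  constructor(sourceBuffer) {
    this.sourceBuffer = sourceBuffer;
    this.pending = [];
    // updateend fires after every append or remove ends, whether it
    // completed, failed, or was aborted.
    sourceBuffer.addEventListener("updateend", () => this.flush());
  }

  // Queue a segment and try to append it immediately.
  push(data) {
    this.pending.push(data);
    this.flush();
  }

  // Append the next queued segment if the buffer is idle.
  flush() {
    if (this.sourceBuffer.updating || this.pending.length === 0) return;
    this.sourceBuffer.appendBuffer(this.pending.shift());
  }
}
```

A production queue would also listen for the error event and stop appending after a failure; that handling is left out here to keep the sketch short.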

Algorithms

Segment Parser Loop

All SourceBuffer objects have an internal append state variable that keeps track of the high-level segment parsing state. It is initially set to WAITING_FOR_SEGMENT and can transition to the following states as data is appended.

Append state name	Description
WAITING_FOR_SEGMENT	Waiting for the start of an initialization segment or media segment to be appended.
PARSING_INIT_SEGMENT	Currently parsing an initialization segment.
PARSING_MEDIA_SEGMENT	Currently parsing a media segment.

The input buffer is a byte buffer that is used to hold unparsed bytes across appendBuffer() calls. The buffer is empty when the SourceBuffer object is created.

The buffer full flag keeps track of whether appendBuffer() is allowed to accept more bytes. It is set to false when the SourceBuffer object is created and gets updated as data is appended and removed.

The group start timestamp variable keeps track of the starting timestamp for a new coded frame group in the "sequence" mode. It is unset when the SourceBuffer object is created and gets updated when the mode attribute equals "sequence" and the timestampOffset attribute is set, or the coded frame processing algorithm runs.

The group end timestamp variable stores the highest coded frame end timestamp across all coded frames in the current coded frame group. It is set to 0 when the SourceBuffer object is created and gets updated by the coded frame processing algorithm.

The group end timestamp stores the highest coded frame end timestamp across all track buffers in a SourceBuffer. Therefore, care should be taken in setting the mode attribute when appending multiplexed segments in which the timestamps are not aligned across tracks.

The generate timestamps flag is a boolean variable that keeps track of whether timestamps need to be generated for the coded frames passed to the coded frame processing algorithm. This flag is set by addSourceBuffer() when the SourceBuffer object is created.

When the segment parser loop algorithm is invoked, run the following steps:

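Loop Top: If the input buffer is empty, then jump to the need more data step below.
If the input buffer contains bytes that violate the SourceBuffer byte stream format specification, then run the append error algorithm and abort this algorithm.
Remove any bytes that the byte stream format specifications say MUST be ignored from the start of the input buffer.
If the append state equals WAITING_FOR_SEGMENT, then run the following steps:
1. If the beginning of the input buffer indicates the start of an initialization segment, set the append state to PARSING_INIT_SEGMENT.
2. If the beginning of the input buffer indicates the start of a media segment, set append state to PARSING_MEDIA_SEGMENT.
3. Jump to the loop top step above.
If the append state equals PARSING_INIT_SEGMENT, then run the following steps:
1. If the input buffer does not contain a complete initialization segment yet, then jump to the need more data step below.
2. Run the initialization segment received algorithm.
3. Remove the initialization segment bytes from the beginning of the input buffer.
4. Set append state to WAITING_FOR_SEGMENT.
5. Jump to the loop top step above.
If the append state equals PARSING_MEDIA_SEGMENT, then run the following steps:
1. If the first initialization segment received flag is false, then run the append error algorithm and abort this algorithm.
2. If the input buffer contains one or more complete coded frames, then run the coded frame processing algorithm.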
  The frequency at which the coded frame processing algorithm is run is implementation-specific. The coded frame processing algorithm MAY be called when the input buffer contains the complete media segment or it MAY be called multiple times as complete coded frames are added to the input buffer.
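3. If this SourceBuffer is full and cannot accept more media data, then set the buffer full flag to true.
4. If the input buffer does not contain a complete media segment, then jump to the need more data step below.
5. Remove the media segment bytes from the beginning of the input buffer.
6. Set append state to WAITING_FOR_SEGMENT.
7. Jump to the loop top step above.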
7. Jump to the loop top step above.
Need more data: Return control to the calling algorithm.
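
The state transitions of the loop above can be illustrated with a toy parser. This sketch is non-normative and rests on several assumptions: the byte format (one type byte, `0x49` for an initialization segment or `0x4d` for a media segment, then one length byte, then the payload) is invented for the example, the class name `ToyParser` is hypothetical, errors are thrown instead of running the append error algorithm, the buffer full flag is omitted, and media segments are only processed once they are complete.

```javascript
const WAITING_FOR_SEGMENT = "WAITING_FOR_SEGMENT";
const PARSING_INIT_SEGMENT = "PARSING_INIT_SEGMENT";
const PARSING_MEDIA_SEGMENT = "PARSING_MEDIA_SEGMENT";

class ToyParser {
  constructor() {
    this.appendState = WAITING_FOR_SEGMENT;
    this.inputBuffer = []; // unparsed bytes kept across append() calls
    this.firstInitSegmentReceived = false;
    this.events = []; // log of the segments the loop has consumed
  }

  // Byte length of the segment at the start of the input buffer,
  // or -1 if that segment is not complete yet.
  completeSegmentLength() {
    if (this.inputBuffer.length < 2) return -1;
    const total = 2 + this.inputBuffer[1];
    return this.inputBuffer.length >= total ? total : -1;
  }

  append(bytes) {
    this.inputBuffer.push(...bytes);
    this.run();
  }

  run() {
    for (;;) {
      if (this.inputBuffer.length === 0) return; // need more data
      if (this.appendState === WAITING_FOR_SEGMENT) {
        const type = this.inputBuffer[0];
        if (type === 0x49) this.appendState = PARSING_INIT_SEGMENT;
        else if (type === 0x4d) this.appendState = PARSING_MEDIA_SEGMENT;
        else throw new Error("append error: invalid byte stream");
        continue; // jump to loop top
      }
      if (this.appendState === PARSING_MEDIA_SEGMENT && !this.firstInitSegmentReceived) {
        throw new Error("append error: media segment before initialization segment");
      }
      const length = this.completeSegmentLength();
      if (length < 0) return; // need more data
      this.inputBuffer.splice(0, length); // remove the segment bytes
      if (this.appendState === PARSING_INIT_SEGMENT) {
        this.firstInitSegmentReceived = true;
        this.events.push("init");
      } else {
        this.events.push("media");
      }
      this.appendState = WAITING_FOR_SEGMENT;
    }
  }
}
```

Appending a complete initialization segment followed by a truncated media segment leaves the parser in PARSING_MEDIA_SEGMENT with the partial bytes retained; a later append that supplies the missing bytes completes the segment and returns the parser to WAITING_FOR_SEGMENT.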

Reset Parser State

When the parser state needs to be reset, run the following steps:

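If the append state equals PARSING_MEDIA_SEGMENT and the input buffer contains some complete coded frames, then run the coded frame processing algorithm until all of these complete coded frames have been processed.
Unset the last decode timestamp on all track buffers.
Unset the last frame duration on all track buffers.
Unset the highest end timestamp on all track buffers.
Set the need random access point flag on all track buffers to true.
If the mode attribute equals "sequence", then set the group start timestamp to the group end timestamp.
Remove all bytes from the input buffer.
Set append state to WAITING_FOR_SEGMENT.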

Append Error Algorithm

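This algorithm is called when an error occurs during an append. This algorithm takes a decode error parameter that indicates whether the end of stream algorithm should be called.

Run the reset parser state algorithm.
Set the updating attribute to false.
Queue a task to fire a simple event named error at this SourceBuffer object.
Queue a task to fire a simple event named updateend at this SourceBuffer object.
If decode error is true, then run the end of stream algorithm with the error parameter set to "decode".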

Prepare Append Algorithm

When an append operation begins, the following steps are run to validate and prepare the SourceBuffer.

If the SourceBuffer has been removed from the sourceBuffers attribute of the parent media source then throw an InvalidStateError exception and abort these steps.
If the updating attribute equals true, then throw an InvalidStateError exception and abort these steps.
If the HTMLMediaElement.error attribute is not null, then throw an InvalidStateError exception and abort these steps.
If the readyState attribute of the parent media source is in the "ended" state then run the following steps:
1. Set the readyState attribute of the parent media source to "open".
2. Queue a task to fire a simple event named sourceopen at the parent media source.
Run the coded frame eviction algorithm.
If the buffer full flag equals true, then throw a QuotaExceededError exception and abort these steps.

This is the signal that the implementation was unable to evict enough data to accommodate the append or the append is too big. The web application SHOULD use remove() to explicitly free up space and/or reduce the size of the append.

Buffer Append Algorithm

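When appendBuffer() is called, the following steps are run to process the appended data.

Run the segment parser loop algorithm.
If the segment parser loop algorithm in the previous step was aborted, then abort this algorithm.
Set the updating attribute to false.
Queue a task to fire a simple event named update at this SourceBuffer object.
Queue a task to fire a simple event named updateend at this SourceBuffer object.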

Range Removal

Follow these steps when a caller needs to initiate a JavaScript visible range removal operation that blocks other SourceBuffer updates:

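Let start equal the starting presentation timestamp for the removal range, in seconds measured from presentation start time.
Let end equal the end presentation timestamp for the removal range, in seconds measured from presentation start time.
Set the updating attribute to true.
Queue a task to fire a simple event named updatestart at this SourceBuffer object.
Return control to the caller and run the rest of the steps asynchronously.
Run the coded frame removal algorithm with start and end as the start and end of the removal range.
Set the updating attribute to false.
Queue a task to fire a simple event named update at this SourceBuffer object.
Queue a task to fire a simple event named updateend at this SourceBuffer object.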

Initialization Segment Received

The following steps are run when the segment parser loop successfully parses a complete initialization segment:

Each SourceBuffer object has an internal first initialization segment received flag that tracks whether the first initialization segment has been appended and received by this algorithm. This flag is set to false when the SourceBuffer is created and updated by the algorithm below.

Update the duration attribute if it currently equals NaN:

If the initialization segment contains a duration:

Run the duration change algorithm with new duration set to the duration in the initialization segment.

Otherwise:

Run the duration change algorithm with new duration set to positive Infinity.
If the initialization segment has no audio, video, or text tracks, then run the append error algorithm and abort these steps.
If the first initialization segment received flag is true, then run the following steps:
1. Verify the following properties. If any of the checks fail then run the append error algorithm and abort these steps.
  - The number of audio, video, and text tracks match what was in the first initialization segment.
  - The codecs for each track match what was specified in the first initialization segment.
  - If more than one track for a single type are present (e.g., 2 audio tracks), then the Track IDs match the ones in the first initialization segment.
2. Add the appropriate track descriptions from this initialization segment to each of the track buffers.
3. Set the need random access point flag on all track buffers to true.
Let active track flag equal false.
If the first initialization segment received flag is false, then run the following steps:
1. If the initialization segment contains tracks with codecs the user agent does not support, then run the append error algorithm and abort these steps.
  User agents MAY consider codecs, that would otherwise be supported, as "not supported" here if the codecs were not specified in the type parameter passed to addSourceBuffer().
  For example, MediaSource.isTypeSupported('video/webm;codecs="vp8,vorbis"') may return true, but if addSourceBuffer() was called with 'video/webm;codecs="vp8"' and a Vorbis track appears in the initialization segment, then the user agent MAY use this step to trigger a decode error.
2. For each audio track in the initialization segment, run the following steps:
  1. Let audio byte stream track ID be the Track ID for the current track being processed.
  2. Let audio language be a BCP 47 language tag for the language specified in the initialization segment for this track or an empty string if no language info is present.
  3. If audio language equals the 'und' BCP 47 value, then assign an empty string to audio language.
  4. Let audio label be a label specified in the initialization segment for this track or an empty string if no label info is present.
  5. Let audio kinds be a sequence of kind strings specified in the initialization segment for this track or a sequence with a single empty string element in it if no kind information is provided.
  6. For each value in audio kinds, run the following steps:
    1. Let current audio kind equal the value from audio kinds for this iteration of the loop.
    2. Let new audio track be a new AudioTrack object.
    3. Generate a unique ID and assign it to the id property on new audio track.
    4. Assign audio language to the language property on new audio track.
    5. Assign audio label to the label property on new audio track.
    6. Assign current audio kind to the kind property on new audio track.
    7. If audioTracks.length equals 0, then run the following steps:
      1. Set the enabled property on new audio track to true.
      2. Set active track flag to true.
    8. Add new audio track to the audioTracks attribute on this SourceBuffer object.
    9. Add new audio track to the audioTracks attribute on the HTMLMediaElement.
  7. Create a new track buffer to store coded frames for this track.
  8. Add the track description for this track to the track buffer.
3. For each video track in the initialization segment, run the following steps:
  1. Let video byte stream track ID be the Track ID for the current track being processed.
  2. Let video language be a BCP 47 language tag for the language specified in the initialization segment for this track or an empty string if no language info is present.
  3. If video language equals the 'und' BCP 47 value, then assign an empty string to video language.
  4. Let video label be a label specified in the initialization segment for this track or an empty string if no label info is present.
  5. Let video kinds be a sequence of kind strings specified in the initialization segment for this track or a sequence with a single empty string element in it if no kind information is provided.
  6. For each value in video kinds, run the following steps:
    1. Let current video kind equal the value from video kinds for this iteration of the loop.
    2. Let new video track be a new VideoTrack object.
    3. Generate a unique ID and assign it to the id property on new video track.
    4. Assign video language to the language property on new video track.
    5. Assign video label to the label property on new video track.
    6. Assign current video kind to the kind property on new video track.
    7. If videoTracks.length equals 0, then run the following steps:
      1. Set the selected property on new video track to true.
      2. Set active track flag to true.
    8. Add new video track to the videoTracks attribute on this SourceBuffer object.
    9. Add new video track to the videoTracks attribute on the HTMLMediaElement.
  7. Create a new track buffer to store coded frames for this track.
  8. Add the track description for this track to the track buffer.
4. For each text track in the initialization segment, run the following steps:
  1. Let text byte stream track ID be the Track ID for the current track being processed.
  2. Let text language be a BCP 47 language tag for the language specified in the initialization segment for this track or an empty string if no language info is present.
  3. If text language equals the 'und' BCP 47 value, then assign an empty string to text language.
  4. Let text label be a label specified in the initialization segment for this track or an empty string if no label info is present.
  5. Let text kinds be a sequence of kind strings specified in the initialization segment for this track or a sequence with a single empty string element in it if no kind information is provided.
  6. For each value in text kinds, run the following steps:
    1. Let current text kind equal the value from text kinds for this iteration of the loop.
    2. Let new text track be a new TextTrack object.
    3. Generate a unique ID and assign it to the id property on new text track.
    4. Assign text language to the language property on new text track.
    5. Assign text label to the label property on new text track.
    6. Assign current text kind to the kind property on new text track.
    7. Populate the remaining properties on new text track with the appropriate information from the initialization segment.
    8. If the mode property on new text track equals "showing" or "hidden", then set active track flag to true.
    9. Add new text track to the textTracks attribute on this SourceBuffer object.
    10. Add new text track to the textTracks attribute on the HTMLMediaElement.
  7. Create a new track buffer to store coded frames for this track.
  8. Add the track description for this track to the track buffer.
5. If active track flag equals true, then run the following steps:
  1. Add this SourceBuffer to activeSourceBuffers.
  2. Queue a task to fire a simple event named addsourcebuffer at activeSourceBuffers.
6. Set first initialization segment received flag to true.
If the HTMLMediaElement.readyState attribute is HAVE_NOTHING, then run the following steps:
1. If one or more objects in sourceBuffers have first initialization segment received flag set to false, then abort these steps.
2. Set the HTMLMediaElement.readyState attribute to HAVE_METADATA.
If the active track flag equals true and the HTMLMediaElement.readyState attribute is greater than HAVE_CURRENT_DATA, then set the HTMLMediaElement.readyState attribute to HAVE_METADATA.

Per HTMLMediaElement ready states [[HTML51]] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

Coded Frame Processing

When complete coded frames have been parsed by the segment parser loop then the following steps are run:

For each coded frame in the media segment run the following steps:
1. Loop Top:
  If generate timestamps flag equals true:
  1. Let presentation timestamp equal 0.
  2. Let decode timestamp equal 0.
  Otherwise:
  1. Let presentation timestamp be a double precision floating point representation of the coded frame's presentation timestamp in seconds.
    Special processing may be needed to determine the presentation and decode timestamps for timed text frames since this information may not be explicitly present in the underlying format or may be dependent on the order of the frames. Some metadata text tracks, like MPEG2-TS PSI data, may only have implied timestamps. Format specific rules for these situations SHOULD be in the byte stream format specifications or in separate extension specifications.
  2. Let decode timestamp be a double precision floating point representation of the coded frame's decode timestamp in seconds.
    Implementations don't have to internally store timestamps in a double precision floating point representation. This representation is used here because it is the representation for timestamps in the HTML spec. The intention here is to make the behavior clear without adding unnecessary complexity to the algorithm to deal with the fact that adding a timestampOffset may cause a timestamp rollover in the underlying timestamp representation used by the byte stream format. Implementations can use any internal timestamp representation they wish, but the addition of timestampOffset SHOULD behave in a similar manner to what would happen if a double precision floating point representation was used.
2. Let frame duration be a double precision floating point representation of the coded frame's duration in seconds.
3. If mode equals "sequence" and group start timestamp is set, then run the following steps:
  1. Set timestampOffset equal to group start timestamp - presentation timestamp.
  2. Set group end timestamp equal to group start timestamp.
  3. Set the need random access point flag on all track buffers to true.
  4. Unset group start timestamp.
4. If timestampOffset is not 0, then run the following steps:
  1. Add timestampOffset to the presentation timestamp.
  2. Add timestampOffset to the decode timestamp.
5. Let track buffer equal the track buffer that the coded frame will be added to.
6. If last decode timestamp for track buffer is set and decode timestamp is less than last decode timestamp:
  
  OR
  
  If last decode timestamp for track buffer is set and the difference between decode timestamp and last decode timestamp is greater than 2 times last frame duration:
  1. If mode equals "segments":
    
    Set group end timestamp to presentation timestamp.
    
    If mode equals "sequence":
    
    Set group start timestamp equal to the group end timestamp.
  2. Unset the last decode timestamp on all track buffers.
  3. Unset the last frame duration on all track buffers.
  4. Unset the highest end timestamp on all track buffers.
  5. Set the need random access point flag on all track buffers to true.
  6. Jump to the Loop Top step above to restart processing of the current coded frame.
  Otherwise:
  
  Continue.
7. Let frame end timestamp equal the sum of presentation timestamp and frame duration.
8. If presentation timestamp is less than appendWindowStart, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame.
  Some implementations MAY choose to collect some of these coded frames with presentation timestamp less than appendWindowStart and use them to generate a splice at the first coded frame that has a presentation timestamp greater than or equal to appendWindowStart even if that frame is not a random access point. Supporting this requires multiple decoders or faster than real-time decoding so for now this behavior will not be a normative requirement.
9. If frame end timestamp is greater than appendWindowEnd, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame.
  Some implementations MAY choose to collect coded frames with presentation timestamp less than appendWindowEnd and frame end timestamp greater than appendWindowEnd and use them to generate a splice across the portion of the collected coded frames within the append window at time of collection, and the beginning portion of later processed frames which only partially overlap the end of the collected coded frames. Supporting this requires multiple decoders or faster than real-time decoding so for now this behavior will not be a normative requirement. In conjunction with collecting coded frames that span appendWindowStart, implementations MAY thus support gapless audio splicing.
10. If the need random access point flag on track buffer equals true, then run the following steps:
  1. If the coded frame is not a random access point, then drop the coded frame and jump to the top of the loop to start processing the next coded frame.
  2. Set the need random access point flag on track buffer to false.
11. Let spliced audio frame be an unset variable for holding audio splice information.
12. Let spliced timed text frame be an unset variable for holding timed text splice information.
13. If last decode timestamp for track buffer is unset and presentation timestamp falls within the presentation interval of a coded frame in track buffer, then run the following steps:
  1. Let overlapped frame be the coded frame in track buffer that matches the condition above.
  2. If track buffer contains audio coded frames:
    
    Run the audio splice frame algorithm and if a splice frame is returned, assign it to spliced audio frame.
    
    If track buffer contains video coded frames:
    Let remove window timestamp equal the overlapped frame presentation timestamp plus 1 microsecond.
    
    If the presentation timestamp is less than the remove window timestamp, then remove overlapped frame from track buffer.
    This is to compensate for minor errors in frame timestamp computations that can appear when converting back and forth between double precision floating point numbers and rationals. This tolerance allows a frame to replace an existing one as long as it is within 1 microsecond of the existing frame's start time. Frames that come slightly before an existing frame are handled by the removal step below.
    If track buffer contains timed text coded frames:
    
    Run the text splice frame algorithm and if a splice frame is returned, assign it to spliced timed text frame.
14. Remove existing coded frames in track buffer:
  
  If highest end timestamp for track buffer is not set:
  
  Remove all coded frames from track buffer that have a presentation timestamp greater than or equal to presentation timestamp and less than frame end timestamp.
  
  If highest end timestamp for track buffer is set and less than or equal to presentation timestamp:
  
  Remove all coded frames from track buffer that have a presentation timestamp greater than or equal to highest end timestamp and less than frame end timestamp.
15. Remove all possible decoding dependencies on the coded frames removed in the previous two steps by removing all coded frames from track buffer between those frames removed in the previous two steps and the next random access point after those removed frames.
  Removing all coded frames until the next random access point is a conservative estimate of the decoding dependencies since it assumes all frames between the removed frames and the next random access point depended on the frames that were removed.
16. If spliced audio frame is set:
  
  Add spliced audio frame to the track buffer.
  
  If spliced timed text frame is set:
  
  Add spliced timed text frame to the track buffer.
  
  Otherwise:
  
  Add the coded frame with the presentation timestamp, decode timestamp, and frame duration to the track buffer.
17. Set last decode timestamp for track buffer to decode timestamp.
18. Set last frame duration for track buffer to frame duration.
19. If highest end timestamp for track buffer is unset or frame end timestamp is greater than highest end timestamp, then set highest end timestamp for track buffer to frame end timestamp.
  The greater than check is needed because bidirectional prediction between coded frames can cause presentation timestamp to not be monotonically increasing even though the decode timestamps are monotonically increasing.
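20. If frame end timestamp is greater than group end timestamp, then set group end timestamp equal to frame end timestamp.
21. If generate timestamps flag equals true, then set timestampOffset equal to frame end timestamp.
If the HTMLMediaElement.readyState attribute is HAVE_METADATA and the new coded frames cause HTMLMediaElement.buffered to have a TimeRange for the current playback position, then set the HTMLMediaElement.readyState attribute to HAVE_CURRENT_DATA.

Per HTMLMediaElement ready states [[HTML51]] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.
If the HTMLMediaElement.readyState attribute is HAVE_CURRENT_DATA and the new coded frames cause HTMLMediaElement.buffered to have a TimeRange that includes the current playback position and some time beyond the current playback position, then set the HTMLMediaElement.readyState attribute to HAVE_FUTURE_DATA.

Per HTMLMediaElement ready states [[HTML51]] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.
If the HTMLMediaElement.readyState attribute is HAVE_FUTURE_DATA and the new coded frames cause HTMLMediaElement.buffered to have a TimeRange that includes the current playback position and enough data to ensure uninterrupted playback, then set the HTMLMediaElement.readyState attribute to HAVE_ENOUGH_DATA.

Per HTMLMediaElement ready states [[HTML51]] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.
If the media segment contains data beyond the current duration, then run the duration change algorithm with new duration set to the maximum of the current duration and the group end timestamp.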
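
Two of the per-frame decisions above, discontinuity detection and append window filtering, can be sketched as small predicates. This is a non-normative illustration: the function names and the shape of the `track` object are hypothetical, and `undefined` stands in for an unset variable.

```javascript
// True when a coded frame starts a new coded frame group for this track:
// its decode timestamp goes backwards, or it jumps forward by more than
// 2 times the last frame duration.
function startsNewCodedFrameGroup(track, decodeTimestamp) {
  if (track.lastDecodeTimestamp === undefined) return false;
  if (decodeTimestamp < track.lastDecodeTimestamp) return true;
  return decodeTimestamp - track.lastDecodeTimestamp > 2 * track.lastFrameDuration;
}

// True when a coded frame must be dropped because it starts before the
// append window or ends after it.
function outsideAppendWindow(presentationTimestamp, frameDuration, windowStart, windowEnd) {
  const frameEndTimestamp = presentationTimestamp + frameDuration;
  return presentationTimestamp < windowStart || frameEndTimestamp > windowEnd;
}
```

For a track whose last frame had a decode timestamp of 1.0 and a duration of 0.04 seconds, a frame at 1.04 continues the group, while frames at 0.5 or 1.2 both start a new one.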

Coded Frame Removal Algorithm

Follow these steps when for a specific time range need to be removed from the SourceBuffer:

Let start be the starting for the removal range.
Let end be the end for the removal range.
For each in this source buffer, run the following steps:
1. Let remove end timestamp be the current value of
2. If this has a timestamp that is greater than or equal to end, then update remove end timestamp to that random access point timestamp.
  
  Random access point timestamps can be different across tracks because the dependencies between coded frames within a track are usually different than the dependencies in another track.
3. Remove all media data, from this track buffer, that contain starting timestamps greater than or equal to start and less than the remove end timestamp.
4. Remove all possible decoding dependencies on the coded frames removed in the previous step by removing all coded frames from this track buffer between those frames removed in the previous step and the next random access point after those removed frames.
  Removing all coded frames until the next random access point is a conservative estimate of the decoding dependencies since it assumes all frames between the removed frames and the next random access point depended on the frames that were removed.
5. If this object is in activeSourceBuffers, the current playback position is greater than or equal to start and less than the remove end timestamp, and HTMLMediaElement.readyState is greater than HAVE_METADATA, then set the HTMLMediaElement.readyState attribute to HAVE_METADATA and stall playback.
  
  Per HTMLMediaElement ready states [[HTML51]] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.
  
  This transition occurs because media data for the current position has been removed. Playback cannot progress until media for the current playback position is appended or the selected/enabled tracks change.
If buffer full flag equals true and this object is ready to accept more bytes, then set the buffer full flag to false.
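
The removal steps above can be sketched on a simplified model. This is only an illustration, not normative text: the frame objects (`pts`, `rap`), the function name, and the assumption that frames are stored sorted by presentation timestamp are all hypothetical, and the default removal end is assumed to be the media source duration.

```javascript
// Hypothetical model: a track buffer is an array of frames sorted by
// presentation timestamp (pts); `rap` marks a random access point.
function removeCodedFrames(frames, start, end, duration) {
  // Default the removal end to the current duration (assumed).
  let removeEnd = duration;
  // Prefer the first random access point at or after `end`, if any.
  const nextRap = frames.find((f) => f.rap && f.pts >= end);
  if (nextRap) removeEnd = nextRap.pts;

  const kept = [];
  let droppingDependents = false;
  for (const frame of frames) {
    // Drop frames whose start timestamp falls in [start, removeEnd).
    if (frame.pts >= start && frame.pts < removeEnd) {
      droppingDependents = true;
      continue;
    }
    // Conservatively drop everything after the removed frames until
    // the next random access point, since it may depend on them.
    if (droppingDependents && !frame.rap) continue;
    droppingDependents = false;
    kept.push(frame);
  }
  return kept;
}
```

Because the removal end is pushed out to a random access point whenever one exists, a request to remove the range 1 to 3 can remove frames beyond timestamp 3.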

Coded Frame Eviction Algorithm

This algorithm is run to free up space in this source buffer when new data is appended.

Let new data equal the data that is about to be appended to this SourceBuffer.
If the buffer full flag equals false, then abort these steps.
Let removal ranges equal a list of presentation time ranges that can be evicted from the presentation to make room for the new data.
Implementations MAY use different methods for selecting removal ranges so web applications SHOULD NOT depend on a specific behavior. The web application can use the buffered attribute to observe whether portions of the buffered data have been evicted.
For each range in removal ranges, run the coded frame removal algorithm with start and end equal to the removal range start and end timestamp respectively.
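
Since eviction behavior is implementation specific, an application can only observe it after the fact. The following sketch shows one way to compare two snapshots of buffered ranges taken before and after an append. The helper names are hypothetical; only the `length`, `start()` and `end()` members of TimeRanges are relied upon.

```javascript
// Copy a TimeRanges-like object into an array of [start, end] pairs.
function snapshotRanges(timeRanges) {
  const ranges = [];
  for (let i = 0; i < timeRanges.length; i++) {
    ranges.push([timeRanges.start(i), timeRanges.end(i)]);
  }
  return ranges;
}

// Return the parts of `before` that are no longer covered by `after`.
// Both inputs must be sorted and non-overlapping.
function subtractRanges(before, after) {
  const evicted = [];
  for (const [start, end] of before) {
    let cursor = start;
    for (const [aStart, aEnd] of after) {
      if (aEnd <= cursor) continue; // entirely before the cursor
      if (aStart >= end) break;     // entirely past this range
      if (aStart > cursor) evicted.push([cursor, aStart]);
      cursor = Math.max(cursor, aEnd);
    }
    if (cursor < end) evicted.push([cursor, end]);
  }
  return evicted;
}
```

An application might call `snapshotRanges(sourceBuffer.buffered)` before `appendBuffer()` and again in its `updateend` handler, then pass both snapshots to `subtractRanges`.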

Audio Splice Frame Algorithm

Follow these steps when the coded frame processing algorithm needs to generate a splice frame for two overlapping audio coded frames:

Let track buffer be the track buffer that will contain the splice.
Let new coded frame be the new coded frame, that is being added to track buffer, which triggered the need for a splice.
Let presentation timestamp be the presentation timestamp for new coded frame.
Let decode timestamp be the decode timestamp for new coded frame.
Let frame duration be the coded frame duration of new coded frame.
Let overlapped frame be the coded frame in track buffer with a presentation interval that contains presentation timestamp.
Update presentation timestamp and decode timestamp to the nearest audio sample timestamp based on sample rate of the audio in overlapped frame. If a timestamp is equidistant from both audio sample timestamps, then use the higher timestamp (e.g., floor(x * sample_rate + 0.5) / sample_rate).
For example, given the following values:
- The presentation timestamp of overlapped frame equals 10.
- The sample rate of overlapped frame equals 8000 Hz.
- presentation timestamp equals 10.01255.
- decode timestamp equals 10.01255.
presentation timestamp and decode timestamp are updated to 10.0125, since 10.01255 is closer to 10 + 100/8000 (10.0125) than to 10 + 101/8000 (10.012625).
If the user agent does not support crossfading then run the following steps:
1. Remove overlapped frame from track buffer.
2. Add a silence frame to track buffer with the following properties:
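  - The presentation timestamp set to the overlapped frame presentation timestamp.
  - The decode timestamp set to the overlapped frame decode timestamp.
  - The coded frame duration set to the difference between presentation timestamp and the overlapped frame presentation timestamp.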
  Some implementations MAY apply fades to/from silence to coded frames on either side of the inserted silence to make the transition less jarring.
3. Return to caller without providing a splice frame.
  This is intended to allow new coded frame to be added to the track buffer as if overlapped frame had not been in the track buffer to begin with.
Let frame end timestamp equal the sum of presentation timestamp and frame duration.
Let splice end timestamp equal the sum of presentation timestamp and the splice duration of 5 milliseconds.
Let fade out coded frames equal overlapped frame as well as any additional frames in track buffer that have a presentation timestamp greater than presentation timestamp and less than splice end timestamp.
Remove all the frames included in fade out coded frames from track buffer.
Return a splice frame with the following properties:
- The presentation timestamp set to the overlapped frame presentation timestamp.
- The decode timestamp set to the overlapped frame decode timestamp.
- The coded frame duration set to the difference between frame end timestamp and the overlapped frame presentation timestamp.
- The fade out coded frames equals fade out coded frames.
- The fade in coded frame equals new coded frame.
  If the new coded frame is less than 5 milliseconds in duration, then coded frames that are appended after the new coded frame will be needed to properly render the splice.
- The splice timestamp equals presentation timestamp.
See the audio splice rendering algorithm for details on how this splice frame is rendered.
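
The timestamp rounding step in this algorithm can be written directly from the formula it quotes. The function name below is hypothetical, and the sketch assumes timestamps are expressed in seconds on a sample grid that starts at 0.

```javascript
// Round a timestamp (seconds) to the nearest audio sample boundary.
// Ties round up, per floor(x * sample_rate + 0.5) / sample_rate.
function snapToSampleTimestamp(timestamp, sampleRate) {
  return Math.floor(timestamp * sampleRate + 0.5) / sampleRate;
}

// 10.01255 s at 8000 Hz snaps to 10.0125 s, matching the worked
// example in the algorithm.
const snapped = snapToSampleTimestamp(10.01255, 8000);
```

Working in integer sample counts rather than floating-point seconds would avoid accumulated rounding error in a real implementation; the floating-point form is kept here only to mirror the formula.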

Audio Splice Rendering Algorithm

The following steps are run when a spliced frame, generated by the audio splice frame algorithm, needs to be rendered by the media element:

Let fade out coded frames be the coded frames that are faded out during the splice.
Let fade in coded frames be the coded frames that are faded in during the splice.
Let presentation timestamp be the presentation timestamp of the first coded frame in fade out coded frames.
Let end timestamp be the sum of the presentation timestamp and the coded frame duration of the last frame in fade in coded frames.
Let splice timestamp be the presentation timestamp where the splice starts. This corresponds with the presentation timestamp of the first frame in fade in coded frames.
Let splice end timestamp equal splice timestamp plus five milliseconds.
Let fade out samples be the samples generated by decoding fade out coded frames.
Trim fade out samples so that it only contains samples between presentation timestamp and splice end timestamp.
Let fade in samples be the samples generated by decoding fade in coded frames.
If fade out samples and fade in samples do not have a common sample rate and channel layout, then convert fade out samples and fade in samples to a common sample rate and channel layout.
Let output samples be a buffer to hold the output samples.
Apply a linear gain fade out with a starting gain of 1 and an ending gain of 0 to the samples between splice timestamp and splice end timestamp in fade out samples.
Apply a linear gain fade in with a starting gain of 0 and an ending gain of 1 to the samples between splice timestamp and splice end timestamp in fade in samples.
Copy samples between presentation timestamp to splice timestamp from fade out samples into output samples.
For each sample between splice timestamp and splice end timestamp, compute the sum of a sample from fade out samples and the corresponding sample in fade in samples and store the result in output samples.
Copy samples between splice end timestamp to end timestamp from fade in samples into output samples.
Render output samples.
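
The mixing steps above can be sketched for a single channel as follows. This is an illustrative sketch only: it assumes both inputs are already decoded to plain arrays of samples at a common sample rate, that `fadeOut` begins at presentation timestamp and `fadeIn` begins at splice timestamp, and that the caller has converted the 5 millisecond splice duration into a sample count. All names are hypothetical.

```javascript
// spliceOffset: index in fadeOut at which the splice starts.
// fadeLength: number of samples covered by the crossfade.
function renderSplice(fadeOut, fadeIn, spliceOffset, fadeLength) {
  const output = [];
  // Before the splice, copy samples from the fade out frames.
  for (let i = 0; i < spliceOffset; i++) output.push(fadeOut[i]);
  // During the splice, sum both signals with complementary linear gains.
  for (let i = 0; i < fadeLength; i++) {
    const gainIn = i / fadeLength; // ramps from 0 toward 1
    const gainOut = 1 - gainIn;    // ramps from 1 toward 0
    const outSample = fadeOut[spliceOffset + i] ?? 0;
    const inSample = fadeIn[i] ?? 0;
    output.push(outSample * gainOut + inSample * gainIn);
  }
  // After the splice, copy the remaining fade in samples.
  for (let i = fadeLength; i < fadeIn.length; i++) output.push(fadeIn[i]);
  return output;
}
```

Missing samples are treated as silence, which covers the case where the new coded frame is shorter than the splice duration and later frames have not yet been decoded.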

Here is a graphical representation of this algorithm.

Text Splice Frame Algorithm

Follow these steps when the coded frame processing algorithm needs to generate a splice frame for two overlapping timed text coded frames:

Let track buffer be the track buffer that will contain the splice.
Let new coded frame be the new coded frame, that is being added to track buffer, which triggered the need for a splice.
Let presentation timestamp be the presentation timestamp for new coded frame.
Let decode timestamp be the decode timestamp for new coded frame.
Let frame duration be the coded frame duration of new coded frame.
Let frame end timestamp equal the sum of presentation timestamp and frame duration.
Let first overlapped frame be the coded frame in track buffer with a presentation interval that contains presentation timestamp.
Let overlapped presentation timestamp be the presentation timestamp of the first overlapped frame.
Let overlapped frames equal first overlapped frame as well as any additional frames in track buffer that have a presentation timestamp greater than presentation timestamp and less than frame end timestamp.
Remove all the frames included in overlapped frames from track buffer.
Update the coded frame duration of the first overlapped frame to presentation timestamp - overlapped presentation timestamp.
Add first overlapped frame to the track buffer.
Return to caller without providing a splice frame.
This is intended to allow new coded frame to be added to the track buffer as if it hadn't overlapped any frames in track buffer to begin with.
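
A minimal sketch of the text splice steps, assuming a hypothetical frame model of `{ pts, duration }` objects kept sorted by presentation timestamp, might look like this. It returns the updated track buffer contents and leaves the actual insertion of the new frame to the caller, as the algorithm does.

```javascript
function spliceTextFrame(trackBuffer, newFrame) {
  const pts = newFrame.pts;
  const frameEnd = pts + newFrame.duration;
  // First overlapped frame: its interval [pts, pts + duration) contains
  // the new frame's presentation timestamp.
  const first = trackBuffer.find(
    (f) => f.pts <= pts && pts < f.pts + f.duration
  );
  if (!first) return trackBuffer.slice(); // nothing to splice
  // Remove the first overlapped frame and any frame starting inside
  // the new frame's interval.
  const kept = trackBuffer.filter(
    (f) => f !== first && !(f.pts > pts && f.pts < frameEnd)
  );
  // Re-add the first overlapped frame, truncated to end where the new
  // frame begins.
  kept.push({ ...first, duration: pts - first.pts });
  kept.sort((a, b) => a.pts - b.pts);
  return kept;
}
```

Note that when the new frame starts at exactly the same timestamp as the first overlapped frame, this sketch leaves a zero-duration frame behind; how to treat that case is not covered by the steps above.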

SourceBufferList Object

SourceBufferList is a simple container object for SourceBuffer objects. It provides read-only array access and fires events when the list is modified.

interface SourceBufferList : EventTarget {
    readonly        attribute unsigned long length;
                    attribute EventHandler  onaddsourcebuffer;
                    attribute EventHandler  onremovesourcebuffer;
    getter SourceBuffer (unsigned long index);
};

Attributes

length of type unsigned long, readonly: Indicates the number of SourceBuffer objects in the list.
onaddsourcebuffer of type EventHandler: The event handler for the addsourcebuffer event.
onremovesourcebuffer of type EventHandler: The event handler for the removesourcebuffer event.

Methods

getter

Allows the SourceBuffer objects in the list to be accessed with an array operator (i.e., []).

If index is greater than or equal to the length attribute then return undefined and abort these steps.
Return the index'th SourceBuffer object in the list.

Parameter	Type	Nullable	Optional	Description
index	`unsigned long`	✘	✘

Return type: SourceBuffer

Event Summary

Event name	Interface	Dispatched when...
addsourcebuffer	`Event`	When a SourceBuffer is added to the list.
removesourcebuffer	`Event`	When a SourceBuffer is removed from the list.

URL Object Extensions

This section specifies extensions to the URL [[!FILE-API]] object definition.

[Exposed=Window,DedicatedWorker,SharedWorker]
partial interface URL {
    static DOMString createObjectURL (MediaSource mediaSource);
};

Methods

createObjectURL, static

Creates URLs for MediaSource objects.

Return a unique MediaSource object URL that can be used to dereference the mediaSource argument.

This algorithm is intended to mirror the behavior of the File API [[!FILE-API]] createObjectURL() method, which does not auto-revoke the created URL. Web authors are encouraged to use revokeObjectURL() [[!FILE-API]] for any MediaSource object URL that is no longer needed for attachment to a media element.

Parameter	Type	Nullable	Optional	Description
mediaSource	`MediaSource`	✘	✘

Return type: DOMString
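
Because the created URL is not auto-revoked, an application has to release it itself. One common pattern, sketched below, revokes the URL once the MediaSource has been opened by the media element. The function name and the injectable `urlApi` parameter are hypothetical conveniences, and the sketch assumes the URL is only ever attached to a single media element.

```javascript
function attachMediaSource(video, mediaSource, urlApi = URL) {
  const objectUrl = urlApi.createObjectURL(mediaSource);
  // Once 'sourceopen' has fired the element is attached, so the URL is
  // no longer needed for attachment and can be revoked.
  mediaSource.addEventListener(
    'sourceopen',
    () => urlApi.revokeObjectURL(objectUrl),
    { once: true }
  );
  video.src = objectUrl;
  return objectUrl;
}
```

The listener is registered with `once: true` because an application may receive more than one sourceopen event over the lifetime of a MediaSource.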

HTMLMediaElement Extensions

This section specifies what existing attributes on the HTMLMediaElement MUST return when a MediaSource is attached to the element.

The HTMLMediaElement.seekable attribute returns a new static normalized TimeRanges object created based on the following steps:

If duration equals NaN:

Return an empty TimeRanges object.

If duration equals positive Infinity:

If live seekable range is not empty:
1. Let union ranges be the union of live seekable range and the HTMLMediaElement.buffered attribute.
2. Return a single range with a start time equal to the earliest start time in union ranges and an end time equal to the highest end time in union ranges and abort these steps.
If the HTMLMediaElement.buffered attribute returns an empty TimeRanges object, then return an empty TimeRanges object and abort these steps.
Return a single range with a start time of 0 and an end time equal to the highest end time reported by the HTMLMediaElement.buffered attribute.

Otherwise:

Return a single range with a start time of 0 and an end time equal to duration.
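
The three cases above can be sketched as a pure function. This is a non-normative illustration that assumes the value tested against NaN and positive Infinity is the media source duration, and that ranges are modeled as arrays of `[start, end]` pairs; the function and parameter names are hypothetical.

```javascript
// liveSeekableRange: [] or a single [start, end] pair wrapped in an array.
// buffered: sorted array of [start, end] pairs for the media element.
function computeSeekable(duration, liveSeekableRange, buffered) {
  if (Number.isNaN(duration)) return [];
  if (duration === Infinity) {
    if (liveSeekableRange.length > 0) {
      // Collapse the union of both inputs into one enclosing range.
      const union = liveSeekableRange.concat(buffered);
      const start = Math.min(...union.map((r) => r[0]));
      const end = Math.max(...union.map((r) => r[1]));
      return [[start, end]];
    }
    if (buffered.length === 0) return [];
    return [[0, Math.max(...buffered.map((r) => r[1]))]];
  }
  // Finite duration: everything from 0 to duration is seekable.
  return [[0, duration]];
}
```

Note that the result is always empty or a single range, even when the buffered data itself contains gaps.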

The HTMLMediaElement.buffered attribute returns a static normalized TimeRanges object based on the following steps.

Let intersection ranges equal an empty TimeRanges object.
If activeSourceBuffers.length does not equal 0 then run the following steps:
1. Let active ranges be the ranges returned by buffered for each SourceBuffer object in activeSourceBuffers.
2. Let highest end time be the largest range end time in the active ranges.
3. Let intersection ranges equal a TimeRange object containing a single range from 0 to highest end time.
4. For each SourceBuffer object in activeSourceBuffers run the following steps:
  1. Let source ranges equal the ranges returned by the buffered attribute on the current SourceBuffer.
  2. If readyState is "ended", then set the end time on the last range in source ranges to highest end time.
  3. Let new intersection ranges equal the intersection between the intersection ranges and the source ranges.
  4. Replace the ranges in intersection ranges with the new intersection ranges.
If the current value of this attribute has not been set by this algorithm or intersection ranges does not contain the exact same range information as the current value of this attribute, then update the current value of this attribute to intersection ranges.
Return the current value of this attribute.
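
The intersection steps above can be sketched as follows, modeling each source buffer's buffered ranges as a sorted array of `[start, end]` pairs. The names are hypothetical, the `ended` flag is assumed to stand for the media source having reached its ended state, and the caching of the previously returned value is omitted.

```javascript
// Intersect two sorted, non-overlapping lists of [start, end] ranges.
function intersectRanges(a, b) {
  const result = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) result.push([start, end]);
    // Advance whichever range finishes first.
    if (a[i][1] < b[j][1]) i++;
    else j++;
  }
  return result;
}

// activeBuffered: one ranges list per active source buffer.
function computeBuffered(activeBuffered, ended) {
  if (activeBuffered.length === 0) return [];
  const ends = activeBuffered.flat().map((r) => r[1]);
  // Assumption: with nothing buffered anywhere, the result is empty.
  if (ends.length === 0) return [];
  const highestEnd = Math.max(...ends);
  let intersection = [[0, highestEnd]];
  for (const ranges of activeBuffered) {
    const source = ranges.map((r) => r.slice());
    // When ended, stretch the last range so tracks of slightly
    // different lengths do not truncate the result.
    if (ended && source.length > 0) {
      source[source.length - 1][1] = highestEnd;
    }
    intersection = intersectRanges(intersection, source);
  }
  return intersection;
}
```

For audio ranges `[[0, 10], [12, 20]]` and video ranges `[[0, 9], [12, 18]]`, this yields `[[0, 9], [12, 18]]` while streaming and `[[0, 9], [12, 20]]` once ended.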

AudioTrack Extensions

This section specifies extensions to the HTML AudioTrack definition.

partial interface AudioTrack {
    readonly        attribute SourceBuffer? sourceBuffer;
};

Attributes

sourceBuffer of type SourceBuffer, readonly, nullable: Returns the SourceBuffer that created this track. Returns null if this track was not created by a SourceBuffer or the SourceBuffer has been removed from the sourceBuffers attribute of its parent media source.

VideoTrack Extensions

This section specifies extensions to the HTML VideoTrack definition.

partial interface VideoTrack {
    readonly        attribute SourceBuffer? sourceBuffer;
};

Attributes

sourceBuffer of type SourceBuffer, readonly, nullable: Returns the SourceBuffer that created this track. Returns null if this track was not created by a SourceBuffer or the SourceBuffer has been removed from the sourceBuffers attribute of its parent media source.

TextTrack Extensions

This section specifies extensions to the HTML TextTrack definition.

partial interface TextTrack {
    readonly        attribute SourceBuffer? sourceBuffer;
};

Attributes

sourceBuffer of type SourceBuffer, readonly, nullable: Returns the SourceBuffer that created this track. Returns null if this track was not created by a SourceBuffer or the SourceBuffer has been removed from the sourceBuffers attribute of its parent media source.

Byte Stream Formats

The bytes provided through appendBuffer() for a SourceBuffer form a logical byte stream. The format and semantics of these byte streams are defined in byte stream format specifications. The byte stream format registry [[MSE-REGISTRY]] provides mappings between a MIME type that may be passed to addSourceBuffer() or isTypeSupported() and the byte stream format expected by a SourceBuffer created with that MIME type. Implementations are encouraged to register mappings for byte stream formats they support to facilitate interoperability. The byte stream format registry [[MSE-REGISTRY]] is the authoritative source for these mappings. If an implementation claims to support a MIME type listed in the registry, its SourceBuffer implementation MUST conform to the byte stream format specification listed in the registry entry.
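
Because no particular format or codec is required, an application typically probes for a supported MIME type before creating a SourceBuffer. The helper below is a hypothetical sketch; the support check is passed in as a function so that the selection logic does not depend on a browser environment.

```javascript
// Return the first candidate MIME type the predicate accepts, or null.
function pickSupportedType(candidates, isTypeSupported) {
  return candidates.find((type) => isTypeSupported(type)) ?? null;
}
```

In a browser, the predicate would usually be `(type) => MediaSource.isTypeSupported(type)`, and the returned string would then be passed to `addSourceBuffer()`. The candidate list, for example `'video/webm; codecs="vorbis,vp8"'` followed by a fallback, is chosen by the application.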

The byte stream format specifications in the registry are not intended to define new storage formats. They simply outline the subset of existing storage format structures that implementations of this specification will accept.

Byte stream format parsing and validation is implemented in the segment parser loop algorithm.

This section provides general requirements for all byte stream format specifications:

A byte stream format specification MUST define initialization segments and media segments.
A byte stream format SHOULD provide references for sourcing AudioTrack, VideoTrack, and TextTrack attribute values from data in initialization segments.
If the byte stream format covers a format similar to one covered in the in-band tracks spec [[INBANDTRACKS]], then it SHOULD try to use the same attribute mappings so that Media Source Extensions playback and non-Media Source Extensions playback provide the same track information.
It MUST be possible to identify segment boundaries and segment type (initialization or media) by examining the byte stream alone.
The user agent MUST run the append error algorithm when any of the following conditions are met:
1. The number and type of tracks are not consistent.
  
  For example, if the first initialization segment has 2 audio tracks and 1 video track, then all initialization segments that follow it in the byte stream MUST describe 2 audio tracks and 1 video track.
2. Track IDs are not the same across initialization segments, for segments describing multiple tracks of a single type (e.g., 2 audio tracks).
3. Codecs change across initialization segments.
  
  For example, a byte stream that starts with an initialization segment that specifies a single AAC track and later contains an initialization segment that specifies a single AMR-WB track is not allowed. Support for multiple codecs is handled with multiple SourceBuffer objects.
The user agent MUST support the following:
1. Track IDs changing across initialization segments if the segments describe only one track of each type.
2. Video frame size changes. The user agent MUST support seamless playback.
  
  This will cause the <video> display region to change size if the web application does not use CSS or HTML attributes (width/height) to constrain the element size.
3. Audio channel count changes. The user agent MAY support this seamlessly and could trigger downmixing.
  
  This is a quality of implementation issue because changing the channel count may require reinitializing the audio device, resamplers, and channel mixers which tends to be audible.
The following rules apply to all media segments within a byte stream. A user agent MUST:
1. Map all timestamps to the same media timeline.
2. Support seamless playback of media segments having a timestamp gap smaller than the audio frame size. User agents MUST NOT reflect these gaps in the buffered attribute.
  This is intended to simplify switching between audio streams where the frame boundaries don't always line up across encodings (e.g., Vorbis).
The user agent MUST run the append error algorithm when any combination of an initialization segment and any contiguous sequence of media segments satisfies the following conditions:
1. The number and type (audio, video, text, etc.) of all tracks in the media segments are not identified.
2. The decoding capabilities needed to decode each track (i.e., codec and codec parameters) are not provided.
3. Encryption parameters necessary to decrypt the content (except the encryption key itself) are not provided for all encrypted tracks.
4. All information necessary to decode and render the earliest random access point in the sequence of media segments and all subsequent samples in the sequence (in presentation time) is not provided. This includes, in particular:
  - Information that determines the intrinsic width and height of the video (specifically, this requires either the picture or pixel aspect ratio, together with the encoded resolution).
  - Information necessary to convert the video decoder output to a format suitable for display.
5. Information necessary to compute the global presentation timestamp of every sample in the sequence of media segments is not provided.
For example, if I1 is associated with M1, M2, M3 then the above MUST hold for all the combinations I1+M1, I1+M2, I1+M1+M2, I1+M2+M3, etc.

Byte stream specifications MUST at a minimum define constraints which ensure that the above requirements hold. Additional constraints MAY be defined, for example to simplify implementation.

Examples

Example use of the Media Source Extensions

<script>
  function onSourceOpen(videoTag, e) {
    var mediaSource = e.target;

    if (mediaSource.sourceBuffers.length > 0)
      return;

    var sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vorbis,vp8"');

    videoTag.addEventListener('seeking', onSeeking.bind(videoTag, mediaSource));
    videoTag.addEventListener('progress', onProgress.bind(videoTag, mediaSource));

    var initSegment = GetInitializationSegment();

    if (initSegment == null) {
      // Error fetching the initialization segment. Signal end of stream with an error.
      mediaSource.endOfStream("network");
      return;
    }

    // Append the initialization segment.
    var firstAppendHandler = function(e) {
      var sourceBuffer = e.target;
      sourceBuffer.removeEventListener('updateend', firstAppendHandler);

      // Append some initial media data.
      appendNextMediaSegment(mediaSource);
    };
    sourceBuffer.addEventListener('updateend', firstAppendHandler);
    sourceBuffer.appendBuffer(initSegment);
  }

  function appendNextMediaSegment(mediaSource) {
    if (mediaSource.readyState == "closed")
      return;

    // If we have run out of stream data, then signal end of stream.
    if (!HaveMoreMediaSegments()) {
      mediaSource.endOfStream();
      return;
    }

    // Make sure the previous append is not still pending.
    if (mediaSource.sourceBuffers[0].updating)
      return;

    var mediaSegment = GetNextMediaSegment();

    if (!mediaSegment) {
      // Error fetching the next media segment.
      mediaSource.endOfStream("network");
      return;
    }

    // NOTE: If mediaSource.readyState == "ended", this appendBuffer() call will
    // cause mediaSource.readyState to transition to "open". The web application
    // should be prepared to handle multiple "sourceopen" events.
    mediaSource.sourceBuffers[0].appendBuffer(mediaSegment);
  }

  function onSeeking(mediaSource, e) {
    var video = e.target;

    if (mediaSource.readyState == "open") {
      // Abort current segment append.
      mediaSource.sourceBuffers[0].abort();
    }

    // Notify the media segment loading code to start fetching data at the
    // new playback position.
    SeekToMediaSegmentAt(video.currentTime);

    // Append a media segment from the new playback position.
    appendNextMediaSegment(mediaSource);
  }

  function onProgress(mediaSource, e) {
    appendNextMediaSegment(mediaSource);
  }
</script>

<video id="v" autoplay> </video>

<script>
  var video = document.getElementById('v');
  var mediaSource = new MediaSource();
  mediaSource.addEventListener('sourceopen', onSourceOpen.bind(this, video));
  video.src = window.URL.createObjectURL(mediaSource);
</script>

Acknowledgments

The editors would like to thank for their contributions to this specification.
