Comments on this document are welcomed.
This document specifies the takePhoto() and grabFrame() methods, and corresponding camera settings for use with MediaStreamTracks (as defined in Media Capture and Streams [[!GETUSERMEDIA]]).
Introduction

The API defined in this document captures images from a valid MediaStreamTrack. The produced image can be in the form of a Blob (as defined in [[!FILE-API]]) or an ImageBitmap (as defined in [[!HTML51]]). The source image is provided by the capture device that provides the MediaStreamTrack. Moreover, picture-specific settings can optionally be provided as arguments to be applied to the device for the capture.

Image Capture API

The User Agent must support Promises in order to implement the Image Capture API. Any Promise object is assumed to have a resolver object, with resolve() and reject() methods associated with it.

       [Constructor(MediaStreamTrack track)]
        interface ImageCapture {
          readonly attribute MediaStreamTrack videoStreamTrack;
          Promise<Blob>                 takePhoto ();
          Promise<PhotoCapabilities>    getPhotoCapabilities ();
          Promise<void>                 setOptions (PhotoSettings? photoSettings);
          Promise<ImageBitmap>          grabFrame ();
        };
      

Constructors

ImageCapture
Parameter Type Nullable Optional Description
track MediaStreamTrack The MediaStreamTrack to be used as source of data. This will be the value of the videoStreamTrack attribute. The MediaStreamTrack passed to the constructor MUST have its kind attribute set to "video" otherwise a DOMException of type NotSupportedError will be thrown.
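
A page can avoid the NotSupportedError described above by selecting a video track before calling the constructor. The following is a minimal sketch; the helper name `pickVideoTrack` is hypothetical and is not part of this API, and it only inspects the kind attribute of each track.

```javascript
// Hypothetical helper (not part of this API): returns the first track whose
// kind is "video", or null when the list contains none.
function pickVideoTrack(tracks) {
  for (const track of tracks) {
    if (track.kind === 'video') {
      return track;
    }
  }
  return null;
}

// Possible usage in a page, assuming `mediastream` came from getUserMedia():
//   const track = pickVideoTrack(mediastream.getTracks());
//   if (track) { const capture = new ImageCapture(track); }
```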

Attributes

videoStreamTrack of type MediaStreamTrack, readonly
The MediaStreamTrack passed into the constructor.

Methods

takePhoto
takePhoto() produces the result of a single photographic exposure using the video capture device sourcing the videoStreamTrack, applying any PhotoSettings previously configured, and returning an encoded image in the form of a Blob if successful. When this method is invoked:
  1. If the readyState of the MediaStreamTrack provided in the constructor is not `live`, throw a new DOMException ([[!WebIDL]]) whose name is "InvalidStateError".
  2. Otherwise it MUST queue a task, using the DOM manipulation task source, that runs the following steps:
    1. Gather data from the MediaStreamTrack into a Blob containing a single still image. The method of doing this will depend on the underlying device. Devices may temporarily stop streaming data, reconfigure themselves with the appropriate photo settings, take the photo, and then resume streaming. In this case, the stopping and restarting of streaming SHOULD cause mute and unmute events to fire on the Track in question.
    2. If the UA is unable to execute the takePhoto() method for any reason (for example, upon invocation of multiple takePhoto() method calls in rapid succession), then the UA MUST return a promise rejected with a new DOMException ([[!WebIDL]]) whose name is "UnknownError".
    3. Return a resolved promise with the Blob object.
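
The steps above name rapid successive takePhoto() calls as a possible cause of failure. The following is a minimal sketch, not a normative mechanism, of how a page could issue such calls one at a time. The factory name `makeSerializedTakePhoto` is hypothetical; `capture` is assumed to be any object exposing a takePhoto() method that returns a promise.

```javascript
// Hypothetical wrapper (not part of this API): each call waits for the
// previous takePhoto() to settle before starting the next one.
function makeSerializedTakePhoto(capture) {
  let tail = Promise.resolve();
  return function takePhotoSerialized() {
    const result = tail.then(() => capture.takePhoto());
    // Swallow the rejection on the queue only, so that one failed call
    // does not block later calls; the caller still sees it via `result`.
    tail = result.catch(() => {});
    return result;
  };
}

// Possible usage in a page, assuming `captureDevice` is an ImageCapture:
//   const takePhoto = makeSerializedTakePhoto(captureDevice);
//   takePhoto().then(showFirst);
//   takePhoto().then(showSecond);  // starts only after the first settles
```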
getPhotoCapabilities
getPhotoCapabilities() is used to retrieve the ranges of available configuration options and their current setting values, if any. When this method is invoked:
  1. If the readyState of the MediaStreamTrack provided in the constructor is not `live`, throw a new DOMException ([[!WebIDL]]) whose name is "InvalidStateError".
  2. Otherwise it MUST queue a task, using the DOM manipulation task source, that runs the following steps:
    1. Gather data from the MediaStreamTrack into a PhotoCapabilities object containing the available capabilities of the device, including ranges where appropriate. The resolved PhotoCapabilities will also include the current conditions in which the capabilities of the device are found. The method of doing this will depend on the underlying device.
    2. If the UA is unable to execute the getPhotoCapabilities() method for any reason (for example, the MediaStreamTrack being ended asynchronously), then the UA MUST return a promise rejected with a new DOMException ([[!WebIDL]]) whose name is "OperationError".
    3. Return a resolved promise with the PhotoCapabilities object.
setOptions
setOptions() is used to configure a number of settings affecting the image capture and/or the current video feed in videoStreamTrack. When this method is invoked:
  1. If the readyState of the MediaStreamTrack provided in the constructor is not `live`, throw a new DOMException ([[!WebIDL]]) whose name is "InvalidStateError".
  2. If an invalid PhotoSettings object is passed as argument, throw a new DOMException ([[!WebIDL]]) whose name is "SyntaxError".
  3. Otherwise it MUST queue a task, using the DOM manipulation task source, that runs the following steps:
    1. Configure the underlying image capture device with the settings parameter.
    2. If the UA cannot successfully apply the settings, then the UA MUST return a promise rejected with a new DOMException ([[!WebIDL]]) whose name is "OperationError".
    3. If the UA can successfully apply the settings, then the UA MUST return a resolved promise.
    4. If the UA can successfully apply the settings, the effect MAY be reflected, if visible at all, in videoStreamTrack.
Parameter Type Nullable Optional Description
settings PhotoSettings The PhotoSettings dictionary to be applied.
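
Because setOptions() rejects when the settings cannot be applied, a page may want to keep numeric values inside the ranges reported by getPhotoCapabilities() before calling it. A minimal sketch follows, assuming capability objects shaped like MediaSettingsRange (with min and max). The helper name `clampSettings` and the constant `NUMERIC_SETTINGS` are illustrative choices, and whether a UA rejects out-of-range values is not stated in this document.

```javascript
// Numeric PhotoSettings members that have a matching MediaSettingsRange
// attribute in PhotoCapabilities.
const NUMERIC_SETTINGS = [
  'colorTemperature', 'exposureCompensation', 'iso', 'brightness',
  'contrast', 'saturation', 'sharpness', 'zoom', 'imageHeight', 'imageWidth'
];

// Hypothetical helper (not part of this API): returns a copy of `settings`
// with each numeric member limited to [min, max] of the matching capability.
// Members without a reported range are copied unchanged.
function clampSettings(settings, capabilities) {
  const result = Object.assign({}, settings);
  for (const name of NUMERIC_SETTINGS) {
    const range = capabilities[name];
    if (typeof result[name] === 'number' && range) {
      result[name] = Math.min(range.max, Math.max(range.min, result[name]));
    }
  }
  return result;
}

// Possible usage in a page, assuming `captureDevice` is an ImageCapture:
//   captureDevice.getPhotoCapabilities().then(function(capabilities) {
//     return captureDevice.setOptions(clampSettings({zoom: 10}, capabilities));
//   });
```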
grabFrame
grabFrame() takes a snapshot of the live video being held in the videoStreamTrack, returning an ImageBitmap if successful. When this method is invoked:
  1. If the readyState of the MediaStreamTrack provided in the constructor is not `live`, throw a new DOMException ([[!WebIDL]]) whose name is "InvalidStateError".
  2. Otherwise it MUST queue a task, using the DOM manipulation task source, that runs the following steps:
    1. Gather data from the MediaStreamTrack into an ImageBitmap object (as defined in [[!HTML51]]). The width and height of the ImageBitmap object are derived from the constraints of the MediaStreamTrack.
    2. If the UA is unable to execute the grabFrame() method for any reason (for example, upon invocation of multiple grabFrame() method calls in rapid succession), then the UA MUST return a promise rejected with a new DOMException ([[!WebIDL]]) whose name is "UnknownError".
    3. Return a resolved promise with the newly created ImageBitmap object. (Note: grabFrame() returns data only once upon being invoked).

PhotoCapabilities

        interface PhotoCapabilities {
          readonly attribute MeteringMode       whiteBalanceMode;
          readonly attribute MediaSettingsRange colorTemperature;
          readonly attribute MeteringMode       exposureMode;
          readonly attribute MediaSettingsRange exposureCompensation;
          readonly attribute MediaSettingsRange iso;
          readonly attribute boolean            redEyeReduction;
          readonly attribute MeteringMode       focusMode;

          readonly attribute MediaSettingsRange brightness;
          readonly attribute MediaSettingsRange contrast;
          readonly attribute MediaSettingsRange saturation;
          readonly attribute MediaSettingsRange sharpness;
          readonly attribute MediaSettingsRange imageHeight;
          readonly attribute MediaSettingsRange imageWidth;
          readonly attribute MediaSettingsRange zoom;
          readonly attribute FillLightMode      fillLightMode;
        };
      

Attributes

whiteBalanceMode of type MeteringMode
This reflects the current white balance mode setting.
colorTemperature of type MediaSettingsRange
This range reflects the current correlated color temperature being used for the scene white balance calculation and its available range.
exposureMode of type MeteringMode
This reflects the current exposure mode setting.
exposureCompensation of type MediaSettingsRange
This reflects the current exposure compensation setting and permitted range. Values are signed integers multiplied by 100 (to avoid using floating point). The supported range can be, and usually is, centered around 0 EV.
iso of type MediaSettingsRange
This reflects the current camera ISO setting and permitted range. Values are numeric.
redEyeReduction of type boolean
This reflects whether camera red eye reduction is on (true) or off (false).
focusMode of type MeteringMode
This reflects the current focus mode setting.
brightness of type MediaSettingsRange
This reflects the current brightness setting of the camera and permitted range. Values are numeric.
contrast of type MediaSettingsRange
This reflects the current contrast setting of the camera and permitted range. Values are numeric.
saturation of type MediaSettingsRange
This reflects the current saturation setting of the camera and permitted range. Values are numeric.
sharpness of type MediaSettingsRange
This reflects the current sharpness setting of the camera and permitted range. Values are numeric.
imageHeight of type MediaSettingsRange
This reflects the image height range supported by the UA and the current height setting.
imageWidth of type MediaSettingsRange
This reflects the image width range supported by the UA and the current width setting.
zoom of type MediaSettingsRange
This reflects the zoom value range supported by the UA and the current zoom setting.
fillLightMode of type FillLightMode
This reflects the current fill light (flash) mode setting. Values are of type FillLightMode.
The supported resolutions are presented as segregated imageWidth and imageHeight ranges to prevent increasing the fingerprinting surface and to allow the UA to make a best-effort decision with regards to actual hardware configuration.
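
The multiplied-by-100 convention used for exposureCompensation can be illustrated with a small conversion sketch. The function names `evToSetting` and `settingToEv` are hypothetical, and rounding to the nearest integer is an assumption, since this document does not say how fractional results are handled.

```javascript
// Hypothetical helpers (not part of this API).

// Converts an exposure compensation in EV to the integer representation
// (EV multiplied by 100). Rounding to the nearest integer is an assumption.
function evToSetting(ev) {
  return Math.round(ev * 100);
}

// Converts the integer representation back to EV.
function settingToEv(value) {
  return value / 100;
}

// Possible usage: a range reported as {min: -200, max: 200, current: 0}
// corresponds to -2 EV through +2 EV with no compensation currently applied.
```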

Discussion

The PhotoCapabilities interface provides the photo-specific settings options and current settings values. The following definitions are assumed for individual settings and are provided for information purposes:

  1. White balance mode is a setting that cameras use to adjust for different color temperatures. Color temperature is the temperature of background light (usually measured in Kelvin). This setting can usually be automatically and continuously determined by the implementation, but it's also common to offer a `manual` mode in which the estimated temperature of the scene illumination is hinted to the implementation. Typical temperature ranges for popular modes are provided below:
    Mode Kelvin range
    incandescent 2500-3500
    fluorescent 4000-5000
    warm-fluorescent 5000-5500
    daylight 5500-6500
    cloudy-daylight 6500-8000
    twilight 8000-9000
    shade 9000-10000
  2. Exposure is the amount of time during which light is allowed to fall on the photosensitive device. Auto-exposure mode is a camera setting where the exposure levels are automatically adjusted by the implementation based on the subject of the photo.
  3. Exposure Compensation is a numeric camera setting that adjusts the exposure level from the current value used by the implementation. This value can be used to bias the exposure level enabled by auto-exposure, and usually is a symmetric range around 0 EV (the no-compensation value).
  4. The ISO setting of a camera describes the sensitivity of the camera to light. It is a numeric value, where the higher the value the greater the sensitivity. This setting in most implementations relates to shutter speed, and is sometimes known as the ASA setting.
  5. Red Eye Reduction is a feature in cameras that is designed to limit or prevent the appearance of red pupils ("Red Eye") in photography subjects due to prolonged exposure to a camera's flash.
  6. Focus mode describes the focus setting of the capture device (e.g. `auto` or `manual`).
  7. [[LIGHTING-VOCABULARY]] defines brightness as "the attribute of a visual sensation according to which an area appears to emit more or less light" and in the context of the present API, it refers to the numeric camera setting that adjusts the perceived amount of light emitting from the photo object. A higher brightness setting increases the intensity of darker areas in a scene while compressing the intensity of brighter parts of the scene. The range and effect of this setting is implementation dependent but in general it translates into a numerical value that is added to each pixel with saturation.
  8. Contrast is the numeric camera setting that controls the difference in brightness between light and dark areas in a scene. A higher contrast setting reflects an expansion in the difference in brightness. The range and effect of this setting is implementation dependent but it can be understood as a transformation of the pixel values so that the luma range in the histogram becomes larger; the transformation is sometimes as simple as a gain factor.
  9. [[LIGHTING-VOCABULARY]] defines saturation as "the colourfulness of an area judged in proportion to its brightness" and in the current context it refers to a numeric camera setting that controls the intensity of color in a scene (i.e. the amount of gray in the scene). Very low saturation levels will result in photos closer to black-and-white. Saturation is similar to contrast but referring to colors, so its implementation, albeit being platform dependent, can be understood as a gain factor applied to the chroma components of a given image.
  10. Sharpness is a numeric camera setting that controls the intensity of edges in a scene. Higher sharpness settings result in higher contrast along the edges, while lower settings result in less contrast and blurrier edges (i.e. soft focus). The implementation is platform dependent, but it can be understood as the linear combination of an edge detection operation applied on the original image and the original image itself, with the relative weights being controlled by this `sharpness`.
  11. Zoom is a numeric camera setting that controls the focal length of the lens. The setting usually represents a ratio, e.g. 4 is a zoom ratio of 4:1. The minimum value is usually 1, to represent a 1:1 ratio (i.e. no zoom).
  12. Fill light mode describes the flash setting of the capture device (e.g. `auto`, `off`, `on`).
Many of these fields mirror hardware capabilities that can be implemented in a number of ways, preventing further definition. Moreover, hardware manufacturers tend to publish vague definitions to protect their intellectual property.
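
The table of typical color temperatures in item 1 can be expressed as a lookup, for instance to label a colorTemperature value in a user interface. This is an informative sketch only: the names `WHITE_BALANCE_RANGES` and `describeColorTemperature` are hypothetical, the mode names are descriptive labels rather than MeteringMode values, and the choice to return the first match on a shared boundary (e.g. 5000 K) is an assumption.

```javascript
// Typical Kelvin ranges copied from the table in item 1 of the Discussion.
const WHITE_BALANCE_RANGES = [
  { mode: 'incandescent', min: 2500, max: 3500 },
  { mode: 'fluorescent', min: 4000, max: 5000 },
  { mode: 'warm-fluorescent', min: 5000, max: 5500 },
  { mode: 'daylight', min: 5500, max: 6500 },
  { mode: 'cloudy-daylight', min: 6500, max: 8000 },
  { mode: 'twilight', min: 8000, max: 9000 },
  { mode: 'shade', min: 9000, max: 10000 }
];

// Hypothetical helper (not part of this API): returns the label of the first
// range that contains `kelvin`, or null when the value falls outside all of
// them (for example in the gap between 3500 K and 4000 K).
function describeColorTemperature(kelvin) {
  const match = WHITE_BALANCE_RANGES.find(
    range => kelvin >= range.min && kelvin <= range.max
  );
  return match ? match.mode : null;
}
```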

PhotoSettings

The PhotoSettings object is optionally passed into the setOptions() method in order to modify capture device settings specific to still imagery. Each of the attributes in this object is optional.

        dictionary PhotoSettings {
             MeteringMode  whiteBalanceMode;
             unsigned long colorTemperature;
             MeteringMode  exposureMode;
             unsigned long exposureCompensation;
             unsigned long iso;
             boolean       redEyeReduction;
             MeteringMode  focusMode;
             sequence<Point2D> pointsOfInterest;

             unsigned long brightness;
             unsigned long contrast;
             unsigned long saturation;
             unsigned long sharpness;
             unsigned long zoom;
             unsigned long imageHeight;
             unsigned long imageWidth;
             FillLightMode fillLightMode;
        };
      

Members

whiteBalanceMode of type MeteringMode
This reflects the desired white balance mode setting.
colorTemperature of type unsigned long
Color temperature to be used for the white balance calculation of the scene. This field is only significant if whiteBalanceMode is manual.
exposureMode of type MeteringMode
This reflects the desired exposure mode setting. Acceptable values are of type MeteringMode.
exposureCompensation of type unsigned long
This reflects the desired exposure compensation setting, expressed in EV multiplied by 100 (to avoid using floating point). A value of 0 EV is interpreted as no exposure compensation.
iso of type unsigned long
This reflects the desired camera ISO setting.
redEyeReduction of type boolean
This reflects whether camera red eye reduction is desired.
focusMode of type MeteringMode
This reflects the desired focus mode setting. Acceptable values are of type MeteringMode.
pointsOfInterest of type sequence<Point2D>
A sequence of Point2Ds to be used as metering area centers for other settings, e.g. Focus, Exposure and Auto White Balance.
brightness of type unsigned long
This reflects the desired brightness setting of the camera.
contrast of type unsigned long
This reflects the desired contrast setting of the camera.
saturation of type unsigned long
This reflects the desired saturation setting of the camera.
sharpness of type unsigned long
This reflects the desired sharpness setting of the camera.
zoom of type unsigned long
This reflects the desired zoom setting of the camera.
imageHeight of type unsigned long
This reflects the desired image height. The UA MUST select the closest height value to this setting if it supports a discrete set of height options.
imageWidth of type unsigned long
This reflects the desired image width. The UA MUST select the closest width value to this setting if it supports a discrete set of width options.
fillLightMode of type FillLightMode
This reflects the desired fill light (flash) mode setting. Acceptable values are of type FillLightMode.
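
The closest-value rule for imageHeight and imageWidth can be sketched as follows. This is an illustration of one possible selection procedure, not the required UA algorithm: the function name `closestValue` is hypothetical, and the tie-breaking behaviour (the earlier entry wins) is an assumption, since this document does not define it.

```javascript
// Hypothetical helper (not part of this API): returns the entry of `options`
// nearest to `desired`. On a tie the earlier entry wins.
function closestValue(desired, options) {
  if (options.length === 0) {
    throw new RangeError('options must not be empty');
  }
  let best = options[0];
  for (const option of options) {
    if (Math.abs(option - desired) < Math.abs(best - desired)) {
      best = option;
    }
  }
  return best;
}

// Possible usage: a device offering heights of 480, 720 and 1080 that is
// asked for an imageHeight of 700 would select 720 under this rule.
```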

MediaSettingsRange

        interface MediaSettingsRange {
            readonly attribute long max;
            readonly attribute long min;
            readonly attribute long current;
        };
      

Attributes

max of type long, readonly
The maximum value of this setting.
min of type long, readonly
The minimum value of this setting.
current of type long, readonly
The current value of this setting.

FillLightMode

        enum FillLightMode {
            "unavailable",
            "auto",
            "off",
            "flash",
            "torch"
        };
      

Values

unavailable
This source does not have an option to change fill light modes (e.g., the camera does not have a flash).
auto
The video device's fill light will be enabled when required (typically low light conditions). Otherwise it will be off. Note that auto does not guarantee that a flash will fire when `takePhoto()` is called. Use flash to guarantee firing of the flash for the takePhoto() or grabFrame() methods.
off
The source's fill light and/or flash will not be used.
flash
This value will always cause the flash to fire for the takePhoto() or grabFrame() methods.
torch
The source's fill light will be turned on (and remain on) while the source MediaStreamTrack is active.

MeteringMode

Note that MeteringMode is used for both status enumeration and for setting options for capture(s).

        enum MeteringMode {
            "none",
            "manual",
            "single-shot",
            "continuous"
        };
      

Values

none
This source does not offer focus/exposure/white balance mode. For setting, this is interpreted as a command to turn off the feature.
manual
The capture device is set to manually control the lens position/exposure time/white balance, or such a mode is requested to be configured.
single-shot
The capture device is configured for single-sweep autofocus/one-shot exposure/white balance calculation, or such a mode is requested.
continuous
The capture device is configured for continuous focusing for near-zero shutter-lag/continuous auto exposure/white balance calculation, or such continuous focus hunting/exposure/white balance calculation mode is requested.

A Point2D represents a location in a normalized square space with values in `[0.0, 1.0]`. The origin of coordinates `(0.0, 0.0)` represents the upper leftmost corner, with the `y` coordinate pointing downwards.

Point2D

      dictionary Point2D {
        float x = 0.0;
        float y = 0.0;
      };
    

Attributes

x of type float
Value of the normalized horizontal (abscissa) coordinate in the range `[0.0, 1.0]`. Increasing `x` values correspond to increasing column indexes of an image.
y of type float
Value of the normalized vertical (ordinate) coordinate in the range `[0.0, 1.0]`. Increasing `y` values correspond to increasing row indexes of an image.
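
A pixel position, for example from a click on a preview image, can be mapped into this normalized space as sketched below. The function name `toPoint2D` is hypothetical, and clamping out-of-range input to `[0.0, 1.0]` is an assumption rather than a requirement of this document.

```javascript
// Hypothetical helper (not part of this API): maps a pixel position in an
// image of the given width and height to a normalized Point2D. The origin is
// the upper left corner and y grows downwards, as for Point2D.
function toPoint2D(pixelX, pixelY, width, height) {
  const clamp01 = value => Math.min(1, Math.max(0, value));
  return { x: clamp01(pixelX / width), y: clamp01(pixelY / height) };
}

// Possible usage in a page, assuming `captureDevice` is an ImageCapture:
//   captureDevice.setOptions({
//     pointsOfInterest: [toPoint2D(320, 120, 640, 480)]
//   });
```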

Examples

##### Grabbing a Frame for Post-Processing
    navigator.mediaDevices.getUserMedia({video: true}).then(gotMedia, failedToGetMedia);

    function gotMedia(mediastream) {
        // Extract video track.
        var videoTrack = mediastream.getVideoTracks()[0];
        // The constructor throws if the track's kind is not "video".
        var captureDevice = new ImageCapture(videoTrack);
        // Pass the callback itself; grabFrame() resolves with an ImageBitmap.
        captureDevice.grabFrame().then(processFrame, failedToGrabFrame);
    }

    function processFrame(imageBitmap) {
        // Draw the ImageBitmap onto a canvas to gain access to its pixels.
        var canvas = document.createElement('canvas');
        canvas.width = imageBitmap.width;
        canvas.height = imageBitmap.height;
        var ctx = canvas.getContext('2d');
        ctx.drawImage(imageBitmap, 0, 0);
        var imgData = ctx.getImageData(0, 0, canvas.width, canvas.height);

        // Set all alpha values to medium opacity (every fourth byte of RGBA data)
        for (var j = 3; j < imgData.data.length; j += 4) {
            imgData.data[j] = 128;
        }

        // Create a new ImageData object with the modified pixel values
        var newImg = ctx.createImageData(canvas.width, canvas.height);
        newImg.data.set(imgData.data);

        // ... and do something with the modified image ...
    }

    function failedToGrabFrame(e) {
        console.log('grabFrame() failure: ' + e);
    }

    function failedToGetMedia(e) {
        console.log('Stream failure: ' + e);
    }
    
##### Taking a picture with Red Eye Reduction supported and used
    navigator.mediaDevices.getUserMedia({video: true}).then(gotMedia, failedToGetMedia);

    function gotMedia(mediastream) {
        // Extract video track.
        var videoDevice = mediastream.getVideoTracks()[0];
        var captureDevice = new ImageCapture(videoDevice);
        // Check if this device reports red eye reduction...
        captureDevice.getPhotoCapabilities().then(function(capabilities) {
            if (!capabilities.redEyeReduction) {
                console.log('No red eye reduction');
                return;
            }
            // Apply the setting first, then take the photo once it is in effect.
            return captureDevice.setOptions({redEyeReduction: true})
                .then(function() { return captureDevice.takePhoto(); })
                .then(showPicture);
        }).catch(function(error) {
            alert("Failed to take photo");
        });
    }

    function showPicture(blob) {
        // takePhoto() resolves with the Blob itself.
        var img = document.querySelector("img");
        img.src = URL.createObjectURL(blob);
    }

    function failedToGetMedia(e) {
        console.log('Stream failure: ' + e);
    }
    
##### Repeated grabbing of a frame
    <html>
    <body>
    <p><canvas id="frame"></canvas></p>
    <button onclick="stopFunction()">Stop frame grab</button>
    <script>
      var canvas = document.getElementById('frame');
      // Shared so that stopFunction() can cancel the timer.
      var frameVar;
      navigator.mediaDevices.getUserMedia({video: true}).then(gotMedia, failedToGetMedia);

      function gotMedia(mediastream) {
          // Extract video track.
          var videoDevice = mediastream.getVideoTracks()[0];
          var captureDevice = new ImageCapture(videoDevice);
          // Grab a frame once per second until stopped.
          frameVar = setInterval(function() {
              captureDevice.grabFrame().then(processFrame);
          }, 1000);
      }

      function processFrame(imageBitmap) {
          // grabFrame() resolves with an ImageBitmap, which can be drawn directly.
          canvas.width = imageBitmap.width;
          canvas.height = imageBitmap.height;
          canvas.getContext('2d').drawImage(imageBitmap, 0, 0);
      }

      function stopFunction() {
          clearInterval(frameVar);
      }

      function failedToGetMedia(e) {
          console.log('Stream failure:', e);
      }
    </script>
    </body>
    </html>