Accelerated Shape Detection in Images
Draft Community Group Report, 30 January 2023
This version: https://wicg.github.io/shape-detection-api
Issue Tracking: GitHub
Editors: Miguel Casas-Sanchez (Google LLC), Reilly Grant (Google LLC)
Translations (non-normative): 简体中文
Participate: Join the W3C Community Group; fix the text through GitHub
Copyright © 2023 the Contributors to the Accelerated Shape Detection in Images Specification, published by the Web Platform Incubator Community Group under the W3C Community Contributor License Agreement (CLA). A human-readable summary is available.
Summary
This document describes an API providing access to accelerated shape detectors (e.g. human faces) for still images and/or live image feeds.
Status of this document
This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.
1. Introduction
Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces or barcodes/QR codes. Detecting these features is computationally expensive, but would lead to interesting use cases, e.g. face tagging or web URL redirection. While hardware manufacturers have been supporting these features for a long time, Web Apps do not yet have access to these hardware capabilities, which makes the use of computationally demanding libraries necessary.
Text Detection, despite being an interesting field, is not considered stable enough across either computing platforms or character sets to be standardized in the context of this document. For reference, a sister informative specification is kept in [TEXT-DETECTION-API].
1.1. Shape detection use cases
Please see the Readme/Explainer in the repository.
2. Shape Detection API
Individual browsers MAY provide Detectors indicating the availability of hardware providing accelerated operation.
2.1. Image sources for detection
This section is inspired by HTML Canvas 2D Context § image-sources-for-2d-rendering-contexts.
ImageBitmapSource allows objects implementing any of a number of interfaces to be used as image sources for the detection process.
When an ImageBitmapSource object represents an HTMLImageElement, the element’s image must be used as the source image. Specifically, when an ImageBitmapSource object represents an animated image in an HTMLImageElement, the user agent must use the default image of the animation (the one that the format defines is to be used when animation is not supported or is disabled), or, if there is no such image, the first frame of the animation.
When an ImageBitmapSource object represents an HTMLVideoElement, then the frame at the current playback position when the method with the argument is invoked must be used as the source image when processing the image, and the source image’s dimensions must be the intrinsic dimensions of the media resource (i.e. after any aspect-ratio correction has been applied).
When an ImageBitmapSource object represents an HTMLCanvasElement, the element’s bitmap must be used as the source image.
When the UA is required to use a given type of ImageBitmapSource as input argument for the detect() method of any detector, it MUST run these steps:
If any ImageBitmapSource has an effective script origin (origin) which is not the same as the Document’s effective script origin, then reject the Promise with a new DOMException whose name is SecurityError.
If the ImageBitmapSource is an HTMLImageElement object that is in the Broken (HTML Standard §img-error) state, then reject the Promise with a new DOMException whose name is InvalidStateError, and abort any further steps.
If the ImageBitmapSource is an HTMLImageElement object that is not fully decodable, then reject the Promise with a new DOMException whose name is InvalidStateError, and abort any further steps.
If the ImageBitmapSource is an HTMLVideoElement object whose readyState attribute is either HAVE_NOTHING or HAVE_METADATA, then reject the Promise with a new DOMException whose name is InvalidStateError, and abort any further steps.
If the ImageBitmapSource argument is an HTMLCanvasElement whose bitmap’s origin-clean (HTML Standard §concept-canvas-origin-clean) flag is false, then reject the Promise with a new DOMException whose name is SecurityError, and abort any further steps.
Note that if the ImageBitmapSource is an object with either a horizontal dimension or a vertical dimension equal to zero, then the Promise will simply be resolved with an empty sequence of detected objects.
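From a page author's point of view, the rejection cases above could be handled roughly as follows. This is a minimal sketch and not part of the specification: the helper names videoHasFrame, classifyDetectError and detectOrExplain, as well as the labels they return, are hypothetical. The numeric values of HAVE_NOTHING and HAVE_METADATA are the ones defined for readyState in the HTML Standard.

```javascript
// readyState values defined by the HTML Standard for media elements.
const HAVE_NOTHING = 0;
const HAVE_METADATA = 1;

// Hypothetical helper: true when a video has progressed past the two states
// for which detect() is specified to reject with InvalidStateError.
function videoHasFrame(video) {
  return video.readyState !== HAVE_NOTHING && video.readyState !== HAVE_METADATA;
}

// Hypothetical helper: maps the exception names used by the steps above
// to short labels. The labels themselves are made up for illustration.
function classifyDetectError(error) {
  switch (error && error.name) {
    case 'SecurityError':
      return 'cross-origin';
    case 'InvalidStateError':
      return 'not-ready';
    default:
      return 'other';
  }
}

// Hypothetical wrapper: calls detect() on any detector and turns a rejection
// into a plain result object instead of letting it propagate.
async function detectOrExplain(detector, source) {
  try {
    return { ok: true, results: await detector.detect(source) };
  } catch (error) {
    return { ok: false, reason: classifyDetectError(error) };
  }
}
```

Checking videoHasFrame before calling detect() avoids one predictable rejection, but the wrapper is still needed because origin and decodability problems can only be reported by the UA.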
2.2. Face Detection API
FaceDetector represents an underlying accelerated platform’s component for detection of human faces in images. It can be created with an optional Dictionary of FaceDetectorOptions. It provides a single detect() operation on an ImageBitmapSource whose result is a Promise. This method MUST reject this Promise in the cases detailed in § 2.1 Image sources for detection; otherwise it MAY queue a task that utilizes the OS/Platform resources to resolve the Promise with a sequence of DetectedFaces, each one essentially consisting of and delimited by a boundingBox.
Example implementations of face detection are e.g. Android FaceDetector (or the Google Play Services vision library), Apple’s CIFaceFeature / VNDetectFaceLandmarksRequest or Windows 10 FaceDetector.
FaceDetector(optional FaceDetectorOptions faceDetectorOptions)
Constructs a new FaceDetector with the optional faceDetectorOptions.
Detectors may potentially allocate and hold significant resources. Where possible, reuse the same FaceDetector for several detections.
detect(ImageBitmapSource image)
Tries to detect human faces in the ImageBitmapSource image. The detected faces, if any, are returned as a sequence of DetectedFaces.
2.2.1. FaceDetectorOptions
maxDetectedFaces, of type unsigned short
Hint to the UA to try and limit the amount of detected faces on the scene to this maximum number.
fastMode, of type boolean
Hint to the UA to try and prioritise speed over accuracy by e.g. operating on a reduced scale or looking for large features.
2.2.2. DetectedFace
boundingBox, of type DOMRectReadOnly
A rectangle indicating the position and extent of a detected feature aligned to the image axes.
landmarks, of type FrozenArray<Landmark>, nullable
A series of features of interest related to the detected feature.
locations, of type FrozenArray<Point2D>
A point in the center of the detected landmark, or a sequence of points defining the vertices of a simple polygon surrounding the landmark in either a clockwise or counter-clockwise direction.
type, of type LandmarkType
Type of the landmark, if known.
mouth
The landmark is identified as a human mouth.
eye
The landmark is identified as a human eye.
nose
The landmark is identified as a human nose.
Consider adding attributes such as, e.g.: to DetectedFace.
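Because locations may hold either a single centre point or the vertices of a polygon, code consuming landmarks usually has to handle both shapes. The following is a minimal sketch under that assumption; landmarkCenter and groupLandmarkCenters are hypothetical helper names, and the 'untyped' key is an illustrative choice for landmarks whose type is not known.

```javascript
// Averages the points of one landmark. For a single-point landmark this is
// the point itself; for a polygon it is the mean of the vertices, which only
// approximates the centre of the polygon.
function landmarkCenter(landmark) {
  const points = landmark.locations;
  if (points.length === 0) return null;
  let sumX = 0;
  let sumY = 0;
  for (const point of points) {
    sumX += point.x;
    sumY += point.y;
  }
  return { x: sumX / points.length, y: sumY / points.length };
}

// Groups landmark centres by landmark type ("mouth", "eye", "nose").
// landmarks is nullable, so a missing array is treated as empty.
function groupLandmarkCenters(face) {
  const groups = {};
  for (const landmark of face.landmarks || []) {
    const key = landmark.type || 'untyped';
    const center = landmarkCenter(landmark);
    if (center === null) continue;
    if (!groups[key]) groups[key] = [];
    groups[key].push(center);
  }
  return groups;
}
```

A face with two eye landmarks would produce two entries under the eye key, which is enough to place an overlay on each eye without assuming how many points the platform reports.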
2.3. Barcode Detection API
BarcodeDetector represents an underlying accelerated platform’s component for detection of linear or two-dimensional barcodes in images. It provides a single detect() operation on an ImageBitmapSource whose result is a Promise. This method MUST reject this Promise in the cases detailed in § 2.1 Image sources for detection; otherwise it MAY queue a task using the OS/Platform resources to resolve the Promise with a sequence of DetectedBarcodes, each one essentially consisting of and delimited by a boundingBox and a series of Point2Ds, and possibly a rawValue decoded DOMString.
Example implementations of Barcode/QR code detection are e.g. Google Play Services or Apple’s CIQRCodeFeature / VNDetectBarcodesRequest.
BarcodeDetector(optional BarcodeDetectorOptions barcodeDetectorOptions)
Constructs a new BarcodeDetector with barcodeDetectorOptions.
Detectors may potentially allocate and hold significant resources. Where possible, reuse the same BarcodeDetector for several detections.
getSupportedFormats()
This method, when invoked, MUST return a new Promise promise and run the following steps in parallel:
Let supportedFormats be a new Array.
If the UA does not support barcode detection, resolve promise with supportedFormats and abort these steps.
Enumerate the BarcodeFormats that the UA understands as potentially detectable in images. Add these to supportedFormats.
The UA cannot give a definitive answer as to whether a given barcode format will always be recognized on an image due to e.g. positioning of the symbols or encoding errors. If a given barcode symbology is not in the supportedFormats array, however, it should not be detectable whatsoever.
Resolve promise with supportedFormats.
The list of supported BarcodeFormats is platform dependent; some examples are the ones supported by Google Play Services and Apple’s CIQRCodeFeature.
detect(ImageBitmapSource image)
Tries to detect barcodes in the ImageBitmapSource image.
2.3.1. BarcodeDetectorOptions
formats, of type sequence<BarcodeFormat>
A series of BarcodeFormats to search for in the subsequent detect() calls. If not present then the UA SHOULD search for all supported formats.
Limiting the search to a particular subset of supported formats is likely to provide better performance.
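One way to combine getSupportedFormats() with the formats option is to intersect the formats an application cares about with those the platform reports, and only then construct the detector. The sketch below assumes getSupportedFormats() is exposed as a static method on the BarcodeDetector constructor, as it is in current implementations; createBarcodeDetector is a hypothetical helper name, and the injectable Detector parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: builds a detector restricted to the wanted formats
// that the platform reports as supported. Returns null when the API is not
// exposed or none of the wanted formats is supported.
async function createBarcodeDetector(wanted, Detector = globalThis.BarcodeDetector) {
  if (typeof Detector !== 'function') return null;
  const supported = await Detector.getSupportedFormats();
  const formats = wanted.filter((format) => supported.includes(format));
  if (formats.length === 0) return null;
  return new Detector({ formats });
}

// Possible use in a page (not executed here):
//   const detector = await createBarcodeDetector(['qr_code', 'ean_13']);
//   if (detector) { const barcodes = await detector.detect(imageElement); }
```

Returning null rather than throwing keeps the fallback decision, for example loading a script-based decoder, with the caller.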
2.3.2. DetectedBarcode
boundingBox, of type DOMRectReadOnly
A rectangle indicating the position and extent of a detected feature aligned to the image
rawValue, of type DOMString
String decoded from the barcode. This value might be multiline.
format, of type BarcodeFormat
Detected BarcodeFormat.
cornerPoints, of type FrozenArray<Point2D>
A sequence of corner points of the detected barcode, in clockwise direction and starting with top-left. This is not necessarily a square due to possible perspective distortions.
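Since cornerPoints describes a possibly distorted quadrilateral while boundingBox is axis-aligned, it can help to compute simple geometry from the corners directly. The sketch below is illustrative only: boundsOfCorners and polygonArea are hypothetical helpers, and this document does not state that boundingBox is derived from cornerPoints in this way.

```javascript
// Smallest axis-aligned rectangle containing all corner points.
function boundsOfCorners(cornerPoints) {
  const xs = cornerPoints.map((point) => point.x);
  const ys = cornerPoints.map((point) => point.y);
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}

// Area of the polygon spanned by the corner points (shoelace formula).
// Works for either winding direction because the absolute value is taken.
function polygonArea(cornerPoints) {
  let twiceArea = 0;
  for (let i = 0; i < cornerPoints.length; i++) {
    const a = cornerPoints[i];
    const b = cornerPoints[(i + 1) % cornerPoints.length];
    twiceArea += a.x * b.y - b.x * a.y;
  }
  return Math.abs(twiceArea) / 2;
}
```

Comparing polygonArea with the area of the bounds gives a rough indication of how rotated or skewed a detected barcode is: the two are equal only for an axis-aligned rectangle.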
2.3.3. BarcodeFormat
aztec
This entry represents a square two-dimensional matrix following [iso24778] with a square bullseye pattern at its centre, thus resembling an Aztec pyramid. Does not require a surrounding blank zone.
code_128
Code 128 is a linear (one-dimensional), bidirectionally-decodable, self-checking barcode following [iso15417] and able to encode all 128 characters of ASCII (hence the naming).
code_39
Code 39 is a discrete, variable-length barcode type following [iso16388].
code_93
Code 93 is a linear, continuous symbology with a variable length following [bc5]. It offers a larger information density than Code 128 and the visually similar Code 39. Code 93 is used primarily by Canada Post to encode supplementary delivery information.
codabar
Codabar is a linear barcode symbology developed in 1972 by Pitney Bowes Corp.
data_matrix
Data Matrix is an orientation-independent two-dimensional barcode composed of black and white modules arranged in either a square or rectangular pattern following [iso16022].
ean_13
EAN-13 is a linear barcode based on the UPC-A standard and defined in [iso15420]. It was originally developed by the International Article Numbering Association (EAN) in Europe as a superset of the original 12-digit Universal Product Code (UPC) system developed in the United States (UPC-A codes are represented in EAN-13 with the first character set to 0).
ean_8
EAN-8 is a linear barcode defined in [iso15420] and derived from EAN-13.
itf
ITF14 barcode is the GS1 implementation of an Interleaved 2 of 5 bar code to encode a Global Trade Item Number. It is continuous, self-checking and bidirectionally decodable, and it will always encode 14 digits. It was once used in the package delivery industry but was replaced by Code 128. [bc2]
pdf417
PDF417 refers to a continuous two-dimensional barcode symbology format with multiple rows and columns, bi-directionally decodable and according to the Standard [iso15438].
qr_code
QR Code is a two-dimensional barcode respecting the Standard [iso18004]. The information encoded can be text, URL or other data.
unknown
This value is used by the platform to signify that it does not know or specify which barcode format is being detected or supported.
upc_a
UPC-A is one of the most common linear barcode types and is widely applied to retail in the United States. Defined in [iso15420], it represents digits by strips of bars and spaces, each digit being associated to a unique pattern of 2 bars and 2 spaces, both of variable width. UPC-A can encode 12 digits that are uniquely assigned to each trade item, and it is technically a subset of EAN-13 (UPC-A codes are represented in EAN-13 with the first character set to 0).
upc_e
UPC-E Barcode is a variation of UPC-A defined in [iso15420], compressing out unnecessary zeros for a more compact barcode.
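The relationship between UPC-A and EAN-13 described above (a UPC-A code is the EAN-13 code with a leading 0) matters when the same product may be reported under either format. A minimal sketch of normalising both to 13 digits follows; upcAToEan13 and sameProduct are hypothetical helpers, and the sketch assumes rawValue holds only the bare digits of the code.

```javascript
// Represents a 12-digit UPC-A code as EAN-13 by prefixing a 0.
function upcAToEan13(upcA) {
  if (!/^\d{12}$/.test(upcA)) {
    throw new TypeError('expected a string of exactly 12 digits');
  }
  return '0' + upcA;
}

// Hypothetical helper: compares two detected barcodes after bringing
// UPC-A values into their EAN-13 representation.
function sameProduct(a, b) {
  const normalise = (barcode) =>
    barcode.format === 'upc_a' ? upcAToEan13(barcode.rawValue) : barcode.rawValue;
  return normalise(a) === normalise(b);
}
```

For instance, a upc_a result with rawValue 036000291452 and an ean_13 result with rawValue 0036000291452 compare as the same product under this normalisation.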
3. Security and Privacy Considerations
This section is non-normative.
This interface reveals information about the contents of an image source. It is critical for implementations to ensure that it cannot be used to bypass protections that would otherwise protect an image source from inspection. § 2.1 Image sources for detection describes the algorithm to accomplish this.
By providing high-performance shape detection capabilities this interface allows developers to run image analysis tasks on the local device. This offers a privacy advantage over offloading computation to a remote system. Developers should treat the results returned by this interface as being as privacy sensitive as the original image from which they were derived.
4. Examples
This section is non-normative.
Slightly modified/extended versions of these examples (and more) can be found in e.g. this codepen collection.
4.1. Platform support for a given detector
The following example can also be found in e.g. this codepen with minimal modifications.
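The original example code is not reproduced in this text. As a stand-in, here is a minimal sketch of feature detection, assuming the detectors are exposed as FaceDetector and BarcodeDetector on the global object; shapeDetectionSupport is a hypothetical helper name and the scope parameter exists only to make the function testable outside a browser.

```javascript
// Reports which detector constructors are exposed on the given scope.
// The presence of a constructor does not guarantee that detection will
// succeed, since the underlying platform support can still be missing.
function shapeDetectionSupport(scope = globalThis) {
  return {
    face: typeof scope.FaceDetector === 'function',
    barcode: typeof scope.BarcodeDetector === 'function',
  };
}

// Possible use in a page (not executed here):
//   const support = shapeDetectionSupport();
//   if (!support.barcode) { /* fall back to a script-based decoder */ }
```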
4.2. Face Detection
The following example can also be found in e.g. this codepen (or this one, with landmarks overlay).
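The original example code is not reproduced in this text. The sketch below shows one plausible shape for it, following the advice in § 2.2 to reuse a single FaceDetector: createFaceBoxFinder is a hypothetical helper, and the option values (5 faces, fast mode) are arbitrary illustrative choices.

```javascript
// Hypothetical helper: constructs one FaceDetector and returns a function
// that reuses it for every image. Returns null when the API is not exposed.
function createFaceBoxFinder(Detector = globalThis.FaceDetector) {
  if (typeof Detector !== 'function') return null;
  const detector = new Detector({ maxDetectedFaces: 5, fastMode: true });
  return async function findFaceBoxes(image) {
    const faces = await detector.detect(image);
    // Copy the bounding boxes into plain objects for easy logging or drawing.
    return faces.map(({ boundingBox }) => ({
      x: boundingBox.x,
      y: boundingBox.y,
      width: boundingBox.width,
      height: boundingBox.height,
    }));
  };
}

// Possible use in a page (not executed here):
//   const findFaceBoxes = createFaceBoxFinder();
//   if (findFaceBoxes) {
//     const boxes = await findFaceBoxes(document.querySelector('img'));
//   }
```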
4.3. Barcode Detection
The following example can also be found in e.g. this codepen.
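The original example code is not reproduced in this text. A minimal sketch of reading barcodes from an image could look like the following; readBarcodes is a hypothetical helper that takes an already constructed detector, so that the same BarcodeDetector can be reused across calls.

```javascript
// Hypothetical helper: runs detection and returns plain summaries of the
// DetectedBarcode results (format, decoded value and top-left position).
async function readBarcodes(image, detector) {
  const barcodes = await detector.detect(image);
  return barcodes.map(({ format, rawValue, boundingBox }) => ({
    format,
    rawValue,
    x: boundingBox.x,
    y: boundingBox.y,
  }));
}

// Possible use in a page (not executed here):
//   const detector = new BarcodeDetector({ formats: ['qr_code'] });
//   const found = await readBarcodes(document.querySelector('img'), detector);
//   found.forEach((barcode) => console.log(barcode.format, barcode.rawValue));
```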
Conformance
Document conventions
Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.
All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]
Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:
This is an example of an informative example.
Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:
Note, this is an informative note.
Conformant Algorithms
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.
Index
Terms defined by this specification
"aztec", in § 2.3.3
aztec, in § 2.3.3
BarcodeDetector, in § 2.3
BarcodeDetector(), in § 2.3
BarcodeDetector(barcodeDetectorOptions), in § 2.3
BarcodeDetectorOptions, in § 2.3.1
BarcodeFormat, in § 2.3.3
boundingBox
dict-member for DetectedBarcode, in § 2.3.2
dict-member for DetectedFace, in § 2.2.2
"codabar", in § 2.3.3
codabar, in § 2.3.3
"code_128", in § 2.3.3
Code 128, in § 2.3.3
code_128, in § 2.3.3
"code_39", in § 2.3.3
Code 39, in § 2.3.3
code_39, in § 2.3.3
"code_93", in § 2.3.3
code_93, in § 2.3.3
constructor()
constructor for BarcodeDetector, in § 2.3
constructor for FaceDetector, in § 2.2
constructor(barcodeDetectorOptions), in § 2.3
constructor(faceDetectorOptions), in § 2.2
cornerPoints, in § 2.3.2
"data_matrix", in § 2.3.3
data_matrix, in § 2.3.3
DetectedBarcode, in § 2.3.2
DetectedFace, in § 2.2.2
detect(image)
method for BarcodeDetector, in § 2.3
method for FaceDetector, in § 2.2
"ean_13", in § 2.3.3
EAN-13, in § 2.3.3
ean_13, in § 2.3.3
"ean_8", in § 2.3.3
ean_8, in § 2.3.3
"eye", in § 2.2.2
eye, in § 2.2.2
FaceDetector, in § 2.2
FaceDetector(), in § 2.2
FaceDetector(faceDetectorOptions), in § 2.2
FaceDetectorOptions, in § 2.2.1
fastMode, in § 2.2.1
format, in § 2.3.2
formats, in § 2.3.1
getSupportedFormats(), in § 2.3
"itf", in § 2.3.3
itf, in § 2.3.3
Landmark, in § 2.2.2
landmarks, in § 2.2.2
LandmarkType, in § 2.2.2
locations, in § 2.2.2
maxDetectedFaces, in § 2.2.1
"mouth", in § 2.2.2
mouth, in § 2.2.2
"nose", in § 2.2.2
nose, in § 2.2.2
"pdf417", in § 2.3.3
pdf417, in § 2.3.3
"qr_code", in § 2.3.3
qr_code, in § 2.3.3
rawValue, in § 2.3.2
type, in § 2.2.2
"unknown", in § 2.3.3
unknown, in § 2.3.3
"upc_a", in § 2.3.3
UPC-A, in § 2.3.3
upc_a, in § 2.3.3
"upc_e", in § 2.3.3
upc_e, in § 2.3.3
Terms defined by reference
[ECMASCRIPT] defines the following terms:
Array
[GEOMETRY-1] defines the following terms:
DOMRectReadOnly
[HTML] defines the following terms:
HAVE_METADATA
HAVE_NOTHING
HTMLCanvasElement
HTMLImageElement
HTMLVideoElement
ImageBitmapSource
in parallel
origin
readyState
[IMAGE-CAPTURE] defines the following terms:
Point2D
[WEBIDL] defines the following terms:
DOMException
DOMString
Exposed
FrozenArray
InvalidStateError
Promise
SecureContext
SecurityError
TypeError
boolean
sequence
unsigned short
References
Normative References
[ECMASCRIPT] ECMAScript Language Specification. URL: https://tc39.es/ecma262/multipage/
[GEOMETRY-1] Simon Pieters; Chris Harrelson. Geometry Interfaces Module Level 1. URL: https://drafts.fxtf.org/geometry/
[HTML] Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[IMAGE-CAPTURE] Miguel Casas-sanchez; Rijubrata Bhaumik. MediaStream Image Capture. URL: https://w3c.github.io/mediacapture-image/
[RFC2119] S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[WEBIDL] Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/
Informative References
[2DCONTEXT] Rik Cabanier; et al. HTML Canvas 2D Context. URL: https://www.w3.org/html/wg/drafts/2dcontext/html5_canvas_CR/
[BC2] ANSI/AIM-BC2, Uniform Symbol Specification - Interleaved 2 of 5. 1995.
[BC5] ANSI/AIM-BC5, Uniform Symbol Specification - Code 93. 1995.
[ISO15417] Information technology -- Automatic identification and data capture techniques -- Code 128 bar code symbology specification. June 2007. URL: https://www.iso.org/standard/43896.html
[ISO15420] Information technology -- Automatic identification and data capture techniques -- EAN/UPC bar code symbology specification. December 2009. URL: https://www.iso.org/standard/46143.html
[ISO15438] Information technology -- Automatic identification and data capture techniques -- PDF417 bar code symbology specification. September 2015. URL: https://www.iso.org/standard/65502.html
[ISO16022] Information technology -- Automatic identification and data capture techniques -- Data Matrix bar code symbology specification. September 2009. URL: https://www.iso.org/standard/44230.html
[ISO16388] Information technology -- Automatic identification and data capture techniques -- Code 39 bar code symbology specification. May 2007. URL: https://www.iso.org/standard/43897.html
[ISO18004] Information technology -- Automatic identification and data capture techniques -- QR Code bar code symbology specification. February 2015. URL: https://www.iso.org/standard/62021.html
[ISO24778] Information technology -- Automatic identification and data capture techniques -- Aztec Code bar code symbology specification. February 2008. URL: https://www.iso.org/standard/62021.html
[TEXT-DETECTION-API] Accelerated Text Detection in Images. cg-draft. URL: https://wicg.github.io/shape-detection-api/text.html