Inaccessibility of CAPTCHA

Abstract

Various approaches have been employed over many years to distinguish human users of web sites from robots. While the traditional CAPTCHA approach of asking the user to identify obscured text in an image remains common, other mechanisms are gaining in prominence. These approaches generally require users to perform a task believed to be possible for humans and difficult for robots, but the nature of the task inherently excludes many people with disabilities, resulting in an incorrect denial of service to these users. Research findings also indicate that many popular CAPTCHA techniques are no longer particularly effective or secure, so it is necessary to consider alternative approaches to block robots, yet ensure these approaches support access for people with disabilities. This document examines a number of potential solutions that allow systems to test for human users, and the extent to which these solutions adequately accommodate people with disabilities.

Types of CAPTCHA and access implications

There are many techniques available to web sites to discourage or eliminate fraudulent activities such as inappropriate account creation. Several of them may be as effective as the visual verification technique while being more accessible to people with disabilities. Others may be overlaid as an accommodation for the purposes of accessibility. The following list highlights common CAPTCHA types and their respective accessibility implications.

Traditional character-based CAPTCHA

The traditional character-based CAPTCHA, as previously discussed, is largely inaccessible and insecure. It focuses on the presentation of letters or words presented in an image and designed to be difficult for robots to identify. The user is then asked to enter the CAPTCHA information into a form.

The use of a traditional CAPTCHA is particularly problematic for people who are blind, as the screen readers they rely on to use web content cannot process the image, thus preventing them from uncovering the information required by the form. Because the characters embedded in a CAPTCHA are often distorted or have other characters in close proximity to each other in order to foil technological solution by robots, they are also very difficult for users with other visual disabilities. This common CAPTCHA technique is less reliably solved by users with cognitive and learning disabilities, see The Effect of CAPTCHA on User Experience among Users with and without Learning Disabilities [[captcha-ld]]. Because they’re intentionally distorted to foil robots, they also foil users who do not possess sufficiently acute vision to “see” beyond the presented distortion and uncover the text the site requires in order to proceed.

In addition, there is currently a dominant assumption that all web users can understand English, which is clearly not the case. Native and literate Arabic or Thai speakers, for example, should not be assumed to possess proficiency with the ISO 8859-1 character set [[iso-8859-1]]—demonstrating an important barrier imposed by CAPTCHAs based on written English and related language character sets; see Effects of Text Rotation, String Length, and Letter Format on Text-based CAPTCHA Robustness [[captcha-robustness]].

Logic puzzles

The goal of visual verification is to separate human from machine. One reasonable way to do this is to test for logic. Simple mathematical or word puzzles, trivia, or similar logic tests may raise the bar for robots, at least to the point where using them is more attractive elsewhere.

Problems: Users with cognitive disabilities may still have trouble. Answers may need to be handled flexibly, if they require free-form text. A system would have to maintain a vast number of questions, or shift them around programmatically, in order to keep spiders from capturing them all for use by web robots. This approach is also subject to defeat by human operators engaged in crowd-sourcing activity on behalf of attackers.

Sound output

To re-frame the problem, text is easy to manipulate, which is good for assistive technologies, but just as good for robots. One logical solution to this problem is to offer another non-textual method of using the same content. To achieve this, audio is played that contains a series of characters, words, or phrases being read out which the user then needs to enter into a form. As with visual CAPTCHA however, robots are also capable of recognizing spoken content—as Amazon’s Alexa and Android’s Google Assistant, among other spoken dialog systems, have so ably demonstrated. Consequently, the characters, words, or phrases the user is to uncover and transcribe in the form are also distorted in an audio CAPTCHA and are usually played over a sonic environment of obfuscating sounds.

The industry recognized this problem early. CNet reported in Spam-bot tests flunk the blind [[newscom]] that “Hotmail’s sound output, which is itself distorted to avoid the same programmatic abuse, was unintelligible to all four test subjects, all of whom had good hearing.”

If the sound output, which is itself distorted to avoid the same programmatic abuse, can render the CAPTCHA difficult to hear; there can also be confusion in understanding whether a number is to be entered as a numerical value or as a word, e.g. ‘7’ or ‘seven’. Often the audio CAPTCHA user will hear sounds which seem to be words or numerical values that should be entered, but turn out to be just background noise.

Sound is intrinsically temporal, but the import of this unavoidable fact is too often under appreciated—perhaps because the world we live in as seen through the eyes is also temporal. Unlike the real world seen through the eyes however, the traditional CAPTCHA is a still image that can be stared at until comprehension dawns. Sound has no analog to the visual still image.

Whenever any portion of an audio CAPTCHA is not understood; at least some part of the CAPTCHA must be replayed, usually several times. Currently, few audio CAPTCHAs provide an easily invoked replay feature, let alone a pause, rewind, and fast-forward feature. Consequently, an entirely new audio CAPTCHA is often played should any part of one audio CAPTCHA prove difficult to understand.

Some audio CAPTCHA tacitly admit this failure by offering a link allowing the user to Download the audio CAPTCHA, typically as a mp3 file. The implicit assumption is that the user will use a favorite audio player—which does provide pause, play, rewind, and fast forward capabilities—to play the audio CAPTCHA MP3 file again and again until comprehension dawns, perhaps pausing and rewinding the playback and perhaps writing down on the side the text destined for the web form. Clearly this is very inconvenient and subject to web site time outs. It also illustrates why simply providing an audio CAPTCHA alternative to the traditional visual CAPTCHA does not provide equivalent access to the user.

Users who are deaf-blind, don’t have or use a sound card, find themselves in noisy environments, or don’t have required sound plugins properly configured and functioning, are thus also prevented from proceeding. Similarly, users of browsers which do not support easy direction of sound output to a particular audio device, or to all available audio devices on the system, are also hampered.

Although auditory forms of CAPTCHA that present distorted speech create recognition difficulties for screen reader users, the accuracy with which such users can complete the CAPTCHA tasks is increased if the user interface is carefully designed to prevent screen reader audio and CAPTCHA audio from being intermixed. This can be achieved by implementing functions for controlling the audio that do not require the user to move focus away from the text response field; see Evaluating existing audio CAPTCHAs and an interface optimized for non-visual use [[eval-audio]].

Experiments with a combined auditory and visual CAPTCHA requiring users to identify well known objects by recognizing either images or sounds, suggest that this technique is highly usable by screen reader users. However, its security-related properties remain to be explored, as mentioned in Towards a universally usable human interaction proof: evaluation of task completion strategies [[task-completion]].

Limited-use accounts

Users of free accounts very rarely need full and immediate access to a site’s resources. For example, users who are searching for concert tickets may need to conduct only three searches a day, and new email users may only need to send the same notification of their new address to their friends. Sites may create policies that limit the frequency of interaction explicitly (that is, by disabling an account for the rest of the day) or implicitly (by slowing the response times incrementally). Creating limits for new users can be an effective means of making high-value sites unattractive targets to robots.

Drawbacks to this approach include the need to perform sufficient testing and data collection to determine useful limits that will serve human users yet frustrate robots. It requires site designers to look at statistics of normal and exceptional users, and determine whether clear demarcation exists between them.

Non-interactive checks

While CAPTCHA and other interactive approaches to limiting the activities of web robots are sometimes effective, they do make using a site more complex. This is often unnecessary, as a large number of non-interactive mechanisms exist to check for spam or other invalid content typically introduced by robots.

This category contains two popular non-interactive approaches: spam filtering, in which an automated tool evaluates the content of a transaction, and heuristic checks, which evaluate the behavior of the client.

Spam filtering

Applications that use continuous authentication and “hot words” to flag spam content, or Bayesian filtering to detect other patterns consistent with spam, are very popular, and quite effective. While such risk analysis systems may experience false negatives from time to time, properly-tuned systems can achieve results comparable to a traditional visual CAPTCHA, while also removing the added cognitive burden on the user and eliminating access barriers.

Most major blogging software contains spam filtering capabilities, or can be fitted with a plug-in for this functionality. Many of these filters can automatically delete messages that reach a certain spam threshold, and mark questionable messages for manual moderation. More advanced systems can control attacks based on posting frequency, filter content sent using the Trackback protocol, and ban users by IP address range, temporarily or permanently.

Heuristic checks and the Google reCAPTCHA

Heuristics are discoveries in a process that seem to indicate a given result. It may be possible to detect the presence of a robotic user based on the volume of data the user requests, series of common pages visited, IP addresses, data entry methods, or other signature data that can be collected.

Again, this requires a careful examination of site data. If pattern-matching algorithms can’t find good heuristics, then this is not a good solution. Also, polymorphism, or the creation of changing footprints, is apt to result, if it hasn’t already, in robots, just as polymorphic (“stealth”) viruses appeared to get around virus checkers looking for known viral footprints.

Another heuristic approach identified in Botz-4-Sale: Surviving DDos Attacks that Mimic Flash Crowds [[killbots]] involves the use of CAPTCHA images, with a twist: how the user reacts to the test is as important as whether or not it was solved. This system, which was designed to thwart distributed denial of service (DDoS) attacks, bans automated attackers which make repeated attempts to retrieve a certain page, while protecting against marking humans incorrectly as automated traffic. When the server’s load drops below a certain level, the CAPTCHA-based authentication process is removed entirely.

An example of a CAPTCHA based on this approach is the Google reCAPTCHA which features a check box labelled ‘I am not a robot’ or similar phrasing. The process works by collecting data such as mouse movement and keyboard navigation to determine whether the user is human or robot, while keeping the CAPTCHA process relatively simple.

Anecdotal evidence suggests that this CAPTCHA is currently the most accessible CAPTCHA solution and can be completed with a variety of assistive technologies. However, there is little formalized research investigating whether this is indeed the case. There is also the additional concern that a user’s failure to interact appropriately with the reCATPCHA challenge or to satisfy the heuristic tests results in the presentation of a traditional inaccessible CAPTCHA as a fall-back mechanism.

Federated identity systems

Many large companies such as Microsoft, Apple, Amazon, Google and the Kantara Initiative have created competing “federated network identity” systems, which can allow a user to create an account, set his or her preferences, payment data, etc., and have that data persist across all sites and devices that use the same service. Due to large companies now requiring a federated identity to use cloud-based services on their respective digital ecosystems, the popularity of federated identities has increased significantly. As a result, many web sites and services allow a portable form of authentication and identification across the Web.

Single sign-on

Single sign-on services will need to be among the most accessible services on the Web in order to offer equal benefits to people with disabilities. Additionally, use of these services will need to be ubiquitous to truly solve the problems addressed here once and for all.

Public-key infrastructure solutions

Another approach is to use certificates for individuals who wish to verify their identity. The certificate can be issued in such a way as to ensure something close to a one-person-one-vote system; e.g. by issuing these certificates in person and enabling users to develop distributed trust networks, or by having these certificates issued from highly trusted authorities such as governments. These types of systems have been implemented for securing web pages, and for authenticating email.

The cost of creating fraudulent certificates needs to be high enough to destroy the value of producing them in most cases. Sites would need to use mechanisms which are widely implemented in user agents.

A variant of this concept, in which only people with disabilities who are affected by other verification systems would register, is sometimes proposed. However such approaches raise significant privacy and stigmatisation concerns and are usually opposed strongly by people with disabilities themselves and by organizations that serve them. Such approaches should not be confused with situations where people voluntarily self-identify as individuals with disabilities. An example is the U.S. based Bookshare whose services are only available to persons with documented print disabilities. Bookshare provides its users access to printed materials which are otherwise unavailable in accessible alternative formats such as audio or Braille. An American copyright provision known as the Chafee Amendment [[chafee]] allows copyrighted materials to be reproduced in specialized forms that are only usable by print disabled users. A public-key infrastructure system would allow Bookshare’s maintainers to ensure that the site and its users are in compliance with copyright law.

Biometrics

Biometric identifiers have become a popular authentication method, especially on mobile platforms. Some physical characteristic of the user, such as a fingerprint or a facial profile, is first acquired and then recognized to verify the individual’s identity. This process effectively limits the ability of web robots to create a large number of false identities.

However, biometric authentication mechanisms also need to be carefully designed to avoid introducing accessibility barriers. Individuals who lack the biological characteristics required by the particular authentication method, e.g., fingers, or who are unable to perform the enrollment or identification procedures are effectively precluded from using it. This can result in denial of access to certain users with disabilities on systems relying on biometrics for authentication. Consequently, reliance on a single biometric identifier to identify a user is now insufficient to satisfy public sector procurement standards in the European Union EN 301 549, section 5.3 [[en-301-549]] and regulations under section 508 of the Rehabilitation Act and Section 255 of the Communications Act, 36 CFR 1194, Appendix C, section 403 in the United States [[36-cfr-1194]].

Where biometrics are used as an alternative to CAPTCHA, systems should be designed to allow users to choose among multiple and unrelated biometric identifiers. It should also be noted that biometrics can uniquely identify individuals, making this alternative unsuitable for applications in which it is necessary to preserve the user’s anonymity (i.e., the application is required to verify solely that the user is human, without obtaining identifying information).

Multiple user devices

The user of multiple devices such as a computer, smartphone, tablet and/or wearable could provide additional support for user authentication, as in Design, Testing and Implementation of a New Authentication Method Using Multiple Devices [[auth-mult]]. This could assist in addressing accessibility issues by using assistive technologies on each device to confirm the user is a human and is a specific user. The use of biometrics, as previously discussed, could also serve as one such device authentication mechanism.

Image and video

Visual comparison CAPTCHAs

There are a number of new CAPTCHA techniques based on the identification of still images. This can include requiring the user to identify whether an image is a man or a woman, or whether an image is human-shaped or avatar-shaped among other comparison solutions, such as CAPTCHaStar! A novel CAPTCHA based on interactive shape discovery, [[captchastar]], FaceCAPTCHA: a CAPTCHA that identifies the gender of face images unrecognized by existing gender classifiers [[facecaptcha]], and Social and egocentric image classification for scientific and privacy applications [[social-classification]].

While alternative audio comparison CAPTCHAs could be provided such as using similar or different sounds for comparison, the reliance on visual comparison alone would make these techniques difficult, if not impossible for people with vision-related disabilities to use.

3D CAPTCHA

A 3D representation of letters and numbers can make it more difficult for OCR software to identify them, in turn increasing the security of the CAPTCHA, described in On the security of text-based 3D CAPTCHAs [[3d-captcha-security]]. However, this solution raises similar accessibility issues to traditional CAPTCHAs.

We recommend further exploration of the use of risk analysis techniques (as exemplified by the approach that Google engineers have taken) to reduce the need for CAPTCHA.

Video Game CAPTCHA

This process suggests the completion of a basic video game as a CAPTCHA, like Game-based image semantic CAPTCHA on handset devices [[game-captcha]]. The benefits include the removal of language barriers, and multiple interface methods could potentially make such a solution accessible. It would also have the benefit of making CAPCHA solving an enjoyable process, reducing the frustrations generally associated with traditional CAPTCHAs.

Abstract

Introduction

Security effectiveness