To the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work. In addition, as of 25 November 2013, the editors have made this specification available under the Open Web Foundation Agreement Version 1.0, which is available at http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0.
The URL standard sets out to make URLs fully predictable and interoperable. This is the plan:
Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process. (E.g. spaces, other "illegal" code points, query encoding, equality, canonicalization, are all concepts not entirely shared, or defined.) URL parsing needs to become as solid as HTML parsing. [URI] [IRI]
Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.
Define URL's existing JavaScript API in full detail and add enhancements to make it easier to work with. Add a new URL object as well for URL manipulation without usage of HTML elements. (Useful for Web Workers.)
As the editor learns more about the subject matter the goals might increase in scope somewhat.
All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.
The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this specification are to be interpreted as described in RFC2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]
Some terms used in this specification are defined in the Encoding Standard. [ENCODING]
The ASCII digits are code points in the range U+0030 to U+0039.
The ASCII hex digits are ASCII digits or are code points in the range U+0041 to U+0046 or in the range U+0061 to U+0066.
The ASCII alpha are code points in the range U+0041 to U+005A or in the range U+0061 to U+007A.
The ASCII alphanumeric are ASCII digits or ASCII alpha.
The domain label separators are the code points U+002E, U+3002, U+FF0E, and U+FF61.
The EOF code point is a conceptual code point that signifies the end of a string or code point stream.
A parse error indicates a non-fatal mismatch between input and requirements. User agents are encouraged to expose parse errors somehow.
Within a parser algorithm that uses a pointer variable, c references the code point the pointer variable points to.
Within a string-based parser algorithm that uses a pointer variable, remaining references the substring after pointer in the string being processed.
If "mailto:example@example" is a string being processed and pointer points to "@", c is "@" and remaining is "example".
A percent-encoded byte is "%", followed by two ASCII hex digits. Sequences of percent-encoded bytes, after conversion to bytes, should not cause utf-8 decode to run into any errors.
To percent encode a byte into a percent-encoded byte, return a string consisting of "%", followed by a double-digit, uppercase, hexadecimal representation of byte.
To percent decode a string using code points in the range U+0000 to U+007F into a byte sequence, run these steps:
Let pointer be a pointer into string, initially zero (pointing to the first code point).
Let bytes be an empty byte sequence.
While c is not the EOF code point, run these substeps:
While c is not "%" or the EOF code point, append to bytes a byte whose value is c's code point and increase pointer by one.
If c is "%" and remaining does not start with two ASCII hex digits, append to bytes a byte whose value is c's code point, and increase pointer by one.
Otherwise, while c is "%" and remaining starts with two ASCII hex digits, append to bytes a byte whose value is remaining's two leading code points, interpreted as hexadecimal number, and increase pointer by three.
Return bytes.
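The percent decode steps above can be sketched in JavaScript as follows. This is a minimal illustration, not a normative implementation; the function name `percentDecode` and the choice of a plain array of byte values as the return type are assumptions made for this example.

```javascript
// Percent decode an ASCII-only string into an array of byte values.
// Hypothetical helper; the name and return type are not part of the text above.
function percentDecode(string) {
  const bytes = [];
  let pointer = 0;
  while (pointer < string.length) {
    const c = string[pointer];
    // "remaining" is the substring after the pointer.
    const remaining = string.slice(pointer + 1);
    if (c === "%" && /^[0-9A-Fa-f]{2}/.test(remaining)) {
      // A well-formed percent-encoded byte: consume "%" plus two hex digits.
      bytes.push(parseInt(remaining.slice(0, 2), 16));
      pointer += 3;
    } else {
      // Anything else, including a stray "%", is copied through as-is.
      bytes.push(c.charCodeAt(0));
      pointer += 1;
    }
  }
  return bytes;
}
```

For instance, `percentDecode("a%20b")` yields the bytes 0x61, 0x20, 0x62, while a trailing incomplete sequence such as `"%2"` is passed through byte for byte.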
The simple encode set are all code points less than U+0020 (i.e. excluding U+0020) and all code points greater than U+007E.
The default encode set is the simple encode set and code points U+0020, '"', "#", "<", ">", "?", and "`".
The password encode set is the default encode set and code points "/", "@", and "\".
The username encode set is the password encode set and code point ":".
To utf-8 percent encode a code point, using an encode set, run these steps:
If code point is not in encode set, return code point.
Let bytes be the result of running utf-8 encode on code point.
Percent encode each byte in bytes, and then return them concatenated, in the same order.
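As a rough sketch, the encode sets and the utf-8 percent encode steps could be expressed like this. The predicate names (`inSimpleEncodeSet` and so on) and the idea of passing an encode set as a function are assumptions of this example; `TextEncoder` is used here as a stand-in for utf-8 encode.

```javascript
// Encode sets modelled as predicates over code point values (an illustrative choice).
const inSimpleEncodeSet = (cp) => cp < 0x20 || cp > 0x7e;
const inDefaultEncodeSet = (cp) =>
  inSimpleEncodeSet(cp) || ' "#<>?`'.includes(String.fromCodePoint(cp));
const inPasswordEncodeSet = (cp) =>
  inDefaultEncodeSet(cp) || "/@\\".includes(String.fromCodePoint(cp));
const inUsernameEncodeSet = (cp) => inPasswordEncodeSet(cp) || cp === 0x3a;

// "%" followed by a two-digit, uppercase, hexadecimal representation of the byte.
function percentEncodeByte(byte) {
  return "%" + byte.toString(16).toUpperCase().padStart(2, "0");
}

// Returns the code point itself if it is outside the encode set,
// otherwise its UTF-8 bytes, each percent encoded.
function utf8PercentEncode(codePoint, inEncodeSet) {
  const ch = String.fromCodePoint(codePoint);
  if (!inEncodeSet(codePoint)) return ch;
  const bytes = new TextEncoder().encode(ch);
  return Array.from(bytes, (b) => percentEncodeByte(b)).join("");
}
```

With these definitions, U+0020 is left alone under the simple encode set but becomes `%20` under the default encode set, and ":" is only encoded under the username encode set.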
A host is null or a network address in the form of either a domain or an IPv6 address.
This is a slightly more generic definition of host than its traditional meaning for the sake of convenience.
A domain is an ordered list of one or more domain labels.
An IPv6 address is a 128-bit identifier and for the purposes of this specification represented as an ordered list of eight 16-bit pieces. [IPV6]
The domain label to ASCII algorithm is the IDNA2003 ToASCII algorithm with the AllowUnassigned flag set and the version of Unicode used being the most recent version rather than Unicode 3.2.
The domain label to Unicode algorithm is the IDNA2003 ToUnicode algorithm with the AllowUnassigned flag set and the version of Unicode used being the most recent version rather than Unicode 3.2. [IDNA] [UNICODE]
Using the latest version of Unicode as well as IDNA2003 rather than IDNA2008 are willful violations, to be compatible with widely deployed clients.
The domain to ASCII algorithm takes a domain input and then runs these steps:
Let asciiLabels be an empty list.
On each domain label in input, in order, run the domain label to ASCII algorithm. If that operation failed, return failure. Otherwise, append the result to asciiLabels.
Return asciiLabels.
The domain to Unicode algorithm takes a domain input and then runs these steps:
Let unicodeLabels be an empty list.
On each domain label in input, in order, run the domain label to Unicode algorithm and append the result to unicodeLabels.
Note that the domain label to Unicode algorithm cannot fail.
Return unicodeLabels.
A host must be either a domain or "[", followed by an IPv6 address, followed by "]".
A domain is one or more domain labels separated from each other by a domain label separator, optionally followed by a domain label separator.
A trailing domain label separator signifies an empty domain label.
A domain label is ...
An IPv6 address is defined in the "Text Representation of Addresses" chapter of IP Version 6 Addressing Architecture. [IPV6]
The host parser takes a string input and then runs these steps:
If input is the empty string, return failure.
If input starts with "[", run these substeps:
If input does not end with "]", parse error, return failure.
Return the result of parsing input with its leading "[" and trailing "]" removed.
Let host be the result of running utf-8's decoder on the percent decoding of input.
Let domain be the result of splitting host on any domain label separators.
Return the result of running domain to ASCII on domain.
Domain to ASCII can return failure in which case the host parser would return failure here.
The IPv6 parser takes a string input and then runs these steps:
Let address be a new IPv6 address with its 16-bit pieces initialized to 0.
Let piece pointer be a pointer into address's 16-bit pieces, initially zero (pointing to the first 16-bit piece), and let piece be the 16-bit piece it points to.
Let compress pointer be another pointer into pieces, initially null and pointing to nothing.
Let pointer be a pointer into input, initially zero (pointing to the first code point).
If c is ":", run these substeps:
If remaining does not start with ":", parse error, return failure.
Increase pointer by two.
Increase piece pointer by one and then set compress pointer to piece pointer.
Main: While c is not the EOF code point, run these substeps:
If piece pointer is eight, parse error, return failure.
If c is ":", run these inner substeps:
If compress pointer is not null, parse error, return failure.
Let value and length be 0.
While length is less than 4 and c is an ASCII hex digit, set value to value × 0x10 + c interpreted as hexadecimal number, and increase pointer and length by one.
Based on c:
"."
If length is 0, parse error, return failure.
Decrease pointer by length.
Jump to IPv4.
":"
Increase pointer by one.
If c is the EOF code point, parse error, return failure.
Anything but the EOF code point
Parse error, return failure.
Set piece to value.
Increase piece pointer by one.
If c is the EOF code point, jump to Finale.
IPv4: If piece pointer is greater than six, parse error, return failure.
Let dots seen be 0.
While c is not the EOF code point, run these substeps:
Let value be 0.
If c is not an ASCII digit, parse error, return failure.
While c is an ASCII digit, set value to value × 10 + c interpreted as decimal number and increase pointer by one.
If value is greater than 255, parse error, return failure.
If dots seen is less than 3 and c is not a ".", parse error, return failure.
Set piece to piece × 0x100 + value.
If dots seen is 1 or 3, increase piece pointer by one.
Increase pointer by one.
If dots seen is 3 and c is not the EOF code point, parse error, return failure.
Increase dots seen by one.
Finale: If compress pointer is not null, run these substeps:
Let swaps be piece pointer − compress pointer.
Set piece pointer to seven.
While neither piece pointer nor swaps is zero, swap piece with the piece at pointer compress pointer + swaps − 1, and then decrease piece pointer and swaps by one.
Otherwise, if compress pointer is null and piece pointer is not eight, parse error, return failure.
Return address.
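The IPv6 parser steps can be sketched as below. This is an illustration under stated assumptions rather than a reference implementation: the function name `parseIPv6` and returning `null` for failure are choices made here; the handling of a "::" in the middle of the input is assumed to mirror the leading "::" case (advance both pointers, record the compress pointer, restart the main loop); and the embedded IPv4 branch assumes the piece pointer advances after the second and fourth octet, since each 16-bit piece holds two octets. Parse errors are not reported separately.

```javascript
// Parse an IPv6 address string into eight 16-bit pieces; returns null on failure.
function parseIPv6(input) {
  const isHex = (ch) => ch !== undefined && /^[0-9A-Fa-f]$/.test(ch);
  const isDigit = (ch) => ch !== undefined && /^[0-9]$/.test(ch);
  const address = new Array(8).fill(0);
  let piecePointer = 0;
  let compressPointer = null;
  let pointer = 0;

  // A leading ":" is only valid as the start of "::".
  if (input[pointer] === ":") {
    if (input[pointer + 1] !== ":") return null;
    pointer += 2;
    piecePointer += 1;
    compressPointer = piecePointer;
  }

  // Main loop: one 16-bit piece per iteration.
  while (pointer < input.length) {
    if (piecePointer === 8) return null;
    if (input[pointer] === ":") {
      if (compressPointer !== null) return null;
      // Assumption: same bookkeeping as the leading "::" case.
      pointer += 1;
      piecePointer += 1;
      compressPointer = piecePointer;
      continue;
    }
    let value = 0;
    let length = 0;
    while (length < 4 && isHex(input[pointer])) {
      value = value * 0x10 + parseInt(input[pointer], 16);
      pointer += 1;
      length += 1;
    }
    if (input[pointer] === ".") {
      // Embedded IPv4: rewind and read dotted decimal octets instead.
      if (length === 0) return null;
      pointer -= length;
      if (piecePointer > 6) return null;
      let dotsSeen = 0;
      while (pointer < input.length) {
        let octet = 0;
        if (!isDigit(input[pointer])) return null;
        while (isDigit(input[pointer])) {
          octet = octet * 10 + Number(input[pointer]);
          pointer += 1;
        }
        if (octet > 255) return null;
        if (dotsSeen < 3 && input[pointer] !== ".") return null;
        address[piecePointer] = address[piecePointer] * 0x100 + octet;
        // Assumption: two octets fill one piece.
        if (dotsSeen === 1 || dotsSeen === 3) piecePointer += 1;
        pointer += 1;
        if (dotsSeen === 3 && pointer < input.length) return null;
        dotsSeen += 1;
      }
      break;
    }
    if (input[pointer] === ":") {
      pointer += 1;
      if (pointer >= input.length) return null;
    } else if (pointer < input.length) {
      return null;
    }
    address[piecePointer] = value;
    piecePointer += 1;
  }

  // Finale: move the pieces after "::" to the end of the address.
  if (compressPointer !== null) {
    let swaps = piecePointer - compressPointer;
    piecePointer = 7;
    while (piecePointer !== 0 && swaps !== 0) {
      const other = compressPointer + swaps - 1;
      [address[piecePointer], address[other]] = [address[other], address[piecePointer]];
      piecePointer -= 1;
      swaps -= 1;
    }
  } else if (piecePointer !== 8) {
    return null;
  }
  return address;
}
```

Under these assumptions, `parseIPv6("::1")` produces seven zero pieces followed by 1, and `parseIPv6("1:2")` fails because fewer than eight pieces were given without a "::".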
The host serializer takes a host host and then runs these steps:
If host is null, return the empty string.
If host is an IPv6 address, return "[", followed by the result of running the IPv6 serializer on host, followed by "]".
Otherwise, host is a domain, return its labels separated from each other by U+002E.
The IPv6 serializer takes an IPv6 address address and then runs these steps:
Let output be the empty string.
Let compress pointer be a pointer to the first 16-bit piece in the first longest sequence of address's 16-bit pieces that are 0.
In 0:f:0:0:f:f:0:0 it would point to the second 0.
If there is no sequence of address's 16-bit pieces that are 0 longer than one, set compress pointer to null.
For each piece in address's pieces, run these substeps:
If compress pointer points to piece, append "::" to output if piece is address's first piece and append ":" otherwise, and then run these substeps again with all subsequent pieces in address's pieces that are 0 skipped or go to the next step in the overall set of steps if that leaves no pieces.
Append piece, represented as the shortest possible lowercase hexadecimal number, to output.
If piece is not address's last piece, append ":" to output.
Return output.
This algorithm requires the recommendation from A Recommendation for IPv6 Address Text Representation. [IPV6TEXT]
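The IPv6 serializer steps might look like this in JavaScript. It is only a sketch: the function name `serializeIPv6` and the representation of an address as an array of eight numbers are assumptions of this example.

```javascript
// Serialize an array of eight 16-bit pieces into IPv6 text form.
function serializeIPv6(address) {
  // Find the first longest run of zero pieces that is longer than one piece.
  let compressPointer = null;
  let longest = 1;
  for (let i = 0; i < 8; ) {
    if (address[i] !== 0) {
      i += 1;
      continue;
    }
    let end = i;
    while (end < 8 && address[end] === 0) end += 1;
    if (end - i > longest) {
      compressPointer = i;
      longest = end - i;
    }
    i = end;
  }

  let output = "";
  for (let i = 0; i < 8; i += 1) {
    if (i === compressPointer) {
      // "::" at the very start, otherwise one extra ":" after the previous piece.
      output += i === 0 ? "::" : ":";
      // Skip every zero piece in the compressed run.
      while (i < 8 && address[i] === 0) i += 1;
      if (i === 8) break;
    }
    // Shortest possible lowercase hexadecimal representation.
    output += address[i].toString(16);
    if (i !== 7) output += ":";
  }
  return output;
}
```

For the address from the example above, `serializeIPv6([0, 0xf, 0, 0, 0xf, 0xf, 0, 0])` compresses the first of the two equally long zero runs.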
A URL is a string that represents an identifier.
A URL is either a relative URL or an absolute URL. Either form can be followed by a fragment.
A relative URL is a URL without a scheme. A relative URL must be relative to a base URL.
An absolute URL is a URL with a scheme.
A base URL is an absolute URL with a relative scheme.
Parsing (provided it does not return failure) and serializing a URL will turn it into an absolute URL. The intermediate form is named a parsed URL. The components a URL can consist of, and a parsed URL consists of, are scheme, scheme data (not used if scheme is a relative scheme), username, password, host, port, path, query, and fragment.
A relative scheme is a scheme listed in the first column of the following table. A default port is a relative scheme's optional corresponding port and is listed in the second column on the same row.
scheme | port |
---|---|
"ftp" | "21" |
"file" | |
"gopher" | "70" |
"http" | "80" |
"https" | "443" |
"ws" | "80" |
"wss" | "443" |
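As an illustration, the table can be modelled as a lookup structure. The sketch below also shows how the parser's port state (described later in this document) uses the default port: leading zeros are stripped and a port equal to the scheme's default is dropped. The names `DEFAULT_PORTS`, `isRelativeScheme`, and `normalizePort` are hypothetical and chosen only for this example.

```javascript
// Relative schemes and their default ports; "file" has no default port.
const DEFAULT_PORTS = new Map([
  ["ftp", "21"],
  ["file", null],
  ["gopher", "70"],
  ["http", "80"],
  ["https", "443"],
  ["ws", "80"],
  ["wss", "443"],
]);

// A scheme is relative exactly when it appears in the table.
const isRelativeScheme = (scheme) => DEFAULT_PORTS.has(scheme);

// Normalize a buffer of ASCII digits the way the port state does.
function normalizePort(scheme, buffer) {
  // Remove leading zeros, but keep at least one code point.
  while (buffer.length > 1 && buffer[0] === "0") buffer = buffer.slice(1);
  // A port equal to the scheme's default port is stored as the empty string.
  return buffer === DEFAULT_PORTS.get(scheme) ? "" : buffer;
}
```

So `normalizePort("http", "080")` gives the empty string, while `normalizePort("http", "031")` gives `"31"`.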
A URL must be either a relative URL or an absolute URL, optionally followed by "#" and a fragment.
An absolute URL must be a scheme, followed by ":", followed by scheme data, optionally followed by "?" and a query.
A scheme must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, "+", "-", and ".". A scheme must be registered ....
The syntax of scheme data depends on the scheme and is typically defined alongside it. For a relative scheme, scheme data must be a scheme-relative URL. For other schemes, specifications or standards must define scheme data within the constraints of zero or more URL units.
A relative URL must be either a scheme-relative URL, an absolute-path-relative URL, or a path-relative URL that does not start with a scheme and ":", optionally followed by a "?" and a query.
A scheme-relative URL must be "//", optionally followed by userinfo and "@", followed by a host, optionally followed by ":" and a port, optionally followed by an absolute-path-relative URL.
Userinfo must be a username, optionally followed by a ":" and a password.
A username must be zero or more URL units, excluding "/", ":", "?", and "@".
A password must be zero or more URL units, excluding "/", "?", and "@".
A port must be zero or more ASCII digits.
An absolute-path-relative URL must be "/", followed by a path-relative URL that does not start with "/".
A path-relative URL must be zero or more path segments separated from each other by a "/".
A path segment must be zero or more URL units, excluding "/" and "?".
A query must be zero or more URL units.
A fragment must be zero or more URL units.
The URL code points are ASCII alphanumeric, "!", "$", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", ":", ";", "=", "?", "@", "_", "~", and code points in the ranges U+00A0 to U+D7FF, U+E000 to U+FDCF, U+FDF0 to U+FFEF, U+10000 to U+1FFFD, U+20000 to U+2FFFD, U+30000 to U+3FFFD, U+40000 to U+4FFFD, U+50000 to U+5FFFD, U+60000 to U+6FFFD, U+70000 to U+7FFFD, U+80000 to U+8FFFD, U+90000 to U+9FFFD, U+A0000 to U+AFFFD, U+B0000 to U+BFFFD, U+C0000 to U+CFFFD, U+D0000 to U+DFFFD, U+E1000 to U+EFFFD, U+F0000 to U+FFFFD, U+100000 to U+10FFFD.
Code points higher than U+009F will be converted to percent-encoded bytes by the URL parser.
The URL units are URL code points and percent-encoded bytes.
Aside from the components mentioned earlier, a parsed URL also has an associated relative flag.
The relative flag exists because checking whether a parsed URL's scheme is a relative scheme can give incorrect results due to the protocol attribute.
Add the ability to halt on the first conformance error.
The URL parser takes a string input, optionally with a base URL base, optionally with an encoding encoding override, optionally with a parsed URL url and a state override state override, and then runs these steps:
The encoding override argument is a legacy concept only relevant for HTML. The url and state override arguments are only for use by methods of objects implementing the URLUtils interface. [HTML]
If url is not given:
Set url to a new parsed URL.
Set url's scheme, scheme data, username, and port to the empty string, password, host, query, and fragment to null, path to the empty list, and unset its relative flag.
Remove any leading and trailing ASCII whitespace from input.
Let state be state override if given, or scheme start state otherwise.
If base is not given, set it to null.
If encoding override is not given, set it to utf-8.
Let buffer be the empty string.
Let the @ flag and the [] flag be unset.
Let pointer be a pointer to the first code point in input.
Keep running the following state machine by switching on state, increasing pointer by one after each time it is run, as long as pointer does not point past the end of input.
If c is an ASCII alpha, append c, lowercased, to buffer, and set state to scheme state.
Otherwise, if state override is not given, set state to no scheme state, and decrease pointer by one.
Otherwise, parse error, terminate this algorithm.
If c is an ASCII alphanumeric, "+", "-", or ".", append c, lowercased, to buffer.
Otherwise, if c is ":", set url's scheme to buffer, buffer to the empty string, and then run these substeps:
If state override is given, terminate this algorithm.
If url's scheme is a relative scheme, set url's relative flag.
If url's scheme is "file", set state to relative state.
Otherwise, if url's relative flag is set, base is not null and base's scheme is equal to url's scheme, set state to relative or authority state.
Otherwise, if url's relative flag is set, set state to authority first slash state.
Otherwise, set state to scheme data state.
Otherwise, if state override is not given, set buffer to the empty string, state to no scheme state, and start over (from the first code point in input).
Otherwise, if c is the EOF code point, terminate this algorithm.
Otherwise, parse error, terminate this algorithm.
If c is "?", set url's query to the empty string and state to query state.
Otherwise, if c is "#", set url's fragment to the empty string and state to fragment state.
Otherwise, run these substeps:
If c is not the EOF code point, not a URL code point, and not "%", parse error.
If c is "%" and remaining does not start with two ASCII hex digits, parse error.
If c is none of EOF code point, U+0009, U+000A, and U+000D, utf-8 percent encode c using the simple encode set, and append the result to url's scheme data.
If base is null, or base's scheme is not a relative scheme, parse error, return failure.
Due to the protocol attribute's ability to change base's scheme, base's relative flag is not used here.
Otherwise, set state to relative state, and decrease pointer by one.
If c is "/" and remaining starts with "/", set state to authority ignore slashes state and increase pointer by one.
Otherwise, parse error, set state to relative state and decrease pointer by one.
Set url's relative flag, set url's scheme to base's scheme if url's scheme is not "file", and then, based on c:
EOF code point
Set url's host to base's host, url's port to base's port, url's path to base's path, and url's query to base's query.
"/" or "\"
If c is "\", parse error.
Set state to relative slash state.
"?"
Set url's host to base's host, url's port to base's port, url's path to base's path, url's query to the empty string, and state to query state.
"#"
Set url's host to base's host, url's port to base's port, url's path to base's path, url's query to base's query, url's fragment to the empty string, and state to fragment state.
Otherwise
If url's scheme is not "file", or c is not an ASCII alpha, or remaining does not start with either ":" or "|", or remaining does not consist of one code point, or remaining's second code point is not one of "/", "\", "?", and "#", then set url's host to base's host, url's port to base's port, url's path to base's path, and then pop url's path.
This is a (platform-independent) Windows drive letter quirk. When found at the start of a file URL it is treated as an absolute path rather than one relative to base's path.
Set state to relative path state, and decrease pointer by one.
If c is either "/" or "\", run these steps:
If c is "\", parse error.
If url's scheme is "file", set state to file host state.
Otherwise, set state to authority ignore slashes state.
Otherwise, run these steps:
If c is "/", set state to authority second slash state.
Otherwise, parse error, set state to authority ignore slashes state, and decrease pointer by one.
If c is "/", set state to authority ignore slashes state.
Otherwise, parse error, set state to authority ignore slashes state, and decrease pointer by one.
If c is neither "/" nor "\", set state to authority state, and decrease pointer by one.
Otherwise, parse error.
If c is "@", run these substeps:
If the @ flag is set, parse error, prepend "%40" to buffer.
Set the @ flag.
For each code point in buffer, run these substeps:
If code point is U+0009, U+000A, or U+000D, parse error, continue.
If code point is not a URL code point and not "%", parse error.
If code point is "%" and remaining does not start with two ASCII hex digits, parse error.
If code point is ":" and url's password is null, set url's password to the empty string and continue.
utf-8 percent encode code point using the default encode set and append the result to url's password if url's password is non-null, and to url's username otherwise.
Set buffer to the empty string.
Otherwise, if c is one of EOF code point, "/", "\", "?", and "#", decrease pointer by the number of code points in buffer plus one, set buffer to the empty string, and state to host state.
Otherwise, append c to buffer.
If c is one of EOF code point, "/", "\", "?", and "#", decrease pointer by one, and run these substeps:
If buffer consists of two code points, of which the first is an ASCII alpha and the second is either ":" or "|", set state to relative path state.
This is a (platform-independent) Windows drive letter quirk. buffer is not reset here and instead used in the relative path state.
Otherwise, if buffer is the empty string, set state to relative path start state.
Otherwise, run these steps:
Let host be the result of host parsing buffer.
If host is failure, return failure.
Set url's host to host, buffer to the empty string, and state to relative path start state.
Otherwise, if c is U+0009, U+000A, or U+000D, parse error.
Otherwise, append c to buffer.
If c is ":" and the [] flag is unset, run these substeps:
Let host be the result of host parsing buffer.
If host is failure, return failure.
Set url's host to host, buffer to the empty string, and state to port state.
If state override is hostname state, terminate this algorithm.
Otherwise, if c is the EOF code point, "/", "\", "?", or "#", decrease pointer by one, and run these substeps:
Let host be the result of host parsing buffer.
If host is failure, return failure.
Set url's host to host, buffer to the empty string, and state to relative path start state.
If state override is given, terminate this algorithm.
Otherwise, if c is U+0009, U+000A, or U+000D, parse error.
Otherwise, run these substeps:
If c is an ASCII digit, append c to buffer.
Otherwise, if c is one of EOF code point, "/", "\", "?", and "#", or state override is given, run these substeps:
Remove leading U+0030 code points from buffer until either the leading code point is not U+0030 or buffer is one code point.
Input | Output |
---|---|
"42" | "42" |
"031" | "31" |
"080" | "80" |
"0000" | "0" |
If buffer is equal to url's scheme's default port, set buffer to the empty string.
Set url's port to buffer.
If state override is given, terminate this algorithm.
Set buffer to the empty string, state to relative path start state, and decrease pointer by one.
Otherwise, if c is U+0009, U+000A, or U+000D, parse error.
Otherwise, parse error, return failure.
If c is "\", parse error.
Set state to relative path state and if c is neither "/" nor "\", decrease pointer by one.
If either c is one of EOF code point, "/", and "\", or state override is not given and c is one of "?" and "#", run these substeps:
If c is "\", parse error.
If buffer, lowercased, matches any row in the first column of the following table, set buffer to the contents of the cell in the second column of the matched row:
"%2e" | "." |
".%2e" | ".." |
"%2e." | ".." |
"%2e%2e" | ".." |
If buffer is "..", pop url's path, if non-empty, and then if c is neither "/" nor "\", append the empty string to url's path.
Otherwise, if buffer is "." and c is neither "/" nor "\", append an empty string to url's path.
Otherwise, if buffer is not ".", run these subsubsteps:
If url's scheme is "file", url's path is the empty list, buffer consists of two code points, of which the first is an ASCII alpha, and the second is "|", replace the second code point in buffer with ":".
This is a (platform-independent) Windows drive letter quirk. They are beautiful, no?
Append buffer to url's path.
Set buffer to the empty string.
If c is "?", set url's query to the empty string, and state to query state.
If c is "#", set url's fragment to the empty string, and state to fragment state.
Otherwise, if c is U+0009, U+000A, or U+000D, parse error.
Otherwise, run these steps:
If c is not a URL code point and not "%", parse error.
If c is "%" and remaining does not start with two ASCII hex digits, parse error.
utf-8 percent encode c using the default encode set, and append the result to buffer.
If c is the EOF code point or state override is not given and c is "#", run these substeps:
If url's relative flag is set, set encoding override to utf-8.
Set buffer to the result of running encoding override's encoder on buffer, with encoding override's encoder's error handling mode set to URL.
For each byte in buffer run these subsubsteps:
If byte is less than 0x21, greater than 0x7E, or is one of 0x22, 0x23, 0x3C, 0x3E, and 0x60, append byte, percent encoded, to url's query.
Otherwise, append a code point whose value is byte to url's query.
Set buffer to the empty string.
If c is "#", set url's fragment to the empty string, and state to fragment state.
Otherwise, if c is U+0009, U+000A, or U+000D, parse error.
Otherwise, run these substeps:
If c is not a URL code point and not "%", parse error.
If c is "%" and remaining does not start with two ASCII hex digits, parse error.
Append c to buffer.
Based on c:
Do nothing.
If c is not a URL code point and not "%", parse error.
If c is "%" and remaining does not start with two ASCII hex digits, parse error.
utf-8 percent encode c using the simple encode set, and append the result to url's fragment.
Return url.
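To make the dot-segment handling of the relative path state more concrete, here is a simplified sketch that works on a whole path string instead of running inside the state machine. It is an approximation under several assumptions: the name `buildPath` is hypothetical, the input is taken to be the part of the path after its leading slash with no query or fragment, segments are split on "/" and "\" up front, and percent-encoding of individual code points is left out.

```javascript
// Percent-encoded spellings of "." and ".." (compared after lowercasing).
const DOT_ALIASES = new Map([
  ["%2e", "."],
  [".%2e", ".."],
  ["%2e.", ".."],
  ["%2e%2e", ".."],
]);

// Build the path list for a path string, resolving "." and ".." segments.
function buildPath(rawPath, scheme = "http") {
  const path = [];
  const segments = rawPath.split(/[\/\\]/);
  segments.forEach((segment, index) => {
    // The last segment is terminated by EOF instead of a slash.
    const isLast = index === segments.length - 1;
    const lowered = segment.toLowerCase();
    let buffer = DOT_ALIASES.has(lowered) ? DOT_ALIASES.get(lowered) : segment;
    if (buffer === "..") {
      if (path.length > 0) path.pop();
      if (isLast) path.push("");
    } else if (buffer === ".") {
      if (isLast) path.push("");
    } else {
      // Windows drive letter quirk: "C|" at the start of a file path becomes "C:".
      if (scheme === "file" && path.length === 0 && /^[A-Za-z]\|$/.test(buffer)) {
        buffer = buffer[0] + ":";
      }
      path.push(buffer);
    }
  });
  return path;
}
```

For example, `buildPath("a/./b/%2E%2e/c")` returns `["a", "c"]`, and a trailing ".." as in `buildPath("a/b/..")` leaves an empty final segment so the serialized path ends in a slash.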
The URL serializer takes a parsed URL url, optionally an exclude fragment flag, and then runs these steps:
Let output be url's scheme and ":" concatenated.
If url's relative flag is set:
Append "//" to output.
If url's username is not the empty string or url's password is non-null, run these substeps:
Append url's host, serialized, to output.
If url's port is not the empty string, append ":" concatenated with url's port to output.
Append "/" concatenated with the strings in url's path (including empty strings), separated from each other by "/" to output.
Otherwise, if url's relative flag is unset, append url's scheme data to output.
If url's query is non-null, append "?" concatenated with url's query to output.
If the exclude fragment flag is unset and url's fragment is non-null, append "#" concatenated with url's fragment to output.
Return output.
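A sketch of the URL serializer follows. It is illustrative only and rests on assumptions: a parsed URL is modelled as a plain object with hypothetical property names (`scheme`, `schemeData`, `username`, `password`, `host`, `port`, `path`, `query`, `fragment`, `relativeFlag`); `host` is assumed to be already serialized (or null); and the userinfo substeps, which are not spelled out in the steps above, are assumed to append the username, then ":" and the password if it is non-null, then "@".

```javascript
// Serialize a parsed URL object (hypothetical shape, see the lead-in).
function serializeURL(url, excludeFragment = false) {
  let output = url.scheme + ":";
  if (url.relativeFlag) {
    output += "//";
    if (url.username !== "" || url.password !== null) {
      // Assumed userinfo substeps: username[:password]@
      output += url.username;
      if (url.password !== null) output += ":" + url.password;
      output += "@";
    }
    // A null host serializes to the empty string.
    output += url.host === null ? "" : url.host;
    if (url.port !== "") output += ":" + url.port;
    // Path segments (including empty strings) joined by "/".
    output += "/" + url.path.join("/");
  } else {
    output += url.schemeData;
  }
  if (url.query !== null) output += "?" + url.query;
  if (!excludeFragment && url.fragment !== null) output += "#" + url.fragment;
  return output;
}
```

A caller that wants the URL without its fragment would pass `true` as the second argument, which corresponds to setting the exclude fragment flag.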
application/x-www-form-urlencoded
The application/x-www-form-urlencoded parser takes a string input using code points in the range U+0000 to U+007F, optionally with an encoding encoding override, optionally with a use "_charset_" flag, and optionally with an isindex flag, and then runs these steps:
Let strings be the result of splitting input on "&".
If the isindex flag is set and the first string in strings does not contain a "=", prepend "=" to the first string in strings.
If encoding override is not given, set it to utf-8.
Let pairs be an empty list of name-value pairs.
For each string string in strings, run these substeps:
If string is the empty string, run these substeps again for the next string.
If string contains a "=", then let name be the substring of string from the start of string up to but excluding its first "=", and let value be the substring from the first code point, if any, after the first "=" up to the end of string. If "=" is the first code point, then name will be the empty string. If it is the last, then value will be the empty string.
Otherwise, let name have the value of string and let value be the empty string.
Replace any "+" in name and value with U+0020.
If the use "_charset_" flag is set, name is "_charset_", and get an encoding for value does not return failure, unset the use "_charset_" flag and set encoding override to the result of getting an encoding for value.
Add a pair consisting of name and value to pairs.
Replace each name-value pair in pairs with the result of running encoding override's decoder on the percent decoding of the name-value pair.
Return pairs.
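The parser steps can be approximated as follows. This sketch deliberately narrows the algorithm: it assumes the encoding override is always utf-8 and leaves out the use "_charset_" flag and the isindex flag. The function names are hypothetical, and `TextDecoder` stands in for the utf-8 decoder.

```javascript
// Percent decode an ASCII string into a Uint8Array (stray "%" is kept as-is).
function percentDecodeToBytes(string) {
  const bytes = [];
  for (let i = 0; i < string.length; ) {
    if (string[i] === "%" && /^[0-9A-Fa-f]{2}/.test(string.slice(i + 1))) {
      bytes.push(parseInt(string.slice(i + 1, i + 3), 16));
      i += 3;
    } else {
      bytes.push(string.charCodeAt(i));
      i += 1;
    }
  }
  return Uint8Array.from(bytes);
}

// Parse an application/x-www-form-urlencoded string into [name, value] pairs.
// Simplification: utf-8 only, no "_charset_" or isindex handling.
function parseUrlencoded(input) {
  const decoder = new TextDecoder("utf-8");
  const decode = (s) => decoder.decode(percentDecodeToBytes(s));
  const pairs = [];
  for (const string of input.split("&")) {
    if (string === "") continue;
    const eq = string.indexOf("=");
    // Split on the first "=" only; without one, the value is empty.
    let name = eq === -1 ? string : string.slice(0, eq);
    let value = eq === -1 ? "" : string.slice(eq + 1);
    // "+" is turned into a space before percent decoding happens.
    name = name.replace(/\+/g, " ");
    value = value.replace(/\+/g, " ");
    pairs.push([name, value]);
  }
  return pairs.map(([name, value]) => [decode(name), decode(value)]);
}
```

Because "+" is replaced before percent decoding, a literal plus sign has to be sent as `%2B`; `parseUrlencoded("x=%2B&y=a+b")` keeps the plus in `x` and turns the one in `y` into a space.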
The application/x-www-form-urlencoded byte serializer takes a byte sequence input and then runs these steps:
Let output be the empty string.
For each byte in input, depending on byte:
0x20
Append U+002B to output.
0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F, 0x61 to 0x7A
Append a code point whose value is byte to output.
Otherwise
Append byte, percent encoded, to output.
Return output.
The application/x-www-form-urlencoded serializer takes a list of name-value pairs pairs, optionally with an encoding encoding override, and then runs these steps:
If encoding override is not given, set it to utf-8.
Let output be the empty string.
For each pair in pairs, run these substeps:
Replace pair's name and value with the result of running encoding override's encoder on them, with encoding override's encoder's error handling mode set to <form>, respectively.
Replace pair's name and value with their serialization.
If this is not the first pair, append "&" to output.
Append pair's name, followed by "=", followed by pair's value to output.
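The byte serializer and the serializer can be sketched together. Assumptions of this example: the encoding override is always utf-8 (via `TextEncoder`); the bytes copied through unchanged are taken to be 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F, and 0x61 to 0x7A; 0x20 becomes "+"; and everything else is percent encoded. The function names are hypothetical.

```javascript
// Serialize a byte sequence using application/x-www-form-urlencoded rules.
function serializeBytes(bytes) {
  let output = "";
  for (const byte of bytes) {
    const literal =
      byte === 0x2a || byte === 0x2d || byte === 0x2e || byte === 0x5f ||
      (byte >= 0x30 && byte <= 0x39) ||
      (byte >= 0x41 && byte <= 0x5a) ||
      (byte >= 0x61 && byte <= 0x7a);
    if (byte === 0x20) {
      output += "+"; // space becomes U+002B
    } else if (literal) {
      output += String.fromCharCode(byte);
    } else {
      output += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return output;
}

// Serialize a list of [name, value] pairs (utf-8 only in this sketch).
function serializeUrlencoded(pairs) {
  const encoder = new TextEncoder();
  return pairs
    .map(([name, value]) =>
      serializeBytes(encoder.encode(name)) + "=" + serializeBytes(encoder.encode(value)))
    .join("&");
}
```

Note that under these rules "~" is percent encoded while "*" is not, so `serializeUrlencoded([["a b", "~*"]])` gives `a+b=%7E*`.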
[Constructor(DOMString url, optional (URL or DOMString) base = "about:blank")]
interface URL {
  static DOMString domainToASCII([EnsureUTF16] DOMString domain);
  static DOMString domainToUnicode([EnsureUTF16] DOMString domain);
};
URL implements URLUtils;

[NoInterfaceObject]
interface URLUtils {
  stringifier attribute [EnsureUTF16] DOMString href;
  readonly attribute DOMString origin;

  attribute [EnsureUTF16] DOMString protocol;
  attribute [EnsureUTF16] DOMString username;
  attribute [EnsureUTF16] DOMString password;
  attribute [EnsureUTF16] DOMString host;
  attribute [EnsureUTF16] DOMString hostname;
  attribute [EnsureUTF16] DOMString port;
  attribute [EnsureUTF16] DOMString pathname;
  attribute [EnsureUTF16] DOMString search;
  attribute URLSearchParams? searchParams;
  attribute [EnsureUTF16] DOMString hash;
};

[NoInterfaceObject]
interface URLUtilsReadOnly {
  stringifier readonly attribute DOMString href;

  readonly attribute DOMString protocol;
  readonly attribute DOMString host;
  readonly attribute DOMString hostname;
  readonly attribute DOMString port;
  readonly attribute DOMString pathname;
  readonly attribute DOMString search;
  readonly attribute DOMString hash;
};
Except where different, objects implementing URLUtilsReadOnly are identical to objects implementing URLUtils.
Since all members are readonly and certain members from URLUtils are not exposed, a number of potential optimizations are possible compared to objects implementing URLUtils. These are left as an exercise to the reader.
Specifications defining objects implementing URLUtils or URLUtilsReadOnly must define a get the base algorithm, which must return the appropriate base URL for the object.
Specifications defining objects implementing URLUtils may define update steps to make it possible for an underlying string (such as an attribute value) to be updated. The update steps are passed a string value for this purpose.
An object implementing URLUtils or URLUtilsReadOnly has an associated input (a string), query encoding (an encoding), query object (a URLSearchParams object or null), and a url (a parsed URL or null).
Unless stated otherwise, query encoding is utf-8. The others follow from the set the input algorithm.
The associated query encoding is a legacy concept only relevant for HTML. [HTML]
Specifications defining objects implementing URLUtils or URLUtilsReadOnly must use the set the input algorithm to set input, url, and query object. To set the input run these steps:
Set url to null.
Set input to the given value.
Let parsed URL be the result of parsing input with base URL being the result of running get the base and query encoding as encoding override.
If parsed URL is not failure, set url to parsed URL.
If url is non-null and its relative flag is set, run these substeps:
If query object is null, set query object to a new URLSearchParams object using url's query.
Otherwise, set query object's associated list of name-value pairs to the result of parsing url's query.
If url is null and query object is non-null, empty query object's associated list of name-value pairs.
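The steps above can be modeled roughly as follows. This is only a sketch under several assumptions: `UrlHolder`, `replacePairs`, and `RELATIVE_SCHEMES` are hypothetical names, the platform's `URL` constructor stands in for the parser, the scheme list approximates "relative flag is set", and the query encoding override is ignored.

```javascript
// Hypothetical model of the "set the input" steps; not part of any real API.
// Assumption: these schemes approximate "the relative flag is set".
const RELATIVE_SCHEMES = new Set(["ftp:", "file:", "gopher:", "http:", "https:", "ws:", "wss:"]);

// Replace the name-value pairs of target in place, keeping object identity.
function replacePairs(target, source) {
  for (const name of [...target.keys()]) target.delete(name);
  for (const [name, value] of source) target.append(name, value);
}

class UrlHolder {
  constructor(getBase) {
    this.getBase = getBase; // stands in for the "get the base" algorithm
    this.input = "";
    this.url = null;
    this.queryObject = null;
  }

  setInput(value) {
    this.url = null;
    this.input = value;
    let parsed = null;
    try {
      parsed = new URL(value, this.getBase());
    } catch (e) {
      parsed = null; // the parser returned failure
    }
    if (parsed !== null) this.url = parsed;

    if (this.url !== null && RELATIVE_SCHEMES.has(this.url.protocol)) {
      const pairs = new URLSearchParams(this.url.search);
      if (this.queryObject === null) {
        this.queryObject = pairs;
      } else {
        replacePairs(this.queryObject, pairs);
      }
    }
    if (this.url === null && this.queryObject !== null) {
      replacePairs(this.queryObject, new URLSearchParams());
    }
  }
}

const holder = new UrlHolder(() => "http://example.org/dir/");
holder.setInput("page?a=1");
const firstHref = holder.url.href;
const firstQueryObject = holder.queryObject;
const firstValue = firstQueryObject.get("a");

// An unparsable value leaves url null, keeps input, and empties the query object.
holder.setInput("http://[bad");
```

Note how the second call keeps the same query object but empties it, while input still holds the unparsable string.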
To run the pre-update steps for an object implementing URLUtils, optionally given a value, run these steps:
If value is not given, let value be the result of serializing the associated url.
Run the update steps with value.
The URL(url, base) constructor must run these steps:
If base is a string, parse base and set base to the result of that algorithm.
If base is failure, throw a TypeError exception.
Let result be a new URL object.
Let result's get the base return base.
Run result's set the input for url.
Return result.
To parse a URL without using a base URL, invoke the constructor with a single argument:
var input = "http://example.org/💩",
    url = new URL(input)
url.pathname // "/%F0%9F%92%A9"
If you would rather resolve it against the base URL of a document, use baseURI:
var input = "/💩",
    url = new URL(input, document.baseURI)
url.href // "http://url.spec.whatwg.org/%F0%9F%92%A9"
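The base handling in the constructor steps can also be exercised outside a document. The following is a sketch that assumes a runtime with a global `URL` (a browser or Node.js); the variable names are illustrative only.

```javascript
// base may be an existing URL object instead of a string.
const base = new URL("http://example.org/a/b");
const resolved = new URL("../c?x=1", base);
// resolved.href is "http://example.org/c?x=1"

// A base string that cannot be parsed makes the constructor throw a TypeError.
let baseError = null;
try {
  new URL("/path", "not a base");
} catch (e) {
  baseError = e;
}
// baseError is a TypeError
```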
URL statics
The domainToASCII(domain) static method must run these steps:
Let internalDomain be the result of splitting domain on any domain label separators.
Let asciiDomain be the result of running domain to ASCII on internalDomain.
If asciiDomain is failure, return domain.
Return asciiDomain, serialized.
The domainToUnicode(domain) static method must run these steps:
Let internalDomain be the result of splitting domain on any domain label separators.
Let unicodeDomain be the result of running domain to Unicode on internalDomain.
Return unicodeDomain, serialized.
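Runtimes do not necessarily expose these statics on `URL`. As a rough illustration of the conversions, the sketch below assumes Node.js, whose built-in `url` module offers comparably named helpers; they are not the statics defined here, and their failure behavior may differ from the steps above.

```javascript
// Assumption: Node.js with its built-in "url" module (not the URL statics above).
const { domainToASCII, domainToUnicode } = require("url");

const ascii = domainToASCII("español.com");
// ascii is "xn--espaol-zwa.com"

const unicode = domainToUnicode("xn--espaol-zwa.com");
// unicode is "español.com"
```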
URLUtils and URLUtilsReadOnly members
The URLUtils and URLUtilsReadOnly interfaces are not exposed on the global object. They are meant to augment other interfaces, such as URL.
The href attribute must run these steps:
Return the serialization of url.
Setting the href attribute must run these steps:
Run the set the input algorithm for the given value.
If the context object is a URL object and its url is null, throw a TypeError exception.
Run the pre-update steps with the given value.
This means that if the href attribute is set to a value that would cause the parser to return failure, that value is still passed through unchanged. This is one of those unfortunate legacy incidents.
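For URL objects specifically, the setter steps above end in an exception when parsing fails. A minimal sketch, assuming a runtime with a global `URL`; the state of the object after the exception is not shown because it may differ between revisions of this API.

```javascript
const link = new URL("http://example.org/a");

// A parsable value replaces every component at once.
link.href = "https://example.com/b?c#d";
// link.hostname is "example.com", link.pathname is "/b",
// link.search is "?c", and link.hash is "#d"

// An unparsable value throws a TypeError on a URL object.
let hrefError = null;
try {
  link.href = "not a url";
} catch (e) {
  hrefError = e;
}
```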
The origin attribute must run these steps:
If url is null, return the empty string.
Return the Unicode serialization of url's origin. [ORIGIN]
It returns the Unicode rather than the ASCII serialization for compatibility with HTML's MessageEvent feature. [HTML]
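As a short illustration, assuming a runtime with a global `URL`: for an ASCII-only host, the choice between the Unicode and ASCII serializations makes no difference, so the result is easy to predict.

```javascript
const page = new URL("https://example.org:8443/a/b?c=d#e");
const pageOrigin = page.origin;
// pageOrigin is "https://example.org:8443" (scheme, host, and non-default port only)
```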
The protocol attribute must run these steps:
Setting the protocol attribute must run these steps:
If url is null, terminate these steps.
Parse the given value and ":" concatenated, with url as url and scheme start state as state override.
Run the pre-update steps.
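Because the setter appends ":" before parsing, the given value does not need a trailing colon. A minimal sketch, assuming a runtime with a global `URL`:

```javascript
const site = new URL("http://example.org/path");

// The setter supplies the ":" itself, so "https" and "https:" behave alike.
site.protocol = "https";
// site.protocol is "https:" and site.href is "https://example.org/path"
```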
The username attribute must run these steps:
Setting the username attribute must run these steps:
If url is null, or its relative flag is unset, terminate these steps.
Set username to the empty string.
For each code point in the given value, utf-8 percent encode it using the username encode set, and append the result to username.
Run the pre-update steps.
The password attribute must run these steps:
Setting the password attribute must run these steps:
If url is null, or its relative flag is unset, terminate these steps.
If the given value is the empty string, set password to null, run the pre-update steps, and terminate these steps.
Set password to the empty string.
For each code point in the given value, utf-8 percent encode it using the password encode set, and append the result to password.
Run the pre-update steps.
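The percent-encoding performed by the username and password setters can be observed as follows. This is a sketch assuming a runtime with a global `URL`; the exact encode sets may differ slightly between revisions, so only characters that are commonly encoded are used.

```javascript
const account = new URL("http://example.org/");
account.username = "user name"; // the space is percent-encoded
account.password = "p@ss";      // the "@" is percent-encoded
const withCredentials = account.href;
// withCredentials is "http://user%20name:p%40ss@example.org/"

// The empty string removes the password again.
account.password = "";
const withoutPassword = account.href;
// withoutPassword is "http://user%20name@example.org/"

// A URL without a host (relative flag unset) ignores the setter.
const mail = new URL("mailto:someone@example.org");
mail.username = "other";
// mail.href is still "mailto:someone@example.org"
```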
The host attribute must run these steps:
If url is null, return the empty string.
If port is the empty string, return host, serialized.
Return host, serialized, ":", and port concatenated.
Setting the host attribute must run these steps:
If url is null, or its relative flag is unset, terminate these steps.
Parse the given value with url as url and host state as state override.
Run the pre-update steps.
The hostname attribute must run these steps:
If url is null, return the empty string.
Return host, serialized.
Setting the hostname attribute must run these steps:
If url is null, or its relative flag is unset, terminate these steps.
Parse the given value with url as url and hostname state as state override.
Run the pre-update steps.
The port attribute must run these steps:
Setting the port attribute must run these steps:
If url is null, its relative flag is unset, or its scheme is "file", terminate these steps.
If the given value is the empty string, set url's port to "0".
Otherwise, parse the given value with url as url and port state as state override.
Run the pre-update steps.
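The relationship between host, hostname, and port can be sketched as follows, assuming a runtime with a global `URL`. The empty-string port case is left out because its outcome may differ between revisions of this API.

```javascript
const server = new URL("http://example.org/index.html");

// host carries both the hostname and the port.
server.host = "example.com:8080";
// server.hostname is "example.com" and server.port is "8080"

// hostname leaves the port alone.
server.hostname = "example.net";
// server.host is "example.net:8080"

server.port = "8443";
// server.href is "http://example.net:8443/index.html"

// The port setter does nothing for the "file" scheme.
const local = new URL("file:///tmp/report.txt");
local.port = "8080";
// local.port is still ""
```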
The pathname attribute must run these steps:
If url is null, return the empty string.
If the relative flag is unset, return scheme data.
Return "/" concatenated with the strings in path (including empty strings), separated from each other by "/".
Setting the pathname attribute must run these steps:
If url is null, or its relative flag is unset, terminate these steps.
Set path to the empty list.
Parse the given value with url as url and relative path start state as state override.
Run the pre-update steps.
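Both branches of the pathname getter, and the early exit of the setter, can be seen in a small sketch. It assumes a runtime with a global `URL`.

```javascript
const doc = new URL("http://example.org/old");

// The new value is parsed, so the space is percent-encoded.
doc.pathname = "/a b/c";
// doc.pathname is "/a%20b/c" and doc.href is "http://example.org/a%20b/c"

// Without the relative flag the getter returns the scheme data,
// and the setter terminates without changing anything.
const mailto = new URL("mailto:someone@example.org");
const schemeData = mailto.pathname;
// schemeData is "someone@example.org"
mailto.pathname = "/ignored";
// mailto.href is still "mailto:someone@example.org"
```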
The search attribute must run these steps:
If url is null, or its query is either null or the empty string, return the empty string.
Return "?" concatenated with query.
Setting the search attribute must run these steps:
If url is null, or its relative flag is unset, terminate these steps.
If the given value is the empty string, set query to null, set query object's associated list of name-value pairs to the empty list, run the pre-update steps, and terminate these steps.
Let input be the given value with a single leading "?" removed, if any.
Set query to the empty string.
Parse input with url as url, query state as state override, and the associated query encoding as encoding override.
Set query object's associated list of name-value pairs to the result of parsing input.
Run the pre-update steps.
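The setter steps keep the query object in sync with the query string. A minimal sketch, assuming a runtime with a global `URL` whose searchParams reflects the URL's query:

```javascript
const listing = new URL("http://example.org/list");

// A single leading "?" is optional.
listing.search = "?page=2&sort=asc";
const pageAfterFirstSet = listing.searchParams.get("page");
// pageAfterFirstSet is "2"

listing.search = "page=3";
const searchAfterSecondSet = listing.search;
// searchAfterSecondSet is "?page=3"

// The empty string clears both the query and the name-value pairs.
listing.search = "";
// listing.href is "http://example.org/list"
// listing.searchParams.has("page") is false
```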
The searchParams attribute must return the query object.
Setting the searchParams attribute must run these steps:
Let object be the given value.
If query object or object is null, terminate these steps.
If object's url object is not null, set object to a new URLSearchParams object using object.
Set query object to object.
Run object's update steps.
The hash attribute must run these steps:
If url is null, or its fragment is either null or the empty string, return the empty string.
Return "#" concatenated with fragment.
Setting the hash attribute must run these steps:
If url is null, or its scheme is "javascript", terminate these steps.
If the given value is the empty string, set fragment to null, run the pre-update steps, and terminate these steps.
Let input be the given value with a single leading "#" removed, if any.
Set fragment to the empty string.
Parse input with url as url and fragment state as state override.
Run the pre-update steps.
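The hash setter treats a leading "#" as optional and the empty string as removal. A short sketch, assuming a runtime with a global `URL`:

```javascript
const section = new URL("http://example.org/doc");

section.hash = "#intro";
const firstHash = section.hash;
// firstHash is "#intro"

// A single leading "#" is removed if present, so it can be omitted.
section.hash = "details";
const secondHash = section.hash;
// secondHash is "#details"

// The empty string sets the fragment to null.
section.hash = "";
// section.href is "http://example.org/doc"
```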
URLSearchParams
[Constructor(optional ([EnsureUTF16] DOMString or URLSearchParams) init = "")]
interface URLSearchParams {
void append([EnsureUTF16] DOMString name, [EnsureUTF16] DOMString value);
void delete([EnsureUTF16] DOMString name);
DOMString? get([EnsureUTF16] DOMString name);
sequence<DOMString> getAll([EnsureUTF16] DOMString name);
boolean has([EnsureUTF16] DOMString name);
void set([EnsureUTF16] DOMString name, [EnsureUTF16] DOMString value);
};
A URLSearchParams object has an associated list of name-value pairs, which is initially empty.
A URLSearchParams object has an associated url object, which is the object implementing URLUtils whose query object is this URLSearchParams object, or null if there is no such object.
URLSearchParams objects always use utf-8 as encoding, despite the existence of concepts such as query encoding. This is to encourage developers to migrate towards utf-8, which they really ought to have done a long time ago now.
To create a new URLSearchParams object, optionally using init, run these steps:
Let query be a new URLSearchParams object.
If init is the empty string or null, return query.
If init is a string, set query's associated list of name-value pairs to the result of parsing init.
If init is a URLSearchParams object, set query's associated list of name-value pairs to a copy of init's associated list of name-value pairs.
Return query.
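The two init forms can be compared in a small sketch, assuming a runtime with a global `URLSearchParams`. It shows that initializing from another object copies the pairs rather than sharing them.

```javascript
// From a string: the string is parsed into name-value pairs.
const original = new URLSearchParams("a=1&b=2");

// From another object: the list of pairs is copied.
const copy = new URLSearchParams(original);
copy.append("c", "3");
// copy.has("c") is true, original.has("c") is false

// The empty string yields an empty list.
const empty = new URLSearchParams("");
// empty.has("a") is false
```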
A URLSearchParams object's update steps are:
If url object is null, terminate these steps.
Set url object's url's query to the serialization of the URLSearchParams object's associated list of name-value pairs.
Run url object's pre-update steps.
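The effect of the update steps is that mutating a query object is immediately visible on its url object. A minimal sketch, assuming a runtime with a global `URL` that exposes searchParams:

```javascript
const target = new URL("http://example.org/search");

// Mutating the query object rewrites the URL's query.
target.searchParams.append("q", "url spec");
const afterAppend = target.href;
// afterAppend is "http://example.org/search?q=url+spec"

target.searchParams.set("q", "whatwg");
const afterSet = target.search;
// afterSet is "?q=whatwg"
```

The space is serialized as "+" because the list is serialized as application/x-www-form-urlencoded.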
The URLSearchParams(init) constructor must return a new URLSearchParams object using init if given.
The append(name, value) method must run these steps:
Append a new name-value pair whose name is name and value is value, to the list of name-value pairs.
Run the update steps.
The delete(name) method must run these steps:
Remove all name-value pairs whose name is name.
Run the update steps.
The get(name) method must return the value of the first name-value pair whose name is name, and null if there is no such pair.
The getAll(name) method must return the values of all name-value pairs whose name is name, in list order, and the empty sequence otherwise.
The set(name, value) method must run these steps:
If there are any name-value pairs whose name is name, set the value of the first such name-value pair to value and remove the others.
Otherwise, append a new name-value pair whose name is name and value is value, to the list of name-value pairs.
Run the update steps.
The has(name) method must return true if there is a name-value pair whose name is name, and false otherwise.
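The methods above can be exercised together on a list with a duplicate name. This is a sketch assuming a runtime with a global `URLSearchParams`; only members listed in the interface above are used.

```javascript
const params = new URLSearchParams("a=1&b=2&a=3");

const firstA = params.get("a");    // "1", the first pair named "a"
const allA = params.getAll("a");   // ["1", "3"], in list order
const missing = params.get("zz");  // null, no such pair

// set() overwrites the first "a" and removes the other one.
params.set("a", "9");
const afterSet = params.getAll("a"); // ["9"]

params.append("c", "4");
params.delete("b");
const hasB = params.has("b"); // false
const hasC = params.has("c"); // true
```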
Thanks to Adam Barth, Alexandre Morgaut, Bobby Holley, Boris Zbarsky, David Sheets, Erik Arvidsson, Gavin Carothers, Geoff Richards, Glenn Maynard, Henri Sivonen, Ian Hickson, James Graham, James Manger, James Ross, Marcos Cáceres, Martin Dürst, Mathias Bynens, Michael Peick, Michael™ Smith, Peter Occil, Rodney Rehm, Simon Pieters, Tab Atkins, Tantek Çelik, and 成瀬ゆい (Yui Naruse) for being awesome!
While this standard has been written from scratch, special thanks should be extended to the editors of the various specifications that previously defined what we now call URLs: Larry Masinter, Martin Dürst, Michel Suignard, Roy Fielding, and Tim Berners-Lee.