Ticket #31 (new defect)

Opened 2 years ago

Last modified 1 month ago

(To do) Clean up appendix on octal numbers

Reported by: lth
Assigned to: anonymous
Type: defect
Priority: minor
Milestone:
Component: Spec
Version: 3.1
Keywords:
Cc: brendan, jeffdyer, chrispi, david-sarah@jacaranda.org

Description (last modified by lth)

(Was: RegExp: where should octal values be allowed?)

Need to clean up appendix B.1 so that it corresponds more to what's actually expected. Also: issues re: different treatment of leading 0 in ToNumber and parseInt.

--- old description -------------

We agreed to allow octal literals. Thus regexes must allow them too.

Current (revised) proposed behavior:

  • An AtomEscape \0[0-7]{1,3} is legal provided the computed value is less than 256; it means a singleton character matcher matching that character value
  • A CharEscape \0[0-7]{1,3} is legal provided the computed value is less than 256; it means a singleton character which either stands alone or participates in a range in the set; if the character value happens to be a special character (eg ^ or -) this is of no consequence
  • Octal numbers of arbitrary length can be used for MIN and MAX in a quantifier, eg [0-7]{01,012} means 1-10 repetitions

The "less than 256" restriction is really not necessary. Discuss.
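
A minimal JavaScript sketch of the proposed rule above, for discussion only. The helper names `parseProposedOctalEscape` and `parseQuantifierBound` are hypothetical and belong to no engine or spec. Returning `null` for a rejected escape, and reading a bound without a leading 0 as decimal, are assumptions of this sketch rather than anything the proposal states.

```javascript
// Hypothetical helper: recognise the proposed escape \0[0-7]{1,3} at `pos`.
// Returns { value, length }, or null when the escape is not legal under the proposal.
function parseProposedOctalEscape(source, pos) {
  const match = /^\\0([0-7]{1,3})/.exec(source.slice(pos));
  if (match === null) return null;          // not of the form \0[0-7]{1,3}
  const value = parseInt(match[1], 8);      // digits after the leading 0, read in base 8
  if (value >= 256) return null;            // the "less than 256" restriction
  return { value, length: match[0].length };
}

// Hypothetical helper: a quantifier bound written with a leading 0 is read as octal.
function parseQuantifierBound(text) {
  if (/^0[0-7]+$/.test(text)) return parseInt(text, 8);
  if (/^[0-9]+$/.test(text)) return parseInt(text, 10); // assumption: otherwise decimal
  return null;
}
```

Under this sketch, `\0101` yields the value 65 ("A"), and the bounds in `{01,012}` come out as 1 and 10. Dropping the `value >= 256` check is all it would take to lift the restriction under discussion.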


Change History

Changed 2 years ago by lth

  • description changed from:

      We agreed to allow octal literals. Thus regexes must allow them too.

      Summary of proposed behavior (for the moment):
        * AtomEscape `\0` is illegal
        * AtomEscape `\0[0-7]{1,3}` is legal provided the value < 256
        * CharEscape `\0` means NUL
        * CharEscape `\0[0-7]{1,3}` is legal provided the value < 256
        * Octal numbers can be used for MIN and MAX in a quantifier, eg `[0-7]{01,012}` means 1-10 repetitions

      Disallowing `\0` as AtomEscape is good and/or required because it looks like a backref (it's illegal in 3rd Ed).

    to:

      We agreed to allow octal literals. Thus regexes must allow them too.

      Summary of proposed behavior (for the moment):
        * AtomEscape `\0` is illegal
        * AtomEscape `\0[0-7]{1,3}` is legal provided the value < 256
        * CharEscape `\0` means NUL
        * CharEscape `\0[0-7]{1,3}` is legal provided the value < 256
        * Octal numbers can be used for MIN and MAX in a quantifier, eg `[0-7]{01,012}` means 1-10 repetitions

      Disallowing `\0` as AtomEscape is good and/or required because it looks like a backref (it's illegal in 3rd Ed); there is no limitation on the values in this case.

Changed 2 years ago by lth

  • milestone set to M1

Changed 2 years ago by lth

  • cc set to brendan

Firefox:

  • AtomEscape \0 means singleton char NUL
  • AtomEscape \0nnn means singleton char nnn_8, never backref
  • If a digit sequence starting with 0 contains "8" or "9" then the sequence is taken to be decimal after all
  • Only two octal digits are allowed following \0 in a string... must be a bug?

Changed 2 years ago by lth

More Firefox (2.0.0.3):

  • In running text, 0100 does mean 64 base 10, and \0101 means "A" (65) in regexes, so the string restriction looks like a peculiar restriction
  • Octal escapes inside charsets are subject to the same length restriction as in strings

MSIE 6.0:

  • 0101 in running text means 65
  • In regexes, \0nn is a singleton char pattern
  • \0 is a singleton char pattern
  • In strings, at most three octal digits including the leading 0 are allowed (same as FF)
  • The same restriction applies to octals in regexes, unlike in Firefox...
  • And applies consistently to octals in charsets in regexes

Changed 2 years ago by lth

ActionScript 3.0:

  • does not allow octal literals in running text
  • does not allow octal escapes in strings
  • **does** allow 3-digit octal escapes in regexes! (also in charsets)
  • ... but not 4-digit octal ditto

Changed 2 years ago by lth

  • description changed from:

      We agreed to allow octal literals. Thus regexes must allow them too.

      Summary of proposed behavior (for the moment):
        * AtomEscape `\0` is illegal
        * AtomEscape `\0[0-7]{1,3}` is legal provided the value < 256
        * CharEscape `\0` means NUL
        * CharEscape `\0[0-7]{1,3}` is legal provided the value < 256
        * Octal numbers can be used for MIN and MAX in a quantifier, eg `[0-7]{01,012}` means 1-10 repetitions

      Disallowing `\0` as AtomEscape is good and/or required because it looks like a backref (it's illegal in 3rd Ed); there is no limitation on the values in this case.

    to:

      We agreed to allow octal literals. Thus regexes must allow them too.

      Current (revised) proposed behavior:
        * An AtomEscape `\0[0-7]{1,3}` is legal provided the computed value is less than 256; it means a singleton character matcher matching that character value
        * A CharEscape `\0[0-7]{1,3}` is legal provided the computed value is less than 256; it means a singleton character which either stands alone or participates in a range in the set; if the character value happens to be special character (eg `^` or `-`) this is of no consequence
        * Octal numbers of arbitrary length can be used for MIN and MAX in a quantifier, eg `[0-7]{01,012}` means 1-10 repetitions

      The "less than 256" restriction is really not necessary. Discuss.

Changed 2 years ago by lth

  • cc changed from brendan to brendan, jeffdyer, chrispi

BTW, I feel disinclined to formalize the quasi-error-correcting behavior whereby "\097" turns into "97" (either in regexes, strings, or running text), though I see how it can result from a particular formalization of the grammar eg

"\0" [0-7]+ (?![0-9])

so that a normal interpretation of the lexical grammar (ie, top-down matching among rules with backtracking) will do exactly that, provided DecimalEscape does not have a restriction on no-leading-zero.
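
To make the backtracking argument concrete, here is a small JavaScript sketch that tries the octal rule first and falls back to a decimal rule. The function name `classifyNumericEscape` is hypothetical, and the decimal rule is assumed to accept a leading zero, which is exactly the condition stated above.

```javascript
// Hypothetical helper modelling two lexical rules tried in order, with backtracking:
//   1. octal:   "\0" [0-7]+ (?![0-9])
//   2. decimal: "\" [0-9]+            (assumed to permit a leading zero)
function classifyNumericEscape(text) {
  const octal = /^\\0([0-7]+)(?![0-9])/.exec(text);
  if (octal !== null) {
    return { kind: "octal", value: parseInt(octal[1], 8) };
  }
  const decimal = /^\\([0-9]+)/.exec(text);
  if (decimal !== null) {
    return { kind: "decimal", value: parseInt(decimal[1], 10) };
  }
  return null; // not a numeric escape at all
}
```

With this sketch, `\0101` is classified as octal (65). `\097` fails the octal rule because no octal digit follows `\0`, so it falls through to the decimal rule and yields 97. `\0178` also falls through: every way of matching `[0-7]+` leaves a decimal digit immediately after it.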

Opinions, please.

Changed 2 years ago by brendan

I have a vague memory that \097 being interpreted as a string containing three chars: {NUL, '9', '7'} was important for web compatibility. Your MSIE summary didn't talk about decimal (or I misread it), but my testing of IE7 just now shows it turning "\097" into the same string as Firefox.
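
For reference, a minimal JavaScript sketch of the string behavior described here, combined with the "at most three octal digits including the leading 0" rule noted earlier in this ticket. The helper name `decodeLeadingZeroEscapes` is hypothetical. It operates on raw literal text and models only the `\0`-prefixed escapes discussed in this thread, not any engine's full lexer.

```javascript
// Hypothetical helper: decode \0-prefixed octal escapes in raw string-literal text,
// consuming the leading 0 plus at most two further octal digits (three in total).
function decodeLeadingZeroEscapes(raw) {
  return raw.replace(/\\(0[0-7]{0,2})/g, (whole, digits) =>
    String.fromCharCode(parseInt(digits, 8)));
}
```

Applied to the raw text `\097`, the sketch consumes only `\0` (because 9 is not an octal digit) and leaves `97` untouched, giving the three characters NUL, '9', '7'. Applied to `\0101`, it consumes `\010` and leaves a trailing `1`.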

I take it Opera didn't do this and got away with it, but that might just be an unreported bug for Chris to fix ;-).

The peculiar "at most 3 octal digits including leading 0" string lexing rule is probably an old bug to fix for Mozilla and Microsoft.

My \02 cents.

/be

Changed 2 years ago by lth

Nah, Opera does it too, I had just forgotten. Too long ago :-)

Another observation about octal: FF, IE, and Opera all evaluate 097 + 1 => 98, so this is probably the strange fallback I remembered.
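
A minimal JavaScript sketch of that fallback for numeric literals in running text. The helper name `legacyLeadingZeroValue` is hypothetical. It takes the literal's source text as a string, so it does not depend on how the host engine itself treats leading zeros.

```javascript
// Hypothetical helper: value of a numeric literal written with a leading 0.
function legacyLeadingZeroValue(text) {
  if (/^0[0-7]+$/.test(text)) return parseInt(text, 8);   // all octal digits: base 8
  if (/^0[0-9]+$/.test(text)) return parseInt(text, 10);  // contains 8 or 9: decimal fallback
  return null; // not a leading-zero digit sequence; outside the scope of this sketch
}
```

Under this sketch `0100` evaluates to 64 and `0101` to 65, while `097` falls back to decimal 97, so `097 + 1` gives 98, as observed in the browsers above.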

My goal tomorrow is to write up a coherent and backwards compatible Octal proposal in the TG1 wiki. Wish me luck.

Changed 2 years ago by jeffdyer

  • owner deleted
  • priority changed from minor to major
  • component changed from Proposals to Spec
  • milestone changed from M1 to M4

resolution: we will not include octal in the normative language, but want to keep it in an informative annex as in ES3. Morphing to a spec bug.

Changed 2 years ago by lth

  • description changed from:

      We agreed to allow octal literals. Thus regexes must allow them too.

      Current (revised) proposed behavior:
        * An AtomEscape `\0[0-7]{1,3}` is legal provided the computed value is less than 256; it means a singleton character matcher matching that character value
        * A CharEscape `\0[0-7]{1,3}` is legal provided the computed value is less than 256; it means a singleton character which either stands alone or participates in a range in the set; if the character value happens to be special character (eg `^` or `-`) this is of no consequence
        * Octal numbers of arbitrary length can be used for MIN and MAX in a quantifier, eg `[0-7]{01,012}` means 1-10 repetitions

      The "less than 256" restriction is really not necessary. Discuss.

    to:

      (Was: RegExp: where should octal values be allowed?)

      Need to clean up appendix B.1 so that it corresponds more to what's actually expected. Also: issues re: different treatment of leading 0 in `ToNumber` and `parseInt`.

      --- old description -------------

      We agreed to allow octal literals. Thus regexes must allow them too.

      Current (revised) proposed behavior:
        * An AtomEscape `\0[0-7]{1,3}` is legal provided the computed value is less than 256; it means a singleton character matcher matching that character value
        * A CharEscape `\0[0-7]{1,3}` is legal provided the computed value is less than 256; it means a singleton character which either stands alone or participates in a range in the set; if the character value happens to be special character (eg `^` or `-`) this is of no consequence
        * Octal numbers of arbitrary length can be used for MIN and MAX in a quantifier, eg `[0-7]{01,012}` means 1-10 repetitions

      The "less than 256" restriction is really not necessary. Discuss.
  • summary changed from RegExp: where should octal values be allowed? to Clean up appendix on octal numbers

Changed 1 year ago by lth

  • priority changed from major to trivial
  • summary changed from Clean up appendix on octal numbers to (To do) Clean up appendix on octal numbers

Changed 1 month ago by David-Sarah Hopwood

  • cc changed from brendan, jeffdyer, chrispi to brendan, jeffdyer, chrispi, david-sarah@jacaranda.org
  • priority changed from trivial to minor
  • version changed from 4 to 3.1
  • type changed from enhancement to defect
  • milestone deleted

"issues re: different treatment of leading 0 in ToNumber and parseInt." -- is this relevant to ES3.1?
