HTML 5

A central location for HTML5 news and updates

Entries Tagged as 'Weekly Review'

This Week in HTML 5 - Episode 17

December 29th, 2008

Welcome back to “This Week in HTML 5,” where I’ll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is a major revamp of table headers, following up from the last major edits last March. Ian summarizes the most recent round of changes:

  • Header cells can now themselves have headers.
  • I have reversed the way the algorithm is presented, such that it starts from a cell and reports the headers rather than generating the list of headers for each cell on a header-by-header basis.
  • If headers="" points to a <td> element, the association is set up, but I have left this non-conforming to help authors catch mistakes.
  • Header cells that are automatically associating do not stop associating when they hit equivalent cells unless they have also hit a <td> first.
  • The "col" and "row" scope values now act like the implied auto value except that they force the direction.
  • Empty header cells don’t get automatically associated.
  • I have removed the wide header cell heuristic.
  • I have made headers="" use the same ID discovery mechanism as getElementById(), to avoid implementations having to support multiple such mechanisms.
  • Finally, I have made the spec define if a header is a column header or a row header in the case where scope="" is omitted.
  • I haven’t added summary="" on table; nothing particularly new has been raised on the topic since the last times I looked at this.

Accessibility advocates are disappointed by the continued non-inclusion of the summary attribute. Their reasoning is that “the summary attribute is a very, very practical and useful attribute,” despite their own user testing that shows otherwise. As Ian put it, “I am hesitant to include a feature like summary="" when all evidence seems to point to it being widely misused by authors and ignored by the users it intends to help.” As with all issues, this is not the final word on the matter, but it’s where we stand today.

In other news, r2566 addresses a very subtle issue with fetching images. The problem stems from the following (arguably pointless) markup: <img src=""> A fair number of web pages actually try to declare an image with an empty src attribute. According to the HTTP and URL specifications, this markup means that there is an image at the same address as the HTML document — a theoretically possible but highly unlikely scenario. Internet Explorer apparently catches this mistake and just silently drops the image. Other browsers do not; they will actually try to fetch the image, which results in a “duplicate” request for the page (once to successfully retrieve the page, and again to unsuccessfully retrieve the image).

Boris Zbarsky, a leading Mozilla developer, states:

We (Gecko) have had 28 independent bug reports filed (with people bothering to create an account in the bug database, etc) about the behavior difference from IE here. That’s a much larger number of bug reports than we usually get about a given issue. I can’t tell you why this pattern is so common (e.g. whether some authoring frameworks produce it in some cases), but it seems that a number of web developers not only produce markup like this but notice the requests in their HTTP logs and file bugs about it.

r2566 addresses the issue by special-casing <img src> to allow browsers to ignore an image if its fetch request would result in fetching exactly the same URL as its HTML document:

When an img is created with a src attribute, and whenever the src attribute is set subsequently, the user agent must fetch the resource specified by the src attribute’s value, unless the user agent cannot support images, or its support for images has been disabled, or the user agent only fetches elements on demand, or the element’s src attribute has a value that is an ignored self-reference.

The src attribute’s value is an ignored self-reference if its value is the empty string, and the base URI of the element is the same as the document’s address.
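The rule quoted above boils down to a two-part test. Here is a minimal sketch of it, assuming all three inputs are plain strings; the function name isIgnoredSelfReference and the example URLs are hypothetical and do not come from the spec.

```javascript
// Hypothetical helper (not a spec API): decides whether an <img src> value
// is an "ignored self-reference" in the sense of the text quoted above.
function isIgnoredSelfReference(srcValue, elementBaseURI, documentAddress) {
  // Only the empty string qualifies; any other value is fetched as usual.
  if (srcValue !== "") return false;
  // The element's base URI must equal the document's address, i.e. nothing
  // (such as a <base> element) has pointed relative URLs somewhere else.
  return elementBaseURI === documentAddress;
}

const page = "http://example.com/index.html";
console.log(isIgnoredSelfReference("", page, page));                        // true
console.log(isIgnoredSelfReference("", "http://example.com/images/", page)); // false
console.log(isIgnoredSelfReference("logo.png", page, page));                 // false
```

Note that an empty src combined with a different base URI is still fetched, since in that case the request no longer duplicates the request for the page itself.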

Tune in next week for another exciting episode of “This Week in HTML 5.”

Tags: Weekly Review

This Week in HTML 5 - Episode 16

December 18th, 2008

Welcome back to “This Week in HTML 5,” where I’ll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is r2529, which makes so many changes that I had to ask Ian to explain it to me. This is what he said:

Someone asked for onbeforeunload, so I started fixing it. Then I found that there was some rot in the drywall. So I took down the drywall. Then I found a rat infestation. So I killed all the rats. Then I found that the reason for the rot was a slow leak in the plumbing. So I tried fixing the plumbing, but it turned out the whole building used lead pipes. So I had to redo all the plumbing. But then I found that the town’s water system wasn’t quite compatible with modern plumbing techniques, and I had to dig up the entire town. And that’s basically it.

“Amusing, in a quiet way,” said Eeyore, “but not really helpful.”

Basically, the way that scripts are defined has changed dramatically. Not in a terribly incompatible way, just a clearer definition that paves the way for better specification of certain properties of script (and noscript). Let’s start with the new definition of a script:

A script has:

A script execution environment

The characteristics of the script execution environment depend
on the language, and are not defined by this specification.

In JavaScript, the script execution environment
consists of the interpreter, the stack of execution
contexts, the global code and function code and
the Function objects resulting, and so forth.

A list of code entry-points

Each code entry-point represents a block of executable code
that the script exposes to other scripts and to the user
agent.

Each Function object in a JavaScript
script execution environment has a corresponding code
entry-point, for instance.

The main program code of the script, if any, is the
initial code entry-point. Typically, the code
corresponding to this entry-point is executed immediately after
the script is parsed.

In JavaScript, this corresponds to the
execution context of the global code.

A relationship with the script’s global object

An object that provides the APIs that the code can use.

This is typically a Window
object. In JavaScript, this corresponds to the global
object.

When a script’s global object is an
empty object, it can’t do anything that interacts with the
environment.

A relationship with the script’s browsing context

A browsing context that is assigned responsibility
for actions taken by the script.

When a script creates and navigates a new top-level browsing
context, the opener attribute of the new browsing context’s
Window object will be set to the script’s browsing context’s
Window object.

A character encoding

A character encoding, set when the script is created, used to
encode URLs. If the character encoding is
set from another source, e.g. a document’s character
encoding, then the script’s character encoding
must follow the source, so that if the source’s changes, so does
the script’s.

A base URL

A URL, set when the script is created, used to
resolve relative URLs. If the base URL is
set from another source, e.g. a document base URL,
then the script’s base URL must follow the source, so
that if the source’s changes, so does the script’s.

Membership in a script group

A group of one or more scripts that are loaded in the same
context, which are always disabled as a group. Scripts in a script
group all have the same global object and browsing context.

A script group can be frozen. When a script group is
frozen, any code defined in that script group will throw an
exception when invoked. A frozen script group can be
unfrozen, allowing scripts in that script group to run
normally again.

The most interesting part of this new definition is the script group, a new concept which now governs all scripts. When a Document is created, it gets a fresh script group, which contains all the scripts that are defined (or are later created somehow) in the document. When the user navigates away from the document, the entire script group is frozen, and browsers should not execute those scripts anymore. This sounds like an obvious statement if you think of documents as individual browser windows (or tabs), but consider the case of a document with multiple frames, or one with an embedded iframe. Suppose that the user clicks some link within the iframe that only navigates to a new URL within the iframe (i.e. the parent document stays the same). The parent document may have some reference to functions defined in the old iframe. Should it still be able to call these functions? IE says no; other browsers say yes. HTML 5 now says no, because when the iframe navigates to a new URL, the old iframe’s script group is frozen. Even if there are active references to those scripts (say, from the parent document), browsers shouldn’t allow the page to execute them.

The main benefit of this new concept of script groups is that it removes a number of complications faced by the non-IE browsers. For example, it prevents the problem of scripts suddenly discovering that their global object is no longer the object that they think of as the Window object. Script groups are also frozen when calling document.open(). Freezing script groups also defines the point at which timers and other callbacks are reset, which is something that previous versions of HTML had never defined.
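To make the freezing behavior concrete, here is a toy model of it. This is only a sketch under stated assumptions: the ScriptGroup class, its wrap() method, and the error message are all hypothetical illustrations, not anything the spec or a browser exposes.

```javascript
// Hypothetical model of a script group (not a spec or browser API).
class ScriptGroup {
  constructor() {
    this.frozen = false;
  }
  freeze() {
    this.frozen = true;
  }
  unfreeze() {
    this.frozen = false;
  }
  // Wraps a function so that invoking it while the group is frozen throws,
  // mirroring "any code defined in that script group will throw an
  // exception when invoked".
  wrap(fn) {
    return (...args) => {
      if (this.frozen) {
        throw new Error("script group is frozen");
      }
      return fn(...args);
    };
  }
}

// The parent document holds a reference to a function defined in an iframe.
const iframeGroup = new ScriptGroup();
const greet = iframeGroup.wrap((name) => "hello, " + name);

console.log(greet("parent")); // runs while the iframe's document is active

iframeGroup.freeze(); // the iframe navigates to a new URL
try {
  greet("parent");
} catch (e) {
  console.log(e.message); // the stale reference can no longer be invoked
}
```

The reference held by the parent stays valid as an object; only invoking it fails, and it works again if the group is later unfrozen.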

And after all of this ripping up and redefining, HTML 5 now defines the onbeforeunload event, which is already supported by major browsers.

Tune in next week for another exciting episode of “This Week in HTML 5.”

Tags: WHATWG · Weekly Review

This Week in HTML 5 - Episode 15

December 10th, 2008

Welcome back to “This Week in HTML 5,” where I’ll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is the disintegration of HTTP authentication from HTML forms (which was last week’s big news). As I predicted, the proposal generated a healthy discussion, but a combination of security concerns and concerns about tight coupling ultimately did in the proposal.

In its place, r2470 includes the following conformance requirement to allow for the possibility of someone specifying such a scheme in the future (hat tip: Robert Sayre):

HTTP 401 responses that do not include a challenge recognised by the user agent must be processed as if they had no challenge, e.g. rendering the entity body as if the response had been 200 OK.

User agents may show the entity body of an HTTP 401 response even when the response does include a recognised challenge, with the option to login being included in a non-modal fashion, to enable the information provided by the server to be used by the user before authenticating. Similarly, user agents should allow the user to authenticate (in a non-modal fashion) against authentication challenges included in other responses such as HTTP 200 OK responses, effectively allowing resources to present HTTP login forms without requiring their use.
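The two paragraphs quoted above can be read as a small decision table. Here is a rough sketch of that reading; the function handleResponse, its return shape, and the scheme lists are hypothetical, and a real user agent has far more to consider.

```javascript
// Hypothetical sketch, not a spec API. `challengeSchemes` lists the schemes
// named in the response's WWW-Authenticate header(s); `recognisedSchemes`
// lists (in lowercase) the schemes this imaginary user agent supports.
function handleResponse(status, challengeSchemes, recognisedSchemes) {
  const hasKnownChallenge = challengeSchemes.some((scheme) =>
    recognisedSchemes.includes(scheme.toLowerCase())
  );
  if (status === 401 && !hasKnownChallenge) {
    // No recognised challenge: process as if there were no challenge,
    // e.g. render the entity body as for a 200 OK.
    return { processAs: 200, offerLogin: false };
  }
  // Otherwise keep the status, and offer a non-modal login whenever a
  // recognised challenge is present (including on non-401 responses).
  return { processAs: status, offerLogin: hasKnownChallenge };
}

const supported = ["basic", "digest"];
console.log(handleResponse(401, ["HTML"], supported));  // unknown scheme
console.log(handleResponse(401, ["Basic"], supported)); // known scheme
console.log(handleResponse(200, ["Basic"], supported)); // challenge on a 200
```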

Continuing with the web forms work, the <input> element has gained a new type: a color picker, marked up as <input type=color>. Browser vendors are encouraged to integrate this field with platform-native color pickers, as appropriate. As with all new input types, browsers that do not explicitly recognize the new type will default to a simple text field.

The <audio> and <video> API continues to churn rapidly. Implementors should probably ignore it altogether until it’s been stable for two consecutive weeks. To wit: r2493 removes the pixelratio attribute, originally proposed to allow authors to override the display of videos known to be encoded with an incorrect aspect ratio. r2498 adds the playing event, fired when playback has started. r2489 drops the HAVE_SOME_DATA readyState. I will try to write up a comprehensive summary of this API once it stabilizes.

Tune in next week for another exciting episode of “This Week in HTML 5.”

Tags: WHATWG · Weekly Review

This Week in HTML 5 - Episode 14

November 25th, 2008

Welcome back to “This Week in HTML 5,” where I’ll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is a radical proposal for integrating HTTP authentication with HTML forms. r2432 defines a new token for the WWW-Authenticate header: “HTML”.

A common use for forms is user authentication. To indicate that
an HTTP URL requires authentication through such a form
before use, the HTTP 401 response code with a WWW-Authenticate challenge “HTML” may be used.

For this authentication scheme, the framework defined in RFC2617
is used as follows. [RFC2617]

challenge = "HTML" [ form ]
form      = "form" "=" form-name
form-name = quoted-string

The form parameter, if
present, indicates that the first form element in the
entity body whose name is the
specified string, in tree order, if any, is the login
form. If the parameter is omitted, then the first form
element in the entity body, in tree order, if any, is
the login form.

There is no credentials production for this
scheme because the login information is to be sent as a normal form
submission and not using the Authorization
HTTP header.
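To see how the proposed grammar and form lookup fit together, here is a small sketch. It assumes the header value has already been isolated as a string and that the forms in the entity body are available as a simple array in tree order; parseHtmlChallenge and findLoginForm are hypothetical names, and no browser implements this scheme.

```javascript
// Hypothetical parser for the proposed challenge syntax:
//   challenge = "HTML" [ form ]
//   form      = "form" "=" form-name     (form-name is a quoted-string)
// Returns null when the value is not an "HTML" challenge.
function parseHtmlChallenge(headerValue) {
  const pattern = /^\s*HTML(?:\s+form\s*=\s*"((?:[^"\\]|\\.)*)")?\s*$/i;
  const match = pattern.exec(headerValue);
  if (!match) return null;
  // Undo quoted-pair escapes (such as \" and \\) inside the quoted-string.
  const formName =
    match[1] === undefined ? null : match[1].replace(/\\(.)/g, "$1");
  return { formName };
}

// Picks the login form. `forms` is a simplified stand-in for the form
// elements of the entity body, in tree order, each with a `name`.
function findLoginForm(forms, challenge) {
  if (challenge.formName === null) {
    return forms[0] || null; // parameter omitted: first form, if any
  }
  return forms.find((form) => form.name === challenge.formName) || null;
}

const forms = [{ name: "search" }, { name: "login" }];
console.log(findLoginForm(forms, parseHtmlChallenge('HTML form="login"')).name); // login
console.log(findLoginForm(forms, parseHtmlChallenge("HTML")).name);              // search
console.log(parseHtmlChallenge('Basic realm="example"'));                        // null
```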

This idea has been kicked around for more than a decade. Microsoft wrote User Agent Authentication Forms in 1999. Mark Nottingham asked the WHATWG to investigate the idea in 2004. Better late than never, Ian Hickson summarizes the feedback to date. No doubt this new proposal will generate further discussion. No browsers currently support this proposal.

Other interesting tidbits this week:

  • r2429 adds the <input type=search> form field. [<input type=search> discussion]
  • r2440 allows the multiple attribute to appear on <input type=email> and <input type=file>. [<input type=email multiple> discussion]
  • r2423 specifies how <object> elements are submitted in forms. Unbeknownst to me, this feature was present in HTML 4 and is supported across multiple browsers. If a plugin exposes a value getter, the name of the <object> element is submitted with the value exposed by the plugin. [<object> form submission example, Mozilla bug 188938]
  • r2434 seriously revamps the concept of “vaguer moments in time.” r2433 notes, correctly, that there is no year zero in the Gregorian calendar. r2437 further refines the calculation of dates before 1582. [date and time discussion]
  • r2426 clarifies the fallback behavior of the <object> element.
  • r2427 documents existing browser behavior in sending all attributes and attribute values to a plugin invoked from an <object> element. Previously, HTML 5 had specified that only specific parameters were sent, but browsers consistently send all attributes, so there it is.
  • r2424 explains the intended audience of the HTML 5 specification itself.

Tune in next week for another exciting episode of “This Week in HTML 5.”

Tags: Weekly Review

This Week in HTML 5 - Episode 13

November 18th, 2008

Welcome back to “This Week in HTML 5,” where I’ll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is a major revamping of how browsers should process multimedia in the <audio> and <video> elements.

r2404 makes a number of important changes. First, the canPlayType() method has moved from the navigator object to HTMLMediaElement (i.e. a specific <audio> or <video> element), and it now returns a string rather than an integer. [canPlayType() discussion]

The canPlayType(type) method must return the string “no” if type is a type that the user agent knows it cannot render; it must return “probably” if the user agent is confident that the type represents a media resource that it can render if used with this audio or video element; and it must return “maybe” otherwise. Implementors are encouraged to return “maybe” unless the type can be confidently established as being supported or not. Generally, a user agent should never return “probably” if the type doesn’t have a codecs parameter.

Wait, what codecs parameter? That’s the second major change: the <source type> attribute (which previously could only contain a MIME type like “video/mp4”, which is insufficient to determine playability) can now contain a MIME type and a codecs parameter. As specified in RFC 4281, the codecs parameter specifies the specific codecs used by the individual streams within the audio/video container. The section on the type attribute contains several examples of using the codecs parameter.
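The three return values are easier to follow with a toy implementation. The sketch below is only an illustration under stated assumptions: the support tables describe an imaginary user agent, parseType is a hypothetical helper that ignores edge cases such as semicolons inside quoted strings, and this standalone canPlayType is not the real method on HTMLMediaElement.

```javascript
// Hypothetical support tables for an imaginary user agent.
const KNOWN_CONTAINERS = new Set(["video/ogg", "audio/ogg"]);
const KNOWN_CODECS = new Set(["theora", "vorbis"]);

// Splits a type string such as 'video/ogg; codecs="theora, vorbis"' into
// its MIME type and its list of codecs (an empty list if there is none).
function parseType(type) {
  const [mime, ...params] = type.split(";").map((part) => part.trim());
  let codecs = [];
  for (const param of params) {
    const m = /^codecs\s*=\s*"?([^"]*)"?$/i.exec(param);
    if (m) {
      codecs = m[1].split(",").map((c) => c.trim()).filter(Boolean);
    }
  }
  return { mime: mime.toLowerCase(), codecs };
}

function canPlayType(type) {
  const { mime, codecs } = parseType(type);
  // This imaginary UA knows it cannot render anything outside its tables.
  if (!KNOWN_CONTAINERS.has(mime)) return "no";
  // Without a codecs parameter there is not enough to be confident.
  if (codecs.length === 0) return "maybe";
  return codecs.every((c) => KNOWN_CODECS.has(c)) ? "probably" : "no";
}

console.log(canPlayType('video/ogg; codecs="theora, vorbis"')); // probably
console.log(canPlayType("video/ogg"));                          // maybe
console.log(canPlayType('video/mp4; codecs="avc1.42E01E"'));    // no
```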

Third, the <source type> attribute is now optional. If you aren’t sure what kind of video you’re serving, you can just throw one or more <source> elements into a <video> element and the browser will try each of them in the order specified [r2403] until it finds something it can play. [load() algorithm discussion] Of course, if you include a type attribute (and codecs parameter within it), the browser may use it to determine playability without loading multiple resources, but this is no longer required.
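As a rough illustration of trying sources in order, here is a simplified sketch. It models only the type check: a real browser would also move on to the next <source> when loading or decoding fails, which this sketch ignores. The names pickSource and canPlay, and the file names, are hypothetical.

```javascript
// Hypothetical sketch of choosing among <source> candidates in order.
// `canPlay` is any function that returns "no", "maybe", or "probably".
function pickSource(sources, canPlay) {
  for (const source of sources) {
    // With no type attribute, the only way to find out is to try loading
    // the resource, so it stays a candidate.
    if (source.type === undefined || canPlay(source.type) !== "no") {
      return source.src;
    }
  }
  return null; // nothing playable was found
}

// A stand-in playability check for an imaginary Ogg-only user agent.
const oggOnly = (type) => (type.startsWith("video/ogg") ? "maybe" : "no");

const candidates = [
  { src: "movie.mp4", type: "video/mp4" },
  { src: "movie.ogv", type: 'video/ogg; codecs="theora, vorbis"' },
  { src: "movie.bin" }, // no type attribute
];
console.log(pickSource(candidates, oggOnly)); // movie.ogv
```

Including a type lets the browser skip movie.mp4 without requesting it, which is the benefit described above.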

The final change (this week) to multimedia elements is the elimination of the start, end, loopstart, loopend, and playcount attributes. They are all replaced by a single attribute, loop, which takes a boolean. To handle initially seeking to a specific timecode (like the now-eliminated start attribute), the HTML 5 spec vaguely declares, “For example, a fragment identifier could be used to indicate a start position.” This obviously needs further specification.

One multimedia-related issue that did not change in the spec this week is same-origin checking for media elements. Robert O’Callahan asked whether video should be allowed to load from another domain, noting (correctly) that it could lead to information leakage about files posted on private intranets. Chris Double outlines the issues and some proposed solutions. However, contrary to Chris’ expectation, HTML 5 will not (yet) mandate cross-site restrictions for audio/video files. This is an ongoing discussion. [WHATWG discussion thread, Theora discussion thread]

In other news, Ian Hickson summarized the discussion around the <input placeholder> attribute (which I first mentioned in This Week in HTML 5 Episode 8) and committed r2409 that defines the new attribute:

The placeholder attribute represents a short hint (a word or short phrase) intended to aid the user with data entry. A hint could be a sample value or a brief description of the expected format.

For a longer hint or other advisory text, the title attribute is more appropriate.

The placeholder attribute should not be used as an alternative to a label.

User agents should present this hint to the user only when the element’s value is the empty string and the control is not focused (e.g. by displaying it inside a blank unfocused control).

Read the section on the placeholder attribute for an example of its proper use.
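The presentation rule in the quoted text is a simple condition. Here is a tiny sketch of it, assuming a control is modeled as a plain object with hypothetical value and focused properties:

```javascript
// Hypothetical model: `control` is a plain object, not a DOM element.
// The hint is shown only when the value is empty and the control is
// not focused.
function shouldShowPlaceholder(control) {
  return control.value === "" && !control.focused;
}

console.log(shouldShowPlaceholder({ value: "", focused: false }));      // true
console.log(shouldShowPlaceholder({ value: "", focused: true }));       // false
console.log(shouldShowPlaceholder({ value: "Initech", focused: false })); // false
```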

Tune in next week for another exciting episode of “This Week in HTML 5.”

Tags: Weekly Review

This Week in HTML 5 - Episode 12

November 10th, 2008

Welcome back to “This Week in HTML 5,” where I’ll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. The primary editor was traveling this week, so there are very few spec changes to discuss. Instead, I’d like to try something a little different.

It has been suggested (1, 2, 3, &c.) that HTML 5 is trying to bite off more than it can metaphorically chew. It is true that it is a large specification, and it might benefit from being split into several pieces. But it is not true that it includes everything but the kitchen sink.

For example, HTML 5 will not

Daniel Schattenkirchner asked whether Almost-Standards mode is still needed. Almost-Standards mode is a form of DOCTYPE sniffing invented by Mozilla in 2002 to address line heights in table cells containing images. Bug 153032 implemented the new mode, which Mozilla called “Almost Standards mode” and HTML 5 calls “limited quirks mode.” Henri Sivonen made the point that it would probably be too expensive to get rid of the mode. Like it or not, we’re probably stuck with it.

And finally, a gem that I missed when it was first discussed: back in July, “Lars” provided the best documentation of the <keygen> element, ever.

Tune in next week for another exciting episode of “This Week in HTML 5.”

Tags: Weekly Review

The Road to HTML 5 - Episode 1: the section element

November 5th, 2008

Welcome to a new semi-regular column, “The Road to HTML 5,” where I’ll try to explain some of the new elements, attributes, and other features in the upcoming HTML 5 specification.

The element of the day is the <section> element.

The section element represents a generic document or application section. A section, in this context, is a thematic grouping of content, typically with a header, possibly with a footer. Examples of sections would be chapters, the various tabbed pages in a tabbed dialog box, or the numbered sections of a thesis. A Web site’s home page could be split into sections for an introduction, news items, contact information.

Discussion of sections and headers dates back several years. In November 2004, Ian Hickson wrote:

Basically I want three things:

  1. It has to be possible to take existing markup (which correctly uses <h1>-<h6>) and wrap the sections up with <section> (and the other new section elements) and have it be correct markup. Basically, allowing authors to replace <div class="section"> with <section>, <div class="post"> with <article>, etc.
  2. It has to be possible to write new documents that use the section elements and have the headers be automatically styled to the right depth (and maybe automatically numbered, with appropriate CSS), and yet still be readable in legacy UAs, without having to think about old UAs. Basically, the header element has to be header-like in old browsers.
  3. It shouldn’t be too easy to end up with meaningless markup when doing either of the above. So a random <h4> in the middle of an <h2> and an <h3> has to be defined as meaning _something_.

At the moment what I’m thinking of doing is this (most of these ideas are in the draft at the moment, but mostly in contradictory ways):

The section elements would be:

<body> <section> <article> <navigation> <sidebar>

The header elements would be:

<header> <h1> <h2> <h3> <h4> <h5> <h6>

<h1> gives the heading of the current section.

<header> wraps block-level content to mark the whole thing as a header, so that you can have, e.g., subtitles, or “Welcome to” paragraphs before a header, or “Presented by” kind of information. <header> is equivalent to an <h1>. The first highest-level header in the <header> is the “title” of the section for outlining purposes.

<h2> to <h6> are subsection headings when used in <body>, and equivalent to <h1> when used in one of the section elements.

<h1> automatically sizes to fit the current nesting depth. This could be a problem in CSS since CSS can’t handle this kind of thing well: it has no “or” operator at the simple selector level.

<h2>-<h6> keep their legacy renderings for compatibility.

Further discussion:

Fast-forward to modern times. Using the <section> element instead of, say, <div class="section">, seems like a no-brainer. Unfortunately, there’s a catch. (Hey, it’s the web; there’s always a catch.) Not all modern browsers recognize the <section> element, which means that they fall back to their default handling of unknown elements.

A long digression into browsers’ handling of unknown elements

Every browser has a master list of HTML elements that it supports. For example, Mozilla Firefox’s list is stored in nsElementTable.cpp. Elements not in this list are treated as “unknown elements.” There are two fundamental problems with unknown elements:

  1. How should the element be styled? By default, <p> has spacing on the top and bottom, <blockquote> is indented with a left margin, and <h1> is displayed in a larger font.
  2. What should the element’s DOM look like? Mozilla’s nsElementTable.cpp includes information about what kinds of other elements each element can contain. If you include markup like <p><p>, the second paragraph element implicitly closes the first one, so the elements end up as siblings, not parent-and-child. But if you write <p><span>, the span does not close the paragraph, because Firefox knows that <p> is a block element that can contain the inline element <span>. So the <span> ends up as a child of the <p> in the DOM.

Different browsers answer these questions in different ways. (Shocking, I know.) Of the major browsers, Microsoft Internet Explorer’s answer to both questions is the most problematic.

The first question should be relatively simple to answer: don’t give any special styling to unknown elements. Just let them inherit whatever CSS properties are in effect wherever they appear on the page, and let the page author specify all styling with CSS. Unfortunately, Internet Explorer does not allow styling on unknown elements. For example, if you had this markup:

<style type="text/css">
  section { border: 1px solid red }
</style>
...
<section>
<h1>Welcome to Initech</h1>
<p>This is our <span>home page</span>.</p>
</section>

Internet Explorer (up to and including IE8 beta 2) will not put a red border around the section.

The second problem is the DOM that browsers create when they encounter unknown elements. Again, the most problematic browser is Internet Explorer. If IE doesn’t explicitly recognize the element name, it will insert the element into the DOM as an empty node with no children. All the elements that you would expect to be direct children of the unknown element will actually be inserted as siblings instead. I’ve posted an ASCII graph that illustrates this mismatch.

Sjoerd Visscher discovered a workaround for this problem: after you create a dummy element with that name, IE will recognize the element enough to let you style it with CSS. You can put the script in the <head> of your page, and there is no need to ever insert it into the DOM. Simply creating the element once (per page) is enough to teach IE to style the element it doesn’t recognize. Sample code and markup:

<html>
<head>
<style type="text/css">
  section { display: block; border: 1px solid red }
</style>
<script type="text/javascript">
  document.createElement("section");
</script>
</script>
</head>
<body>
<section>
<h1>Welcome to Initech</h1>
<p>This is our <span>home page</span>.</p>
</section>
</body>
</html>

This hack works in IE 6, IE 7, and IE 8 beta 1, but it doesn’t work in IE 8 beta 2. (bug report, test case) The purpose of this illustration is not to blame IE; there’s no specification that says what the DOM ought to look like in this case, so IE’s handling of the “unknown element” problem is not any more or less correct than any other browser. With the createElement workaround, you can use the <section> element (or any other new HTML 5 element) in all browsers except IE 8 beta 2. I am not aware of any workaround for this problem.
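The same createElement trick extends to more than one element name. Here is a minimal sketch, assuming you want to cover several of the new elements at once; the helper name teachElements is hypothetical, and the document is passed in as a parameter so the function can also be exercised outside a browser with a stand-in object.

```javascript
// Hypothetical helper: creates one throwaway element per name so that the
// browser learns the element names. The elements are never inserted.
function teachElements(doc, names) {
  for (const name of names) {
    doc.createElement(name);
  }
}

// In a page, this would run from a <script> in the <head>:
//   teachElements(document, ["section", "article", "header"]);

// Outside a browser, a stand-in object that records the calls is enough
// to show what the helper does.
const fakeDocument = {
  created: [],
  createElement(name) {
    this.created.push(name);
    return { tagName: name };
  },
};
teachElements(fakeDocument, ["section", "article", "header"]);
console.log(fakeDocument.created.join(", ")); // section, article, header
```

As with the single-element version, each element still needs display: block in your stylesheet, since unknown elements get no default styling.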

And in conclusion

The <section> element is a very straightforward HTML 5 feature that you can’t actually use yet.

Tags: Tutorials · Weekly Review

This Week in HTML 5 - Episode 11

November 3rd, 2008

Welcome back to “This Week in HTML 5,” where I’ll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. Last Friday was Halloween for some of you; in the United States, it involves dressing up in slutty costumes, begging your neighbors for handouts, and getting diabetes. Yesterday, many of you set your clocks back one hour for Daylight Saving Time. And for those of you on the Gregorian calendar, it is now November.

Dates and times loom large in this week’s updates. “What is today’s date?” is a deceptively simple question, matched in complexity only by the related question, “What time is it?” Sources for Time Zone and Daylight Saving Time Data gives a good overview of the current state of the art for answering both questions. In the movie Crocodile Dundee, Mick says he once asked an Aboriginal elder when he was born; the elder replied, “in the summertime.”

r2381 defines global dates and times:

A global date and time consists of a specific Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, expressed with a time zone, consisting of a number of hours and minutes.

r2382 defines local dates and times:

A local date and time consists of a specific Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, but expressed without a time zone.

r2383 defines a month:

A month consists of a specific Gregorian date with no timezone information and no date information beyond a year and a month.

r2384 and r2385 define a week:

A week consists of a week-year number and a week number representing a seven day period. Each week-year in this calendaring system has either 52 weeks or 53 weeks, as defined below. A week is a seven-day period. The week starting on the Gregorian date Monday December 29th 1969 (1969-12-29) is defined as week number 1 in week-year 1970. Consecutive weeks are numbered sequentially. The week before the number 1 week in a week-year is the last week in the previous week-year, and vice versa.

A week-year with a number year that corresponds to a year year in the Gregorian calendar that has a Thursday as its first day (January 1st), and a week-year year where year is a number divisible by 400, or a number divisible by 4 but not by 100, has 53 weeks. All other week-years have 52 weeks.

The week number of the last day of a week-year with 53 weeks is 53; the week number of the last day of a week-year with 52 weeks is 52.

Note: The week-year number of a particular day can be different than the number of the year that contains that day in the Gregorian calendar. The first week in a week-year year is the week that contains the first Thursday of the Gregorian year year.
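The 52-or-53 rule can be checked with a few lines of code. The sketch below makes one assumption that goes beyond the quoted text: it treats a leap year as having 53 weeks only when it begins on a Wednesday, which is the ISO 8601 rule. The function names are hypothetical.

```javascript
// Day of the week of January 1st (0 = Sunday ... 6 = Saturday). UTC is used
// so the local time zone cannot shift the date. Intended for years >= 100,
// because Date.UTC treats years 0 to 99 as 1900 to 1999.
function januaryFirstWeekday(year) {
  return new Date(Date.UTC(year, 0, 1)).getUTCDay();
}

// Divisible by 400, or divisible by 4 but not by 100.
function isLeapYear(year) {
  return year % 400 === 0 || (year % 4 === 0 && year % 100 !== 0);
}

// Number of weeks in a week-year. Assumption (from ISO 8601, not from the
// quoted text): a leap year has 53 weeks only if it starts on a Wednesday.
function weeksInWeekYear(year) {
  const WEDNESDAY = 3;
  const THURSDAY = 4;
  const weekday = januaryFirstWeekday(year);
  if (weekday === THURSDAY) return 53;
  if (weekday === WEDNESDAY && isLeapYear(year)) return 53;
  return 52;
}

console.log(weeksInWeekYear(1970)); // 53 (January 1st, 1970 was a Thursday)
console.log(weeksInWeekYear(2008)); // 52 (leap year starting on a Tuesday)
console.log(weeksInWeekYear(2020)); // 53 (leap year starting on a Wednesday)
```

The 1970 result agrees with the quoted definition: week 1 of week-year 1970 starts on Monday, December 29th, 1969, and that week-year runs for 53 weeks.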

<input> form elements can be declared to take a local date and time, a global date and time, a date, a time, a month, or a week. You can also declare a global date and time in a <time> element or in the datetime attribute of <ins> and <del>.

HTML 5 does not define weekends or holidays, and therefore does not define business days. Interstellar datekeeping has been pushed back to HTML 6.

In other news, Chris Wilson suggested a different strategy for the much-maligned <q> element, which kicked off a long discussion, which in turn spawned several tangential discussions: <q> and commas, <q> vs <p>, UA style sheet for <q>, <q addmarks=true>, and the overly-optimistically-titled Final thoughts on <q>. The basic problem is that, while HTML 4 clearly states that user agents should render with delimiting quotation marks, Microsoft Internet Explorer (prior to IE8b2) did not do so. IE8b2 does do so, but it falls back to client-side regional settings to display quotation marks in pages where the author has not specified the language (which is the vast majority of pages). Also, in some languages, convention dictates alternating single and double quotes for nested quotations, but HTML 4 did not specify how to handle this, and different browsers handle nested quotation marks in different ways.

Tune in next week for another exciting episode of “This Week in HTML 5.”

Tags: Weekly Review

This Week in HTML 5 - Episode 10

October 20th, 2008 · No Comments

Welcome back to “This Week in HTML 5,” where I’ll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is offline caching. This has been in HTML 5 for a while, but this week Ian Hickson caught up with his email and integrated all outstanding feedback. He summarizes the changes:

  • Made the online whitelist be prefix-based instead of exact match. [r2337]
  • Removed opportunistic caching, leaving only the fallback behavior part. [r2338]
  • Made fallback URLs be prefix-based instead of only path-prefix based (we no longer ignore the query component). [r2343]
  • Made application caches scoped to their browsing context, and allowed iframes to start new scopes. By default the contents of an iframe are part of the appcache of the parent, but if you declare a manifest, you get your own cache. [r2344]
  • Made fallback pages have to be same-origin (security fix). [r2342]
  • Made the whole model treat redirects as errors to be more resilient in the face of captive portals when offline (it’s unclear what else would actually be useful and safe behavior anyway). [r2339]
  • Fixed a bunch of race conditions by redefining how application caches are created in the first place. [r2346]
  • Made 404 and 410 responses for application caches blow away the application cache. [r2348]
  • Made checking and downloading events fire on ApplicationCache objects that join an update process midway. [r2353]
  • Made the update algorithm check the manifest at the start and at the end and fail if the manifest changed in any way. [r2350]
  • Made errors on master and dynamic entries in the cache get handled in a non-fatal manner (and made 404 and 410 remove the entry). [r2348]
  • Changed the API from .length and .item() to .items and .hasItem(). [r2352]
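
The prefix-matching and same-origin changes are easier to see in code. This is only a rough sketch of the idea, not the spec's algorithm: the function names, the dictionary shape for fallback entries, and the "longest matching prefix wins" rule are all assumptions of mine, and the origin comparison ignores default ports.

```python
from urllib.parse import urlsplit


def origin(url):
    """Return a simplified (scheme, host, port) origin tuple for a URL."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def is_whitelisted(url, online_whitelist):
    """Prefix match: any whitelist entry that starts the URL counts."""
    return any(url.startswith(prefix) for prefix in online_whitelist)


def find_fallback(url, fallbacks, manifest_url):
    """Return the fallback page for a URL, or None.

    fallbacks maps a URL prefix to a fallback page. The whole URL,
    including the query component, takes part in the prefix match.
    Fallback pages on a different origin than the manifest are ignored.
    Assumption: when several prefixes match, the longest one wins.
    """
    candidates = [
        prefix
        for prefix, page in fallbacks.items()
        if url.startswith(prefix) and origin(page) == origin(manifest_url)
    ]
    if not candidates:
        return None
    return fallbacks[max(candidates, key=len)]


manifest = "http://example.com/app.manifest"
fallbacks = {
    "http://example.com/search?": "http://example.com/offline.html",
    "http://example.com/img/": "http://other.example.net/placeholder.html",
}

print(is_whitelisted("http://example.com/api/users?id=1",
                     ["http://example.com/api/"]))       # True
print(find_fallback("http://example.com/search?q=html5",
                    fallbacks, manifest))                 # http://example.com/offline.html
print(find_fallback("http://example.com/img/logo.png",
                    fallbacks, manifest))                 # None (cross-origin fallback)
```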

And now, a short digression into video formats…

You may think of video files as “AVI files” or “MP4 files”. In reality, “AVI” and “MP4” are just container formats. Just like a ZIP file can contain any sort of file within it, video container formats only define how to store things within them, not what kinds of data are stored. (It’s a little more complicated than that, because container formats do limit what codecs you can store in them, but never mind.) A video file usually contains multiple tracks: a video track (without audio), one or more audio tracks (without video), one or more subtitle/caption tracks, and so forth. Tracks are usually inter-related; an audio track contains markers within it to help synchronize the audio with the video, and a subtitle track contains time codes marking when each phrase should be displayed. Individual tracks can have metadata, such as the aspect ratio of a video track, or the language of an audio or subtitle track. Containers can also have metadata, such as the title of the video itself, cover art for the video, episode numbers (for television shows), and so on.

Individual video tracks are encoded with a certain video codec, which is the algorithm by which the video was authored and compressed. Modern video codecs include H.264, DivX, and VC-1, but there are many, many others. Audio tracks are also encoded in a specific codec, such as MP3, AAC, or Ogg Vorbis. Common video containers are ASF, MP4, and AVI. Thus, saying that you have sent someone an “MP4 file” is not specific enough for the recipient to determine if they can play it. The recipient needs to know the container format (such as MP4 or AVI), but also the video codec (such as H.264 or Ogg Theora) and the audio codec (such as MP3 or Ogg Vorbis). Furthermore, video codecs (and some audio codecs) are broad standards with multiple profiles, so saying that you have sent someone an “MP4 file with H.264 video and AAC audio” is still not specific enough. An iPhone can play MP4 files with “baseline profile” H.264 video and “low complexity” AAC audio. (These are well-defined technical terms, not laymen’s terms.) Desktop Macs can play MP4 files with “main profile” H.264 video and “main profile” AAC audio. Adobe Flash can play MP4 files with “high profile” H.264 video and “HE” AAC audio. Of course, it’s a little more complicated than that.
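
Here is one way to picture why the container name alone is not enough. This is a toy model, not how any real player decides what it can play: the MediaFile type and the can_play helper are hypothetical, and the sketch treats each combination as an exact match, ignoring the fact that real profiles overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaFile:
    """Everything a recipient needs to know, not just the container."""
    container: str
    video_codec: str
    video_profile: str
    audio_codec: str
    audio_profile: str


# Hypothetical capability table built from the combinations named above.
IPHONE = frozenset({
    MediaFile("MP4", "H.264", "baseline", "AAC", "low complexity"),
})


def can_play(supported, media):
    """Exact-match check: every field has to line up, not only the container."""
    return media in supported


baseline = MediaFile("MP4", "H.264", "baseline", "AAC", "low complexity")
high = MediaFile("MP4", "H.264", "high", "AAC", "HE")

print(baseline.container == high.container)  # True, both are "MP4 files"
print(can_play(IPHONE, baseline))            # True
print(can_play(IPHONE, high))                # False
```

Both files are “MP4 files with H.264 video and AAC audio,” yet only one of them matches the entry in the table.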

Thus…

r2332 adds a navigator.canPlayType() method. This is intended for scripts to query whether the client can play a certain type of video. There are two major problems with this: first, MIME types are not specific enough, as they will only describe the video container. Learning that the client “can play” MP4 files is useless without knowing what video codecs it supports inside the container, not to mention what profiles of that video codec it supports. The second problem is that, unless the browser itself ships with support for specific video and audio codecs (as Firefox 3.1 will do with Ogg Theora and Ogg Vorbis), it will need to rely on some multimedia library provided by the underlying operating system. Windows has DirectShow, Mac OS X has QuickTime, but neither of these libraries can actually tell you whether a codec is supported. The best you can do is try to play the video and notice if it fails. [WHATWG thread]

Other interesting changes and discussions this week:

  • r2333 changes the data type of the width and height attributes on <embed>, <object>, and <iframe> elements to match current browser behavior. These attributes reflect strings, not integers. No one knows why.

  • Ian Hickson kicked off another round of video accessibility discussion, with this philosophical statement:

    Fundamentally, I consider <video> and <audio> to be simply windows onto pre-existing content, much like <iframe>, but for media data instead of for “pages” or document data. Just as with <iframe>s, the principle I had in mind is that it should make sense for the user to take the content of the element and view it independent of its hosting page. You should be able to save the remote file locally and open it in a media player and you should be able to write a new page with a different media player interface, without losing any key aspect of the media. In particular, any accessibility features must not be lost when doing this. For example, if the video has subtitles or PiP hand language signing, or multiple audio tracks, or a transcript, or lyrics, or metadata, _all_ of this data should survive even if the video file is saved locally without the embedding page.

    In other words, video accessibility should be handled within the video container, not in the surrounding HTML markup. On the plus side, all modern video containers can handle subtitle tracks, secondary audio tracks, and so forth. Unfortunately, authors may be hesitant to add to their bandwidth costs by including tracks that must be downloaded by everyone but appreciated (or even noticed) by very few.

    [W3C discussion thread on video accessibility]

  • Sander van Zoest noticed the pixelaspectratio attribute of the <video> element, and he asked why it was a float instead of a ratio of two rationals (as is standard practice in the video authoring world). Ultimately, he agreed with Eric Carlson that pixelaspectratio should be dropped from HTML 5 because it doesn’t really give enough information about how to scale the video properly. As with so many other things in the video world, the problem is much more complicated than it first appears.
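
To see why someone from the video world would ask for a ratio rather than a float, here is a small sketch using Python's fractions module. The 10/11 ratio and the 704-pixel width are illustrative values I picked, not numbers from the thread, and display_width is a hypothetical helper.

```python
from fractions import Fraction


def display_width(stored_width, pixel_aspect_ratio):
    """Scale the stored width by the pixel aspect ratio."""
    return stored_width * pixel_aspect_ratio


exact = Fraction(10, 11)  # illustrative ratio, kept as two integers
as_float = 10 / 11        # the same ratio squeezed into a float

# With an exact ratio the arithmetic stays exact.
print(display_width(704, exact))    # 640

# The float is only the nearest binary approximation of 10/11.
print(Fraction(as_float) == exact)  # False
```

None of this addresses the larger objection that the attribute does not carry enough information to scale video properly; it only illustrates the representation question.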

Tune in next week for another exciting episode of “This Week in HTML 5.”

Tags: Weekly Review

This Week in HTML 5 - Episode 9

October 14th, 2008 · No Comments

Welcome back to “This Week in HTML 5,” where I’ll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

Most of the changes in the spec this week revolve around the <textarea> element.

Shelley Powers pointed out that I haven’t mentioned the issue of distributed extensibility yet. (The clearest description of the issue is Sam Ruby’s message from last year, which spawned a long discussion.) The short version: XHTML (served with the proper MIME type, application/xhtml+xml) supports embedding foreign data in arbitrary namespaces, including SVG and MathML. None of these technologies (XHTML, SVG, or MathML) have had much success on the public web. Despite Chris Wilson’s assertion that “we cannot definitively say why XHTML has not been successful on the Web,” I think it’s pretty clear that Internet Explorer’s complete lack of support for the application/xhtml+xml MIME type has something to do with it. (Chris is the project lead on Internet Explorer 8.)

Still, it is true that XHTML does support distributed extensibility, and many people believe that the web would be richer if SVG and MathML (and other as-yet-unknown technologies) could be embedded and rendered in HTML pages. The key phrase here is “as-yet-unknown technologies.” In that light, the recent SVG-in-HTML proposal (which I mentioned several weeks ago) is beside the point. The point of distributed extensibility is that it does not require approval from a standards body. “Let a thousand flowers bloom” and all that, where by “flowers,” I mean “namespaces.” This is an unresolved issue.

Other interesting changes this week:

  • r2314 ensures that the required attribute only applies to form controls whose value can change.
  • r2316 defines the name attribute for form controls.
  • r2317 defines the disabled attribute for form controls.
  • r2320 defines all the different ways that a form control can fail to satisfy its constraints. For example, an <input maxlength=20> element with a 21-character value.
  • r2322 defines exactly how form data should be encoded before being submitted to the server. I’ve previously mentioned character encoding in this series; this revision marks the first time that an HTML specification has acknowledged the existence of the <input type=hidden name=_charset_> method of specifying the character encoding of submitted form data.
  • r2319 removes support for data templates and repetition templates. These were inventions in the original Web Forms 2 specification, but they were never picked up by any major browser.
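
Two of the form changes above (the maxlength constraint from r2320 and the _charset_ field from r2322) can be sketched in a few lines of Python. This is a simplified illustration, not the spec's algorithm: the function names and the (name, value, type) tuple shape are my own, the length check counts Python characters, and the exact spelling of the encoding name a browser would submit is an assumption.

```python
from urllib.parse import urlencode


def is_too_long(value, maxlength):
    """A control fails its constraint when the value exceeds maxlength."""
    return len(value) > maxlength


def encode_form(fields, encoding="utf-8"):
    """Encode (name, value, type) triples as a URL-encoded form body.

    A hidden field named _charset_ has its value replaced by the name
    of the encoding used for the submission.
    """
    pairs = []
    for name, value, kind in fields:
        if name == "_charset_" and kind == "hidden":
            value = encoding
        pairs.append((name, value))
    return urlencode(pairs, encoding=encoding)


print(is_too_long("x" * 21, 20))  # True, like <input maxlength=20> with 21 characters

fields = [("q", "café", "text"), ("_charset_", "", "hidden")]
print(encode_form(fields, "utf-8"))       # q=caf%C3%A9&_charset_=utf-8
print(encode_form(fields, "iso-8859-1"))  # q=caf%E9&_charset_=iso-8859-1
```

The same value produces different bytes under different encodings, which is why the server needs the _charset_ field to decode the submission correctly.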

Tune in next week for another exciting episode of “This Week in HTML 5.”

Tags: Weekly Review