One of the successes of the HTML5 specification is that it defines in detail how HTML documents should be parsed. Until now, browser vendors have had to guess at and copy each other's implementations, hoping that their own parsers would not cause too many problems on real-world HTML documents.
Although some parts of HTML5 are still controversial, the parsing section has been unanimously accepted by browser vendors. Once browsers start implementing it, users will benefit from the resulting compatibility improvements.
One of the first implementations of the HTML5 parsing rules was developed to power the HTML5 validator. (If you want to try the validator, http://ejohn.org should be valid HTML5.) This implementation is written in Java, provides SAX and DOM interfaces, and is open source.
Interestingly, Henri Sivonen (the author of the validator) recently developed a brand new HTML5 parsing engine for Gecko, which will be used in the next version of Firefox.
This implementation is actually produced by automatically translating the Java implementation of Henri's HTML5 parser into C++. The translation is fully automated, and the resulting changes will be submitted to the Mozilla code base.
Normally, the mention of a large-scale programmatic conversion of a Java code base to C++ would make me recoil. However, the result is surprising: page load performance has improved by 3%.
This comes in addition to the bug fixes and conformance improvements that the new code base will provide. You can follow the progress of the patch in Mozilla's bug tracker.
If you want to try the new parser (you are unlikely to notice many obvious changes, but any help finding bugs is appreciated), download a nightly build of Firefox, open about:config, and set html5.enable to true.
If you want to upgrade to HTML5, now is the time. Because HTML5 is a superset of the features provided by HTML4 and XHTML1, upgrading is very easy. You only need to replace your current (X)HTML doctype declaration with the HTML5 doctype:
<!doctype html>
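If you have many files to update, the replacement can be scripted. The following is only a minimal sketch in Python, not something from the validator or from Mozilla: the function name upgrade_doctype is made up for illustration, and it assumes the existing doctype contains no ">" other than its closing one, which holds for the standard HTML4 and XHTML1 doctypes.

```python
import re

# Matches a doctype declaration, including ones that span several lines.
# Assumption: no '>' appears inside the declaration before its closing '>'.
DOCTYPE_RE = re.compile(r"<!doctype\s[^>]*>", re.IGNORECASE)

HTML5_DOCTYPE = "<!doctype html>"


def upgrade_doctype(markup: str) -> str:
    """Replace the first doctype declaration in `markup` with the HTML5 one.

    Markup without any doctype is returned unchanged.
    """
    return DOCTYPE_RE.sub(HTML5_DOCTYPE, markup, count=1)


if __name__ == "__main__":
    old_page = (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
        '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
        "<html><head><title>Example</title></head><body></body></html>"
    )
    print(upgrade_doctype(old_page))
```

Running the function on a page that already uses the HTML5 doctype leaves it as it is, so it should be safe to apply more than once.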
You can find details on how to get the new HTML5 elements working in all browsers on the HTML5 Doctor website.