The CERN SGML parser used in Viola has been extended to know about element content models. It needs this to generate (imply) omitted end tags at the correct points.

The DTD hard-coded into CERN's parser does (basically) conform to the HTML+ DTD.


If you examine the HTML source of the following sample, you'll see that every element is explicitly closed with an end tag.


This example shows that P tags are allowed in LI.


This example shows that the second occurrence of LI implies the end tags for both the P and the first LI. This is possible because the CERN parser now knows something about each element's content model: for example, an LI may contain P elements but not another LI.
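The mechanism can be sketched as a stack of open elements: before a start tag or text is accepted, every open element whose content model does not allow it is closed with an implied end tag. The Python below is a minimal sketch under stated assumptions. The token format, the function name `imply_end_tags`, and the tiny `CONTENT_MODEL` table are all hypothetical, and this is not the CERN parser's actual code or DTD.

```python
# Minimal sketch of content-model-driven end-tag implication.
# CONTENT_MODEL is a hypothetical, heavily simplified table for illustration;
# it is NOT the DTD hard-coded into the CERN parser or the HTML+ DTD.
CONTENT_MODEL = {
    "BODY": {"UL", "P"},
    "UL": {"LI"},
    "LI": {"P", "EM", "#text"},
    "P": {"EM", "#text"},
    "EM": {"#text"},
}


def imply_end_tags(tokens, root="BODY"):
    """Return the token stream with omitted end tags made explicit.

    tokens: list of ("start", NAME), ("end", NAME) or ("text", STRING).
    """
    out = []
    stack = [root]  # open elements; the root itself is never closed here

    def close_until_allowed(child):
        # Pop open elements whose content model does not allow `child`,
        # emitting an implied end tag for each one.
        while len(stack) > 1 and child not in CONTENT_MODEL.get(stack[-1], set()):
            out.append("</%s>" % stack.pop())

    for kind, value in tokens:
        if kind == "start":
            close_until_allowed(value)
            out.append("<%s>" % value)
            stack.append(value)
        elif kind == "text":
            close_until_allowed("#text")
            out.append(value)
        elif kind == "end" and value in stack[1:]:
            # An explicit end tag also closes any unclosed elements inside it.
            while True:
                name = stack.pop()
                out.append("</%s>" % name)
                if name == value:
                    break
        # Stray end tags (no matching open element) are silently dropped.

    # At end of input, close everything still open except the root.
    while len(stack) > 1:
        out.append("</%s>" % stack.pop())
    return out


# <UL><LI><P>one<LI><P>two</UL>: the second LI closes the P and the first LI.
items = [("start", "UL"), ("start", "LI"), ("start", "P"), ("text", "one"),
         ("start", "LI"), ("start", "P"), ("text", "two"), ("end", "UL")]
print(" ".join(imply_end_tags(items)))
# <UL> <LI> <P> one </P> </LI> <LI> <P> two </P> </LI> </UL>

# <P>a<EM>b</P>: the unclosed EM gets its end tag when P is closed.
em_case = [("start", "P"), ("text", "a"), ("start", "EM"), ("text", "b"),
           ("end", "P")]
print(" ".join(imply_end_tags(em_case)))
# <P> a <EM> b </EM> </P>
```

Dropping stray end tags and never closing the root are choices made for this sketch only; a real SGML parser would report such errors according to its DTD.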


This sample contains an EM element that is never explicitly closed, but its end tag should still be generated correctly.

Well, did that go well?
Pei@ora.com