StableDOM
StableDOM™ is the term used to describe our HTML parsing and transcoding technology. It provides a high-level and elegant solution to the common problem of parsing and extracting HTML content. Even if the source HTML is malformed, StableDOM provides a consistent W3C HTML DOM view of the content, thereby ensuring high-level access, correct extraction, and semantic transcoding to alternate document types such as WML and XHTML. Furthermore, StableDOM provides repeatable results even in the face of changes in the underlying source document. StableDOM (for HTML transcoding) may be used together with the XML and SQL data transcoding technologies to create an enterprise Mashup site, and all of these technologies are accessible through a point-and-click extraction capability.
StableDOM Features
The StableDOM technology provides the following major capabilities:
1.Parses and fixes live HTML web content and creates a corrected W3C HTML DOM.
2.Enables XPath, XSLT, or XQuery access to the corrected W3C HTML DOM view.
3.Provides a high-level StableDOM Java API to access elements based on text content.
4.Provides a web browser user interface to StableDOM enabling point-and-click extraction of HTML elements or text content.
5.Integrated with the XSLT/XQuery code generation component for transcoding from HTML to XHTML, WML, and other XML document types such as RSS.
6.Integrated with the portal code generation component for transcoding from multiple HTML pages to XHTML and WML portlets based on the JSR 168 Portlet API (WSRP).
7.Integrated with the widget code generation component for transcoding from multiple HTML pages to widget platforms including Apple's Dashboard Widgets, Opera Widgets, Google Gadgets, and Microsoft Vista's Sidebar Gadgets.
8.Supports URL and DOM caching for application-specific cache management.
HTML Parsing Pipeline
StableDOM first parses the HTML content and then fixes non-conformant HTML according to the W3C HTML 4 DTD specification. A corrected DOM is made available at the end of the processing pipeline with any JavaScript and CSS modifications applied. This provides a server-side stable DOM view of the source HTML document irrespective of tag capitalization, well-formedness, incorrect element order, and use of DHTML.
This diagram presents a high-level view of the parsing pipeline:
Accessing the Corrected DOM
Once the parsing pipeline completes and a corrected W3C HTML DOM is available, you can access this DOM with five different technologies, depending on your needs:
1.W3C DOM API.
2.XPath access through Jaxen.
3.XSLT access through Xalan.
4.XQuery access through Saxon.
5.StableDOM Java API which is based on common HTML access patterns.
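The first two access styles in the list above can be sketched with nothing but the JDK. This is a minimal illustration under stated assumptions, not StableDOM code: a small well-formed string stands in for the corrected DOM that the pipeline would deliver, the JDK's built-in javax.xml.xpath engine is used in place of Jaxen, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch only: queries an already-corrected DOM with the W3C DOM API and XPath. */
class CorrectedDomAccessSketch {

    // Assumption: stands in for the corrected DOM produced by the parsing pipeline.
    static final String CORRECTED_HTML =
        "<html><head><title>News</title></head>"
      + "<body><ol><li>First</li><li>Second</li></ol></body></html>";

    // Parses well-formed markup into a W3C DOM Document.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // W3C DOM API: collect elements by tag name and count them.
    static int countListItems(Document dom) {
        NodeList items = dom.getElementsByTagName("li");
        return items.getLength();
    }

    // XPath (JDK engine, not Jaxen): string value of the first matching node.
    static String evaluate(Document dom, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, dom);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        Document dom = parse(CORRECTED_HTML);
        System.out.println(countListItems(dom));                  // number of <li> elements
        System.out.println(evaluate(dom, "/html/body/ol/li[2]")); // text of the second item
    }
}
```

Because the DOM is already corrected, the XPath expression can rely on a fixed element order and lower-case element names; that reliance is the point of having a stable DOM view.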
This diagram depicts the HTML parsing pipeline and application access technologies:
Accessing the Corrected DOM with the StableDOM Java API
The StableDOM access API is implemented by the HTMLUtil class in the package com.altmobile.platform.util.browser.html and provides the following methods:
1.static public Element getContainingElementContainingText(String urlString, String contextElementString, String contextTextString)
2.static public String getContent (String urlString, String xpathString)
3.static public Node getNode (String urlString, String xpathString)
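To make the three access patterns concrete, here is a hypothetical stand-in written against the JDK only. It is not the real HTMLUtil: each method takes an already-parsed Document instead of a URL so that the sketch runs without network access, and the behavior shown (especially for getContainingElementContainingText, where we assume "the last element in document order with the given name whose text contains the given string") is our reading of the method names, not a documented contract.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical stand-in, NOT the real HTMLUtil: works on a parsed Document, not a URL. */
class HtmlAccessSketch {

    // Assumption: stands in for the corrected DOM of a remote page.
    static final String SAMPLE =
        "<html><body><div><p>Intro</p><div><p>Updated: Monday</p></div></div></body></html>";

    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Assumed semantics: text content of the first node matched by the XPath.
    static String getContent(Document dom, String xpathString) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpathString, dom);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + xpathString, e);
        }
    }

    // Assumed semantics: the first node matched by the XPath, or null if none.
    static Node getNode(Document dom, String xpathString) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpathString, dom, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + xpathString, e);
        }
    }

    // Assumed semantics: last element (document order) with this name whose text
    // contains the given string; for nested matches that is the innermost one.
    static Element getContainingElementContainingText(
            Document dom, String elementName, String text) {
        NodeList candidates = dom.getElementsByTagName(elementName);
        Element match = null;
        for (int i = 0; i < candidates.getLength(); i++) {
            Element candidate = (Element) candidates.item(i);
            if (candidate.getTextContent().contains(text)) {
                match = candidate;
            }
        }
        return match;
    }

    public static void main(String[] args) {
        Document dom = parse(SAMPLE);
        System.out.println(getContent(dom, "//div/div/p"));
        System.out.println(getNode(dom, "//div/div").getNodeName());
        System.out.println(
            getContainingElementContainingText(dom, "div", "Updated").getTextContent());
    }
}
```

Locating an element by the text it contains, rather than by its position, is one way an extraction can keep working when unrelated parts of the source page move around.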
Accessing the Corrected DOM with the StableDOM XSL-Over-Java API
The StableDOM Java API may also be accessed through the user-defined function mechanism provided by the Xalan XSLT engine. Two functions extract HTML content using XSL; they were designed to mimic the XSLT “copy” pattern:
1.public static String getContent(String urlString, String xpathString)
2.public static Node getNode(String urlString, String xpathString)
The function getContent() will copy the text content of a node and getNode() will perform a deep copy of the node.
The following screen shots highlight the user function declaration:
As can be seen in the below screen shot, the function getContent() should be used with the <xslt:value-of> element and the function getNode() should be used with the <xslt:copy-of> element:
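Setting the StableDOM extension functions aside, the difference between the two XSLT elements can be sketched with the JDK's built-in XSLT processor. This is an illustration only: plain XPath selects replace the getContent() and getNode() calls, the stylesheet uses the conventional xsl: prefix, and the class name and sample markup are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch only: value-of copies text content, copy-of performs a deep copy. */
class CopyPatternSketch {

    // Assumption: stands in for the corrected DOM of a remote page.
    static final String SOURCE =
        "<html><body><ol><li>One</li><li>Two</li></ol></body></html>";

    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<div>"
      // Text-only copy: the analogue of using getContent() with value-of.
      + "<p><xsl:value-of select='//ol'/></p>"
      // Deep copy of the element: the analogue of using getNode() with copy-of.
      + "<xsl:copy-of select='//ol'/>"
      + "</div>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the source markup and returns the serialized result.
    static String transform(String stylesheet, String source) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(source)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(STYLESHEET, SOURCE));
    }
}
```

In the result, the p element holds only the concatenated text of the list, while the ol element is reproduced with its li children intact.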
Point-and-Click Extraction
Since the earliest days of the web, developers have spent enormous amounts of time attempting to “screen scrape” HTML content using programming languages such as Perl or, more recently, Java. Despite the effort invested in building extraction frameworks, the application developer still needs to spend considerable time analyzing the source HTML and comparing what is physically present in the HTML source with the framework’s view of that content.
StableDOM solves this problem in an elegant and straightforward way with two customized web browsers allowing the semantic extraction of either HTML elements or text content.
The HTML Transformation Browser and its engine have been optimized for XML/XHTML Mashup development, while the StableDOM Browser and its engine were developed for HTML web development. Although both share a similar user interface, their underlying engines differ in how they process remote HTML content.
The need to support separate processing models for HTML and XHTML is well known to browser vendors, despite the urban legend that "XHTML is just well-formed HTML".
As seen in this screen shot, the HTML Transformation Browser will highlight extractable elements and content when you move the mouse over the web page:
In the above example, the user hovered over a list item which translates to the <li> tag found in the HTML source code. Based on the element or text selection, StableDOM will present a popup menu with the most common extraction options.
The HTML Transformation Browser supports the most common transcoding options via the popup menu. For custom transcoding, the advanced developer will use the HTML Transformation Browser to display the XSL API or XPath to a desired element, even the <BODY> element. These items are logged in the target DOM Browser under the “Log” tab. And for ultimate transcoding control, the developer may display the corrected HTML DOM in a DOM Browser and then manually generate the XPath statement to any element. These options can be seen here:
The Mashup Development Process
To create a Mashup, you should follow these steps:
1.If you are a web developer who prefers to use WYSIWYG HTML editors such as found in popular blog tools, then
a.Define your Mashup content by launching the StableDOM Browser from the "Mashup" menu as seen here:

This will launch the StableDOM browser enabling you to navigate to the remote URL just as you would with a traditional desktop web browser. Here is a screen shot of the StableDOM Browser:
The StableDOM Browser differs from the HTML Transformation Browser (described later in the enterprise XML developer section) in that the HTML Transformation Browser and its engine extract remote HTML content and transform that content to the target document language. For example, if the remote content is an HTML table and the target document is a WML document, then the HTML Transformation Browser and its engine will dynamically create a WML table. The StableDOM Browser and its engine will not perform any node or content adaptation and will only extract the remote HTML content. Both browsers use a similar user interface, though.
b.Generate the semantic metadata for the remote HTML element by selecting the menu item "Display Transcoding Metadata for xxx" (where xxx will be the element name) from the popup menu as seen here:

A window will be displayed containing the metadata as seen here:

You may tweak the XPath statement defined in the RDF metadata to best locate the HTML element.
c.Launch the Mashup Monitor from the "Mashups" menu as seen here:

d.Drag and drop the RDF metadata onto the Mashup Monitor as seen here:

e.Construct your target HTML document in the WYSIWYG Mashup Designer by launching it from the Mashup menu as seen here:
f.You will be prompted to specify a port to run a special object server that is used to communicate with the WYSIWYG Mashup Designer. The WYSIWYG Mashup Designer will be launched in your default browser, as seen here:
The <alt> WYSIWYG Mashup Designer is based on TinyMCE, the industry-leading HTML blog editor. We have created several TinyMCE plug-ins supporting the creation of Mashup content for RSS feeds, blogs, web sites, and widget systems. For information on TinyMCE, visit http://tinymce.moxiecode.com/.
g.Create your static content as you normally would. When you want to add dynamic content, select the "Mash" button to select previously defined Mashup content, as seen here:
h.The Mashup plug-in will launch a list of monitored Mashups, as seen here:
allowing you to view the remote content and insert it either as static or dynamic HTML content as seen here:
The option to view the remote content in the "Content Viewer" is only available when using the latest version of the Opera web browser.
i.Insert the remote content either as static or dynamic content by selecting the appropriate hyperlink which will fetch the remote content and place it into WYSIWYG Mashup Designer as seen here:
If you use a browser other than the latest Opera web browser, you will need to subsequently click on the Monitored Mashup dialog to close it. This seems to be a bug in the underlying TinyMCE plug-in code.
j.You can now save the Mashup as a Dynamic XSL by selecting the "XSLT" button as seen here:
This invokes the Dynamic XSLT plug-in which makes AJAX calls to save the HTML content and in the process convert it to an XSLT document. The XSLT is displayed in a window as seen here:
k.And then use the popup from the new "XSL Source" window to interactively test your Mashup content. Additionally, you may launch different Object Servers to serve this content as seen here:
l.Alternatively, you can use the plug-in shortcuts, which combine the Dynamic XSL generation with the launching of a specific Mashup Designer for Opera Widgets, Google Gadgets, Vista Gadgets, Dashboard Widgets, and RSS feeds. The plug-ins are seen here:

2.If you are an enterprise XML developer who requires fine-grained node level control—down to the attribute—of the target XML/XHTML output document, then
a.Construct your target document in the DOM Browser, or load a valid XHTML document into the DOM Browser. Create any placeholder elements in the target document as necessary depending on the source item and the target DTD requirements. Furthermore, if you plan to transcode from multiple web sites or you are extracting a single piece of text content, then create a wrapper DIV as the first child of the BODY element.
Consider the following example of a placeholder element: if you want to extract a table element from a remote web site and transcode it into an XHTML document, the XHTML DTD requires that the table must not be contained by a p element. So in this case, you should create a div placeholder in the target document as the parent for the new table.
b.Determine the styling requirements that your Mashup should use. Encode this information directly on your placeholder wrapper.
c.Select the “Import Last Child from HTML Source” menu item from the parent element of the to-be-imported node as seen here:
d.This will launch the HTML Transformation Browser, allowing you to extract an HTML element or text content as needed. In the below screen capture, we want to extract the paragraph text:
After selecting the “Extract text: Wed Dec…ST 2005” menu item from the popup, our target document will look like this:
e.Select the “Transform to Dynamic XSLT Source” menu item, which cascades off the “XSLT” menu item in the document node popup menu, as seen here:
This displays the Dynamic XSLT source code which contains instructions to create both the static XHTML and dynamic XHTML using the special transcoding function getContent(url, xpath) as seen here:
The XSLT function getContent(url, xpath) simply calls the Java language function in the HTMLUtil class as described in the Java language API section. This diagram provides a view of the HTML parsing pipeline through to the transcoding process:
For server-hosted aggregation services such as Yahoo! Pipes, the Yahoo! servers access the Mashup content via traditional HTTP GET requests rather than through the desktop widget diagrammed above.
For browser-based access to the Mashup content, the browser (mobile or desktop) uses traditional HTTP GET requests rather than the desktop widget diagrammed above.
During the generation of the Dynamic XSLT, the code generation component inspected the target document and transcoded accordingly. The previous example illustrated how to transcode text, and the next example will illustrate how to transcode HTML elements.
XHTML Transcoding Example
StableDOM supports the semantic transcoding of HTML nodes. The StableDOM Browser differs from the HTML Transformation Browser in that the HTML Transformation Browser and its engine extract remote HTML content and transform that content to the target document language. For example, if the remote content is an HTML table and the target document is a WML document, then the HTML Transformation Browser and its engine will dynamically create a WML table. The StableDOM Browser and its engine will not perform any node or content adaptation and will only extract the remote HTML content. Both browsers use a similar user interface, though.
This section describes the HTML Transformation Browser and its engine.
Rather than attempting to blindly transcode an entire HTML page as is done in technologies like WAP gateways and web proxies, we allow you to selectively transcode one or more of the HTML source elements and text content providing you with complete control over the transcoding process. Furthermore, we allow you to aggregate the transcoding of several HTML pages and thereby create a Mashup or web portal in just a few clicks.
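The idea of selectively pulling elements from several pages into one target document can be sketched with the standard W3C DOM call Document.importNode. This is an assumption-laden illustration rather than the product's implementation: the sample pages, the id of the wrapper div, and the helper name importLastChild (borrowed from the menu item of the same name) are all hypothetical, and the sketch only copies nodes without adapting them to a target DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Sketch only: aggregates selected elements from two sources under a wrapper div. */
class SelectiveImportSketch {

    // Assumptions: corrected DOMs of two remote pages and an empty target document.
    static final String PAGE_A =
        "<html><body><h1>Top stories</h1><ol><li>One</li><li>Two</li></ol></body></html>";
    static final String PAGE_B =
        "<html><body><table><tr><td>Sunny</td></tr></table><p>Footer</p></body></html>";
    static final String TARGET =
        "<html><body><div id='mashup'/></body></html>";

    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Returns the first node matched by the XPath, or null if nothing matches.
    static Node select(Document dom, String xpath) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, dom, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + xpath, e);
        }
    }

    // Deep-copies one source node and appends it as the last child of the target parent.
    static void importLastChild(Document target, String parentXPath,
                                Document source, String sourceXPath) {
        Node parent = select(target, parentXPath);
        Node picked = select(source, sourceXPath);
        parent.appendChild(target.importNode(picked, true));
    }

    // Builds the aggregated document: only the list and the table are taken.
    static Document buildMashup() {
        Document target = parse(TARGET);
        importLastChild(target, "//div[@id='mashup']", parse(PAGE_A), "//ol");
        importLastChild(target, "//div[@id='mashup']", parse(PAGE_B), "//table");
        return target;
    }

    // String value of the first node matched by the XPath ("" if none).
    static String text(Document dom, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, dom);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        Document mashup = buildMashup();
        System.out.println(text(mashup, "//div[@id='mashup']/ol/li[1]"));
        System.out.println(text(mashup, "//div[@id='mashup']/table/tr/td"));
    }
}
```

The heading from the first page and the footer from the second never reach the target, which is the contrast with transcoding an entire page blindly.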
Actually, our technology allows you to aggregate not only HTML content but also XML and SQL-based content in a point-and-click manner. These options are available from any element popup menu as seen in this DOM Browser screen shot:
In the previous example, we illustrated how to transcode HTML text content. Now let’s transcode an HTML <OL> list element. Follow steps a through d for the enterprise XML developer in the section “The Mashup Development Process” to create the following XHTML document:
To save space for this screen shot, we have disabled displaying both Node Info and attributes.
Circled above is the placeholder <div> element which will contain the transcoded <ol>. After selecting the “Import Last Child from HTML Source” menu item as described in step c, right-click on the list and you should see the following in the HTML Transformation Browser:
Next, select the “Extract <OL> node” menu item and you will see the following in the DOM Browser: