PROFESSIONAL XML PART 4 - SOME SAX DESIGN PATTERNS


Our example SAX applications have only been interested in processing one or two different element types, and the processing has been very simple. In real applications where there is a need to process many different element types, this style of program can quickly become very unstructured. This happens for two reasons: firstly, the interactions of different events processing the same global context data can become difficult to disentangle, and secondly, each of the event-handling methods is doing a number of quite unrelated tasks.


This free tutorial is a sample from the book Professional XML.


So there is a need to think carefully about the design of a SAX application to prevent this happening. This section presents some of the possibilities. We'll look at two commonly used patterns: the filter pattern and the rule-based pattern.

The Filter Design Pattern

In the filter design pattern, which is also sometimes called the pipeline pattern, each stage of processing can be represented as a section of a pipeline: the data flows through the pipe, and each section of the pipe filters the data as it passes through. This is illustrated in the diagram below:

[Diagram: SAX Parser -> Filter 1 -> Filter 2 -> Filter 3 -> DocumentHandler]

There are many different things a filter can do, for example:

  • Remove elements of the source document that are not wanted
  • Modify tags or attribute names
  • Perform validation
  • Normalize data values such as dates
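
As a rough illustration of the first item in the list, the sketch below drops one unwanted element together with its content. It uses the SAX 1 interfaces that still ship (deprecated) with the JDK. The class names DropElementFilter, EventLog and DropElementDemo are invented for this illustration, and for brevity the filter forwards only the three events the demonstration uses; a real filter would pass every event on.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

/** Hypothetical filter: drops every element named `unwanted`, with its content. */
class DropElementFilter extends HandlerBase {
    private final DocumentHandler next;  // downstream stage
    private final String unwanted;       // tag name to suppress
    private int skipDepth = 0;           // > 0 while inside a dropped element

    DropElementFilter(DocumentHandler next, String unwanted) {
        this.next = next;
        this.unwanted = unwanted;
    }

    @Override
    public void startElement(String tag, AttributeList atts) throws SAXException {
        if (skipDepth > 0 || tag.equals(unwanted)) {
            skipDepth++;                 // entering (or nesting inside) a dropped element
        } else {
            next.startElement(tag, atts);
        }
    }

    @Override
    public void endElement(String tag) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;                 // leaving a dropped element
        } else {
            next.endElement(tag);
        }
    }

    @Override
    public void characters(char[] ch, int start, int len) throws SAXException {
        if (skipDepth == 0) next.characters(ch, start, len);
    }
}

/** Hypothetical sink that records the events it receives as text. */
class EventLog extends HandlerBase {
    final StringBuilder out = new StringBuilder();
    @Override public void startElement(String tag, AttributeList atts) { out.append('<').append(tag).append('>'); }
    @Override public void endElement(String tag) { out.append("</").append(tag).append('>'); }
    @Override public void characters(char[] ch, int start, int len) { out.append(ch, start, len); }
}

class DropElementDemo {
    /** Feeds a hand-written event sequence through the filter and returns what came out. */
    static String run() {
        EventLog log = new EventLog();
        DropElementFilter filter = new DropElementFilter(log, "secret");
        AttributeList none = new AttributeListImpl();
        try {
            filter.startElement("doc", none);
            filter.startElement("secret", none);
            filter.characters("x".toCharArray(), 0, 1);
            filter.endElement("secret");
            filter.startElement("p", none);
            filter.characters("hi".toCharArray(), 0, 2);
            filter.endElement("p");
            filter.endElement("doc");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints <doc><p>hi</p></doc>
    }
}
```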

The important characteristic of this design is that each filter has an input and an output, both of which conform to the same interface. The filter implements the interface at one end, and is a client of the same interface at the other end. So if we consider any adjacent pair of filters, the left-hand one acts as the Parser, the right-hand one as the DocumentHandler. And indeed, the filters in this structure will generally implement both the SAX Parser and DocumentHandler interfaces. ("Parser," of course, is a misnomer here. The characteristic of a SAX Parser is not that it understands the lexical and syntactic rules of XML, but that it reports events to a DocumentHandler. Any program that reports events in this way can implement the SAX Parser interface, even though it doesn't do any actual parsing.)
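
To make the point about the misnomer concrete, here is a minimal sketch of a "Parser" that never looks at XML: it reads key=value lines and reports them as SAX events. KeyValueParser and KeyValueDemo are hypothetical names, and the choice of a `properties` wrapper element is an assumption made purely for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributeListImpl;

/** Hypothetical SAX Parser that reads key=value lines, not XML. */
class KeyValueParser implements Parser {
    private DocumentHandler handler = new HandlerBase();  // default: discard events

    public void setDocumentHandler(DocumentHandler h) { handler = h; }
    // The remaining configuration methods are not needed by this sketch.
    public void setLocale(Locale locale) { }
    public void setEntityResolver(EntityResolver r) { }
    public void setDTDHandler(DTDHandler h) { }
    public void setErrorHandler(ErrorHandler h) { }

    public void parse(String systemId) throws SAXException, IOException {
        parse(new InputSource(systemId));
    }

    public void parse(InputSource source) throws SAXException, IOException {
        if (source.getCharacterStream() == null) {
            throw new SAXException("this sketch only reads character streams");
        }
        BufferedReader in = new BufferedReader(source.getCharacterStream());
        AttributeList none = new AttributeListImpl();
        handler.startDocument();
        handler.startElement("properties", none);
        String line;
        while ((line = in.readLine()) != null) {
            int eq = line.indexOf('=');
            if (eq < 0) continue;                          // not a key=value line
            String key = line.substring(0, eq).trim();
            if (key.isEmpty()) continue;                   // no usable element name
            char[] value = line.substring(eq + 1).trim().toCharArray();
            handler.startElement(key, none);               // one element per key
            handler.characters(value, 0, value.length);
            handler.endElement(key);
        }
        handler.endElement("properties");
        handler.endDocument();
    }
}

class KeyValueDemo {
    /** Runs the event source over `text` and returns the events as markup. */
    static String run(String text) {
        StringBuilder out = new StringBuilder();
        KeyValueParser parser = new KeyValueParser();
        parser.setDocumentHandler(new HandlerBase() {
            @Override public void startElement(String tag, AttributeList a) { out.append('<').append(tag).append('>'); }
            @Override public void endElement(String tag) { out.append("</").append(tag).append('>'); }
            @Override public void characters(char[] ch, int s, int n) { out.append(ch, s, n); }
        });
        try {
            parser.parse(new InputSource(new StringReader(text)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("host=example.org\nport=8080\n"));
    }
}
```

Because the class implements the ordinary Parser interface, any DocumentHandler, including a filter, can sit to its right without knowing that no XML was involved.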

It is also possible for a filter to have more than one output, notifying the events to more than one recipient, or less commonly, for a filter to have more than one input, merging events from several sources.
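
A two-output stage can be sketched as a DocumentHandler that repeats each event to two recipients. TeeHandler and TeeDemo are hypothetical names for this illustration, not classes from SAX or from any of the tools discussed here.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

/** Hypothetical two-output stage: every event goes to both recipients, in order. */
class TeeHandler implements DocumentHandler {
    private final DocumentHandler first;
    private final DocumentHandler second;

    TeeHandler(DocumentHandler first, DocumentHandler second) {
        this.first = first;
        this.second = second;
    }

    public void setDocumentLocator(Locator l) {
        first.setDocumentLocator(l);
        second.setDocumentLocator(l);
    }
    public void startDocument() throws SAXException {
        first.startDocument();
        second.startDocument();
    }
    public void endDocument() throws SAXException {
        first.endDocument();
        second.endDocument();
    }
    public void startElement(String tag, AttributeList atts) throws SAXException {
        first.startElement(tag, atts);
        second.startElement(tag, atts);
    }
    public void endElement(String tag) throws SAXException {
        first.endElement(tag);
        second.endElement(tag);
    }
    public void characters(char[] ch, int start, int len) throws SAXException {
        first.characters(ch, start, len);
        second.characters(ch, start, len);
    }
    public void ignorableWhitespace(char[] ch, int start, int len) throws SAXException {
        first.ignorableWhitespace(ch, start, len);
        second.ignorableWhitespace(ch, start, len);
    }
    public void processingInstruction(String target, String data) throws SAXException {
        first.processingInstruction(target, data);
        second.processingInstruction(target, data);
    }
}

class TeeDemo {
    /** Returns the number of start tags seen by each of the two recipients. */
    static int[] run() {
        int[] counts = new int[2];
        DocumentHandler a = new HandlerBase() {
            @Override public void startElement(String tag, AttributeList atts) { counts[0]++; }
        };
        DocumentHandler b = new HandlerBase() {
            @Override public void startElement(String tag, AttributeList atts) { counts[1]++; }
        };
        TeeHandler tee = new TeeHandler(a, b);
        AttributeList none = new AttributeListImpl();
        try {
            tee.startDocument();
            tee.startElement("doc", none);
            tee.startElement("p", none);
            tee.endElement("p");
            tee.endElement("doc");
            tee.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println(counts[0] + " " + counts[1]);  // prints 2 2
    }
}
```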

The power of the filter design pattern is that the filters are highly reusable, because just like real plumbing, the same standard filters can be plugged together in many different ways.

The ParserFilter class

There are a number of tools around for constructing a pipeline of this form. The simplest is John Cowan's ParserFilter class, available from http://www.ccil.org/~cowan/XML/. This is an abstract class: it does the things that every filter needs to do, and leaves you to define a subclass for each specific filter needed in your own pipeline.

As you might expect, ParserFilter implements both the SAX Parser and DocumentHandler interfaces; in fact, for good measure, it implements the other SAX event-handling interfaces as well (DTDHandler, ErrorHandler, and EntityResolver). All that the event-handling methods in this class do is to pass the event on to the next filter in the pipeline: it's up to your subclass to override any methods that need to do useful work.
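
The following is a sketch of the idea only, not Cowan's source: a hypothetical PassThroughFilter that implements just Parser and DocumentHandler (the real class covers the other handler interfaces too), plus a hypothetical subclass UpperCaseTags that overrides two methods. The demonstration assumes the JDK's built-in parser, obtained through the deprecated `SAXParser.getParser()` method, as the source of events.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.*;

/** Hypothetical pass-through stage: a DocumentHandler to its left, a Parser to its right. */
class PassThroughFilter implements Parser, DocumentHandler {
    private final Parser upstream;                           // stage on the left
    private DocumentHandler downstream = new HandlerBase();  // stage on the right

    PassThroughFilter(Parser upstream) { this.upstream = upstream; }

    // Parser side: configuration is delegated; parse() hooks this filter in first
    public void setDocumentHandler(DocumentHandler h) { downstream = h; }
    public void setLocale(Locale l) throws SAXException { upstream.setLocale(l); }
    public void setEntityResolver(EntityResolver r) { upstream.setEntityResolver(r); }
    public void setDTDHandler(DTDHandler h) { upstream.setDTDHandler(h); }
    public void setErrorHandler(ErrorHandler h) { upstream.setErrorHandler(h); }
    public void parse(InputSource in) throws SAXException, IOException {
        upstream.setDocumentHandler(this);
        upstream.parse(in);
    }
    public void parse(String systemId) throws SAXException, IOException {
        parse(new InputSource(systemId));
    }

    // DocumentHandler side: every event is passed on unchanged
    public void setDocumentLocator(Locator l) { downstream.setDocumentLocator(l); }
    public void startDocument() throws SAXException { downstream.startDocument(); }
    public void endDocument() throws SAXException { downstream.endDocument(); }
    public void startElement(String tag, AttributeList atts) throws SAXException {
        downstream.startElement(tag, atts);
    }
    public void endElement(String tag) throws SAXException { downstream.endElement(tag); }
    public void characters(char[] ch, int start, int len) throws SAXException {
        downstream.characters(ch, start, len);
    }
    public void ignorableWhitespace(char[] ch, int start, int len) throws SAXException {
        downstream.ignorableWhitespace(ch, start, len);
    }
    public void processingInstruction(String target, String data) throws SAXException {
        downstream.processingInstruction(target, data);
    }
}

/** Hypothetical subclass: overrides only the events it wants to change. */
class UpperCaseTags extends PassThroughFilter {
    UpperCaseTags(Parser p) { super(p); }
    @Override public void startElement(String tag, AttributeList atts) throws SAXException {
        super.startElement(tag.toUpperCase(Locale.ROOT), atts);
    }
    @Override public void endElement(String tag) throws SAXException {
        super.endElement(tag.toUpperCase(Locale.ROOT));
    }
}

class PassThroughDemo {
    /** Parses `xml` through a two-stage pipeline and returns the resulting markup. */
    static String run(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            Parser jdkParser = SAXParserFactory.newInstance().newSAXParser().getParser();
            Parser pipeline = new UpperCaseTags(new PassThroughFilter(jdkParser));
            pipeline.setDocumentHandler(new HandlerBase() {
                @Override public void startElement(String tag, AttributeList a) { out.append('<').append(tag).append('>'); }
                @Override public void endElement(String tag) { out.append("</").append(tag).append('>'); }
                @Override public void characters(char[] ch, int s, int n) { out.append(ch, s, n); }
            });
            pipeline.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><p>hi</p></doc>"));  // prints <DOC><P>hi</P></DOC>
    }
}
```

The key design choice is in parse(): the filter registers itself as the upstream stage's DocumentHandler before delegating, so stages can be nested to any depth by passing one to another's constructor.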

The ParserFilter class has a constructor that takes a Parser as its parameter: the effect is to create a piece of the pipeline and connect it to another piece on its left. To construct our three-stage pipeline in the diagram above, we could write:

ParserFilter pipeline = new Filter3(
    new Filter2(
        new Filter1(
            new com.jclark.xml.sax.Driver())));
pipeline.setDocumentHandler(outputHandler);

The initial input to the pipeline is of course a SAX Parser and the final output is a SAX DocumentHandler.

An Example ParserFilter: an Indenter

Here is a complete working example of a ParserFilter called Indenter. This filter takes a stream of SAX events, and massages the data by adding whitespace before start and end tags to make the nested structure of the document visible on display. It then passes the massaged data to the next DocumentHandler (which might, of course, be another filter).

The code should be self-explanatory. Note how it relies on the methods in the superclass to actually send the events to the DocumentHandler:

import java.util.*;
import org.xml.sax.*;
import org.ccil.cowan.sax.ParserFilter;

/**
* Indenter: This ParserFilter indents elements, by adding whitespace where
* appropriate. The string used for indentation is fixed at four spaces.
*/


public class Indenter extends ParserFilter {
 
 private final static String indentChars = "    ";  // indent by four spaces
 private int level = 0;                             // current indentation level
 private boolean sameline = false;                  // true if no newlines in element
 private StringBuffer buffer = new StringBuffer();  // buffer to hold character data

 /**
 * Constructor: supply the underlying parser 
 * used to feed input to this filter
 */

 public Indenter(Parser p) {
   super(p);
 }

 /**
 * Output an element start tag.
 */

 public void startElement(String tag, AttributeList atts) 
 throws SAXException
 {
   flush();         // clear out pending character data
   indent();        // output whitespace to achieve indentation
   super.startElement(tag, atts);  // output the start tag and attributes
   level++;         // we're now one level deeper
   sameline = true;     // assume a single line of content
 }

 /**
 * Output element end tag
 */
 
 public void endElement(String tag) throws SAXException 
 {
   flush();         // clear out pending character data
   level--;         // we've come out by one level
   if (!sameline) indent();  // output indentation if a new line was found
   super.endElement(tag);  // output the end tag
   sameline = false;     // next tag will be on a new line
 }

 /**
 * Output a processing instruction
 */

 public void processingInstruction(String target, String data)
 throws SAXException
 {
   flush();           // clear out pending character data
   indent();          // output whitespace for indentation
   super.processingInstruction(target, data);  // output the processing instruction
 }

 /**
 * Buffer character data; it is output later by flush()
 */

 public void characters(char[] chars, int start, int len) 
 throws SAXException 
 {
   // add the character data to a buffer for now
   buffer.append(chars, start, len);
 }

 /**
 * Discard ignorable white space: this filter supplies its own indentation
 */

 public void ignorableWhitespace(char[] ch, int start, int len)
 throws SAXException 
 {
  // ignore it
 }

 /**
 * Output white space to reflect the current indentation level
 */

 private void indent() throws SAXException 
 {
   // construct an array holding a newline character
   // and the correct number of spaces
   int len = indentChars.length();
   char[] array = new char[level*len + 1];
   array[0] = '\n';
   for (int i=0; i<level; i++) 
   {
     indentChars.getChars(0, len, array, len*i + 1); 
   }
   // output this array as character data
   super.characters(array, 0, level*len+1);
 }

 /**
 * Flush the buffer containing accumulated character data.
 * White space adjacent to markup is trimmed.
 */

 public void flush() throws SAXException 
 {
   // copy the buffer into a character array
   int end = buffer.length();
   if (end==0) return;
   char[] array = new char[end];
   buffer.getChars(0, end, array, 0);
   // trim whitespace from the start and end
   int start=0;
   while (start<end && Character.isWhitespace(array[start])) start++;
   while (start<end && Character.isWhitespace(array[end-1])) end--;
   // test to see if there is a newline in the buffer
   for (int i=start; i<end; i++) 
   {
     if (array[i]=='\n') {
       sameline = false;
       break;
     }
   }
   // output the remaining character data
   super.characters(array, start, end-start);
   // clear the contents of the buffer
   buffer.setLength(0);
 }

}

To actually run this example, we will need a DocumentHandler that outputs the XML. Let's suppose this exists and is called XMLOutputter (we'll show how XMLOutputter is written in the next section). We can then write a main program as follows:

public static void main(String[] args) throws Exception 
{
 Indenter app = new Indenter(ParserManager.makeParser());
 app.setDocumentHandler(new XMLOutputter());
 app.parse(args[0]);
}

And you will also have to add an import statement for the ParserManager class at the top of the file:

 
import java.util.*;
import org.xml.sax.*; 
import com.icl.saxon.ParserManager;
import org.ccil.cowan.sax.ParserFilter;
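
To try the Indenter before reaching XMLOutputter, a rough stand-in handler is enough. The sketch below is not the book's XMLOutputter: SimpleOutputter is a hypothetical name, it escapes only `&`, `<` and `"`, and it writes no XML declaration.

```java
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.helpers.AttributeListImpl;

/** Hypothetical stand-in output handler: writes the events it receives as markup. */
class SimpleOutputter extends HandlerBase {
    private final PrintWriter out;

    SimpleOutputter() { this(new OutputStreamWriter(System.out)); }  // default: standard output
    SimpleOutputter(Writer w) { out = new PrintWriter(w); }

    @Override
    public void startElement(String tag, AttributeList atts) {
        out.print("<" + tag);
        for (int i = 0; i < atts.getLength(); i++) {
            out.print(" " + atts.getName(i) + "=\"" + escape(atts.getValue(i)) + "\"");
        }
        out.print(">");
    }

    @Override
    public void endElement(String tag) {
        out.print("</" + tag + ">");
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        out.print(escape(new String(ch, start, len)));
    }

    @Override
    public void processingInstruction(String target, String data) {
        out.print("<?" + target + " " + data + "?>");
    }

    @Override
    public void endDocument() {
        out.println();   // finish the last line
        out.flush();     // make sure everything reaches the underlying writer
    }

    /** Minimal escaping, sufficient for text and double-quoted attribute values. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}

class SimpleOutputterDemo {
    /** Sends a few hand-written events to the handler and returns the text it wrote. */
    static String run() {
        StringWriter sink = new StringWriter();
        SimpleOutputter handler = new SimpleOutputter(sink);
        AttributeListImpl atts = new AttributeListImpl();
        atts.addAttribute("id", "CDATA", "a1");
        char[] text = "fish & chips".toCharArray();
        handler.startElement("note", atts);
        handler.characters(text, 0, text.length);
        handler.endElement("note");
        handler.endDocument();
        return sink.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints <note id="a1">fish &amp; chips</note>
    }
}
```

With this class on the classpath, `new SimpleOutputter()` could be passed to `setDocumentHandler` in place of `new XMLOutputter()` in the main program.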
