So there is a need to think carefully about the design of a SAX application to prevent this happening. This section presents some of the possibilities. We'll look at two commonly used patterns: the filter pattern and the rule-based pattern.
The Filter Design Pattern
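In the filter design pattern, which is also sometimes called the pipeline pattern, each stage of processing can be represented as a section of a pipeline: the data flows through the pipe, and each section of the pipe filters the data as it passes through. This is illustrated in the diagram below:
[Diagram: a three-stage pipeline in which a SAX Parser feeds Filter 1, Filter 2, and Filter 3 in turn, and the last filter passes its output to a DocumentHandler]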
There are many different things a filter can do, for example:
- Remove elements of the source document that are not wanted
- Modify tags or attribute names
- Perform validation
- Normalize data values such as dates
The important characteristic of this design is that each filter has an input and an output, both of which conform to the same interface. The filter implements the interface at one end, and is a client of the same interface at the other end. So if we consider any adjacent pair of filters, the left-hand one acts as the Parser and the right-hand one as the DocumentHandler. Indeed, the filters in this structure will generally implement both the SAX Parser and DocumentHandler interfaces. ("Parser", of course, is a misnomer here. The characteristic of a SAX Parser is not that it understands the lexical and syntactic rules of XML, but that it notifies events to a DocumentHandler. Any program that performs such notification can implement the SAX Parser interface, even though it doesn't do any actual parsing.)
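To make the "same interface at both ends" idea concrete, here is a minimal sketch of a single filter that renames one element. The class names RenameFilter, TagRecorder, and RenameDemo are hypothetical and are not part of SAX or of any library mentioned in this section. For brevity the sketch implements only the DocumentHandler end and holds a reference to the next DocumentHandler; a full filter would normally implement the Parser interface as well.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

/** Hypothetical filter: renames one element and forwards everything else unchanged. */
class RenameFilter extends HandlerBase {
    private final DocumentHandler next; // the right-hand neighbour in the pipe
    private final String from;
    private final String to;

    RenameFilter(DocumentHandler next, String from, String to) {
        this.next = next;
        this.from = from;
        this.to = to;
    }

    private String map(String tag) {
        return tag.equals(from) ? to : tag;
    }

    @Override public void startDocument() throws SAXException { next.startDocument(); }
    @Override public void endDocument() throws SAXException { next.endDocument(); }

    @Override public void startElement(String tag, AttributeList atts) throws SAXException {
        next.startElement(map(tag), atts); // same event, modified tag name
    }

    @Override public void endElement(String tag) throws SAXException {
        next.endElement(map(tag));
    }

    @Override public void characters(char[] ch, int start, int len) throws SAXException {
        next.characters(ch, start, len);
    }
}

/** Hypothetical final stage: records the events it receives as text. */
class TagRecorder extends HandlerBase {
    final StringBuilder out = new StringBuilder();

    @Override public void startElement(String tag, AttributeList atts) {
        out.append('<').append(tag).append('>');
    }

    @Override public void endElement(String tag) {
        out.append("</").append(tag).append('>');
    }

    @Override public void characters(char[] ch, int start, int len) {
        out.append(ch, start, len);
    }
}

/** Drives the filter by hand, playing the role of the left-hand neighbour. */
class RenameDemo {
    static String run() {
        TagRecorder sink = new TagRecorder();
        DocumentHandler pipe = new RenameFilter(sink, "b", "strong");
        char[] text = "hi".toCharArray();
        try {
            pipe.startDocument();
            pipe.startElement("p", new AttributeListImpl());
            pipe.startElement("b", new AttributeListImpl());
            pipe.characters(text, 0, text.length);
            pipe.endElement("b");
            pipe.endElement("p");
            pipe.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.out.toString();
    }
}
```

Because RenameFilter is itself a DocumentHandler, another RenameFilter (or any other handler) could be passed to its constructor in place of TagRecorder without changing either class.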
It is also possible for a filter to have more than one output, notifying the events to more than one recipient, or less commonly, for a filter to have more than one input, merging events from several sources.
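A filter with two outputs can be sketched as a "tee" that repeats each event to two recipients. The names TeeFilter, ElementCounter, NameCollector, and TeeDemo below are hypothetical, chosen only for this illustration, and only a few of the DocumentHandler events are forwarded.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

/** Hypothetical filter with one input and two outputs. */
class TeeFilter extends HandlerBase {
    private final DocumentHandler first;
    private final DocumentHandler second;

    TeeFilter(DocumentHandler first, DocumentHandler second) {
        this.first = first;
        this.second = second;
    }

    @Override public void startElement(String tag, AttributeList atts) throws SAXException {
        first.startElement(tag, atts);
        second.startElement(tag, atts);
    }

    @Override public void endElement(String tag) throws SAXException {
        first.endElement(tag);
        second.endElement(tag);
    }

    @Override public void characters(char[] ch, int start, int len) throws SAXException {
        first.characters(ch, start, len);
        second.characters(ch, start, len);
    }
}

/** First recipient: counts start tags. */
class ElementCounter extends HandlerBase {
    int count = 0;

    @Override public void startElement(String tag, AttributeList atts) {
        count++;
    }
}

/** Second recipient: remembers element names in document order. */
class NameCollector extends HandlerBase {
    final List<String> names = new ArrayList<>();

    @Override public void startElement(String tag, AttributeList atts) {
        names.add(tag);
    }
}

/** Feeds a tiny hand-made event stream through the tee. */
class TeeDemo {
    static String run() {
        ElementCounter counter = new ElementCounter();
        NameCollector collector = new NameCollector();
        DocumentHandler tee = new TeeFilter(counter, collector);
        try {
            tee.startElement("a", new AttributeListImpl());
            tee.startElement("b", new AttributeListImpl());
            tee.endElement("b");
            tee.endElement("a");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return counter.count + ":" + String.join(",", collector.names);
    }
}
```

Both recipients see the same events in the same order, so each can be developed and tested as if it were the only consumer.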
The power of the filter design pattern is that the filters are highly reusable, because just like real plumbing, the same standard filters can be plugged together in many different ways.
The ParserFilter class
There are a number of tools around for constructing a pipeline of this form. The simplest is John Cowan's ParserFilter class, available from http://www.ccil.org/~cowan/XML/. This is an abstract class: it does the things that every filter needs to do, and leaves you to define a subclass for each specific filter needed in your own pipeline.
As you might expect, ParserFilter implements both the SAX Parser and DocumentHandler interfaces; in fact, for good measure, it implements the other SAX event-handling interfaces as well (DTDHandler, ErrorHandler, and EntityResolver). All that the event-handling methods in this class do is to pass the event on to the next filter in the pipeline: it's up to your subclass to override any methods that need to do useful work.
The ParserFilter class has a constructor that takes a Parser as its parameter: the effect is to create a piece of the pipeline and connect it to another piece on its left. To construct our three-stage pipeline in the diagram above, we could write:
ParserFilter pipeline = new Filter3(
new Filter2 (
new Filter1 (
new com.jclark.xml.sax.Driver())));
pipeline.setDocumentHandler(outputHandler);
The initial input to the pipeline is of course a SAX Parser and the final output is a SAX DocumentHandler.
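The constructor-chaining idea can be tried out without downloading anything, using a simplified stand-in for the base class. In the sketch below, SketchFilter, DropNotes, UpperCaseTags, and PipelineDemo are hypothetical names; SketchFilter is not Cowan's ParserFilter and forwards only a handful of events. We also assume that the JDK's bundled JAXP parser is acceptable as the SAX 1.0 Parser at the start of the pipe, obtained through SAXParser.getParser().

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.*;

/** Hypothetical stand-in for a filter base class: a Parser to its right, a DocumentHandler to its left. */
class SketchFilter extends HandlerBase implements Parser {
    private final Parser upstream;                          // piece of pipe on the left
    private DocumentHandler downstream = new HandlerBase(); // piece of pipe on the right

    SketchFilter(Parser upstream) { this.upstream = upstream; }

    // Parser side: configuration is passed back up the pipe
    public void setDocumentHandler(DocumentHandler h) { downstream = h; }
    public void setLocale(Locale locale) throws SAXException { upstream.setLocale(locale); }
    public void setEntityResolver(EntityResolver r) { upstream.setEntityResolver(r); }
    public void setDTDHandler(DTDHandler h) { upstream.setDTDHandler(h); }
    public void setErrorHandler(ErrorHandler h) { upstream.setErrorHandler(h); }

    public void parse(InputSource source) throws SAXException, IOException {
        upstream.setDocumentHandler(this); // connect to the piece on the left
        upstream.parse(source);
    }

    public void parse(String systemId) throws SAXException, IOException {
        parse(new InputSource(systemId));
    }

    // DocumentHandler side: by default, events are passed down the pipe unchanged
    @Override public void startDocument() throws SAXException { downstream.startDocument(); }
    @Override public void endDocument() throws SAXException { downstream.endDocument(); }
    @Override public void startElement(String tag, AttributeList atts) throws SAXException {
        downstream.startElement(tag, atts);
    }
    @Override public void endElement(String tag) throws SAXException { downstream.endElement(tag); }
    @Override public void characters(char[] ch, int start, int len) throws SAXException {
        downstream.characters(ch, start, len);
    }
}

/** Removes every note element together with its content. */
class DropNotes extends SketchFilter {
    private int skipDepth = 0; // greater than zero while inside a dropped element

    DropNotes(Parser p) { super(p); }

    @Override public void startElement(String tag, AttributeList atts) throws SAXException {
        if (skipDepth > 0 || tag.equals("note")) { skipDepth++; return; }
        super.startElement(tag, atts);
    }
    @Override public void endElement(String tag) throws SAXException {
        if (skipDepth > 0) { skipDepth--; return; }
        super.endElement(tag);
    }
    @Override public void characters(char[] ch, int start, int len) throws SAXException {
        if (skipDepth == 0) super.characters(ch, start, len);
    }
}

/** Converts element names to upper case. */
class UpperCaseTags extends SketchFilter {
    UpperCaseTags(Parser p) { super(p); }

    @Override public void startElement(String tag, AttributeList atts) throws SAXException {
        super.startElement(tag.toUpperCase(Locale.ROOT), atts);
    }
    @Override public void endElement(String tag) throws SAXException {
        super.endElement(tag.toUpperCase(Locale.ROOT));
    }
}

class PipelineDemo {
    static String run(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            Parser source = SAXParserFactory.newInstance().newSAXParser().getParser();
            // Built from the inside out: source, then DropNotes, then UpperCaseTags
            SketchFilter pipeline = new UpperCaseTags(new DropNotes(source));
            pipeline.setDocumentHandler(new HandlerBase() {
                @Override public void startElement(String tag, AttributeList atts) {
                    out.append('<').append(tag).append('>');
                }
                @Override public void endElement(String tag) {
                    out.append("</").append(tag).append('>');
                }
                @Override public void characters(char[] ch, int start, int len) {
                    out.append(ch, start, len);
                }
            });
            pipeline.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Calling parse() on the outermost filter sets off a chain: each filter registers itself as the DocumentHandler of the piece to its left and then asks that piece to parse, so the note element is dropped first and the surviving tags reach the final handler in upper case.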
An Example ParserFilter: an Indenter
Here is a complete working example of a ParserFilter called Indenter. This filter takes a stream of SAX events, and massages the data by adding whitespace before start and end tags to make the nested structure of the document visible on display. It then passes the massaged data to the next DocumentHandler (which might, of course, be another filter).
The code should be self-explanatory. Note how it relies on the methods in the superclass to actually send the events to the DocumentHandler:
import java.util.*;
import org.xml.sax.*;
import org.ccil.cowan.sax.ParserFilter;

/**
 * Indenter: This ParserFilter indents elements, by adding whitespace where
 * appropriate. The string used for indentation is fixed at four spaces.
 */
public class Indenter extends ParserFilter {

    private final static String indentChars = "    "; // indent by four spaces
    private int level = 0;                            // current indentation level
    private boolean sameline = false;                 // true if no newlines in element
    private StringBuffer buffer = new StringBuffer(); // buffer to hold character data

    /**
     * Constructor: supply the underlying parser
     * used to feed input to this filter
     */
    public Indenter(Parser p) {
        super(p);
    }

    /**
     * Output an element start tag.
     */
    public void startElement(String tag, AttributeList atts)
            throws SAXException
    {
        flush();                        // clear out pending character data
        indent();                       // output whitespace to achieve indentation
        super.startElement(tag, atts);  // output the start tag and attributes
        level++;                        // we're now one level deeper
        sameline = true;                // assume a single line of content
    }

    /**
     * Output element end tag
     */
    public void endElement(String tag) throws SAXException
    {
        flush();                        // clear out pending character data
        level--;                        // we've come out by one level
        if (!sameline) indent();        // output indentation if a new line was found
        super.endElement(tag);          // output the end tag
        sameline = false;               // next tag will be on a new line
    }

    /**
     * Output a processing instruction
     */
    public void processingInstruction(String target, String data)
            throws SAXException
    {
        flush();                        // clear out pending character data
        indent();                       // output whitespace for indentation
        super.processingInstruction(target, data); // output the processing instruction
    }

    /**
     * Output character data
     */
    public void characters(char[] chars, int start, int len)
            throws SAXException
    {
        buffer.append(chars, start, len); // add the character data to a buffer for now
    }

    /**
     * Output ignorable white space
     */
    public void ignorableWhitespace(char[] ch, int start, int len)
            throws SAXException
    {
        // ignore it
    }

    /**
     * Output white space to reflect the current indentation level
     */
    private void indent() throws SAXException
    {
        // construct an array holding a newline character
        // and the correct number of spaces
        int len = indentChars.length();
        char[] array = new char[level*len + 1];
        array[0] = '\n';
        for (int i=0; i<level; i++)
        {
            indentChars.getChars(0, len, array, len*i + 1);
        }
        // output this array as character data
        super.characters(array, 0, level*len+1);
    }

    /**
     * Flush the buffer containing accumulated character data.
     * White space adjacent to markup is trimmed.
     */
    public void flush() throws SAXException
    {
        // copy the buffer into a character array
        int end = buffer.length();
        if (end==0) return;
        char[] array = new char[end];
        buffer.getChars(0, end, array, 0);
        // trim whitespace from the start and end
        int start=0;
        while (start<end && Character.isWhitespace(array[start])) start++;
        while (start<end && Character.isWhitespace(array[end-1])) end--;
        // test to see if there is a newline in the buffer
        for (int i=start; i<end; i++)
        {
            if (array[i]=='\n') {
                sameline = false;
                break;
            }
        }
        // output the remaining character data
        super.characters(array, start, end-start);
        // clear the contents of the buffer
        buffer.setLength(0);
    }
}
To actually run this example, we need a DocumentHandler that outputs the XML. Let's suppose this exists and is called XMLOutputter (we'll show how XMLOutputter is written in the next section). We can then write a main program as follows:
public static void main(String[] args) throws Exception
{
    Indenter app = new Indenter(ParserManager.makeParser());
    app.setDocumentHandler(new XMLOutputter());
    app.parse(args[0]);
}
And you will also have to add an import statement for the ParserManager class at the top of the file:
import java.util.*;
import org.xml.sax.*;
import com.icl.saxon.ParserManager;
import org.ccil.cowan.sax.ParserFilter;