Thursday, June 2, 2011

XMLParsers : SAX & DOM Parser

XML Parsing can be broken down into two logical components: a parser and a scanner.

The scanner reads the text and classifies it as "tokens". A token is a catagory that is recognized by the parser.

e.g. a scanner for the Java programming language might return the tokens that include: identifier, integer, for (a reserved word).

Before we begin, I wanted to make sure everyone is aware of the most important difference between XML parsers: whether the parser is a SAX or a DOM parser.


SAX vs. DOM

A SAX parser is one where your code is notified as the parser walks through the XML tree, and you are responsible for keeping track of state and constructing any objects you might want to keep track of the data as the parser marches through.

A DOM parser reads the entire document and builds up an in-memory representation that you can query for different elements. Often, you can even construct XPath queries to pull out particular pieces.

An important point, relative to SAX, is that the parser calls the scanner. As the parser processes the tokens returned by the scanner it performs operations, like building a syntax tree.

In the case of SAX, the scanner (the SAXParser object) calls the parser. This makes parsing with SAX needlessly awkward and complicates the architecture of the software. For this reason, the DOMParser is frequently used for parsing complicated XML documents.

The SAXParser does have two notable advantages over the DOMParser: the SAXParser is faster and it uses less memory. While the SAXParser is difficult to use for processing complex XML documents, perhaps it is appropriate for processing simple XML documents.....

in iPhone : NSXMLParser is a SAX parser included by default with the iPhone SDK. It’s written in Objective-C and is quite straightforward to use, but perhaps not quite as easy as the DOM model.

No comments:

Post a Comment