
STML parsing

HelpSetMaker parses each document of a project on its own. For this purpose, it uses a standard combination of lexical scanner and parser to create a hierarchy of objects representing the document. Scanning and parsing are done in one pass, and the grammar should be context-free, if I applied my old computer science course knowledge correctly...

Scanning

Scanning is done by the “helpsetmaker.util.LexicalScanner” class. It reads all content from a stream and creates a token stream. Tokens are simply retrieved with the getNextToken() method.

Tokens consist of a symbol and one or more values. The symbols are held in a QJCC symbol table, a class generated automatically from a flat symbol file. The class is helpsetmaker.util.LexicalSymbols; the .symbol file in the same directory contains the original symbol table, and QJCC provides the symboltojava program, which creates the .java file for the class.
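
To give an idea of what such a generated class looks like, here is a hypothetical sketch. The actual constant names and values are defined by the .symbol file, so the real LexicalSymbols will differ; even the exact shape shown here (a plain class of int constants) is an assumption for the sketch, not taken from the generated file.

    // Hypothetical shape of the generated symbol table class. The real
    // helpsetmaker.util.LexicalSymbols is created by QJCC's symboltojava
    // from the .symbol file, so the actual constant names and values differ.
    package helpsetmaker.util;

    public final class LexicalSymbols {

        public static final int TEXT          = 1;  // plain character data
        public static final int COLON_COMMAND = 2;  // e.g. ":title", ":h1"
        public static final int LINK_OPEN     = 3;
        public static final int LINK_CLOSE    = 4;
        public static final int NEWLINE       = 5;

        private LexicalSymbols() {
            // constants only, never instantiated
        }
    }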

Parsing

The parser is implemented in the helpsetmaker.util.STMLParser class. As the scanner has already fiddled around with the gory details of creating a nice token stream, the parser can work rather straightforwardly: it has a main loop where it fetches the tokens from the scanner and constructs the object tree step by step.
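
Schematically, this main loop can be illustrated with a small self-contained sketch. The token symbols, node classes, and the toy input below are made up for the illustration; only the overall shape (get a token, decide by its symbol, grow the object tree) mirrors what STMLParser does.

    // Self-contained sketch of a "main loop" parser in the style described above:
    // pull tokens from a scanner-like source and grow an object tree. The token
    // symbols, node classes and toy input are hypothetical, not STMLParser code.
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    public class MainLoopSketch {

        // Toy token: a symbol plus one value, in the spirit of the scanner's tokens.
        record Token(String symbol, String value) {}

        // Toy document node: a name plus text content and child nodes.
        static final class Node {
            final String name;
            final StringBuilder text = new StringBuilder();
            final List<Node> children = new ArrayList<>();
            Node(String name) { this.name = name; }
            public String toString() { return name + "[" + text + children + "]"; }
        }

        public static void main(String[] args) {
            // Stand-in for LexicalScanner.getNextToken(): a fixed token sequence.
            Iterator<Token> scanner = Arrays.asList(
                    new Token("HEADLINE", "Introduction"),
                    new Token("TEXT", "Some body text."),
                    new Token("HEADLINE", "Usage"),
                    new Token("TEXT", "More body text.")).iterator();

            Node document = new Node("document");
            Node current = document;

            // The main loop: get a token, decide by its symbol, grow the tree.
            while (scanner.hasNext()) {
                Token token = scanner.next();
                switch (token.symbol()) {
                    case "HEADLINE" -> {
                        Node section = new Node("section:" + token.value());
                        document.children.add(section);
                        current = section;            // subsequent text goes here
                    }
                    case "TEXT" -> current.text.append(token.value());
                    default -> throw new IllegalStateException("unexpected " + token);
                }
            }
            System.out.println(document);
        }
    }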

There is one special case, however. For certain elements of STML, there was originally a flat content model. This applies to all one-line-command parameters, e.g. the text after “:title” or “:hX”, and to link texts.

Usage practice showed, however, that this flat model restricts what can be expressed. Therefore, the language was extended so that the parameters of those one-line commands can themselves contain STML markup. This allows, for example, partially italic link texts.

So far, this extension has not been implemented by changing the scanner (which would certainly be possible...). Instead, when the parser encounters such a chunk of still-to-be-parsed content, it pushes a new LexicalScanner instance onto its internal scanner stack and feeds it the line to be parsed. You can see an example of this technique in handleHeadline() and subScan().
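
Independent of the concrete HelpSetMaker classes, the pattern itself can be shown in a small self-contained sketch: the parser keeps a stack of scanners, and whenever a one-line-command parameter needs its own markup pass, a fresh scanner for just that parameter is pushed, drained, and popped again. The class, the toy '*' markup, and the method bodies below are made up; only the stack idea mirrors handleHeadline() and subScan().

    // Self-contained illustration of the "scanner stack" idea: a one-line-command
    // parameter is handed to a freshly pushed scanner of its own, so the parameter
    // text itself can carry markup. The toy tokenizer splits on '*' to mark italic
    // ranges; it is NOT the real STML syntax, just a stand-in for the sketch.
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.StringTokenizer;

    public class ScannerStackSketch {

        /** Toy scanner: yields the chunks of one line, alternating plain/italic. */
        static final class ToyScanner {
            private final StringTokenizer chunks;
            ToyScanner(String line) { chunks = new StringTokenizer(line, "*", true); }
            String getNextToken() { return chunks.hasMoreTokens() ? chunks.nextToken() : null; }
        }

        private final Deque<ToyScanner> scannerStack = new ArrayDeque<>();

        /** Mimics the idea of subScan(): push a scanner for this chunk and drain it. */
        void subScan(String chunk) {
            scannerStack.push(new ToyScanner(chunk));
            String token;
            boolean italic = false;
            while ((token = scannerStack.peek().getNextToken()) != null) {
                if (token.equals("*")) {
                    italic = !italic;                   // toggle markup state
                } else {
                    System.out.println((italic ? "italic: " : "plain:  ") + token);
                }
            }
            scannerStack.pop();                         // back to the outer scanner
        }

        /** Mimics handleHeadline(): the headline text gets its own scanning pass. */
        void handleHeadline(String headlineParameter) {
            System.out.println("headline begins");
            subScan(headlineParameter);
            System.out.println("headline ends");
        }

        public static void main(String[] args) {
            new ScannerStackSketch().handleHeadline("A *partially italic* headline");
        }
    }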

The path for changing this has been laid out with the macro feature of version 1.1. Macro calls have the same syntax as picture inclusion commands or links. Opening, closing, parameter separation, and parameter content, however, are scanned as separate entities and passed one by one as distinct tokens to the STML parser. Implementing a similar scheme for colon-command invocations, and using it for links etc., will finally solve this problem.
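
Purely schematically, and with made-up symbol names (the real ones come from the symbol table discussed above), a macro call with two parameters could thus arrive at the parser as a sequence like this:

    MACRO_OPEN
    MACRO_PARAM_CONTENT    "first value"     (the content may in turn be scanned into further markup tokens)
    MACRO_PARAM_SEPARATOR
    MACRO_PARAM_CONTENT    "second value"
    MACRO_CLOSE

Each structural part of the call is a token of its own, which is what makes nested markup inside the parameters straightforward to handle.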
