My experiences with ANTLR
Good day,
I wanted to summarize my first experiences with ANTLR a little. I recently finished my first small project, which generates stub code for languages like Java or ActionScript. It was only a small project, but it had some interesting bits in it that made it a good exercise. By the way, the whole source code is available there, including the grammar. Everything is under the BSD license, so feel free to have a look at it.
First of all, my grammar uses indentation to build a hierarchy, a little like Python does. So to build nested structures you do not use brackets or braces; instead, the parser has to work them out from the indentation level. This doesn’t sound like a big deal, but it is tricky, because there are no dedicated symbols marking the beginning and end of a nested block. So you have to count the tabs on each line, compare that with the count on the line before, decide whether the indentation level has increased or decreased, and build your nested structures from that information.
ANTLR then allows you to emit virtual tokens such as an “INDENT” or a “DEDENT” token (a token stands for a ‘kind’ of character or character sequence). You emit these yourself once you have recognized that the indentation level has changed.
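To make the tab counting a bit more concrete, here is a minimal sketch in plain Java. It is not the code from my grammar and it does not use any ANTLR classes; the class name IndentTracker and its methods are made up for illustration. It assumes that indentation consists of tabs only, and it represents tokens as simple strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of ANTLR: turns leading tabs into INDENT/DEDENT markers.
class IndentTracker {
    // Stack of the indentation levels that are currently open; level 0 is always open.
    private final Deque<Integer> levels = new ArrayDeque<>();

    IndentTracker() {
        levels.push(0);
    }

    // Counts the leading tab characters of one line.
    static int countTabs(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == '\t') {
            n++;
        }
        return n;
    }

    // Returns the virtual tokens that have to be emitted before the content of this line.
    List<String> tokensFor(String line) {
        List<String> out = new ArrayList<>();
        int tabs = countTabs(line);
        if (tabs > levels.peek()) {
            // deeper than before: open one new block
            levels.push(tabs);
            out.add("INDENT");
        } else {
            // shallower than before: close every block that is deeper than this line
            while (tabs < levels.peek()) {
                levels.pop();
                out.add("DEDENT");
            }
            if (tabs != levels.peek()) {
                throw new IllegalArgumentException("inconsistent indentation: " + line.trim());
            }
        }
        return out;
    }

    // Tokenizes a whole input: virtual tokens plus the trimmed content of each line.
    static List<String> tokenize(String input) {
        IndentTracker tracker = new IndentTracker();
        List<String> out = new ArrayList<>();
        for (String line : input.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // blank lines do not change the nesting
            }
            out.addAll(tracker.tokensFor(line));
            out.add(line.trim());
        }
        out.addAll(tracker.tokensFor("")); // close blocks that are still open at the end
        return out;
    }

    public static void main(String[] args) {
        String input = "class A\n\tmethod foo\n\t\targ x\n\tmethod bar\nclass B";
        // prints [class A, INDENT, method foo, INDENT, arg x, DEDENT, method bar, DEDENT, class B]
        System.out.println(tokenize(input));
    }
}
```

The stack is the important part: a single line can close several blocks at once, so one DEDENT has to be emitted per closed level. In an ANTLR lexer, logic of this kind would sit in embedded actions rather than in a separate class.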
Trying to solve these issues, I have put a lot of code into my grammar. This doesn’t feel right, though. Ideally, I think, a grammar should contain no concrete code at all, so that it stays completely portable to whatever target language you want (target language here means the language in which you develop your parser, not the language your grammar describes). But my grammar now contains a lot of Java code, and I have not yet found a way around that.
The Extended Backus-Naur Form (a notation for defining grammars) has limits on what you can define. Especially when it comes to making decisions based on information that came much earlier in the stream of characters, or that only comes far ahead, you can’t solve this without writing additional code (as far as I know at the moment, that is).
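For contrast, here is a small sketch of the part that does fit into EBNF. It assumes a token list in which INDENT and DEDENT markers are already present, so the nesting can be written as two plain rules without any memory of earlier lines. The classes Node and NestingParser are hypothetical names of mine, and the parser is written by hand in Java rather than generated by ANTLR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node; name and shape are assumptions for illustration.
class Node {
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String text) {
        this.text = text;
    }

    @Override
    public String toString() {
        return children.isEmpty() ? text : text + children;
    }
}

// Hand-written recursive descent parser for the two rules
//   nodes : node* ;
//   node  : TEXT (INDENT nodes DEDENT)? ;
class NestingParser {
    private final List<String> tokens;
    private int pos = 0;

    NestingParser(List<String> tokens) {
        this.tokens = tokens;
    }

    private boolean at(String kind) {
        return pos < tokens.size() && tokens.get(pos).equals(kind);
    }

    // nodes : node* ;  stops at a DEDENT or at the end of the input
    List<Node> nodes() {
        List<Node> result = new ArrayList<>();
        while (pos < tokens.size() && !at("DEDENT")) {
            result.add(node());
        }
        return result;
    }

    // node : TEXT (INDENT nodes DEDENT)? ;
    private Node node() {
        if (at("INDENT")) {
            throw new IllegalStateException("unexpected INDENT at token " + pos);
        }
        Node n = new Node(tokens.get(pos++));
        if (at("INDENT")) {
            pos++; // consume INDENT
            n.children.addAll(nodes());
            if (!at("DEDENT")) {
                throw new IllegalStateException("missing DEDENT at token " + pos);
            }
            pos++; // consume DEDENT
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "class A", "INDENT", "method foo", "INDENT", "arg x", "DEDENT",
                "method bar", "DEDENT", "class B");
        // prints [class A[method foo[arg x], method bar], class B]
        System.out.println(new NestingParser(tokens).nodes());
    }
}
```

The parser never looks at tab counts. Everything that depends on what came earlier in the stream stays in the step that produces INDENT and DEDENT, and that step is exactly where the additional code ends up.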
But besides that, I’m really fascinated by this whole field. I guess after this one I can try something more complex. I have already searched the net for an ActionScript 3 grammar, but with no real hit yet. I thought I’d find the grammar in the open-source package of the Flex SDK, but I haven’t found it there so far. Adobe has put Flex Builder on the list of ANTLR showcase projects, so I am curious where Adobe uses ANTLR there. Of course it would be great to be able to use Adobe’s original ActionScript 3 grammar to build individual tools around ActionScript. If anyone has more information on this, please drop a comment.
