
Inside Groovy

Groovy is an object-oriented programming language for the Java platform. It is a dynamic language with features similar to those of Python, Ruby, Perl, and Smalltalk. It can be used as a scripting language for the Java Platform, is dynamically compiled to Java Virtual Machine (JVM) bytecode, and interoperates with other Java code and libraries.

Let’s go inside Groovy to discover how it works internally; for that we use JArchitect.
Groovy comes with many libraries, such as groovy-sql and groovy-json. Here’s the dependency structure matrix of all the Groovy jars.


The DSM (Dependency Structure Matrix) is a compact way to represent and navigate dependencies between components.


In this post we focus only on the Groovy compiler and follow the compilation phases from source code to bytecode.

Step1: Generate ANTLR AST

ANTLR is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.

From a grammar file, ANTLR generates the classes needed to build the AST from source files. For Groovy, two classes are generated: GroovyLexer and GroovyRecognizer.

Here’s the dependency graph of the org.codehaus.groovy.antlr.parser package.


GroovyLexer inherits from CharScanner, and its role is to walk through the tokens of a source file by invoking the nextToken method.

Let’s see what happens when nextToken is invoked.

from m in Methods where m.IsUsedBy ("org.codehaus.groovy.antlr.parser.GroovyLexer.nextToken()")
select new { m, m.NbBCInstructions }


Depending on the token found, nextToken delegates the processing to the corresponding method. To be more concrete, take the case where a comma is found:


For each token found, a GroovySourceToken is created, and information such as line and column is assigned to this instance. So GroovyLexer acts as a scanner that iterates through the tokens.
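To make the scanner idea concrete, here is a minimal sketch of a hand-written lexer in Java. It is not the generated GroovyLexer: MiniLexer, MiniToken and the token type names are hypothetical and only illustrate the pattern described above, where nextToken dispatches on the current character and each token records its line and column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type; stands in for the idea of a token carrying its position.
final class MiniToken {
    final String type;
    final String text;
    final int line;
    final int column;

    MiniToken(String type, String text, int line, int column) {
        this.type = type;
        this.text = text;
        this.line = line;
        this.column = column;
    }
}

// Hypothetical miniature scanner; not Groovy's or ANTLR's implementation.
final class MiniLexer {
    private final String source;
    private int pos = 0;
    private int line = 1;
    private int column = 1;

    MiniLexer(String source) {
        this.source = source;
    }

    // Returns the next token, or a token of type "EOF" at the end of the input.
    MiniToken nextToken() {
        while (pos < source.length() && Character.isWhitespace(source.charAt(pos))) {
            advance();
        }
        if (pos >= source.length()) {
            return new MiniToken("EOF", "", line, column);
        }
        char c = source.charAt(pos);
        // Dispatch on the first character: one helper per kind of token.
        if (c == ',') return single("COMMA");
        if (c == '(') return single("LPAREN");
        if (c == ')') return single("RPAREN");
        if (Character.isLetter(c)) return run("IDENT", false);
        if (Character.isDigit(c)) return run("NUMBER", true);
        throw new IllegalStateException(
            "Unexpected character '" + c + "' at " + line + ":" + column);
    }

    private MiniToken single(String type) {
        MiniToken token = new MiniToken(type, String.valueOf(source.charAt(pos)), line, column);
        advance();
        return token;
    }

    // Consumes a run of digits (digitsOnly) or of letters and digits.
    private MiniToken run(String type, boolean digitsOnly) {
        int start = pos;
        int startLine = line;
        int startColumn = column;
        while (pos < source.length()
                && (digitsOnly ? Character.isDigit(source.charAt(pos))
                               : Character.isLetterOrDigit(source.charAt(pos)))) {
            advance();
        }
        return new MiniToken(type, source.substring(start, pos), startLine, startColumn);
    }

    // Moves one character forward and keeps line/column up to date.
    private void advance() {
        if (source.charAt(pos) == '\n') {
            line++;
            column = 1;
        } else {
            column++;
        }
        pos++;
    }

    // Convenience loop: call nextToken until EOF, as a parser would.
    static List<MiniToken> tokenize(String source) {
        MiniLexer lexer = new MiniLexer(source);
        List<MiniToken> tokens = new ArrayList<>();
        MiniToken token;
        do {
            token = lexer.nextToken();
            tokens.add(token);
        } while (!token.type.equals("EOF"));
        return tokens;
    }
}
```

Calling MiniLexer.tokenize on a small input returns the tokens in order, each with the position where it started.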

To generate the AST we need a parser that checks the syntax and builds the tree; this is the role of the other generated class, GroovyRecognizer.

GroovyRecognizer inherits from ANTLR’s LLkParser; for more detail about this kind of parser, please refer to this article.

The compilationUnit method is used to generate the ANTLR AST. Let’s search for all methods used by compilationUnit, directly or indirectly:

from m in Methods
let depth0 = m.DepthOfIsUsedBy("org.codehaus.groovy.antlr.parser.GroovyRecognizer.compilationUnit()")
where depth0 >= 0 orderby depth0
select new { m, depth0 }


As we can observe, this method mainly uses the ANTLR AST classes to generate the AST nodes.
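As an illustration of what a top-down parser of this family does, here is a small recursive-descent sketch in Java. It is not the generated GroovyRecognizer: MiniParser, MiniAst, the tiny grammar (identifiers, numbers and nested calls) and the tree output format are all assumptions made for the example. The point is only that each grammar rule becomes a method that checks the syntax and returns an AST node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical generic AST node: a type, a text, and child nodes.
final class MiniAst {
    final String type;
    final String text;
    final List<MiniAst> children = new ArrayList<>();

    MiniAst(String type, String text) {
        this.type = type;
        this.text = text;
    }

    // Renders the tree in a Lisp-like form, e.g. (foo a 42).
    String toTree() {
        if (!type.equals("CALL")) {
            return text;
        }
        StringBuilder sb = new StringBuilder("(").append(text);
        for (MiniAst child : children) {
            sb.append(' ').append(child.toTree());
        }
        return sb.append(')').toString();
    }
}

// Hypothetical grammar:
//   compilationUnit := expression EOF
//   expression      := NUMBER | IDENT | IDENT '(' [expression (',' expression)*] ')'
final class MiniParser {
    private static final Pattern TOKEN = Pattern.compile("\\s*([A-Za-z_]\\w*|\\d+|[(),])");
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    MiniParser(String source) {
        Matcher m = TOKEN.matcher(source);
        int end = 0;
        // Tokens must follow each other without unmatched characters in between.
        while (end < source.length() && m.find(end) && m.start() == end) {
            tokens.add(m.group(1));
            end = m.end();
        }
        if (!source.substring(end).trim().isEmpty()) {
            throw new IllegalStateException("Cannot tokenize input at offset " + end);
        }
    }

    MiniAst compilationUnit() {
        MiniAst root = expression();
        if (pos != tokens.size()) {
            throw new IllegalStateException("Unexpected token: " + tokens.get(pos));
        }
        return root;
    }

    private MiniAst expression() {
        String token = next();
        char first = token.charAt(0);
        if (Character.isDigit(first)) {
            return new MiniAst("NUMBER", token);
        }
        if (!Character.isLetter(first) && first != '_') {
            throw new IllegalStateException("Unexpected token: " + token);
        }
        if (!"(".equals(peek())) {
            return new MiniAst("IDENT", token);
        }
        // One token of lookahead told us this is a call: parse its argument list.
        MiniAst call = new MiniAst("CALL", token);
        expect("(");
        if (!")".equals(peek())) {
            call.children.add(expression());
            while (",".equals(peek())) {
                next();
                call.children.add(expression());
            }
        }
        expect(")");
        return call;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("Unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String actual = next();
        if (!expected.equals(actual)) {
            throw new IllegalStateException("Expected " + expected + " but found " + actual);
        }
    }
}
```

For example, new MiniParser("foo(a, bar(1, 2), 42)").compilationUnit().toTree() yields "(foo a (bar 1 2) 42)", while an incomplete input such as "foo(a," is rejected with an exception.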

Here’s the dependency graph representing the collaboration between GroovyLexer and GroovyRecognizer when the main method is invoked:


After discovering the role of each class, let’s take a look at their design:


Both GroovyLexer and GroovyRecognizer have many methods and fields: GroovyLexer has 140 methods and 35 fields, and GroovyRecognizer has 310 methods and 121 fields.

In general, classes like these are prone to low cohesion, because it is hard to maintain high cohesion when the number of methods and fields is high.

The single responsibility principle states that a class should have one, and only one, reason to change. Such a class is said to be cohesive. A high LCOM (Lack of Cohesion Of Methods) value generally pinpoints a poorly cohesive class. There are several LCOM metrics: LCOM takes its values in the range [0-1], and LCOMHS (HS stands for Henderson-Sellers) takes its values in the range [0-2]. LCOMHS is often considered more effective at detecting non-cohesive types.

An LCOMHS value higher than 1 should be considered alarming.

The LCOMHS of GroovyLexer is 0.96814, and that of GroovyRecognizer is 0.98932.

So even though they have many methods and fields, their LCOMHS stays below the alarming threshold of 1.
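As a rough sketch of how such values can be computed, the following Java code assumes the definitions commonly documented for this family of tools: LCOM = 1 - sum(MF) / (M * F) and LCOMHS = (M - sum(MF) / F) / (M - 1), where M is the number of methods, F the number of instance fields, and sum(MF) the total number of (method, field) accesses. The class name CohesionMetrics and the input shape (a map from method name to the set of fields it accesses) are hypothetical, and the exact counting rules JArchitect applies may differ.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper; formulas are assumed from the usual LCOM / LCOM HS definitions.
final class CohesionMetrics {

    // Total number of (method, field) accesses, i.e. sum(MF).
    private static int sumMF(Map<String, Set<String>> fieldsAccessedByMethod) {
        int sum = 0;
        for (Set<String> fields : fieldsAccessedByMethod.values()) {
            sum += fields.size();
        }
        return sum;
    }

    // LCOM = 1 - sum(MF) / (M * F); 0 means every method uses every field.
    static double lcom(Map<String, Set<String>> fieldsAccessedByMethod, int fieldCount) {
        int methodCount = fieldsAccessedByMethod.size();
        if (methodCount == 0 || fieldCount == 0) {
            return 0.0; // metric is not meaningful without methods or fields
        }
        return 1.0 - (double) sumMF(fieldsAccessedByMethod) / (methodCount * fieldCount);
    }

    // LCOMHS = (M - sum(MF) / F) / (M - 1); ranges up to M / (M - 1), so at most 2.
    static double lcomHS(Map<String, Set<String>> fieldsAccessedByMethod, int fieldCount) {
        int methodCount = fieldsAccessedByMethod.size();
        if (methodCount < 2 || fieldCount == 0) {
            return 0.0; // avoid dividing by zero for degenerate classes
        }
        double averageUsage = (double) sumMF(fieldsAccessedByMethod) / fieldCount;
        return (methodCount - averageUsage) / (methodCount - 1);
    }
}
```

Under these assumed formulas, LCOMHS equals LCOM multiplied by M / (M - 1), which is why the two metrics are very close for classes with hundreds of methods.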


Low coupling is desirable because a change in one area of an application will require fewer changes throughout the rest of the application. In the long run, this can save a lot of the time, effort, and cost associated with modifying an application and adding new features to it.

Using interfaces and abstract classes helps keep coupling low.

In the case of GroovyLexer, here are all the classes it uses:


GroovyLexer uses only a few interfaces and abstract classes and is highly coupled with ANTLR classes; the same remark applies to GroovyRecognizer. This is not a problem for these classes because they are generated by ANTLR. For non-generated classes, however, it is better to avoid high coupling with other classes.

Step2: Generate Groovy AST

In the first step an ANTLR AST is generated, but Groovy uses its own AST nodes, so the next step is to convert it to a Groovy AST.

AntlrParserPlugin is responsible for this conversion. Its convertGroovy method does the job, and here’s its dependency graph:


This method iterates through the ANTLR AST nodes and, for each kind of node found, delegates the processing to the corresponding method. For example, here’s what happens in the case of a statement node:


There are many possible kinds of statements (try, continue, if, while, …), and this method delegates the processing to the corresponding methods, as before. This makes the code very easy to understand and isolates each responsibility in a specific method.
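The delegation described above can be sketched as follows. This is not the AntlrParserPlugin code: RawNode, the node type names (BLOCK, WHILE, RETURN) and the typed statement classes are hypothetical. The sketch only shows the shape of the conversion, where one dispatching method switches on the node type and hands each case to a dedicated method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical generic node, as a parser generator might produce it.
final class RawNode {
    final String type;
    final String text;
    final List<RawNode> children;

    RawNode(String type, String text, RawNode... children) {
        this.type = type;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// Hypothetical typed statements of the target AST.
interface Stmt {
    String describe();
}

final class ReturnStmt implements Stmt {
    final String value;
    ReturnStmt(String value) { this.value = value; }
    public String describe() { return "return " + value; }
}

final class WhileStmt implements Stmt {
    final String condition;
    final Stmt body;
    WhileStmt(String condition, Stmt body) { this.condition = condition; this.body = body; }
    public String describe() { return "while (" + condition + ") " + body.describe(); }
}

final class BlockStmt implements Stmt {
    final List<Stmt> statements;
    BlockStmt(List<Stmt> statements) { this.statements = statements; }
    public String describe() {
        return statements.stream().map(Stmt::describe).collect(Collectors.joining("; ", "{ ", " }"));
    }
}

final class TreeConverter {
    // Dispatching method: one case per kind of statement node.
    Stmt statement(RawNode node) {
        switch (node.type) {
            case "BLOCK":  return block(node);
            case "WHILE":  return whileStatement(node);
            case "RETURN": return returnStatement(node);
            default:
                throw new IllegalArgumentException("Unknown statement type: " + node.type);
        }
    }

    private Stmt block(RawNode node) {
        List<Stmt> statements = new ArrayList<>();
        for (RawNode child : node.children) {
            statements.add(statement(child));
        }
        return new BlockStmt(statements);
    }

    // Assumes children are [condition expression, body statement].
    private Stmt whileStatement(RawNode node) {
        return new WhileStmt(node.children.get(0).text, statement(node.children.get(1)));
    }

    // Assumes a single child holding the returned expression.
    private Stmt returnStatement(RawNode node) {
        return new ReturnStmt(node.children.get(0).text);
    }
}
```

Adding support for a new kind of statement then means adding one case and one method, without touching the others.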

Using ANTLR is a good choice, but it is better to isolate the use of this library to avoid high coupling with it. That gives the flexibility to move to a new version of ANTLR, or even to another parser generator, without impacting the whole code base.

Let’s search for the classes using ANTLR:

from t in Types where t.IsUsing ("antlr-2.7.7") select t


Only a few types use ANTLR directly, which is very good if another parser generator is used in the future.
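One common way to get this kind of isolation is to hide the library behind a small interface and keep a single adapter that knows about it. The sketch below is an assumption about how such a boundary could look, not Groovy's actual design: VendorParser stands in for a third-party parser, and SourceParser, VendorParserAdapter and CompilerFrontEnd are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for a third-party or generated parser (hypothetical).
final class VendorParser {
    String[] run(String text) {
        return text.trim().split("\\s+");
    }
}

// The only parsing abstraction the rest of the code base is allowed to see.
interface SourceParser {
    List<String> parse(String source);
}

// The single class that depends on the vendor library.
final class VendorParserAdapter implements SourceParser {
    private final VendorParser delegate = new VendorParser();

    @Override
    public List<String> parse(String source) {
        return Arrays.asList(delegate.run(source));
    }
}

// Depends on the interface only, so the parser can be swapped without changes here.
final class CompilerFrontEnd {
    private final SourceParser parser;

    CompilerFrontEnd(SourceParser parser) {
        this.parser = parser;
    }

    int elementCount(String source) {
        return parser.parse(source).size();
    }
}
```

Replacing the parser then means writing another SourceParser implementation; CompilerFrontEnd and everything built on it stay unchanged.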

Step3: Generate bytecode
To generate bytecode, Groovy walks through the AST and creates bytecode. The popular technique, used by almost all compilers, is the visitor pattern.

Motivation of using the visitor pattern

We can apply many algorithms and kinds of processing to the AST nodes, such as:

– Print the AST.
– Save it to an XML file.
– Generate bytecode.
– Save it to an HTML file.

And the visitor design pattern is a way of separating an algorithm from an object structure on which it operates. A practical result of this separation is the ability to add new operations to existing object structures without modifying those structures.

The idea is to implement an interface that contains many visitXXX methods, for example GroovyCodeVisitor.
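Here is a minimal, self-contained sketch of the pattern in Java. The node and visitor names are hypothetical, and the visit methods return a value through a generic parameter, which is a variation chosen to keep the example short rather than a copy of GroovyCodeVisitor. Two operations, printing and emitting pseudo-instructions for a stack machine, are added without modifying the node classes.

```java
import java.util.ArrayList;
import java.util.List;

// One visitXXX method per kind of node (hypothetical interface).
interface NodeVisitor<R> {
    R visitNumber(NumberNode node);
    R visitAdd(AddNode node);
}

interface Node {
    <R> R accept(NodeVisitor<R> visitor);
}

final class NumberNode implements Node {
    final int value;
    NumberNode(int value) { this.value = value; }
    public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitNumber(this); }
}

final class AddNode implements Node {
    final Node left;
    final Node right;
    AddNode(Node left, Node right) { this.left = left; this.right = right; }
    public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitAdd(this); }
}

// First operation: print the tree as an expression.
final class PrintVisitor implements NodeVisitor<String> {
    public String visitNumber(NumberNode node) {
        return Integer.toString(node.value);
    }
    public String visitAdd(AddNode node) {
        return "(" + node.left.accept(this) + " + " + node.right.accept(this) + ")";
    }
}

// Second operation: emit pseudo-instructions for a stack machine.
final class StackCodeVisitor implements NodeVisitor<List<String>> {
    public List<String> visitNumber(NumberNode node) {
        List<String> code = new ArrayList<>();
        code.add("PUSH " + node.value);
        return code;
    }
    public List<String> visitAdd(AddNode node) {
        // Operands first, then the operator: a post-order walk of the tree.
        List<String> code = new ArrayList<>(node.left.accept(this));
        code.addAll(node.right.accept(this));
        code.add("ADD");
        return code;
    }
}
```

A third operation, such as exporting the tree to XML, would be one more NodeVisitor implementation; NumberNode and AddNode would not change.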

In the case of Groovy, the AsmClassGenerator class is responsible for generating bytecode.

Let’s search for all its base classes:

from t in Types where t.FullName=="org.codehaus.groovy.classgen.AsmClassGenerator"
select new { t, t.BaseClasses}


ClassCodeVisitorSupport implements the GroovyClassVisitor interface, and CodeVisitorSupport implements the GroovyCodeVisitor interface.

Extend Groovy capabilities:

Although at times, it may sound like a good idea to extend the syntax of Groovy to implement new features, most of the time, we can’t just add a new keyword to the grammar, or create some new syntax construct to represent a new concept. However, with the idea of AST (Abstract Syntax Tree) Transformations, we are able to tackle new and innovative ideas without necessary grammar changes.

When the Groovy compiler compiles Groovy scripts and classes, at some point in the process, the source code will end up being represented in memory in the form of a Concrete Syntax Tree, then transformed into an Abstract Syntax Tree. The purpose of AST Transformations is to let developers hook into the compilation process to be able to modify the AST before it is turned into bytecode that will be run by the JVM.

AST Transformations provide Groovy with improved compile-time metaprogramming capabilities, allowing powerful flexibility at the language level without a runtime performance penalty.

One hook for accessing this capability is annotations. In your Groovy code you can use one or more annotations to mark a class for receiving an AST transformation during compilation.

Let’s search for all standard Groovy transformations, which have the GroovyASTTransformation annotation:

from t in Types where t.HasAnnotation("org.codehaus.groovy.transform.GroovyASTTransformation")
select t


Users can also create their own transformations to extend Groovy’s capabilities; this feature makes Groovy very flexible and powerful.
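To illustrate the idea of hooking into compilation, here is a toy model in Java. It does not use the Groovy API: ClassModel, Transformation, AddToString and MiniCompilation are hypothetical names, and the "compiler" only produces a string. What it shows is the mechanism described above, where a transformation registered for an annotation modifies the in-memory model of a class before code is generated from it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of a class, standing in for an AST node (hypothetical).
final class ClassModel {
    final String name;
    final List<String> annotations = new ArrayList<>();
    final List<String> methods = new ArrayList<>();

    ClassModel(String name) {
        this.name = name;
    }
}

// A compile-time hook: reacts to one annotation and may modify the model.
interface Transformation {
    String annotation();
    void visit(ClassModel target);
}

// Example transformation: adds a toString method if the class has none.
final class AddToString implements Transformation {
    public String annotation() {
        return "AutoToString";
    }
    public void visit(ClassModel target) {
        if (!target.methods.contains("toString")) {
            target.methods.add("toString");
        }
    }
}

// Toy compilation: run the registered transformations, then "generate code".
final class MiniCompilation {
    private final List<Transformation> transformations = new ArrayList<>();

    void register(Transformation transformation) {
        transformations.add(transformation);
    }

    String compile(ClassModel model) {
        // Transformation phase: only classes carrying the annotation are touched.
        for (Transformation t : transformations) {
            if (model.annotations.contains(t.annotation())) {
                t.visit(model);
            }
        }
        // Code generation phase, reduced here to a textual summary.
        return "class " + model.name + " " + model.methods;
    }
}
```

Because the change happens before code generation, the added method is part of the compiled result and costs nothing extra at runtime, which is the property the article attributes to AST transformations.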

  1. April 14, 2013 at 6:24 pm
