JRuby front end internals

JRuby is an implementation of the Ruby programming language atop the Java Virtual Machine, written largely in Java. JRuby is tightly integrated with Java to allow the embedding of the interpreter into any Java application with full two-way access between the Java and the Ruby code.

In this article we take a look inside JRuby to discover how it works internally. For the analysis we use JArchitect, which is free for open source projects.

Like any other compiler or interpreter, JRuby consists of three main parts:

- The front end checks whether the program is correctly written in terms of the programming language syntax and semantics. It then generates an intermediate representation (IR) of the source code for processing by the middle end.

- The middle end is where optimization takes place. Typical optimizing transformations are the removal of useless or unreachable code, the discovery and propagation of constant values, the relocation of computation to a less frequently executed place (e.g., out of a loop), and the specialization of computation based on the context. The middle end generates another IR for the back end that follows. Most optimization efforts are focused on this part.

- The back end is responsible for translating the IR from the middle end into target code. The target instruction(s) are chosen for each IR instruction.
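
To make the middle end bullet more concrete, here is a minimal sketch of one of the transformations it mentions, constant folding, applied to a tiny expression tree. All class names (FoldExpr, FoldNum, FoldVar, FoldAdd, ConstantFolder) are hypothetical and exist only for this illustration; they are not JRuby classes.

```java
// Toy expression tree; every name here is hypothetical, not a JRuby class.
interface FoldExpr {}

final class FoldNum implements FoldExpr {
    final int value;
    FoldNum(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

final class FoldVar implements FoldExpr {
    final String name;
    FoldVar(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class FoldAdd implements FoldExpr {
    final FoldExpr left;
    final FoldExpr right;
    FoldAdd(FoldExpr left, FoldExpr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " + " + right + ")"; }
}

final class ConstantFolder {
    // Folds additions bottom-up when both operands are known constants.
    static FoldExpr fold(FoldExpr e) {
        if (!(e instanceof FoldAdd)) {
            return e; // numbers and variables are already as simple as they get
        }
        FoldAdd add = (FoldAdd) e;
        FoldExpr l = fold(add.left);
        FoldExpr r = fold(add.right);
        if (l instanceof FoldNum && r instanceof FoldNum) {
            return new FoldNum(((FoldNum) l).value + ((FoldNum) r).value);
        }
        return new FoldAdd(l, r);
    }
}
```

Folding the tree for (1 + 2) + x with ConstantFolder.fold gives (3 + x): the constant part is computed once at compile time, and the part that depends on x is left for run time.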

In this post we focus on the front end and explore some of its internal design.

Front End

Lexer

The JRuby lexer lives in the org.jruby.lexer package and is hand-written. A hand-written lexer is one that was written (and fine-tuned) manually, as opposed to being generated automatically from a formal definition by a tool such as Lex or ANTLR.

Here is the dependency graph between all classes from the org.jruby.lexer package:

[Image ruby15: dependency graph of the classes in org.jruby.lexer]

The lexer logic is implemented by the RubyYaccLexer class. Its main method, nextToken, iterates through the tokens: it reads the next character and decides how to handle it. The responsibility of reading characters is delegated to the LexerSource.

[Image ruby17]

LexerSource, an abstract class, is what feeds the lexer. InputStreamLexerSource and ByteArrayLexerSource extend it, so the lexer can process a file or a script given as a string.
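
The split between a hand-written lexer and an abstract character source can be sketched as follows. This is only an illustration under simplifying assumptions (ASCII input, three token kinds); ToySource, ToyByteArraySource, ToyStreamSource and ToyLexer are hypothetical names and do not reproduce the API of JRuby's LexerSource or RubyYaccLexer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an abstract character source (not JRuby's LexerSource).
abstract class ToySource {
    static final int EOF = -1;
    private static final int NONE = Integer.MIN_VALUE;
    private int pushedBack = NONE; // one-character pushback slot

    protected abstract int readRaw();

    int read() {
        if (pushedBack != NONE) {
            int c = pushedBack;
            pushedBack = NONE;
            return c;
        }
        return readRaw();
    }

    void unread(int c) { pushedBack = c; }
}

// Source backed by a byte array, e.g. a script given as a string.
final class ToyByteArraySource extends ToySource {
    private final byte[] bytes;
    private int pos;
    ToyByteArraySource(byte[] bytes) { this.bytes = bytes; }
    @Override protected int readRaw() { return pos < bytes.length ? (bytes[pos++] & 0xff) : EOF; }
}

// Source backed by any InputStream, e.g. a file.
final class ToyStreamSource extends ToySource {
    private final InputStream in;
    ToyStreamSource(InputStream in) { this.in = in; }
    @Override protected int readRaw() {
        try {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hand-written lexer: reads the next character and decides how to continue.
final class ToyLexer {
    private final ToySource src;
    ToyLexer(ToySource src) { this.src = src; }

    // Returns the text of the next token, or null at end of input.
    String nextToken() {
        int c = src.read();
        while (c == ' ' || c == '\t' || c == '\n') c = src.read(); // skip whitespace
        if (c == ToySource.EOF) return null;
        if (Character.isDigit(c)) return readWhile(c, true);                // number
        if (Character.isLetter(c) || c == '_') return readWhile(c, false);  // identifier
        return String.valueOf((char) c);                                    // one-char operator
    }

    private String readWhile(int first, boolean digits) {
        StringBuilder sb = new StringBuilder();
        int c = first;
        while (c != ToySource.EOF
                && (digits ? Character.isDigit(c) : Character.isLetterOrDigit(c) || c == '_')) {
            sb.append((char) c);
            c = src.read();
        }
        src.unread(c); // give back the character that ended the token
        return sb.toString();
    }

    static List<String> tokenize(ToySource src) {
        ToyLexer lexer = new ToyLexer(src);
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) tokens.add(t);
        return tokens;
    }
}
```

Because ToyLexer only knows about ToySource, the same tokenize call works unchanged whether the characters come from a byte array or from a stream.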

SourcePositionFactory, SimpleSourcePosition, ISourcePosition and ISourcePositionHolder are the types used to track token positions.

Using this many classes for a lexer might seem excessive, but the design has two advantages:

- High cohesion: each responsibility is isolated in a specific class. For example, feeding the lexer is the job of the LexerSource class, and source positions are handled by dedicated classes.

- Low coupling: the LexerSource class is abstract, which gives us the flexibility to plug in another source.

Parser

It’s really not that hard to write a lexer by hand, but writing a parser is a lot trickier, so it’s better to generate it with a tool. The JRuby parser is generated by Jay.

The JRuby parser is represented by the RubyParser interface. Let’s search for all RubyParser implementations:

from t in Types where t.Implement("org.jruby.parser.RubyParser")
select new { t, t.NbBCInstructions }

[Image ruby16: query result listing the RubyParser implementations]

JRuby has three parser implementations; which one is used depends on the targeted version of Ruby.

The JRuby AST is the output of the parser. It is simply a tree representation of the source code, containing different kinds of nodes, for example for classes, methods, and variables. We can search for all the AST node kinds:

from t in Types
let depth0 = t.DepthOfDeriveFrom("org.jruby.ast.Node")
where depth0 >= 0 orderby depth0
select new { t, depth0 }

[Image ruby26: query result listing the types derived from org.jruby.ast.Node]

Intermediate Representation

The AST, while good for interpretation, is probably not the best choice of IR when it comes to implementing compiler “optimizations”. The new IR translates the AST into a series of instructions that are somewhat like high-level assembly. An instruction is simply an operation (branch, call, receive arg, return, etc.) that operates on a bunch of operands (which can be variables, fixnums, floats, arrays, closures, etc.). At the start, a lot of operations end up being method calls, but the expectation is that some of these will be optimized away (inlined, etc.) into native (that is, JVM-native) operations.
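
As a rough illustration of what "translating the AST into a series of instructions" means, the sketch below flattens a toy tree into a linear list of call instructions that write to temporaries. The node classes, the ToyLinearizer class and the textual instruction format are all invented for this example; JRuby's real instructions are classes under org.jruby.ir and look different.

```java
import java.util.ArrayList;
import java.util.List;

// Toy AST; hypothetical names, unrelated to the classes in org.jruby.ast.
interface ToyNode {}

final class ToyLiteral implements ToyNode {
    final int value;
    ToyLiteral(int value) { this.value = value; }
}

final class ToyLocalVar implements ToyNode {
    final String name;
    ToyLocalVar(String name) { this.name = name; }
}

// In Ruby, a + b is a method call: a.+(b).
final class ToyCall implements ToyNode {
    final ToyNode receiver;
    final String method;
    final ToyNode arg;
    ToyCall(ToyNode receiver, String method, ToyNode arg) {
        this.receiver = receiver;
        this.method = method;
        this.arg = arg;
    }
}

// Walks the tree and emits one instruction per call, each writing a fresh temporary.
final class ToyLinearizer {
    private final List<String> instrs = new ArrayList<>();
    private int nextTemp;

    // Returns the operand (literal, variable or temporary) that holds the node's value.
    private String build(ToyNode node) {
        if (node instanceof ToyLiteral) return Integer.toString(((ToyLiteral) node).value);
        if (node instanceof ToyLocalVar) return ((ToyLocalVar) node).name;
        ToyCall call = (ToyCall) node;
        String recv = build(call.receiver); // operands are built before the call itself
        String arg = build(call.arg);
        String temp = "%t" + nextTemp++;
        instrs.add(temp + " = call " + recv + "." + call.method + "(" + arg + ")");
        return temp;
    }

    static List<String> linearize(ToyNode root) {
        ToyLinearizer l = new ToyLinearizer();
        String result = l.build(root);
        l.instrs.add("return " + result);
        return l.instrs;
    }
}
```

For the expression a + b * 2, built as new ToyCall(new ToyLocalVar("a"), "+", new ToyCall(new ToyLocalVar("b"), "*", new ToyLiteral(2))), linearize produces three instructions: %t0 = call b.*(2), then %t1 = call a.+(%t0), then return %t1. A flat list like this is easier for an optimizer to scan and rewrite than a nested tree.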

The intermediate representation is implemented in the org.jruby.ir package, and here are the dependencies between its subpackages.

[Image ruby1: dependency graph of the org.jruby.ir subpackages]

The IR implementation is well modularized using packages, and it follows the package-by-feature approach.
Package-by-feature uses packages to reflect the feature set. It places all items related to a single feature (and only that feature) into a single directory/package. This results in packages with high cohesion and high modularity, and with minimal coupling between packages. Items that work closely together are placed next to each other.

And we can search for all instructions of the intermediate representation:

from t in Types
let depth0 = t.DepthOfDeriveFrom("org.jruby.ir.instructions.Instr")
where depth0 >= 0 orderby depth0
select new { t, depth0 }

[Image ruby25: query result listing the types derived from org.jruby.ir.instructions.Instr]

To create the IR, JRuby uses the builder pattern. The builder pattern makes it possible to create an object step by step. The construction process can produce different object representations and provides a high level of control over how the objects are assembled.

The IRBuilder is the class building the IR, and here’s a dependency graph showing where this class is invoked.

[Image ruby21: dependency graph showing where IRBuilder is invoked]

The createIRBuilder method is invoked to create the IRBuilder, and the IR is then built by the buildRoot method. However, to apply the builder pattern properly, IRBuilder should be abstract, so that the concrete builder can be changed without impacting the rest of the code.

JRuby contains two builders: IRBuilder, and IRBuilder19, which inherits from it. The risk with this design is that the concrete classes get used directly instead of the abstraction. Let’s check whether IRBuilder19 is used directly by other methods.

from m in Methods where m.IsUsing("org.jruby.ir.IRBuilder19")
select new { m, m.NbBCInstructions }

[Image ruby23: query result listing the methods that use IRBuilder19]

Only createIRBuilder uses it. It may be better to add an interface for the builder and have the concrete classes IRBuilder and IRBuilder19 implement it.
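
The suggested refactoring could look roughly like the sketch below. It is a simplified assumption of the shape such a design might take, not JRuby's code: the ToyIRBuilder interface, the two toy builders, the boolean flag and the instruction strings are all invented for illustration, and only the method names createIRBuilder and buildRoot are borrowed from the article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction that the rest of the code would depend on.
interface ToyIRBuilder {
    List<String> buildRoot(String source);
}

// Plays the role of the base builder; assembles the result step by step.
class ToyIRBuilderBase implements ToyIRBuilder {
    @Override
    public List<String> buildRoot(String source) {
        List<String> instrs = new ArrayList<>();
        buildPrologue(instrs);          // step 1: version-specific setup
        instrs.add("eval " + source);   // step 2: body (placeholder instruction)
        instrs.add("return");           // step 3: epilogue
        return instrs;
    }

    protected void buildPrologue(List<String> instrs) {
        instrs.add("prologue");
    }
}

// Plays the role of the 1.9 builder: reuses the base steps and refines one of them.
final class ToyIRBuilder19 extends ToyIRBuilderBase {
    @Override
    protected void buildPrologue(List<String> instrs) {
        super.buildPrologue(instrs);
        instrs.add("prologue_19");
    }
}

final class ToyIRBuilders {
    // The only place that names a concrete builder; callers see just the interface.
    static ToyIRBuilder createIRBuilder(boolean is19) {
        return is19 ? new ToyIRBuilder19() : new ToyIRBuilderBase();
    }
}
```

With this shape, code that calls ToyIRBuilders.createIRBuilder(...).buildRoot(...) never mentions a concrete builder, so adding a third builder later would only touch the factory method.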

Conclusion

JRuby is a good example if you want to know how a JVM-based language works internally. The design is elegant, many patterns are used, and the source code is not complicated to understand. Don’t hesitate to take a look inside it and discover some of its internal design.

Categories: CodeProject