Interpreter

From GoF Given a language, define a representation for its grammar along with an interpreter that uses the representation to interpret sentences in the language.

Interpreter may be called for if a particular class of problem occurs repeatedly in a well-defined and well-understood domain. Interpreter involves defining the domain with a 'language' and 'grammar' and coding an 'interpreter' to process that 'language'.

This is fine, but not very clear. Let's hope an example will make it clearer:

Problem: A system generates complex Boolean expressions which it must evaluate. These expressions are not known until run-time, and so we must design a 'language' which we can interpret. The Boolean expressions we need to process are:

Boolean literals (true, false)
Variables (a named variable evaluating to a Boolean expression)
And expressions
Or expressions
Not expressions

Each expression above (except the literals) may be a compound expression. We don't need to specify here the rules of how values combine - boolean algebra is well-enough known - but usually we would have to. An example of the expressions to be evaluated is:

A or (B and (C or (D or false)) or not E)
where A, B, C, D, and E have true or false values (decided at run-time)

We have two types of expression: unary and binary. The unary expressions are literals, variables, and Not. The binary expressions, And and Or, require two expressions (which can each be unary or binary). Unary expressions are represented in the diagram as TerminalExpression; binary expressions are NonterminalExpressions. In this simple example there are no more complex expressions.

[svs: Note that we are going to build some code to address this particular expression and this expression only. The classes will be capable of evaluating far more than that, but putting then together in the correct way is a different subject. For that, see the State pattern]

We start off with our abstract base class:

	public abstract class BooleanExpression {
		public abstract bool Interpret(Context context);
	}

The Interpret method is going to do the work. Since we are looking for a Boolean result, it will return a Boolean value. What the Context class contains we will get to later.

Next comes the Literal expressions:

	public class LiteralBooleanExpression : BooleanExpression {
		public LiteralBooleanExpression(bool literal) {
			this.literal = literal;
		}
		private bool literal;
		public override bool Interpret(Context context) {
			return literal;
		}
	}

The Not expression:

	public class NotBooleanExpression : BooleanExpression {
		public NotBooleanExpression(BooleanExpression exp) {
			this.exp = exp;
		}
		private BooleanExpression exp;
		public override bool Interpret(Context context) {
			return !exp.Interpret(context);
		}
	}

And the last unary expression, the variable. Here we stop and think a bit: the class can contain the name of a variable as a string quite easily, but to evaluate that it needs some way to find out what its value should be. This is where Context comes in. In this example, Context contains the values of variables:

public class Context {
	private Hashtable variables = new Hashtable();
	public bool this [string variableName] {
		get {
			if (variables.Contains(variableName)) {
				return (bool)variables[variableName];
			} else {
				throw new ApplicationException
					("No context for variable " + variableName);
			}
		}
		set {
			variables.Add(variableName, value);
		}
	}
}

Now we can do the variable expressions:

	public class VariableBooleanExpression : BooleanExpression {
		public VariableBooleanExpression(string variableName) {
			this.variableName = variableName;
		}
		private string variableName;
		public override bool Interpret(Context context) {
			return context[this.variableName];
		}
	}

The binary expressions are fairly simple:

	public class AndBooleanExpression : BooleanExpression{
		public AndBooleanExpression
				(BooleanExpression exp1, BooleanExpression exp2) {
			this.exp1 = exp1;
			this.exp2 = exp2;
		}
		private BooleanExpression exp1;
		private BooleanExpression exp2;
		public override bool Interpret(Context context) {
			return 
				exp1.Interpret(context) && exp2.Interpret(context);
		}
	}

The implementation of OrBooleanExpression is left as an exercise for the reader.

Looking at AndBooleanExpression and NotBooleanExpression, you can see why all these expressions inherit from the same type: each boolean expression may need to be evaluated (Interpreted) by another one.

With these classes we can now construct Boolean expressions of arbitrary complexity. How these are constructed is an implementation matter: there may be a text string input which is parsed into the relevant expressions, or it may be made up in some other method. Here is code to construct our original example:

VariableBooleanExpression d = new VariableBooleanExpression("D");
LiteralBooleanExpression f = new LiteralBooleanExpression(false);

OrBooleanExpression dOrFalse = new OrBooleanExpression(d, f);

VariableBooleanExpression c = new VariableBooleanExpression("C");
OrBooleanExpression cOrExp = new OrBooleanExpression(c, dOrFalse);

VariableBooleanExpression b = new VariableBooleanExpression("B");
AndBooleanExpression bAndExp = new AndBooleanExpression(b, cOrExp);

VariableBooleanExpression e = new VariableBooleanExpression("E");
NotBooleanExpression notE = new NotBooleanExpression(e);

OrBooleanExpression bEtcOrNotE = 
new OrBooleanExpression(bAndExp, notE);

VariableBooleanExpression a = new VariableBooleanExpression("A");
OrBooleanExpression aOrExp = new OrBooleanExpression(a, bEtcOrNotE);

Context context = new Context();
context["A"] = false;
context["B"] = true;
context["C"] = true;
context["D"] = true;
context["E"] = true;
Console.WriteLine("Evaluates to: {0}", aOrExp.Interpret(context));

This came up with True - not surprising, given the values of the variables. Setting them all to false (save E) gave a return of False.

Note the implied precedence in the expression bEtcOrNotE. After the composite expressions (C or (D or false)) and not E are evaluated (lets say to X and Y respectively) we get 'B and X or Y'. There are three possible responses to this: evaluate as '(B and X) or Y', evaluate as '(B and (X or Y)', or throw an exception because this is a syntax error in this language. The way the code generates the composite expressions this is evaluated as '(B and X) or Y'. This is different to 'B and (X or Y)'. You need to be aware of this type of 'implementation-implicit rule' when designing such grammars.

It is easy to imagine extending this to do simple arithmetic, or even more complex tasks.

In summary, the classes contain the rules and operations of the 'language'; Context contains the data which the rules and operations work on.

Observations

The final structure on which Interpret is called is a Composite (the binary expressions are nodes, the unary expressions are leaves).
Parsing the input data into the final composite is very often done using the State pattern; if I had used a text string as the source of the composite I would have parsed it with State to generate the composite instead of hard-coding it as in the above example.
Interpreter is not necessarily the most run-time efficient way of achieving the purpose. It makes no optimisations at all � though it can be used as a way of optimising trees for given operations.
Interpreter is recognised not always from the Class names, but from the Interpret() method common to the class hierarchy.
State should be minimised in the objects. Any state which changes (there is none in the example) should be kept in the Context object if at all possible � and it should be possible.
The composite above was built from the bottom up (i.e. the innermost expressions were created first and added to their outer expressions, etc). This is not always possible. It may be necessary to provide access to inner expressions in the NonterminalExpressions and start from the outermost expression. If this is done it may not be possible to pass the values to the constructors of the expressions in which case other methods of setting them must be supplied. Recursion can sometimes solve this problem and is always worth a try.
The composite is often referred to as the abstract syntax tree. This partly because of its origin in compilers, but mostly because the composite is a construct in the language we have defined: it is a hierarchical representation of the original statement in the syntax of the language and it is abstract because it requires the data in the Context to actually mean anything.

This is not an easy pattern to generate, usually, but is very powerful. The above example was relatively easy as the rules of Boolean algebra are well-known (but remember the unstated precedence rule included). However, if you ever need to evaluate arbitrary Boolean expressions we now have code which can do it - this is no small thing. Most of the work in Interpreter goes into defining the language and grammar, and then possibly in coding the generation of the composite from the source data. This is why the text under the diagram emphasises the words 'repeatedly' and 'well-defined and well-understood'. Once the language and grammar are defined coding the classes is quite often relatively trivial.

Notes from GoF

If you have different operations to do on the same composite, it is better to use a Visitor instead of the Interpret method. This allows you define operations separately from the composite, and the composite can be re-used. Interpreter is common in Compilers, where abstract syntax trees are built up; Visitors to the tree perform operations such as type-checking, optimisation, code generation, and so on.
If you have many literals consider using Flyweight for them. (For example, if any of the variables or literals were repeated in the expression above I would have reused the object created for its first use. There is no point in having two copies of 'a' for example, as it keeps no changeable state.)
Considered in its most general form (i.e. an operation distributed over a class hierarchy based on the Composite pattern, nearly every use of the Composite pattern will also contain the Interpreter pattern. As a matter of communication Interpreter should be used when you want to think of the hierarchy and operations as a language.

From a Korean University (which no longer has the page)

It's easy to change and extend the grammar.

You can use inheritance to change or extend the grammar.
Existing expressions can be modified incrementally.
New expressions can be defined as variations on old ones.

Implementing the grammar is easy, too.

Classes defining nodes in the abstract syntax tree have similar implementations.
Often their generation can be automated with a compiler or parser generator.

Complex grammars are hard to maintain.

Grammars containing many rules can be hard to manage and maintain.
When the grammar is very complex, other techniques such as parser or compiler generator are more appropriate.

Adding new ways to interpret expressions

The Interpreter pattern makes it easier to evaluate an expression in a new way.
For example, you can support pretty printing or type-checking an expression by defining a new operation on the expression classes.
You can avoid changing the grammar classes to add new ways of interpreting expressions by using the Visitor pattern.

[svs: I think the use of the word 'easy' above may be misleading. Once the language and grammar are understood, the implementation is quite often 'easy', as in this case. Also, the associated code which would generate a BooleanExpression composite from an input text string, using the State pattern, is not trivial.]

Return to top

Contents

Interpreter