Contents
Interpreter
From GoF Given a language, define a representation for its grammar along
with an interpreter that uses the representation to interpret sentences in the
language.
Interpreter may be called for if a particular class of problem occurs repeatedly
in a well-defined and well-understood domain. Interpreter involves
defining the domain with a 'language' and 'grammar' and coding an 'interpreter'
to process that 'language'.
This is fine, but not very clear. Let's hope an example will make it clearer:
Problem: A system generates complex Boolean expressions which it must evaluate.
These expressions are not known until run-time, and so we must design a
'language' which we can interpret. The Boolean expressions we need to process
are:
-
Boolean literals (true, false)
-
Variables (a named variable evaluating to a Boolean expression)
-
And expressions
-
Or expressions
-
Not expressions
Each expression above (except the literals) may be a compound expression. We
don't need to specify here the rules of how values combine - boolean algebra is
well-enough known - but usually we would have to. An example of the expressions
to be evaluated is:
-
A or (B and (C or (D or false)) or not E)
-
where A, B, C, D, and E have true or false values (decided at run-time)
We have two types of expression: unary and binary. The unary expressions are
literals, variables, and Not. The binary expressions, And and Or, require two
expressions (which can each be unary or binary). Unary expressions are
represented in the diagram as TerminalExpression; binary expressions are
NonterminalExpressions. In this simple example there are no more complex
expressions.
[svs: Note that we are going to build some code to address this particular expression
and this expression only. The classes will be capable of evaluating far more than that,
but putting then together in the correct way is a different subject. For that,
see the State pattern]
We start off with our abstract base class:
public abstract class BooleanExpression {
public abstract bool Interpret(Context context);
}
The Interpret method is going to do the work. Since we are looking for a Boolean
result, it will return a Boolean value. What the Context class contains we will
get to later.
Next comes the Literal expressions:
public class LiteralBooleanExpression : BooleanExpression {
public LiteralBooleanExpression(bool literal) {
this.literal = literal;
}
private bool literal;
public override bool Interpret(Context context) {
return literal;
}
}
The Not expression:
public class NotBooleanExpression : BooleanExpression {
public NotBooleanExpression(BooleanExpression exp) {
this.exp = exp;
}
private BooleanExpression exp;
public override bool Interpret(Context context) {
return !exp.Interpret(context);
}
}
And the last unary expression, the variable. Here we stop and think a bit: the
class can contain the name of a variable as a string quite easily, but to
evaluate that it needs some way to find out what its value should be. This is
where Context comes in. In this example, Context contains the values of
variables:
public class Context {
private Hashtable variables = new Hashtable();
public bool this [string variableName] {
get {
if (variables.Contains(variableName)) {
return (bool)variables[variableName];
} else {
throw new ApplicationException
("No context for variable " + variableName);
}
}
set {
variables.Add(variableName, value);
}
}
}
Now we can do the variable expressions:
public class VariableBooleanExpression : BooleanExpression {
public VariableBooleanExpression(string variableName) {
this.variableName = variableName;
}
private string variableName;
public override bool Interpret(Context context) {
return context[this.variableName];
}
}
The binary expressions are fairly simple:
public class AndBooleanExpression : BooleanExpression{
public AndBooleanExpression
(BooleanExpression exp1, BooleanExpression exp2) {
this.exp1 = exp1;
this.exp2 = exp2;
}
private BooleanExpression exp1;
private BooleanExpression exp2;
public override bool Interpret(Context context) {
return
exp1.Interpret(context) && exp2.Interpret(context);
}
}
The implementation of OrBooleanExpression is left as an exercise for the reader.
Looking at AndBooleanExpression and NotBooleanExpression, you can see why all these
expressions inherit from the same type: each boolean expression may need to be
evaluated (Interpreted) by another one.
With these classes we can now construct Boolean expressions of arbitrary
complexity. How these are constructed is an implementation matter: there may be
a text string input which is parsed into the relevant expressions, or it may be
made up in some other method. Here is code to construct our original example:
VariableBooleanExpression d = new VariableBooleanExpression("D");
LiteralBooleanExpression f = new LiteralBooleanExpression(false);
OrBooleanExpression dOrFalse = new OrBooleanExpression(d, f);
VariableBooleanExpression c = new VariableBooleanExpression("C");
OrBooleanExpression cOrExp = new OrBooleanExpression(c, dOrFalse);
VariableBooleanExpression b = new VariableBooleanExpression("B");
AndBooleanExpression bAndExp = new AndBooleanExpression(b, cOrExp);
VariableBooleanExpression e = new VariableBooleanExpression("E");
NotBooleanExpression notE = new NotBooleanExpression(e);
OrBooleanExpression bEtcOrNotE =
new OrBooleanExpression(bAndExp, notE);
VariableBooleanExpression a = new VariableBooleanExpression("A");
OrBooleanExpression aOrExp = new OrBooleanExpression(a, bEtcOrNotE);
Context context = new Context();
context["A"] = false;
context["B"] = true;
context["C"] = true;
context["D"] = true;
context["E"] = true;
Console.WriteLine("Evaluates to: {0}", aOrExp.Interpret(context));
This came up with True - not surprising, given the values of the variables.
Setting them all to false (save E) gave a return of False.
Note the implied precedence in the expression bEtcOrNotE. After the composite
expressions (C or (D or false)) and not E are evaluated (lets say to X and Y
respectively) we get 'B and X or Y'. There are three possible responses to
this: evaluate as '(B and X) or Y', evaluate as '(B and (X or Y)', or throw an
exception because this is a syntax error in this language. The way the code
generates the composite expressions this is evaluated as '(B and X) or Y'. This
is different to 'B and (X or Y)'. You need to be aware of this type of 'implementation-implicit
rule' when designing such grammars.
It is easy to imagine extending this to do simple arithmetic, or even more
complex tasks.
In summary, the classes contain the rules and operations of the 'language';
Context contains the data which the rules and operations work on.
Observations
-
The final structure on which Interpret is called is a Composite (the binary
expressions are nodes, the unary expressions are leaves).
-
Parsing the input data into the final composite is very often done using the
State pattern; if I had used a text string as the source of the composite I
would have parsed it with State to generate the composite instead of
hard-coding it as in the above example.
-
Interpreter is not necessarily the most run-time efficient way of achieving the
purpose. It makes no optimisations at all � though it can be used as a way of
optimising trees for given operations.
-
Interpreter is recognised not always from the Class names, but from the
Interpret() method common to the class hierarchy.
-
State should be minimised in the objects. Any state which changes (there is
none in the example) should be kept in the Context object if at all possible �
and it should be possible.
-
The composite above was built from the bottom up (i.e. the innermost
expressions were created first and added to their outer expressions, etc). This
is not always possible. It may be necessary to provide access to inner
expressions in the NonterminalExpressions and start from the outermost
expression. If this is done it may not be possible to pass the values to the
constructors of the expressions in which case other methods of setting them
must be supplied. Recursion can sometimes solve this problem and is always
worth a try.
-
The composite is often referred to as the abstract syntax tree. This partly
because of its origin in compilers, but mostly because the composite is a
construct in the language we have defined: it is a hierarchical representation
of the original statement in the syntax of the language and it is abstract
because it requires the data in the Context to actually mean anything.
This is not an easy pattern to generate, usually, but is very powerful. The
above example was relatively easy as the rules of Boolean algebra are
well-known (but remember the unstated precedence rule included). However, if you ever need to evaluate arbitrary Boolean expressions
we now have code which can do it - this is no small thing. Most of the work in
Interpreter goes into defining the language and grammar, and then possibly in
coding the generation of the composite from the source data. This is why the
text under the diagram emphasises the words 'repeatedly' and 'well-defined and
well-understood'. Once the language and grammar are defined coding the classes
is quite often relatively trivial.
Notes from GoF
-
If you have different operations to do on the same composite, it is better to
use a Visitor instead of the Interpret method. This allows you define
operations separately from the composite, and the composite can be re-used.
Interpreter is common in Compilers, where abstract syntax trees are built up;
Visitors to the tree perform operations such as type-checking, optimisation,
code generation, and so on.
-
If you have many literals consider using Flyweight for them. (For example, if
any of the variables or literals were repeated in the expression above I would
have reused the object created for its first use. There is no point in having
two copies of 'a' for example, as it keeps no changeable state.)
-
Considered in its most general form (i.e. an operation distributed over a class
hierarchy based on the Composite pattern, nearly every use of the Composite
pattern will also contain the Interpreter pattern. As a matter of communication
Interpreter should be used when you want to think of the hierarchy and
operations as a language.
From a Korean University (which no longer has the page)
-
It's easy to change and extend the grammar.
-
You can use inheritance to change or extend the grammar.
-
Existing expressions can be modified incrementally.
-
New expressions can be defined as variations on old ones.
-
Implementing the grammar is easy, too.
-
Classes defining nodes in the abstract syntax tree have similar
implementations.
-
Often their generation can be automated with a compiler or parser generator.
-
Complex grammars are hard to maintain.
- Grammars containing many rules can be hard to manage and maintain.
- When the grammar is very complex, other techniques such as parser or compiler generator are more appropriate.
- Adding new ways to interpret expressions
- The Interpreter pattern makes it easier to evaluate an expression in a new way.
- For example, you can support pretty printing or type-checking an expression by defining a new operation on the
expression classes.
- You can avoid changing the grammar classes to add new ways of interpreting expressions by using the Visitor pattern.
[svs: I think the use of the word 'easy' above may be misleading. Once the language and grammar are
understood, the implementation is quite often 'easy', as in this case. Also, the associated code
which would generate a BooleanExpression composite from an input text string, using the State pattern, is not trivial.]