String to Expression Part 2: The way of the parser
The previous part we built a a tokenizer. Now we need to get into the meat of our project, parsing tokens into an expression
The plan - shunting yard algorithm
Normally I’m for just writing code and seeing what comes out, however for something as involved as this we need to have a plan, and that plan is the Shunting Yard Algorithm.
It turns out we can approach our problem the same way railway shunting yards use tracks to separate railway cars. If (like the majority of developers) that analogy means nothing to you, don’t worry the algorithm is quite simple.
Step 1: Applying the tokens
Firstly we will run forward through all our tokens and classify each token and put them in one of two stacks, a token can either be:
- An Operand - An expression that has (or can be evaluated to) a constant. For example
2
,'mystring'
,x.Value
,2+3
are all operands. - An Operator - is something that will act on one or more operands and as will create a new operand as a result. For example
+
,*
,toLower()
are all operators.
For example if we are processing a bunch of tokens like this
after we classify our two stacks look something like this
Operand Stack | Operator Stack |
---|---|
3 | + |
5 | * |
6 |
Step 2: Pop operations and execute them
Now that we have applied the tokens we have everything in one of two stacks. The next step is to starting pop’ing items off our operation stack, we will then execute them. When we execute them they will pop all the operands they need from the operand stack combine them and push the result back onto the operand stack.
Given our example
Operand Stack | Operator Stack |
---|---|
3 | + |
5 | * |
6 |
We would then
- Pop
*
off the operator stack. - Execute the multiplication, to do this it needs two operands which we pop from the operand stack, so it would pop
6
then5
- The result of the multiplication is the operand
(6*5)
(remember an operand can be anything that can be evaluated to a constant so it can be a combination of other operands). We would then push this back on the operand stack
So after our first operator execution our stacks look like
Operand Stack | Operator Stack |
---|---|
3 | + |
(6*5) |
Our next step is to continue pop’ing things off the opertor stack and executing just as before. So the next +
is popped off the operation stack, it in turn pops two operands from the operand stack combines them together to a new operand which it puts back on the stack. This leaves us with
Operand Stack | Operator Stack |
---|---|
((6*5) + 3) |
When we run out of operators we should (if our input was a valid expression) only have a single operand left, that single operand is our final expression.
Other features
It should be noted there are a few additional processing add-ons we can use to make it possible to handle things like
- Order of operations - for example multiplication should always happen before addition)
- Brackets - bracketed expressions should happen first
- Functions - which take in variable number of operands
Ill discuss these add-ons in the next post, for now we will just get the basic infrastructure in place.
Being practical
Modeling the problem
So what does the Shunting Yard algorithm mean in our context? We will model it as
- A
ParseState
which will hold our stack of operands and operators. - An
Operand
which simply has anExpressionn
. It might be aConstantExpression
or aMemberExpression
or be the root for a tree of an expression such as aBinaryExpression
- An
Operator
which contains anAction
. This action will pop someOperands
from theParseState
create a newExpression
using theOperands
and push the newExpression
back on theOperand
stack. Exactly how many operands it pops and what expression it creates is dependent on the type ofOperator
.
More TokenDefintions
We now need to run through our token list and classify the tokens as either an Operand
or Operator
. The best place to determine what type of token we are, is where we defined the token to begin with, at the TokenDefinition
. We will add a virtual Apply method with the intention we can implement the specific in subclasses.
Firstly we will make subclass of OperandDefintion
which will know how to turn the token string into an Expression, it will Apply
this knowledge to add the item to the Operand
stack.
Next we will make a BinaryOperatorDefinition
to handle our simple operators such as +
, *
, AND
etc. As we make more complicated operators we can simply create new types of TokenDefinition
. It should be noted that we do not actually execute the Operator
in the Apply
method we only create an Action
so it can be executed later.
The Parser
The thing remaining is actually calling the apply, which is almost trivial. Ive also thrown in the operator execution loop in there which is equally trivial. We Apply all the tokens, then we pop off the Operator
stack and execute the operators.
Finally we hook it up with our Tokenizer
With all this in place we can configure our entire language via TokenDefintion
and its subclasses
What is next
We have a working parser it makes expressions we are good to go right? Well we are missing some very critical elements
- order of operations - as it is the expression is evaluated right-to-left, BEMA should apply
- brackets - putting brackets round anything should force that part ot evaluate first
- functions - we are not calling any functions here
But it turns out all of these can be built upon our existing objects, we just need to make new TokenDefinitions
for them, which is what we will do in part 3.
Full code
The full code of the entire parser is can be found at https://github.com/codecutout/StringToExpression
Some of the code will not appear exactly as it is in this post, mostly because the real code has a lot more error checking, although the principles are the same.