Using Rust, I created a program that can understand and run a part of the Forth programming language, which operates on an internal data stack in memory. I also wrote a lexer that turns input strings into tokens, which are easier to process and execute.
Project Summary
This project involves implementing a basic evaluator for a small subset of Forth, a stack-based programming language. The evaluator should support the following words:
+, -, *, / (integer arithmetic)
DUP, DROP, SWAP, OVER (stack manipulation)
It should also support defining new words using the customary syntax: : word-name definition ;. For example, : square DUP * ; defines square, so that 3 square leaves 9 on the stack.
Initial Attempt
The initial approach involved binding closures into a HashMap, which led to ownership and borrowing complications in Rust. Additionally, there were unnecessary macros that, while aesthetically pleasing, did not contribute to the functionality of the code. Here's a snippet of the initial code:
type Operation = Rc<dyn Fn(&mut Forth, Option<&str>) -> Result>;
type Operations = HashMap<String, Operation>;
// - - - //
install!(new,
"DROP" => |forth: &mut Forth, _| {
let stack = &mut forth.eval_stack;
match stack.pop() {
Some(_) => Ok(()),
None => Err(Error::StackUnderflow),
}
};
"DUP" => |forth: &mut Forth, _| {
let stack = &mut forth.eval_stack;
match stack.last() {
Some(last) => {stack.push(*last); Ok(())},
None => Err(Error::StackUnderflow),
}
};
"SWAP" => |forth: &mut Forth, _| {
let stack = &mut forth.eval_stack;
let a = stack.pop();
let b = stack.pop();
match (a, b) {
// push the old top first so it ends up underneath, swapping the pair
(Some(top), Some(below)) => {stack.extend([top, below]); Ok(())},
_ => Err(Error::StackUnderflow)
}
}
// . . .
The code above was intended to perform operations such as DROP, DUP, and SWAP. However, the closure-based implementation turned out to be neither as clean nor as efficient as expected.
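To illustrate the kind of friction mentioned above, here is a minimal, self-contained sketch with simplified types and without the macros; the field name ops is a placeholder rather than the project's actual field. Because every operation needs mutable access to the interpreter that owns it, the Rc has to be cloned out of the map before it can be called.
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Debug)]
enum Error {
    StackUnderflow,
}

type Result = std::result::Result<(), Error>;
type Operation = Rc<dyn Fn(&mut Forth, Option<&str>) -> Result>;

struct Forth {
    eval_stack: Vec<i32>,
    ops: HashMap<String, Operation>, // placeholder name for the operations map
}

fn main() -> Result {
    let mut forth = Forth { eval_stack: vec![1], ops: HashMap::new() };

    // Install a single operation by hand, without the install! macro.
    let dup: Operation = Rc::new(|f: &mut Forth, _: Option<&str>| -> Result {
        let last = *f.eval_stack.last().ok_or(Error::StackUnderflow)?;
        f.eval_stack.push(last);
        Ok(())
    });
    forth.ops.insert("DUP".to_string(), dup);

    // The Rc must be cloned out of the map first; otherwise the borrow of
    // forth.ops would overlap with the &mut forth the closure needs.
    let op = forth.ops.get("DUP").cloned().expect("DUP was just installed");
    op(&mut forth, None)?;

    assert_eq!(forth.eval_stack, vec![1, 1]);
    Ok(())
}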
Example
Here's an example of how the code was used:
let x = "4";
install!(n, "foo" => |forth: &mut Forth, _| {
forth.call("_VAL", Some(x))
});
call!(n, "foo");
replace!(n, "foo" -> ("DUP", "")("_VAL", "1")("+", ""));
call!(n, "foo");
call!(n, "foo");
replace!(n, "foo" -> ("foo", "")("_VAL", "1")("+", ""));
call!(n, "foo");
dbg!(&n.eval_stack);
Output:
stack = [4, 5, 5, 6]
Revised Approach
The revised approach involved using a tokenizer instead of binding functions. This approach made the base functions just operations that the stack can perform on itself. Here's a snippet of the revised code:
type Tokens = Vec<TokenTypes>;
type Identifiers = HashMap<String, Tokens>;
// - - - //
impl StackOperations for Stack {
fn _dupl(&mut self) -> Result {
if let Some(&num) = self.iter().last() {
self.push(num);
Ok(())
} else {
Err(Error::StackUnderflow)
}
}
fn _drop(&mut self) -> Result {
self.pop().map_or(Err(Error::StackUnderflow), |_| Ok(()))
}
fn _swap(&mut self) -> Result {
let stack_len = self.len();
if stack_len < 2 {
Err(Error::StackUnderflow)
} else {
self.swap(stack_len - 2, stack_len - 1);
Ok(())
}
}
// . . .
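The snippets above refer to a TokenTypes enum that is not shown. Judging from the debug output later in this post, it looks roughly like the following sketch; the exact variant names and payload types in the project may differ:
#[derive(Debug, Clone, PartialEq)]
enum TokenTypes {
    OpenDef,        // ':' opens a word definition
    CloseDef,       // ';' closes a word definition
    Ident(String),  // a word name such as "foo" or "+"
    Val(i32),       // a numeric literal
    Op(String),     // a resolved built-in such as "_add" or "_mult"
}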
Example
The revised approach simplifies the process: a single string is read and converted to tokens, and when a word is called its definition is flattened down, replacing all identifiers with either operations or values.
let x: i32 = 4;
let install_foo: String = format!("
: foo {x} ;
foo
");
let replace_foo = "
: foo foo 1 + ;
foo foo foo foo
"
self.eval(&install_foo, replace_foo)?;
Output:
stack = [4, 5, 6, 7, 8]
Tokenizer Conversion
The tokenizer conversion was the key change in the revised approach. Instead of binding functions, which led to the complications described above, each word is bound to a list of tokenized instructions, and the base words become plain operations that the stack performs on itself.
The tokenizer reads a single string and converts it into tokens. For example, it turns the replace_foo string from the example above into the following:
&tokens = [
OpenDef,
Ident(
"foo",
),
Ident(
"foo",
),
Val(
1,
),
Ident(
"+",
),
CloseDef,
Ident(
"foo",
),
Ident(
"foo",
),
Ident(
"foo",
),
Ident(
"foo",
),
]
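A tokenizer that produces this kind of output can be quite small. The sketch below reuses the TokenTypes enum assumed earlier and simply splits the input on whitespace; the project's real tokenizer may handle more cases. Run over the replace_foo string, it yields exactly the token list shown above.
fn tokenize(input: &str) -> Vec<TokenTypes> {
    input
        .split_whitespace()
        .map(|word| match word {
            ":" => TokenTypes::OpenDef,
            ";" => TokenTypes::CloseDef,
            other => match other.parse::<i32>() {
                // Anything that parses as a number becomes a value token...
                Ok(n) => TokenTypes::Val(n),
                // ...and everything else is treated as an identifier.
                Err(_) => TokenTypes::Ident(other.to_string()),
            },
        })
        .collect()
}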
The new definition of foo then replaces the old one in the identifiers map:
&self.idents = {
"*": [
Op(
"_mult",
),
],
"foo": [
Ident(
"foo",
),
Val(
1,
),
Ident(
"+",
),
],
...
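Storing such a definition is mostly a matter of slicing the token stream: after an OpenDef, the next Ident names the new word, and everything up to the CloseDef becomes its body. Here is a sketch under the same assumptions, using the Identifiers alias from above; the InvalidWord error variant is hypothetical:
fn store_definition(
    tokens: &mut impl Iterator<Item = TokenTypes>,
    idents: &mut Identifiers,
) -> Result {
    // The token right after ':' must be the name of the word being defined.
    let name = match tokens.next() {
        Some(TokenTypes::Ident(name)) => name,
        _ => return Err(Error::InvalidWord), // hypothetical error variant
    };
    // Everything up to ';' is the body, stored as-is in the identifiers map.
    let mut body = Vec::new();
    loop {
        match tokens.next() {
            Some(TokenTypes::CloseDef) => break,
            Some(token) => body.push(token),
            None => return Err(Error::InvalidWord), // unterminated definition
        }
    }
    idents.insert(name, body);
    Ok(())
}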
When foo is called again, the function is flattened down, replacing all identifiers with either operations or values:
"foo":[
Val(
4,
),
Val(
1,
),
Op(
"_add",
)
]
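Once a definition is fully flattened, executing it is a straightforward loop over the tokens. The sketch below assumes Stack is a Vec<i32>-like type and dispatches on the Op names seen in the debug output ("_add", "_mult") and on the StackOperations methods shown earlier; the Op names for the stack words are guesses based on those method names, and the real dispatch table is certainly larger.
fn execute(stack: &mut Stack, tokens: &[TokenTypes]) -> Result {
    for token in tokens {
        match token {
            // Literals are simply pushed onto the evaluation stack.
            TokenTypes::Val(n) => stack.push(*n),
            // Built-in operations act directly on the stack.
            TokenTypes::Op(name) => match name.as_str() {
                "_add" => binary_op(stack, |a, b| a + b)?,
                "_mult" => binary_op(stack, |a, b| a * b)?,
                "_dupl" => stack._dupl()?,
                "_drop" => stack._drop()?,
                "_swap" => stack._swap()?,
                _ => unimplemented!("remaining built-ins elided"),
            },
            // After flattening, no Ident or definition tokens should remain.
            other => unreachable!("unexpected token after flattening: {other:?}"),
        }
    }
    Ok(())
}

// Helper for the arithmetic words: pop two values, push the result.
fn binary_op(stack: &mut Stack, f: impl Fn(i32, i32) -> i32) -> Result {
    match (stack.pop(), stack.pop()) {
        (Some(rhs), Some(lhs)) => {
            stack.push(f(lhs, rhs));
            Ok(())
        }
        _ => Err(Error::StackUnderflow),
    }
}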
Breaking each definition down into a series of tokens makes individual words easy to resolve and execute, which leads to a simpler and less error-prone evaluator.
Conclusion
The revised approach proved to be more efficient and far less complicated than the initial one. Using a tokenizer and reducing the base words to operations that the stack performs on itself removed the need for closures and macros entirely. This project is a good example of how a change in approach can lead to a simpler solution.