<< Home | About Forth | About TurboForth | Download | Language Reference | Resources | Tutorials | YouTube >>


Arrays

For TurboForth V1.2

Written by Mark Wills

[ This article is important. Some of the secrets of Forth are revealed in here... Sshhhh! ]

For Joshua


Table of Contents


1 - Introduction

Arrays are something that you soon get to grips with in most programming languages. Just about every programming language supports them.

Except Forth.

Yes. It's a bit strange really, but Forth provides no built-in support for handling arrays of... well... anything. Nothing. Not a jot.

So you can't do arrays in Forth, right? Wrong. The reason you don't get arrays "out of the box" in Forth is simply this:

In Forth, you can implement your own arrays, and make them work just like you want them to.

This article, I'm going to present my own solution, which is tiny and solves the "array problem" in a Forth-like way (i.e. making Forth do as much of the work for us, so that we don't have to).

 

2 - An Array Syntax

The cool thing about Forth is that when you write Forth "code" you're just extending the language. Each new word that you write becomes part of the language, and we can use this (and indeed, you're supposed to!) to our advantage. We're going to invent our little array handling lexicon (group of words) to give TurboForth support for arrays.

Lets take a look at the typical syntax that one uses in BASIC and C for handling arrays of integers. Lets use cats as an example (no pussy jokes, please). We'll have a four element array that holds the number of lives for four cats: Tom, Dick, Harry, and er, Fred.

Here is how we might do this in BASIC and C:

BASIC C
DIM CATS(4) int cats[4];

Looks okay to me. How would we do this in Forth? Well, it's Forth, and unlike C and BASIC, we can extend the language any way we see fit. We just have to write the code and it just "becomes" part of the language.

When we declare VALUEs in Forth (a kind of variable that pushes its value to the stack when reference it) we give it an initial value via the stack, and a name, like this:

4 VALUE CATS

So, 4 goes on the stack, the word VALUE executes, and he (VALUE) knows he has to create a new entry in the dictionary for this new value. Where does he get the name from? Well, it comes immediately after the word VALUE, so VALUE itself reads the input in front of itself and gets the name and creates it. This would work well for us with arrays: It's Forthy and Forth users would be comfortable with it immediately. In Forth, it's "common sense".

Declaring an Array

How about this to declare an array:

size [] name

So, that's the size of the array (we're only going to talk about integer arrays), then that weird [] thing (which is what is actually going to create the array for us) and then the name of array, so that we can reference it in our code later. So:

9 [] lives

Would declare an integer array of 9 elements, called "lives". That [] symbol is just two square brackets. It's a Forth word (that we're going to create). Remember that in Forth you can call a word anthing you like, as long as it doesn't have a space in its name. So, we'll use [] as a kind of pictogram, since array access in C, Java and Pascal is accomplished with square brackets (BASIC being the odd one, as it uses parenthesis) so [] should be obvious to us that we're dealing with arrays.

Accessing Array Elements

Okay, so we have a way to declare an array. Now we need a way to write data into the elements of the array. Again, lets look at existing Forth practice for this.

Writing to the Array

In Forth, when writing to variables with the word ! we first place the value to store on the stack, then the variable to store it in, then the word ! takes this information off the stack and uses it to perform the write action, like this:

variable score
999 score !

Since anyone that uses Forth would be comfortable with this, it makes sense to use the same idiom if we can. However, we need three items to go on the stack:

Like this:

4 5 lives

So, we want to write the value 4 into the 5th element of the lives array. But, just like ! with variables, we need a word to do the writing for us. Well, it's Forth convention to use the symbol ! to represent writing values to something, since ! has been used to write to variables since at least 1970. So, we really should pick something similar for our lexicon that follows this convention, so that it conveys meaning to the reader.

I elect (because I'm writing this article) the word/symbol [!]

4 5 lives [!]

Here, again we're using the [] nomenclature to indicate that we're dealing with an array, but we have that rather cool ! inside the brackets, which is synonymous with ! which means write. So, the above reads as:

The stack signature for [!] would be:

value element array --

Which simply means that [!] exepcts the value, the element, and the array itself on the stack, and it removes them when it's done.

Reading from the Array

GHow do we read from the Array? Well, when we read variables, we use the word/symbol @ (fetch). @ just needs an address on the stack, and then it uses that address, goes and reads that address and pushes whatever it finds there onto the stack. Again, we should use the same principle, as the concept is already established in Forth and we should avoid completely new paradigms unless we have no choice; they're just harder to remember later on.

I elect [@] as our word to read from an array. It will need two items on the stack: The element number, and the array itself:

5 lives [@]

So, the above would read the 5th element from the lives array, and place it on the stack. The stack signature is:

element array -- n

So, after [@] executes, it removes the element number and the array reference from the stack, and leaves a value (n) for us - the value that it read from the array.

That's probably enough to get us started writing code. Strap in...

Code Smorgasbord

Well, the first word we need to tackle is [] creates an array for us.

A Detour Into CREATE

There is a very useful word in Forth called CREATE which, er, creates stuff for us. Specifically, it creates words in the dictionary. We're going to go on a little detour here and discuss CREATE for while, because it does a little more than just creating a word in the dictionary for us. The words it creates actually inherit some special built in behaviour. The push the address of themselves. Let's look at an example:

create chaos

If you type this directly into the TurboForth command line, it will dutifully obey you, and go ahead and create that word for you. You can now execute that word, straight away, which is a little weird, since the word has no code associated with it. Nevertheless, go ahead and try it:

As you can see, when we executed chaos (ha, see what I did there?!) something was pushed on the stack for us. Hmmm... What is that? Well, words created with CREATE reserve one cell (in TurboForth, a 16-bit (2 byte) word) of 'storage space'. In addition, words created with CREATE inherit some behaviour such that, when they are executed, they push the address of the this 'storage space' to the stack (if you are familiar with objects, you can think of words created with CREATE as objects of class CREATE, and they have a single method, which pushes the address of its storage). In memory, it would look like this (let's assume chaos is stored at memory address B000 hex to keep things simple):

Link Field Length Field Name Code Field Parameter Field
A3B0
5
c h a o s _
63E0
0

The link field is the link to the previously defined word in the dictionary and is not important to us. The length field just holds the length of the name (plus some other info, again not important to us). Then comes the 5 bytes of the name, plus a padding byte since it's an odd length. Then the Code Field. Now things are getting interesting. This is the address of some code that is executed when you execute chaos (this is where chaos' inherited behaviour comes from). So, when you execute chaos it will actually execute some code at address 63E0 (just chosen for illustration purposes - i've no idea where it goes in reality, and I wrote the bloody thing!).

What does that code do? It pushes the address of the parameter field. It's called a parameter field because we can store, well, a parameter in it. Personally I'd have called it a data field, but what do I know?

Lets prove it:

Here, we have created chaos, then we used ! to store 99 into it. Then we read it back.

Let's look at how this works:

  1. Remember that chaos is "pre-defined" to push the address of its parameter/data field onto the stack when we execute it;
  2. We put 99 on the stack;
  3. We execute chaos, and he puts his data field address on the stack
  4. We excute ! (store) which stores 99 into the address provided by chaos.

Then we read it back:

  1. We execute chaos, so he pushes the address of his data field onto the stack;
  2. We exedute @ (fetch), so he uses the address supplied by chaos, reads that address, and puts the result on the stack;
  3. We use . (dot) which takes the top item off the stack and prints it on the screen as a number. And there's the 99 that we stored.

If you're thinking "Hang on, that's just the same as a Forth variable" then congratulations. You're right. In fact the definition of VARIABLE in Forth is:

: VARIABLE CREATE ;

The [] Word

So, now that we know that know about CREATE, we can use it (and some other tricks) to build our array defining word for us, which we have called [].

We're going to built this up as we go, illustrating concepts as we go. Little steps.

As we determined above, we need to declare an array of a specific size (which is going to be passed in on the stack), so CREATE is only going to get us so far... We still need a way to reserve some storage space. I'll just go ahead and give you the code, and then we'll break it down. You'll probably have some "aha!" moments along the way. Later on, we're going to make it more sophisticated, but first we'll keep things simple. Here's the code for our array creation word, called [] (which I pronounce out loud as (drum roll...) "create array").

: [] ( size tib:"name" -- ) create dup , cells allot ;

Not a lot of code is it, but there's quite a lot going on, here you're about to see. Hang in there. It's worth learning this stuff.

First, the stack signature:

size tib:"name" --

This indicates that [] needs a size on the stack, and also needs a string (the name of the array) in the Terminal Input Buffer (tib). What on earth is the terminal input buffer? It's the buffer that you are typing stuff into on the TurboForth command line. This is just a fancy way of saying that [] requires the name of the array to follow the [] itself, like this:

10 [] years

So, let's go through step-by-step, and see what happens when you type 10 [] years into TurboForth:

  1. The number 10 goes onto the stack (no surprises there);
  2. The word [] executes, which:
    1. Executes CREATE, which:
      1. Reads ahead in the Terminal Input Buffer and identifies the string years;
      2. Uses this string to create an entry in the dictionary - so we now have a new word in the dictionary called years;
    2. Executes DUP. Remember that 10 that was placed on the stack? We need two of them, so we DUPd it. Now we have 10 10 on the stack;
    3. Executes , (comma). Comma takes a value off the stack and compiles it (or "commas it" as we say) into the current compilation address in memory. What on earth is that? It's a pointer that allows the system to know where the next thing to be compiled should go. Have a look at the table above that shows the chaos word, with the parameter field at the end. Well, when CREATE finishes creating the word, the current compilation address (known as HERE) is pointing right at that parameter field. Now, when we execute , (comma) it takes the value 10 off the stack, and "comma's it" into memory, so it goes right into the parameter/data field. So, in our case, years has 10 sitting in its data field, which we just put there. Why did we just put that in there? Because, later on, we will need to know how big the array is so that we can do bounds checking (i.e. check that we don't do something stupid like write to the 100th element of a 10 element array. So, since we have stored the size of our array "in" years, we'll be able to access it later. Recall that words created with CREATE push the address of their data field every time, which, in our case, will always contain the size of the array. Sorry to labour the point but it's critical to understand this.
    4. Executes CELLS. This word simply takes what is on the stack (in our case, the first 10 that we typed in at the command line) and multiplies it by two. Why do we need to do that? Well, because next we're going to execute...
    5. ALLOT. This is a neat word. It takes number off the stack (20 - it was 10, but CELLS multiplied it by two) and reserves that many bytes of memory. We want to reserve CELLS, which are two bytes, (10 cells = 20 bytes) which is why we used CELLS convert a cell count into bytes). Internally, ALLOT "reserves" the memory by simply adding the value we just computed (20) to HERE (the current compilation address pointer). So, this is how memory would look after creating our array, years:
Link Field Length Field Name Field Code Field Data Field 20 bytes (10 cells) reserved
A0FE
5
y e a r s _
63E0
10
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The next word that we create in the dictionary will start after the last reserved byte. That's why it is very important that we include bounds checking. If we didn't, then the word [!] (when we get around to writing it) could possibly destroy other words in the dictionary if we supplied an out-of-range value for the element number.

TO BE CONTINUED

 

Published 7 June 2017


<< Home | About Forth | About TurboForth | Download | Language Reference | Resources | Tutorials | YouTube >>