Enhanced DATA

For Jean

TurboForth has a built-in facility to compile data. "Data" in this context refers to a list of integers. It works like this:

: some-word ( -- addr #cells ) data 5 1 2 3 4 5 ;

The first number following data is the number of integers that follow; in the example above we're telling the word data that five integers follow. Then come the integers themselves. At run-time, as noted by the stack signature, data pushes the address of the data (that is, the address of the first integer) and the number of integers to the stack.
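As a small illustration of using the address and count that data returns, here is a sketch built only from words that appear in this article. The word name primes and its values are made up for the example.

```forth
\ Built-in DATA: the leading 3 is the count, not a list item.
: primes ( -- addr #cells ) data 3 2 3 5 ;

\ Fetch the last item: addr + (count - 1) cells.
primes 1- cells + @ .    \ displays 5
```

Note that the count has to be kept in step with the list by hand: adding a fourth value means changing the 3 to a 4 as well.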

Whilst DATA is quite a useful word, it has a number of limitations as implemented:

You need to know how many integers you want to compile in your data;
Data can only be used within a colon definition;
A single line of data cannot span more than 80 characters if being entered via a keyboard. This limit does not apply when loading from a block.

This author had long desired to replace the data facilities built into TurboForth; alas, space restrictions inside the ROM prevented this. The code presented here results in a much more useful and versatile implementation and completely eradicates the limitations noted above.

It is implemented using the two words noted below.

Here's an example:

DATA[ 12 24 36 48 60 72 84 96 108 120 132 144 ]DATA

DATA[

The word DATA[ begins the data list. Note that you do not need to know how many data items you wish to compile - just go ahead and enter them.

]DATA

The word ]DATA indicates the end of the data list. When the data list is closed, the address of the start of the data list and the number of items in the data list are pushed to the stack. The result is the same whether the word is used on the command line or in a colon definition; only the moment at which the values are pushed differs:

At the command line: the address of the data and the number of data items are immediately pushed to the stack;
In a colon definition: when the colon definition is executed, the address of the data and the number of data items are pushed to the stack.
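The two cases can be compared with a short sketch such as the following. The word name 3-list and the values are made up for the example; everything else is taken from this article.

```forth
\ Command line: address and count are pushed as soon as
\ ]DATA executes.
DATA[ 10 20 30 ]DATA . drop    \ print count, drop address

\ Colon definition: nothing is pushed until 3-list runs.
: 3-list ( -- addr #items ) DATA[ 10 20 30 ]DATA ;
3-list . drop                  \ print count, drop address
```

Both lines ending in . drop should display 3, since the count is the top item on the stack in each case.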

Example

: 12-table ( -- addr #items )
DATA[ 12 24 36 48 60 72 84 96 108 120 132 144 ]DATA ;

Executing 12-table pushes the address of the beginning of the data list and the number of items in the list to the stack. These can then be used with code such as the following:

: show-12s ( -- )
12-table 0 do dup i cells + @ . loop drop ;

Enhanced DATA Code

Here's the enhanced data code. It occupies a whopping 162 bytes when compiled! A very small price to pay for such a versatile enhancement. Note that this code is also included on block 29 of the tools disk which is available in the download section of the site.

   variable _state

   : data[] ( -- addr #cells )
     r@ 2+ r@ @ dup cells 2+ r> + >r ;

   : data[ ( -- here )
     state @ _state ! state 0! 1 $A068 !
     _state @ if compile data[] here 0 , else here then ; immediate

   : ]data ( here -- )
     dup here swap - 2/ _state @ if 1- swap ! then
     $A068 0! _state @ state ! ; immediate
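To make the return stack handling in data[] easier to follow, here is a sketch of the cells laid down for a three-item list, followed by a trace of data[] at run-time. The word name t and the values are made up for the example. The layout is inferred from the definitions above, and showing the final cell as exit assumes that ; compiles an exit, as is usual in Forth.

```forth
: t ( -- addr #cells ) DATA[ 7 8 9 ]DATA ;

\ Cells compiled into t (one cell = 2 bytes):
\   +0   data[]  compiled by DATA[
\   +2   3       placeholder 0 from DATA[, patched by ]DATA
\   +4   7       first item; this address is returned
\   +6   8
\   +8   9
\   +10  exit    (assumed) compiled by ;
\
\ When data[] runs, the return stack holds the address of
\ cell +2:
\   r@ 2+          address of cell +4, the first item
\   r@ @           the count, 3
\   dup cells 2+   3 cells + 2 = 8 bytes to skip
\   r> + >r        return address now points at cell +10

t . @ .    \ prints the count, then the first item
```

The last line should display 3 7. The extra 2+ in the skip calculation accounts for the count cell itself, which sits between data[] and the first item.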

[Added 24th August 2015] Further Enhancement - Smaller and Faster

A little sprinkling of machine code can go a long way. When a data list is used inside a colon definition, the word data[] in the above code is responsible for pushing the address and count. As can be seen, some return stack machinations are required. Firstly, the number of items in the list is accessed via the return address held on the return stack. That return address then has to be replaced with one that points past the data, so that the system jumps over the data. There are 10 Forth instructions in the above code, meaning there are 10 passes through the Forth inner interpreter. This makes it a good candidate for optimisation with machine code, particularly as it is easy to do. The following assembler code is a much faster equivalent:

   asm: data[] ( -- addr #cells )
     r3 *+ r1 mov,    \ get #cells in r1
     sp dect,         \ make space on the stack
     r3 sp ** mov,    \ push the address of the data
     sp dect,         \ make space on the stack
     r1 sp ** mov,    \ move #cells to stack
     r1 1 sla,        \ convert to bytes
     r1 r3 a,         \ adjust Forth PC to jump over the data
   ;asm

It is undesirable to have to load the assembler each time we want to use DATA[ - therefore, we'll use the ASM>CODE utility (boot disk, block 29) to convert the assembler code to a CODE: definition. We end up with this:

   CODE: data[] ( -- addr #cells )
   C073 0644 C503 0644 C501 0A11 A0C1 ;CODE
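For readers who want to check the hex words against the assembler source, the listing below pairs each word with the TMS9900 instruction it encodes. This is a hand decode, not output from ASM>CODE. The decode indicates that sp corresponds to workspace register R4; R3 is the Forth PC, as the assembler comments state.

```forth
\ hex    TMS9900         assembler source
\ C073   MOV  *R3+,R1    r3 *+ r1 mov,
\ 0644   DECT R4         sp dect,
\ C503   MOV  R3,*R4     r3 sp ** mov,
\ 0644   DECT R4         sp dect,
\ C501   MOV  R1,*R4     r1 sp ** mov,
\ 0A11   SLA  R1,1       r1 1 sla,
\ A0C1   A    R1,R3      r1 r3 a,
```

The auto-increment in the first instruction leaves R3 pointing at the first data item, so the same register supplies the data address and, after the final add, the point at which execution resumes.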

We can then replace the Forth definition of data[] with the above CODE definition, resulting in much faster execution, and reducing the size to 156 bytes! [ Note: The first incarnation of this code was 188 bytes, so we've reduced the code size by 17% - not bad ]

The complete code, therefore (and the code that is included on the Tools disk) is given below:

   HEX
   CODE: data[] ( -- addr #cells )
   C073 0644 C503 0644 C501 0A11 A0C1 ;CODE
   DECIMAL

   variable _state

   : data[ ( -- here )
     state @ _state ! state 0! 1 $A068 !
     _state @ if compile data[] here 0 , else here then ; immediate

   : ]data ( here -- )
     dup here swap - 2/ _state @ if 1- swap ! then
     $A068 0! _state @ state ! ; immediate

Performance Test

The original Forth version of the code, and the machine code enhanced version of the code were tested and timed using the following test code:

   : 12-table ( -- addr #items ) 
     DATA[ 12 24 36 48 60 72 84 96 108 120 132 144 ]DATA ;
  
   : test 30000 0 do 12-table 2drop loop ;

The execution times are as follows:

Original Forth code: 20 seconds
Machine code enhanced: 7 seconds

This is a reduction in execution time of (approximately) 65%; put another way, the machine code version runs the test nearly three times as fast.

Article uploaded 17th July 2015
Expanded and corrected: 24th August 2015
Additional typo corrections: 29 December 2020