

TurboForth TMS9900 Assembler Version 1.0

For TurboForth V1.1/V1.2

Ported by Mark Wills from the original TI-Forth Assembler source code

Note: This document is also available in PDF format.

Document uploaded 10 July 2012. Updated 12 October 2014


Table of Contents

1 -Introduction to the TurboForth 9900 Assembler

2 -TMS9900 Assembly Mnemonics

3 -TurboForth's Workspace Registers

4 -Using the Assembler

5 -TMS9900 Addressing Modes with the Forth Assembler

5.1 -Workspace Register Addressing

5.2 -Symbolic Memory Addressing

5.3 -Workspace Register Indirect Addressing

5.4 -Workspace Register Indirect Auto-Increment Addressing

5.5 -Indexed Memory Addressing

5.6 -Addressing Mode Words for Special Registers

6 -Structured Assembler Constructs

7 -Assembler Jump Tokens

7.1 -Assembly Example for Structured Constructs

8 -Test Code

8.1 -Add two numbers

8.2 -Fill screen with a character

8.3 -Nested Delay

9 -General Usage Notes

9.1 -Using Your Own Workspace

9.2 -Calling Other Code Words

9.3 -Mixing Forth and Assembly Language in an ASM: Definition

9.4 -Accessing Forth Strings from Assembly Language

10 -TurboForth Assembler Code

10.1 -Errors Corrected from the Original Source Code

11 -Conclusion


1 -Introduction to the TurboForth 9900 Assembler

The assembler presented here is ported from the original FIG Forth implementation for TI Forth, modified for use with F83 TurboForth by Mark Wills.

The assembler is typical of assemblers supplied with Forth systems; it provides the capability of using all the op-codes of the TMS9900 as well as the ability to use structured assembly instructions.

Labels are not supported; they are not actually required, since the assembler adds high-level control structures such as IF,...ENDIF, and BEGIN,...WHILE,...REPEAT, making labels completely redundant.

The complete Forth language is available to the user to assist in macro-type assembly if desired. The assembler uses the standard Forth convention of Reverse Polish Notation (RPN) for each instruction. For example, the instruction to add register 1 to register 2 is:

R1 R2 A,

As can be seen in the above example, the 'add' instruction mnemonic is followed by a comma. Every op-code in the assembler is followed by a comma. The significance is that when the op-code is reached during the assembly process, the instruction is compiled into the dictionary at that point. The comma convention serves as a reminder of this compile operation. It also serves to assist in differentiating assembler words from the rest of the words in the Forth language. Note that the order in which registers and other operands are specified is exactly the same as standard 9900 assembly language.
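To make the operand order concrete, the following is a minimal sketch of a complete word written in this style, with the conventional form of each instruction shown in the comments. The word name DOUBLE is hypothetical; ASM:, ;ASM and *SP are the words described in sections 4 and 5.6.

```forth
\ Sketch only: DOUBLE is a hypothetical name, not part of TurboForth.
ASM: DOUBLE ( n -- 2n )
   *SP R0 MOV,     \ MOV *SP,R0   copy the top of the stack into R0
   R0 *SP A,       \ A   R0,*SP   add it back onto the top of the stack
;ASM
```

Once assembled, 21 DOUBLE . should display 42.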


2 -TMS9900 Assembly Mnemonics

A, AB, ABS, AI, ANDI, B, BL,
BLWP, CMP, CB, CI, CKOF, CKON, CLR,
COC, CZC, DEC, DECT, DIV, IDLE, INC,
INCT, INV, JEQ, JGT, JH, JHE, JL,
JLE, JLT, JMP, JNC, JNE, JNO, JOC,
JOP, LDCR, LI, LIMI, LREX, LWPI, MOV,
MOVB, MPY, NEG, ORI, RSET, RTWP, S,
SB, SBO, SBZ, SETO, SLA, SOC, SOCB,
SRA, SRC, SRL, STCR, STST, STWP, SWPB,
SZC, SZCB, TB, X, XOP, XOR,

The above words are available when the assembler is loaded. Note that the 9900 instruction C (compare) has been renamed to CMP, to avoid a collision with the standard Forth word C, which is used to compile a byte to memory.


3 -TurboForth's Workspace Registers

Most assembly code in Forth will probably use the Forth workspace registers. The following table describes the register allocation.

Register Name   Usage
R0              Available.
R1              Available.
R2              Available.
IP (R3)         Interpretive pointer.
SP (R4)         Stack pointer.
RP (R5)         Return stack pointer.
W (R6)          Inner interpreter current word pointer.
R7              Available.
R8              Available.
R9              Available.
R10             Available.
R11             Available.
NEXT (R12)      Points to the next instruction fetch routine at >8328. May be used as long as its value is restored before returning to the TurboForth environment.
R13             Available.
R14             Available.
R15             Available.

As can be seen from the above table, the only registers that absolutely must not be changed are R3, R4, R5 and R12. Whilst R6 is used, its use is transitory, so it can generally be used in an assembly language word. All other registers may be used; however, note that TurboForth code words (the words inside the TurboForth dictionary implemented in machine code) also use these registers. Therefore, if you wish to persist data between assembly language routines you should either use the stack or use a separate workspace. Please see section 9.1, Using Your Own Workspace, and section 9.2, Calling Other Code Words, for more information.


4 -Using the Assembler

To use the assembler, simply load it from the appropriate block. Assembly definitions begin with the word ASM: and end with the word ;ASM. The following example copies the current frame count (from the VDP interrupt) to the stack:
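A word of this kind could be sketched as follows. This is an illustration under stated assumptions rather than the original listing: it assumes the frame count is the single-byte VDP interrupt timer at >8379 (the address mentioned in section 9.2), and that the data stack grows downward as in TurboForth V1.2 (on V1.1, SP INCT, would be used in place of SP DECT,).

```forth
\ Sketch under assumptions: timer byte at >8379, V1.2 stack direction.
ASM: FRAME ( -- n )
   SP DECT,                  \ DECT SP             make room for one cell
   *SP CLR,                  \ CLR  *SP            zero the new cell
   $8379 @@ 1 @(SP) MOVB,    \ MOVB @>8379,@1(SP)  timer byte into the low byte
;ASM
```

The byte is written to the odd address of the new cell, so the value pushed is in the range 0 to 255.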

To run the example, just type FRAME, and a value will be pushed to the stack.


5 -TMS9900 Addressing Modes with the Forth Assembler

All of the addressing modes of the 9900 are supported by the assembler. Each of the following examples shows both the Forth assembler code for various instructions and the more conventional TMS9900 assembly method of coding the same instructions. The TurboForth version of the assembler has been enhanced with some additional addressing mode notation, which considerably improves the readability of the resulting assembly code. This is indicated where appropriate in the following sections.

5.1 -Workspace Register Addressing

TMS9900 registers in Forth assembler are referenced directly by number:

Forth Assembler              Conventional Assembler
ASM: EX1                           DEF  EX1
   1 2 A,                    EX1   A    R1,R2
   3 INC,                          INC  R3
   3 FFFC ANDI,                    ANDI R3,>FFFC
;ASM                               B    *NEXT

Note how the operands are specified in the same order as standard TMS9900 assembly language; the only difference at this point is that the instruction mnemonic has moved to the end.

5.1.1 -Enhanced Functionality

Note that, for enhanced clarity, the ability to use register numbers starting with R has been added. This functionality is not included in the original TI-Forth assembler. For example, the above code can be changed as follows, which makes it more readable:

Forth Assembler              Conventional Assembler
ASM: EX1                           DEF  EX1
   R1 R2 A,                  EX1   A    R1,R2
   R3 INC,                         INC  R3
   R3 FFFC ANDI,                   ANDI R3,>FFFC
;ASM                               B    *NEXT

5.2 -Symbolic Memory Addressing

Symbolic addressing is done with the @() word. It is used after the address. Note how, in the Forth assembler, it is possible to refer to earlier defined Forth variables (and indeed other words, constants etc.)

Forth Assembler              Conventional Assembler
VARIABLE VAR1                VAR1  BSS  2
VARIABLE VAR2 5 VAR2 !       VAR2  DATA 5
ASM: EX2                           DEF  EX2
   VAR2 @() R1 MOV,          EX2   MOV  @VAR2,R1
   R1 2 SRC,                       SRC  R1,2
   R1 VAR1 @() S,                  S    R1,@VAR1
   VAR2 @() VAR1 @() SOC,          SOC  @VAR2,@VAR1
;ASM                               B    *NEXT

5.2.1 -Enhanced Functionality

In addition to the @() notation shown above, Symbolic Memory Addressing can also be specified by using the @@ notation, as found in the Wycove Forth assembler. This author finds the Wycove notation much more readable than the TI-Forth notation. The modified code is shown below:

Forth Assembler              Conventional Assembler
VARIABLE VAR1                VAR1  BSS  2
VARIABLE VAR2 5 VAR2 !       VAR2  DATA 5
ASM: EX2                           DEF  EX2
   VAR2 @@ R1 MOV,           EX2   MOV  @VAR2,R1
   R1 2 SRC,                       SRC  R1,2
   R1 VAR1 @@ S,                   S    R1,@VAR1
   VAR2 @@ VAR1 @@ SOC,            SOC  @VAR2,@VAR1
;ASM                               B    *NEXT

The key to understanding this is simply to realise that when using an addressing mode, it simply follows the operand to which you want it to apply. Note how, in the example above @@ follows the VAR2 operand, giving us:

VAR2 @@

This is directly equivalent to @VAR2.

5.3 -Workspace Register Indirect Addressing

Workspace Register Indirect Addressing is done with the *? word. It is used after the register number to which it pertains.

Forth Assembler              Conventional Assembler
2000 CONSTANT XRAM           XRAM  EQU  >2000
ASM: EX3                           DEF  EX3
   R1 XRAM LI,               EX3   LI   R1,XRAM
   R1 *? R2 MOV,                   MOV  *R1,R2
;ASM                               B    *NEXT

5.3.1 -Enhanced Functionality

In addition to the *? notation shown above, Workspace Register Indirect Addressing can also be specified by using the ** notation, as found in the Wycove Forth assembler. The modified code is shown below:

Forth Assembler              Conventional Assembler
2000 CONSTANT XRAM           XRAM  EQU  >2000
ASM: EX3                           DEF  EX3
   R1 XRAM LI,               EX3   LI   R1,XRAM
   R1 ** R2 MOV,                   MOV  *R1,R2
;ASM                               B    *NEXT

Once again, notice how the addressing mode, in this case *? or ** follows the register:

R1 **

is directly equivalent to *R1.

5.4 -Workspace Register Indirect Auto-Increment Addressing

Workspace Register Indirect Auto-Increment Addressing is performed with the *?+ word. It is used after the register to which it pertains:

Forth Assembler              Conventional Assembler
2000 CONSTANT XRAM           XRAM  EQU  >2000
ASM: EX4                           DEF  EX4
   R1 XRAM LI,               EX4   LI   R1,XRAM
   R1 *?+ R2 MOV,                  MOV  *R1+,R2
;ASM                               B    *NEXT

5.4.1 -Enhanced Functionality

In addition to the *?+ notation shown above, Workspace Register Indirect Auto-Increment Addressing can also be specified by using the *+ notation, as found in the Wycove Forth assembler. The modified code is shown below:

Forth Assembler              Conventional Assembler
2000 CONSTANT XRAM           XRAM  EQU  >2000
ASM: EX4                           DEF  EX4
   R1 XRAM LI,               EX4   LI   R1,XRAM
   R1 *+ R2 MOV,                   MOV  *R1+,R2
;ASM                               B    *NEXT

Once again, the important distinction is that the addressing mode follows the register to which it pertains.

R1 *+ is equivalent to *R1+

The instruction

MOV *R1+,*R2+

would be coded as

R1 *+ R2 *+ MOV,
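Auto-increment addressing on both operands is the natural way to copy a block of memory. The following is a minimal sketch of that idea, assuming a decimal number base; the names SRCBUF, DSTBUF and COPY8 are hypothetical, and the BEGIN, ... UNTIL, loop and the EQ token it uses are described in sections 6 and 7.

```forth
\ Sketch only: SRCBUF, DSTBUF and COPY8 are hypothetical names.
DECIMAL
CREATE SRCBUF 16 ALLOT       \ 8 words of source data
CREATE DSTBUF 16 ALLOT       \ 8 words of destination space

ASM: COPY8 ( -- )
   R0 SRCBUF LI,             \ LI  R0,SRCBUF   source pointer
   R1 DSTBUF LI,             \ LI  R1,DSTBUF   destination pointer
   R2 8 LI,                  \ LI  R2,8        number of words to copy
   BEGIN,
      R0 *+ R1 *+ MOV,       \ MOV *R0+,*R1+   copy one word, advance both
      R2 DEC,                \ DEC R2
   EQ UNTIL,                 \ loop until R2 reaches 0
;ASM
```

Because SRCBUF and DSTBUF are ordinary Forth words, they simply push their addresses at assembly time, and LI, assembles those addresses into the instructions.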

5.5 -Indexed Memory Addressing

The final addressing type is Indexed Memory Addressing. This is performed with the @(?) word used after the index and register, as shown below:

Forth Assembler                                 Conventional Assembler
2000 CONSTANT XRAM                              XRAM  EQU  >2000
ASM: EX5                                              DEF  EX5
   XRAM R1 @(?) R2 MOV,                         EX5   MOV  @XRAM(R1),R2
   XRAM 22 + R2 @(?) XRAM 26 + R3 @(?) MOV,           MOV  @XRAM+22(R2),@XRAM+26(R3)
;ASM                                                  B    *NEXT

5.5.1 -Enhanced Functionality

In addition to the @(?) notation shown above, Indexed Memory Addressing can also be specified by using the () notation, as found in the Wycove Forth assembler. The modified code is shown below:

Forth Assembler                                 Conventional Assembler
2000 CONSTANT XRAM                              XRAM  EQU  >2000
ASM: EX5                                              DEF  EX5
   XRAM R1 () R2 MOV,                           EX5   MOV  @XRAM(R1),R2
   XRAM 22 + R2 () XRAM 26 + R3 () MOV,               MOV  @XRAM+22(R2),@XRAM+26(R3)
;ASM                                                  B    *NEXT

5.6 -Addressing Mode Words for Special Registers

In order to make addressing modes easier for the W, RP, IP, SP and NEXT registers, the following words are available and eliminate the need to enter the register name separately:

Register    Workspace Register    Workspace Register              Indexed
Address     Indirect Mode         Indirect Auto-Increment Mode    Mode
W           *W                    *W+                             @(W)
RP          *RP                   *RP+                            @(RP)
IP          *IP                   *IP+                            @(IP)
SP          *SP                   *SP+                            @(SP)
NEXT        *NEXT                 *NEXT+                          @(NEXT)
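As a sketch of these words in use, the following exchanges the top two stack items using *SP and @(SP). It assumes the TurboForth V1.2 stack layout, in which the second item lies at offset 2 from SP (as in the >UCASE example in section 9.4; on V1.1 the offset would be -2). The word name XCHG is hypothetical.

```forth
\ Sketch only: XCHG is a hypothetical name; V1.2 stack layout assumed.
ASM: XCHG ( a b -- b a )
   *SP R0 MOV,          \ MOV *SP,R0       R0 = b (top of stack)
   2 @(SP) *SP MOV,     \ MOV @2(SP),*SP   top of stack = a
   R0 2 @(SP) MOV,      \ MOV R0,@2(SP)    second item = b
;ASM
```

Under those assumptions, 1 2 XCHG . . should display 1 followed by 2.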


6 -Structured Assembler Constructs

This assembler also permits the user to write structured (label-less) code. This is done in a manner very similar to the way that Forth implements conditional constructs. The major difference is that rather than taking a value from the stack and using it as a true/false flag, the processor's status register is used to determine whether or not to jump.

Using this technique, it is possible to use constructs such as IF,...ENDIF, and BEGIN,...WHILE,...REPEAT, etc. in assembly code.

The following structured constructs are implemented:

As noted above, these constructs work in the same way as their Forth counterparts, except that they check the processor's status register, rather than take a value from the data stack.

Note:

The three conditional words (IF, UNTIL, and WHILE,) must each be preceded by one of the jump tokens shown in the next section.
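A minimal sketch of a conditional written this way is given here. The word name ZERO? is hypothetical; the EQ token and the jump instructions noted in the comments are those listed in section 7.

```forth
\ Sketch only: ZERO? is a hypothetical name.
ASM: ZERO? ( n -- flag )
   *SP R0 MOV,     \ MOV *SP,R0   copying n also compares it with 0
   EQ IF,          \ assembles JNE: skip the true branch when n <> 0
      *SP SETO,    \ SETO *SP     n was 0, so replace it with -1 (true)
   ELSE,           \ assembles JMP over the false branch
      *SP CLR,     \ CLR  *SP     n was non-zero, so replace it with 0
   ENDIF,
;ASM
```

Here 0 ZERO? . should display -1, and 5 ZERO? . should display 0. No flag is taken from the data stack by IF,; the decision is made purely on the status bits left behind by MOV,.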


7 -Assembler Jump Tokens

The following jump tokens are defined in the assembler. These jump tokens are used in conjunction with conditional constructs in order to assemble jump instructions, as discussed in the following sections.

Token   Comment                      Instruction Used
EQ      True if =                    JNE
GT      True if signed >             JGT $+1 JMP
GTE     True if signed > or =        JLT
HI      True if unsigned >           JLE
HE      True if unsigned > or =      JL
LO      True if unsigned <           JHE
LE      True if unsigned < or =      JH
LT      True if signed <             JLT $+1 JMP
LTE     True if signed < or =        JGT
NC      True if no carry             JOC
NE      True if equal bit not set    JEQ
NO      True if no overflow          JNO $+1 JMP
NP      True if not odd parity       JOP
OC      True if carry bit is set     JNC
OO      True if overflow             JNO
OP      True if odd parity           JOP $+1 JMP

Conditional constructs IF, UNTIL, and WHILE, must be preceded by a jump token from the table above. This is because IF, UNTIL, and WHILE, actually assemble jump instructions into your code, and the jump token tells the assembler which type of jump instruction to assemble.

The actual jump instruction assembled into your assembly language routine may appear to be counter-intuitive. For example, if the EQ jump token is used, the actual instruction assembled is a JNE (jump if not equal). The reason for this is simply that the assembler writes the code such that the jump is taken if the condition is false:

Forth Assembly Style    Actual Assembled Code    Comment
R0 R0 MOV,              MOV  R0,R0               \ check R0
EQ IF,                  JNE  xxx                 \ jump if R0 <> 0
  R2 INCT,              INCT R2                  \ else add 2 to R2
ENDIF,
R1 INC,                 INC  R1                  \ then increment R1

As can be seen, if the condition is false the code will jump over the IF block. The ENDIF, construct causes the assembler to calculate the jump offset and complete the assembly of the jump instruction. Constructs can be nested with no problems. The assembler will track and assemble the appropriate jump instructions with the correct offsets automatically.

7.1 -Assembly Example for Structured Constructs

The following example is designed to show how these jump tokens and structured constructs are used:

Forth Assembly Code          9900 Assembly Code         Comment
( GENERALISED SHIFTER)       * GENERALISED SHIFTER
ASM: SHIFT ( s v -- v)             DEF  SHIFT           \ begin assembly word
   *SP R0 MOV,               SHIFT MOV  *SP,R0          \ get value to shift from stack into R0
   SP DECT,                        DECT SP              \ reduce stack pointer (V1.1 direction; SP INCT, on V1.2)
   R0 R0 MOV,                      MOV  R0,R0           \ check if value is 0
   NE IF,                          JEQ  L3              \ just exit if yes
      *SP R1 MOV,                  MOV  *SP,R1          \ get shift count from stack into R1
      GTE IF,                      JLT  L1              \ if shift count is positive...
         R1 0 SLA,                 SLA  R1,0            \ … then shift to the left
      ELSE,                        JMP  L2              \ otherwise...
         R1 0 SRL,           L1    SRL  R1,0            \ … shift to the right
      ENDIF,
      R1 *SP MOV,            L2    MOV  R1,*SP          \ move value back to the stack
   ENDIF,
;ASM                         L3    B    *NEXT           \ exit back to TurboForth

Note: The structured words shown above do not check to ensure that the jump target is within range (+127,-128 words).


8 -Test Code

The following routines demonstrate use of the assembler. Each example increases in complexity in terms of exercising the abilities and facilities of the assembler.

8.1 -Add two numbers

The following code takes two numbers from the stack, adds them, and places the result on the stack. Functionally, it is equivalent to the Forth code word + though in practice, the code is slightly different.

Forth Assembly Code            Comment
ASM: ADD ( n1 n2 -- n1+n2)     \ begin definition of assembly language word “ADD”
   *SP+ R0 MOV,                \ pop n2 into R0
   *SP R1 MOV,                 \ get n1 in R1
   R0 R1 A,                    \ add n1 & n2 (result in R1)
   R1 *SP MOV,                 \ push result to stack
;ASM                           \ end definition of assembly language word

Note how the stack is popped (using *SP+) to get n2 into R0, but the stack is not popped in order to get n1. This is because we are going to write a result to stack, so we can write it directly over the top of n1.

To test, simply type:

100 99 ADD .

A more efficient, though less illustrative version of the above might be:

Forth Assembly Code            Comment
ASM: ADD ( n1 n2 -- n1+n2)     \ begin definition of assembly language word “ADD”
   *SP+ R0 MOV,                \ pop n2 into R0
   R0 *SP A,                   \ add n2 to n1 (n1 is on the stack)
;ASM                           \ end definition of assembly language word

 

8.2 -Fill screen with a character

The following program takes an ASCII code from the stack, and fills the screen with it. The program directly sets the VDP Write Address and writes to the VDP write register. Note the use of the BEGIN, … UNTIL, looping construct, which removes the requirement for a label. Also note the use of the EQ jump token. The code will loop until the EQ bit in the 9900 status register is set. Jump tokens are discussed in section 7, Assembler Jump Tokens.

Code                            Comment
$8C02 CONSTANT VDPA             \ address of VDP address register
$8C00 CONSTANT VDPW             \ address of VDP write register

ASM: WIPE ( ascii --)           \ create assembly language word “WIPE”
   R0 CLR,                      \ screen address 0
   R2 960 LI,                   \ counter (40 column mode assumed)
   *SP SWPB,                    \ get ASCII value on stack in high byte
   BEGIN,                       \ begin a loop
      R0 SWPB,                  \ get address low byte
      R0 VDPA @@ MOVB,          \ write low byte to address register
      R0 SWPB,                  \ get address high byte
      R0 VDPA @@ MOVB,          \ write hi byte to address register
      *SP VDPW @@ MOVB,         \ write to VDP write register
      R0 INC,                   \ move to next address
      R2 DEC,                   \ decrement counter
   EQ UNTIL,                    \ repeat loop if R2 is not 0
   SP DECT,                     \ remove ASCII value from the stack (V1.1 direction; SP INCT, on V1.2)
;ASM                            \ end definition of assembly language word

To test, type:

65 WIPE

to fill the screen with A's

42 WIPE

to fill the screen with * symbols, etc.

8.3 -Nested Delay

The following code takes a value from the stack, and uses it as part of the outer loop value in a nested delay. Nested delay loops are used to produce long delays.

The BEGIN, … WHILE, … REPEAT, loop construct in the assembler works in the same way as in high-level Forth: the loop starts at BEGIN,. At this point, a condition should be evaluated, and the result fed to WHILE,. While the condition is true, execution will continue with the code immediately following WHILE, and will loop back to the code immediately after BEGIN, upon encountering REPEAT,.

If the condition fed to WHILE, evaluates to false, the code jumps out of the loop, running the code immediately after REPEAT,.

Code                         Comment
ASM: DELAY ( delay --)       \ begin assembly language word “DELAY”
   *SP R0 MOV,               \ get loop factor from the stack
   SP DECT,                  \ reduce stack pointer (V1.1 direction; SP INCT, on V1.2)
   BEGIN,                    \ begin a loop
      R0 R0 MOV,             \ test R0 for 0
   NE WHILE,                 \ if not equal to zero then proceed, otherwise exit
      R1 $FFFF LI,           \ load R1 with >FFFF
      BEGIN,                 \ begin another loop
         R1 DEC,             \ decrement R1...
      EQ UNTIL,              \ ...and continue to do so until R1=0
      R0 DEC,                \ decrement R0
   REPEAT,                   \ jump back to the code immediately after BEGIN,
;ASM                         \ exit back to TurboForth

Note the BEGIN, … UNTIL, loop nested inside the BEGIN, … REPEAT, loop. The assembler compiles the appropriate jump instructions and calculates the jump offsets with no intervention (or thought) required by the programmer.


9 -General Usage Notes

The following notes are offered to assist the Forth Assembly language developer. As will be seen, integrating assembly language sub-routines into Forth applications is no more complex than referencing the assembly language sub-routine by name. Data can also be passed between Forth and assembly via the Forth data stack with ease, as shown.

9.1 -Using Your Own Workspace

It may be desirable to use your own workspace registers for some routines. TurboForth is quite economical in its use of registers; it reserves R3, R4, R5 and R12 for its own use, and all other registers are available. However, where this is inconvenient you can use your own workspace by ALLOTing data space within the dictionary, and using this space in your assembly language program:

Forth/Assembly Code      Comment
CREATE WORKSPACE         \ create an entry in the dictionary
32 CHARS ALLOT           \ reserve 32 bytes of dictionary space
ASM: TEST                \ create an assembly word called TEST
   WORKSPACE LWPI,       \ set CPU register workspace
   ...
   ...
   ...
;ASM

9.2 -Calling Other Code Words

Due to the ITC (Indirect Threaded Code) system employed in TurboForth (and indeed, most 16-bit Forth systems) it is not possible to call other code words directly from inside another code word. The reason for this is that all words in the Forth dictionary consist of a pointer field, which points to the executable code, followed by the code itself.

For high-level Forth words, the address stored in the pointer field is the address of a routine called DOCOL (do-colon) which pushes the current IP (instruction pointer) address to the return stack, so that code may resume from the current address. This is how Forth words can be essentially endlessly nested.

For machine language/assembly words, the address in the pointer field simply points to the next word in memory, where the machine code is located.

What this means is that if you reference an assembly language word in another word, the pointer address gets assembled into your definition, not the actual address of your executable code.

For example:

In the above example the word DO-WORK calls the word MULTIPLY to perform a multiply function. However, the address of MULTIPLY, assembled into DO-WORK is incorrect, since the pointer address is actually compiled. Assuming MULTIPLY is assembled at >B000, it would actually look like this in memory:

Address   Item/Value/Code    Comment
>B000     >AF21              Pointer to previous dictionary entry
>B002     >0008              Length of name of word
>B004     M U L T I P L Y    The name of the word
>B00C     >B00E              Pointer to executable code
>B00E     MPY R1,R8          The multiply instruction
>B010     B *R11             Return to the calling routine
>B012     B *R12             Assembled by ;ASM

So, when MULTIPLY is referenced in DO-WORK, the pointer address (>B00C), not the address of the code (>B00E), gets assembled, which is clearly wrong.

The solution is not to use BL (branch and link) to 'chain' sub-routines together. Rather, use a high-level Forth word to 'glue' your assembly language sub-routines together. This approach avoids the “pointer problem” completely, and adds a significant benefit: assembly language words can be nested easily without any additional code, since the Forth system takes care of nesting and de-nesting via the return stack (assembly language programmers will be used to the 'problem' of having to save the contents of R11 when nesting calls via the BL instruction).

Let us look at how the problem could be solved using high-level Forth. In addition, we will expand the code to use our own workspace.

Notice how the call to MULTIPLY has been removed from the assembly language word DO-WORK. Now, the Forth word TEST-ASM 'links' the two assembly routines together. Because they use their own, private (from Forth's perspective) workspace, they can safely use any register, and even leave data behind in one or more registers for other routines to pick up and use. The TEST-ASM word demonstrates this by reading directly from the workspace used by DO-WORK and MULTIPLY, fetching the 16-bit word that represents R9 of that workspace.
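An arrangement of this kind could be sketched as follows. This is an illustration only, not the original listing. The factors 6 and 7 are arbitrary, a decimal number base is assumed, and the sketch assumes that TurboForth's own workspace is at >8300 so that it can be restored with LWPI, before ;ASM returns via R12; check that address against your version before relying on it.

```forth
\ Sketch under assumptions: Forth's own workspace is taken to be at >8300.
DECIMAL
CREATE WORKSPACE 32 CHARS ALLOT     \ 16 registers x 2 bytes

ASM: DO-WORK ( -- )
   WORKSPACE LWPI,     \ switch to the private workspace
   R1 6 LI,            \ leave two factors behind in R1 and R8
   R8 7 LI,
   $8300 LWPI,         \ ASSUMPTION: restore TurboForth's workspace
;ASM

ASM: MULTIPLY ( -- )
   WORKSPACE LWPI,     \ same private workspace, so R1 and R8 survive
   R1 R8 MPY,          \ MPY R1,R8: 32-bit product in R8 (high), R9 (low)
   $8300 LWPI,         \ ASSUMPTION: restore TurboForth's workspace
;ASM

: TEST-ASM ( -- )
   DO-WORK MULTIPLY
   WORKSPACE 18 + @ . ;   \ R9 is at byte offset 9 * 2 = 18
```

If the assumptions hold, TEST-ASM would be expected to display 42, the low word of the product left in R9.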

Let us look at a more useful example. In this example, we wish to create an assembly routine to fetch a random number from the VDP interrupt timer at >8379 and make it available to us in the TurboForth environment by pushing the value to the data stack. However, we will separate the fetching of the random number, and the pushing to stack into separate routines, since the action of pushing a value to the data stack could be used in multiple places, therefore it makes sense to 'factor' it into its own sub-routine.

First, the routine to get a number from the VDP interrupt timer:

Next, the routine to push a value in R0 to the stack. Since this is a subroutine, we will also document the code:
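Both routines could be sketched along the following lines. These are illustrations under stated assumptions rather than the original listings: the timer is read as a single byte from >8379, and the stack is assumed to grow downward as in TurboForth V1.2.

```forth
\ Sketch under assumptions: V1.2 stack direction (push = SP DECT,).
ASM: RND ( -- )            \ leaves its result in R0, not on the stack
   R0 CLR,                 \ CLR  R0
   $8379 @@ R0 MOVB,       \ MOVB @>8379,R0   timer byte -> high byte of R0
   R0 SWPB,                \ SWPB R0          move it down to the low byte
;ASM

ASM: PUSH-R0 ( -- n )      \ pushes whatever is currently in R0
   SP DECT,                \ DECT SP          make room (SP INCT, on V1.1)
   R0 *SP MOV,             \ MOV  R0,*SP      store R0 as the new top item
;ASM
```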

We can now use high-level Forth to get a random number, using these two routines:

: GET-RND ( -- n ) RND PUSH-R0 ;

This can be tested with:

GET-RND .

Caution:

This particular example exposes the programmer to a certain amount of risk. The issue is that data is left in R0 by RND for PUSH-R0 to use later on. As written, there is no risk, since PUSH-R0 is called directly after RND in the Forth word GET-RND. As it happens, the TurboForth inner interpreter does not use R0, so the data stored in R0 remains untouched. However, there is no such guarantee if another resident TurboForth word is called in between the calls to RND and PUSH-R0. For example:

: GET-RND ( -- n ) RND PAGE PUSH-R0 ;

This example calls PAGE (which clears the screen) before calling PUSH-R0. It so happens that PAGE uses R0, so the data left by RND is overwritten.

Therefore, if your assembly language sub-routines will be “bridged” using high-level Forth colon definitions, and you wish to pass data between them you must either provide a separate workspace for them, or use the data stack.

9.3 -Mixing Forth and Assembly Language in an ASM: Definition

It might not be immediately obvious, but when you are writing assembly language instructions in TurboForth, you are actually writing Forth code. Words such as MOV, INC, DECT, ABS, etc. are all high level Forth colon definitions that simply compile op-codes to the current position in memory.

It is easy to fall into the “trap” of thinking that when you type ASM: to begin an assembly language word that you change into some special 'mode' that leaves the Forth environment behind, and enters an assembly language environment. This actually isn't true at all. You are still “in Forth” with all of the facilities of the Forth system and environment at your disposal.

The stack is heavily used by the assembler, just as it is in Forth. Let us look at an example:

R1 $1234 LI,

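Here, R1 is loaded with the value 1234 hex. It is not immediately obvious, but what actually happens at assembly time is this: R1 pushes the register number (1) to the stack, $1234 pushes the value to be loaded, and LI, then removes both from the stack and compiles the corresponding instruction into memory.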
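A rough sketch of how a word like LI, could be written in plain Forth makes this concrete. MY-LI, is a hypothetical stand-in and is not taken from the assembler's source; it relies only on the TMS9900 encoding of LI, which is the opcode >0200 plus the register number, followed by the immediate value as a second word.

```forth
\ Sketch only: MY-LI, is a hypothetical stand-in, not the assembler's source.
: MY-LI, ( reg value -- )
   SWAP            \ bring the register number to the top of the stack
   $0200 + ,       \ compile the opcode: >0200 plus the register number
   , ;             \ compile the immediate value as the second word
```

With this definition, R1 $1234 MY-LI, would compile the two words >0201 and >1234 at the current dictionary position.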

What this means is that you can use Forth to enhance your assembly language code, with practically unlimited power at your disposal.

For example, the above code might theoretically be replaced with something like:

R1 GET-VALUE LI,

Where GET-VALUE is a standard Forth colon definition that pushes some value to the stack. In fact, we could use our earlier example in our assembly language code:

R1 GET-RND LI,

Here, the word we previously built to return a random number to the stack is directly used in an assembly language definition. What would actually happen here is that at assembly time (when the LI instruction is physically compiled/assembled to memory) a random number would be chosen (using previously assembled assembly language routines to generate the number) and pushed to the data stack. LI, would then simply use that value and assemble it into the LI instruction.

We could take this a step further. Let us assume we have built a Forth application that relies on a number of assembly routines to do work for us. In those routines, we have used R13 as a holding register (a routine places data in R13 for another routine to access later).

However, at some point, it becomes inconvenient to use R13, and we instead wish to use R14. We can do this in one simple action, by using a CONSTANT:

14 CONSTANT HREG \ holding register is R14
ASM: SOME-THING
   HREG GET-RND LI,
   …
   …
;ASM

Here, instead of referencing R14 throughout our assembly code, we reference HREG instead. Thus, if we want to change the holding register designation, we only have to change the constant, in one place. The assembly language source code remains unchanged.

Of course, you could use extremely complex Forth words in your assembly code that carry out complex calculations, or fetch some data from a disk block, or a disk file – you have the complete functionality of Forth at your disposal.
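One simple application is an assembly-time macro: a colon definition that contains assembler words, and therefore assembles several instructions each time it is executed. The following is a minimal sketch, in which PUSH, and SEVEN are hypothetical names, the operand of PUSH, is assumed to be a plain register, the number base is assumed to be decimal, and the stack is assumed to grow downward as in TurboForth V1.2.

```forth
\ Sketch only: PUSH, is a hypothetical macro; V1.2 stack direction assumed.
: PUSH, ( reg -- )
   SP DECT,        \ assembles DECT SP      (SP INCT, on V1.1)
   *SP MOV, ;      \ assembles MOV reg,*SP  using the register left on the stack

ASM: SEVEN ( -- 7 )
   R0 7 LI,        \ LI R0,7
   R0 PUSH,        \ expands to DECT SP followed by MOV R0,*SP
;ASM
```

Under those assumptions, SEVEN . would display 7. The macro costs nothing at run time, because PUSH, itself only runs while SEVEN is being assembled.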

9.4 -Accessing Forth Strings from Assembly Language

At some point you may wish to access a string from inside your assembly language routine, to read the string and perhaps modify it in some way. Here, we will discuss an assembly language routine that changes all lower case characters in a string to upper case. The string will be declared “Forth side” using Forth nomenclature and we will call an assembly language routine to do the case conversion for us.

For example, when the code is complete, it will be possible to execute:

S" this is an upper case string!" >UCASE TYPE

and see the output

THIS IS AN UPPER CASE STRING!

First, we need to discuss strings in TurboForth, and indeed Forth in general. Strings are a little strange in Forth, because you can 'execute' them. Indeed, this is how one gains access to strings in Forth. When one “executes” a string, the address, and the length of the string are pushed to the stack. Once we have this information, we can access/manipulate the string in any way.

Type this into TurboForth as an example:

S" Hello, world!"

You will see two values pushed to the stack, which you can examine with .S. We can feed this directly into TYPE and have the string echoed back to us:

S" I AM FORTH" TYPE

Indeed, we can place a string in a colon definition, and it will behave the same way:

: STRING S" Hello Mother!" ;
STRING TYPE

The fact that we get the address of the string, and its length is very useful to us, and we will make use of this in the following assembly language sub-routine.

The following sub-routine expects the address and the length of the string to be on the stack before the routine is called (obviously). It will leave the address and the length on the stack, unchanged, so that other words such as TYPE may immediately access the string after processing. Only characters between ASCII codes 97 (a) and 122 (z) will be modified. Any other characters in the string are left unchanged.

Code                            Comment
ASM: >UCASE                     \ declare assembly language word >UCASE
   *SP R0 MOV,                  \ get the string length from stack into R0
   2 @(SP) R1 MOV,              \ get the string address in R1 [MOV @2(SP),R1]
                                \ [note: for TF V1.1 use -2 rather than 2]
   R2 CLR,                      \ we'll use r2 for byte-wise operations, so clear it
   BEGIN,                       \ begin a loop
      R1 ** R2 MOVB,            \ get a character from the string into r2
      R2 CHAR a 256 * CI,       \ compare the upper byte to 'a'
      HE IF,                    \ if higher or equal to 'a' then...
         R2 CHAR z 256 * CI,    \ ...compare the upper byte to 'z'
         LE IF,                 \ if lower or equal to 'z' then we are in range, so...
            R2 -32 256 * AI,    \ ...subtract 32 from the value in the upper byte
         ENDIF,
      ENDIF,
      R2 R1 *+ MOVB,            \ move the byte back to memory
      R0 DEC,                   \ decrement the length counter
   EQ UNTIL,                    \ loop back to BEGIN if R0 is not 0
;ASM                            \ exit back to Forth

Assemble the above code and test with:

S" hello mother! How are you today?" >UCASE TYPE

The above would perhaps look scarcely familiar, even to the experienced assembly language programmer; however, the above code is an excellent example of leveraging the full power of the combination of Forth and assembly code.

As can be seen, we have actually applied high-level programming constructs such as looping, IF and ENDIF to low-level assembly code. Note that there are no labels used in the above code, and no jump instructions (they are compiled into the code for us by the assembler, “behind the scenes”). In addition, we have mixed Forth code into the assembly source code to do some grunt-work for us at assembly time.

Whilst the RPN nature of the code is initially confusing, it actually becomes quite natural quite quickly (this author has been using the assembler less than 24 hours at the time of writing, and is having no difficulty making the mental adjustment), especially if one uses R notation for register names, and uses the Wycove addressing notation, which aligns much better to 9900 assembly than the original TI-Forth addressing mode notation.

9.4.1 -Code Breakdown

We will now examine the above code line-by-line and discuss how it works, and the techniques employed. Standard 9900 assembler is also shown so that one can compare the difference in syntax.

ASM: >UCASE

The above begins a new assembly language definition called >UCASE. In Forth parlance, where a > precedes a word, it generally means “to”. Thus this word could be called “to upper case”.

*SP R0 MOV, [ MOV *SP,R0 ]

Here we get the contents of the top word of the stack (the length of the string) into register R0. SP is a 'special' register name, and is equivalent to R4. See section 3, TurboForth's Workspace Registers, for a description of special register names.

2 @(SP) R1 MOV, [ MOV @2(SP),R1 ]

Here we get the second value on the stack (the value “underneath” the length) which is the address of the string. See section 5.6. Note: In TurboForth V1.1 the stack grows/shrinks in the opposite direction, therefore an offset of -2 (rather than +2) should be used, i.e. -2 @(SP)...

R2 CLR, [ CLR R2 ]

Since we will be working with single bytes, we need a register to hold them. Bytes are always operated on in the upper (high) byte of a register, so we zero this register before we start.

BEGIN,

We now begin a loop. This is simply the point to return to if the exit condition of the loop is not satisfied. In practice, the code loops to the instruction immediately after BEGIN,. BEGIN, itself is a placeholder.

R1 ** R2 MOVB, [ MOVB *R1,R2 ]

Here, we read a byte from the string, and place it into the upper byte of R2.

R2 CHAR a 256 * CI, [ CI R2,'a'*256 ]

At this point, we have mixed Forth and assembly together. We want the ASCII code of 'a' (97) in the upper byte, because the character we are examining is in the upper byte. Therefore we actually want 97*256. However, rather than calculate 97*256 (24832 decimal) ourselves, we ask Forth to work it out for us by using the word CHAR. CHAR pushes the ASCII code of the character immediately following it to the stack. We then multiply it by 256. Note that this calculation takes place at assembly time, not run-time, so what actually gets assembled is CI R2,24832 (or, in hex, CI R2,>6100).
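The assembly-time calculation can be checked at the Forth command line before it is used in a definition. The following is a small sketch using only CHAR, * and the number-printing word . and assumes DECIMAL and HEX behave as in a standard Forth. Note that the multiplication is done before switching to HEX, because 256 typed while in HEX would be read as a hexadecimal number.

```forth
DECIMAL
CHAR a .                    \ prints 97
CHAR a 256 * .              \ prints 24832, the operand assembled into CI
CHAR a 256 * HEX . DECIMAL  \ prints 6100, the same value in hex
```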

HE IF, [ JL xxxx ]

If the character is higher or equal to >6100 (which is ASCII 97 in the upper byte) then we enter this IF block. Otherwise we skip it completely.

R2 CHAR z 256 * CI, [ CI R2,'z'*256 ]

If we do enter the IF block, then we check to see if the character is less than or equal to a 'z' character. Again, we get Forth to calculate the value of ASCII code 122*256, using CHAR.

We could have instead written it as R2 122 256 * CI, however the “122” doesn't really show any context. By using CHAR z it is much clearer that we are performing a comparison to the 'z' character.

LE IF, [ JH xxxx ]

Here, if the character code is less than or equal to 'z' then we know that our character is in the target a-z range, so we enter the IF block. Otherwise, the character lies outside the range, and we skip the IF block.

R2 -32 256 * AI, [ AI R2,-32*256 ]

If the character is in range, then we subtract 32 from its value to convert it to upper case. Since the value is in the upper byte, we need 32*256. Finally, since there is no “subtract immediate” instruction in 9900 assembly language, we use the AI (add immediate) instruction, so we need to add -32*256 to the value in R2.
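The arithmetic can be confirmed at the Forth command line. This sketch uses only standard words and simply shows that adding -32*256 to 'a' in the upper byte gives 'A' in the upper byte:

```forth
DECIMAL
-32 256 * .                  \ prints -8192, the operand given to AI
CHAR a 256 *  -32 256 *  + . \ prints 16640
CHAR A 256 * .               \ prints 16640, i.e. 'A' in the upper byte
```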

ENDIF, [ none ]

This marks the end of the 'inner' IF block (the part that executes if the character is in range). It generates no code; it simply resolves the offset of the JH instruction, allowing the code following the JH instruction to be skipped if the condition is not met.

ENDIF, [ none ]

Again, this generates no code; it simply resolves the offset of the JL instruction (the section that checks for 'a' or greater), allowing execution to jump over the block if the condition is not met.

R2 R1 *+ MOVB, [ MOVB R2,*R1+ ]

Here we write the character value (which may or may not have been modified) back to the same address in the buffer. The auto-increment addressing mode also advances the buffer pointer in R1 to the next character.

R0 DEC, [ DEC R0 ]

We then decrement R0 (the length counter).

EQ UNTIL, [ JNE xxxx ]

When R0 reaches 0 the EQ flag will be set in the status register. The EQ UNTIL, causes the code to loop back (to the instruction following BEGIN,) until the EQ flag is set (i.e. until R0=0).

;ASM [ B *NEXT ]

This ends the assembly language definition and assembles the appropriate exit code (a B *R12 instruction) into the definition, which returns to the Forth system.


10 -TurboForth Assembler Code

The following represents the code for the TurboForth Assembler. The code here is shown starting on block 9, but in practice it can be located anywhere. The code occupies 5 blocks in source-code form, and compiles into 2.7K of object code when loaded.

Block 9

Block 10

Block 11

Block 12

Block 13

Block 14

10.1 -Errors Corrected from the Original Source Code

References to ?EXEC in the original TI-Forth code have been removed as ?EXEC is not used in F83.

There are erroneous [COMPILE] CJMP sequences in IF, ELSE, and UNTIL, which are not required as CJMP is not immediate. These have also been removed.


11 -Conclusion

The ability to write assembly language code directly from within TurboForth, or to load and assemble assembly language source code directly from blocks, represents a major leap forward in the power and versatility of TurboForth. High-level code constructs such as looping and IF,...ENDIF, blocks allow one to apply high-level programming paradigms to low-level assembly code, completely removing the requirement for error-prone labels.

Further, the ability to mix, completely freely, assembly language mnemonics and high-level Forth code in an assembly definition provides unprecedented power and flexibility. High-level Forth words can be used to provide complex macro-like functionality for the assembly language programmer.

This author has been truly delighted and empowered by the extra flexibility afforded by the assembler, and hopes that other users of TurboForth will give it a try and 'unleash' the power of Forth and assembly language.

