

A Stack-based String Library for TurboForth

Simplifying string handling through the use of a String Stack

 

Mark Wills

2/27/2014

[ Author's Note: This paper is adapted from a paper that I wrote describing a string library that I developed for ANS Forth systems. The code presented at the end of this paper has been modified where appropriate for compatibility with the Forth-83 standard, and, specifically, TurboForth V1.2. The original ANS paper can be downloaded as a PDF. ]

Abstract

String handling is not one of Forth's strong points. Out-of-the-box support for strings is all but non-existent in standard Forth. Whilst the concept of strings does exist in the language, relatively few words are provided to allow effective string manipulation. The normal approach for Forth programmers is to roll one’s own string functions as required. Thorny issues such as heap allocation and de-allocation, and memory fragmentation, are often passed over in preference for a ‘quick-and-dirty’ solution that solves the problem at hand. This paper presents a Forth-83 compliant library which affords the Forth programmer such facilities as string constants, transient strings, and a wide range of string manipulation words. Issues such as memory allocation, memory de-allocation and memory fragmentation are rendered irrelevant through the provision of a string stack, which is used to host and manipulate transient strings.





Introduction – The Concepts behind the Library

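The String Library offers two types of strings: string constants, which are named and declared with a maximum length, and transient strings, which live on the string stack.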

Coding Conventions

The following coding-style conventions are employed in the library:

Stack Notation

Normal Forth stack notation conventions are used. Where words have an effect on the string stack, the string stack effects are shown alongside the normal data stack effects.

For example:

            VAL$ ( -- ud ) ( ss: str -- )

The above example indicates that the word VAL$ takes a string from the string stack and results in an unsigned double being pushed to the data stack.

The suggested pronunciation of the word is also given.
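
As an illustration of the notation, the sketch below defines a hypothetical word, COUNT$ (not part of the library), and documents both its data stack and string stack effects. It relies only on LEN$ and DROP$, which are described later in this paper.

```forth
\ COUNT$ is a hypothetical name used for illustration only.
\ Data stack: leaves a length. String stack: consumes one string.
: COUNT$ ( -- len ) ( ss: str -- )
    LEN$      \ length of the topmost string; the string is retained
    DROP$ ;   \ now remove the string from the string stack

$" hello" COUNT$ .
```

Given the documented behaviour of LEN$ and DROP$, the last line should display 5 and leave the string stack as it was before the string was pushed.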

String Stack Library Download

The String Stack Library is supplied on the TurboForth Tools disk, in block format, ready to load. With the Tools disk in DSK1, boot TurboForth and type the following:

S" DSK1.TFTOOLS" USE
1 LOAD

This loads the menu on the disk. Alternatively, simply type 8 LOAD to load the library directly.

Acknowledgements

Whilst the code presented here is original, the concepts used in it are based on concepts developed by Brian Fox, who developed a string stack library originally for TI-Forth, and also HsForth for DOS, circa 1988. Brian was kind enough to correspond with me on the subject of string stacks, and kindly shared his code. This author extends his sincere thanks to Brian for his generosity.


String Constant Words

Since only a handful of words are associated with string constants, they will be documented first:

 

$CONST  ( max_len tib:"name" -- ) ( runtime: -- $Caddr)  “string constant”

The word $const declares a string constant. Declared at compile time, string constants require a maximum length and a name. For example:

50 $const welcome

The above example declares a string with a maximum size of 50 characters. It shall be referenced in code using the name welcome.

Note the runtime stack effect: when the name associated with the string is referenced at run time, it pushes its address to the data stack. The label $Caddr indicates that it is the address of a string constant. String constants push the address of their maximum length field, which can be read with the word maxLen$.

 

MAXLEN$ ( $Caddr -- max_len )  “maximum length of string”

Given the address of a string constant on the data stack, the word maxLen$ returns the maximum allowed string length for that string constant.

For example:

50 $const welcome
welcome maxLen$ .

The above code fragment shall display the value 50.

 

:=" ( $Caddr tib:"string" -- ) “assign string constant”

Given the address of a string constant on the data stack, the word :=" initialises the string constant with the string from the terminal input buffer.

For example:

50 $const welcome
welcome :=" hello and welcome!"

 

.$CONST ( $Caddr -- )  “display string constant”

Given the address of a string constant on the data stack the word .$CONST shall display the string.

For example:

50 $const welcome
welcome :=" hello and welcome!"
welcome .$const

 

CLEN$ ( $Caddr -- len )  “string constant length”

Given the address of a string constant on the data stack, the word clen$ returns its actual length on the data stack. The word maxLen$ can be used to determine the maximum length of a string constant.

For example:

50 $const welcome
welcome :=" hello and welcome!"
welcome clen$ .

The above code displays 18 – the length of the string.

 

>$ ( $Caddr -- ) ( ss: -- str)  “to string stack”

Given the address of a string constant on the data stack the word >$ copies the contents of the string to the string stack where it can be manipulated.

For example:

50 $const welcome
welcome :=" hello and welcome!"
welcome >$

Note that the string stack has received a copy of the string contained within welcome. The string welcome still exists as a string constant.


String Stack Words

The convention within this document is to refer to strings that exist on the string stack as transient strings. They are referred to as transient strings because they generally only exist for a short time on the string stack. Strings are placed on the string stack (which is separate from the data and return stacks) and then manipulated in some way before being consumed. Memory allocation and de-allocation are managed by virtue of the strings being on the stack, in the same way that the size of the data stack is managed by simply adding or removing values on the data stack.

$" ( tib:"string" -- ) ( ss: -- str)  “string to string stack”

The word $" takes a string from the terminal input buffer and pushes it to the string stack. The end of the string is indicated by a quotation mark.

For example:

$" Hello, World!"

In this example the string “Hello, World!” is pushed directly to the string stack, thus becoming the top item on the string stack.

Note that $" is a state-smart word. It can be used in both colon definitions and also directly at the command line. The correct action will be taken in either case.

In order that the run-time actions of $" may be compiled into a definition if so desired, the run-time action of this word is encapsulated within the word ($"). Therefore if the run-time behaviour of this word is to be compiled into another word one must compile, or postpone, the word ($").

 

DUP$ ( -- ) ( ss: s1 -- s1 s1)  “duplicate string”

The word DUP$ duplicates the top item on the string stack.

For example:

$" Hello, World!" DUP$

The string stack now contains two copies of the string.

 

DROP$ ( -- ) ( ss: str -- )  “drop string”

The word DROP$ removes the topmost string item from the string stack.

For example:

$" Hello, World!"
$" How are you?"
DROP$

At this point the string “Hello, World!” is the topmost string on the string stack. “How are you?” was pushed onto the string stack, but it was immediately dropped.

 

SWAP$ ( -- ) ( ss: s1 s2 -- s2 s1)  “swap string”

The word SWAP$ swaps the topmost two strings on the string stack.

For example:

$" Hello, World!"
$" How are you?"

At this point the string “How are you?” is the topmost string on the string stack. If SWAP$ is executed, the two strings are exchanged on the string stack.
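
A minimal sketch of the exchange, using .$ (documented later in this paper) to pop and display each string in turn:

```forth
$" Hello, World!"
$" How are you?"
SWAP$         \ "Hello, World!" is now the topmost string
.$ CR         \ displays Hello, World!
.$ CR         \ displays How are you?
```

Both strings are consumed by .$, so the string stack is left as it was before the example.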

 

NIP$ ( -- ) ( ss: s1 s2 -- s2)  “nip string”

The word NIP$ removes the string underneath the topmost string from the string stack.

For example:

$" red"
$" blue"

At this point, “blue” is on the top of the string stack, with “red” underneath it.

NIP$

At this point, “red” has been removed from the string stack. “blue” is the topmost string.

 

OVER$ ( -- ) ( ss: s1 s2 -- s1 s2 s1 )  “over string”

The word OVER$ pushes a copy of the string s1 to the top of the string stack, above s2.

For example:

$" red"
$" green"
OVER$

At this point, the string stack contains the following strings:

“red” (the topmost string)
“green”
“red”

 

ROT$ ( -- ) ( ss: s3 s2 s1 -- s2 s1 s3)  “rotate strings”

The word ROT$ rotates the top three strings to the left. The third string (prior to the execution of ROT$) moves to the top of the string stack.

Note: For ease of implementation, this routine copies (using PICK$) the strings to the top of the string stack in their correct final order, then removes the 3 original strings underneath. Consequently, it is possible to run out of string stack space. If this happens, the condition will be correctly trapped in (set$SP).
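
The stack signature above can be checked with a short sketch. It assumes only the documented behaviour of $" and .$ (which pops and displays the topmost string):

```forth
$" red"
$" green"
$" blue"      \ "blue" is topmost, "red" is third
ROT$          \ "red" moves to the top
.$ CR         \ displays red
.$ CR         \ displays blue
.$ CR         \ displays green
```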

 

-ROT$ ( -- ) ( ss: s3 s2 s1 -- s1 s3 s2)  “rotate strings”

The word -ROT$ rotates the top three strings to the right. The top string (prior to the execution of -ROT$) moves to the third position.

Note: For ease of implementation, this routine copies (using PICK$) the strings to the top of the string stack in their correct final order, then removes the 3 original strings underneath. Consequently, it is possible to run out of string stack space. If this happens, the condition will be correctly trapped in (set$SP).
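
As a sketch of the right rotation, following the stack signature given for -ROT$ and using .$ to pop and display each string:

```forth
$" red"
$" green"
$" blue"      \ "blue" is topmost, "red" is third
-ROT$         \ "blue" moves to the third position
.$ CR         \ displays green
.$ CR         \ displays red
.$ CR         \ displays blue
```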

 

>$CONST  ( $Caddr -- ) ( ss: str -- )  “to string constant”

The word >$CONST takes the topmost string from the string stack and moves it into the string constant whose address is on the data stack.

For example:

4 $const colour
$" red" colour >$const

At this point, the string constant colour has the value “red”. To verify, display the string using .$CONST as follows:

colour .$CONST
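
The words >$ and >$CONST can be combined to modify a string constant by way of the string stack. The following is a sketch only; the constant name TITLE is an arbitrary choice for illustration, and UCASE$ is documented later in this paper:

```forth
20 $CONST TITLE
TITLE :=" turboforth"
TITLE >$          \ copy the constant to the string stack
UCASE$            \ modify the transient copy
TITLE >$CONST     \ move the result back into the constant
TITLE .$CONST     \ displays TURBOFORTH
```

The string stack is left unchanged by the sequence, since the copy pushed by >$ is consumed by >$CONST.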

 

+$ ( -- ) ( ss: str1 str2 -- str2&str1 )  “concatenate strings”

The word +$ replaces the top two strings on the string stack with their concatenated equivalent.

For example:

$" red"  $" blue"  +$

At this point, “red” and “blue” have been removed from the string stack. The topmost string on the string stack has the value “bluered”. Note that the topmost string goes to the left of the newly concatenated string.

 

LEN$ ( -- len ) ( ss: -- )  “length of string”

The word len$ returns the length of the topmost string on the string stack.

For example:

$" hello world!"  len$ .

Displays the value 12.

 

MID$ ( start end -- ) ( ss: str1 -- str1 str2 )  “mid-string”

The word mid$ produces a sub-string on the string stack, consisting of the characters from the topmost string starting at character start and ending at character end.

For example:

$" redgreenblue"  3 5 mid$

At this point, the topmost two strings on the string stack are as follows:

“green” (the topmost item)
“redgreenblue”

Note, as indicated in the string stack signature, the original string (str1) is retained. Note also that the first character in the string (the leftmost character) is character number 0.

 

LEFT$ ( len -- ) ( ss: str1 -- str1 str2 )  “left of string”

The word left$ pushes the leftmost len characters to the string stack as a new string. The original string is retained.

For example:

$" redgreenblue" 3 left$

The above causes the string “red” to be pushed to the string stack.

RIGHT$ ( len -- ) ( ss: str1 -- str1 str2 )  “right of string”

The word right$ causes the rightmost len characters to be pushed to the string stack as a new string. The original string is retained.

For example:

$" redgreenblue" 4 right$

The above causes the string “blue” to be pushed to the string stack.

 

FINDC$ ( char -- pos|-1) ( ss: -- )  “find character in string”

The word findc$ returns the position of the first occurrence of the character char, beginning at the left side of the topmost string, with the search proceeding towards the right. If the character is not found, -1 is returned.

For example:

$" redgreenblue" char b  findc$ .

Displays the value 8, as the character b is found in the 8th character position (where the first character is character 0).
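
FINDC$ combines naturally with LEFT$. The sketch below extracts the text before the first space in a string; it assumes the character is present, since a result of -1 would not be a valid length for LEFT$:

```forth
$" hello world"
32 FINDC$     \ 32 is the ASCII code for a space; leaves 5 on the data stack
LEFT$         \ pushes the leftmost 5 characters as a new string
.$            \ displays hello
DROP$         \ discard the original string, which was retained
```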

 

FIND$ ( start -- pos|-1) ( ss: -- )  “find string in string”

The word find$ searches the second string on the string stack, starting from position start, for the first occurrence of the topmost string and pushes its starting position to the data stack. If the substring is not found, -1 is returned. As a convenience, to make subsequent searches for the same substring easier, both strings are retained on the string stack.

For example:

$" redgreenbluegreen" $" green" 0 find$ .

Displays the value 3, as the substring is found at character position 3 (the leftmost character being character 0). The strings “redgreenbluegreen” and “green” remain on the stack, thus, the second instance of “green” could be found if desired.
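
A sketch of such a follow-up search is shown below. It assumes that the first search succeeds and that the position returned is always measured from the start of the searched string; under those assumptions the final value displayed would be 12.

```forth
$" redgreenbluegreen" $" green"
0 FIND$       \ first occurrence, at position 3
1+ FIND$      \ search again, starting one character past the first match
.             \ position of the second occurrence
DROP$ DROP$   \ both strings were retained, so remove them when finished
```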

 

REPLACE$ ( -- pos ) ( found: ss: s1 s2 s3 -- s4  not found: s1 s2 -- s1 s2)  “replace string”

The word replace$ searches string s2 for the first occurrence of string s3. If it is found:

Example:

$" PURPLE"
$" redgreenblue"
$" green"
replace$
$.s

If the search string (s3) is not found:

 

.$ ( -- ) ( ss: str -- )  “display string”

The word .$ pops the topmost string from the string stack and displays it.

For example:

$" Hello, World!" .$

The above code displays the string “Hello, World!” on the output device.

 

REV$ ( -- ) ( ss: s1 -- s2 )  “reverse string”

The word rev$ replaces the topmost string on the string stack with its reversed equivalent.

For example:

$" green" rev$ .$

The above displays “neerg”.

 

LTRIM$ ( -- ) ( ss: str1 -- str2 )  “trim left of string”

The word ltrim$ removes leading spaces from the topmost string.

For example:

$"        hello!" ltrim$ .$

Displays “hello!” with the leading spaces removed.

 

RTRIM$ ( -- ) ( ss: str1 -- str2 )  “trim right of string”

The word rtrim$ removes trailing spaces from the topmost string.

For example:

$" hello!      " rtrim$ .$

Displays “hello!” with the trailing spaces removed.

 

TRIM$ ( -- ) ( ss: str1 -- str2 )  “trim string”

The word trim$ removes both leading and trailing spaces from the topmost string.

For example:

$"         hello!      " trim$ .$

The above code removes leading and trailing spaces and displays the string.

 

UCASE$ ( -- ) ( ss: str1 -- str2 )  “convert to upper case”

The word ucase$ converts all lower case characters in the topmost string to upper case.

For example:

$" hello world! 1234" ucase$ .$

The above displays “HELLO WORLD! 1234”

 

LCASE$ ( -- ) ( ss: str1 -- str2 )  “convert to lower case”

The word lcase$ converts all upper case characters in the topmost string to lower case.

For example:

$" HELLO WORLD! 1234" lcase$ .$

The above displays “hello world! 1234”

 

==$? ( -- flag ) ( ss: -- )  “is equal to string?”

The word ==$? performs a case-sensitive comparison of the topmost two strings on the string stack and returns true if both their lengths and contents are identical. If the lengths or the contents differ, false is returned. The strings are retained.

For example:

$" hello" $" HELLO" ==$? .

Displays 0 (false) since the strings are different (the comparison is case sensitive).

$" hello" $" hello" ==$? .

Displays -1 (true) since the strings are identical.

$" hello" $" hell" ==$? .

Displays false, since their lengths differ.

A case-insensitive comparison can easily be built as follows:

: same$? ( -- flag ) ( ss: s1 s2 -- s1 s2 )
     over$ over$ lcase$ swap$ lcase$ ==$? drop$ drop$ ;

The above code creates copies of s1 and s2 (using over$), then converts them both to lower case. ==$? then compares the strings, placing the appropriate flag on the data stack. Finally, the lower-case versions of s1 and s2 are removed from the string stack; thus s1 and s2 are retained, unchanged.

 

PICK$ ( index -- ) ( ss: -- str )  “pick string”

Given the index of a string on the string stack, the word pick$ copies the indexed string to the top of the string stack. 0 pick$ is equivalent to DUP$, 1 pick$ is equivalent to OVER$, and so on.

For example:

$" blue"
$" green"
$" red"
2 pick$

The above causes the string “blue” to be copied to the top of the string stack.

 

VAL$ ( -- n ) ( ss: str -- )

The word VAL$ interprets the topmost string on the string stack as a number, and returns it on the data stack as an integer. An error occurs if the string cannot be represented as a number.

Note that a double value can be returned by prefixing the number within the string with a period.

Example:

$" 9900" VAL$ .
$" .9900" VAL$ .

 

$.S ( -- ) ( ss: -- )

The word $.s displays a non-destructive string stack dump to the output device. The length of each string is given, along with the total number of strings on the string stack. The amount of space allocated to the string stack, the amount of space in use, and the amount of free space are also reported.

 

DEPTH$ ( -- n ) ( ss: -- )

Returns the current depth of the string stack, with 0 meaning the string stack is empty.

 

RESET$ ( -- ) ( ss: -- )

Resets (i.e. empties) the string stack.
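
A short sketch showing DEPTH$ and RESET$ together, based on the descriptions above:

```forth
RESET$                    \ start from an empty string stack
$" red" $" green"
DEPTH$ .                  \ displays 2
RESET$                    \ discard everything on the string stack
DEPTH$ .                  \ displays 0
```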



The String Stack

The string stack is ALLOTted from dictionary space. The constant ($sSize) determines the amount of space reserved.

Error Checking

Error checking is included in all words that could cause a string stack underflow or overflow condition. In the event that an underflow or overflow is detected, the code aborts with an error message.

Other words such as DUP$ also perform checks. For example, DUP$ checks that there is at least one item on the string stack. SWAP$ checks that there are at least two items on the string stack, etc.

String Stack Format

The string stack grows from higher memory addresses to lower memory addresses.

The format of the strings on the string stack is very simple, as follows:

Actual length (1 cell)

String payload (1 char=1 byte)

String Constant Format

String Constants have the same format, but are preceded by a maximum length cell in order to check that a requested string can be accommodated within the string constant:

Maximum length (1 cell)

Actual length (1 cell)

String payload (1 char=1 byte)
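
The layout can be sketched in a few lines of Forth. This is not the library's own definition of $CONST; it is a minimal illustration under stated assumptions: the names MY$CONST, MY-MAXLEN and MY-CLEN are hypothetical, a cell is assumed to be 2 bytes (hence 2+), and the payload is rounded up to an even number of bytes on the assumption that the dictionary should stay cell-aligned.

```forth
\ Hypothetical sketch of the string constant layout (not the library source).
: MY$CONST ( max_len -- )
    CREATE            \ at run time the new word pushes the address of its body
    DUP ,             \ maximum length cell
    0 ,               \ actual length cell, initially zero
    1+ -2 AND ALLOT   \ payload space, rounded up to an even byte count
;

: MY-MAXLEN ( addr -- max_len )  @ ;       \ first cell
: MY-CLEN   ( addr -- len )      2+ @ ;    \ second cell (2-byte cells assumed)

50 MY$CONST GREETING
GREETING MY-MAXLEN .    \ 50
GREETING MY-CLEN .      \ 0
```

Keeping the maximum length in front of the actual length is what allows an assignment to be checked against the space available before any characters are copied.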



Throw Codes

The words in the library perform sanity checks on input parameters where necessary. In particular, the string stack, being statically ALLOTted from dictionary space, is carefully guarded: it is very likely to have code and/or data on either side of it, so a string stack underflow or overflow would result in catastrophic software failure. Where errors are detected, the library throws the following THROW codes:

It should be noted that this author has not checked that the THROW codes listed here are used in other systems or libraries elsewhere.

Throw Code   Nature of Error                            Thrown By
9900         String stack underflow                     (set$SP)
9901         String too large to assign                 :="
9902         String stack is empty                      PICK$   DUP$   LEN$   >$CONST   MID$   LEFT$   RIGHT$   FINDC$
                                                        .$   REV$   LTRIM$   RTRIM$   UCASE$   LCASE$   DROP$
9903         Need at least 2 strings on string stack    SWAP$   NIP$   OVER$   +$   FIND$   ==$?
9904         String too large for string constant       >$CONST
9905         Illegal LEN value                          MID$   LEFT$   RIGHT$
9906         Need at least 3 strings on string stack    ROT$   -ROT$   REPLACE$
9907         String is not a legal number               VAL$


Dependencies

The following environmental dependencies are declared:

Word     ANS Library   ANS Reference   Referenced In
-ROT     None                          :="
.R       Core Ext                      $.S
HERE     Core                          SWAP$   +$   REV$   LTRIM$   REPLACE$
PARSE    Core Ext                      :="   $"
PICK     Core Ext                      FINDC$
WITHIN   Core Ext                      UCASE$   LCASE$

-ROT is not an ANS word. It is defined as follows:

: -ROT ( a b c -- c a b )
  ROT ROT ;

Author Information

The library was developed by Mark Wills in February 2014. The code is hereby released to the public domain. The author can be contacted by email via: markwills1970@gmail.com. Please also see the acknowledgements section for further attribution information.


Portable String Library Source Code

The source code for the string library is presented below.

Please also note the following:

27th of February 2014

