A Stack-based String Library for TurboForth
Simplifying string handling through the use of a String Stack
Mark Wills
2/27/2014
[ Author's Note: This paper is adapted from a paper that I wrote describing a string library that I developed for ANS Forth systems. The code presented at the end of this paper has been modified where appropriate for compatibility with the Forth-83 standard, and, specifically, TurboForth V1.2. The original ANS paper can be downloaded as a PDF. ]
Abstract
String handling is not one of Forth’s strong points. Out-of-the-box support for strings is all but non-existent in standard Forth. Whilst the concept of strings does exist in the language, relatively few words are provided to allow effective string manipulation. The normal approach for Forth programmers is to roll one’s own string functions as required. Heap allocation and de-allocation, and memory fragmentation, are thorny issues which are often passed over in preference for a ‘quick-and-dirty’ solution that solves the problem at hand. This paper presents a Forth-83 compliant library which affords the Forth programmer such facilities as string constants, transient strings, and a wide range of string manipulation words. Issues such as memory allocation, memory de-allocation and memory fragmentation are rendered irrelevant through the provision of a string stack, which is used to host and manipulate transient strings.
The String Library offers two types of strings: string constants and transient strings.
The following coding-style conventions are employed in the library:
Normal Forth stack notation conventions are used. Where words have an effect on the string stack, the string stack effects are shown alongside the normal data stack effects.
For example:
VAL$ ( -- ud ) ( ss: str -- )
The above example indicates that the word VAL$ takes a string from the string stack and results in an unsigned double being pushed to the data stack.
The suggested pronunciation of the word is also given.
The String Stack Library is supplied on the TurboForth Tools disk, in block format, ready to load. With the Tools disk in DSK1, boot TurboForth and type the following:
S" DSK1.TFTOOLS" USE
1 LOAD
This loads the menu on the disk. Alternatively, type 8 LOAD to load the library directly.
Whilst the code presented here is original, the concepts used in it are based on concepts developed by Brian Fox, who developed a string stack library originally for TI-Forth, and also HsForth for DOS, circa 1988. Brian was kind enough to correspond with me on the subject of string stacks, and kindly shared his code. This author extends his sincere thanks to Brian for his generosity.
Since only a handful of words are associated with string constants, they will be documented first:
The word $const declares a string constant. Declared at compile time, string constants require a maximum length and a name. For example:
50 $const welcome
The above example declares a string with a maximum size of 50 characters. It shall be referenced in code using the name welcome.
Note the runtime stack effect. At run-time, when the name of the string is referenced, it pushes its address to the data stack. The label $Caddr indicates that it is the address of a string constant. String constants push the address of their maximum length field, which can be read with the word maxLen$.
Given the address of a string constant on the data stack, the word maxLen$ returns the maximum allowed string length for that string constant.
For example:
50 $const welcome
welcome maxLen$ .
The above code fragment shall display the value 50.
Given the address of a string constant on the data stack, the word :=" initialises the string constant with the string from the terminal input buffer.
For example:
50 $const welcome
welcome :=" hello and welcome!"
Given the address of a string constant on the data stack the word .$CONST shall display the string.
For example:
50 $const welcome
welcome :=" hello and welcome!"
welcome .$const
Given the address of a string constant on the data stack, the word clen$ returns its actual length on the data stack. The word maxLen$ can be used to determine the maximum length of a string constant.
For example:
50 $const welcome
welcome :=" hello and welcome!"
welcome clen$ .
The above code displays 18 – the length of the string.
Given the address of a string constant on the data stack the word >$ copies the contents of the string to the string stack where it can be manipulated.
For example:
50 $const welcome
welcome :=" hello and welcome!"
welcome >$
Note that the string stack has received a copy of the string contained within welcome. The string welcome still exists as a string constant.
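The string constant words described above can be combined into one short session. The following is a sketch only, assuming the library has been loaded as described earlier. The name greeting is chosen here purely for illustration, and the word .$ (which displays and removes the topmost string) is described later in this document.

```forth
50 $const greeting                  ( reserve room for up to 50 characters )
greeting :=" hello and welcome!"    ( initialise the string constant )
greeting maxLen$ .                  ( displays 50 )
greeting clen$ .                    ( displays 18 )
greeting .$const                    ( displays the string )
greeting >$                         ( copy the string to the string stack )
.$                                  ( display and consume the copy )
```

After the final line the string stack is back where it started, while greeting still holds its value.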
The convention within this document is to refer to strings that exist on the string stack as transient strings. They are referred to as transient strings because they generally only exist for a short time on the string stack. Strings are placed on the string stack (which is separate from the data and return stacks) and then manipulated in some way before being consumed. Memory allocation and de-allocation are managed by virtue of the strings being on the stack, in the same way that the size of the data stack is managed by simply adding or removing values on the data stack.
The word $" takes a string from the terminal input buffer and pushes it to the string stack. The end of the string is indicated by a quotation mark.
For example:
$" Hello, World!"
In this example the string “Hello, World!” is pushed directly to the string stack, thus becoming the top item on the string stack.
Note that $" is a state-smart word. It can be used in both colon definitions and also directly at the command line. The correct action will be taken in either case.
In order that the run-time actions of $" may be compiled into a definition if so desired, the run-time action of this word is encapsulated within the word ($"). Therefore if the run-time behaviour of this word is to be compiled into another word one must compile, or postpone, the word ($").
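Because $" is state-smart, it can be used inside a colon definition in the same way as at the command line. A minimal sketch, assuming the library is loaded; the word name greet is hypothetical and .$ is described later in this document:

```forth
: greet ( -- ) ( ss: -- )
  $" Hello, World!" .$ ;   ( push the string at run-time, then display it )

greet                      ( displays Hello, World! )
```

The string is pushed each time greet executes and is consumed by .$, so the string stack is left unchanged.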
The word DUP$ duplicates the top item on the string stack.
For example:
$" Hello, World!" DUP$
The string stack now contains two copies of the string.
The word DROP$ removes the topmost string item from the string stack.
For example:
$" Hello, World!"
$" How are you?"
DROP$
At this point the string “Hello, World!” is the topmost string on the string stack. “How are you?” was pushed onto the string stack, but it was immediately dropped.
The word SWAP$ swaps the topmost two strings on the string stack.
For example:
$" Hello, World!"
$" How are you?"
At this point the string “How are you?” is the topmost string on the string stack. If SWAP$ is executed, the two strings are exchanged on the string stack.
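The effect of SWAP$ can be observed by displaying the strings afterwards. This is an illustrative sketch, assuming the library is loaded, and it uses the word .$ (described later in this document) to display and remove each string in turn:

```forth
$" Hello, World!"
$" How are you?"
swap$
.$ cr    ( displays Hello, World! which is now the topmost string )
.$ cr    ( displays How are you? )
```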
The word NIP$ removes the string underneath the topmost string from the string stack.
For example:
$" red"
$" blue"
At this point, “blue” is on the top of the string stack, with “red” underneath it.
NIP$
At this point, “red” has been removed from the string stack. “blue” is the topmost string.
The word OVER$ pushes a copy of the string s1 to the top of the string stack, above s2.
For example:
$" red"
$" green"
OVER$
At this point, the string stack contains the following strings:
“red” (the topmost string)
“green”
“red”
The word ROT$ rotates the top three strings to the left. The third string (prior to the execution of ROT$) moves to the top of the string stack.
Note: For ease of implementation, this routine copies (using PICK$) the strings to the top of the string stack in their correct final order, then removes the 3 original strings underneath. Consequently, it is possible to run out of string stack space. If this happens, the condition will be correctly trapped in (set$SP).
The word -ROT$ rotates the top three strings to the right. The top string (prior to the execution of -ROT$) moves to the third position.
Note: For ease of implementation, this routine copies (using PICK$) the strings to the top of the string stack in their correct final order, then removes the 3 original strings underneath. Consequently, it is possible to run out of string stack space. If this happens, the condition will be correctly caught in (set$SP).
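Since ROT$ and -ROT$ rotate in opposite directions, applying one after the other should restore the original order. The following sketch, which assumes the library is loaded, traces the string stack through both words and then displays the strings with .$ (described later in this document):

```forth
$" red" $" green" $" blue"   ( bottom to top: red green blue )
rot$                         ( bottom to top: green blue red )
-rot$                        ( bottom to top: red green blue again )
.$ cr                        ( displays blue )
.$ cr                        ( displays green )
.$ cr                        ( displays red )
```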
The word >$const takes the topmost string from the string stack and moves it into the string constant whose address is on the data stack.
For example:
4 $const colour
$" red" colour >$const
At this point, the string constant colour has the value “red”. To verify, display the string using .$CONST as follows:
colour .$CONST
The word +$ replaces the top two strings on the string stack with their concatenated equivalent.
For example:
$" red" $" blue" +$
At this point, “red” and “blue” have been removed from the string stack. The topmost string on the string stack has the value “bluered”. Note that the topmost string goes to the left of the newly concatenated string.
The word len$ returns the length of the topmost string on the string stack.
For example:
$" hello world!" len$ .
Displays the value 12.
The word mid$ produces a sub-string on the string stack, consisting of the characters from the topmost string starting at character start and ending at character end.
For example:
$" redgreenblue" 3 5 mid$
At this point, the topmost two strings on the string stack are as follows:
“green” (the topmost item)
“redgreenblue”
Note, as indicated in the string stack signature, the original string (str1) is retained. Note also that the first character in the string (the leftmost character) is character number 0.
The word left$ pushes the leftmost len characters to the string stack as a new string. The original string is retained.
For example:
$" redgreenblue" 3 left$
The above causes the string “red” to be pushed to the string stack.
The word right$ causes the rightmost len characters to be pushed to the string stack as a new string. The original string is retained.
For example:
$" redgreenblue" 4 right$
The above causes the string “blue” to be pushed to the string stack.
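Because left$ and right$ retain the original string, it remains underneath the new sub-string. Where only the sub-string is wanted, the original can be removed with NIP$. A short sketch, assuming the library is loaded:

```forth
$" redgreenblue" 3 left$   ( string stack: redgreenblue red )
nip$                       ( remove the original, leaving only red )
.$                         ( displays red )
```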
The word findc$ returns the position of the first occurrence of the character char, beginning at the left side of the topmost string, with the search proceeding towards the right. If the character is not found, -1 is returned.
For example:
$" redgreenblue" char b findc$ .
Displays the value 8, as the character b is found in the 8th character position (where the first character is character 0).
The word find$ searches the second string on the string stack, starting from position start, for the first occurrence of the topmost string and pushes its starting position to the data stack. As a convenience, to make subsequent searches for the same substring easier, both strings are retained on the string stack.
For example:
$" redgreenbluegreen" $" green" 0 find$ .
Displays the value 3, as the substring is found at character position 3 (the leftmost character being character 0). The strings “redgreenbluegreen” and “green” remain on the stack, thus, the second instance of “green” could be found if desired.
The word replace$ searches string s2 for the first occurrence of string s3. If it is found:
Example:
$" PURPLE"
$" redgreenblue"
$" green"
replace$
$.s
If the search string (s3) is not found:
The word .$ pops the topmost string from the string stack and displays it.
For example:
$" Hello, World!" .$
The above code displays the string “Hello, World!” on the output device.
The word rev$ replaces the topmost string on the string stack with its reversed equivalent.
For example:
$" green" rev$ .$
The above displays “neerg”.
The word ltrim$ removes leading spaces from the topmost string.
For example:
$"    hello!" ltrim$ .$
Displays “hello!” with the leading spaces removed.
The word rtrim$ removes trailing spaces from the topmost string.
For example:
$" hello! " rtrim$ .$
Displays “hello!” with the trailing spaces removed.
The word $trim removes both leading and trailing spaces from the topmost string.
For example:
$"    hello!    " $trim .$
The above code removes leading and trailing spaces and displays the string.
The word ucase$ converts all lower case characters in the topmost string to upper case.
For example:
$" hello world! 1234" ucase$ .$
The above displays “HELLO WORLD! 1234”.
The word lcase$ converts all upper case characters in the topmost string to lower case.
For example:
$" HELLO WORLD! 1234" lcase$ .$
The above displays “hello world! 1234”.
The word ==$? performs a case-sensitive comparison of the topmost two strings on the string stack and returns true if both their length and content are identical. If the lengths or the contents differ, false is returned. The strings are retained.
For example:
$" hello" $" HELLO" ==$? .
Displays 0 (false) since the strings are different (the comparison is case sensitive).
$" hello" $" hello" ==$? .
Displays -1 (true) since the strings are identical.
$" hello" $" hell" ==$? .
Displays false, since their lengths differ.
A case-insensitive comparison can easily be built as follows:
: same$? ( -- flag ) ( ss: s1 s2 -- s1 s2 )
over$ over$ lcase$ swap$ lcase$ ==$? drop$ drop$ ;
The above code creates copies of s1 and s2 (using over$) then converts them both to lower case. ==$? then compares the strings, placing the appropriate flag on the data stack. Finally, the lower-case versions of s1 and s2 are removed from the string stack, thus s1 and s2 are retained, unchanged.
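As a usage sketch, assuming same$? has been defined as shown above, the strings that the case-sensitive comparison treats as different now compare as equal:

```forth
$" hello" $" HELLO" same$? .   ( displays -1, true )
drop$ drop$                    ( same$? retains both strings, so remove them )
```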
Given the index of a string on the string stack, the word pick$ copies the indexed string to the top of the string stack. 0 pick$ is equivalent to DUP$, 1 pick$ is equivalent to OVER$, etc.
For example:
$" blue"
$" green"
$" red"
2 pick$
The above causes the string “blue” to be copied to the top of the string stack.
The word VAL$ interprets the topmost string on the string stack as a number, and returns it on the data stack as an integer. An error occurs if the string cannot be represented as a number.
Note that a double value can be returned by prefixing the number within the string with a period.
Example:
$" 9900" VAL$ .
$" .9900" VAL$ .
The word $.s displays a non-destructive string stack dump to the output device. The length of each string is given, along with the total number of strings on the string stack. The amount of space allocated to the string stack, the amount of space in use, and the amount of free space are also reported.
Returns the current depth of the string stack, with 0 meaning the string stack is empty.
Resets (i.e. empties) the string stack.
The string stack is ALLOTted from dictionary space. The constant ($sSize) determines the amount of space reserved.
Error checking is included in all words that could cause a string stack under or overflow condition. In the event that an under or overflow is detected, the code aborts with an error message.
Other words such as DUP$ also perform checks. For example, DUP$ checks that there is at least one item on the string stack. SWAP$ checks that there are at least two items on the string stack, etc.
The string stack grows from higher memory addresses to lower memory addresses.
The format of the strings on the string stack is very simple, as follows:
Actual length (1 cell) | String payload (1 char = 1 byte)
String Constants have the same format, but are preceded by a maximum length cell in order to check that a requested string can be accommodated within the string constant:
Maximum length | Actual length | String payload
The words in the library perform sanity checks on input parameters where necessary. In particular, the string stack, being statically ALLOTted from dictionary space, is carefully guarded, since the string stack is very likely to have code and/or data on either side of it, resulting in catastrophic software failure in the event of a string stack underflow or overflow. Where errors are detected, the library throws the following THROW codes:
It should be noted that this author has not checked that the THROW codes listed here are used in other systems or libraries elsewhere.
Throw Code | Nature of Error | Thrown By
9900 | String stack underflow | (SET$SP)
9901 | String too large to assign |
9902 | String stack is empty | PICK$ DUP$ LEN$ >$CONST MID$ LEFT$ RIGHT$ FINDC$ .$ REV$ LTRIM$ RTRIM$ UCASE$ LCASE$ DROP$
9903 | Need at least 2 strings on string stack |
9904 | String too large for string constant |
9905 | Illegal LEN value |
9906 | Need at least 3 strings on string stack |
9907 | String is not a legal number |
The following environmental dependencies are declared:
Word | ANS Library | ANS Reference | Referenced In
-ROT | Non-ANS. Defined as follows: | | :="
.R | Core Ext | | $.S
HERE | Core | | SWAP$ +$ REV$ LTRIM$ REPLACE$
PARSE | Core Ext | | :=" $"
PICK | Core Ext | | FINDC$
WITHIN | Core Ext | | UCASE$ LCASE$
The library was developed by Mark Wills in February 2014. The code is hereby released to the public domain. The author can be contacted by email via: markwills1970@gmail.com. Please also see the acknowledgements section for further attribution information.
The source code for the string library is presented below.
Please also note the following:
27th of February 2014
Updated 20th April 2021 - Corrected stack comments in FINDC$ and FIND$.