tads-gen Function Set

The tads-gen function set provides general utility and data manipulation functions. These functions have no user interface component.

To use the tads-gen function set in a program, you should \#include either \<tadsgen.h\> or \<tads.h\> (the latter includes both \<tadsio.h\> and \<tadsgen.h\>, for the full set of TADS intrinsics). If you’re using the adv3 library, you can simply \#include \<adv3.h\>, since that automatically includes the basic system headers.

tads-gen functions

abs(*val*)

Returns the absolute value of the given number. val can be an integer or a BigNumber value; the result has the same type as val. If val is positive or zero, the result is val; if val is negative, the result is -val. For example, abs(-3) returns 3.

concat(...)

Returns a string with the concatenation of the argument values, in the order given. The arguments can be strings, or any other types that can be converted to strings. If there are no arguments, the result is an empty string.

Non-string values are converted to strings automatically. The automatic conversion rules are similar to toString, except that nil values are treated as empty strings.

This function is essentially the same as concatenating the values with the + operator, but it’s more efficient when combining three or more values. The + operator is applied successively to one pair of values at a time, so it has to create an intermediate result string at each step, and copy that intermediate result at the next step; the function, in contrast, creates a single result string for the entire list and only copies each input string once.
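
As a minimal sketch of the difference, the hypothetical helper below (statusLine is an illustrative name, not part of the function set) builds the same text both ways. It assumes non-nil arguments, since the + operator and concat() treat nil differently.

    #include <tads.h>

    /* hypothetical helper: build a status line from mixed-type values */
    statusLine(name, score, turns)
    {
        /* with +, each step creates and copies an intermediate string */
        local viaPlus = name + ': ' + score + ' points in ' + turns + ' turns';

        /* concat() builds the same text in one pass over its arguments */
        local viaConcat = concat(name, ': ', score, ' points in ', turns, ' turns');

        /* both strings have the same content; return either one */
        return viaPlus == viaConcat ? viaConcat : viaPlus;
    }

For example, statusLine('Bob', 42, 7) should yield 'Bob: 42 points in 7 turns' from either expression.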

dataType(*val*)

Returns the datatype of the given value. The return value is one of the TypeXxx values (see the section on reflection).
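
The sketch below shows one way to branch on the result. The helper name describeValue is hypothetical, and the example assumes that the TypeXxx constants it uses (TypeInt, TypeSString, TypeList, TypeNil) are visible through \<tads.h\>.

    #include <tads.h>

    /* hypothetical helper: return a short description of a value's type */
    describeValue(val)
    {
        switch (dataType(val))
        {
        case TypeInt:
            return 'integer';
        case TypeSString:
            return 'string';
        case TypeList:
            return 'list';
        case TypeNil:
            return 'nil';
        default:
            return 'other';
        }
    }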

firstObj(*cls*?, *flags*?)

Returns the first object of class cls, or the first object in the entire program if cls is not specified. This is used to start iterating over the set of all instances of a given class; use nextObj() to continue the iteration. The order in which objects are enumerated by firstObj() and nextObj() is arbitrary, but each object will be returned exactly once. Returns nil if there are no objects of the given class.

If the flags argument is specified, it can be a combination (with the \| operator) of the following bit flags:

If the flags argument is omitted, only instances are enumerated, as though ObjInstances had been specified.

A note on garbage collection: Retrieving objects with firstObj() and nextObj() can have the sometimes surprising effect of “resurrecting” objects that aren’t reachable in any other way. Normally, when the last reference to an object is removed, there’s no way for your program to ever reach that object again, so in effect the object is dead - ready to be deleted by the garbage collector. However, the collection process only happens intermittently; between passes, dead objects linger in memory, waiting for the collector to remove them. Between garbage collection passes, TADS doesn’t know whether an object is reachable or not, since it’s fairly time-consuming to figure this out - that’s really the main work the garbage collector does when it runs. As a result, firstObj() and nextObj() simply visit every object currently in memory, without trying to determine which are still reachable. If it’s important to your program’s logic that you visit only reachable objects, you can call t3RunGC() just before starting your firstObj()-nextObj() loop, to ensure that objects that are unreachable at the start of the loop are removed from memory. Of course, even this won’t absolutely guarantee that nextObj() won’t return any unreachable objects, since more objects could become unreachable in the course of the loop itself, but it will at least ensure that objects that were already dead don’t turn up.

Resurrecting an unreachable object with firstObj() or nextObj() is harmless as far as the TADS VM is concerned, so if your program logic doesn’t have a problem with finding objects that you thought you had removed from the game, you don’t have to worry about calling t3RunGC() before object loops. The garbage collector will only delete an object if it’s unreachable at the moment the collector runs; the fact that an object was unreachable for some period in the past doesn’t matter, and in fact the garbage collector won’t even know, because reachability is only determined during the collection process. If you resurrect an object, and then the collector runs while you’re still holding a reference to the resurrected object, the collector will see your reference and will keep the object in memory just like any other reachable object.
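
The iteration pattern described above might look like the following sketch. The helper name countInstances is hypothetical, and the example assumes t3RunGC() is declared through \<tads.h\>.

    #include <tads.h>

    /* hypothetical helper: count the instances of a class */
    countInstances(cls)
    {
        local cnt = 0;

        /* remove objects that are already unreachable before we start */
        t3RunGC();

        /* flags omitted, so only instances are enumerated */
        for (local obj = firstObj(cls) ; obj != nil ; obj = nextObj(obj, cls))
            ++cnt;

        return cnt;
    }

If your program doesn’t mind seeing resurrected objects, the t3RunGC() call can simply be dropped.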

getArg(*idx*)

Retrieve the given argument to the current function. The first argument is at index 1. idx must be in the range 1 to argcount, or the function throws a run-time error (“bad value for built-in function”).
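
Here’s a small sketch of getArg() in a function that takes a varying argument list. The function name sumArgs is hypothetical.

    #include <tads.h>

    /* hypothetical helper: add up however many numbers are passed in */
    sumArgs(...)
    {
        local total = 0;

        /* valid indices run from 1 to argcount */
        for (local i = 1 ; i <= argcount ; ++i)
            total += getArg(i);

        return total;
    }

With this definition, sumArgs(1, 2, 3) returns 6, and sumArgs() returns 0 because the loop body never runs.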

getFuncParams(*funcptr*)

Returns information on the parameters taken by the given function. The return value is a list with three elements:

The second element gives the number of optional arguments; this element is always zero, because there’s no way for an ordinary function (non-intrinsic) to specify optional arguments. This element is included in the list specifically so that the list uses the same format as the Object.getPropParams() method.

If the third element is true, it indicates that the function was defined with the ... varying argument list notation.

getTime(*timeType*?)

Returns the current system time, according to timeType. If timeType is not specified, GetTimeDateAndTime is the default. The valid timeType values are:

makeList(*val*, *repeatCount*?)

Constructs a list by repeating the given value the given number of times. Returns the new list.

(For a more flexible way of constructing a list that allows for varying element values, see the List.generate method.)
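
As a brief illustration, the hypothetical helper below uses makeList() to create a list of counters that all start at zero.

    #include <tads.h>

    /* hypothetical helper: a list of n counters, each initialized to 0 */
    zeroCounters(n)
    {
        return makeList(0, n);
    }

For example, zeroCounters(3) returns [0, 0, 0].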

makeString(*val*, *repeatCount*?)

Constructs a string by repeating the given value the given number of times. The result of the function depends on the data type of val:

If repeatCount is not specified, the default is 1. If repeatCount is less than zero, an error is thrown.

max(*val1*, ...)

Returns the greatest argument value. The values must be of comparable types.

min(*val1*, ...)

Returns the least argument value. The values must be of types that can be compared with one another, or the function throws an error.
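
A common use of the two functions together is to restrict a value to a range. The helper name clamp in this sketch is hypothetical.

    #include <tads.h>

    /* hypothetical helper: limit val to the range lo..hi */
    clamp(val, lo, hi)
    {
        /* min() caps the value at hi; max() then raises it to at least lo */
        return max(lo, min(val, hi));
    }

For example, clamp(15, 0, 10) returns 10, clamp(-3, 0, 10) returns 0, and clamp(7, 0, 10) returns 7.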

nextObj(*obj*, *cls*?, *flags*?)

Get the next object after obj of class cls. If cls is not specified, returns the next object of any type. Returns nil if obj is the last object of class cls. This is used (with firstObj()) to iterate over all objects of a given class, or over all objects. The order in which these functions enumerate objects is arbitrary, but each object will be returned exactly once. The flags argument has the same meaning as it does in firstObj().

Note that nextObj() can return objects that aren’t otherwise reachable, because objects that become unreachable aren’t removed from memory until the garbage collector runs, which only happens intermittently. For more on this, see firstObj().

rand(*x*, ...)

Returns a pseudo-random number, or randomly selects a value from a set of values.

In all cases, rand() chooses numbers that are uniformly distributed over the relevant range, which means that each value in the range has equal probability.

rand() uses a pseudo-random number generator algorithm that you can select via randomize(). The default is a generator called ISAAC that’s designed for general-purpose and cryptographic use.

Pseudo-random numbers aren’t truly random. They come from a mathematical formula that generates numbers that look random, in the sense that the results pass various statistical tests of randomness. Each RNG algorithm that TADS provides has a deterministic formula, which means that you’ll always get the same series of output values for a given starting state for the generator. If you want a different series of numbers each time the program runs (as you probably do), you have to randomize the starting state so that it’s different on each run. As of TADS 3.1, the interpreter does this automatically by default at launch. TADS asks the operating system for random initial “seed” data, and uses this to initialize the RNG. Most modern systems have sources of true entropy for this purpose.

In some cases you might actually want the same sequence of numbers on every run; for example, when running regression tests, you need a reproducible sequence of events that plays out exactly the same way each time. You can prevent the interpreter from automatically randomizing the RNG state by using the “-norand” option when starting the interpreter. You can alternatively use the randomize() function within the program to set a fixed seed value.
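
As a small sketch, the hypothetical helper below simulates a die roll. It assumes the single-integer form of the function, in which rand(n) returns an integer from 0 to n-1.

    #include <tads.h>

    /* hypothetical helper: roll a die with the given number of sides */
    rollDie(sides)
    {
        /* assumed: rand(sides) yields 0..sides-1, so add 1 for 1..sides */
        return rand(sides) + 1;
    }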

randomize(...)

Initialize the random number generator (RNG). This selects the random number generator algorithm used by rand(), initializes it with a “seed” value (fixed or randomly chosen), and can be used to save and restore the state of the RNG so that rand() result sequences can be repeated as needed.

Starting in TADS 3.1, the interpreter automatically makes a call to randomize() (the no-arguments version) when it starts up, unless the user specifies the “-norand” option when launching the interpreter. For most programs, this means that you’ll never have to make your own call to randomize(); you can just call rand() when you need random numbers.

This function performs several tasks, depending on how you invoke it:

TADS provides several different RNG algorithms. Each RNG has different properties, so some applications might have reasons to prefer a particular algorithm. For general purposes, any of them should produce good results. The RNG algorithms are identified by RNG_xxx constants, which you specify as the id parameter in calls to randomize():

Most programs that use random numbers want truly unpredictable numbers: that is, numbers that (a) have a statistically random distribution, with no discernible patterns, and (b) are different every time the program runs. rand() fulfills part (a): it uses a formula to generate a series of numbers that are statistically distributed in a random fashion (for example, so that 1 occurs as often as 2 or 3 or 1000 or any other number, so that any given sequence of 2, 3, or more numbers is equally likely, and so on). (Mathematicians have several much more formal tests that RNGs must satisfy to be considered random.) randomize() fulfills part (b), which is to ensure that the sequence of numbers is different every time you run the program. This is important because by itself, rand() is deterministic: it uses a fixed mathematical formula, so given the same initial conditions, it’ll always crank out the same sequence of numbers. So the trick is to randomize the initial conditions - and of course we can’t just turn to rand() for help, since it’s the thing we’re trying to randomize!

This is where the “seed” values come in. randomize() and randomize(id, nil) ask the operating system for truly random data to use for the initial conditions. The degree of entropy in this OS seed data varies by system; some systems have better entropy sources than others. But whatever the source, the seed data should be different each time you run the program when you use this option. randomize() feeds the OS seed data into the RNG to set its initial conditions, so each time you run, rand() will be starting from a different initial state. This makes for a different series of numbers from rand() on each run.

Note that it’s not necessary, or desirable, to call randomize() every time you want a random number. Once you seed the RNG, you should use rand() to generate random numbers. Given that we supposedly have this source of true randomness from the operating system, you might wonder why we shouldn’t use it every time we need a random number, and dispense with the formulaic RNGs. There are a couple of reasons. One is that the OS sources tend to be slow, since they can involve things like hardware device interaction and scans of large amounts of memory. RNGs are fast. Another problem with OS sources of randomness is that they don’t always change quickly - they’re designed to provide high entropy when called infrequently, like once per program session, but obvious patterns might emerge if you relied on them for many random numbers over a short period.

The fixed seed values, with randomize(id, val), are a little different. Rather than making the RNG produce different sequences on each run, a fixed seed makes rand() generate the same series of numbers every time. The numbers will still be statistically random, so they’ll look random to an observer, but the same sequence will be returned each time you run the program. (The sequence is a function of the seed value. You’ll get a different sequence for each different seed value.)

Why would you want a fixed series of random-looking numbers? You’d want this any time you need repeatability, but you still want the appearance of randomness. One common situation where this arises is regression testing, in which you run the program and compare its output to a reference version that you know is correct. If there are no differences, you know that changes you’ve made to the program since the last test didn’t break anything in the test script. Randomness makes this kind of testing difficult, because it makes the output intentionally different on each run; you can’t do a simple file comparison of the new and old output because they’ll always differ, whether or not the program is still working correctly. Fixed seeds offer a solution. Using a fixed seed, you can still exercise the program’s random behavior, but the sequence of random behavior will repeat on every run, so you can do regression testing after all. What’s more, the only thing that you have to change to switch between testing mode and release mode is the single call to randomize(), so the rest of the code can be identical in the two modes.

When you specify a fixed seed value, you should use an integer for the LCG algorithm, and a string for ISAAC or Mersenne Twister. The latter algorithms have large internal state vectors, so they can accept large seed data values. The LCG’s entire internal state is a single integer value, so there’s no point in specifying more seed data than an integer. The actual string values you use for ISAAC or MT seeds aren’t important; the algorithms should produce good sequences from any seed data.
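
The switch between testing mode and release mode could be sketched as follows. The helper name seedRandom and the seed string are hypothetical, and the constant name RNG_ISAAC is assumed to follow the RNG_xxx pattern described above.

    #include <tads.h>

    /* hypothetical helper: seed the RNG for a test run or a normal run */
    seedRandom(testMode)
    {
        if (testMode)
        {
            /* fixed string seed: rand() repeats the same sequence each run */
            randomize(RNG_ISAAC, 'regression-test-seed');
        }
        else
        {
            /* no arguments: seed from OS-supplied random data */
            randomize();
        }
    }

Only this one call differs between the two modes; the rest of the program calls rand() as usual.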

restartGame()

Resets all objects to their initial state, as they were when the program was just loaded. Transient objects are not affected.

restoreGame(*filename*)

Restore the saved state from a file. filename is the name of the file to restore; this can be a string with the name of a file in the local file system, a FileName object, or a TemporaryFile object.

All objects, except transient objects, are restored to the state they had when the state was saved to the given file.

If an error occurs, the function throws a run-time error. The errno_ property of the RuntimeError exception object gives a VM error code describing the problem; the possible errors are:

Starting in 3.1.1, the file safety settings must allow read access to the target file. FileName objects obtained from inputFile() “open” dialogs are always accessible.
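
A caller can trap the run-time error as in this sketch. The helper name tryRestore is hypothetical; it simply reports success or failure, and a real program would probably inspect errno_ to explain the problem.

    #include <tads.h>

    /* hypothetical helper: returns true on success, nil on failure */
    tryRestore(fname)
    {
        try
        {
            restoreGame(fname);
            return true;
        }
        catch (RuntimeError err)
        {
            /* err.errno_ holds the VM error code describing the problem */
            return nil;
        }
    }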

rexGroup(*groupNum*)

Returns information on the text that matched a parenthesized group for the last regular expression search or match. groupNum is the number of the parenthesized group for which to retrieve the information. Groups are numbered according to the order of appearance of the left parenthesis of each group, starting from group number 1. The special group number 0 contains the entire match.

Only ordinary “capturing” groups are counted in the numbering scheme. Assertions and non-capturing groups aren’t counted.

The return value is nil if groupNum is higher than the number of groups in the regular expression, or if there was no match for the group. If there’s a match for the group, the return value is a three-element list: the first element (at index \[1\]) is the character index of the group match within the original source string; the second element is the length in characters of the group match; and the third element is a string giving the matching text.
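
For instance, the following sketch pulls two groups out of a string of the form name=number. The helper name parseSetting and the input format are hypothetical.

    #include <tads.h>

    /* hypothetical helper: split 'name=number' into [name, number] */
    parseSetting(str)
    {
        /* two capturing groups: the name and the digits */
        if (rexMatch('(<alpha>+)=(<digit>+)', str) == nil)
            return nil;

        /* element [3] of each group result is the matched text */
        return [rexGroup(1)[3], toInteger(rexGroup(2)[3])];
    }

For the input 'width=120', rexGroup(1) is [1, 5, 'width'] and rexGroup(2) is [7, 3, '120'], so the function returns ['width', 120].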

rexMatch(*pat*, *str*, *index*?)

Tests str to see if the substring starting at character index index matches the given regular expression pat. pat can be given as a string containing a valid regular expression, or as a RexPattern object.

If the leading substring of str matches the regular expression, the function returns the number of characters of the matching substring; if there’s no match, the function returns nil. This does not search for a match, but merely determines if str matches the expression in its leading substring. Note that a regular expression can successfully match zero characters, so a return value of zero is distinct from a return value of nil: zero indicates a successful match that’s zero characters long, and nil indicates no match.

If index is given, it indicates the starting index for the match; index 1 indicates the first character in the string, and is the default if index is omitted. If index is negative, it’s an index from the end of the string (-1 for the last character, -2 for the second to last, etc). This can be used to match a substring of str to the pattern without actually creating a separate substring value.

Refer to the regular expressions section for details on how to construct a pattern string.
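
The sketch below illustrates the return values described above, including the difference between nil and zero. The function name matchDemo is hypothetical.

    #include <tads.h>

    /* hypothetical demo: collect several rexMatch() results in a list */
    matchDemo()
    {
        /* three digits match at the start of the string: 3 */
        local a = rexMatch('<digit>+', '123abc');

        /* no digits at index 1, and rexMatch doesn't search: nil */
        local b = rexMatch('<digit>+', 'abc123');

        /* a pattern that can match nothing succeeds with length 0 */
        local c = rexMatch('<digit>*', 'abc123');

        /* start at index 4, where the digits begin: 3 */
        local d = rexMatch('<digit>+', 'abc123', 4);

        /* -3 is the third character from the end, also index 4 here: 3 */
        local e = rexMatch('<digit>+', 'abc123', -3);

        return [a, b, c, d, e];
    }

Because a successful match can have length zero, it’s safest to test the result with an explicit comparison to nil.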

rexReplace(*pat*, *str*, *replacement*, *flags*?, *index*?, *limit*?)

Replaces one or more matches for the regular expression pattern pat within the subject string str, starting at the character index given by index. replacement is a string giving the replacement text, or a function (regular or anonymous) to be invoked for each match to compute the replacement text.

The return value is the resulting string with the substitutions applied.

pat can be a string that uses the regular expression syntax to specify the search pattern, or it can be a RexPattern object. (The latter is more efficient if you’ll be performing the same search repeatedly, since it saves the work of re-parsing the regular expression each time.) pat can also be a list (or a Vector or other list-like object) containing multiple search patterns; if it is, replacement can similarly be a list of replacements. More on this shortly.

Refer to the regular expressions section for details on how to construct a pattern string.

The flags value is optional. It controls variations on the replacement process. If it’s not provided, the default is ReplaceAll. If flags is specified, it’s a bitwise combination (with ‘|’) of the following values:

Note that you should never use 0 as the flags value. For compatibility with older versions, 0 has a special meaning equivalent to ReplaceOnce. If you have no other flags to specify, always use either ReplaceOnce or ReplaceAll, or simply omit the flags argument entirely.

If index is given, replacements start with the first instance of the pattern at or after the character index position. The first character is at index 1. If index is omitted, the search starts at the first character. If index is negative, it’s an index from the end of the string (-1 for the last character, -2 for the second to last, and so on). Note that a negative index doesn’t change the left-to-right order of the replacement; it’s simply a convenience for specifying the starting point.

If limit is included, it specifies the maximum number of replacements to perform. This can be nil to indicate that all matches are to be replaced, or an integer giving the maximum number of matches to replace. 0 (zero) means that no matches are to be replaced, in which case the original subject string is returned unchanged. If a limit argument is present, it overrides the ReplaceAll and ReplaceOnce flags - those flags are ignored if limit is present. If limit is omitted, the limit is taken from the flags.
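
The following sketch shows how the flags and the limit argument interact on a simple subject string. The function name limitDemo is hypothetical.

    #include <tads.h>

    /* hypothetical demo: the same replacement with different limits */
    limitDemo()
    {
        /* flags omitted, so every match is replaced: 'bonono' */
        local all = rexReplace('a', 'banana', 'o');

        /* only the first match is replaced: 'bonana' */
        local once = rexReplace('a', 'banana', 'o', ReplaceOnce);

        /* limit of 2 overrides the flag: 'bonona' */
        local two = rexReplace('a', 'banana', 'o', ReplaceAll, 1, 2);

        /* limit of 0 leaves the string unchanged: 'banana' */
        local none = rexReplace('a', 'banana', 'o', ReplaceOnce, 1, 0);

        return [all, once, two, none];
    }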

replacement is a string to be substituted for each occurrence of a match to the regular expression pattern pat (or for just the first match, when ReplaceOnce is specified). Each match is deleted from the string, and replacement is inserted in its place.

The replacement text can include the special sequences %1 through %9 to substitute the original text that matches the corresponding parenthesized group in the regular expression. %1 is replaced by the original matching text of the first parenthesized group expression, %2 by the second group’s matching text, and so on. In addition, %\* is replaced by the match for the entire regular expression. Because of the special meaning of the percent sign, you have to use the special code %% if you want to include a literal percent sign in the replacement text.

For example, this would replace negative numbers in a string with accountant’s notation, by putting each negative number within parentheses and coloring it red:

    str = rexReplace('-(<digit>+)', str, '<font color=red>(%1)</font>', ReplaceAll);

Note that we’ve used a parenthesized group in the pattern to group the digits together. This grouped part of the match is available as %1 in the replacement text. This is how we manage to specify a replacement that includes the original numeric value that we matched. Note also that the minus sign is outside of the group, because we don’t want to include it in the substitution - we want to change a string like “-120” to “(120)”.

Using a pattern list: pat can be specified as a list of regular expressions (as strings, RexPattern objects, or a mix of the two). This lets you make substitutions for several different patterns at one time, without making successive calls to rexReplace().

When you supply a list of patterns, you can optionally supply a list of replacements (as strings, callback functions, or a mix). Each item in the pattern list is matched up with the corresponding item - the item at the same list index - in the replacement list. That is, pat[1] will be replaced with replacement[1], pat[2] will be replaced with replacement[2], and so on. If there are more patterns than replacements, the excess patterns are replaced with empty strings. Any excess replacements are simply ignored.

If pat is a list but replacement isn’t, rexReplace() simply uses the same replacement for every pattern. Note that this is different from passing replacement as a list containing one element: when replacement is a single-item list, all patterns after the first are replaced by empty strings, because of the rule for when the pattern list is longer than the replacement list.

There are two ways that rexReplace() can apply a list of replacements. The default is “parallel” mode. In this mode, rexReplace() scans the string for all of the patterns at once, and replaces the first (leftmost) occurrence of any pattern. (If two of the patterns match at the same position in the string, the one with the lower pat list index takes precedence.) If the ReplaceOnce flag is specified, the whole operation is done after that first replacement; otherwise, rexReplace() scans the remainder of the string, to the right of the first replacement, again looking for the leftmost occurrence of any of the patterns. It replaces that second occurrence, then repeats the process until there are no more matches for any of the patterns.

Parallel mode is similar to combining all of the patterns in the list using “|” to make a single pattern. There’s a key difference, though: using a list of patterns allows you to specify a separate replacement for each pattern.

The other mode is “serial” mode, which is used when you specify the ReplaceSerial flag. In serial mode, rexReplace() starts by scanning only for the first pattern, replacing each occurrence of that pattern throughout the string (or, if the ReplaceOnce flag is used, replacing just the first occurrence). If ReplaceOnce is specified, and we replaced a match for the first pattern, we’re done. Otherwise, rexReplace() starts over with the updated string - the result of applying the replacements for the first pattern - and scans this updated string for the second pattern. As with the first pass, we scan only for the second pattern on this pass, and we replace all occurrences (or just the first, if ReplaceOnce is used). We repeat this process for each additional pattern.

The ReplaceSerial mode is almost equivalent to calling rexReplace() iteratively, once for each pattern in the search list, using the result of the first call as the subject string on the second call, the result of the second as the subject string for the third, and so on. The difference shows up when ReplaceOnce is specified: a serial mode list will then result in only one replacement overall, whereas calling the function iteratively could make another replacement on each iteration.

You should note an important feature of the serial mode: the replacement text from one pattern is subject to further replacement on the next pattern. This is because the entire result from each pass is used as the new subject string on the next pass. In contrast, in parallel mode the replaced text is never rescanned.
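
A two-pattern list makes the difference between the modes easy to see. In this sketch (modeDemo is a hypothetical name), the replacement for the first pattern happens to match the second pattern.

    #include <tads.h>

    /* hypothetical demo: the same pattern list in parallel and serial mode */
    modeDemo()
    {
        local pats = ['a', 'b'];
        local repls = ['b', 'c'];

        /* parallel: 'a' becomes 'b' and the original 'b' becomes 'c',
           and replaced text is never rescanned: 'bc' */
        local par = rexReplace(pats, 'ab', repls, ReplaceAll);

        /* serial: the first pass gives 'bb', then the second pass
           replaces both b's: 'cc' */
        local ser = rexReplace(pats, 'ab', repls, ReplaceAll | ReplaceSerial);

        return [par, ser];
    }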

Using a callback function to generate the replacement: You can supply a function for replacement, instead of a string. This can be a regular named function or an anonymous function. When a function is specified, rexReplace() invokes it for each match to determine the replacement text. This is powerful because it lets you apply virtually any transformation to each replacement, rather than just substituting a fixed string.

A replacement function is invoked once for each matching string, as follows:

    func(matchString, matchIndex, originalString);

The matchString parameter receives a string containing the text that the regular expression matched, and which is to be replaced. matchIndex is the character index within the original string where this match starts. originalString is the full original string that’s being searched. The function should return a string giving the replacement text. It can alternatively return nil to replace the match with nothing, which is equivalent to returning an empty string. Within the function, you can use rexGroup() to retrieve the match text for any parenthesized groups within the search pattern.

You can omit one or more of the parameters when you define the callback function, because rexReplace will only supply as many arguments as the function actually wants. The arguments are always in the same order, though - the names don’t matter, just the order. This means that if you provide a callback that only takes one argument, it gets the match string value; with two arguments, they’ll be assigned the match string and match index, respectively.

Here’s an example that uses a replacement function to perform “title case” capitalization on a string. This capitalizes the first letter of each word in the string, except that it leaves a few small words (such as “of” and “the”) unchanged, but only when they occur in the middle of the text. This takes advantage of a callback function’s ability to vary the replacement based on the matched text and its position in the subject text. Note that this function omits the third parameter, since it doesn’t need the original string to carry out its task.

    titleCase(str)
    {
       local r = function(s, idx)
       {
           /* don't capitalize certain small words, except at the beginning */
           if (idx > 1 && ['a', 'an', 'of', 'the', 'to'].indexOf(s.toLower()) != nil)
               return s;

           /* capitalize the first letter */
           return s.substr(1, 1).toTitleCase() + s.substr(2);
       };
       return rexReplace('%<(<alphanum>+)%>', str, r, ReplaceAll);
    }

rexSearch(*pat*, *str*, *index*?)

Searches for the regular expression pat in the search string str, starting at the character position index. The pattern pat can be given as a string containing a valid regular expression, or as a RexPattern object.

If index is given, it gives the starting character position in str for the search. The first character is at index 1. If index is omitted, the search starts with the first character. A negative value is an index from the end of the string: -1 for the last character, -2 for the second to last, etc. Note that a negative index doesn’t change the left-to-right order of the search; it’s simply a convenience for specifying the starting point. The index value can be used to search for repeated instances of the pattern, by telling the function to ignore matches before the given point in the string.

If the function finds a match, it returns a list with three elements: the character index within str of the first character of the matching substring (the first character in the string is at index 1); the length in characters of the matching substring; and a string giving the matching substring. If there’s no match, the function returns nil.

Refer to the regular expressions section for details on how to construct a pattern string.
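
To find repeated instances of a pattern, you can advance the index past each match, as in this sketch. The helper name findNumbers is hypothetical.

    #include <tads.h>

    /* hypothetical helper: return a list of all digit runs in str */
    findNumbers(str)
    {
        local found = [];
        local idx = 1;
        local m;

        /* stop when we run off the end or there are no more matches */
        while (idx <= str.length()
               && (m = rexSearch('<digit>+', str, idx)) != nil)
        {
            /* m is [index, length, text]; keep the matched text */
            found += m[3];

            /* resume just past the end of this match */
            idx = m[1] + m[2];
        }

        return found;
    }

For example, findNumbers('a1b22c333') returns ['1', '22', '333']. The pattern here always matches at least one character, so the index is guaranteed to advance on each pass; a pattern that can match zero characters would need an extra step to avoid looping forever.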

rexSearchLast(*pat*, *str*, *index*?)

Searches the string str backwards from the end for a match to the regular expression pat, starting at the character position index. This works like rexSearch(), but performs the search in the reverse order, starting at the end of the string and working towards the beginning.

index is the optional starting position for the search. The function looks for a match before but not including the character at this index. If index is omitted, the default is to search the entire string from the end. Equivalently, you can set index to str.length()+1, to indicate the imaginary character position just after the end of the string. Specifying 0 for index means the same thing. If index is negative, it’s an offset from the end of the string: -1 is the last character, -2 is the second to last, and so on. Note that -1 (or, equivalently, str.length()) means that you want to search the portion of the string up to but not including the last character, since the match can’t include the starting character.

The \<FirstBegin\> and \<FirstEnd\> modes (see regular expressions) work in mirror image compared to ordinary forward searches. The easiest way to think about this is to picture the reverse search as a forward search viewed in a mirror. For a reverse search, \<FirstBegin\> means that the match with its right end closest to the starting point is selected as the winner, while \<FirstEnd\> means that the match with its left end closest to the starting point is selected. Since the search proceeds right to left, closer to the starting point means further right, at a higher index.

Note that we use the terms left, right, and right-to-left loosely in the discussion above. In particular, we’re ignoring that different languages and scripts are written on paper in different directions. We’re talking purely about the order of characters in the string, using “left” to mean towards the beginning of the string and “right” to mean towards the end, regardless of whether the string contains Roman characters or Arabic characters or anything else.

saveGame(*filename*, *metaTable*?)

Saves the state of all objects (except transient objects) to a file.

filename specifies the file to save to; this can be a string giving the name of a file in the local file system, a FileName object, or a TemporaryFile object.

If an error occurs, the function throws a run-time error to indicate the problem. The saved state can later be restored using restoreGame().

metaTable is an optional LookupTable object containing “metadata” information to store in the file. This is a collection of game-specific descriptive information; this could include things like the current room name, score, number of turns, chapter number, etc. The interpreter and other tools can extract this information and display it to the user when browsing saved game files. For example, the file selector dialog for a RESTORE command could display the metadata for each available file.

The metaTable LookupTable must consist of string key/value pairs. saveGame() simply ignores any non-string keys or non-string values found in the table. Both the keys and the values are meant to be displayed to the user, so the keys should be descriptive titles for their respective values.

Starting in 3.1.1, the file safety settings must allow write access to the target file. FileName objects obtained from inputFile() “save” dialogs are always accessible.
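Here’s a minimal sketch of saving with a metadata table. The file name and the metadata keys are made up for illustration, and the sketch assumes the standard RuntimeError class and its displayException() method from the system library.

```tads3
#include <tads.h>
#include <lookup.h>

main(args)
{
    /* metadata: both the keys and the values must be strings */
    local meta = new LookupTable();
    meta['Location'] = 'Kitchen';
    meta['Score'] = toString(15);
    meta['Turns'] = toString(42);

    try
    {
        /* 'example.t3v' is a hypothetical file name */
        saveGame('example.t3v', meta);
        tadsSay('Saved.\n');
    }
    catch (RuntimeError err)
    {
        /* saveGame() reports failure by throwing a run-time error */
        tadsSay('Save failed: ');
        err.displayException();
        tadsSay('\n');
    }
}
```

Converting the numeric values with toString() matters here, since saveGame() ignores any non-string values in the table.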

savepoint()

Establish an undo savepoint. Multiple savepoints can be established, to mark multiple points in time. For example, you could establish a savepoint just after reading a command line from the user, so that the user can subsequently undo the entire effect of the command if desired. Similarly, if you wanted to perform an operation speculatively, to see what would happen if you carried out some series of actions, you could set an undo savepoint, then undo to the savepoint once you’ve finished the speculative operation.

sgn(*val*)

Returns the sign of the given number. val can be an integer or a BigNumber value. The return value is 1 if val is positive, 0 if val is zero, or -1 if val is negative. The result is always an integer, regardless of val’s type.

sprintf(*format*, ...)

Generates a formatted text string from a list of data values, according to the template string format. sprintf is especially handy for formatting numeric data, since it provides a number of numeric formatting options that are difficult to code by hand.

format is the format template string, which controls how the result string looks. The format string can include a mix of plain text and “format codes”. A format code starts with “%”, followed by a series of characters describing how to format one data item. The format code syntax is very similar to the syntax used in the C/C++ version of sprintf.

The additional arguments after format are the data values to plug into the format codes in the template. Each format code in the format string corresponds to an item in the argument list, and is replaced by the string-formatted value of that argument.

The return value is a new string, consisting of the text of the format string, with each format code replaced by the corresponding argument value, formatted according to the format code.

Here’s a simple example:

    local str = sprintf('i=%d, j=%d, k=%d', 99, 23, 145);

“%d” is the format code for a decimal integer value; when sprintf sees “%d” in a format string, it takes the next argument, formats it into text as a decimal number, and replaces the “%d” with the formatted number. Each format item “consumes” an argument from the list, so when we have multiple format items, each one is replaced with the next argument in the list. So the code above produces the result string 'i=99, j=23, k=145'.

A format code consists of the following elements, in order: % flags width.precision type-spec. The % (a literal percent sign) and type-spec are required, and everything else is optional.

The flags, if present, consist of one or more of the following, in any order:

\[n\]

Argument number. n is a number from 1 to the number of arguments after the format string. The value for this item is taken from the given argument number rather than the default positional argument. For example, sprintf('i = %\[2\]d, j = %\[1\]d', 100, 200) produces 'i = 200, j = 100'. When this flag is used, the format item doesn’t affect the position counter for other items; for example, sprintf('i=%\[2\]d, j=%d', 100, 200) produces 'i=200, j=100', because the first format item, “%\[2\]d”, doesn’t count as a positional item, leaving the first positional argument still available for “%d”.

-

Left alignment. If the formatted value is shorter than the width value, padding is added after the value. By default, the value is right-aligned (padding is added before the value). For example, sprintf('i=%-4d', 123) produces 'i=123 '.

+

Always show the sign. A “+” sign is shown before positive numbers and 0. By default, only negative numbers are shown with a sign. For example, sprintf('i = %+d', 123) produces 'i = +123'.

(space)

Space character. Shows a space character before positive numbers and 0. This can be used to make positive and negative values use the same number of characters, without forcing a “+” sign before positive values.

,

Digit grouping. For integer and floating point types (b, d, e, E, f, g, G, o, u, x, X), adds a comma between each group of three digits (only before any decimal point). For example, sprintf('i = %,d', 1234567) produces 'i = 1,234,567'.

\_x

Padding character: changes the padding character from the default (a space) to x, which is any single character. E.g., %\_\*8d formats 123 as ‘*****123’.

\#

For integer types (b, d, o, u, x, X), if a width is specified, adds leading zeros as needed to display at least width digits.

For the floating point types (e, E, f, g, G) displays a decimal point even if there are no digits after the decimal.

For floating point types g and G, keeps all trailing zeros after the decimal point (trailing zeros are normally removed) so that precision digits are always displayed.
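The flags can be tried out with a short program along these lines. This is only a sketch; the expected results in the comments are the ones that follow from the flag descriptions above.

```tads3
#include <tads.h>

main(args)
{
    /* explicit argument numbers */
    local s1 = sprintf('%[2]d %[1]d', 100, 200);   // '200 100'

    /* left alignment within a field of width 4 */
    local s2 = sprintf('%-4d|', 123);              // '123 |'

    /* always show the sign */
    local s3 = sprintf('%+d', 123);                // '+123'

    /* digit grouping */
    local s4 = sprintf('%,d', 1234567);            // '1,234,567'

    /* custom padding character '*' in a field of width 8 */
    local s5 = sprintf('%_*8d', 123);              // '*****123'

    tadsSay(s1 + '\n' + s2 + '\n' + s3 + '\n' + s4 + '\n' + s5 + '\n');
}
```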

width is an optional number giving the minimum number of characters to use for the item. If the formatted value is shorter than width, padding is added before or after the item to fill out the specified field width. Spaces are used for padding by default, except that if width starts with a zero (e.g., “%08d”), leading zeros are used instead, provided that left alignment isn’t also specified. The “_” flag (see above) lets you specify a custom padding character. width is only a minimum; if the displayed value is longer than width, the value isn’t truncated.

precision is another optional number, preceded by a period “.”. For example, %.8d specifies a precision of 8. The meaning of precision varies by type:

Integer types (b, d, u, o, x, X)

The minimum number of digits to display. Leading zeros are added as needed. For example, ‘%.8d’ displays 1234 as ‘00001234’. By default, no leading zeros are added.

Basic floating point types (e, E, f)

The number of digits to display after the decimal point. The default is 6 digits. If the argument value has more digits than can be displayed, the value is rounded. For example, %.3f formats 123.456789 as '123.457'.

Variant floating point types (g, G)

The maximum number of significant digits to display (including before and after the decimal point). If the argument value has more significant digits than can be displayed, the value is rounded. For example, %.3g formats 12.789 as '12.8'.

Other types

Ignored
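To see how width and precision interact with the different types, you can experiment with something like the following sketch. The results in the comments are taken from the rules and examples above.

```tads3
#include <tads.h>

main(args)
{
    /* width with a leading zero: pad with zeros to 8 characters */
    local s1 = sprintf('%08d', 1234);          // '00001234'

    /* integer precision: at least 8 digits */
    local s2 = sprintf('%.8d', 1234);          // '00001234'

    /* f precision: digits after the decimal point, with rounding */
    local s3 = sprintf('%.3f', 123.456789);    // '123.457'

    /* g precision: total significant digits, with rounding */
    local s4 = sprintf('%.3g', 12.789);        // '12.8'

    /* string precision: maximum number of characters to show */
    local s5 = sprintf('%.3s', 'abcdef');      // 'abc'

    tadsSay(s1 + '\n' + s2 + '\n' + s3 + '\n' + s4 + '\n' + s5 + '\n');
}
```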

type-spec is the type specifier, which determines how the argument value is interpreted and formatted. This is a single character, from the following list:

%

a literal % sign. This type doesn’t use an argument value.

b

binary integer. The argument is interpreted as a number, and its unsigned integer value is rendered in binary (base 2, using 1s and 0s to represent the bits).

c

character. If the argument is a string, the first character of the string is used; otherwise the argument is interpreted as a number giving a Unicode character code, and that character is used.

d

decimal integer. The argument is interpreted as a number, and its integer value is rendered in decimal.

e

number in scientific notation (“exponent” format, such as 1.23e+010). The argument is interpreted as a number, and its value is rendered in scientific notation. By default, the value is displayed with exactly 6 digits after the decimal point, but you can change this by specifying a precision. For example, %.8e uses 8 digits after the decimal point. A precision of zero, %.0e or %.e, omits the decimal point, unless the \# flag is specified (e.g., %#0.e).

E

same as e, but displays the exponent with a capital “E”.

f

floating point number. The argument is interpreted as a number, and its value is rendered in floating point format. By default, the number is displayed with no limit on the digits before the decimal point, and exactly 6 digits after the decimal point. You can change the number of digits after the decimal point by specifying a precision value; e.g., %.8f displays 8 digits after the decimal point. A precision of zero, %.0f or %.f, omits the decimal point, unless the \# flag is specified (e.g., %#0.f).

g

compact floating point: uses either the e or f format, according to which one produces the shorter text output. The f format is shorter for any number whose decimal exponent is from -4 to the format’s precision option value, and the e format is shorter for anything outside this range. The precision option for this format specifies the total number of significant digits to display; the default is 6. By default, trailing zeros after the decimal point are removed, and the decimal point itself is removed if no digits are displayed after it. The \# flag keeps any trailing zeros, and keeps the decimal point even if there aren’t any digits to display.

G

same as g, but displays a capital “E” if scientific notation is used.

o

octal integer. The argument is interpreted as a number, and its unsigned integer value is rendered in octal (base 8).

r

Roman numerals, lower case. The argument is interpreted as a number, and its signed integer value is rendered in lower-case Roman numerals. If the value is less than 1 or greater than 4999, it’s displayed as a decimal integer, as though %d had been used.

R

Roman numerals, upper case. This is the same as %r, but displays the numeral in upper-case.

s

string. The argument value is rendered as a string. By default, the entire string is shown, but if there’s a precision setting, it specifies the maximum number of characters to show from the string; if the string is longer, it’s truncated to that number of characters.

u

decimal integer, unsigned. The argument is interpreted as a number, and its unsigned integer value is rendered in decimal.

x

hexadecimal integer. The argument is interpreted as a number, and its unsigned integer value is rendered in hexadecimal (base 16) using lower-case letters (abcdef).

X

same as x, but uses upper-case letters (ABCDEF).

Other characters are not valid as type specifiers. If you use an invalid type code, the whole % sequence is retained in the result string without any substitutions.
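The following sketch runs through several of the type specifiers. The values are arbitrary, and the comments show the results that the descriptions above lead us to expect.

```tads3
#include <tads.h>

main(args)
{
    tadsSay(sprintf('%b', 5) + '\n');         // 101  (binary)
    tadsSay(sprintf('%o', 8) + '\n');         // 10   (octal)
    tadsSay(sprintf('%x', 255) + '\n');       // ff   (hex, lower case)
    tadsSay(sprintf('%X', 255) + '\n');       // FF   (hex, upper case)
    tadsSay(sprintf('%c', 65) + '\n');        // A    (Unicode code 65)
    tadsSay(sprintf('%r', 14) + '\n');        // xiv  (lower-case Roman)
    tadsSay(sprintf('%R', 2024) + '\n');      // MMXXIV
    tadsSay(sprintf('%R', 0) + '\n');         // 0    (out of range, so decimal)
    tadsSay(sprintf('%d%%', 50) + '\n');      // 50%  (%% uses no argument)
}
```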

The first (leftmost) % item in the format string is matched up with the first argument in the argument list, and each subsequent % item is matched up with the next argument. (You can also use the \[ \] flag to select a particular argument, instead of automatically using the next argument in the list.)

Each type-spec code expects a particular datatype for its argument value. If the value isn’t of the correct type to begin with, sprintf will automatically try to convert it to the correct type, as follows:

Floating point types (e, E, f, g, G)

  • Integer and BigNumber values are displayed as given
  • Strings are converted to numbers, as though by calling toNumber()
  • `true` and `nil` are displayed as 1 and 0 respectively
  • Other types cause an error

Integer types (b, d, o, u, x, X)

  • Integer values are displayed as given
  • BigNumbers are rounded to the nearest integer (but they're still BigNumbers, so they don't have to fit into a 32-bit integer)
  • Strings are converted to numbers, as though by calling toNumber(), then rounded to the nearest integer if they have decimal points
  • `true` and `nil` are displayed as 1 and 0 respectively
  • Other types cause an error

Character (c)

  • An integer gives the Unicode character code to display, from 0 to 65535 (values outside this range cause an error)
  • A BigNumber value is rounded to the nearest integer, which gives the Unicode character code to display
  • For a string, the first character of the string is displayed
  • Other types cause an error

String (s)

  • String values are displayed as given
  • Anything else is converted to a string, as though by calling toString()
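The conversion rules above can be exercised with mismatched argument types, as in this sketch. The comments show what the rules predict.

```tads3
#include <tads.h>

main(args)
{
    /* a string is parsed as a number for an integer format */
    tadsSay(sprintf('%d', '42') + '\n');      // 42

    /* a BigNumber is rounded to the nearest integer */
    tadsSay(sprintf('%d', 2.7) + '\n');       // 3

    /* true and nil are shown as 1 and 0 */
    tadsSay(sprintf('%d %d', true, nil) + '\n');   // 1 0

    /* an integer is accepted as-is by a floating point format */
    tadsSay(sprintf('%f', 3) + '\n');         // 3.000000

    /* for %c, a string argument supplies its first character */
    tadsSay(sprintf('%c', 'hello') + '\n');   // h

    /* for %s, anything else is converted as though by toString() */
    tadsSay(sprintf('%s', 123) + '\n');       // 123
}
```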

Unsigned integers: several of the integer types (b, o, u, x, X) display “unsigned” integer values. This means that if the argument is a regular 32-bit integer value, and it’s negative, the value is interpreted as an unsigned quantity in the native hardware format of the machine the program is running on. Almost all modern computers use two’s complement format, which represents negative numbers as though they were very large positive numbers. For example, %x formats -1 as 'ffffffff'. See toString for more discussion on unsigned integers.

There’s no such thing as an unsigned BigNumber, and no way to interpret a BigNumber as unsigned. If you format a negative BigNumber value with an unsigned integer type spec, the “unsigned” aspect of the format code is ignored, and the value is shown as negative, with a minus sign. For example, %x formats -255.0 as '-ff'. If you really want the two’s complement version of a BigNumber value, use toInteger() to explicitly convert the argument to an integer (but if you do this, note that the value must be in the valid range for a 32-bit integer, -2147483648 to +2147483647).
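The difference between the integer and BigNumber treatment can be seen in a short sketch like this one. The last result assumes a two’s complement machine, as described above.

```tads3
#include <tads.h>

main(args)
{
    /* a negative 32-bit integer is reinterpreted as unsigned */
    tadsSay(sprintf('%x', -1) + '\n');                  // ffffffff

    /* a negative BigNumber keeps its minus sign */
    tadsSay(sprintf('%x', -255.0) + '\n');              // -ff

    /* convert to an integer first to get the two's complement form */
    tadsSay(sprintf('%x', toInteger(-255.0)) + '\n');   // ffffff01
}
```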

toInteger(*val*, *radix*?)

Convert the value given by val to an integer.

If the radix value is specified, the conversion uses the given radix as the numeric base for the conversion; this value can be any integer from 2 to 36. If radix is omitted, the default is 10 (decimal).

The interpretation of val depends on its type:

See also the toNumber function, which can parse strings containing floating point values and whole numbers too large for the ordinary integer type.
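As a brief sketch of the radix parameter, here are a few string conversions. The input strings are arbitrary examples.

```tads3
#include <tads.h>

main(args)
{
    local a = toInteger('255');        // decimal by default: 255
    local b = toInteger('ff', 16);     // hexadecimal: 255
    local c = toInteger('101', 2);     // binary: 5

    /* the results are ordinary integers, so we can do arithmetic */
    tadsSay(toString(a + b + c) + '\n');   // 515
}
```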

toNumber(*val*, *radix*?)

Convert the value given by val to an integer or BigNumber, as appropriate.

If the radix value is specified, the conversion uses the given radix as the numeric base for the conversion; this value can be any integer from 2 to 36. If radix is omitted, the default is 10 (decimal).

The interpretation of val depends on its type:

See also the toInteger function, which explicitly converts values to integers. The main difference between toInteger and toNumber is that toNumber can parse strings that have to be represented as BigNumber values (such as floating point values and very large whole numbers), whereas toInteger can only handle ordinary integers.
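The sketch below uses dataType() to show which kind of value toNumber returns for a few strings. The kindOf() helper is a hypothetical function written for this example, and it simply assumes that any non-integer result is a BigNumber.

```tads3
#include <tads.h>

/* hypothetical helper: describe the type toNumber() chooses for a value */
kindOf(val)
{
    local n = toNumber(val);
    return dataType(n) == TypeInt ? 'integer' : 'BigNumber';
}

main(args)
{
    /* a whole number that fits in 32 bits */
    tadsSay(kindOf('42') + '\n');              // integer

    /* a floating point value */
    tadsSay(kindOf('3.14') + '\n');            // BigNumber

    /* a whole number too large for the ordinary integer type */
    tadsSay(kindOf('12345678901') + '\n');     // BigNumber
}
```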

toString(*val*, *radix*?, *isSigned*?)

Convert the given value val to a string.

radix is a number giving the base for a numeric value; if this is omitted, the default is 10, for a decimal representation. The base can be any value from 2 (binary) to 36. Bases above 10 use letters for “digits” above 9: A=10, B=11, C=12, etc. This is analogous to the usual hexadecimal number system, just generalized for any base up to 36 (where Z=35). The radix is meaningful only when val is an integer, or a BigNumber with no fractional part. It’s ignored for other values.

isSigned is meaningful only with integer values. It’s ignored for other types (including BigNumber values, even when they’re whole numbers with no fractional part). true means that an integer value is represented in the result as its ordinary positive or negative arithmetic value; negative numbers are represented with a minus sign (a hyphen, ‘-’) followed by the absolute value, and positive numbers simply with the digits of the absolute value. nil means that an integer is interpreted as “unsigned”, as explained below.

If isSigned is omitted, the default is true if the radix is 10 (or omitted, in which case the default value is 10), nil for any other radix value - so the default is “unsigned” for hex, octal, and all other non-decimal bases. This is the default because hex and octal notation are traditionally used as a way to represent bit patterns or raw byte-oriented data rather than arithmetic values.

“Unsigned” means that the value is interpreted as a simple binary number with no positive/negative sign information. Normally, TADS treats integer values as “signed”, meaning they have a plus or minus sign attached. Computers store signed integers by using half of the overall binary value range to represent positive numbers, and the other half to represent negative numbers. This reduces the maximum value that can be stored by a factor of two. That’s why a 32-bit signed integer can only hold values up to +2,147,483,647, even though 2^32 is 4,294,967,296. The unsigned interpretation means that we ignore the special meaning of the plus/minus sign information. This doubles the maximum value up to the full +4,294,967,295 (2^32-1, not 2^32, since zero takes up one of the available 2^32 values), but the price is that there’s no such thing as a negative number in this interpretation (thus the name “unsigned”).

The main value of an unsigned interpretation is when you’re using an integer as a combination of bit flags (using the bitwise operators \| and &), or for interpreting raw bytes from a binary file format or other binary source, rather than as an arithmetic value. In this case, an unsigned view lets you see all of the bits directly, without regard to how the machine uses the bits to encode negative numbers.
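A short sketch of how radix and isSigned combine, using -1 because its unsigned interpretation is easy to recognize. The comments show what the rules above predict.

```tads3
#include <tads.h>

main(args)
{
    /* base 10 defaults to signed */
    tadsSay(toString(-1) + '\n');              // -1

    /* base 10, explicitly unsigned: the full 32-bit pattern */
    tadsSay(toString(-1, 10, nil) + '\n');     // 4294967295

    /* base 2 defaults to unsigned: all 32 bits are shown as 1s */
    tadsSay(toString(-1, 2) + '\n');

    /* base 2, explicitly signed: minus sign plus absolute value */
    tadsSay(toString(-5, 2, true) + '\n');     // -101

    /* positive values look the same either way */
    tadsSay(toString(5, 2) + '\n');            // 101
}
```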

undo()

Undoes all changes to object state back to the most recent undo savepoint, as established with the savepoint() function. Returns true if successful, nil if insufficient undo information is available. This can be called repeatedly; each call undoes changes to the next most recent savepoint. All changes affecting object state since the last savepoint are undone by this operation, except that transient objects are not affected.

When the function returns nil, it will have made no changes to the system state. The function never makes any changes unless it has a complete set of undo information back to a savepoint, so the function will never leave the system in an inconsistent state. The VM has an internal limit on the total amount of undo information retained in memory at any given time, to keep memory consumption under control during a long-running session; as new undo information is added, the VM discards the oldest undo information as needed to keep within the memory limits. This maintains a rolling window of the most recent undo information.
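Here’s a minimal sketch of a speculative change using savepoint() and undo() together. The counter object and its value property are invented for this example.

```tads3
#include <tads.h>

/* hypothetical object whose state we change and then roll back */
counter: object
    value = 0
;

main(args)
{
    /* mark the point we want to be able to return to */
    savepoint();

    /* make a speculative change */
    counter.value = 42;
    tadsSay('Before undo: ' + toString(counter.value) + '\n');

    if (undo())
    {
        /* success: the change since the savepoint has been reversed */
        tadsSay('After undo: ' + toString(counter.value) + '\n');
    }
    else
    {
        /* nil means no changes were made to the system state */
        tadsSay('No undo information available.\n');
    }
}
```

Checking the return value is worthwhile, because the VM can discard older undo information to stay within its memory limits.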


TADS 3 System Manual