REF ARRAYS John Gibson Feb 1996
COPYRIGHT University of Sussex 1996. All Rights Reserved.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<< ARRAY PROCEDURES >>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
This REF file details the procedures which deal with the array data
structure. It introduces array structures and how they are implemented
in Poplog. There is also a section on sparse arrays, which can be used
for storing associations between arbitrary objects (see also
REF * PROPS).
CONTENTS - (Use <ENTER> g to access required sections)
1 Introduction
2 Predicates On Arrays
3 Obtaining Array Parameters
4 Constructing New Arrays
5 Generic Data Structure Procedures on Arrays
6 Miscellaneous
6.1 Storing Arrays on Disk
7 Examples of Arrays
8 Sparse Arrays
8.1 Examples of Sparse Arrays
9 Related Documentation
--------------- 1 Introduction
---------------
Arrays are data structures which enable information to be stored and
accessed on the basis of a set of integer subscripts, where each
subscript corresponds to a separate 'dimension' of the array; thus an
N-dimensional array requires N such subscripts to access or update each
of its elements.
A one dimensional array is analogous to a vector except that vectors
can be indexed only from 1 to their length, whereas arrays can have
bounds covering arbitrary ranges of integers. (For information on
vectors also see REF * VECTORS, * INTVEC, * STRINGS, * DATA, and
REF * KEYS.)
Because computer memory cannot directly represent multiple
dimensions, the elements of an array actually have to be stored in an
underlying 1-dimensional structure (e.g. a vector), each position in
which corresponds to a particular set of N subscript values. The task of
the array accessing mechanism is therefore to convert a given set of
N-dimensional subscripts into the appropriate 1-dimensional subscript
for accessing this structure.
In Poplog, arrays are implemented as special procedures constructed
by newanyarray. Such a procedure for an N-dimensional array takes N
arguments, and includes within it the arrayvector, that is, the
underlying 1-dimensional structure in which the elements of the array
are in reality stored (which, despite its name, need not actually be a
vector although it usually is). When called, e.g.
array(sub1, sub2, ..., subN)
the array procedure performs the computation of the 1-dimensional
subscript from the N values given, and supplies this to the appropriate
access procedure for the arrayvector structure (which then returns or
updates the specified element).
An array may have any number of elements in a given dimension,
subscripted by any suitable range of integers. That is, an array with
say, 30 in a particular dimension is not restricted to using subscripts
1 to 30, but can use 0 to 29, 5 to 34, -10 to 19, etc. An array's
dimensions are specified to newanyarray by supplying its boundslist,
i.e. a list of the minimum and maximum subscripts in each dimension (of
length 2 * N for an N-dimensional array). When called, array procedures
mishap if any of their subscript arguments is not in the appropriate
range.
When using an array in general, it is not necessary to know how its
elements are arranged in the arrayvector; however, this does need to be
known when the latter needs to be processed separately from the array.
While in principle many different schemes are possible, Poplog (like
most systems) provides just two: arraying 'by row' or 'by column'. Since
the terms 'row' and 'column' only make sense for 2-dimensional arrays,
we shall not attempt to define them, but simply say that 'by row' means
the elements are stored with subscripts increasing in significance from
left to right, and 'by column' with subscripts increasing in
significance from right to left. That is, letting array be a
3-dimensional array with subscripts 1 - 3 in each dimension, the order
of the 3 * 3 * 3 = 27 elements in its arrayvector with the two different
schemes would be as follows:
Arrayvector Subscript By Row By Column
--------------------- ------ ---------
1 array(1, 1, 1) array(1, 1, 1)
2 array(2, 1, 1) array(1, 1, 2)
3 array(3, 1, 1) array(1, 1, 3)
4 array(1, 2, 1) array(1, 2, 1)
5 array(2, 2, 1) array(1, 2, 2)
6 array(3, 2, 1) array(1, 2, 3)
7 array(1, 3, 1) array(1, 3, 1)
8 array(2, 3, 1) array(1, 3, 2)
9 array(3, 3, 1) array(1, 3, 3)
10 array(1, 1, 2) array(2, 1, 1)
... ... ...
26 array(2, 3, 3) array(3, 3, 2)
27 array(3, 3, 3) array(3, 3, 3)
Note that (as mentioned above), the arrayvector of an array is not
limited to being an actual vector-class structure. The arguments to
newanyarray allow the specification of any 'virtual' vector, in terms of
an 'vector init' procedure and a 'subscriptor' procedure; newanyarray
then calls the 'init' procedure with an appropriate length argument to
construct the arrayvector initially, and the array procedure then uses
the 'subscriptor' to store and retrieve elements.
Note also that two or more arrays can be made to share the same
arrayvector structure, either in whole or in part (for example, one
array can be a sub-array of another).
While arrays are limited to storing associations between sets of
integers and arbitrary values, Poplog also provides mechanisms for
storing associations between arbitrary objects -- see Sparse Arrays
below and REF * PROPS. See also REF * PROCEDURE for procedures
applicable to arrays as procedures in general.
----------------------- 2 Predicates On Arrays
-----------------------
As mentioned above, arrays are procedures: thus isprocedure is true for
any array, and the datakey of an array will be the same as the datakey
of any procedure. (See REF * KEYS.)
isarray(item) -> array [procedure]
Returns a true result if item is either an array procedure
itself, or is a closure of one through any number of
sub-closures (i.e. the pdpart of item is examined recursively
until a non-closure is found, and then this is tested for being
an array). If item is an array procedure, or one is found inside
a nest of closures, then that is returned, otherwise false.
(This enables closures of arrays to be recognised as arrays,
e.g. if array is 3-dimensional array then isarray will return
array when applied to the 2-dimensional array array(%2%), etc.)
isarray_by_row(array) -> bool [procedure]
Returns true if the array array is arrayed by row, false if
arrayed by column.
----------------------------- 3 Obtaining Array Parameters
-----------------------------
boundslist(array) -> bounds_list [procedure]
Given an array array, returns its list of minimum and maximum
subscripts in each dimension.
arrayvector(array) -> arrayvec [procedure]
arrayvec -> arrayvector(array)
Returns/updates the 1-dimensional structure used to hold the
elements of the array array. The updater checks that the new
arrayvec is the same data type as the old.
arrayvector_bounds(array) -> min_sub -> max_sub [procedure]
Returns the minimum and maximum subscripts used by the array
array on its arrayvector. (Generally, min_sub will be 1 and
max_sub will be the number of elements in the array; this will
not be the case, however, when array is actually a sub-array of
the arrayvector.)
array_subscrp(array) -> subscr_p [procedure]
Returns the subscriptor procedure used to access elements in the
arrayvector of the array array.
-------------------------- 4 Constructing New Arrays
--------------------------
newanyarray is the basic procedure for constructing arrays. For
historical and other reasons, the arguments to this procedure are
somewhat complicated, in that it can take a number of optional arguments
in various combinations. Essentially though, the pieces of information
it requires to construct a new array are quite straightforward, viz
¤ A 2 * N list of minimum and maximum subscripts for each of the N
dimensions (the boundslist). This is used to derive both the
number of dimensions, and the size in each dimension (and thus the
total number of elements in the array).
¤ A specification for the arrayvector to hold the elements of the
array, and a subscriptor procedure with which to access and update
it. These two can be specified in a number of different ways,
either as separate arguments or by a single argument which implies
values for both together.
Other optional pieces of information can specify whether the elements
are to be arrayed by row or by column, an initialisation for each array
element, and (when the new array is to be a sub-array of an existing
one), the starting offset of the new array within the existing
arrayvector.
newanyarray(bounds_list, element_init, arrayvec_spec, [procedure]
subscr_p, arrayvec_offset, by_row) -> array
This procedure constructs and returns a new N-dimensional array
procedure array (where N >= 0). Its arguments are as follows:
bounds_list
A list of 2 * N integers whose elements are alternately the
minimum and maximum subscript in each dimension, i.e.
[% min1, max1, min2, max2, ..., minN, maxN %]
This argument is always required. An empty boundslist
specifies a 0-dimensional array (an object which when
applied to zero subscripts returns its single value).
element_init
This argument is optional, and specifies an initialisation
for each array element: its interpretation depends on
whether it is a procedure or not.
If a procedure, it is assumed to take N subscript arguments
and to return a (possibly different) initialising value for
each position in the array. newanyarray will then initialise
the array by applying the procedure in turn to every
combination of subscripts, i.e.
element_init(sub1, sub2, ..., subN)
-> array(sub1, sub2, ..., subN)
(Note that the order in which the combinations of subscripts
are generated is UNDEFINED).
Otherwise, if element_init is not a procedure, every element
of the array is simply initialised to that value. (However,
to distinguish it from bounds_list, this argument must not
be a list -- if you wish to initialise the elements to some
list, you must use a procedure returning the list, as
above.)
arrayvec_spec
This argument is always required, and specifies the
arrayvector structure to be used to store the elements of
the array. It must supply either an existing structure, or a
procedure to construct a new one: for the former the value
may be either
(a) an actual vector-class structure (e.g. full vector,
string, intvec, shortvec, etc);
(b) an array procedure whose arrayvector is to be used;
while for the latter it may be
(c) a 'vector init' procedure p (which will be called as
p(size) -> arrayvec
where size is the number of elements in the array);
(d) a vector-class key whose class_init procedure is to
be used (see REF * DEFSTRUCT, * KEYS).
All cases except (c) also implicitly supply a subscriptor
procedure for the structure, making the next argument
(subscr_p) optional; for case (c), subscr_p must be present.
If an existing structure is specified with (a) or (b), it
must of course be large enough to accommodate the new array
elements.
Note that if element_init is omitted, the initial values of
the array elements will depend upon the arrayvec_spec
argument (i.e. for a new structure they will be whatever the
init procedure initialises them to, and for an existing
structure they will have their current values).
subscr_p
A subscriptor procedure for accessing and updating elements
of the arrayvector, i.e. a procedure with updater of the
form
subscr_p(subscript, arrayvec) -> element
element -> subscr_p(subscript, arrayvec)
This argument may always be supplied, but is essential only
when arrayvec_spec specifies a 'vector init' procedure; in
all other cases, i.e. an existing array, an existing
vector-class structure or vector-class key, the subscriptor
procedure derived from that is used if subscr_p is omitted.
arrayvec_offset
An optional integer argument specifying the starting offset
of the new array's elements within an existing array vector
got from arrayvec_spec (i.e. this argument is illegal in all
cases where a new arrayvector has to be constructed). Note
that this is an offset, not a subscript, i.e. 0 means start
at the first element (the default), 1 at the second, etc.
The existing structure must be large enough to accommodate
all the array elements starting at the given offset.
by_row
An optional argument specifying whether the array elements
are to be arrayed by row or by column: if supplied, it must
be a boolean, true meaning by row or false meaning by
column. If omitted, the value of poparray_by_row is used
instead.
poparray_by_row -> bool [variable]
bool -> poparray_by_row
This boolean variable controls the order in which the elements
of an array produced by newanyarray are stored in its underlying
vector, and supplies the default value for the BY_ROW argument
(q.v.).
newarray(bounds_list, element_init) -> full_array [procedure]
This procedure provides a simpler interface to newanyarray for
constructing arrays of full items, and is just
newanyarray(% vector_key %)
i.e. use standard full vectors to store the array elements (see
REF * VECTORS).
newsarray(bounds_list, element_init) -> char_array [procedure]
Same as newarray, but using strings rather than full vectors,
i.e.
newanyarray(% string_key %)
(see REF * STRINGS).
---------------------------------------------- 5 Generic Data Structure Procedures on Arrays
----------------------------------------------
The generic data structure procedures described in REF * DATA
(datalength, appdata, explode, fill, and others defined in terms of
those) can all be applied to arrays: they treat an array as the set of
its arrayvector elements between the minimum and maximum subscripts
given by the arrayvector_bounds. Thus, for example, if
arrayvector_bounds(array) -> min_sub -> max_sub
then datalength(array) will be
max_sub - min_sub + 1
Similarly, appdata(array, p) will apply the procedure p to the value of
each arrayvector element from min_sub to max_sub inclusive, and so on.
The procedure copy when applied to an array copies both the array
procedure and its arrayvector (so that the copy is completely
independent of the original).
---------------- 6 Miscellaneous
----------------
in_array [syntax]
This * FOR_FORM allows iteration over data in arrays, much as
other forms allow iteration over data in lists and vectors.
Its format is:
for var1, var2, ...
with_index ivar
in_array array1, array2, ...
updating_last n
in_region list
of_dimension d
do
statement-sequence
endfor
See HELP * in_array for details.
arrayscan(boundslist, p) [procedure]
Given a boundlist of array minimum and maximum subscripts (as
supplied to newanyarray, etc), applies the procedure p to every
combination of subscripts in that range (ordered 'by column',
i.e. with the last subscript varying fastest). For each
combination, p is given a list of subscripts of length N, where
boundslist is of length 2 * N. For example, the following
arrayscan([3 5 1 3], spr);
will produce
[3 1] [3 2] [3 3] [4 1] [4 2] [4 3] [5 1] [5 2] [5 3]
The for-forms * in_array and * in_region provide alternatives to
this.
6.1 Storing Arrays on Disk
---------------------------
The libraries LIB * ARRAYFILE and LIB * DATAFILE may be used to store
arrays on disk, and be subsequently read back into Poplog. arrayfile is
the more efficient and space-economical procedure; however, it can only
be used on arrays whose arrayvector is a 'byte-accessible' data
structure. datafile should be used to save arrays whose arrayvector is a
full vector.
arrayfile(filename) -> array [procedure]
arrayfile(filename, array) -> array
array -> arrayfile(filename)
Stores the array array in the disk file named by filename, using
a special (machine-specific) data format. The arrayvector of
array must be a byte-accesible structure. The following
information is recorded:
¤ The number of dimensions, and bounds, of array
¤ The dataword of the array's arrayvector
¤ Whether array is ordered by row or column
¤ The contents of the arrayvector of array
In its non-updater form, arrayfile reads the information stored
in filename, and returns a new array constructed from it. If the
optional second argument array is specified, the data saved in
filename is written into it; no new array is created. See
HELP * ARRAYFILE for examples.
The procedure datafile is documented in REF * datafile
--------------------- 7 Examples of Arrays
---------------------
The following:
constant procedure
multab = newarray([1 12 1 12], nonop *);
will create and assign to multab a 12 x 12 2-dimensional array each of
whose elements can be a full Poplog item, and where each element
subscripted by I, J is initialised to I * J (in other words, multab is a
multiplication table):
multab(4, 3) =>
** 12
To turn an existing string of "noughts-and-crosses" into an array:
constant procedure
oxo = newanyarray([1 3 1 3], '0 X X00 X');
cucharout(oxo(1, 3));
0
If, in the previous example, we make the array by row instead of by
column, the mapping between subscripts and positions in the string is
"reversed":
constant procedure
oxo2 = newanyarray([1 3 1 3], '0 X X00 X', false);
cucharout(oxo2(1, 3));
X
---------------- 8 Sparse Arrays
----------------
It is sometimes necessary to use large arrays in which many of the
elements will all have some default value, and only relatively few will
have a 'significant' value, i.e. different from the default. Represented
as ordinary arrays, such 'sparse' arrays are very wasteful of space,
since the arrayvector must allow storage cells for each element
corresponding to a given combination of subscripts, and yet many or most
of these will just repeat the default value.
Rather than using an vector-type structure to hold the elements,
Poplog sparse arrays are therefore implemented as multi-dimensional
properties, that is, they use a tree of properties to associate each set
of subscripts to a value (see REF * PROPS for a description of
properties). This means that only the 'significant' elements take up
space, and moreover, the 'subscripts' are not restricted to being
integers, but can be any Poplog items. (Although this approach saves
space, it does however mean that accessing sparse array elements is
slower than for ordinary arrays.)
A sparse array can also have an 'active default' -- i.e. a procedure
that is run to decide what the default value of an element should be
when it has not been assigned a value explicitly; for certain
applications this can save even more space.
newanysparse(dimensions, default_item) -> sparse_array [procedure]
newanysparse(dimensions, default_p, apply) -> sparse_array
Constructs and returns a new N-dimensional sparse array
procedure (where N >= 1). This can be called as
sparse_array(sub1, sub2, ..., subN) -> element
element -> sparse_array(sub1, sub2, ..., subN)
to access or update the element associated with a given set of N
'subscripts' (which can be any items at all, but note that
subscript equality is on the basis of ==, not =).
The dimensions argument specifies the number of dimensions N,
either directly as an integer or as an N-element list. In the
latter case, the elements of the list are integers giving the
table sizes to be used for the properties employed for each
dimension; in the former case, when dimensions is simply an
integer, the table size in each dimension defaults to 20. (The
table size for a property affects how fast items are accessed in
it -- see REF * PROPS. For maximum speed it should be roughly
the 'size', i.e. number of different subscript values, in each
dimension; also, subscripts are dealt with from right to left,
so that putting 'larger' dimensions to the left of 'smaller'
ones will increase efficiency.)
The remaining argument(s) specify the default value for elements
of the array: in the first form of the call, the item
default_item is the fixed default value for every element. The
second form specifies an 'active default' procedure default_p,
and to distinguish this from the first form (in which
default_item could be a procedure), the last argument must be
the procedure apply. default_p is expected to be a procedure
which takes N 'subscript' arguments and returns a default value,
i.e.
default_p(sub1, sub2, ..., subN) -> value
(this is comparable to a procedure specified as the element_init
argument to newanyarray).
newsparse(dimensions) -> sparse_array [procedure]
This is a simpler version of newanysparse which has the word
"undef" as a fixed default_item, i.e. same as
newanysparse(%"undef"%)
8.1 Examples of Sparse Arrays
------------------------------
The first example is an sparse array representing points in 3-D space,
where each element in the array is a list of points to which that point
is connected to; each point is defaults to being connected just to
itself:
define lconstant here(x, y, z);
lvars x, y, z;
[[^x ^y ^z]]
enddefine;
constant procedure
connected = newanysparse(3, here, apply);
In this array, each cell contains a list of the points it is directly
connected to:
connected(1, 2, 3) =>
** [[1 2 3]]
connected(8, 9, 10) =>
** [[8 9 10]]
Some explicit connectivity can then be added:
[8 9 10] :: connected(1, 2, 3) -> connected(1, 2, 3);
connected(1,2,3) =>
** [[8 9 10] [1 2 3]]
The second example creates a sparse array in which the 3 'dimensions'
are English words, foreign languages, and modalities, and where each
element is the translation of the English word into the corresponding
language and modality (the default value being false for no
translation):
constant procedure
dictionary = newanysparse(3, false);
"bonjour" -> dictionary("hello", "french", "polite");
"ola" -> dictionary("hello", "spanish", "familiar");
dictionary("hello", "french", "polite") =>
** bonjour
dictionary("hello", "french", "vulgar") =>
** <false>
------------------------ 9 Related Documentation
------------------------
REF * DATA
Information on Poplog data types and how they are represented.
REF * KEYS
Describes the information associated with each data type and some
general procedures for manipulating data and creating fast access
procedures.
REF * VECTORS
Describes standard full vectors (vectors containing any Poplog
items).
REF * INTVEC
Describes intvecs (vectors containing 32 bit signed integers) and
shortvecs (vectors containing 16 bit signed integers).
REF * STRINGS
Describes byte vectors.
REF * PROPS
Information on properties (hash-tables, indexed by arbitrary
objects).
REF * DEFSTRUCT
Describes syntax for creating new vector or record classes.
HELP * ARRAYS
An introduction to arrays in Pop-11.
HELP * NEWMAPPING
Describes a particular type of property for associating items with
structures on the basis of structure equality.
+-+ C.all/ref/arrays
+-+ Copyright University of Sussex 1996. All rights reserved.