REF PROPS John Gibson Apr 1992
COPYRIGHT University of Sussex 1992. All Rights Reserved.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<< PROPERTIES >>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
This REF file describes the difference between the types of properties
and when each type should be used. It details how to construct
properties, the data structure procedures which apply, and also gives
some examples of the use of properties. Finally the use of hashing
procedures and hashing algorithims is described.
CONTENTS - (Use <ENTER> g to access required sections)
1 Introduction
2 Predicates on Properties
3 Constructing Properties
4 Manipulating Properties
5 Generic Data Structure Procedures on Properties
6 Destroy Properties
7 Examples of Using Properties
8 Hashing Procedures
8.1 Standard Hashing Algorithms
---------------
1 Introduction
---------------
A property is a means of storing associations between items: the
property itself is a procedure which when called with some item item as
argument will return the item associated with item, if there is one, or
a user-supplied default value otherwise. Associations between items may
be set up by calling the same procedure in update mode, or by supplying
a list of initial associations when the property is constructed using
newproperty or newanyproperty.
Properties come in two basic types: permanent and temporary (in
addition to a special destroy type for associating destroy actions with
objects, see below). The difference between permanent and temporary
involves the interaction of the property with garbage collection.
Suppose a property prop has an entry associating an item val (the value)
with an item arg (the argument), and suppose that there are no other
references remaining to the argument arg elsewhere in the system (i.e.
arg is otherwise garbage). Then, if the entry were treated by garbage
collection like any other record, the presence of arg in the entry would
prevent it (i.e arg) from being garbaged. Similar remarks apply to the
value val if that too has no other references remaining.
For some applications this may be the desired behaviour; for others,
the property entry may no longer be required at all if either ARG or VAL
are otherwise garbage. The system's action in this respect can therefore
be specified for the entries in a given property, by supplying one of
the following keyword arguments when the property is initially
constructed (with newproperty or newanyproperty):
Keyword Meaning
------- -------
"perm" Permanent: entries are never discarded.
"tmparg" Temporary on the argument: entries are discarded for
which the arg items are otherwise garbage.
"tmpval" Temporary on the value: entries are discarded for
which the val items are otherwise garbage.
"tmpboth" Temporary on both: entries are discarded for which
either the arg or val items are otherwise garbage.
"tmpclr" Temporary cleared: all entries are cleared from the
property at every garbage collection.
(Note of course that if an entry is kept in any case, then both arg and
val items are forced to be retained as well, regardless of whether they
were garbage or not.)
Thus for example, "perm" properties may be used for retaining
permanent tables of information, and "tmparg" properties for attaching
ad hoc pieces of information to arbitrary records (which will be garbage
collected along with the records). "tmpclr" properties are useful for
short-term caching of computed data.
A "destroy" property is a special kind of "tmparg" property. Like
the latter the presence of arg as the argument of a destroy property
entry will not stop it from being considered as garbage, but instead of
simply discarding arg, the entry containing it is added to a special
list maintained by the garbage collector. At the end of the garbage
collection those destroy entries whose argument would otherwise have
become garbage (i.e. and the entry discarded) are examined. If their
value slot contains a procedure, or an identifier whose idval is a
procedure, the procedure is run, with arg passed as argument. Only then
is arg discarded in the same manner as with temporary properties.
Destroy properties thus allow arbitrary procedures to be attached to
objects in such a way that the procedure gets run, with the object as
argument, at the point at which the object would otherwise have become
garbage (the point of its destruction).
Properties are organised around a table of "buckets", each of which
contains a list of entries; which bucket an entry is put in is
determined by a hashing algorithm. The number of buckets is
user-specifiable: a larger number of buckets will decrease the access
time for entry (since the lists in each bucket will be shorter), but at
the cost of using more space.
Using the more verbose procedure newanyproperty users may specify
their own hashing algorithm: this is particularly useful if you wish to
associate the same data with items that are not always identical (as
tested with ==) but merely structurally equal (as tested by =). Users
may also create "expandable" properties, the tables of which will
automatically enlarge by a given factor whenever the table passes a
given threshold value. Expandable properties will never decrease in size
even if the number of entries in the property drops below the given
threshold.
newanyproperty also allows the creation of "active" properties. When
an item is not found in an active property then instead of returning the
default value, the item and the property is passed to a user supplied
procedure to return the result.
A facility for constructing "multi-dimensional" properties is also
provided by the "sparse array" mechanism, described in REF * ARRAYS
---------------------------
2 Predicates on Properties
---------------------------
As mentioned above, properties are procedures: thus isprocedure is true
for any property.
isproperty(item) -> bool [procedure]
Returns true if item is a property, false if not.
--------------------------
3 Constructing Properties
--------------------------
newproperty(assoc_list, tab_size, default, gc_type) -> prop [procedure]
This procedure returns a new property prop. The arguments are:
assoc_list
An initial list of associations, where each association is a
2-element list of the form [<argument> <value>].
tab_size
The table size, i.e. the number of buckets to be used.
default
The default value to be produced when there is no entry for
a given argument.
gc_type
A keyword specifying the permanent/temporary status of the
property entries with respect to garbage collection (as
described in the "Introduction" section):
Keyword Meaning
------- -------
"perm" Permanent: entries are never discarded.
"tmparg" Temporary on the argument: entries are
discarded for which the argument items are
otherwise garbage.
"tmpval" Temporary on the value: entries are discarded
for which the value items are otherwise
garbage.
"tmpboth" Temporary on both: entries are discarded for
which either the argument or value items are
otherwise garbage.
"tmpclr" Temporary cleared: all entries are cleared from
the property at every garbage collection.
Note that assigning the default value default to an argument
item actually removes the entry for that item (if there was
one).
newassoc(assoc_list) -> prop [procedure]
Provides a simpler interface to newproperty, for constructing
permanent properties with default value false. Defined as
newproperty(assoc_list, 20, false, "perm") -> prop
where assoc_list is as for newproperty.
newanyproperty(assoc_list, tab_size, expand_pow, thresh, [procedure]
hash_p, equal_p, gc_type,
default, active_default) -> prop
This is for constructing more complex properties. It returns a
new property prop, where the arguments are:
assoc_list
An initial list of associations, as above.
tab_size
The initial table size, as above.
expand_pow
false for a static property, or an integer for an expandable
one. When thresh entries have been added the table size of
the property will increase by the formula:
new size = old size * (2 ** expand_pow)
thresh
An integer expansion threshold, as explained above (or false
if expand_pow is false). After expansion, the threshold is
increased in proportion to the property's size.
hash_p
Hashing procedure, or false to use the default hashing
algorithm. The hashing procedure is applied as
hash_p(item) -> bucket
where bucket is an item specifying a bucket in the table.
bucket can be a simple integer (first bucket is 0), or any
other item (in which case the address of that item is used).
In all cases, bucket is taken modulo the current table size.
(See Hashing Procedures below).
To specify that the hashing algorithm relies on the
address of an item, and that the property must therefore be
rehashed after every garbage collection, enclose the
procedure in a reference, i.e. consref(hash_p).
equal_p
Equality procedure used to check that an item matches on in
the table. The default procedures (used if equal_p is false)
is absolute identity (==). You cannot specify an equal_p if
you have not specified a hash_p.
gc_type
A keyword specifying the permanent/temporary status of the
property entries with respect to garbage collection; same as
the gc_type argument to newproperty above.
Note that because the concept of an argument item in a
property entry being 'otherwise garbage' is (generally)
irrelevant when entries are not matched on the basis of
absolute identity, you should exercise care in using the
"tmparg" and "tmpboth" types with an equal_p other than (or
defaulting to) == .
default
Default value, as for newproperty above.
active_default
Active default procedure for the property or false
otherwise. If an item cannot be found in the table default
is returned if active_default is false, otherwise
active_default is applied to the item and the property, i.e.
active_default(item, prop) -> value
newmapping(assoc_list, tab_size, default, expand) -> prop [procedure]
Provides a simpler interface to newanyproperty for constructing
properties that match argument items on the basis of the
standard structure equality operator "=" (see REF * DATA), and
that use the standard procedure syshash to hash items into
buckets (see the description of syshash below).
If true, the expand argument specifies that the property
should have an expand_pow of 1 and a thresh of tab_size (where
tab_size is also the initial table size). newmapping is thus
defined as
newanyproperty(assoc_list, tab_size,
if expand then
1, tab_size
else
false, false
endif,
syshash, nonop =, "perm",
default, false) -> prop
--------------------------
4 Manipulating Properties
--------------------------
See also Fast Operations on Properties in REF * FASTPROCS for
unsupported procedures which enable some property operations to be
performed quicker.
appproperty(prop, p) [procedure]
Applies the procedure p to each entry in the property prop. For
each entry, p is given two arguments, i.e
p(arg, value)
Note that p is effectively applied to a temporary copy of prop
so that any alterations made by the procedure p to prop itself
during the execution of appproperty will have no effect on the
entries to which p is applied (thus p can safely delete entries
or add new ones). Also see fast_appproperty.
Note also that since the organisation of property entries
depends on a hashing algorithm (and also on the order in which
entries are added, etc), the order in which entries are given to
p is indeterminate (and further, may change between garbage
collections).
fast_appproperty(prop, p) [procedure]
Applies the procedure p to each entry in the property prop. For
each entry, p is given two arguments, i.e.
p(arg, value)
This is just like appproperty except that property table is not
copied first. This means that fast_appproperty should not be
used if the procedure is going to change any of the entries in
the property.
clearproperty(prop) [procedure]
Removes all entries from the property prop.
clear_prop_entries(prop) [procedure]
Removes all entries from the property prop. This is functionally
the same as clearproperty with the addition of clearing the
contents of the individual "property entry" data structures in
prop. This is only relevant if you use fast_get_prop_entry to
access the internal structure of prop.
property_default(prop) -> item [procedure]
Returns the default value for the property prop.
property_size(prop) -> tab_size [procedure]
Returns the current table size for the property prop. For a
non-expandable property, this will be the same as the tab_size
specified to newproperty or newanyproperty when the property was
created.
property_active(prop) -> active_default [procedure]
active_default -> property_active(prop)
Returns or updates the active default procedure for the property
prop (which is either a procedure or false, see the description
of the active_default argument to newanyproperty above).
A false argument to the updater will remove the active default
for prop (so that the property_default value will henceforth be
returned for any item not in the property table).
Note that the updater of this procedure can be used to give an
active default procedure to any property, including one
constructed with newproperty.
Prop_entry_state_init [procedure]
Prop_entry_state_next [procedure]
These two system procedures are used in LIB * in_property, to
support the in_property for-form.
property_equal(prop1, prop2) -> bool [procedure]
property_equal(prop1, prop2, val_eq_p) -> bool
An equality predicate for properties. Two properties are equal
if the following conditions hold:
(a) They both use the same equality procedure (6'th argument
to newanyproperty) when locating and removing entries.
(b) They both contain the same number of item/value
associations.
(c) For every item1, value1 association in prop1, there is a
corresponding association item2, value2 in prop2, such
that item1 and item2 are equal according to the equality
procedure used in constructing prop1 and prop2, and
value1 and value2 are equal according to the procedure
val_eq_p. (If val_eq_p is not supplied, the equality
procedure associated with prop1 is used instead).
--------------------------------------------------
5 Generic Data Structure Procedures on Properties
--------------------------------------------------
The following generic data structure procedures (described in
REF * DATA) are applicable to a property prop:
datalength(prop)
returns the number of entries in prop.
appdata(prop, p)
applies the procedure p to each entry of prop, each entry being
given as a 2-element list [^arg ^val].
explode(prop)
puts all the entries of prop on the stack, in the same 2-element
list form as appdata.
copy(prop) copies both its table and all its entries; thus the result
does not share structure with the original, and they can be
independently modified.
---------------------
6 Destroy Properties
---------------------
As described above, destroy properties are used to associate with an
object a procedure to be run when the object becomes garbage. There are
only two standard destroy properties in the system (sys_destroy_action
and sys_process_destroy_action) but others can be created by copying
them. By this means it is possible for an object to have more than one
destroy action associated with it (in different properties), and in such
a case, the order in which the actions get run when the object becomes
garbage is undefined.
The association of a destroy action with an object is called a
'destroy entry', and there are in fact two types: dependent and
independent. The difference between them is in the handling of destroy
actions of other objects referred to by the object or its action. It is
commonly the case that when an object becomes garbage, some of its
sub-objects (objects it has pointers to) become garbage also. If the
object and one of its sub-objects both have destroy actions, there is an
issue as to what actions should be done and in what order. Some destroy
actions do not care about whether the sub-objects' destroy actions have
been run. These are called independent actions, and the order of
execution of their and their sub-objects' destroy actions is undefined.
Others, however, require that their sub-objects still exist when
their destroy action is run (possibly because the destroy action is
going to 'save' the object ie not destroy it at all). These are called
dependent actions. When a dependent action is scheduled to be run, any
destroy actions associated with sub-objects are ignored -- the
sub-objects are in fact treated as though they were non-garbage
(typically the sub-objects will then become garbage on the NEXT garbage
collection).
The dependency status of a destroy action is controlled by a flag
called its 'dependency flag'. This is set according to the value of
sys_destroy_dependent when the entry is created, and may be inspected or
altered later using sys_destroy_isdependent.
sys_destroy_action(item) -> action [procedure]
action -> sys_destroy_action(item)
This is the standard destroy property. The destroy action action
is a procedure P or identifier record; in the latter case the
idval of the identifier at the time of running the destroy
action is used (and must be a procedure). The procedure p is
passed the object being destroyed as argument, i.e.
p(item)
If you wish to create a new destroy property you can copy
sys_destroy_action and then clear it, i.e.
copy(sys_destroy_action) -> destroy_prop;
clearproperty(destroy_prop);
sys_destroy_dependent -> bool [variable]
bool -> sys_destroy_dependent
This variable, which defaults to false, controls the setting of
the destroy dependency flag when a destroy action is assigned to
an object (in any destroy property). When false, the destroy
action is created as independent, when true it is created as
dependent.
sys_destroy_isdependent(item, destroy_prop) -> bool [procedure]
bool -> sys_destroy_isdependent(item, destroy_prop)
This procedure returns or updates the dependency flag associated
with item in the destroy property destroy_prop. A mishap occurs
is destroy_prop is not a destroy property (eg
sys_destroy_action), or if an attempt is made to update the flag
of an object not stored in the property.
sys_process_destroy_action(item) -> action [procedure]
action -> sys_process_destroy_action(item)
This is a special destroy property which, on Unix systems, is
cleared completely (with clear_prop_entries) after a sys_fork.
In all other respects it is identical to sys_destroy_action. It
is used when you only want destroy actions to be executed in the
same process they were set.
NOTE: Copies of sys_process_destroy_action will NOT be cleared
on a sys_fork, only sys_process_destroy_action.
-------------------------------
7 Examples of Using Properties
-------------------------------
The first example uses newproperty to create a property relating
people's names to their ages:
constant procedure
ageof = newproperty([[fred 23][mary 18][bill 99]], 10, 0,
"perm");
ageof("bill")=>
** 99
ageof("mary")=>
** 18
ageof("susan")=>
** 0 ;;; produces default value
"ageless" -> ageof("susan");
ageof("susan")=>
** ageless
0 -> ageof("bill"); ;;; removes entry for bill
define lconstant printage(person,age);
pr(person), pr(`\t`), pr(age), pr(newline);
enddefine;
appproperty(ageof, printage);
fred 23
mary 18
susan ageless
The second example uses newanyproperty to create a property relating
peoples names to their aliases, where an active default procedure
nochange is used to map names with no explicit alias to the name itself:
define lconstant nochange(item, prop) -> item;
lvars item, prop;
enddefine;
constant procedure
alias = newanyproperty([[henry 'Henry James']
[fred 'Alfred Tenyson']],
8, false, false, false, false,
"perm", false, nochange);
alias("henry") =>
** Henry James
alias("john") =>
** john
---------------------
8 Hashing Procedures
---------------------
Hashing procedures are procedures which take any item as argument and
return a hash code for that item, the hash code normally being a simple
integer. Although their principal use is in properties (as described
above), they are potentially usable in many situations, and for this
reason Poplog provides for a hashing procedure to be associated with
each class of object, via the * class_hash field in key structures (see
REF * KEYS). The procedure syshash when applied to an item then returns
the result of applying the class_hash of the item to the item, i.e.
class_hash(datakey(item))(item) -> hash_code
The hashing algorithm for a particular class of object can thus be
altered by changing the class_hash procedure for the key of that class.
For example:
datalength -> class_hash(word_key);
syshash("foo") =>
** 3
syshash("syshash") =>
** 7
While class_hash procedures may compute hash codes for objects in
any appropriate way, the resulting codes should conform to the following
rules:
1) Hash codes are simple integers;
2) Hash codes are independent of machine addresses (and hence
do not change after garbage collections);
3) X = Y implies syshash(X) == syshash(Y).
(note however that future versions of Poplog may use different criteria
for hashing on Prolog variables or terms).
Feature (3) makes it possible to use syshash and = as arguments for
newanyproperty, to create a hash table that maps arbitrary structures to
values on the basis of their contents (as done by newmapping above);
Feature (2) implies that syshash does not require an enclosing reference
when supplied as the hash_p argument to newanyproperty.
The result of any new user-defined class_hash procedure should obey
the three rules stated above; also, if the class_= procedure for a key
is changed, then consideration should be given to changing the
class_hash procedure too (since structures that are = are supposed to
produce == hash codes).
syshash(item) -> hash_code [procedure]
This procedure returns a hash code for any item item by applying
its class_hash procedure. The standard hashing algorithms for
different data types are shown below, many of which recursively
call syshash on sub-structures; on each call syshash decrements
pop_hash_lim locally, and if this becomes 0, further recursion
is stopped by simply returning the hash code for the key of the
current item. That is, syshash is defined as:
define syshash(item) -> hash_code;
lvars item, hash_code;
dlocal pop_hash_lim;
if pop_hash_lim == 0 then
syshash(datakey(item)) -> hash_code
else
pop_hash_lim - 1 -> pop_hash_lim;
class_hash(datakey(item))(item) -> hash_code
endif
enddefine;
pop_hash_lim -> int [variable]
int -> pop_hash_lim
This (active) integer variable (default value 3) controls the
depth to which syshash will recurse on structures or iterate on
lists. E.g. To make the hash code for a list take account of its
first N elements, assign N + 1 to pop_hash_lim.
NOTE: Common Lisp users should beware of GLOBAL changes to the
class_hash of built-in classes: the Lisp function SXHASH may cease to
work properly. In such cases, it may be safer to use the dlocal
mechanism to locally redefine a class_hash procedure. For example:
define myprogram();
dlocal % class_hash(pair_key) % = myhashpdr;
...
enddefine;
8.1 Standard Hashing Algorithms
--------------------------------
The standard class_hash algorithms for different data types are roughly
as follows:
¤ number:
intof(item)
¤ word, string:
datalength(item) -> len;
if len > 2 then
(item(1)+len) + ((item(len)+len) << 3)
+ ((item((len>>1)+1)+len) << 6)
-> n;
n + (n >> 10)
elseif len = 2 then
item(1) + (item(2) << 3)
elseif len == 1 then
item(1)
else
0
endif;
(Note that this is intended to be used modulo 10 bits, e.g. the
dictionary hashing algorithm is the above && 16:3FF.)
¤ list: (N.B. dynamic lists are NOT expanded)
syshash(pair_key) -> n;
for i from 1 to min(pop_hash_lim, listlength(item)) do
(syshash(item(i)) << i) + n -> n
endfor
¤ record:
syshash(dataword(item)) + syshash(last(item))
¤ vector:
syshash(dataword(item))
+ syshash(last(item))
+ syshash(datalength(item))
¤ array:
syshash(arrayvector(item))
¤ procedure:
datasize(item) << pdnargs(item)
¤ prologterm:
syshash(prolog_functor(item)) + syshash(prolog_arity(item))
+ syshash(prolog_arg(1, item))
¤ key:
syshash(class_dataword(item))
+-+ C.all/ref/props
+-+ Copyright University of Sussex 1992. All rights reserved.