REF PROPS John Gibson Apr 1992 COPYRIGHT University of Sussex 1992. All Rights Reserved. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<< PROPERTIES >>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< This REF file describes the difference between the types of properties and when each type should be used. It details how to construct properties, the data structure procedures which apply, and also gives some examples of the use of properties. Finally the use of hashing procedures and hashing algorithims is described. CONTENTS - (Use <ENTER> g to access required sections) 1 Introduction 2 Predicates on Properties 3 Constructing Properties 4 Manipulating Properties 5 Generic Data Structure Procedures on Properties 6 Destroy Properties 7 Examples of Using Properties 8 Hashing Procedures 8.1 Standard Hashing Algorithms --------------- 1 Introduction --------------- A property is a means of storing associations between items: the property itself is a procedure which when called with some item item as argument will return the item associated with item, if there is one, or a user-supplied default value otherwise. Associations between items may be set up by calling the same procedure in update mode, or by supplying a list of initial associations when the property is constructed using newproperty or newanyproperty. Properties come in two basic types: permanent and temporary (in addition to a special destroy type for associating destroy actions with objects, see below). The difference between permanent and temporary involves the interaction of the property with garbage collection. Suppose a property prop has an entry associating an item val (the value) with an item arg (the argument), and suppose that there are no other references remaining to the argument arg elsewhere in the system (i.e. arg is otherwise garbage). Then, if the entry were treated by garbage collection like any other record, the presence of arg in the entry would prevent it (i.e arg) from being garbaged. Similar remarks apply to the value val if that too has no other references remaining. For some applications this may be the desired behaviour; for others, the property entry may no longer be required at all if either ARG or VAL are otherwise garbage. The system's action in this respect can therefore be specified for the entries in a given property, by supplying one of the following keyword arguments when the property is initially constructed (with newproperty or newanyproperty): Keyword Meaning ------- ------- "perm" Permanent: entries are never discarded. "tmparg" Temporary on the argument: entries are discarded for which the arg items are otherwise garbage. "tmpval" Temporary on the value: entries are discarded for which the val items are otherwise garbage. "tmpboth" Temporary on both: entries are discarded for which either the arg or val items are otherwise garbage. "tmpclr" Temporary cleared: all entries are cleared from the property at every garbage collection. (Note of course that if an entry is kept in any case, then both arg and val items are forced to be retained as well, regardless of whether they were garbage or not.) Thus for example, "perm" properties may be used for retaining permanent tables of information, and "tmparg" properties for attaching ad hoc pieces of information to arbitrary records (which will be garbage collected along with the records). "tmpclr" properties are useful for short-term caching of computed data. A "destroy" property is a special kind of "tmparg" property. Like the latter the presence of arg as the argument of a destroy property entry will not stop it from being considered as garbage, but instead of simply discarding arg, the entry containing it is added to a special list maintained by the garbage collector. At the end of the garbage collection those destroy entries whose argument would otherwise have become garbage (i.e. and the entry discarded) are examined. If their value slot contains a procedure, or an identifier whose idval is a procedure, the procedure is run, with arg passed as argument. Only then is arg discarded in the same manner as with temporary properties. Destroy properties thus allow arbitrary procedures to be attached to objects in such a way that the procedure gets run, with the object as argument, at the point at which the object would otherwise have become garbage (the point of its destruction). Properties are organised around a table of "buckets", each of which contains a list of entries; which bucket an entry is put in is determined by a hashing algorithm. The number of buckets is user-specifiable: a larger number of buckets will decrease the access time for entry (since the lists in each bucket will be shorter), but at the cost of using more space. Using the more verbose procedure newanyproperty users may specify their own hashing algorithm: this is particularly useful if you wish to associate the same data with items that are not always identical (as tested with ==) but merely structurally equal (as tested by =). Users may also create "expandable" properties, the tables of which will automatically enlarge by a given factor whenever the table passes a given threshold value. Expandable properties will never decrease in size even if the number of entries in the property drops below the given threshold. newanyproperty also allows the creation of "active" properties. When an item is not found in an active property then instead of returning the default value, the item and the property is passed to a user supplied procedure to return the result. A facility for constructing "multi-dimensional" properties is also provided by the "sparse array" mechanism, described in REF * ARRAYS --------------------------- 2 Predicates on Properties --------------------------- As mentioned above, properties are procedures: thus isprocedure is true for any property. isproperty(item) -> bool [procedure] Returns true if item is a property, false if not. -------------------------- 3 Constructing Properties -------------------------- newproperty(assoc_list, tab_size, default, gc_type) -> prop [procedure] This procedure returns a new property prop. The arguments are: assoc_list An initial list of associations, where each association is a 2-element list of the form [<argument> <value>]. tab_size The table size, i.e. the number of buckets to be used. default The default value to be produced when there is no entry for a given argument. gc_type A keyword specifying the permanent/temporary status of the property entries with respect to garbage collection (as described in the "Introduction" section): Keyword Meaning ------- ------- "perm" Permanent: entries are never discarded. "tmparg" Temporary on the argument: entries are discarded for which the argument items are otherwise garbage. "tmpval" Temporary on the value: entries are discarded for which the value items are otherwise garbage. "tmpboth" Temporary on both: entries are discarded for which either the argument or value items are otherwise garbage. "tmpclr" Temporary cleared: all entries are cleared from the property at every garbage collection. Note that assigning the default value default to an argument item actually removes the entry for that item (if there was one). newassoc(assoc_list) -> prop [procedure] Provides a simpler interface to newproperty, for constructing permanent properties with default value false. Defined as newproperty(assoc_list, 20, false, "perm") -> prop where assoc_list is as for newproperty. newanyproperty(assoc_list, tab_size, expand_pow, thresh, [procedure] hash_p, equal_p, gc_type, default, active_default) -> prop This is for constructing more complex properties. It returns a new property prop, where the arguments are: assoc_list An initial list of associations, as above. tab_size The initial table size, as above. expand_pow false for a static property, or an integer for an expandable one. When thresh entries have been added the table size of the property will increase by the formula: new size = old size * (2 ** expand_pow) thresh An integer expansion threshold, as explained above (or false if expand_pow is false). After expansion, the threshold is increased in proportion to the property's size. hash_p Hashing procedure, or false to use the default hashing algorithm. The hashing procedure is applied as hash_p(item) -> bucket where bucket is an item specifying a bucket in the table. bucket can be a simple integer (first bucket is 0), or any other item (in which case the address of that item is used). In all cases, bucket is taken modulo the current table size. (See Hashing Procedures below). To specify that the hashing algorithm relies on the address of an item, and that the property must therefore be rehashed after every garbage collection, enclose the procedure in a reference, i.e. consref(hash_p). equal_p Equality procedure used to check that an item matches on in the table. The default procedures (used if equal_p is false) is absolute identity (==). You cannot specify an equal_p if you have not specified a hash_p. gc_type A keyword specifying the permanent/temporary status of the property entries with respect to garbage collection; same as the gc_type argument to newproperty above. Note that because the concept of an argument item in a property entry being 'otherwise garbage' is (generally) irrelevant when entries are not matched on the basis of absolute identity, you should exercise care in using the "tmparg" and "tmpboth" types with an equal_p other than (or defaulting to) == . default Default value, as for newproperty above. active_default Active default procedure for the property or false otherwise. If an item cannot be found in the table default is returned if active_default is false, otherwise active_default is applied to the item and the property, i.e. active_default(item, prop) -> value newmapping(assoc_list, tab_size, default, expand) -> prop [procedure] Provides a simpler interface to newanyproperty for constructing properties that match argument items on the basis of the standard structure equality operator "=" (see REF * DATA), and that use the standard procedure syshash to hash items into buckets (see the description of syshash below). If true, the expand argument specifies that the property should have an expand_pow of 1 and a thresh of tab_size (where tab_size is also the initial table size). newmapping is thus defined as newanyproperty(assoc_list, tab_size, if expand then 1, tab_size else false, false endif, syshash, nonop =, "perm", default, false) -> prop -------------------------- 4 Manipulating Properties -------------------------- See also Fast Operations on Properties in REF * FASTPROCS for unsupported procedures which enable some property operations to be performed quicker. appproperty(prop, p) [procedure] Applies the procedure p to each entry in the property prop. For each entry, p is given two arguments, i.e p(arg, value) Note that p is effectively applied to a temporary copy of prop so that any alterations made by the procedure p to prop itself during the execution of appproperty will have no effect on the entries to which p is applied (thus p can safely delete entries or add new ones). Also see fast_appproperty. Note also that since the organisation of property entries depends on a hashing algorithm (and also on the order in which entries are added, etc), the order in which entries are given to p is indeterminate (and further, may change between garbage collections). fast_appproperty(prop, p) [procedure] Applies the procedure p to each entry in the property prop. For each entry, p is given two arguments, i.e. p(arg, value) This is just like appproperty except that property table is not copied first. This means that fast_appproperty should not be used if the procedure is going to change any of the entries in the property. clearproperty(prop) [procedure] Removes all entries from the property prop. clear_prop_entries(prop) [procedure] Removes all entries from the property prop. This is functionally the same as clearproperty with the addition of clearing the contents of the individual "property entry" data structures in prop. This is only relevant if you use fast_get_prop_entry to access the internal structure of prop. property_default(prop) -> item [procedure] Returns the default value for the property prop. property_size(prop) -> tab_size [procedure] Returns the current table size for the property prop. For a non-expandable property, this will be the same as the tab_size specified to newproperty or newanyproperty when the property was created. property_active(prop) -> active_default [procedure] active_default -> property_active(prop) Returns or updates the active default procedure for the property prop (which is either a procedure or false, see the description of the active_default argument to newanyproperty above). A false argument to the updater will remove the active default for prop (so that the property_default value will henceforth be returned for any item not in the property table). Note that the updater of this procedure can be used to give an active default procedure to any property, including one constructed with newproperty. Prop_entry_state_init [procedure] Prop_entry_state_next [procedure] These two system procedures are used in LIB * in_property, to support the in_property for-form. property_equal(prop1, prop2) -> bool [procedure] property_equal(prop1, prop2, val_eq_p) -> bool An equality predicate for properties. Two properties are equal if the following conditions hold: (a) They both use the same equality procedure (6'th argument to newanyproperty) when locating and removing entries. (b) They both contain the same number of item/value associations. (c) For every item1, value1 association in prop1, there is a corresponding association item2, value2 in prop2, such that item1 and item2 are equal according to the equality procedure used in constructing prop1 and prop2, and value1 and value2 are equal according to the procedure val_eq_p. (If val_eq_p is not supplied, the equality procedure associated with prop1 is used instead). -------------------------------------------------- 5 Generic Data Structure Procedures on Properties -------------------------------------------------- The following generic data structure procedures (described in REF * DATA) are applicable to a property prop: datalength(prop) returns the number of entries in prop. appdata(prop, p) applies the procedure p to each entry of prop, each entry being given as a 2-element list [^arg ^val]. explode(prop) puts all the entries of prop on the stack, in the same 2-element list form as appdata. copy(prop) copies both its table and all its entries; thus the result does not share structure with the original, and they can be independently modified. --------------------- 6 Destroy Properties --------------------- As described above, destroy properties are used to associate with an object a procedure to be run when the object becomes garbage. There are only two standard destroy properties in the system (sys_destroy_action and sys_process_destroy_action) but others can be created by copying them. By this means it is possible for an object to have more than one destroy action associated with it (in different properties), and in such a case, the order in which the actions get run when the object becomes garbage is undefined. The association of a destroy action with an object is called a 'destroy entry', and there are in fact two types: dependent and independent. The difference between them is in the handling of destroy actions of other objects referred to by the object or its action. It is commonly the case that when an object becomes garbage, some of its sub-objects (objects it has pointers to) become garbage also. If the object and one of its sub-objects both have destroy actions, there is an issue as to what actions should be done and in what order. Some destroy actions do not care about whether the sub-objects' destroy actions have been run. These are called independent actions, and the order of execution of their and their sub-objects' destroy actions is undefined. Others, however, require that their sub-objects still exist when their destroy action is run (possibly because the destroy action is going to 'save' the object ie not destroy it at all). These are called dependent actions. When a dependent action is scheduled to be run, any destroy actions associated with sub-objects are ignored -- the sub-objects are in fact treated as though they were non-garbage (typically the sub-objects will then become garbage on the NEXT garbage collection). The dependency status of a destroy action is controlled by a flag called its 'dependency flag'. This is set according to the value of sys_destroy_dependent when the entry is created, and may be inspected or altered later using sys_destroy_isdependent. sys_destroy_action(item) -> action [procedure] action -> sys_destroy_action(item) This is the standard destroy property. The destroy action action is a procedure P or identifier record; in the latter case the idval of the identifier at the time of running the destroy action is used (and must be a procedure). The procedure p is passed the object being destroyed as argument, i.e. p(item) If you wish to create a new destroy property you can copy sys_destroy_action and then clear it, i.e. copy(sys_destroy_action) -> destroy_prop; clearproperty(destroy_prop); sys_destroy_dependent -> bool [variable] bool -> sys_destroy_dependent This variable, which defaults to false, controls the setting of the destroy dependency flag when a destroy action is assigned to an object (in any destroy property). When false, the destroy action is created as independent, when true it is created as dependent. sys_destroy_isdependent(item, destroy_prop) -> bool [procedure] bool -> sys_destroy_isdependent(item, destroy_prop) This procedure returns or updates the dependency flag associated with item in the destroy property destroy_prop. A mishap occurs is destroy_prop is not a destroy property (eg sys_destroy_action), or if an attempt is made to update the flag of an object not stored in the property. sys_process_destroy_action(item) -> action [procedure] action -> sys_process_destroy_action(item) This is a special destroy property which, on Unix systems, is cleared completely (with clear_prop_entries) after a sys_fork. In all other respects it is identical to sys_destroy_action. It is used when you only want destroy actions to be executed in the same process they were set. NOTE: Copies of sys_process_destroy_action will NOT be cleared on a sys_fork, only sys_process_destroy_action. ------------------------------- 7 Examples of Using Properties ------------------------------- The first example uses newproperty to create a property relating people's names to their ages: constant procedure ageof = newproperty([[fred 23][mary 18][bill 99]], 10, 0, "perm"); ageof("bill")=> ** 99 ageof("mary")=> ** 18 ageof("susan")=> ** 0 ;;; produces default value "ageless" -> ageof("susan"); ageof("susan")=> ** ageless 0 -> ageof("bill"); ;;; removes entry for bill define lconstant printage(person,age); pr(person), pr(`\t`), pr(age), pr(newline); enddefine; appproperty(ageof, printage); fred 23 mary 18 susan ageless The second example uses newanyproperty to create a property relating peoples names to their aliases, where an active default procedure nochange is used to map names with no explicit alias to the name itself: define lconstant nochange(item, prop) -> item; lvars item, prop; enddefine; constant procedure alias = newanyproperty([[henry 'Henry James'] [fred 'Alfred Tenyson']], 8, false, false, false, false, "perm", false, nochange); alias("henry") => ** Henry James alias("john") => ** john --------------------- 8 Hashing Procedures --------------------- Hashing procedures are procedures which take any item as argument and return a hash code for that item, the hash code normally being a simple integer. Although their principal use is in properties (as described above), they are potentially usable in many situations, and for this reason Poplog provides for a hashing procedure to be associated with each class of object, via the * class_hash field in key structures (see REF * KEYS). The procedure syshash when applied to an item then returns the result of applying the class_hash of the item to the item, i.e. class_hash(datakey(item))(item) -> hash_code The hashing algorithm for a particular class of object can thus be altered by changing the class_hash procedure for the key of that class. For example: datalength -> class_hash(word_key); syshash("foo") => ** 3 syshash("syshash") => ** 7 While class_hash procedures may compute hash codes for objects in any appropriate way, the resulting codes should conform to the following rules: 1) Hash codes are simple integers; 2) Hash codes are independent of machine addresses (and hence do not change after garbage collections); 3) X = Y implies syshash(X) == syshash(Y). (note however that future versions of Poplog may use different criteria for hashing on Prolog variables or terms). Feature (3) makes it possible to use syshash and = as arguments for newanyproperty, to create a hash table that maps arbitrary structures to values on the basis of their contents (as done by newmapping above); Feature (2) implies that syshash does not require an enclosing reference when supplied as the hash_p argument to newanyproperty. The result of any new user-defined class_hash procedure should obey the three rules stated above; also, if the class_= procedure for a key is changed, then consideration should be given to changing the class_hash procedure too (since structures that are = are supposed to produce == hash codes). syshash(item) -> hash_code [procedure] This procedure returns a hash code for any item item by applying its class_hash procedure. The standard hashing algorithms for different data types are shown below, many of which recursively call syshash on sub-structures; on each call syshash decrements pop_hash_lim locally, and if this becomes 0, further recursion is stopped by simply returning the hash code for the key of the current item. That is, syshash is defined as: define syshash(item) -> hash_code; lvars item, hash_code; dlocal pop_hash_lim; if pop_hash_lim == 0 then syshash(datakey(item)) -> hash_code else pop_hash_lim - 1 -> pop_hash_lim; class_hash(datakey(item))(item) -> hash_code endif enddefine; pop_hash_lim -> int [variable] int -> pop_hash_lim This (active) integer variable (default value 3) controls the depth to which syshash will recurse on structures or iterate on lists. E.g. To make the hash code for a list take account of its first N elements, assign N + 1 to pop_hash_lim. NOTE: Common Lisp users should beware of GLOBAL changes to the class_hash of built-in classes: the Lisp function SXHASH may cease to work properly. In such cases, it may be safer to use the dlocal mechanism to locally redefine a class_hash procedure. For example: define myprogram(); dlocal % class_hash(pair_key) % = myhashpdr; ... enddefine; 8.1 Standard Hashing Algorithms -------------------------------- The standard class_hash algorithms for different data types are roughly as follows: ¤ number: intof(item) ¤ word, string: datalength(item) -> len; if len > 2 then (item(1)+len) + ((item(len)+len) << 3) + ((item((len>>1)+1)+len) << 6) -> n; n + (n >> 10) elseif len = 2 then item(1) + (item(2) << 3) elseif len == 1 then item(1) else 0 endif; (Note that this is intended to be used modulo 10 bits, e.g. the dictionary hashing algorithm is the above && 16:3FF.) ¤ list: (N.B. dynamic lists are NOT expanded) syshash(pair_key) -> n; for i from 1 to min(pop_hash_lim, listlength(item)) do (syshash(item(i)) << i) + n -> n endfor ¤ record: syshash(dataword(item)) + syshash(last(item)) ¤ vector: syshash(dataword(item)) + syshash(last(item)) + syshash(datalength(item)) ¤ array: syshash(arrayvector(item)) ¤ procedure: datasize(item) << pdnargs(item) ¤ prologterm: syshash(prolog_functor(item)) + syshash(prolog_arity(item)) + syshash(prolog_arg(1, item)) ¤ key: syshash(class_dataword(item)) +-+ C.all/ref/props +-+ Copyright University of Sussex 1992. All rights reserved.