Word indexing

Word search indexing is an alternative way of accessing rows in a table. When a word search index is created on a table with CREATE WORD INDEX, certain columns can be nominated for inclusion. On an insert or update, KCML parses those columns, breaking them into words, which are then stored in an index in compressed form together with the relevant ROWIDs. When adding an index you can nominate an ordering for search results, either by listing columns or by referencing an existing index.

A word is defined as a run of alphabetic characters separated by whitespace or punctuation, but KCML can be told to treat other characters, such as digits, as part of a word by using the NONALPHA clause. In particular, numbers can be considered as words, so INTEGER and NUMERIC columns can be indexed this way. When parsing into words, only words longer than a minimum number of characters, usually 3, are considered. For longer words, only a maximum number of characters, usually 8, is considered and the rest of the word is ignored. However, numbers are always indexed irrespective of length. The minimum and maximum lengths are set in the CREATE WORD INDEX statement.
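The parsing rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not KCML's implementation: the function name split_words and its parameters are hypothetical, extra_chars stands in for the effect of the NONALPHA clause, and the sketch assumes that a word whose length equals the minimum is kept.

```python
import re

def split_words(text, min_len=3, max_len=8, extra_chars=""):
    """Model of the word-splitting rules (hypothetical helper, not a KCML API)."""
    # A word is a run of letters; extra_chars (for example digits) widens
    # the set of characters that may appear in a word.
    pattern = "[A-Za-z" + re.escape(extra_chars) + "]+"
    words = []
    for run in re.findall(pattern, text):
        if run.isdigit():
            # Numbers are indexed irrespective of length.
            words.append(run)
        elif len(run) >= min_len:
            # Assumption: the minimum is inclusive. Long words keep only
            # their first max_len characters.
            words.append(run[:max_len])
    return words

print(split_words("Stainless bolt 12 mm"))
print(split_words("Stainless bolt 12 mm", extra_chars="0123456789"))
```

With the defaults, "Stainless" is cut to its first eight characters and "mm" is dropped as too short; "12" only appears once digits are allowed as word characters.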

Noise words like 'the' or 'and' can be ignored, and words that would otherwise be ignored, like 'mm' or 'cm', can be included, by listing them in an XML file passed to KCML in the CREATE WORD INDEX statement.
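The effect of the two lists can be sketched as a filter applied to candidate words. The sketch below is an assumption-laden model in Python: filter_words, noise and keep are hypothetical names, the lists are passed as plain sets rather than read from the XML file (whose layout is not described here), and the minimum length is treated as inclusive.

```python
def filter_words(words, noise=frozenset(), keep=frozenset(), min_len=3):
    """Apply noise-word and include lists (hypothetical helper, not a KCML API)."""
    result = []
    for word in words:
        if word in noise:
            # Noise words are never indexed.
            continue
        if len(word) >= min_len or word in keep:
            # Short words survive only if they are explicitly listed.
            result.append(word)
    return result

candidates = ["the", "pipe", "mm", "of", "and"]
print(filter_words(candidates, noise={"the", "and"}, keep={"mm"}))
```

Here 'the' and 'and' are discarded as noise, 'mm' is retained because it is listed, and 'of' is dropped because it is short and not listed.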

Once a word index exists it will be kept up to date automatically.

To use a word index, call KI_WS_START, giving it a string containing the words of interest. This string is parsed using the same rules that were used to index the data, and KCML builds a list of the rows in the table that contain all of these words, sorted into the order specified for the index. The program can then traverse this result set bidirectionally using KI_WS_READ_NEXT. When the result set is no longer required, it should be dropped with a call to KI_WS_END.
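The search semantics described above, where a row qualifies only if it contains every word in the search string, can be illustrated with a small inverted index. This is a conceptual model in Python rather than the KCML API: build_index and search are hypothetical names, a dictionary of sets stands in for the compressed index, and whitespace splitting stands in for the real parsing rules.

```python
def build_index(rows, split=str.split):
    """Map each word to the set of ROWIDs whose text contains it."""
    index = {}
    for rowid, text in rows.items():
        for word in split(text):
            index.setdefault(word, set()).add(rowid)
    return index

def search(index, query, order_key=None, split=str.split):
    """Return the ROWIDs containing all query words, in index order."""
    # The query is parsed with the same splitter used to build the index.
    words = split(query)
    if not words:
        return []
    # A row must appear in the ROWID set of every word.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    # order_key models the ordering nominated for the index.
    return sorted(matches, key=order_key)

rows = {1: "red steel bolt", 2: "steel nut", 3: "red steel washer"}
index = build_index(rows)
print(search(index, "red steel"))
print(search(index, "steel", order_key=lambda rowid: -rowid))
```

Using the same splitter for indexing and for searching matters: if the two disagreed, a word stored in the index could never be matched by a query.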

Some words are not very selective, so KCML may have to truncate the search to avoid consuming unreasonable amounts of memory. The maximum number of rows in a result set is held in the handle attribute _KDB_HAND_ATTR_WSMAXROWS and defaults to 1000. If KCML is forced to truncate the result set, KI_WS_START returns the number of rows found as a negative number.
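A caller therefore has to check the sign of the returned count as well as its size. The following Python sketch models that convention; start_search and describe are hypothetical names, and it assumes that the magnitude of a negative count is the number of rows actually retained in the result set.

```python
def start_search(matching_rowids, max_rows=1000):
    """Model of the truncation rule (hypothetical, not the KCML call itself)."""
    found = list(matching_rowids)
    if len(found) > max_rows:
        # Truncated: keep max_rows rows and report the count as negative.
        return -max_rows, found[:max_rows]
    return len(found), found

def describe(count):
    """Turn a signed row count into (rows_available, truncated)."""
    return abs(count), count < 0

count, result = start_search(range(2500))
print(describe(count))
count, result = start_search(range(40))
print(describe(count))
```

A program that sees a truncated result might prompt the user for a more selective search string rather than silently presenting partial results.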

Rows can be excluded from word indexing by using the EXCLUDE COLUMN property of the table. This can be set when the table is created with CREATE TABLE or later with ALTER TABLE.

Compatibility

Prior to KCML 6 and the type 7 table storage format, there was an earlier word indexing implementation, which is now considered obsolete. The API calls KI_WS_CREATE, KI_WS_OPEN, KI_WS_READ, KI_WS_WRITE, KI_WS_REWRITE and KI_WS_DELETE are not required in a KDB database and should not be used.