CREATE WORD INDEX

CREATE WORD INDEX wsindexname ON tablename ( colname [, colname ...] ) [ORDER BY [ indexname | ( colname [, colname ...] )] [indexparm [, indexparm] ...]

where the tablename should match that in the preceding CREATE TABLE statement. The wsindexname should be the same as the table name with a suffix of "_Wx" where x is the path number (1 to the maximum supported as defined in the constant _KDB_MAX_WS_INDEX). The parentheses contain the columns (maximum of _KDB_MAX_WS_KEY_COLS) defining the segments of the key. The columns specified should be of type CHAR, VARCHAR, INTEGER, NUMERIC or IBMPACKED (if the field is one of the numeric types then NONAPLHA '0123456789' should be specified so that numbers are correctly included in the index).

Matching rows will be returned by KI_WS_READ_NEXT in the order defined by any ORDER BY clause. Without an ORDER BY clause there will be no predefined ordering but it is likely that the index lookup will be faster. In general one should only use an ORDER BY clause if the final order really matters. The ORDER BY clause specifies either an indexname for an existing ordinary index on the table, or a list of up to _KDB_MAX_KEY_COLS columns. Any of the columns in the indexed table may appear in the column list. Using an index name to specify the order is a short cut for an explicit list of columns and the index itself is never used at the point of the lookup. This usage is now deprecated and new code should always explicitly list the columns.

The indexparm list describes properties of the index e.g.

[MAX integer]
[MIN integer]
[NONALPHA 'string']
[NOISE 'filename']
[BREAK IDEOGRAM]
[NORMALIZE CJKWIDTH]

MAX	Maximum length in characters of word to put into the index, longer words will be truncated (default 8, maximum 16). Although the maximum length is 16 characters the UTF-8 encoded length of the word may not exceed 16 bytes.
MIN	Minimum length in characters of word to put into the index (default 3, minimum 2). Whilst it is possible to use a minimum word length of 2 this may cause the index to become undesirably large, so the default value of 3 is recommended for most applications. Words that are numbers will be included even if less than the minimum.
NONALPHA	String of up to 16 non-alphabetic characters to be included in words. Normally non-alphanumeric characters are considered delimiters.
NOISE	Full path of file containing words that should be forced to be included or excluded from the index
BREAK IDEOGRAM	Break each ideogram/logogram into its own word
NORMALIZE CJKWIDTH	Characters within the Halfwidth and Fullwidth Forms Unicode block U+FF00-FFEF are normalized so that fullwidth ASCII characters become regular ASCII characters and halfwidth Chinese, Japanese or Korean characters become their fullwidth forms. Thus a search will find matches in both full and halwaidth forms.

The noise file is an XML file, which conforms to the following dtd:

<!ELEMENT NoiseWords (IncludeList?, ExcludeList?, ReplaceList?) >
<!ELEMENT IncludeList (Word+) >
<!ELEMENT ExcludeList (Word+) >
<!ELEMENT ReplaceList (Pair+) >
<!ELEMENT Word (#PCDATA) >
<!ELEMENT Pair(in, out) >
<!ELEMENT in (#PCDATA) >
<!ELEMENT out (#PCDATA) >

Words in the IncludeList section will be included in the index even if the fail on the basis of some other criterion such as minimum length. Similarly words in the ExcludeList section will be ignored for indexing even if valid. The optional ReplaceList section contains a list of word pairs. The first word in the pair (tagged as <in>) should contain only one character and any instances of the character in words will be replaced by the whole of the second string (tagged as <out>) provided it fits the maximum length criterion. This replacement avoids the valid character check.

Examples

CREATE WORD INDEX SL00trans_W1 ON SL00trans (fname, sname) ORDER BY (accountno) MAX 15, MIN 5

An example noise word file might be

<?xml version="1.0"?>
<NoiseWords>
	<IncludeList>
		<Word>in</Word>
		<Word>ft</Word>
		<Word>mm</Word>
		<Word>cm</Word>
	</IncludeList>
	<ExcludeList>
		<Word>silly</Word>
		<Word>stupid</Word>
	</ExcludeList>
	<ReplaceList>
		<Pair><in>ö</in><out>oe</out></Pair>
		<Pair><in>ç</in><out>c</out></Pair>
		<Pair><in>ß</in><out>ss</out></Pair>
		<Pair><in>Å</in><out>A</out></Pair>
	</ReplaceList>
</NoiseWords>