UNEXT8(

General Form:

UNEXT8(strexpr, numexpr, numexpr(Optional))

Where:

strexpr= an alphanumeric expression corresponding to a UTF-8 encoded string
numexpr= a byte index into the string corresponding to the first byte of the current character. Counted from 1.
numexpr(Optional)= the number of characters to advance. Must be greater than zero. If this is not specified, the function will return the position of the character following the current character.

The UNEXT8( function returns the byte index of the first byte of a character in a string expression that is presumed to hold a UTF-8 encoded Unicode string. In UTF-8 the characters are encoded as one, two or three byte sequences. The second argument must contain the byte index, counted from 1, of the first byte of the current character. The optional third argument specifies how many characters to move forwards. If it is omitted, UNEXT8( returns the byte index of the first byte of the character following the current character. If a larger value is supplied, the byte index of the first byte of the character that many positions further on is returned. A third argument less than 1 will cause an error. The UNEXT8( function is valid wherever a numeric function is legal.

Note that if the string contains trailing spaces, UNEXT8( will return the positions of those spaces if called with the appropriate arguments. This is most relevant when using UNEXT8( in a loop to move through a string.
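
As a rough sketch of such a loop, the following steps through a string one character at a time and prints the byte index of each character. The variable name p is purely illustrative, and the loop relies on UNEXT8( returning a value less than 1 once no further character is available. If a$ were dimensioned longer than its contents, the trailing spaces would be visited as well.

REM must be run with the USING_UTF8 environment variable set to true
DIM a$="$€£"
DIM p
p = 1
WHILE p > 0 DO
    REM p is the byte index of the first byte of the current character
    PRINT p
    p = UNEXT8(a$,p)
WEND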

The function will return 0 if the string does not contain enough characters to complete the request. This can occur either by calling UNEXT8( with the index of the last character and no optional argument, or by supplying an optional argument large enough to run past the end of the string.
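
For instance, assuming a$ holds exactly the three characters used in the example further down, with no trailing spaces, both of the following requests run past the end of the string and would be expected to return 0:

REM must be run with the USING_UTF8 environment variable set to true
DIM a$="$€£"
REM the last character starts at byte 5, so there is no following character
IF (UNEXT8(a$,5)) <> 0 THEN PANIC
REM only two characters follow the first, so advancing three cannot succeed
IF (UNEXT8(a$,1,3)) <> 0 THEN PANIC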

The function will return -1 if there is a problem with the current position argument. This can be the result of:

The function will return -2 in the case of invalid UTF-8, for example if the byte at the current position is HEX(FE) or HEX(FF). In versions of KCML before 7.13, the function would return 0.
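
A minimal check of this case might look like the following. It assumes KCML 7.13 or later, and that a HEX( literal may be passed directly as the string expression:

REM the byte HEX(FE) can never appear in valid UTF-8
IF (UNEXT8(HEX(FE),1)) <> -2 THEN PANIC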

Example:

REM must be run with the USING_UTF8 environment variable set to true
DIM a$="$€£"
IF (a$ <> HEX(24E282ACC2A3)) THEN PANIC
IF (UNEXT8(a$,1)) <> 2 THEN PANIC
IF (UNEXT8(a$,2)) <> 5 THEN PANIC
IF (UNEXT8(a$,1,2)) <> 5 THEN PANIC

See also:

UPREV8(
ULEN8(