Simple XML parsing in KCML

KCML has a built-in non-validating XML 1.0 parser, based on version 1.0 of James Clark's Expat, which can build a tree from XML supplied either in a file (XML_OPEN) or in a string buffer (XML_PARSE_BUFFER). A program can then traverse the tree from top to bottom by calling XML_NEXT repeatedly to get the next element together with its value and any attributes.

Advantages

Disadvantages

Encodings

The data returned by the parser in XML_NEXT is always Unicode expressed as UTF-8. The parser can also accept documents in other encodings, or in subsets of Unicode, which it converts into UTF-8. The parser recognizes the following encodings in the initial <?xml ?> declaration.

ISO-8859-1
US-ASCII
UTF-8
UTF-16
UTF-16BE
UTF-16LE

If no encoding attribute is given then the parser will try to autodetect UTF-16 and if not detected it will assume the document is in UTF-8. An error will be thrown if the data is not compatible with the declared or presumed encoding.
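
The fallback rule above can be illustrated outside KCML. The following Python sketch is only an approximation: it assumes that UTF-16 is recognized purely by its byte order mark, which is a simplification and not necessarily how the KCML parser detects it. The function names guess_encoding and decode_document are hypothetical.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document that declares none.

    Assumption: UTF-16 is detected only by its byte order mark.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    return "UTF-8"  # presumed encoding when nothing else is detected

def decode_document(data: bytes) -> str:
    """Decode the bytes, raising UnicodeDecodeError on incompatible data."""
    if guess_encoding(data).startswith("UTF-16"):
        # The generic utf-16 codec reads the BOM and strips it
        return data.decode("utf-16")
    return data.decode("utf-8")
```

Passing Latin-1 bytes such as b"<a>\xe9</a>" without a declaration raises an error in this sketch, mirroring the rule that data must be compatible with the presumed encoding.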

Note that ISO-8859-1 is the same as Latin-1 and almost identical to Windows 1252. The main difference is the Euro symbol € (Unicode U+20AC), which Windows 1252 places at byte 0x80, a position that ISO-8859-1 reserves for a control character. Latin-1 is the default encoding for KCML 5.x and 6.00. UTF-8 is the default encoding for KCML 6.10 and above.
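
The difference between the two encodings can be checked outside KCML. This short Python sketch uses Python's standard cp1252 and latin-1 codecs, not any KCML facility, and decodes the same byte under each encoding.

```python
raw = b"\x80"

as_cp1252 = raw.decode("cp1252")   # Windows 1252: the Euro sign
as_latin1 = raw.decode("latin-1")  # ISO-8859-1: a C1 control character

print(hex(ord(as_cp1252)))  # code point under Windows 1252
print(hex(ord(as_latin1)))  # code point under ISO-8859-1

# The Euro sign has no representation at all in ISO-8859-1
try:
    "\u20ac".encode("latin-1")
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False
```

A document that contains the Euro symbol therefore cannot be declared as ISO-8859-1 without losing that character.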

KCML systems can run in one of two encoding modes: code page encoded or Unicode encoded.

Before KCML 6.20 only code page encoding was supported. The code page used in a system is determined by the locale settings for the version of Windows hosting the KCML client and must be the same for all clients. For more about code pages and Unicode see this tutorial.

Consequently, in a code page encoded system you will need to convert the data returned by XML_NEXT into the local code page using $UNPACK(E="UTF-8"). You will first need to set the locale in byte 61 of $OPTIONS RUN to the language number for that locale. On a Unicode encoded system you can set the byte to HEX(FF) so that the same code can be used on both kinds of system; the $UNPACK operation will then just be a copy.

Given data in UTF-8 you could convert it to the Korean code page CP949 with this code

REM fetch the next element; tag$ and val$ are returned as UTF-8
CALL XML_NEXT h, SYM(tag$), SYM(val$), 0, 0 TO s
REM select the Korean locale (language number 10)
STR($OPTIONS RUN,61,1) = BIN(10)
REM size the destination to the UTF-8 length, which is always enough
REDIM cp$LEN(STR(val$))
$UNPACK(E="UTF-8") val$ TO cp$

A DBCS code page uses 1 or 2 bytes per character, whereas UTF-8 uses 1, 2 or 3 bytes per character for the same text, so the code page string will never be longer than the UTF-8 string. Making the two strings the same length is therefore guaranteed to work, but be aware that $UNPACK will space fill the unused bytes. Another strategy might be to use 2*ULEN8(val$) for the length. The $UNPACK might fail if the data cannot be encoded in the specified code page.
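
The length argument above can be verified outside KCML. This Python sketch uses Python's own cp949 codec rather than $UNPACK. The sample text is an arbitrary illustration, the space padding only imitates the space fill described above, and treating ULEN8 as a count of characters is an assumption.

```python
text = "A한글"  # one ASCII letter followed by two Hangul syllables

utf8 = text.encode("utf-8")    # 1 + 3 + 3 bytes
cp949 = text.encode("cp949")   # 1 + 2 + 2 bytes

# Destination sized like the UTF-8 source, unused bytes space filled
padded = cp949.ljust(len(utf8), b" ")

# Alternative sizing: two bytes per character (assumed ULEN8 analogue)
two_per_char = 2 * len(text)

# A character outside the code page cannot be converted
try:
    "\U0001F600".encode("cp949")
    convertible = True
except UnicodeEncodeError:
    convertible = False
```

Both sizing strategies give a buffer at least as long as the converted string, at the cost of trailing spaces that the caller may need to trim.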

Example:

This example shows parsing XML and building a tree control displaying the values. You can either parse a file or parse the contents of the clipboard. An example XML fragment is included below to test this feature.

// Example showing simple XML parsing

XMLform.Open()
END
DEFFORM XMLform()={.form,.form$,.Style=0x50c000c4,.Width=316,.Height=255,.Text$="XML tree example",.Id=1024}, {.editControl1,.kcmldbedit$,.Style=0x50810080,.Left=48,.Top=7,.Width=196,.Height=13,.Id=1001,.EditGroup=.editgroup1,.Label$="&File:"}, {.cmdButton1,.button$,.Style=0x50010001,.Left=263,.Top=2,.Width=50,.Height=23,.Text$="&Open",.Id=1003,.Type=9,.Click()}, {.cmdButton2,.button$,.Style=0x50012000,.Left=263,.Top=40,.Width=50,.Height=23,.Text$="&Parse from clipboard",.Id=1004,.Click()}, {.cancel,.button$,.Style=0x50010000,.Left=263,.Top=78,.Width=50,.Height=23,.Text$="Close",.__Anchor=5,.Id=2}, {.treeControl1,.tree$,.Style=0x50810007,.Left=9,.Top=32,.Width=235,.Height=203,.Id=1000}, {.editgroup1,.EditGroup$,.Left=9,.Top=2,.Width=253,.Height=26,.Id=1002},{.textControl1,.static$,.Style=0x50000000,.Left=9,.Top=241,.Width=297,.Height=9,.Id=1005,.Font=.Bold}
    DEFEVENT XMLform.cmdButton1.Click()
        REM parse from file
        LOCAL DIM s,h
        CALL XML_OPEN .editControl1.Text$ TO s,h
        IF (s==0) THEN 'parsetree(h)
    END EVENT
    DEFEVENT XMLform.cmdButton2.Click()
        REM parse from clipboard
        LOCAL DIM s,h,clipsize,a$1
        clipsize = 'kcmlgetclipboardlength()
        MAT REDIM a$clipsize+1
        'kcmlreadclipboard(a$,clipsize)
        CALL XML_PARSE_BUFFER SYM(a$) TO s,h
        IF (s==0) THEN 'parsetree(h)
    END EVENT
FORM END
DEFSUB 'ParseTree(h)
LOCAL DIM s,level,last,parent,s1,root
LOCAL DIM a$(1),v$(1),name$,value$,buf$128,x(20),x,i
.treeControl1.Delete()
REM get the root
CALL XML_NEXT h,SYM(name$),SYM(value$),SYM(a$()),SYM(v$()) TO s,x
IF (s==0)
    parent = .treeControl1.Add(name$,1)
    root = parent
    last = 0
    x = 1
    REM now the tree
    WHILE s==0 DO
        CALL XML_NEXT h,SYM(name$),SYM(value$),SYM(a$()),SYM(v$()) TO s,level
        IF (s==0)
            REM build string with value and any attributes
            buf$ = name$
            IF (value$<>" ")
                buf$ = buf$ & " (" & value$ & ")"
            END IF
            IF (DIM(a$(),1)>0)
                FOR i = 1 TO DIM(a$(),1)
                    buf$ = & " " & a$(i)
                    IF (DIM(v$(),1)>=i) THEN buf$ = & "=""" & v$(i) & """"
                NEXT i
            END IF
            SELECT CASE TRUE
            CASE level==x
                REM sibling
            CASE level>x
                REM child
                parent = last
            CASE ELSE
                REM back up the stack
                REPEAT
                    parent = .treeControl1.Item(parent).Parent
                UNTIL  --x==level
            END SELECT
            last = .treeControl1.Item(parent).Add(buf$,0)
            x = level
        END IF
    WEND
ELSE IF (s==40)
    LOCAL DIM errcode,errtext$100,row,col,offset
    REM parsing error
    CALL XML_ERROR h TO s,errcode,errtext$,row,col,offset
    .textControl1.Text$ = $PRINTF("ERROR: %s at row %d, column %d",errtext$,row,col)
END IF
REM display top level
IF (root) THEN .treeControl1.Item(root).Expand()
CALL XML_CLOSE h TO s1
END SUB
$DECLARE 'KCMLGetClipboardLength()
$DECLARE 'KCMLReadClipboard(RETURN STR(),INT())

Example XML document

<?xml version='1.0' encoding="ISO-8859-1"?>
<!-- This file represents a fragment of a book store inventory database -->
<bookstore>
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
  <book genre="philosophy">
    <title>The Gorgias</title>
    <author>
      <name>Plato</name>
    </author>
    <price>9.99</price>
  </book>
</bookstore>
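
For readers who want to experiment with this kind of traversal outside KCML, the following Python sketch uses Python's standard xml.parsers.expat module (also built on Expat) to flatten a document into one row per element, in document order, each with a nesting level, a value and its attributes. This imitates the sequence that repeated XML_NEXT calls return, but it is not KCML's implementation. The function name flatten is hypothetical, and numbering the root as level 0 is an assumption inferred from the example program.

```python
import xml.parsers.expat

def flatten(xml_bytes):
    """Return (level, name, value, attrs) tuples in document order."""
    rows = []    # one entry per element, in the order elements open
    stack = []   # indices into rows for the currently open elements

    def start(name, attrs):
        # Level is the number of open ancestors (assumed: root is 0)
        rows.append([len(stack), name, "", dict(attrs)])
        stack.append(len(rows) - 1)

    def chars(data):
        # Text belongs to the innermost open element
        if stack:
            rows[stack[-1]][2] += data

    def end(name):
        # Drop the whitespace that only separates child elements
        i = stack.pop()
        rows[i][2] = rows[i][2].strip()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)  # raises ExpatError on malformed XML
    return [tuple(r) for r in rows]

doc = b"""<?xml version='1.0' encoding="ISO-8859-1"?>
<bookstore>
  <book genre="novel">
    <title>The Confidence Man</title>
    <price>11.99</price>
  </book>
</bookstore>"""

for level, name, value, attrs in flatten(doc):
    print("  " * level + name, value, attrs)
```

A consumer of these rows can rebuild the tree the same way the KCML example does: a level equal to the previous one is a sibling, a higher level is a child, and a lower level means backing up the stack of parents.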