Simple XML parsing in KCML
KCML has a built-in, non-validating XML 1.0 parser based on version 1.0 of James Clark's Expat. It can build a tree from XML supplied either in a file (XML_OPEN) or in a string buffer (XML_PARSE_BUFFER). A program can then traverse the tree from top to bottom by calling XML_NEXT repeatedly to get the next element together with its value and any attributes.
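The top-to-bottom traversal can be pictured with a short sketch outside KCML. The Python below uses the standard library's Expat binding to flatten a document into one row per element, in document order, with a nesting level, the element's value and its attributes. This is only an illustration of the idea: the function name flatten and the choice of 1 as the root's level are assumptions, not the KCML API or its level numbering.

```python
import xml.parsers.expat


def flatten(xml_text):
    """Return (level, name, value, attributes) rows in document order.

    Hypothetical helper for illustration; the root is given level 1 here.
    """
    rows = []    # one mutable row per element, in the order elements open
    stack = []   # indexes into rows for the elements that are still open

    def start(name, attrs):
        stack.append(len(rows))
        rows.append([len(stack), name, "", dict(attrs)])

    def chars(data):
        # Character data belongs to the innermost open element.
        if stack:
            rows[stack[-1]][2] += data

    def end(name):
        idx = stack.pop()
        rows[idx][2] = rows[idx][2].strip()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return [tuple(row) for row in rows]


doc = ('<bookstore><book genre="novel">'
       '<title>The Confidence Man</title></book></bookstore>')
for level, name, value, attrs in flatten(doc):
    print(level, name, repr(value), attrs)
```

Each row carries the same kind of information the text describes for one XML_NEXT call: an element name, its value, its attributes and how deep it sits in the tree.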
Advantages
Disadvantages
Encodings
The data returned by the parser in XML_NEXT is always Unicode expressed as UTF-8. The parser can also accept documents in other encodings or subsets of Unicode, which it converts to UTF-8. The parser recognizes the following encodings in the initial <?xml ?> declaration:
| ISO-8859-1 |
| US-ASCII |
| UTF-8 |
| UTF-16 |
| UTF-16BE |
| UTF-16LE |
If no encoding attribute is given, the parser will try to autodetect UTF-16 and, if that is not detected, will assume the document is in UTF-8. An error will be thrown if the data is not compatible with the declared or presumed encoding.
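The fallback order described here can be sketched in Python. This is a simplified illustration, not the parser's actual logic: it recognizes UTF-16 only by its byte order mark, and the function name decode_xml_bytes and its declared parameter are hypothetical.

```python
import codecs


def decode_xml_bytes(data, declared=None):
    """Decode raw document bytes; raises UnicodeDecodeError on bad data.

    Simplified sketch: UTF-16 is detected by byte order mark only.
    """
    if declared is not None:
        # A declared encoding takes precedence over any guessing.
        return data.decode(declared)
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic utf-16 codec reads the BOM to pick the byte order.
        return data.decode("utf-16")
    # Nothing declared and no UTF-16 detected: presume UTF-8.
    return data.decode("utf-8")


print(decode_xml_bytes("<a>\u00e9</a>".encode("utf-16")))
print(decode_xml_bytes(b"<a>\xe9</a>", declared="iso-8859-1"))
try:
    decode_xml_bytes(b"<a>\xe9</a>")   # Latin-1 bytes presumed to be UTF-8
except UnicodeDecodeError as exc:
    print("incompatible data:", exc.reason)
```

The last call shows the failure case from the text: Latin-1 data with no encoding declaration is presumed to be UTF-8 and is rejected.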
Note that ISO-8859-1 is the same as Latin-1 and almost identical to Windows-1252. One difference is the Euro symbol €, which Windows-1252 places at byte 0x80; ISO-8859-1 has no Euro symbol, and its Unicode code point is U+20AC. Latin-1 is the default encoding for KCML 5.x and 6.00. UTF-8 is the default encoding for KCML 6.10 and above.
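The Euro difference is easy to check outside KCML. The Python sketch below (the helper name euro_bytes is hypothetical) encodes the Euro sign in each encoding and reports the bytes, or None where the encoding has no Euro sign.

```python
EURO = "\u20ac"  # the Euro sign, Unicode code point U+20AC


def euro_bytes(encoding):
    """Return the Euro sign encoded in `encoding`, or None if unsupported."""
    try:
        return EURO.encode(encoding)
    except UnicodeEncodeError:
        return None


print(euro_bytes("cp1252"))      # Windows-1252 -> b'\x80'
print(euro_bytes("iso-8859-1"))  # Latin-1      -> None
print(euro_bytes("utf-8"))       # UTF-8        -> b'\xe2\x82\xac'
```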
KCML systems can run in one of two encoding modes: code page encoding or Unicode encoding.
Before KCML 6.20 only code page encoding was supported. The code page used in a system is determined by the locale settings for the version of Windows hosting the KCML client and must be the same for all clients. For more about code pages and Unicode see this tutorial.
Consequently, in a code page encoded system you will need to convert the data returned by XML_NEXT into the local code page using $UNPACK(E="UTF-8"). You will first need to set the locale in byte 61 of $OPTIONS RUN to the language number for that locale. On a Unicode encoded system you can set the byte to HEX(FF) so that the same code can be used on both kinds of system; the $UNPACK operation will then just be a copy.
Given data in UTF-8, you could convert it to the Korean code page CP949 with this code:
CALL XML_NEXT h, SYM(tag$), SYM(val$), 0, 0 TO s
STR($OPTIONS RUN,61,1) = BIN(10)
REDIM cp$LEN(STR(val$))
$UNPACK(E="UTF-8") val$ TO cp$
A DBCS code page uses 1 or 2 bytes per character, whereas UTF-8 uses 1, 2 or 3 bytes per character, so the code page string will be shorter than the UTF-8 one. Making the two strings the same length is therefore guaranteed to work, but be aware that $UNPACK will space fill the unused bytes. Another strategy might be to use 2*ULEN8(val$) for the length. The $UNPACK might fail if the data cannot be encoded in the specified code page.
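The sizing argument can be checked with a small Python sketch. It is an illustration of the byte counts only, not of $UNPACK itself: the function name utf8_to_codepage is hypothetical, and padding with spaces to the UTF-8 length is modelled on the space fill described above.

```python
def utf8_to_codepage(data, codepage):
    """Convert UTF-8 bytes to `codepage`, space filled to the UTF-8 length.

    Raises UnicodeEncodeError if a character has no code page equivalent.
    """
    text = data.decode("utf-8")
    converted = text.encode(codepage)
    # The destination is as long as the source, so pad the unused bytes.
    return converted.ljust(len(data), b" ")


utf8 = "\ud55c\uae00 XML".encode("utf-8")   # two Hangul syllables, " XML"
cp = utf8_to_codepage(utf8, "cp949")
print(len(utf8), len(cp.rstrip(b" ")), len(cp))
```

Each Hangul syllable takes 3 bytes in UTF-8 but 2 bytes in CP949, so the converted text occupies 8 of the 10 bytes and the remaining 2 are spaces. A caller would need to trim that padding, bearing in mind that trailing spaces in the original data cannot then be told apart from fill.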
Example:
This example shows parsing XML and building a tree control displaying the values. You can either parse a file or parse the contents of the clipboard. An example XML fragment is included below to test this feature.
// Example showing simple XML parsing
XMLform.Open()
END
DEFFORM XMLform()={.form,.form$,.Style=0x50c000c4,.Width=316,.Height=255,.Text$="XML tree example",.Id=1024}, {.editControl1,.kcmldbedit$,.Style=0x50810080,.Left=48,.Top=7,.Width=196,.Height=13,.Id=1001,.EditGroup=.editgroup1,.Label$="&File:"}, {.cmdButton1,.button$,.Style=0x50010001,.Left=263,.Top=2,.Width=50,.Height=23,.Text$="&Open",.Id=1003,.Type=9,.Click()}, {.cmdButton2,.button$,.Style=0x50012000,.Left=263,.Top=40,.Width=50,.Height=23,.Text$="&Parse from clipboard",.Id=1004,.Click()}, {.cancel,.button$,.Style=0x50010000,.Left=263,.Top=78,.Width=50,.Height=23,.Text$="Close",.__Anchor=5,.Id=2}, {.treeControl1,.tree$,.Style=0x50810007,.Left=9,.Top=32,.Width=235,.Height=203,.Id=1000}, {.editgroup1,.EditGroup$,.Left=9,.Top=2,.Width=253,.Height=26,.Id=1002},{.textControl1,.static$,.Style=0x50000000,.Left=9,.Top=241,.Width=297,.Height=9,.Id=1005,.Font=.Bold}
DEFEVENT XMLform.cmdButton1.Click()
    REM parse from file
    LOCAL DIM s,h
    CALL XML_OPEN .editControl1.Text$ TO s,h
    IF (s==0) THEN 'parsetree(h)
END EVENT
DEFEVENT XMLform.cmdButton2.Click()
    REM parse from clipboard
    LOCAL DIM s,h,clipsize,a$1
    clipsize = 'kcmlgetclipboardlength()
    MAT REDIM a$clipsize+1
    'kcmlreadclipboard(a$,clipsize)
    CALL XML_PARSE_BUFFER SYM(a$) TO s,h
    IF (s==0) THEN 'parsetree(h)
END EVENT
FORM END
DEFSUB 'ParseTree(h)
    LOCAL DIM s,level,last,parent,s1,root
    LOCAL DIM a$(1),v$(1),name$,value$,buf$128,x(20),x,i
    .treeControl1.Delete()
    REM get the root
    CALL XML_NEXT h,SYM(name$),SYM(value$),SYM(a$()),SYM(v$()) TO s,x
    IF (s==0)
        parent = .treeControl1.Add(name$,1)
        root = parent
        last = 0
        x = 1
        REM now the tree
        WHILE s==0 DO
            CALL XML_NEXT h,SYM(name$),SYM(value$),SYM(a$()),SYM(v$()) TO s,level
            IF (s==0)
                REM build string with value and any attributes
                buf$ = name$
                IF (value$<>" ")
                    buf$ = buf$ & " (" & value$ & ")"
                END IF
                IF (DIM(a$(),1)>0)
                    FOR i = 1 TO DIM(a$(),1)
                        buf$ = & " " & a$(i)
                        IF (DIM(v$(),1)>=i) THEN buf$ = & "=""" & v$(i) & """"
                    NEXT i
                END IF
                SELECT CASE TRUE
                CASE level==x
                    REM sibling
                CASE level>x
                    REM child
                    parent = last
                CASE ELSE
                    REM back up the stack
                    REPEAT
                        parent = .treeControl1.Item(parent).Parent
                    UNTIL --x==level
                END SELECT
                last = .treeControl1.Item(parent).Add(buf$,0)
                x = level
            END IF
        WEND
    ELSE IF (s==40)
        LOCAL DIM errcode,errtext$100,row,col,offset
        REM parsing error
        CALL XML_ERROR h TO s,errcode,errtext$,row,col,offset
        .textControl1.Text$ = $PRINTF("ERROR: %s at row %d, column %d",errtext$,row,col)
    END IF
    REM display top level
    IF (root) THEN .treeControl1.Item(root).Expand()
    CALL XML_CLOSE h TO s1
END SUB
$DECLARE 'KCMLGetClipboardLength()
$DECLARE 'KCMLReadClipboard(RETURN STR(),INT())
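The level bookkeeping in 'ParseTree may be easier to follow in isolation. The Python sketch below mirrors its three cases (sibling, child, back up the stack) on a plain list of (level, label) rows. The Node class and the build_tree and outline functions are hypothetical names for illustration, and the sketch assumes, as the KCML code appears to, that the first row after the root arrives at level 1 and that levels never increase by more than one at a time.

```python
class Node:
    """A minimal stand-in for a tree control item."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def build_tree(root_label, rows):
    """Build a tree from (level, label) rows given in document order."""
    root = Node(root_label)
    parent, last, current = root, None, 1
    for level, label in rows:
        if level > current:
            # Child: the previously added node becomes the parent.
            parent = last
        elif level < current:
            # Back up the stack: climb one parent per level dropped.
            for _ in range(current - level):
                parent = parent.parent
        # Sibling (level == current): the parent is unchanged.
        last = Node(label, parent)
        current = level
    return root


def outline(node, depth=0):
    """Return the tree as indented lines, two spaces per level."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


rows = [(1, "book"), (2, "title"), (2, "author"), (3, "first-name"),
        (3, "last-name"), (2, "price"), (1, "book")]
print("\n".join(outline(build_tree("bookstore", rows))))
```

Only the current parent, the last node added and the previous level need to be tracked, because the rows arrive in document order.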
Example XML document
<?xml version='1.0' encoding="ISO-8859-1"?>
<!-- This file represents a fragment of a book store inventory database -->
<bookstore>
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
  <book genre="philosophy">
    <title>The Gorgias</title>
    <author>
      <name>Plato</name>
    </author>
    <price>9.99</price>
  </book>
</bookstore>