I wrote this to satisfy my need:
function select_by_attribute(xml as Object, attrName as String, attrValue as String) as Object:
    res = []
    if xml = invalid then return res    'e.g. GetChildElements() of an element with no children
    typ = type(xml)
    if typ = "roXMLElement":
        'test the node itself first, then recurse into its children (pre-order walk)
        if xml.getAttributes()[attrName] = attrValue then res.push(xml)
        res.append( select_by_attribute(xml.GetChildElements(), attrName, attrValue) )
    else if typ = "roXMLList" or typ = "roList" or typ = "roArray":
        'a list of nodes: collect the matches from each one, in order
        for each x in xml:
            res.append( select_by_attribute(x, attrName, attrValue) )
        end for
    else if typ = "roAssociativeArray":
        if xml[attrName] = attrValue then res.push(xml)
        res.append( select_by_attribute(xml.__, attrName, attrValue) )
    else: 'error condition
        ? typ, xml
        STOP
    end if
    return res
end function
When invoked, it does a depth-first search (DFS) walk over the XML tree and returns an array of all nodes where the given attribute has the desired value. For example, select_by_attribute(html, "class", "image") gives me a list of all HTML tags with class="image".
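To make the example call concrete, here is a minimal sketch of how it could be driven from a parsed roXMLElement. The markup string, the demo_select_by_attribute name and the variable names are made up for illustration; only select_by_attribute comes from the post. Attribute values use single quotes so the markup fits in a plain BrightScript string literal.

```brightscript
' Sketch only: the sample markup and the names in this sub are hypothetical.
sub demo_select_by_attribute()
    markup = "<div><p class='image'>a</p><span><p class='image'>b</p></span><p class='text'>c</p></div>"

    html = CreateObject("roXMLElement")
    if not html.Parse(markup) then
        print "could not parse the sample markup"
        return
    end if

    ' Collect every element whose class attribute is exactly "image".
    images = select_by_attribute(html, "class", "image")

    print "matches: "; images.Count()
    for each node in images
        print node.GetName(); " -> "; node.GetText()
    end for
end sub
```

With this sample input the result should contain the two p elements with class='image', in the order they appear in the markup, and skip the p with class='text'.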
Detailed notes:
- Because of the order in which it walks the tree, the matching nodes in the returned list are in the same order in which they appear textually in the XML/HTML file. In other words, in the order Ctrl-F would find them in a browser or text editor.
- It always returns an roArray, even if the result is empty or if we were looking up by id, e.g. select_by_attribute(html, "id", "postingbody"). Semantically, element ids are unique in HTML, but the function neither knows nor cares, for the sake of generality.
- It does not work with elements that belong to multiple classes (e.g. <div class="slide first"> belongs both to "slide" and to "first"), because the attribute value is compared as a whole string. I don't need that, but it is simple to implement.
- If multiple selects for different attributes/values are going to be done, walking the tree every time is slow. I don't need that for my purposes, but it is relatively easy to refactor the function so that a single call/walk builds an index by class name. Later lookups by class name are then plain dictionary lookups, each returning the list of all matching nodes.
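The last two notes can be sketched together: one walk that splits each class attribute on spaces and files the node under every class name it carries. This is only an illustration of the idea, not a drop-in replacement. build_class_index and add_to_class_index are hypothetical names, and the sketch handles only the roXMLElement and list cases, not the roAssociativeArray representation.

```brightscript
' Sketch only: build_class_index / add_to_class_index are hypothetical helpers.
function build_class_index(xml as Object) as Object
    index = CreateObject("roAssociativeArray")
    ' Class names are case-sensitive in HTML; AA lookups are not, by default.
    index.SetModeCaseSensitive()
    add_to_class_index(xml, index)
    return index
end function

sub add_to_class_index(xml as Object, index as Object)
    if xml = invalid then return
    typ = type(xml)
    if typ = "roXMLElement" then
        classAttr = xml.GetAttributes()["class"]
        if classAttr <> invalid then
            ' "slide first" is filed under both "slide" and "first".
            for each className in classAttr.Tokenize(" ")
                if not index.DoesExist(className) then index[className] = []
                index[className].Push(xml)
            end for
        end if
        ' Same pre-order walk as select_by_attribute, so document order is kept.
        add_to_class_index(xml.GetChildElements(), index)
    else if typ = "roXMLList" or typ = "roList" or typ = "roArray" then
        for each x in xml
            add_to_class_index(x, index)
        end for
    end if
end sub
```

Usage would look like index = build_class_index(html) followed by slides = index["slide"]. Unlike select_by_attribute, a lookup for a class that never occurs gives invalid rather than an empty roArray, so the caller has to check for that.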
Comments/questions are welcome.