Forum Discussion

EnTerr
Roku Guru
11 years ago

Fetch by ID or CLASS attribute in roXML?

I find it useful (nay, enjoyable) how I can drill down with the dot operator, for example how
? xml.body.article.section.table.tr.td.table
gives me all the nested tables at that level. Even though they sit in separate branches (the split occurs at <tr>), the implicit ifXMLList.GetNamedElements() keeps filtering down the subsets because of the way it is defined. Very nice, though barely known.
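
To make the drill-down concrete, here is a minimal sketch. The sample markup and the names `src`, `viaDot` and `viaCalls` are made up for illustration, and the claim that the chained calls match the dot chain rests on the behavior described above rather than on anything verified here.

```brightscript
sub main()
    xml = CreateObject("roXMLElement")
    ' Two <row> branches, each holding one nested <t> (hypothetical sample data)
    src = "<doc><row><cell><t n='1'/></cell></row><row><cell><t n='2'/></cell></row></doc>"
    if xml.Parse(src) then
        ' Dot operator applied at each level of the tree
        viaDot = xml.row.cell.t
        ' The same chain spelled out with explicit GetNamedElements() calls
        viaCalls = xml.GetNamedElements("row").GetNamedElements("cell").GetNamedElements("t")
        ' If the lists filter down as described, both counts should agree
        ? viaDot.Count(), viaCalls.Count()
    end if
end sub
```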

Is there some trick like this I can use to pinpoint the element with a certain ID attribute, other than manually walking the whole tree?

1 Reply

  • I wrote this to satisfy my need:
    function select_by_attribute(xml as Object, attrName as String, attrValue as String) as Object
        res = []
        if xml = invalid then return res

        typ = type(xml)
        if typ = "roXMLElement" then
            ' test this element, then recurse into its children
            if xml.getAttributes()[attrName] = attrValue then res.push(xml)
            res.append( select_by_attribute(xml.GetChildElements(), attrName, attrValue) )
        else if typ = "roXMLList" or typ = "roList" or typ = "roArray" then
            ' a collection: search each member in order
            for each x in xml
                res.append( select_by_attribute(x, attrName, attrValue) )
            end for
        else if typ = "roAssociativeArray" then
            ' a dictionary node: attributes are its keys, nested nodes are read from "__"
            if xml[attrName] = attrValue then res.push(xml)
            res.append( select_by_attribute(xml.__, attrName, attrValue) )
        else
            ' error condition: print the offending value and break into the debugger
            ? typ, xml
            STOP
        end if

        return res
    end function

    When invoked, it does a DFS (depth-first search) walk over the XML tree and returns an array of all nodes where the attribute has the desired value. For example, select_by_attribute(html, "class", "image") will give me a list of all HTML tags with class="image".
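
    A minimal usage sketch, assuming select_by_attribute() from above is in scope. The sample markup and the ids "a" and "b" are made up for illustration; single quotes are used for the attributes so the string literal needs no escaping.

    ```brightscript
    sub main()
        html = CreateObject("roXMLElement")
        ' Hypothetical sample document: two elements carry class='image'
        src = "<html><body><div class='image' id='a'/><p><span class='image' id='b'/></p></body></html>"
        if html.Parse(src) then
            ' Print the tag name and id of every element whose class is exactly "image"
            for each node in select_by_attribute(html, "class", "image")
                ? node.GetName(), node@id
            end for
        end if
    end sub
    ```

    Since the walk is depth-first, the <div> should be reported before the <span>.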

    Detail notes:

    • Because of the order in which it walks the tree, the matching nodes in the returned list appear in the same order as in the text of the markup file. In other words, in the order Ctrl-F would have found them in a browser or text editor.

    • It always returns an roArray, even when the result is empty or when we are looking up by id, e.g. select_by_attribute(html, "id", "postingbody"). Semantically, element ids are unique in HTML, but the function neither knows nor cares, for generality.

    • It does not work with elements that belong to multiple classes (e.g. <div class="slide first"> belongs both to "slide" and to "first"), because the attribute value is compared as a whole string. I don't need that, but it is simple to implement.

    • If multiple selects for different attributes/values are needed, walking the tree every time is slow. I don't need that for my purposes, but it is relatively easy to refactor the function so that a single call/walk builds an index by class name. Later lookups by class name are then plain dictionary lookups, each returning the list of all matching nodes.
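
    The last two notes can be sketched together. This is only one possible refactoring, not the code from this post: build_class_index and its index parameter are hypothetical names, only the "class" attribute is indexed, the roAssociativeArray branch is left out for brevity, and class names are split on spaces only.

    ```brightscript
    ' Hypothetical helper: walks the tree once and returns an associative array
    ' that maps each class name to an array of the elements carrying it.
    function build_class_index(xml as Object, index = invalid as Object) as Object
        if index = invalid then
            index = CreateObject("roAssociativeArray")
            index.SetModeCaseSensitive()   ' keep "Slide" and "slide" apart
        end if
        if xml = invalid then return index

        typ = type(xml)
        if typ = "roXMLElement" then
            classAttr = xml.GetAttributes()["class"]
            if classAttr <> invalid then
                ' class="slide first" yields the tokens "slide" and "first"
                for each cls in classAttr.Tokenize(" ")
                    if not index.DoesExist(cls) then index[cls] = []
                    index[cls].Push(xml)
                end for
            end if
            ' recurse into children, filling the same index
            build_class_index(xml.GetChildElements(), index)
        else if typ = "roXMLList" or typ = "roList" or typ = "roArray" then
            for each x in xml
                build_class_index(x, index)
            end for
        end if
        return index
    end function
    ```

    After a single idx = build_class_index(html), a lookup such as idx["slide"] gives the array of matching nodes in document order, or invalid when no element carries that class.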

    Comments/questions are welcome.