Forum Discussion

renojim
Community Streaming Expert
14 years ago

Regular expressions and hex characters

It's quite possible I'm doing something wrong, but it appears the \x escape sequence isn't recognized. Example:
ba = CreateObject("roByteArray")
ba[0] = &hE2
ba[1] = &h80
ba[2] = &h99
str = ba.ToAsciiString()
regex = CreateObject("roRegEx","\xe2","")
print regex.IsMatch(str)

This tries to match the hex byte E2, but IsMatch() returns false.

-JT

2 Replies

  • I think you are running into an issue related to string encoding; likely one of those gets treated as UTF-8. When I try your code with &h12, &h10, &h19 and \x12, it works just fine, and it probably works for everything else in the range 0x00-0x7F too.

    On your example:
    BrightScript Debugger> ? len(str)
    1
    This shows you are not getting what you expect (len should be 3).

    PS. OK, here is what's going on: the bytes E2 80 99 are a single character encoded in UTF-8.
    In binary, with the UTF-8 prefixes marked: (1110)0010 (10)000000 (10)011001
    The encoded code point is 0010-000000-011001 (binary), i.e. 0x2019 (RIGHT SINGLE QUOTATION MARK, U+2019).

    So this is how it should be matched with a regular expression:
    BrightScript Debugger> regex = CreateObject("roRegEx","\x{2019}","")
    BrightScript Debugger> ? regex.IsMatch(str)
    true

    Based on the syntax for Unicode escapes (\x{2019} works, \u2019 did not), chances are the library underneath is PCRE.
  • renojim
    Community Streaming Expert
    Thanks! I knew I was missing something. By the way, the documentation does state that BrightScript uses the PCRE library.

    -JT
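The bit arithmetic from the reply can be checked off-device. The following is a minimal sketch in Python, not BrightScript, and it does not reproduce roRegEx or PCRE. It decodes the three bytes by hand, compares the result with Python's built-in UTF-8 decoder, and shows that a pattern for E2 does not match the decoded string while a pattern for the code point does. Python's `re` module spells the code point escape `\u2019`, whereas PCRE uses `\x{2019}`. The helper `decode_utf8_3byte` is hypothetical.

```python
import re

def decode_utf8_3byte(b: bytes) -> int:
    """Hypothetical helper: decode one 3-byte UTF-8 sequence by hand."""
    if len(b) != 3:
        raise ValueError("expected exactly 3 bytes")
    b0, b1, b2 = b
    # Lead byte must be 1110xxxx; continuation bytes must be 10xxxxxx.
    if (b0 & 0xF0) != 0xE0 or (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Strip the prefixes and concatenate the payload bits: 4 + 6 + 6 = 16 bits.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)

raw = bytes([0xE2, 0x80, 0x99])
code_point = decode_utf8_3byte(raw)
print(hex(code_point))                          # prints 0x2019
print(raw.decode("utf-8") == chr(code_point))   # prints True

s = raw.decode("utf-8")
print(len(s))                                   # prints 1, not 3
# "\xe2" in a pattern means the character U+00E2, not the raw byte E2.
print(re.search("\\xe2", s) is not None)        # prints False
# Matching by code point works (PCRE would spell this \x{2019}).
print(re.search("\\u2019", s) is not None)      # prints True
```

Characters in the range 0x00-0x7F are encoded in UTF-8 as a single identical byte. That fits the reply's observation that \x12 matched as expected.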