I doubt it is any consolation, but `len()` is doing the right thing here: the UTF-8 byte sequence you gave (`ec 97 b0`) encodes a single Unicode character, U+C5F0, hence the string is 1 character long.
(And yes, I understand your plight and have no better solution for it.)
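
If it helps to see the two counts side by side, here is a minimal sketch. It assumes Python 3, where `bytes` holds the raw encoded data and `str` holds code points; the variable names are only illustrative. It is a demonstration of what `len()` counts, not a fix for your problem.

```python
# Assumption: Python 3 (bytes = raw encoded data, str = Unicode code points).
raw = b"\xec\x97\xb0"        # the UTF-8 byte sequence from the question
text = raw.decode("utf-8")   # decoding yields a single code point

byte_count = len(raw)                 # length of the encoded bytes
char_count = len(text)                # length of the decoded string
code_point = f"U+{ord(text):04X}"     # ord() works because text is one character

print(byte_count, char_count, code_point)  # prints: 3 1 U+C5F0
```

So `len()` on the byte string gives the number of bytes, while `len()` on the decoded string gives the number of code points.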