Re: whereas Python couldn’t get from 2 to 3 without breaking compatibility
With surrogate pairs, UTF-16 is also variable length: any code point above U+FFFF takes two 16-bit code units instead of one.
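To make that concrete, here is a minimal Python sketch that counts 16-bit code units per character. The helper name `utf16_code_units` is my own, not from any library; it only relies on the standard `str.encode`.

```python
def utf16_code_units(s):
    """Number of 16-bit code units needed to encode s as UTF-16."""
    # "utf-16-le" emits no byte-order mark, so every 2 bytes is one code unit.
    return len(s.encode("utf-16-le")) // 2

bmp_char = "A"              # U+0041, inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # U+1F600, outside the BMP

print(utf16_code_units(bmp_char))     # one code unit
print(utf16_code_units(astral_char))  # two code units (a surrogate pair)

# The pair itself: high surrogate D83D followed by low surrogate DE00.
print(astral_char.encode("utf-16-be").hex())
```

So code that indexes UTF-16 by code unit has the same "is this a whole character?" problem that UTF-8 has, just less often.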
From experience, it turns out that knowing character boundaries isn't as helpful as you'd think. Ken Thompson reimplemented grep for UTF-8, and the matching engine only matches bytes, using byte classes (think [a-z]) and alternation (|).
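A rough illustration of the idea in Python, not Thompson's actual code: a character range can be rewritten by hand as byte classes plus alternation and then run against the raw UTF-8 bytes. The range (lowercase Greek, U+03B1 to U+03C9) and the names `GREEK_LOWER_BYTES` and `find_greek` are assumptions picked for the example.

```python
import re

# U+03B1..U+03C9 spans two UTF-8 lead bytes:
#   U+03B1..U+03BF encode as CE B1 .. CE BF
#   U+03C0..U+03C9 encode as CF 80 .. CF 89
# so the character class becomes two byte classes joined with |.
GREEK_LOWER_BYTES = re.compile(rb"\xce[\xb1-\xbf]|\xcf[\x80-\x89]")

def find_greek(text):
    """Find lowercase Greek letters by matching bytes only."""
    data = text.encode("utf-8")
    # The matcher never decodes; we decode the hits afterwards just to show them.
    return [m.group().decode("utf-8") for m in GREEK_LOWER_BYTES.finditer(data)]

sample = "abc λ and ω, not Ω"
print(find_greek(sample))
# Same answer as the character-level pattern:
print(re.findall("[α-ω]", sample))
```

This works without tracking character boundaries because UTF-8 lead bytes (here CE and CF) never occur as continuation bytes, so a byte-level match cannot start in the middle of another character.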