Unicode (not to be confused with Unabomber) is awesome. The fact that Facebook is now on board only makes me wish we had it when I was at school 40 years ago. Imagine writing your homework in Cherokee using 16-bit or 32-bit variants and telling your teacher it was Big Endian. What a laaaaarf we would have had.
Facebook gains power to Like any word ever written
Facebook has signed up as a full member of the Unicode Consortium, the body that maintains the universal character encoding standard for written characters and text. Why should we care? Because Facebook is just the eleventh full member of the organisation and now has voting rights alongside the likes of Google, Apple, Oracle, SAP, Microsoft …
COMMENTS
-
Wednesday 16th September 2015 08:54 GMT Forget It
Come on unicode you can dot it!
In Chinese, emphasis in body text is supposed to be indicated by using an "emphasis mark", which is a dot placed under each character to be emphasized. This is still taught in schools but in practice it is not usually done, probably due to the difficulty of doing this using most computer software.
src: http://en.wikipedia.org/wiki/Emphasis_%28typography%29#Punctuation_marks
-
-
Wednesday 16th September 2015 09:30 GMT Chairo
Asian languages and Unicode
The implementation of Chinese characters is really a problem in Unicode. You have simplified ones, used in Mainland China; traditional ones, used in Taiwan; and a slightly different subset of the traditional ones used in Japan, plus an additional ~100 characters for the two other alphabets (hiragana and katakana) they use in Japan. All in all, we are talking about several thousand characters for each set. It seems 16-bit Unicode is already at its limit there. Using a Chinese smartphone with Japanese web pages gives mixed results. Some of the Japanese-style characters in Unicode seem to be replaced with slightly different Chinese ones.
I suppose the Chinese have the same trouble, the other way around. I wonder if this is sorted out with newer implementations of Unicode.
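A rough sketch of what is probably going on with the mixed results, assuming the usual explanation (Han unification) applies here: simplified and traditional forms generally get their own code points, but regional variants of the "same" character share a single code point, so the shape you see depends on the font the device picks. The sample characters below are my own picks for illustration, not taken from the post.

```python
# Simplified and traditional forms of "Han" are distinct code points.
simplified, traditional = "\u6c49", "\u6f22"   # 汉, 漢

# A character whose Japanese and Chinese shapes differ slightly but which
# has only one code point; the text itself cannot say which shape is meant.
shared = "\u76f4"                              # 直

# A rarer ideograph from outside the 16-bit range (CJK Extension B).
rare = "\U00020000"

def describe(ch: str) -> str:
    """Return the code point and the number of UTF-16 code units for ch."""
    units = len(ch.encode("utf-16-be")) // 2
    return f"U+{ord(ch):04X} ({units} UTF-16 code unit(s))"

for ch in (simplified, traditional, shared, rare):
    print(describe(ch))
```

Picking the right shape for the shared code point is left to fonts and language tagging on the page, which would explain why a Chinese phone falls back to Chinese-looking glyphs.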
-
Wednesday 16th September 2015 13:40 GMT nijam
Re: Asian languages and Unicode
Unicode is not 16-bit.
You're thinking, perhaps, of the UTF-16 encoding which is used in some Microsoft systems but not AFAIK elsewhere, because it isn't a particularly good way of encoding Unicode. Most network systems, as well as Linux, use UTF-8, which is a more natural encoding scheme for an essentially-unbounded set of symbols.
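To make the distinction concrete, here is a small sketch showing one code point per line with its bytes in both encodings. The helper name `code_units` and the sample characters are mine, chosen for illustration only.

```python
def code_units(ch: str) -> dict:
    """Return the hex bytes of a single character in UTF-8 and UTF-16 (big endian)."""
    return {
        "utf-8": ch.encode("utf-8").hex(" "),
        "utf-16-be": ch.encode("utf-16-be").hex(" "),
    }

# ASCII letter, accented Latin letter, CJK ideograph, emoji outside the BMP.
samples = ("A", "\u00e9", "\u8a9e", "\U0001F600")

for ch in samples:
    print(f"U+{ord(ch):04X}", code_units(ch))
```

The code point is the same in every row; only the byte layout differs. The last sample does not fit in one 16-bit unit, so UTF-16 spends two units (a surrogate pair) on it.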
-
Wednesday 16th September 2015 17:47 GMT Vincent Ballard
Various nit-picks
It's not essentially unbounded either. It's 21-bit at most: the code space stops at U+10FFFF. Nor is UTF-8 unbounded: in principle it can encode 36-bit values (although it's never been specified for more than 31-bit), but beyond that you need to make major design changes. UTF-16 is a good way of encoding certain strings, in particular ones which mainly use characters from the top half of the BMP (e.g. a lot of the supported Asian languages).
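The arithmetic behind those figures can be checked in a few lines. This is only a sketch: `utf8_payload_bits` models the original lead-byte-plus-continuation-bytes scheme (including the hypothetical 7-byte form with an 0xFE lead byte), not what today's UTF-8 standard permits, and the sample strings are my own.

```python
MAX_CODE_POINT = 0x10FFFF  # top of the Unicode code space

def utf8_payload_bits(n_bytes: int) -> int:
    """Payload bits of an n-byte sequence in the original UTF-8 bit layout."""
    if not 1 <= n_bytes <= 7:
        raise ValueError("the scheme only extends to 7 bytes")
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    # The lead byte keeps (7 - n) payload bits; each continuation byte carries 6.
    return (7 - n_bytes) + 6 * (n_bytes - 1)

def sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

print(MAX_CODE_POINT.bit_length())                    # bits needed for U+10FFFF
print([utf8_payload_bits(n) for n in range(1, 8)])    # capacity per sequence length
print(sizes("hello"), sizes("\u65e5\u672c\u8a9e"))    # ASCII vs. Japanese text
```

Six bytes give the 31-bit limit and the seven-byte form gives the 36-bit one. The last line shows the size trade-off: ASCII text doubles in UTF-16, while text drawn from the upper BMP takes three bytes per character in UTF-8 against two in UTF-16.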
-
-