a12n-collaboration Mailing List Archive: Re: [A12n-Collab] Re: [africa] 5 categories of African orthographies (Latin)
On Dec 22, 2007 3:15 AM, John Hudson <tiro@xxxxxxxx> wrote:
> Tunde Adegbola wrote:
>
> > In my own work in language technology however, I do have problems with
> > the lack of unique code points for high/low tone subdotted vowels.
> > This presents ambiguity because they can be achieved in more than one
> > way: by subdotting a tone-marked vowel, or by tone-marking a subdotted
> > vowel. Both look exactly the same to a human reader but require extra
> > lines of code for a computer to see both as the same. It starts getting
> > distracting when you consider that this has to be catered for in both
> > lower and upper cases.
>
> As Andrew Cunningham pointed out, handling different character sequences
> for the same typeform is not very difficult, and it is for precisely this
> reason that normalisation exists and is well defined in the Unicode
> standard. This is an issue that affects any situation in which more than
> one mark is applied to a base letter, not just some African orthographies,
> and since it is a given that any combining mark characters may be combined
> in any quantity with any base characters, encoding precomposed
> combinations not only is not a viable option but simply shifts the
> normalisation issue into a comparison of precomposed and decomposed
> strings instead of comparison of variant decomposed strings.

The main reason people advocate adding precomposed characters is that
precomposed characters from legacy encodings are far better supported than
characters composed with combining marks. It's rather paradoxical that a
simple solution, such as normalization on input or before comparison, is not
widely used. Too few applications, on the desktop or online, do it. Unicode
support does not, in general, imply normalization. It's disappointing and
discouraging to see companies whose i18n teams do great work fail completely
in that respect.
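As a rough illustration of "normalization before comparison", here is a minimal sketch in Python using the standard library's unicodedata module. The helper name same_text and the sample strings are made up for this example; the vowel is the subdotted, tone-marked "e" from the quoted message, spelled in the different ways described there.

```python
import unicodedata

# Several encodings of the same typeform: "e" with dot below and acute accent.
tone_then_dot = "e\u0301\u0323"      # e + combining acute + combining dot below
dot_then_tone = "e\u0323\u0301"      # e + combining dot below + combining acute
subdotted_base = "\u1EB9\u0301"      # precomposed e-dot-below (U+1EB9) + combining acute
accented_base = "\u00E9\u0323"       # precomposed e-acute (U+00E9) + combining dot below


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A naive code-point comparison treats the variants as different strings.
print(tone_then_dot == dot_then_tone)             # False

# After normalization all four spellings compare equal.
variants = [tone_then_dot, dot_then_tone, subdotted_base, accented_base]
print(all(same_text(variants[0], v) for v in variants))   # True

# The same call covers the upper-case forms; no per-case code is needed.
print(same_text("E\u0301\u0323", "\u1EB8\u0301"))  # True
```

The same normalize call could equally be applied once at input time, so that stored data is already in a single form and later comparisons need no extra step. Which form to store (NFC or NFD) is a design choice this sketch does not settle.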
Category 4 orthographies (with composed characters) face numerous basic
issues that Category 3 or 2 orthographies don't, not only at input and
display but also at the data handling level. There really needs to be
greater awareness of the need for normalization.

Happy holidays,

--
Denis Moyogo Jacquerye --- http://home.sus.mcgill.ca/~moyogo
Nkótá ya Kongó míbalé --- http://info-langues-congo.1sd.org/
DejaVu fonts --- http://dejavu.sourceforge.net/
Unicode (UTF-8)
Last Updated: Sun Dec 23 06:56:00 2007
a12n-collaboration is hosted on Kabissa - Space for Change in Africa