a12n-collaboration Mailing List Archive: RE: [A12n-Collab] Re: [africa] 5 categories of African orthographies (Latin)
- Subject: RE: [A12n-Collab] Re: [africa] 5 categories of African orthographies (Latin)
- From: Tunde Adegbola <taintransit@xxxxxxxxxxx>
- Date: Sat, 22 Dec 2007 08:42:16 +0100
- Importance: Normal
Thanks to Andrew and John for their remarks. Tunde
-----------------------------------------------------------------------------------------------
Tunde Adegbola (Ph.D.)
Executive Director
African Languages Technology Initiative
(Alt-I ... Inserting African issues into the agenda of the knowledge age)
President
Tiwa Systems Ltd.
11 Oluyole Way, New Bodija Ibadan, Nigeria.
+234 8034019398
------------------------------------------------------------------------------------------------
> Date: Fri, 21 Dec 2007 18:15:18 -0800
> From: tiro@xxxxxxxx
> Subject: Re: [A12n-Collab] Re: [africa] 5 categories of African orthographies (Latin)
> To: a12n-collaboration@xxxxxxxxxxxx
> CC: dwanders@xxxxxxxxx; dzo@xxxxxxxxxxxx; africa@xxxxxxxxxxx; lisam@xxxxxxxxxx
>
> Tunde Adegbola wrote:
>
> > In my own work in language technology however, I do have problems with
> > the lack of unique code points for high/low tone subdotted vowels.
> > This presents ambiguity because they can be achieved in more than one
> > way: by subdotting a tone-marked vowel, or by tone-marking a subdotted
> > vowel. Both look exactly the same to a human reader but require extra
> > lines of code for a computer to see both as the same. It starts getting
> > distracting when you consider that this has to be catered for in both
> > lower and upper cases.
>
> As Andrew Cunningham pointed out, handling different character sequences for the same
> typeform is not very difficult, and it is for precisely this reason that normalisation
> exists and is well defined in the Unicode standard. This is an issue that affects any
> situation in which more than one mark is applied to a base letter, not just some African
> orthographies, and since it is a given that any combining mark characters may be combined
> in any quantity with any base characters, encoding precomposed combinations not only is
> not a viable option but simply shifts the normalisation issue into a comparison of
> precomposed and decomposed strings instead of comparison of variant decomposed strings.
>
> In any case, this point must be clearly understood: it is not possible to add any more
> precomposed diacritic combinations with canonical decompositions to Unicode, due to
> stability agreements with other international standards that rely on this aspect of
> Unicode to remain stable.
>
> Personally, I would be happy if Unicode did not include any
> precomposed characters, and if that had been possible from the beginning -- it was not,
> due to the principle of providing one-to-one backwards compatibility with pre-existing
> encodings -- then the software for seamlessly handling normalisation and display of
> combining mark text would have matured many years ago and African and other non-European
> languages would enjoy much better support than they have.
>
> John Hudson
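
As a rough illustration of the normalisation point made in the quoted message, the sketch below compares several encodings of a high-tone subdotted o using Python's standard unicodedata module. The helper names canonical_key and same_text, and the choice of o as the example vowel, are assumptions made for illustration only; they do not come from the thread.

```python
import unicodedata

# Four ways to encode "o with dot below and acute accent".
seq_a = "o\u0323\u0301"   # o + combining dot below + combining acute
seq_b = "o\u0301\u0323"   # o + combining acute + combining dot below
seq_c = "\u1ecd\u0301"    # precomposed o-with-dot-below + combining acute
seq_d = "\u00f3\u0323"    # precomposed o-acute + combining dot below


def canonical_key(text, caseless=False):
    """Return a comparison key that is stable under canonical equivalence.

    Hypothetical helper: normalises to NFD, optionally case-folds,
    then normalises again because case folding can denormalise text.
    """
    key = unicodedata.normalize("NFD", text)
    if caseless:
        key = unicodedata.normalize("NFD", key.casefold())
    return key


def same_text(a, b, caseless=False):
    """True if a and b are canonically equivalent (optionally ignoring case)."""
    return canonical_key(a, caseless) == canonical_key(b, caseless)


if __name__ == "__main__":
    variants = [seq_a, seq_b, seq_c, seq_d]
    # The raw strings differ, but every pair compares equal after normalisation.
    print(all(same_text(variants[0], v) for v in variants))
    # Upper and lower case are handled by the same helper.
    print(same_text("O\u0301\u0323", seq_c, caseless=True))
```

Normalising once at the point where text enters a program (for example on input or before indexing) keeps the mark-order and upper/lower case handling in one place instead of spreading it through the code.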
Last Updated: Sat Dec 22 06:26:19 2007