Mailing List Hosted on Kabissa - Space for Change in Africa

a12n-collaboration Mailing List Archive: RE: [A12n-Collab] Re: [africa] 5 categories of African orthographies (Latin)


  • Subject: RE: [A12n-Collab] Re: [africa] 5 categories of African orthographies (Latin)
  • From: Tunde Adegbola <taintransit@xxxxxxxxxxx>
  • Date: Sat, 22 Dec 2007 08:42:16 +0100
  • Importance: Normal
Thanks to Andrew and John for their remarks.
Tunde

-----------------------------------------------------------------------------------------------
Tunde Adegbola (Ph.D.)
Executive Director
African Languages Technology Initiative
(Alt-I ... Inserting African issues into the agenda of the knowledge age)
President
Tiwa Systems Ltd.
 
11 Oluyole Way, New Bodija Ibadan, Nigeria.
+234 8034019398
------------------------------------------------------------------------------------------------


> Date: Fri, 21 Dec 2007 18:15:18 -0800
> From: tiro@xxxxxxxx
> Subject: Re: [A12n-Collab] Re: [africa] 5 categories of African orthographies (Latin)
> To: a12n-collaboration@xxxxxxxxxxxx
> CC: dwanders@xxxxxxxxx; dzo@xxxxxxxxxxxx; africa@xxxxxxxxxxx; lisam@xxxxxxxxxx
>
> Tunde Adegbola wrote:
>
> > In my own work in language technology, however, I do have problems with
> > the lack of unique code points for high/low tone subdotted vowels.
> > This presents ambiguity because they can be achieved in more than one
> > way: by subdotting a tone-marked vowel, or by tone-marking a subdotted
> > vowel. Both look exactly the same to a human reader but require extra
> > lines of code for a computer to treat both as the same. It starts getting
> > distracting when you consider that this has to be catered for in both
> > lower and upper case.
>
> As Andrew Cunningham pointed out, handling different character sequences for the same
> typeform is not very difficult, and it is for precisely this reason that normalisation
> exists and is well defined in the Unicode Standard. This issue affects any situation
> in which more than one mark is applied to a base letter, not just some African
> orthographies. Since any combining mark characters may be combined in any quantity
> with any base characters, encoding precomposed combinations is not a viable option;
> it would simply shift the normalisation issue from a comparison of variant decomposed
> strings to a comparison of precomposed and decomposed strings.
>
> In any case, this point must be clearly understood: it is not possible to add any more
> precomposed diacritic combinations with canonical decompositions to Unicode, due to
> stability agreements with other international standards that rely on this aspect of
> Unicode to remain stable. Personally, I would be happy if Unicode did not include any
> precomposed characters. Had that been possible from the beginning (it was not, due to
> the principle of providing one-to-one backwards compatibility with pre-existing
> encodings), the software for seamlessly handling normalisation and display of
> combining mark text would have matured many years ago, and African and other
> non-European languages would enjoy much better support than they have.
>
> John Hudson
>
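A minimal Python sketch of the normalisation point above, assuming a Yoruba-style high-tone subdotted vowel ọ́ as the test case. The variable names and the helpers `canonical_caseless` and `same_text` are hypothetical, for illustration only; the only library call is the standard-library `unicodedata.normalize`. It shows that the two typing orders Tunde describes compare equal after normalisation, that NFC still leaves ọ́ as two code points (no precomposed ọ́ exists), and that one extra step covers the upper/lower case concern.

```python
import unicodedata

# Hypothetical example: a high-tone subdotted "o" typed two different ways.
tone_then_dot = "\u00f3\u0323"  # precomposed ó, then COMBINING DOT BELOW
dot_then_tone = "\u1ecd\u0301"  # precomposed ọ, then COMBINING ACUTE ACCENT


def canonical_caseless(s: str) -> str:
    """Canonical caseless key: NFD(casefold(NFD(s)))."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def same_text(a: str, b: str, ignore_case: bool = False) -> bool:
    """Return True if a and b are canonically equivalent.

    With ignore_case=True, upper- and lower-case forms also match.
    """
    if ignore_case:
        return canonical_caseless(a) == canonical_caseless(b)
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(tone_then_dot == dot_then_tone)           # False: different code points
    print(same_text(tone_then_dot, dot_then_tone))  # True after normalisation
    # NFC recomposes only as far as ọ; the acute stays a separate combining mark.
    print([hex(ord(c)) for c in unicodedata.normalize("NFC", tone_then_dot)])  # ['0x1ecd', '0x301']
    # Upper-case Ó + dot below matches the lower-case form when case is ignored.
    print(same_text("\u00d3\u0323", dot_then_tone, ignore_case=True))  # True
```

Comparing normalised forms, rather than writing special-case code for every vowel, tone and case, is the approach John describes. The caseless branch follows the Unicode Standard's definition of canonical caseless matching, NFD(casefold(NFD(x))).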



Last Updated: Sat Dec 22 06:26:19 2007

© 1999-2006, Kabissa or its affiliates