Mailing List Hosted on Kabissa - Space for Change in Africa

a12n-collaboration Mailing List Archive: RE: [A12n-Collab] Re: 5 categories of African orthographies (Latin)

  • Subject: RE: [A12n-Collab] Re: 5 categories of African orthographies (Latin)
  • From: Tunde Adegbola <taintransit@xxxxxxxxxxx>
  • Date: Wed, 26 Dec 2007 20:39:18 +0100
  • Importance: Normal

Hi Don,
I have been out of circulation for some time, having been working in a remote part of Sao Tome & Principe.  I have therefore had rather weak access to the Internet, and hence to the ongoing discussion.  However, I managed to acknowledge some postings, particularly that of Andrew Cunningham, which addressed my position directly.
From the discussion so far, and based on the strong, valid arguments put forward by Andrew and others, I feel convinced to withdraw the request for unique code points for tone-marked under-dotted vowels.  I feel it will be more productive to devote my energy to promoting uniformity in the way these characters are realised.
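A minimal Python sketch of what uniformity could look like for one such vowel, assuming (this is an illustration, not something agreed in the thread) that NFC is the target form. The four spellings of a tone-marked, under-dotted "e" below are examples chosen for the sketch; only the standard unicodedata module is used.

```python
import unicodedata

# Four ways the same tone-marked, under-dotted vowel might be typed or stored.
spellings = {
    "base + dot below + acute": "e\u0323\u0301",
    "base + acute + dot below": "e\u0301\u0323",
    "precomposed e-acute + dot below": "\u00e9\u0323",
    "precomposed e-dot-below + acute": "\u1eb9\u0301",
}

# Normalising to NFC (assumed target form) maps every spelling
# to one and the same code point sequence.
normalized = {
    label: unicodedata.normalize("NFC", text)
    for label, text in spellings.items()
}

for label, text in normalized.items():
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

All four inputs come out as the same two code points: the precomposed under-dotted vowel followed by a combining acute accent. No single code point exists for the combination, yet the representation is still uniform.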
May I therefore revive the discussion you and I had in South Africa on the possibility of giving a language technology flavour to the West African Languages Congress scheduled to be held in Ghana in August 2008.  It would be worthwhile to invite someone with knowledge of how Unicode works for African languages to lead a workshop.  It should be possible to get some of our local funding partners to assist.
Tunde

-----------------------------------------------------------------------------------------------
Tunde Adegbola (Ph.D.)
Executive Director
African Languages Technology Initiative
(Alt-I ... Inserting African issues into the agenda of the knowledge age)
President
Tiwa Systems Ltd.
 
11 Oluyole Way, New Bodija Ibadan, Nigeria.
+234 8034019398
------------------------------------------------------------------------------------------------



From: dzo@xxxxxxxxxxxx
To: a12n-collaboration@xxxxxxxxxxxx
Subject: RE: [A12n-Collab] Re: 5 categories of African orthographies (Latin)
Date: Mon, 24 Dec 2007 00:10:09 -0500
CC: petercon@xxxxxxxxxxxxx

Hi Peter. I agree that there are some problems with Taylor's schema of 5 levels. Part of that is due to his book having been written before Unicode was more widely understood. Unfortunately it has not been updated, even to include a caveat about its relevance or about some errors in the character repertoires listed for some languages.

 

I'm afraid there may also be some confusion between his 5 levels and my 5 categories I just proposed. The latter have to do with support in the current context (that is, understanding Unicode and where i18n and L10n are now). The point of my schema of 5 categories of orthographies is to facilitate discussion of various support issues.

 

Briefly, the first two of my categories happen to correspond with what Taylor has. His level #4 is my category #3, and his #3 my #4 (with some differences). My category #5 is orthographies for which one or more characters are not supported in Unicode at all. I do not consider non-Latin scripts in my categories.

 

This is a bit of a coincidence - it has been a few years since I looked at Taylor's work and I had forgotten he broke things down in such a way. On the other hand, it is not that novel an approach.

 

I had a comment offline also about the 5 categories, suggesting it was a step back. But still I think it is useful to be able to say something is a "category 4 orthography" for instance, and know that it is one that requires use of combining diacritics, or "category 3" and know that a Latin-1 font simply won't do, etc. Also to be able to list languages under these headings (sometimes with asterisks as these things are not always so clear cut) so as to better understand the terrain.
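A rough Python sketch of how such labels could be assigned to sample text. The thresholds are one reading of the categories as described in this thread (ASCII only, Latin-1 accented letters, extended Latin letters, combining diacritics), and the function name is hypothetical. Category 5 cannot really be detected from text, so private-use code points are used here as a stand-in assumption.

```python
import unicodedata

def orthography_category(text):
    """Guess a category (1-5) for a Latin-script sample; thresholds are assumptions."""
    nfc = unicodedata.normalize("NFC", text)
    cats = [(ch, unicodedata.category(ch)) for ch in nfc]

    # Assumption: private-use characters hint at letters Unicode does not encode.
    if any(cat == "Co" for _, cat in cats):
        return 5
    # A combining mark that survives NFC has no precomposed form.
    if any(cat.startswith("M") for _, cat in cats):
        return 4
    letters = [ch for ch, cat in cats if cat.startswith("L")]
    # Letters beyond Latin-1: a Latin-1 font will not do.
    if any(ord(ch) > 0xFF for ch in letters):
        return 3
    # Accented letters that Latin-1 covers.
    if any(ord(ch) > 0x7F for ch in letters):
        return 2
    return 1

samples = ["hello", "\u00e9t\u00e9", "\u0253a", "e\u0323\u0301"]
for sample in samples:
    print(repr(sample), orthography_category(sample))
```

Only letters and marks are inspected, so typographic punctuation does not push plain text into a higher category. A real survey would still need the asterisks mentioned above for borderline languages.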

 

More responses in text below. (I had to insert tags to clarify who wrote what.)

 

From: a12n-collaboration-bounces@xxxxxxxxxxxx [mailto:a12n-collaboration-bounces@xxxxxxxxxxxx] On Behalf Of Peter Constable
Sent: Friday, December 21, 2007 11:39 AM
To: A12n tech support
Subject: RE: [A12n-Collab] Re: 5 categories of African orthographies (Latin)

 

[PC] I see many problems with Taylor’s five levels.

 

I don’t see a significant difference between levels 1 and 2. Maybe learning to enter é is a slight challenge for an English speaker, but anyone living in France or Spain (e.g.) has probably known how to do that from day one.

 

[DO] The differences are becoming less and less important, I agree, but they are still there. Input of accents is one issue not only on an English QWERTY, but probably also on keyboards designed for one or another language using different accents.

 

Display is another issue that keeps cropping up for category 2 languages, even though it shouldn't. Hardly a week goes by when I don't encounter some problem or other with simple accented characters in French. Technically this shouldn't happen, one may argue, and maybe on fr locales it doesn't. But with different people using different systems and encodings and so on, it definitely does.
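A minimal Python sketch of one common way such display problems arise, assuming for illustration that text written as UTF-8 is read by a system expecting Latin-1. The sample phrase is arbitrary.

```python
# Hypothetical scenario: the sender writes UTF-8, the reader assumes Latin-1.
original = "d\u00e9j\u00e0 vu"

utf8_bytes = original.encode("utf-8")

# Decoding with the wrong encoding turns each accented letter into two characters.
misread = utf8_bytes.decode("latin-1")

# The damage is reversible only if the wrong guess is known and undone.
recovered = misread.encode("latin-1").decode("utf-8")

print(misread)
print(recovered)
```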

 

Correct me if I'm wrong, but are there not also programming contexts where only ASCII (= category 1 but not 2) can be used?

 

[PC] Level 3 is too simplistic and out of date:

 

LEVEL 3: The next step up in difficulty is those languages which use ‘ordinary’ letterforms but in some non-standard combinations – such as a dot under a vowel, or an acute accent over a consonant. These languages cannot be set with standard applications and fonts.

 

[PC] There are certainly “standard” applications and fonts that can be used to set these. Maybe not *all* “standard” applications and fonts – depending on what “standard” means. Then there’s the confusion around suggested increasing complexity / lack of software/font support with higher levels. In particular, consider level 4 in relation to level 3:

 

[DO] I agree this is simplistic and out of date. In my schema this is category 4, the orthographies that use combining diacritics.  

 

LEVEL 4: These are the languages which clearly require a number of special letterforms that do not exist in the standard fonts oriented towards Western European language typesetting, for example the ‘hooked consonants’ of Hausa. Here, a special font is definitely required, but no other modification of the system is needed.

 

[PC] Word 97 could handle the level-4 scenario even though it couldn’t handle the level-3 scenario. I suspect there are several products from the past 10 years that are like that. (InDesign is another that comes to mind.)

 

[DO] This is my category 3 because, as you indicate, it was simpler to handle than the combining diacritics. You are of course right that some older applications and systems were able to display extended Latin characters. (Actually I think that not a few people who were relying on 8-bit "special fonts" for these characters earlier this decade actually had Unicode fonts with the same characters on their computers but did not know it.)

 

[PC] Then, level 5 is a bit of a muddle:

 

LEVEL 5: The most problematic languages have a non-latin character set which is so large in its required repertoire that a single standard font cannot contain them all – or perhaps they have unusual behaviours, such as requiring different forms of letter depending on where they occur in a word. This level of problem requires more than just a special font: some other modifications will be needed, such as special software or operating system extensions.

 

[PC] The size of the repertoire and the need to support “unusual behaviours” are two very different issues. Arabic versions of Word 95 could handle the latter but not the former; English Word 97 could handle the former but (IIRC) not the latter. And obviously the claim that “a single standard font cannot contain [all characters for large-repertoire languages]” is completely wrong: ever since there have been TrueType fonts, they have been able to support tens of thousands of characters in a standards-conformant way. The real issue behind what he’s saying is one of encoding: 8-bit encodings, indeed, cannot handle a script like Ethiopic (except by some kind of escaping mechanism or indirect representation scheme).
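A small Python sketch of the encoding point, assuming the basic Ethiopic block (U+1200 to U+137F) is a fair stand-in for the script's repertoire. It counts the characters the running Python's Unicode data assigns in that block and checks a few syllables against an 8-bit encoding. The helper names are hypothetical.

```python
import unicodedata

# Basic Ethiopic block, U+1200..U+137F (supplementary blocks are ignored here).
ETHIOPIC_BLOCK = range(0x1200, 0x1380)

def assigned_count(block):
    """Count code points in the block that have a character name."""
    return sum(1 for cp in block if unicodedata.name(chr(cp), ""))

def encodable(text, encoding):
    """Return True if the text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

count = assigned_count(ETHIOPIC_BLOCK)
sample = "\u12a0\u121b\u122d\u129b"  # a few Ethiopic syllables

print(count, "assigned characters; an 8-bit encoding has at most 256 slots")
print("latin-1:", encodable(sample, "latin-1"))
print("utf-8:", encodable(sample, "utf-8"))
```

The limit shown is one of encoding, not of fonts: the same repertoire that overflows a single-byte code page is unproblematic once text is stored as Unicode.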

 

[DO] You are correct of course.  I'm reminded in reading this again of a distinguished scholar who wrote 2-3 years ago that a computer couldn't handle the 3 alphabets that could be used to transcribe Berber languages. Public education - even informing language experts - about what Unicode does has never been very strong. Works such as Taylor's, which don't take this into account but remain online without update, may keep confusing people.

 

I'll see if I can contact Conrad Taylor about whether something can be added to at least give readers an updated context.

 

Don

 




Last Updated: Mon Dec 31 10:52:56 2007
