Mailing List Hosted on Kabissa - Space for Change in Africa

a12n-collaboration Mailing List Archive: RE: [A12n-Collab] Re: 5 categories of African orthographies (Latin)


  • Subject: RE: [A12n-Collab] Re: 5 categories of African orthographies (Latin)
  • From: Peter Constable <petercon@xxxxxxxxxxxxx>
  • Date: Fri, 21 Dec 2007 08:39:29 -0800

I see many problems with Taylor’s five levels.

 

I don’t see a significant difference between levels 1 and 2. Maybe learning to enter é is a slight challenge for an English speaker, but anyone living in, say, France or Spain has probably known how to do that from day one.

 

Level 3 is too simplistic and out of date:

 

LEVEL 3: The next step up in difficulty is those languages which use ‘ordinary’ letterforms but in some non-standard combinations – such as a dot under a vowel, or an acute accent over a consonant. These languages cannot be set with standard applications and fonts.

 

There are certainly “standard” applications and fonts that can be used to set these, though perhaps not *all* of them, depending on what “standard” means. The levels are also confusing in suggesting that complexity and lack of software/font support increase with each level. In particular, consider level 4 in relation to level 3:

 

LEVEL 4: These are the languages which clearly require a number of special letterforms that do not exist in the standard fonts oriented towards Western European language typesetting, for example the ‘hooked consonants’ of Hausa. Here, a special font is definitely required, but no other modification of the system is needed.

 

Word 97 could handle the level-4 scenario even though it couldn’t handle the level-3 scenario. I suspect there are several products from the past 10 years that are like that. (InDesign is another that comes to mind.) Then, level 5 is a bit of a muddle:

 

LEVEL 5: The most problematic languages have a non-latin character set which is so large in its required repertoire that a single standard font cannot contain them all – or perhaps they have unusual behaviours, such as requiring different forms of letter depending on where they occur in a word. This level of problem requires more than just a special font: some other modifications will be needed, such as special software or operating system extensions.

 

The size of the repertoire and the need to support “unusual behaviours” are two very different issues. Arabic versions of Word 95 could handle the latter but not the former; English Word 97 could handle the former but (IIRC) not the latter. And obviously the claim that “a single standard font cannot contain [all characters for large-repertoire languages]” is completely wrong: ever since there have been TrueType fonts, they have been able to support tens of thousands of characters in a standards-conformant way. The real issue behind what he’s saying is one of encoding: 8-bit encodings, indeed, cannot handle a script like Ethiopic (except by some kind of escaping mechanism or indirect representation scheme).
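The encoding point can be illustrated with a rough sketch. It assumes Python's standard `unicodedata` module and looks only at the main Ethiopic block (U+1200 to U+137F); the helper names `assigned_in_range`, `fits_in_8_bits` and `can_encode` are made up for this illustration, and the exact count depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Main Ethiopic block only; the supplementary Ethiopic blocks are ignored here.
ETHIOPIC_START, ETHIOPIC_END = 0x1200, 0x137F


def assigned_in_range(start, end):
    """Return the characters in [start, end] that have a Unicode name."""
    return [chr(cp) for cp in range(start, end + 1)
            if unicodedata.name(chr(cp), "")]


def fits_in_8_bits(chars):
    """An 8-bit encoding has at most 256 code values in total."""
    return len(set(chars)) <= 256


def can_encode(text, encoding):
    """Report whether text is representable in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ethiopic = assigned_in_range(ETHIOPIC_START, ETHIOPIC_END)
print(len(ethiopic), "assigned characters in the main Ethiopic block")
print("fits in a single 8-bit code page:", fits_in_8_bits(ethiopic))
print("U+1200 in latin-1:", can_encode("\u1200", "latin-1"))
print("U+1200 in utf-8:", can_encode("\u1200", "utf-8"))
```

The 256-value ceiling is generous, since a practical 8-bit code page would also have to keep ASCII. The sketch says nothing about fonts, which is the point: the limit lies in the encoding, not in the font format.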

 

 

 

Peter

 

From: a12n-collaboration-bounces@xxxxxxxxxxxx [mailto:a12n-collaboration-bounces@xxxxxxxxxxxx] On Behalf Of Don Osborn
Sent: Thursday, December 20, 2007 9:58 PM
To: a12n-collaboration@xxxxxxxxxxxx
Subject: [A12n-Collab] Re: 5 categories of African orthographies (Latin)

 

There are several points in this thread that I'd like to clarify. I'd also like to say right off that I am glad that there is discussion again about the issue of combining diacritics. It's an issue that seems to float out there among some experts even as the technology improves. So - to take a neutral point of view (NPOV; learned that on Wikipedia) - it needs to be dealt with one way or another so we can move forward. More below in the fourth point.

 

The points in this thread now include:

 

1) The original question: whether the system of categorization (5 categories of orthographies according to how Unicode etc. supports them) is acceptable. Am I correct in concluding that no one has a problem with this? The reason I ask is that I want to use this in some writing and would rather get criticism now than later.

 

Thanks, Tunde, for mentioning Conrad Taylor's site. For those not familiar with it, it's actually a book and a nice piece of work, but a bit dated. I'd seen it some years ago but had forgotten he had a list of levels of difficulty on page 6. His Levels 1 & 2 are the same as the Categories 1 & 2 I suggested. His 4 is basically my 3, and his 3 my 4. Partly because we are now using Unicode, I see the extended characters themselves as less of an issue than they were when he wrote his book - hence they are now only a step above #2 in difficulty (these include the Latin Extended Additional range that has the subdot letters). Also, my Category 4 includes characters in Category 3 which need combining diacritics, which is a bit different from Taylor's Level 3. I add a category (my #5) he doesn't have because he wasn't writing from the viewpoint of Unicode support. Also, while I'm very aware of the importance of non-Latin scripts, I did not include them in my schema (or else his #5 would be my #6 and so on).
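As a rough illustration of the subdot letters mentioned above, the following sketch lists the precomposed letters in the Latin Extended Additional block (U+1E00 to U+1EFF) whose Unicode names mention a dot below. It assumes Python's standard `unicodedata` module; the function name `dot_below_letters` is invented for this example, and the result depends on the Unicode version the interpreter ships with.

```python
import unicodedata

# Latin Extended Additional block boundaries.
LEA_START, LEA_END = 0x1E00, 0x1EFF


def dot_below_letters():
    """Return precomposed letters in the block whose name mentions DOT BELOW."""
    found = []
    for cp in range(LEA_START, LEA_END + 1):
        name = unicodedata.name(chr(cp), "")
        if "DOT BELOW" in name:
            found.append(chr(cp))
    return found


letters = dot_below_letters()
for ch in letters:
    print(f"U+{ord(ch):04X}", ch, unicodedata.name(ch))
```

Letters found this way already exist as single code points, so they need a font that covers them but no combining-diacritic support.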

 

FYI, there was mention of Taylor's book on this list about 5 years ago. See:

http://lists.kabissa.org/lists/archives/public/a12n-collaboration/msg00120.html

http://lists.kabissa.org/lists/archives/public/a12n-collaboration/msg00121.html

 

 

2) Existing information on Latin-based orthographies of African languages.

2a) Hartell's 1993 book. Yes, this is one we refer to often. I used it for a series of charts on http://www.bisharat.net/A12N/#countrytables (Lee Pearce also did some work there), and more recently Christian Chanard set up a database using Hartell's data at http://sumale.vjf.cnrs.fr/phono/ . The problem is that there is no update to this, and that expanding it would be a challenge given that some orthographies are not set. Some are apparently even changing.

 

2b) Documents like the one by Jim Agenbroad that Charles referred to, and indeed the oft-discussed research John did (time to bring that up again) would indeed be great to get online for greater access.

 

 

3) With regard to missing characters in Unicode (which define the category 5 orthographies), more work could certainly be done to identify these, but this may not be as big a priority now - given the fact that many outstanding needs have been addressed in recent years - as getting full support for category 4 orthographies.

 

 

4) With regard to support for Category 4 orthographies (if we agree on that terminology), that is, orthographies that need combining diacritics and hence support for those, the questions of how good that support is, and indeed how good the concept is, have been around for a while. The suggestion that more precomposed characters be added to Unicode has been discussed on this list - see for instance the thread beginning with http://lists.kabissa.org/lists/archives/public/a12n-collaboration/msg00182.html .
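To make the distinction between precomposed characters and dynamic composition concrete, here is a minimal sketch, assuming Python's standard `unicodedata` module. The helper `nfc_report` is a made-up name; it normalizes a sequence to NFC and lists the resulting code points. The Yoruba-style sequence of e with dot below and acute is used only as an example of a combination that still needs a combining mark after normalization.

```python
import unicodedata


def nfc_report(sequence):
    """Normalize to NFC and return the resulting code points as U+XXXX strings."""
    composed = unicodedata.normalize("NFC", sequence)
    return [f"U+{ord(ch):04X}" for ch in composed]


# e + COMBINING DOT BELOW: a precomposed character exists (U+1EB9).
E_DOT_BELOW = "e\u0323"

# e + COMBINING DOT BELOW + COMBINING ACUTE ACCENT: no single precomposed
# character exists, so NFC leaves a base letter plus one combining mark.
E_DOT_BELOW_ACUTE = "e\u0323\u0301"

# Same marks typed in the other order; canonically equivalent to the above.
E_ACUTE_DOT_BELOW = "e\u0301\u0323"

print(nfc_report(E_DOT_BELOW))
print(nfc_report(E_DOT_BELOW_ACUTE))
print(nfc_report(E_ACUTE_DOT_BELOW))
```

Any sequence that stays longer than one code point after NFC depends on the rendering system positioning the combining mark, which is the kind of support at issue for these orthographies.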

 

The fact that the question keeps getting raised (I hear it from others occasionally) is sign enough that there is a need to either clarify the support issues and how those are being addressed, or clarify how the system doesn't work. Continued doubts about dynamic composition either need to be addressed with better explanations (and real support) or alternatively - again from a NPOV - with a real proposal that makes the justification and proposes specific precomposed characters. This so that we can move forward one way or another rather than recycling debates.

 

That said (now I'm no longer NPOV), the system apparently works but the support is not yet there for African languages. Maybe the problem, and also the key to the concerns raised by Tunde and Samuel, is that there is still work to do to support input and display of Yoruba diacritics (and other "category 4 orthographies") - so to users it doesn't seem to work.

 

In any event, I think this discussion is very timely and would like to encourage people with whatever experience or expertise with category 4 orthographies (i.e., ones that require use of combining diacritics or even stacking of diacritics) to let us know what they think.

 

Don Osborn

Bisharat.net

PanAfriL10n.org

* For mention of Taylor's site see:

 


Last Updated: Sun Dec 23 23:10:48 2007
