I see many problems with Taylor’s
five levels.
I don’t see a significant
difference between levels 1 and 2. Maybe learning to enter é is a slight
challenge for an English speaker, but anyone living in, say, France or Spain
has probably known how to do that from day one.
Level 3 is too simplistic and
out of date:
LEVEL 3
— The next step up in difficulty is
those languages which use
‘ordinary’ letterforms but in
some non-standard combinations – such
as a dot under a vowel, or an acute
accent over a consonant. These
languages cannot be set with standard applications
and fonts.
There are certainly “standard”
applications and fonts that can be used to set these. Maybe not *all* “standard”
applications and fonts – depending on what “standard” means.
Then there’s the confusion around the suggestion that complexity, and with it the
lack of software and font support, increases from one level to the next. In particular, consider level 4 in
relation to level 3:
LEVEL 4
— These are the languages which
clearly require a number
of special letterforms that do not exist
in the standard fonts oriented
towards Western European language
typesetting, for example the
‘hooked consonants’ of Hausa.
Here, a special font is definitely
required, but no other modification of
the system is needed.
Word 97 could handle the level-4
scenario even though it couldn’t handle the level-3 scenario. I suspect
there are several products from the past 10 years that are like that. (InDesign
is another that comes to mind.) Then, level 5 is a bit of a muddle:
LEVEL 5
— The most problematic languages
have a non-latin character
set which is so large in its required
repertoire that a single standard
font cannot contain them all – or
perhaps they have unusual behaviours,
such as requiring different forms of
letter depending on where
they occur in a word. This level of
problem requires more than just a
special font: some other modifications
will be needed, such as special
software or operating system extensions.
The size of the repertoire and
the need to support “unusual behaviours” are two very different
issues. Arabic versions of Word 95 could handle the latter but not the former;
English Word 97 could handle the former but (IIRC) not the latter. And
obviously the claim that “a single standard font cannot contain [all
characters for large-repertoire languages]” is completely wrong: ever
since there have been TrueType fonts, they have been able to support tens of
thousands of characters in a standards-conformant way. The real issue behind
what he’s saying is one of encoding: 8-bit encodings, indeed, cannot
handle a script like Ethiopic (except by some kind of escaping mechanism or
indirect representation scheme).
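To make the encoding point concrete, here is a minimal Python sketch (my own illustration, not anything taken from Taylor's book). It assumes Latin-1 as a stand-in for "an 8-bit encoding" and uses four Ethiopic syllables as sample text; the helper name fits_in_encoding is hypothetical.

```python
def fits_in_encoding(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be stored in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Sample strings, chosen for illustration only.
french = "caf\u00e9"                    # 'cafe' with e-acute (U+00E9), inside Latin-1
ethiopic = "\u12a0\u121b\u122d\u129b"   # four Ethiopic syllables (U+1200 block)

for label, sample in [("french", french), ("ethiopic", ethiopic)]:
    print(label,
          "latin-1:", fits_in_encoding(sample, "latin-1"),
          "utf-8:", fits_in_encoding(sample, "utf-8"))

# An 8-bit encoding has only 256 slots; these code points all lie above that range.
print(all(ord(ch) > 0xFF for ch in ethiopic))
```

The font is not involved anywhere in this check: the limit is in how many distinct values the encoding can represent, which is the distinction being drawn above.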
Peter
From: a12n-collaboration-bounces@xxxxxxxxxxxx
[mailto:a12n-collaboration-bounces@xxxxxxxxxxxx] On Behalf Of Don Osborn
Sent: Thursday, December 20, 2007 9:58 PM
To: a12n-collaboration@xxxxxxxxxxxx
Subject: [A12n-Collab] Re: 5 categories of African orthographies (Latin)
There are several points in this thread that I'd like to
clarify. I'd also like to say right off that I am glad that there is discussion
again about the issue of combining diacritics. It's an issue that seems to
float out there among some experts even as the technology improves. So - to
take a neutral point of view (NPOV; learned that on Wikipedia) - it needs to be
dealt with one way or another so we can move forward. More below in the fourth
point.
The points in this thread now include:
1) The original question: whether the system of
categorization - 5 categories of orthographies according to how well Unicode etc.
supports them - is sound. Am I correct in concluding that no one has a problem with this?
The reason I ask is that I want to use this in some writing and would rather get
criticism now than later.
Thanks, Tunde, for mentioning Conrad Taylor's site. For
those not familiar with it, it's actually a book and a nice piece of work, but
a bit dated. I'd seen it some years ago but had forgotten he had a list of
levels of difficulty on page 6. His Levels 1 & 2 are the same as the
Categories 1 & 2 I suggested. His 4 is basically my 3, and his 3 my 4. Partly
because we are now using Unicode, I see the extended characters themselves as
less of an issue than they were when he wrote his book - hence they are now
only a step above #2 in difficulty (these include the Latin Extended Additional
range that has the subdot letters). Also, my Category 4 includes characters in
Category 3 which need combining diacritics, which is a bit different from
Taylor's Level 3. I add a category (my #5) he doesn't have because he wasn't
writing from the viewpoint of Unicode support. Finally, while I'm very aware of
the importance of non-Latin scripts, I did not include them in my schema (or
else his #5 would be my #6 and so on).
FYI, there was mention of Taylor's book on this list about 5
years ago. See:
http://lists.kabissa.org/lists/archives/public/a12n-collaboration/msg00120.html
http://lists.kabissa.org/lists/archives/public/a12n-collaboration/msg00121.html
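The line drawn in point 1 between Category 3 (extended characters with their own code points) and Category 4 (letters that still need combining diacritics) can be sketched in Python with the standard unicodedata module. This is only a rough illustration under my own assumptions: the function names are hypothetical, and the test simply asks whether combining marks survive NFC normalization.

```python
import unicodedata


def needs_combining_marks(text: str) -> bool:
    """True if `text` still contains combining marks after NFC normalization,
    i.e. some letter+diacritic combination has no precomposed code point."""
    composed = unicodedata.normalize("NFC", text)
    return any(unicodedata.category(ch).startswith("M") for ch in composed)


def in_latin_extended_additional(ch: str) -> bool:
    """True if `ch` lies in the Latin Extended Additional block (U+1E00-U+1EFF)."""
    return 0x1E00 <= ord(ch) <= 0x1EFF


subdot_e = "e\u0323"               # e + COMBINING DOT BELOW
subdot_e_acute = "e\u0323\u0301"   # same, plus COMBINING ACUTE ACCENT

composed = unicodedata.normalize("NFC", subdot_e)
print(len(composed), in_latin_extended_additional(composed))
print(needs_combining_marks(subdot_e))
print(needs_combining_marks(subdot_e_acute))
```

The subdot e composes to a single code point in the Latin Extended Additional range, while adding a tone mark on top leaves a combining acute behind, which is what pushes an orthography into Category 4 in this scheme.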
2) Existing information on Latin-based orthographies of
African languages.
2a) Hartell's 1993 book. Yes, this is one we refer to often.
I used it for a series of charts on http://www.bisharat.net/A12N/#countrytables
(Lee Pearce also did some work there), and more recently Christian Chanard set
up a database using Hartell's data (http://sumale.vjf.cnrs.fr/phono/).
The problem is that there is no update to this, and indeed that expanding
it would be a challenge given the fact that some orthographies are not settled. Some
are apparently even changing.
2b) Documents like the one by Jim Agenbroad that Charles
referred to, and the oft-discussed research John did (time to bring that
up again), would be great to get online for wider access.
3) With regard to missing characters in Unicode (which
define the category 5 orthographies), more work could certainly be done to
identify these, but this may not be as big a priority now - given the fact that
many outstanding needs have been addressed in recent years - as getting full
support for category 4 orthographies.
4) With regard to support for Category 4 orthographies (if
we agree on that terminology), that is, orthographies that need combining
diacritics and hence support for those, the questions of how good that support
is, and indeed how good the concept is, have been around for a while. The
suggestion that more precomposed characters be added to Unicode has been
discussed on this list (see for instance the thread beginning with
http://lists.kabissa.org/lists/archives/public/a12n-collaboration/msg00182.html).
The fact that the question keeps getting raised (I hear it
from others occasionally) is sign enough that there is a need to either clarify
the support issues and how those are being addressed, or clarify how the system
doesn't work. Continued doubts about dynamic composition need to be
addressed either with better explanations (and real support) or alternatively - again
from an NPOV - with a real proposal that makes the justification and proposes
specific precomposed characters. That way we can move forward one way or
another rather than recycling debates.
That said (now I'm no longer NPOV), the system apparently
works but the support is not yet there for African languages. Maybe the
problem, and also the key to the concerns raised by Tunde and Samuel, is
that there is still work to do to support input and display of Yoruba diacritics
(and other "Category 4 orthographies") - so naturally it doesn't seem
to work.
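As a small check on the "system apparently works" half of that statement, here is a hedged Python sketch of dynamic composition at the encoding level only. It uses the standard unicodedata module; the helper canonically_equal and the three sample spellings are my own illustration.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Three ways someone might type the same letter (e, dot below, acute):
typed_1 = "e\u0323\u0301"    # base letter, dot below, acute
typed_2 = "e\u0301\u0323"    # base letter, acute, dot below
typed_3 = "\u1eb9\u0301"     # precomposed e-with-dot-below, then acute

print(typed_1 == typed_2)                    # raw code point sequences differ
print(canonically_equal(typed_1, typed_2))   # but they normalize to the same text
print(canonically_equal(typed_2, typed_3))
print([f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", typed_2)])
```

All three spellings normalize to the same two code points (U+1EB9 followed by U+0301), so searching and comparison can treat them alike. This says nothing about keyboards or font rendering, which is where the support gap described above lies.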
In any event, I think this discussion is very timely and
would like to encourage anyone with experience or expertise in
Category 4 orthographies (i.e., ones that require use of combining diacritics
or even stacking of diacritics) to let us know what they think.
Don Osborn
Bisharat.net
PanAfriL10n.org
* For mention of Taylor's site see: