Mailing List Hosted on Kabissa - Space for Change in Africa

a12n-collaboration Mailing List Archive: [A12n-Collab] 5 categories of African orthographies (Latin)


  • Subject: [A12n-Collab] 5 categories of African orthographies (Latin)
  • From: "Don Osborn" <dzo@xxxxxxxxxxxx>
  • Date: Tue, 18 Dec 2007 11:01:37 -0500
  • Thread-index: AchBj0Pn7PEPnbFQS3yBUQKN4NeOUg==

At various points we and others have discussed the categories of issues involved in handling the orthographies of African languages, and how well Unicode and its implementations support them. In particular, the numerous Latin-based orthographies tend to fall into 4 or 5 categories. I'm not aware of any formal terminology or frame of reference for referring to these, so I would like to propose the categories of Latin-based orthographies below. My thinking along these lines was prompted in part by re-reading an old article by Laurent Bourbeau and François Pinard. I also responded recently to a question from Lisa Moore and Debbie Anderson about how many African languages are supported by Unicode by breaking the orthography issues down and turning the question around: how many are *not* supported by Unicode? (The answer is very few, but support for orthographies using combining diacritics is still an issue.)

 

The formulation below was refined in an IM chat I had with Andrew Cunningham yesterday; thanks to him for his feedback:

 

Category 1 orthographies:  ASCII - all characters and combinations covered by the ASCII character set

 

Category 2 orthographies:   Latin-1, meaning all characters and combinations covered by characters in ISO/IEC 8859-1 / Windows 1252

 

Category 3 orthographies:  Extended-Latin with no combining diacritics, meaning that the orthographies are covered by the Latin ranges of Unicode without any need for combining diacritics. Here there may be issues with input systems and with font coverage.

 

Category 4 orthographies:  Latin as complex script, meaning the orthographies are covered by Extended-Latin with use of combining diacritics. Here there are issues with input, fonts and rendering that are not encountered in the above.

 

Category 5 orthographies:  Orthographies not fully supported by Unicode, which at this point would mean a missing character (these are probably very few).
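
As a rough illustration of how categories 1 to 4 could be told apart mechanically, here is a minimal Python sketch. The function name `orthography_category` is hypothetical, and the tests it applies are assumptions rather than part of the proposal: Windows-1252 stands in for "Latin-1", and "needs combining diacritics" is approximated as "still contains a combining mark after Unicode NFC normalization". Category 5 (a character missing from Unicode) cannot be detected this way, since such a character cannot appear in the input at all.

```python
import unicodedata


def orthography_category(letters):
    """Return 1, 2, 3 or 4 for an iterable of letters (or letter sequences).

    Hypothetical helper: the tests below are one possible reading of the
    categories, not an authoritative definition.
    """
    # NFC composes base + diacritic pairs wherever Unicode has a
    # precomposed character, so only unavoidable combining marks remain.
    text = unicodedata.normalize("NFC", "".join(letters))

    # Category 1: everything fits in ASCII.
    if all(ord(ch) < 0x80 for ch in text):
        return 1

    # Category 2: everything fits in Windows-1252 (assumed stand-in
    # for "ISO/IEC 8859-1 / Windows 1252").
    try:
        text.encode("cp1252")
        return 2
    except UnicodeEncodeError:
        pass

    # Category 4: at least one combining mark survives normalization.
    if any(unicodedata.category(ch).startswith("M") for ch in text):
        return 4

    # Category 3: extended Latin, all letters available precomposed.
    return 3
```

For example, `orthography_category(["\u014b", "\u014a"])` (the lower and upper case "eng") falls into category 3, while `orthography_category(["\u025b\u0300"])` (open e with a combining grave accent, which has no precomposed form) falls into category 4.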

 

 

Obviously this is somewhat simplified. For example, a few orthographies might be both 3 and 4: they have diacritics for indicating tone, but these are not always used. For some uses, then, resolving only the category 3 support issues is enough to get going, while the category 4 issues remain for diacritic support. (Manding languages, or at least Bambara in Mali, would be an example.) So in some cases one might say that a language is category 3 for full, ordinary usage and category 4 for optimal coverage.
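
The split between ordinary and tone-marked usage can be sketched in Python as follows. The letter lists are a small illustrative sample chosen for this example, not a full inventory of any orthography, and the helper name `has_combining_marks` is hypothetical. The check assumes that a text "needs combining diacritics" when a combining mark remains after NFC normalization.

```python
import unicodedata


def has_combining_marks(text):
    """True if any combining mark survives NFC normalization."""
    composed = unicodedata.normalize("NFC", text)
    return any(unicodedata.category(ch).startswith("M") for ch in composed)


# Illustrative sample only: a few extended letters used in Bambara.
base_letters = ["\u025b", "\u0254", "\u0272", "\u014b"]  # open e, open o, ny, eng

# The same vowels with tone diacritics; Unicode has no precomposed
# forms for these, so the combining marks remain after NFC.
tone_marked = ["\u025b\u0300", "\u0254\u0301"]

everyday = has_combining_marks("".join(base_letters))
full = has_combining_marks("".join(base_letters + tone_marked))
```

Here `everyday` comes out `False` (category 3 concerns only) and `full` comes out `True` (category 4 concerns apply), which mirrors the "3 for ordinary usage, 4 for optimal coverage" distinction.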

 

There are also degrees of complexity within a category. Wolof, for instance, is category 3 because of its use of the "eng" letter (upper and lower case), while Hausa and Fula use several characters in the extended ranges. Nevertheless, the category would be determined by what is necessary for full support.

 

For many uses, categories 1 and 2 are practically the same, but the latter does occasionally pose some issues with respect to display and input (and, in some cases, perhaps still programming?).

 

For reference, category 2 would also include French orthography, and category 1 English. Vietnamese would, I think, be category 4.

 

All of the above of course refers only to Latin-based orthographies. There are issues with orthographies based on the Arabic script and the Ge'ez/Ethiopic abugida, which have extended ranges of their own, as well as with diacritic support in N'Ko and possible additional characters in Tifinagh. Not to mention some lesser-used writing systems that are not yet in Unicode.

 

Anyway, I put this forth for discussion. Whatever the case, I think it is useful to have a common typology for referring to the orthographies of African languages, and to begin classifying them accordingly in order to address support, enabling, and localization issues.

 

Don Osborn

Bisharat.net

PanAfriL10n.org

 

 

 


Last Updated: Wed Dec 19 23:47:32 2007
