Mailing List Hosted on Kabissa - Space for Change in Africa

a12n-collaboration Mailing List Archive: [A12n-Collab] Re: [PALNet-general] Utilities for analyzing keyboards?


  • Subject: [A12n-Collab] Re: [PALNet-general] Utilities for analyzing keyboards?
  • From: "Andrew Cunningham" <andrewc@xxxxxxxxxxxxx>
  • Date: Wed, 2 Jul 2008 21:54:10 +1000 (EST)
  • Importance: Normal
Don,

heartily agree

At the moment I'm running some Perl scripts across translations of the UDHR. The Yoruba translation was interesting: the four most frequent characters in the document were the three combining diacritics (acute, grave and dot-below) and the letter "i". That would seem to indicate that for Yoruba the combining diacritics should be on the 1st level of the keyboard rather than the 3rd level (AltGr).

Languages like Igbo will be interesting. Normally tone isn't written, but could be. I'd expect Igbo with tone marking would have a similar high occurrence of combining diacritics.

I intend to run two versions: one with a pre-filter that normalises to NFD, and a second with a pre-filter that normalises to NFD and then lowercases all characters.

I will then run both across the languages available in the UDHR.
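The two counting passes described above can be sketched roughly as follows. This is in Python rather than Perl, and it is not the script actually used; the function names, the `unify_underdots` option (which mirrors the vertical-line-below to dot-below conversion mentioned in the quoted message further down) and the sample string are all assumptions made for illustration.

```python
import unicodedata
from collections import Counter

VERTICAL_LINE_BELOW = "\u0329"  # COMBINING VERTICAL LINE BELOW
DOT_BELOW = "\u0323"            # COMBINING DOT BELOW


def char_frequencies(text, lowercase=False, unify_underdots=False):
    """Count code points (ignoring whitespace) after normalising to NFD."""
    text = unicodedata.normalize("NFD", text)
    if unify_underdots:
        # Optional pre-filter: treat vertical line below as dot below.
        text = text.replace(VERTICAL_LINE_BELOW, DOT_BELOW)
    if lowercase:
        # Second variant: lowercase after decomposition, then normalise
        # again as a cheap safeguard so the result is still NFD.
        text = unicodedata.normalize("NFD", text.lower())
    return Counter(ch for ch in text if not ch.isspace())


def describe(ch):
    """Readable label for a code point, e.g. for combining marks."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


if __name__ == "__main__":
    # Illustrative string only; it is not taken from the UDHR.
    sample = "\u1ECCm\u1ECD \u1ECDm\u1ECD"
    for ch, count in char_frequencies(sample, lowercase=True).most_common(5):
        print(count, describe(ch))
```

Counting code points after NFD is what makes the combining diacritics show up as separate entries; counting on NFC text would instead spread them across many precomposed letters.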

With keyboard layouts, if you assume a QWERTY layout, there seem to be a number of alternative approaches to designing the keyboard.

1) Reassign alphabetic keys not needed by a specific language. A useful approach for single-language keyboards. Add the removed letters to AltGr and Shift+AltGr key sequences.

2) Reassign limited-use non-alphanumeric keys.

3) Reassign limited-use non-alphanumeric characters on primary digit keys.

Various European keyboards use strategies 2) and 3).

4) Move digits to AltGr key sequences, and use the digit keys for diacritics and other necessary characters. Assumes users prefer to use number keypads. Used on Microsoft and IBM keyboard layouts for Vietnamese.

5) A dead-key system for extended Latin characters. Common on some SIL keyboard layouts. I have occasionally used this approach on some internal projects. We have a number of keyboards that are QWERTY US layout but have the additional key common to European keyboards. We use this extra key as a composition key (essentially a dead key) to type additional characters.

6) Add additional characters to AltGr and Shift+AltGr key sequences.

The current draft for the keyboard layouts uses a mix of 1), 2) and 6).
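One rough way to compare where such approaches place the combining diacritics is to count key presses for the same text under different level assignments. The sketch below is only an illustration: the cost model (one press for level 1, two for Shift or AltGr, three for Shift+AltGr) and both level tables are assumptions, not figures from the draft layouts.

```python
import unicodedata

# Assumed cost model (not from the thread): key presses needed to reach
# each keyboard level. Level 3 is AltGr, level 4 is Shift+AltGr.
PRESSES_PER_LEVEL = {1: 1, 2: 2, 3: 2, 4: 3}

ACUTE, GRAVE, DOT_BELOW = "\u0301", "\u0300", "\u0323"

# Two hypothetical placements of the combining diacritics.
MARKS_ON_LEVEL_3 = {ACUTE: 3, GRAVE: 3, DOT_BELOW: 3}
MARKS_ON_LEVEL_1 = {ACUTE: 1, GRAVE: 1, DOT_BELOW: 1}


def total_presses(text, levels):
    """Sum key presses for NFD text; unlisted characters default to
    level 2 if uppercase, otherwise level 1."""
    text = unicodedata.normalize("NFD", text)
    presses = 0
    for ch in text:
        level = levels.get(ch, 2 if ch.isupper() else 1)
        presses += PRESSES_PER_LEVEL[level]
    return presses


if __name__ == "__main__":
    word = "\u1ECD\u0301"  # o with dot below and acute, as one example
    print("marks on level 3:", total_presses(word, MARKS_ON_LEVEL_3))
    print("marks on level 1:", total_presses(word, MARKS_ON_LEVEL_1))
```

A count like this ignores finger travel and hand alternation, so it can only hint at which placement is cheaper for a given text.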

Probably one other comment. It is important to be sensitive to potential users' perceptions about their languages. I remember having discussions with a few people regarding a layout I was working on for Igbo. The people I was consulting with felt that the letters of the alphabet should be accessible from the basic (1st/2nd) level of the keyboard. The sub-dotted vowels and n-dot-above are distinct letters of the Igbo alphabet and should be treated differently from other characters with diacritics.

That doesn't mean they need to be single precomposed characters, since each key could easily output more than one character (including base character and combining diacritic).
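As a small illustration of a key emitting more than one character, the sketch below maps hypothetical key names to a base letter followed by a combining diacritic. The key names and the mapping table are invented for the example and do not come from any actual draft layout; the NFC step only shows that these particular sequences also have precomposed equivalents in Unicode.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE

# Hypothetical key assignments: one key press, two code points out.
KEY_OUTPUT = {
    "key_i_dot": "i" + DOT_BELOW,
    "key_o_dot": "o" + DOT_BELOW,
    "key_u_dot": "u" + DOT_BELOW,
    "key_n_dot": "n" + DOT_ABOVE,
}


def type_keys(keys):
    """Return the text produced by a sequence of key names.
    Names not in KEY_OUTPUT are passed through as literal characters."""
    return "".join(KEY_OUTPUT.get(key, key) for key in keys)


if __name__ == "__main__":
    typed = type_keys(["key_n_dot", "a"])
    print([f"U+{ord(ch):04X}" for ch in typed])
    composed = unicodedata.normalize("NFC", typed)
    print([f"U+{ord(ch):04X}" for ch in composed])
```

From the user's point of view each of these is still one key for one letter of the alphabet, whatever the underlying code point sequence is.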

It just means that the success of the layouts may be determined by how accurately users feel the keyboard layout supports and matches their perception of their language and alphabet.

Long ago I decided there is no such thing as the ideal layout. The usability and ergonomic factors of a keyboard layout design can be optimised within the constraints of the design parameters.

But a user's perception of the keyboard is just as important a factor in the layout design's success.

Just my two cents worth.

Andrew

On Mon, June 30, 2008 9:35 am, Don Osborn wrote:
> Hi Andrew, One problem we run up against in talking about various advanced
> applications is the issue of corpora. There is a need to find ways to (1)
> more effectively digitize existing text, and (2) generate new text. On the
> former (1), I would really like to see a project to (a) assure that extended
> Latin texts already scanned for projects like Google books are OCR'd
> properly when extended Latin and diacritics are involved (I've written that
> particular project about that already), and (b) a new/additional focused
> effort be undertaken to digitize all extant texts in under-resourced
> languages. On the latter (2), Mark Liberman and colleagues at the
> Linguistic Data Consortium (University of Pennsylvania) have an interesting
> project concept for involving school students transcribing oral histories
> that then could become part of local heritage resources as well as
> developing the corpora for the languages (makes me wonder if OLPC and
> similar projects could be involved in a pilot effort along these lines).
>
>
>
> That said, and returning to the topic of analyzing keyboards: I would hope
> that even a relatively small amount of text could in the meantime give us an
> idea how efficient alternative keyboard layouts are. We can sort of give an
> educated guess about what might be more advantageous in one way or another
> of particular key arrangements, but until we can begin to collect and
> statistically analyze basic data on keystrokes, etc. it is just estimates.
> With small texts that are probably not "representative samplings" (if such a
> thing were possible in language), there is a risk that a particular text
> could give a misleading result. But at this stage in discussion we may be
> just talking about beginning to get some better ideas about the efficiency
> of alternative layouts.
>
>
>
> Don
>
>
>
>
>
>
>
>
> From: a12n-collaboration-bounces@xxxxxxxxxxxx
> [mailto:a12n-collaboration-bounces@xxxxxxxxxxxx] On Behalf Of Andrew Cunningham
> Sent: Sunday, June 29, 2008 8:23 AM
> To: Tunde Adegbola
> Cc: keyboards@xxxxxxxxxxxxx; 'A12n tech support'; Don Osborn; 'Indigenous Languages and Technology'
> Subject: [A12n-Collab] Re: [PALNet-general] Utilities for analyzing keyboards?
>
>
>
> Don,
>
> Your second tool would necessitate having a large corpus in each language to
> use for the analysis.
>
> As a quick experiment, I thought I'd look at some character frequencies in a
> single text, just an experiment, since a single text couldn't be considered
> adequate for a proper analysis.
>
> Since the draft Yoruba keyboard layout uses combining diacritics for all the
> diacritics, I took the Yoruba translation of the UDHR. Then normalised the
> text using NFD. I then ran it through a script to count the occurrence of
> each character.
>
> Of the four most frequent characters, three were the combining diacritics:
> acute, grave and dot-below. Although a single text is inconclusive, it is
> suggestive that for Yoruba the combining diacritics need to be typed
> frequently and should be in positions allowing them to be typed easily and
> quickly.
>
> And yes, I converted the vertical line below to a dot below before running
> the test on the UDHR translation.
>
> Andrew
> --
> Andrew Cunningham
> Research and Development Coordinator
> Vicnet
> State Library of Victoria
> Australia
>
> andrewc@xxxxxxxxxxxxx
>
> _______________________________________________
> PALNet-general mailing list
> PALNet-general@xxxxxxxxxxxxxxx
> http://lists.panafril10n.net/cgi-bin/mailman/listinfo/palnet-general
>


--
Andrew Cunningham
Research and Development Coordinator
Vicnet
State Library of Victoria
Australia

andrewc@xxxxxxxxxxxxx

Last Updated: Sat Jul 05 11:57:20 2008
