Mailing List Hosted on Kabissa - Space for Change in Africa

a12n-collaboration Mailing List Archive: Re: [A12n-Collab] RE: Utilities for analyzing keyboards?


  • Subject: Re: [A12n-Collab] RE: Utilities for analyzing keyboards?
  • From: Andrew Cunningham <andrewc@xxxxxxxxxxxxx>
  • Date: Mon, 30 Jun 2008 14:37:18 +1000
  • Organization: Vicnet, State Library of Victoria
I'd settle for a Unicode-based OCR application to which you could add new languages and that you could train for each one, including identifying the alphabet for each language, the punctuation, and any auxiliary characters that may be encountered, especially in older texts. Most OCR software doesn't seem to lend itself easily to this purpose.


Andrew


Don Osborn wrote:

Hi Andrew,

One problem we run up against when talking about advanced applications is the issue of corpora. There is a need to find ways to (1) digitize existing text more effectively, and (2) generate new text. On the former (1), I would really like to see (a) a project to ensure that extended Latin texts already scanned for projects like Google Books are OCR'd properly when extended Latin characters and diacritics are involved (I've already written to that particular project about this), and (b) a new, focused effort to digitize all extant texts in under-resourced languages. On the latter (2), Mark Liberman and colleagues at the Linguistic Data Consortium (University of Pennsylvania) have an interesting project concept in which school students transcribe oral histories that could then become part of local heritage resources as well as help develop corpora for the languages. (It makes me wonder whether OLPC and similar projects could be involved in a pilot effort along these lines.)

That said, and returning to the topic of analyzing keyboards: I would hope that even a relatively small amount of text could, in the meantime, give us an idea of how efficient alternative keyboard layouts are. We can make educated guesses about the relative advantages of particular key arrangements, but until we can begin to collect and statistically analyze basic data on keystrokes, etc., these remain estimates. With small texts that are probably not "representative samples" (if such a thing is even possible in language), there is a risk that a particular text could give a misleading result. But at this stage of the discussion we may just be talking about beginning to get a better idea of the efficiency of alternative layouts.
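As a rough illustration of the kind of keystroke statistics described above, the Python sketch below scores a layout by the average number of key presses needed per character of NFD-normalised text. Everything in it is an assumption for illustration: the function name `keystrokes_per_char`, the two hypothetical cost tables `LAYOUT_A` (combining acute, grave and dot below on unshifted keys) and `LAYOUT_B` (each of those marks behind AltGr), and the sample phrase. Every other character is assumed to cost one press, which ignores Shift, dead keys and other real layout details.

```python
import unicodedata
from collections import Counter

# Hypothetical per-character costs in key presses; neither table describes
# a real or proposed layout.
ACUTE, GRAVE, DOT_BELOW = "\u0301", "\u0300", "\u0323"
LAYOUT_A = {ACUTE: 1, GRAVE: 1, DOT_BELOW: 1}  # marks on unshifted keys
LAYOUT_B = {ACUTE: 2, GRAVE: 2, DOT_BELOW: 2}  # marks behind AltGr


def keystrokes_per_char(text: str, cost: dict[str, int], default: int = 1) -> float:
    """Average key presses per code point of the NFD-normalised text.

    Characters missing from `cost` are assumed to take `default` presses
    (this ignores Shift, dead keys and other layout details).
    """
    counts = Counter(unicodedata.normalize("NFD", text))
    total_chars = sum(counts.values())
    if total_chars == 0:
        return 0.0
    presses = sum(cost.get(ch, default) * n for ch, n in counts.items())
    return presses / total_chars


sample = "Ẹ kú àárọ̀"  # a short illustrative phrase, not a representative corpus
for name, layout in (("A", LAYOUT_A), ("B", LAYOUT_B)):
    print(name, round(keystrokes_per_char(sample, layout), 3))
```

As noted above, a short sample like this may give a misleading result; the comparison only becomes meaningful with a larger body of text.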

Don

*From:* a12n-collaboration-bounces@xxxxxxxxxxxx [mailto:a12n-collaboration-bounces@xxxxxxxxxxxx] *On Behalf Of *Andrew Cunningham
*Sent:* Sunday, June 29, 2008 8:23 AM
*To:* Tunde Adegbola
*Cc:* keyboards@xxxxxxxxxxxxx; 'A12n tech support'; Don Osborn; 'Indigenous Languages and Technology'
*Subject:* [A12n-Collab] Re: [PALNet-general] Utilities for analyzing keyboards?

Don,

Your second tool would require a large corpus in each language for the analysis.

As a quick experiment, I thought I'd look at character frequencies in a single text. It is only an experiment, since a single text can't be considered adequate for a proper analysis.

Since the draft Yoruba keyboard layout uses combining diacritics for all the diacritics, I took the Yoruba translation of the UDHR, normalised the text to NFD, and then ran it through a script to count the occurrences of each character.
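The message doesn't include the counting script, so the following is only a minimal Python sketch of the same idea, not the script that was actually used. The function names `char_frequencies`, `describe` and `top_characters` are hypothetical. It counts every code point, including spaces and line breaks; the message doesn't say whether those were excluded.

```python
import unicodedata
from collections import Counter


def char_frequencies(text: str) -> Counter:
    """Count code points after NFD normalisation, so a precomposed letter
    such as U+00E1 is counted as 'a' plus U+0301 COMBINING ACUTE ACCENT."""
    return Counter(unicodedata.normalize("NFD", text))


def describe(ch: str) -> str:
    """Unicode name of `ch`, or its U+XXXX code for unnamed code points."""
    return unicodedata.name(ch, f"U+{ord(ch):04X}")


def top_characters(text: str, n: int = 4) -> list[tuple[str, int]]:
    """The `n` most frequent code points as (name, count) pairs."""
    return [(describe(ch), count) for ch, count in char_frequencies(text).most_common(n)]
```

To run it over a saved copy of the translation, something like `with open("udhr_yoruba.txt", encoding="utf-8") as f: print(top_characters(f.read()))` would do; the filename is a placeholder.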

Of the four most frequent characters, three were the combining diacritics: acute, grave and dot below. Although a single text is inconclusive, this suggests that for Yoruba the combining diacritics need to be typed frequently and should be placed where they can be typed easily and quickly.
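On a layout that enters every diacritic as a separate combining mark, each NFD code point corresponds roughly to one keystroke, so the share of keystrokes spent on diacritics can be estimated directly. The sketch below assumes that one-to-one model; the function name `combining_mark_share` is hypothetical, and "combining mark" here means any code point in Unicode general category M (Mn, Mc or Me).

```python
import unicodedata
from collections import Counter


def combining_mark_share(text: str) -> tuple[float, Counter]:
    """Fraction of NFD code points that are combining marks (general
    category M*), plus a count of each mark by its Unicode name."""
    nfd = unicodedata.normalize("NFD", text)
    marks = Counter(
        unicodedata.name(ch)
        for ch in nfd
        if unicodedata.category(ch).startswith("M")
    )
    share = sum(marks.values()) / len(nfd) if nfd else 0.0
    return share, marks
```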

And yes, I converted the vertical line below to a dot below before running the test on the UDHR translation.
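The message doesn't say exactly how the conversion was done. One possible approach, sketched here under that assumption, is to decompose to NFD and then swap U+0329 COMBINING VERTICAL LINE BELOW for U+0323 COMBINING DOT BELOW. The function name `vertical_line_to_dot_below` is hypothetical.

```python
import unicodedata

VERTICAL_LINE_BELOW = "\u0329"  # COMBINING VERTICAL LINE BELOW
DOT_BELOW = "\u0323"            # COMBINING DOT BELOW


def vertical_line_to_dot_below(text: str) -> str:
    """Decompose to NFD, then replace every U+0329 with U+0323.

    Both marks have canonical combining class 220, so the swap keeps the
    marks in canonical order relative to the acute and grave (class 230)
    and the result is still NFD.
    """
    nfd = unicodedata.normalize("NFD", text)
    return nfd.replace(VERTICAL_LINE_BELOW, DOT_BELOW)
```

Because the two marks share a combining class, doing the swap after decomposition avoids any need to reorder the marks that follow the base letter.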

Andrew
--
Andrew Cunningham
Research and Development Coordinator
Vicnet
State Library of Victoria
Australia

andrewc@xxxxxxxxxxxxx


--
Andrew Cunningham
Vicnet Research and Development Coordinator
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@xxxxxxxxxxxxx
Alt email: lang.support@xxxxxxxxx

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au



Last Updated: Wed Jul 02 08:48:37 2008
