a12n-forum Mailing List Archive: [A12n-forum] Re: FW: Resource-Scarce Languages - ELECTRONIC RESOURCES & PEOPLE NEEDED
My general comment on this...

I fail to see how *proprietary* language technology built on a platform that few can afford advances our "Resource-Scarce Languages". This kind of work actually holds back development in these languages and does little to advance them, since most people end up not buying the products (although here it seems they might be given away). Spell checkers developed as Open Source would allow the technology to be reapplied on the web, cellphones, etc., and would allow advances in speech technology. The work of CTexT will only run on Microsoft Office - nothing else.

I need to restate this, as there always seems to be confusion: doing Open Source spell checkers does not preclude them from running on Microsoft, since Open Source is a development strategy.

In South Africa we have seen little advance in the use of spell checkers, and little benefit from research into spell checkers, since all the work is proprietary. When we reviewed the literature for work on Translate.org.za's spell checkers, the data was so sparse that the only benefit we got was the understanding that a Nguni spell checker would be hard :) - we knew that anyway. So much for academic papers advancing the field.

By sharing we enable others, and all Bantu languages can benefit, so the work is not limited to a handful of top languages. Now that is true enabling of the real resource-scarce languages.

I guess it bothers me doubly when an academic group produces closed, secret work for our languages, which need a massive boost to be able to compete and advance.

I know many people will collaborate with CTexT. Just be honest about the impact, and don't be disappointed when nobody uses the results. My request is that you ensure that all your work is freely available, so that as an academic you can forward your own career and advance your language, and so that others can reuse it for creating spell checkers in other languages and free spell checkers in that language.

Also check that your work doesn't actually go against the national IT policy, which might support Open Source. That has happened here in South Africa, where CTexT will develop proprietary spell checkers for South Africa's Department of Arts and Culture (who seemed confused between OpenOffice and Open Source) even though the national government has a policy to prefer Open Source. I don't think we have yet felt the damage that this will do to our languages; let's hope it doesn't happen to yours.

On Sat, 2007-02-10 at 09:02 -0500, Don Osborn wrote:
> FYI...
>
> -----Original Message-----
> From: Martin Puttkammer [mailto:Martin.Puttkammer@xxxxxxxxx]
> Sent: Thursday, February 08, 2007 8:14 AM
> To: dzo@xxxxxxxxxxxx
> Subject: Resource-Scarce Languages - ELECTRONIC RESOURCES AND PEOPLE NEEDED
>
> APOLOGIES FOR CROSS-POSTINGS OR IF THIS DOESN'T DIRECTLY CONCERN YOU
>
> The Centre for Text Technology (CTexT) at the North-West University (South
> Africa) is developing proprietary spelling checkers for various African
> languages. In order to conduct this project successfully, we are currently
> sourcing various resources, most notably electronic resources (word lists,
> corpora, etc.) and people (experts and assistants) to contribute to the
> project.
>
> BACKGROUND
>
> Microsoft's Local Language Program is a global initiative to provide desktop
> software and tools to their customers by collaborating with local experts
> (governments, universities and other interested parties) to help build a
> robust local IT economy to:
>
> - Help bridge the language and digital divide between developed and emerging
> markets.
> - Help preserve language and culture. Help technology impact language and
> culture in a positive way.
> - Help maintain the connections between communities.
>
> Proofing tools such as spelling checkers and grammar checkers are important
> human language technology resources that enable speakers of the language to
> preserve and promote their language and culture while benefiting from
> Information Technology advancements. In this project, proprietary spelling
> checkers for various languages will be developed in cooperation with expert
> communities to ensure that the local languages are well defined and
> represented. Hence, the Centre for Text Technology (CTexT) at the North-West
> University (South Africa) is looking for co-workers to assist in the
> development of lexical data to be used in spelling checkers for:
>
> - Hausa,
> - Igbo,
> - Kinyarwanda,
> - Wolof, and
> - Yoruba.
>
> Assisting in this project will help promote communication and interaction in
> these languages.
>
> RESOURCES NEEDED:
>
> We have the following needs:
>
> Electronic Resources
> 1. Common and specialist word lists (such as lists of common spelling
> mistakes, lists of abbreviations, phonetic similarities, repetitive words,
> hyphen words, etc.).
> 2. Corpora, dictionaries, and books in electronic format.
> 3. A balanced corpus of 30,000 words (for testing purposes).
> 4. Rules for morphologically productive word formation processes, plus word
> lists to which these rules apply.
>
> People
> 1. Linguists and/or language practitioners who can assist in the quality
> control of word lists.
> 2. Linguists who can assist in the compilation and/or refinement of
> morphological rules and rules for tokenisation.
> 3. Linguists who can provide a description of the standard written variant
> of the languages, as well as an annotated paragraph of 500-1000 words.
>
> Should you have access to resources in respect of this project: kindly
> submit a brief description of what you can provide us with, as well as an
> indication of the conditions under which you would be prepared to make these
> available to us.
>
> If you are a linguist or language practitioner interested in working on the
> project: kindly submit a description or shortened CV to highlight your
> relevant expertise and/or experience. Please note that linguists looking to
> become co-workers should comply with the following prerequisites:
> 1. Be computer literate and have regular access to email.
> 2. Have expert knowledge of the standard written variant of the language(s)
> they intend to work on.
> 3. Be able to commence work in February 2007.
> 4. Be able and willing to travel to South Africa for training, if needed.
>
> Please send information per email only to:
>
> Martin Puttkammer
> Programme Manager: Proofing Tools
> Centre for Text Technology (CTexT), North-West University, South Africa
> Martin.Puttkammer@xxxxxxxxx
> +27 18 299 1512
>
> Kindly forward this message to other colleagues who might be interested in
> this project.
>
> We also welcome comments and suggestions regarding this project.
>
> CTexT reserves the right to accept or refuse any offers pertaining to this
> announcement at its sole discretion.
>
> This message (and attachments) is subject to restrictions and a disclaimer.
> Please refer to http://www.puk.ac.za/itb/e-pos/disclaimer.html for full
> details, or at itbsekr@xxxxxxxxxxxxxxxx
>
> _______________________________________________
> A12n-forum mailing list
> A12n-forum@xxxxxxxxxxxx
> http://lists.kabissa.org/mailman/listinfo/a12n-forum

--
Dwayne Bailey
Translate.org.za
+27-12-460-1095 (w)
+27-83-443-7114 (cell)

Last Updated: Wed Mar 14 23:48:29 2007
a12n-forum is hosted on Kabissa - Space for Change in Africa