Mailing List Hosted on Kabissa - Space for Change in Africa

a12n-forum Mailing List Archive: [A12n-forum] Re: FW: Resource-Scarce Languages - ELECTRONIC RESOURCES & PEOPLE NEEDED

  • Subject: [A12n-forum] Re: FW: Resource-Scarce Languages - ELECTRONIC RESOURCES & PEOPLE NEEDED
  • From: Dwayne Bailey <dwayne@xxxxxxxxxxxxxxxx>
  • Date: Wed, 14 Feb 2007 14:16:36 +0200
  • Organization: Translate.org.za

My general comment on this...

I fail to see how *proprietary* language technology built on a platform
that few can afford advances our "Resource-Scarce Languages".  This kind
of work actually holds back development in these languages and does
little to advance them, since most people end up not buying the products
(although here it seems they might be given away).

Spell checkers developed as Open Source would allow the technology to be
reapplied on the web, on cellphones, etc., and would feed advances in
speech technology.  The work of CTexT will only run on Microsoft Office -
nothing else.

I need to restate this as there always seems to be confusion.  Developing
Open Source spell checkers does not preclude them from running on
Microsoft products, since Open Source is a development strategy.

In South Africa we have seen little advance in the use of spell checkers
and little benefit from research into them, since all the work is
proprietary.  When we reviewed the literature for work on
Translate.org.za's spell checkers, the data was so sparse that the only
benefit we got was the understanding that a Nguni spell checker would be
hard :) - we knew that anyway.  So much for academic papers advancing the
field.  By sharing we enable other, indeed all, Bantu languages to
benefit, and the work is thus not limited to a handful of top languages.
Now that is true enabling of the real resource-scarce languages.

I guess it bothers me doubly when an academic group produces closed,
secret work for our languages, which need a massive boost to be able to
compete and advance.

I know many people will collaborate with CTexT.  Just be honest about
the impact and don't be disappointed when nobody uses the spell
checkers.  My request is that you ensure that all your work is freely
available so that as an academic you can further your own career and
advance your language, and so that others can reuse it to create spell
checkers in other languages and free spell checkers in your language.

Also check that your work doesn't actually go against your national IT
policy, which might support Open Source.  That has happened here in South
Africa, where CTexT will develop proprietary spell checkers for South
Africa's Department of Arts and Culture (who seemed confused between
OpenOffice and Open Source) even though the national government has a
policy to prefer Open Source.  I don't think we have yet felt the damage
that this will do to our languages; let's hope it doesn't happen to
yours.

On Sat, 2007-02-10 at 09:02 -0500, Don Osborn wrote:
> FYI...
> 
> -----Original Message-----
> From: Martin Puttkammer [mailto:Martin.Puttkammer@xxxxxxxxx] 
> Sent: Thursday, February 08, 2007 8:14 AM
> To: dzo@xxxxxxxxxxxx
> Subject: Resource-Scarce Languages - ELECTRONIC RESOURCES AND PEOPLE NEEDED
> 
> APOLOGIES FOR CROSS-POSTINGS OR IF THIS DOESN'T DIRECTLY CONCERN YOU
> 
> The Centre for Text Technology (CTexT) at the North-West University (South
> Africa) is developing proprietary spelling checkers for various African
> languages. In order to conduct this project successfully, we are currently
> sourcing various resources, most notably electronic resources (word lists,
> corpora, etc.) and people (experts and assistants) to contribute to the
> project.
> 
> BACKGROUND
> 
> Microsoft's Local Language Program is a global initiative to provide desktop
> software and tools to their customers by collaborating with local experts
> (governments, universities and other interested parties) to help build a
> robust local IT economy to:
> 
> - Help bridge the language and digital divide between developed and emerging
> markets.
> - Help preserve language and culture. Help technology impact language and
> culture in a positive way.
> - Help maintain the connections between communities.
> 
> Proofing tools such as spelling checkers and grammar checkers are important
> human language technology resources that enable speakers of the language to
> preserve and promote their language and culture while benefiting from
> Information Technology advancements. In this project, proprietary spelling
> checkers for various languages will be developed in cooperation with expert
> communities to ensure that the local languages are well defined and
> represented. Hence, the Centre for Text Technology (CTexT) at the North-West
> University (South Africa) is looking for co-workers to assist in the
> development of lexical data to be used in spelling checkers for:
> 
> - Hausa, 
> - Igbo, 
> - Kinyarwanda, 
> - Wolof, and 
> - Yoruba. 
> 
> Assisting in this project will help promote communication and interaction in
> these languages.
> 
> RESOURCES NEEDED:
> 
> We have the following needs:
> 
> Electronic Resources
> 1. Common and specialist word lists (such as lists of common spelling
> mistakes, lists of abbreviations, phonetic similarities, repetitive words,
> hyphen words, etc.).
> 2. Corpora, dictionaries, and books in electronic format.
> 3. A balanced corpus of 30,000 words (for testing purposes).
> 4. Rules for morphologically productive word formation processes, plus word
> lists to which these rules apply.
> 
> People
> 1. Linguists and/or language practitioners who can assist in the quality
> control of word lists.
> 2. Linguists who can assist in the compilation and/or refinement of
> morphological rules and rules for tokenisation.
> 3. Linguists who can provide a description of the standard written variant
> of the languages, as well as an annotated paragraph of 500-1000 words.
> 
> Should you have access to resources in respect of this project: kindly
> submit a brief description of what you can provide us with, as well as an
> indication of the conditions under which you would be prepared to make these
> available to us.
> 
> If you are a linguist or language practitioner interested in working on the
> project: kindly submit a description or shortened CV to highlight your
> relevant expertise and/or experience. Please note that linguists looking to
> become co-workers should comply with the following prerequisites: 
> 1. Be computer literate and have regular access to email.
> 2. Have expert knowledge of the standard written variant of the language(s)
> they intend to work on.
> 3. Be able to commence work in February 2007.
> 4. Be able and willing to travel to South Africa for training, if needed.
> 
> Please send information per email only to:
> 
> Martin Puttkammer
> Programme Manager: Proofing Tools
> Centre for Text Technology (CTexT), North-West University, South Africa
> Martin.Puttkammer@xxxxxxxxx 
> +27 18 299 1512
> 
> 
> 
> Kindly forward this message to other colleagues who might be interested in
> this project.
> 
> We also welcome comments and suggestions regarding this project.
> 
> CTexT reserves the right to accept or refuse any offers pertaining to this
> announcement at its sole discretion.
-- 
Dwayne Bailey
Translate.org.za

+27-12-460-1095 (w)
+27-83-443-7114 (cell)


Last Updated: Wed Mar 14 23:48:29 2007
