Papers by Ruvan Weerasinghe
The cumulative effort that has gone into developing linguistic resources over the past few decades, for tasks ranging from machine-readable dictionaries to translation systems, is enormous. Such effort is prohibitively expensive for languages outside the (largely) European family. The possibility of building such resources automatically from electronic corpora of these languages is therefore of great interest to those studying these ‘new’ or ‘lesser-known’ languages. The main obstacle to applying data-driven techniques directly is that most of them require large corpora, which are rarely available for such languages. This paper describes an attempt to set up a bootstrapping agenda that exploits the scarce corpus resources a researcher working on such languages may have at the outset. In particular, it reports the results of an experiment that used state-of-the-art data-driven techniques to build linguistic resources for Sinhala, a non-European language with virtually no electronic resources.
This paper brings together the development of the first Text-to-Speech (TTS) system for Sinhala, built on the Festival framework, and its practical applications. It describes the construction of a diphone database and the implementation of the natural language processing modules. The paper also presents the methodology for accepting Sinhala Unicode text directly as input, by rewriting the letter-to-sound rules in Festival’s context-sensitive rule format, and the implementation of a Sinhala syllabification algorithm. A Modified Rhyme Test (MRT) conducted to evaluate the intelligibility of the synthesized speech yielded a score of 71.5% for the TTS system.
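Festival's letter-to-sound rules have the general shape `( LC [ ITEMS ] RC = NEWITEMS )`: the letters in brackets are rewritten as the output phones when the left and right contexts match, and rules are tried in order. The Python sketch below imitates only that ordered, context-sensitive matching, assuming plain literal contexts (Festival's character sets and repetition operators are omitted). The rule table, the function name `apply_lts` and the romanized phone labels are hypothetical illustrations, not the rules from the paper; the inherent vowel is simply written as `a`.

```python
from typing import List, Tuple

# One rule: (left context, letters consumed, right context, output phones).
# Contexts are literal letter sequences; "#" marks a word boundary.
Rule = Tuple[List[str], List[str], List[str], List[str]]

KA = "\u0d9a"          # SINHALA LETTER ALPAPRAANA KAYANNA (ka)
MA = "\u0db8"          # SINHALA LETTER MAYANNA (ma)
AL_LAKUNA = "\u0dca"   # SINHALA SIGN AL-LAKUNA (suppresses the inherent vowel)
AELA_PILLA = "\u0dcf"  # SINHALA VOWEL SIGN AELA-PILLA (long aa)

# Hypothetical toy rules, NOT the paper's rule set. Longer items come first so
# that a bare consonant only falls through to its inherent-vowel rule.
RULES: List[Rule] = [
    ([], [KA, AL_LAKUNA], [], ["k"]),
    ([], [KA, AELA_PILLA], [], ["k", "aa"]),
    ([], [KA], [], ["k", "a"]),
    ([], [MA, AL_LAKUNA], [], ["m"]),
    ([], [MA, AELA_PILLA], [], ["m", "aa"]),
    ([], [MA], [], ["m", "a"]),
]


def apply_lts(word: str, rules: List[Rule]) -> List[str]:
    """Rewrite a word into phones, trying rules in order at each position."""
    letters = ["#"] + list(word) + ["#"]  # pad with boundary markers
    pos, phones = 1, []
    while pos < len(letters) - 1:
        for left, items, right, out in rules:
            start, end = pos - len(left), pos + len(items)
            if (start >= 0
                    and letters[start:pos] == left
                    and letters[pos:end] == items
                    and letters[end:end + len(right)] == right):
                phones.extend(out)  # first matching rule wins
                pos = end           # consume the bracketed letters
                break
        else:
            raise ValueError(f"no rule matches {letters[pos]!r} at position {pos}")
    return phones


if __name__ == "__main__":
    print(apply_lts(KA + AELA_PILLA + MA, RULES))  # ['k', 'aa', 'm', 'a']
```

Ordering the `[consonant, sign]` items ahead of the bare consonant is what lets the bare-consonant rule act as the default inherent-vowel case; a real rule set would also use non-empty contexts, which `apply_lts` already checks.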
This paper presents a study of Sinhala syllable structure and an algorithm for identifying syllables in Sinhala words. After a thorough study of the syllable structure and the linguistic rules for syllabifying Sinhala words, and a survey of the relevant literature, a set of rules was identified and implemented as a simple, easy-to-implement algorithm. The algorithm was tested on 30,000 distinct words obtained from a corpus, and its output was compared with manual syllabifications of the same words. It achieved 99.95% accuracy.
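The abstract does not list the rules themselves, so the sketch below uses a deliberately simplified, hypothetical rule set to show the general shape of rule-based syllabification and of an exact-match evaluation against manual syllabifications. It assumes words have already been converted to romanized phoneme lists (rather than Sinhala script) and that every syllable has exactly one vowel nucleus; the vowel inventory and the names `syllabify` and `accuracy` are illustrative. Read as exact-match accuracy, 99.95% over 30,000 words would correspond to about 15 mis-syllabified words (30,000 × 0.0005 = 15).

```python
from typing import List

# Assumed romanized vowel inventory (hypothetical; for illustration only).
VOWELS = {"a", "aa", "ae", "aee", "i", "ii", "u", "uu", "e", "ee", "o", "oo"}


def syllabify(phonemes: List[str]) -> List[List[str]]:
    """Split a phoneme list into syllables with a simplified, hypothetical rule set.

    Each syllable gets exactly one vowel nucleus. Between two vowels:
      - zero or one consonant: break before it (V.V, V.CV);
      - two or more consonants: the first closes the earlier syllable,
        the rest open the next one (VC.CV, VC.CCV).
    Consonants before the first vowel or after the last vowel stay in the
    first or last syllable respectively.
    """
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not nuclei:
        return [list(phonemes)]  # no vowel: keep the word as one unit
    breaks = []
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = right - left - 1  # number of consonants between the vowels
        breaks.append(left + 1 if cluster <= 1 else left + 2)
    syllables, start = [], 0
    for b in breaks:
        syllables.append(list(phonemes[start:b]))
        start = b
    syllables.append(list(phonemes[start:]))
    return syllables


def accuracy(words: List[List[str]], gold: List[List[List[str]]]) -> float:
    """Fraction of words whose predicted syllables exactly match the manual ones."""
    if not words or len(words) != len(gold):
        raise ValueError("need equally long, non-empty word and gold lists")
    correct = sum(syllabify(w) == g for w, g in zip(words, gold))
    return correct / len(words)


if __name__ == "__main__":
    print(syllabify(["k", "a", "l", "a", "y", "a"]))  # [['k', 'a'], ['l', 'a'], ['y', 'a']]
```

Breaking after the first consonant of a cluster is only one possible policy; a rule set tuned to Sinhala would need further cases, and the evaluation loop stays the same whatever rules `syllabify` encodes.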