Letter-to-phoneme conversion is a classic problem in machine learning (ML), as it is both hard (at least for languages like English and French) and important. For non-linguists, a 'phoneme' is an abstract unit corresponding to the equivalence class of physical sounds that 'represent' the same speech sound. That is, members of the equivalence class are perceived by a speaker of the language as the 'same' sound: the word 'cat' consists of three phonemes, two of which are shared with the word 'bat'. A phoneme is defined by its role in distinguishing word pairs like 'bat' and 'cat'. Thus, /b/ and /k/ are different phonemes. But the /b/ in 'bat' and the /b/ in 'tab' are the same phoneme, in spite of their different acoustic realisations, because the difference between them is never used (in English) to distinguish one word from another.
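For readers who prefer code to definitions, the idea of a minimal pair can be sketched in a few lines of Python. This is only an illustration: the toy lexicon, the phoneme symbols (`ae` for the vowel in 'cat') and the function name `minimal_pairs` are assumptions made for this example and do not come from any of the challenge dictionaries.

```python
from itertools import combinations

def minimal_pairs(lexicon):
    """Return (word1, word2, phoneme1, phoneme2) for every pair of words
    whose transcriptions have equal length and differ in exactly one position."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue
        diffs = [i for i, (a, b) in enumerate(zip(p1, p2)) if a != b]
        if len(diffs) == 1:
            i = diffs[0]
            pairs.append((w1, w2, p1[i], p2[i]))
    return pairs

# Hypothetical toy lexicon; the symbols are illustrative only.
toy_lexicon = {
    "bat": ("b", "ae", "t"),
    "cat": ("k", "ae", "t"),
    "tab": ("t", "ae", "b"),
}

for w1, w2, a, b in minimal_pairs(toy_lexicon):
    print(f"{w1} / {w2}: /{a}/ contrasts with /{b}/")
```

On this toy lexicon only 'bat' and 'cat' qualify, which is the sense in which /b/ and /k/ count as different phonemes.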
The problem is important because letter-to-sound conversion is central to the technology of speech synthesis, where input text has to be transformed into a representation that can drive the synthesis hardware, and it is also necessary for some aspects of speech recognition. It is usual to employ phonemic symbols as the basis of this representation. However, letter-to-sound conversion is not a single mapping problem but a class of problems, which includes not just automatic pronunciation but also stress assignment, letter-phoneme alignment, syllabification and/or morphemic decomposition, and so on, hence the PRONALSYL acronym. Although we intend to give most prominence to letter-to-phoneme conversion, the community is challenged to develop and submit innovative solutions to these related problems.
As the specifics of the letter-to-sound problem vary from language to language, we intend that participants try their algorithms on a variety of languages. To this end, we will be making available different dictionaries covering a range of languages. They will minimally give a list of word spellings and their corresponding pronunciations. Be warned that the different dictionaries will typically use different conventions for representing the phonemes of the relevant language; this is all part of the fun. If participants have public-domain dictionaries of other interesting languages that they are willing to donate to the PRONALSYL challenge, we will be very pleased indeed to receive them. Please contact one of the organisers.
Virtually all existing letter-to-phoneme conversion methods require the letters of the word spelling and the phonemes of the pronunciation to be aligned in one-to-one fashion, as a bijection. This converts the string transcription problem to a classification problem. We will pre-align all the dictionaries using our EM-based algorithm before making them available to PRONALSYL participants. We also intend to make available a self-service alignment facility, so that researchers can submit their dictionaries, align them and have the results sent back by email. PLEASE WATCH THIS SPACE.
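To make the step from string transcription to classification concrete, here is a minimal Python sketch of one common way to use an aligned dictionary entry: each letter, together with a fixed window of neighbouring letters, becomes one instance whose class label is the aligned phoneme. The padding symbol `_`, the null phoneme `-` for silent letters, the window width and the transcription of 'make' are all assumptions for illustration; they are not the conventions of the PRONALSYL dictionaries or of our alignment algorithm.

```python
from typing import List, Tuple

PAD = "_"   # assumed padding symbol for positions beyond the word boundary
NULL = "-"  # assumed null phoneme for letters that map to no sound

def windowed_instances(letters: str, phonemes: List[str],
                       context: int = 3) -> List[Tuple[str, str]]:
    """Turn one aligned (spelling, pronunciation) pair into
    (letter window, phoneme label) classification instances."""
    if len(letters) != len(phonemes):
        raise ValueError("alignment must pair each letter with exactly one symbol")
    padded = PAD * context + letters + PAD * context
    instances = []
    for i, target in enumerate(phonemes):
        # The window is centred on letter i of the original spelling.
        window = padded[i : i + 2 * context + 1]
        instances.append((window, target))
    return instances

# Hypothetical aligned entry: the final 'e' of 'make' is aligned with the null phoneme.
for window, target in windowed_instances("make", ["m", "eI", "k", NULL], context=2):
    print(window, "->", target)
```

Any standard classifier can then be trained on such instances, which is why the one-to-one alignment matters: without it there would be no single label to attach to each letter.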
We also hope to make a couple of representative learning algorithms available for participants to use as benchmarks for quick initial assessment of their own algorithms. One of these will be pronunciation by analogy (PbA); the other will probably be a well-known rule induction algorithm. I am negotiating with the owner of the latter to let us use it.
Finally, not everyone is convinced that machine learning is the right way to approach this problem. In particular, there has been a long tradition of expert linguists writing rules manually. These rules are intended to encode the expert's knowledge of spelling-to-sound regularities in the language of interest. We are very keen for participants to donate their own rules for comparison with ML methods and/or to report on such comparisons. An especially interesting issue is whether the relative advantages and disadvantages of rules versus ML approaches vary systematically across languages according to some measure of the complexity of the writing system.
The timetable for the challenge is as follows:
| February 2006 | Challenge goes live |
| 10-12 April 2006 | Preliminary reporting at Pascal workshop in Venice |
| January 2007 | Challenge closes |
The timescale is rather longer than most Pascal challenges, not least because our principal motivation is to produce the best possible result for letter-to-sound conversion rather than to conduct a prize competition. We want to give participants every chance to achieve good performance without being unduly worried about a level playing field.
Organising Committee: