As computer use spreads throughout society and is expected to keep growing, demand is rising for more user-friendly computers with broader applications in everyday life. Speech processing technology enables computers to recognize and understand everyday speech so that natural communication with human beings becomes possible, thereby satisfying user demand for greater convenience. The integration of speech processing technology with Internet content and information technology has paved the way for diverse products and services. Its main applications include call centers with voice recognition and synthesis capabilities, automobile telematics services, search engines that answer natural-language questions, and language translation systems.
Speech processing technology requires interdisciplinary work across electronic engineering, computer engineering, psychology, linguistics, and other fields. So far in Korea, however, speech processing research has been approached primarily from an engineering perspective, which is thought to have limited its development. Although awareness of the need for linguistic integration has recently increased, its application has been limited to very basic linguistic units such as phonemes and morphemes, and there have been few instances in which linguistic methods were integrated into speech processing systems. This project aims to establish phonological and prosodic knowledge in a systematic way so that it can be integrated into spoken language processing technology. The phonological knowledge focuses on the grapheme-to-phoneme conversion of compound nouns, and the prosodic knowledge comes from an analysis of the prosodic characteristics of dialogue speech; both are to be employed in spoken language processing systems.
The first year of the project concentrates on the grapheme-to-phoneme conversion of compound nouns, based on a definition of the basic prosodic units of compound words. A method of extracting words with exceptional pronunciations is proposed, which includes modeling the pronunciation variations that occur at the boundaries of prosodic units. This study will, on the one hand, provide an understanding of the Korean language from a phonological point of view and, on the other hand, enable the systematic development of a multiple-pronunciation lexicon for high-performance Korean TTS and ASR systems.
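One way to picture the extraction of exceptional pronunciations is to compare a rule-based prediction with an observed pronunciation and keep the words where the two disagree. The sketch below is only an illustration under stated assumptions, not the method proposed in the project: the names `default_g2p`, `extract_exceptions`, and `REFERENCE` are hypothetical, the baseline rule simply joins the component pronunciations without any boundary rule, and the three entries are hand-picked examples of compound nouns with and without a sound change at the internal boundary.

```python
from typing import Callable, Dict, Tuple

# Hypothetical reference data: word -> (component pronunciations, observed pronunciation).
# The observed forms are written in Hangul as they are pronounced.
REFERENCE: Dict[str, Tuple[Tuple[str, ...], str]] = {
    "색연필": (("색", "연필"), "생년필"),  # n-insertion and nasalization at the boundary
    "솜이불": (("솜", "이불"), "솜니불"),  # n-insertion at the boundary
    "돌다리": (("돌", "다리"), "돌다리"),  # no change at the boundary
}


def default_g2p(components: Tuple[str, ...]) -> str:
    """Toy baseline (an assumption): join component pronunciations with no boundary rule."""
    return "".join(components)


def extract_exceptions(
    reference: Dict[str, Tuple[Tuple[str, ...], str]],
    g2p: Callable[[Tuple[str, ...]], str] = default_g2p,
) -> Dict[str, str]:
    """Return the words whose observed pronunciation differs from the baseline prediction."""
    exceptions: Dict[str, str] = {}
    for word, (components, observed) in reference.items():
        if g2p(components) != observed:
            exceptions[word] = observed
    return exceptions


if __name__ == "__main__":
    for word, pronunciation in extract_exceptions(REFERENCE).items():
        print(word, "->", pronunciation)
```

Entries collected this way could seed a multiple-pronunciation lexicon, while words that the baseline already predicts correctly need no separate entry. A realistic baseline would apply the regular phonological rules of Korean rather than plain concatenation.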
The second year focuses on the prosodic characteristics of dialogue speech. It proposes a way to extract intonation patterns using Momel, a pitch stylization algorithm, and reports the results of analyzing speech corpora in comparison with those of earlier studies. Furthermore, building on previous studies in this area, it proposes a method of automatically detecting prosodic boundaries from acoustic and grammatical information in order to improve the performance of speech information processing systems.
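The idea of combining acoustic and grammatical information for boundary detection can be sketched as a simple cue-counting rule. Everything in this example is an assumption made for illustration: the abstract does not state which features or thresholds the project uses, so the fields `pause_after_ms`, `f0_reset_hz`, and `ends_clause`, the threshold values, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordToken:
    text: str
    pause_after_ms: float  # acoustic cue: silence following the word (hypothetical feature)
    f0_reset_hz: float     # acoustic cue: F0 rise from this word's end to the next word's start
    ends_clause: bool      # grammatical cue: the word carries a clause-final ending


def boundary_score(
    token: WordToken,
    pause_threshold_ms: float = 200.0,  # illustrative threshold, not from the source
    reset_threshold_hz: float = 20.0,   # illustrative threshold, not from the source
) -> int:
    """Count how many of the three cues point to a prosodic boundary after the token."""
    score = 0
    if token.pause_after_ms >= pause_threshold_ms:
        score += 1
    if token.f0_reset_hz >= reset_threshold_hz:
        score += 1
    if token.ends_clause:
        score += 1
    return score


def detect_boundaries(tokens: List[WordToken], min_score: int = 2) -> List[int]:
    """Return the indices of tokens that are followed by a predicted prosodic boundary."""
    return [i for i, token in enumerate(tokens) if boundary_score(token) >= min_score]


if __name__ == "__main__":
    utterance = [
        WordToken("제주도에", 50.0, 5.0, False),
        WordToken("가려고 하는데요", 350.0, 30.0, True),
        WordToken("비행기", 0.0, 0.0, False),
        WordToken("표가", 250.0, 5.0, False),
        WordToken("있나요", 500.0, 0.0, True),
    ]
    print(detect_boundaries(utterance))
```

Requiring at least two agreeing cues is a design choice that keeps a single noisy measurement, such as a hesitation pause, from triggering a boundary on its own. A trained classifier over the same kinds of features would be a natural replacement for the fixed thresholds.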
The third year concentrates on the intonation patterns of Korean spontaneous speech, investigated through an analysis of four dialogues in the domain of travel planning. The analysis is based on the K-ToBI system, which assumes two hierarchical units, the Accentual Phrase (AP) and the Intonation Phrase (IP), and uses Momel, an intonation stylization algorithm. The results of the study are compared with those for read speech.