
Supervised Detection and Unsupervised Discovery of Pronunciation Error Patterns for Computer-Assisted Language Learning


Graduate Student: 王祐邦
Graduate Student (English): Yow-Bang Wang
Thesis Title (Chinese): 发音偏误模式之督导式侦测与非督导式探勘用于电脑辅助语言学习
Thesis Title (English): Supervised Detection and Unsupervised Discovery of Pronunciation Error Patterns for Computer-Assisted Language Learning
Advisor: 李琳山
Advisor (English): Lin-shan Lee
Oral Examination Committee: 贝苏章、郑士康、徐宏民、吴家麟、林守德、于天立
Oral Examination Committee (English): Soo-Chang Pei, Shyh-Kang Jeng, Winston Hsu, Ja-Ling Wu, Shou-De Lin, Tian-Li Yu
Oral Defense Date: 2013-10-03
Degree: Doctoral
University: National Taiwan University
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis Type: Academic thesis
Year of Publication: 2014
Graduation Academic Year: 102
Language: English
Number of Pages: 61
Keywords (Chinese): 电脑辅助语言学习、电脑辅助发音训练、偏误模式侦测、偏误模式探勘、宇集音素事后机率
Keywords (English): Computer-Assisted Language Learning, Computer-Aided Pronunciation Training, Error Pattern Detection, Error Pattern Discovery, Universal Phoneme Posteriorgram


Pronunciation error patterns (EPs) are patterns of mispronunciation frequently produced by language learners, and they usually differ for different pairs of target and native languages. Accurate information about EPs can offer helpful feedback to learners to improve their language skills. However, the major difficulty of EP detection comes from the fact that EPs are intrinsically similar to their corresponding canonical pronunciation, and EPs corresponding to the same canonical pronunciation are also intrinsically similar to each other. As a result, distinguishing EPs from their corresponding canonical pronunciation, and distinguishing between different EPs of the same phoneme, is a difficult task, perhaps even more difficult than distinguishing between different phonemes of one language. On the other hand, the cost of deriving all EPs for each pair of target and native languages is high, usually requiring extensive expert knowledge or high-quality annotated data. Unsupervised EP discovery from a corpus of learner recordings would thus be an attractive addition to the field.
In this dissertation, we propose new frameworks for both supervised EP detection and unsupervised EP discovery. For supervised EP detection, we use hierarchical multi-layer perceptrons (MLPs) as EP classifiers, integrated with the HMM/GMM baseline in a two-pass Viterbi decoding architecture. Experimental results show that this framework enhances the power of EP diagnosis. For unsupervised EP discovery, we propose the first known framework, using the hierarchical agglomerative clustering (HAC) algorithm to explore sub-segmental variation within phoneme segments and produce fixed-length segment-level feature vectors that distinguish different EPs. We tested K-means (assuming a known number of EPs) and the Gaussian mixture model with the minimum description length principle (GMM-MDL, estimating an unknown number of EPs) for EP discovery. Preliminary experiments offered very encouraging results, although there is still a long way to go to approach the performance of human experts. We also propose using the universal phoneme posteriorgram (UPP), derived from an MLP trained on corpora of mixed languages, as the frame-level feature in both supervised detection and unsupervised discovery of EPs. Experimental results show that using UPP not only achieves the best performance but is also useful in analyzing the mispronunciations produced by language learners.
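To make the unsupervised discovery pipeline more concrete, the sketch below shows one plausible reading of the segment-level feature extraction and clustering steps summarized above: HAC over the frame-level UPP vectors of each phoneme segment yields a fixed number of contiguous sub-segments, their mean vectors are concatenated into a fixed-length segment-level feature, and the segments of one canonical phoneme are then clustered with K-means under an assumed-known number of EPs. This is an illustrative approximation rather than the dissertation's implementation; the function names, the Ward linkage, the chain-connectivity constraint, and the number of sub-segments are all assumptions, and the GMM-MDL variant (which would replace the K-means step) is omitted.

# A minimal sketch (in Python) of the unsupervised EP-discovery steps described
# above, assuming frame-level UPP vectors have already been extracted for each
# phoneme segment. Names, parameters, and library choices are illustrative
# assumptions, not the dissertation's actual implementation.
import numpy as np
from scipy.sparse import diags
from sklearn.cluster import AgglomerativeClustering, KMeans


def segment_to_fixed_vector(upp_frames, n_sub=3):
    """Collapse a variable-length phoneme segment (frames x UPP dimensions) into a
    fixed-length vector: HAC with a chain connectivity constraint splits the segment
    into n_sub contiguous sub-segments, whose mean UPP vectors are concatenated in
    temporal order."""
    if len(upp_frames) < n_sub:
        # Very short segment: duplicate frames so HAC can still form n_sub groups.
        reps = int(np.ceil(n_sub / len(upp_frames)))
        upp_frames = np.repeat(upp_frames, reps, axis=0)
    n = len(upp_frames)
    # Allow only temporally adjacent frames to merge, keeping sub-segments contiguous.
    connectivity = diags([np.ones(n - 1), np.ones(n - 1)], offsets=[-1, 1]).tocsr()
    labels = AgglomerativeClustering(
        n_clusters=n_sub, linkage="ward", connectivity=connectivity
    ).fit_predict(upp_frames)
    order = sorted(set(labels), key=lambda c: np.where(labels == c)[0].mean())
    return np.concatenate([upp_frames[labels == c].mean(axis=0) for c in order])


def discover_eps(segments, n_eps, n_sub=3):
    """Cluster all learner realizations of one canonical phoneme into n_eps groups
    (the assumed-known number of EPs); each resulting cluster is a candidate EP."""
    X = np.stack([segment_to_fixed_vector(seg, n_sub) for seg in segments])
    return KMeans(n_clusters=n_eps, n_init=10, random_state=0).fit_predict(X)

In this sketch, discover_eps would be called once per canonical phoneme, with segments holding the UPP frame matrices of all learner utterances of that phoneme.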


Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1
1.1 Computer assisted language learning . . . . . . . . . . . . . . . . . . . .1
1.2 Major contributions of this dissertation . . . . . . . . . . . . . . . . . . .5
1.3 Thesis organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5
2 Background Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7
2.1 Data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7
2.2 Error Pattern definition and labeling . . . . . . . . . . . . . . . . . . . .8
2.3 Multi-layer perceptron in acoustic modeling . . . . . . . . . . . . . . . . 11
2.4 Universal Phoneme Posteriorgram (UPP) . . . . . . . . . . . . . . . . . 14
3 Supervised Detection of Pronunciation Error Patterns . . . . . . . . . . . . . . 17
3.1 Acoustic modeling for EPs . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 EP detection framework based on the hybrid approach . . . . . . . . . . 20
3.3 Hierarchical MLPs as the EP classifiers . . . . . . . . . . . . . . . . . . 22
3.4 EP diagnosis confidence estimation . . . . . . . . . . . . . . . . . . . . . 23
3.5 Evaluation metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.6 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.7 Experimental results and discussion . . . . . . . . . . . . . . . . . . . . 28
3.8 Complementarity analysis for the EP classifiers and EP AMs in the proposed framework . . . . . . . . . . . . 30
4 Unsupervised Discovery of Pronunciation Error Patterns . . . . . . . . . . . . 34
4.1 Framework overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Hierarchical Agglomerative Clustering (HAC) and Segment-level Feature Vectors . . . . . . . . . . . . . . . . 36
4.3 Unsupervised Clustering Algorithms for EP Discovery . . . . . . . . . . 39
4.4 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.6 Experimental results (I) – K-means with assumed known number of EPs . 42
4.7 Experimental results (II) – GMM-MDL with automatically estimated number of EPs . . . . . . . . . . . . . . 43
4.8 Analysis for an example set of automatically discovered EPs . . . . . . . 46
5 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .54


