This study was carried out with the following content features and implementation strategies. First, this study introduces a new methodology that uses large-scale corpora and the latest statistical techniques. Existing studies have been able to use only a very small amount of spoken data, about 50,000 words, because of its special nature. This study, in contrast, uses written and spoken corpora of equal size, 1 million words each. In addition, it uses a learner corpus built by a Korean language education institution. That corpus had been keyed in and transcribed but not yet sufficiently analyzed; this study analyzes it from multiple angles. A learner corpus is harder to build than a general spoken corpus; here the learner written corpus contains 270,000 words and the learner spoken corpus 370,000 words. The data were analyzed for part of speech, and homographs were disambiguated and tagged. Annotation at this level is rare for Korean data and makes the analysis more precise. Finally, to make the research more efficient, the various measurement tools needed during the research process were also developed.
Second, this study analyzes the evaluation mechanisms as an integrated, organically related whole. Existing vocabulary evaluation has mostly measured sentence length, number of occurrences, and vocabulary size. This study, in contrast, sets out to measure vocabulary abundance (lexical richness) properly, including vocabulary diversity, vocabulary density, number of errors, and vocabulary refinement. Finally, it measures spoken versus written style, which is the stylistic distinction of a text. These evaluation factors are lexical features that learners should use appropriately for the type of text they produce; quantifying them and presenting them as objective indicators makes it possible to measure learners' progress. Many aspects of the mechanisms for measuring vocabulary abundance have not yet been studied for Korean. Vocabulary diversity was originally used to measure the development of a child's first language. It is calculated as the type-token ratio (TTR): the number of types divided by the number of tokens. This measure has several problems that stem from its quantitative characteristics, such as its sensitivity to text length. Therefore, when using TTR, the text size and the position from which samples are extracted must be matched across texts. Vocabulary density measures the proportion of content words in a text and is also related to readability. For Korean, the scope of content words roughly follows the criteria used for English. For English, a density of about 40% is taken as the dividing line between spoken and written language, and a text with a density below 30% is considered lexically sparse. Since no such results exist for Korean, standard values for each text type were measured accurately from native-speaker data in the first year and proposed as reference values for foreign learners. Vocabulary refinement (lexical sophistication) measures the ability to use low-frequency vocabulary, such as technical and specialized terms, in appropriate places.
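The two ratio measures described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the measurement tool developed in the study: the function names, the tag set in `CONTENT_TAGS`, and the toy English sentence are all hypothetical, and the input is assumed to be already tokenized and part-of-speech tagged.

```python
from typing import Iterable, Sequence, Tuple

# Assumed set of tags that mark content words. The study's actual Korean
# tag inventory is not given in the text, so these labels are placeholders.
CONTENT_TAGS = frozenset({"NOUN", "VERB", "ADJ", "ADV"})


def type_token_ratio(tokens: Sequence[str]) -> float:
    """Number of distinct types divided by the number of tokens."""
    if not tokens:
        raise ValueError("cannot compute TTR of an empty text")
    return len(set(tokens)) / len(tokens)


def sampled_ttr(tokens: Sequence[str], size: int, start: int = 0) -> float:
    """TTR over a fixed-size window, so that texts are compared at the
    same size and the same extraction position."""
    window = tokens[start:start + size]
    if size <= 0 or len(window) < size:
        raise ValueError("text is too short for the requested sample")
    return type_token_ratio(window)


def lexical_density(
    tagged: Sequence[Tuple[str, str]],
    content_tags: Iterable[str] = CONTENT_TAGS,
) -> float:
    """Share of tokens whose part-of-speech tag marks a content word."""
    if not tagged:
        raise ValueError("cannot compute density of an empty text")
    tags = set(content_tags)
    content = sum(1 for _, tag in tagged if tag in tags)
    return content / len(tagged)


# Toy example (hypothetical data): six tokens, five types, four content words.
tagged = [
    ("the", "DET"), ("learner", "NOUN"), ("writes", "VERB"),
    ("the", "DET"), ("essay", "NOUN"), ("carefully", "ADV"),
]
tokens = [word for word, _ in tagged]

print(f"TTR (whole text):     {type_token_ratio(tokens):.3f}")
print(f"TTR (first 4 tokens): {sampled_ttr(tokens, size=4):.3f}")
print(f"Lexical density:      {lexical_density(tagged):.3f}")
```

Fixing `size` and `start` in `sampled_ttr` is one simple way to match text size and sample position when comparing texts; other sampling schemes are possible.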
The measurement of spoken and written style assesses whether a learner can accurately distinguish texts by style. Once learners have mastered this distinction, they can write and speak naturally, close to the way a native speaker does, which makes their texts more complete.
The study proceeded in sequential stages, completing each one before moving to the next. In the first year, work concentrated on theoretical research and data preparation. The analysis methodology was reviewed and established for Korean rather than English; in data preparation, the previously constructed data was reviewed and vocabulary abundance was measured on the native-speaker data. In the process, various lists were prepared and the tools needed for measurement were developed. In the second year, vocabulary abundance was measured on the learner data. Part-of-speech analysis and cross-referencing against the lists prepared in the first year were carried out at this stage. Also in the second year, work began on processing the basic data for measuring spoken and written style. Lastly, in the third year, learners' command of spoken and written style was measured on the basis of the second-year results. For the spoken data, data from academic-purpose and general-purpose learners were used in equal proportion. Together with the second-year vocabulary abundance results, this completed the evaluation mechanism for diagnosing learners' vocabulary ability.