Accurate Measurement of Lexical Sophistication in ESL with Reference to Learner Data

Naismith, Ben and Han, Na-Rae and Juffs, Alan and Hill, Brianna and Zheng, Daniel (2018) Accurate Measurement of Lexical Sophistication in ESL with Reference to Learner Data. In: Proceedings of the 11th International Conference on Educational Data Mining, EDM 2018, July 15-18, 2018, Buffalo, NY, USA.

Abstract

One commonly used measure of lexical sophistication is the Advanced Guiraud (AG; [9]), whose formula requires counts of words in particular frequency bands (e.g., from COCA; [13]). However, the accuracy of this measure depends on the particular 2000-word frequency list chosen as the basis for its calculations [27]. For example, problems may arise when frequency lists based solely on native-speaker corpora are used as a target for second language (L2) learners (e.g., [8]), because L2 learners' exposure frequencies may differ from those of native speakers. Such differences may stem from the learners' first language (L1) culture, the teaching materials used in their home countries, or the text types that L2 learners commonly encounter. This paper addresses this problem by validating an English as a Second Language (ESL) frequency list. The validation draws on two sources: (1) the New General Service List (NGSL; [4]), which is based on the Cambridge English Corpus (CEC), and (2) written data from the 4.2-million-word Pitt English Language Institute Corpus (PELIC). Using open-source data science tools and natural language processing technologies, the paper shows that lexical sophistication differences across proficiency levels are more distinct and measurable when a lexical measure such as AG is computed with learner-oriented frequency lists rather than general-corpus frequency lists. The results will be useful in teaching contexts where lexical proficiency is measured or assessed, and for materials and test developers who rely on such lists as representative of the vocabulary known at different proficiency levels. This research applies data-driven exploration of learner corpora to vocabulary acquisition and pedagogy, closing a loop between educational data mining and classroom applications.
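As commonly defined, the Advanced Guiraud divides the number of "advanced" word types (types outside a reference list of the most frequent 2000 words) by the square root of the number of tokens: AG = advanced types / √tokens. The sketch below is a minimal illustration of that formula, not the authors' implementation. It assumes simple lowercase word-form matching with no lemmatization or word-family grouping, and the two reference lists (`list_a`, `list_b`) are hypothetical toy stand-ins rather than real frequency lists. It shows how the choice of reference list alone changes the score for the same text.

```python
import math
import re


def advanced_guiraud(text, basic_words):
    """Advanced Guiraud: number of advanced types / sqrt(number of tokens).

    Assumption: a type counts as 'advanced' if its lowercase word form is
    not in `basic_words` (a stand-in for a 2000-word frequency list).
    No lemmatization is applied.
    """
    # Tokenize crudely into lowercase alphabetic word forms.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    # Distinct word forms that fall outside the reference list.
    advanced_types = {t for t in set(tokens) if t not in basic_words}
    return len(advanced_types) / math.sqrt(len(tokens))


# Hypothetical toy reference lists; real studies would use full lists.
list_a = {"the", "an", "and", "about", "wrote", "student"}
list_b = list_a | {"essay", "environment"}

text = "The student wrote an essay about the environment and pollution"

# 10 tokens; 3 advanced types under list_a, 1 under list_b.
print(round(advanced_guiraud(text, list_a), 3))  # 0.949
print(round(advanced_guiraud(text, list_b), 3))  # 0.316
```

Because the denominator depends only on text length, any difference between the two scores comes entirely from which words the reference list treats as "basic", which is why the choice of list matters for this measure.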

Details

Item Type: Conference or Workshop Item (Paper)
Status: Published
Creators/Authors:
Naismith, Ben (Email: bnaismith@pitt.edu; Pitt Username: BEN25; ORCID: 0000-0001-8347-3142)
Han, Na-Rae (Email: naraehan@pitt.edu; Pitt Username: naraehan)
Juffs, Alan (Email: juffs@pitt.edu; Pitt Username: juffs)
Hill, Brianna
Zheng, Daniel
Date: 2018
Date Type: Publication
Journal or Publication Title: Proceedings of the 11th International Conference on Educational Data Mining
Publisher: International Educational Data Mining Society (IEDMS)
Page Range: pp. 259-265
Event Title: Proceedings of the 11th International Conference on Educational Data Mining, EDM 2018
Event Dates: July 15-18, 2018
Event Type: Conference
Schools and Programs: Dietrich School of Arts and Sciences > Linguistics
Refereed: Yes
Official URL: http://educationaldatamining.org/files/conferences...
Date Deposited: 21 Apr 2021 16:07
Last Modified: 21 Apr 2021 16:07
URI: http://d-scholarship.pitt.edu/id/eprint/40665