INTERNATIONAL JOURNAL OF CORPUS LINGUISTICS, vol. 0, no. 0, 2025 (AHCI)
This paper provides a detailed account of the Turkish Learner Corpus (TURLEC).
The corpus builds on the first author’s doctoral dissertation project, which aimed to identify
proficiency descriptors for four skills (listening, reading, writing, and speaking) for learners of
Turkish as a second language (L2) at various CEFR levels. The main motivation for building a
learner corpus is to document the language that learners actually use at different proficiency levels.
TURLEC comprises 735 written and spoken texts (~104,000 tokens) produced by university-level
learners of L2 Turkish from various countries and with numerous L1 backgrounds.
After rigorous anonymization, annotation, and error tagging,
TURLEC yields ~18,000 word forms and 3,584 lemmas, which will be further profiled
by CEFR level. As far as the accessible literature indicates, TURLEC is the first learner
corpus built to offer a vocabulary profile for L2 Turkish, an ever-growing field of
study with an increasing number of students.