Correcting writing errors in Turkish with a character-level neural language model


BENLİGİRAY B.

26th IEEE Signal Processing and Communications Applications Conference (SIU), İzmir, Türkiye, 2-5 May 2018

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/siu.2018.8404505
  • City of Publication: İzmir
  • Country of Publication: Türkiye
  • Keywords: natural language processing, recurrent neural networks, writing errors, character-level language model
  • Affiliated with Anadolu University: Yes

Abstract

A large part of the written content on the Internet consists of social media posts, articles written for content platforms, and user comments. In contrast to content prepared for print media, these types of texts include a large number of writing errors. Automating the detection and correction of writing errors in content created for commercial purposes would decrease editing costs dramatically. Although word-level language models have performed well in processing analytic languages, they are not ideal for agglutinative languages, which include Turkish. Models built on smaller elements such as morphemes or characters are more suitable for agglutinative languages. In this study, we propose a method that uses a character-level language model to correct writing errors in Turkish. Character-level text generation is used to calculate the probabilities of the possible written forms. The most probable form is inferred to be correct. The proposed method is applied to correcting errors in writing the conjunction "de" and the suffix "-de".
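
The abstract does not give implementation details, so the following is only a minimal sketch of the selection step it describes: score each candidate written form with a character-level model and keep the most probable one. It assumes a smoothed character trigram model as a stand-in for the neural language model used in the paper. The names `CharNGramLM`, `pick_most_probable`, and `TOY_CORPUS` are hypothetical and exist only for illustration, and the four-sentence corpus is far too small for real use.

```python
import math
from collections import defaultdict


class CharNGramLM:
    """Toy character n-gram model with add-one smoothing.

    Assumption: this replaces the paper's neural character-level model,
    purely so that the example is self-contained.
    """

    def __init__(self, order=3):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def _pad(self, text):
        # "^" marks the start of a sentence, "$" marks its end.
        return "^" * self.order + text + "$"

    def train(self, sentences):
        for sentence in sentences:
            padded = self._pad(sentence)
            self.vocab.update(padded)
            for i in range(self.order, len(padded)):
                context = padded[i - self.order:i]
                self.counts[context][padded[i]] += 1

    def log_prob(self, text):
        """Sum of smoothed log-probabilities of each character given its context."""
        padded = self._pad(text)
        vocab_size = len(self.vocab)
        total = 0.0
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            context_counts = self.counts.get(context, {})
            numerator = context_counts.get(padded[i], 0) + 1
            denominator = sum(context_counts.values()) + vocab_size
            total += math.log(numerator / denominator)
        return total


def pick_most_probable(model, candidates):
    """Return the candidate written form with the highest model score."""
    return max(candidates, key=model.log_prob)


# Hypothetical toy corpus: two sentences with the suffix "-de" (attached)
# and two with the conjunction "de" (written separately).
TOY_CORPUS = [
    "ali evde kaldı",
    "ayşe evde kaldı",
    "ben de geldim",
    "sen de geldin",
]

model = CharNGramLM(order=3)
model.train(TOY_CORPUS)

# Each pair holds the attached and the separated spelling of the same sentence.
print(pick_most_probable(model, ["ali evde kaldı", "ali ev de kaldı"]))
print(pick_most_probable(model, ["bende geldim", "ben de geldim"]))
```

Scoring whole sentences rather than the isolated word lets the surrounding characters influence the decision, which is what makes a character-level model usable for a choice that depends on context. A real system would need a model trained on a large corpus and a way to generate the candidate pairs automatically.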