Correcting writing errors in Turkish with a character-level neural language model


26th IEEE Signal Processing and Communications Applications Conference (SIU), İzmir, Turkey, 2 - 5 May 2018

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/siu.2018.8404505
  • City: İzmir
  • Country: Turkey
  • Keywords: natural language processing, recurrent neural networks, writing errors, character-level language model
  • Anadolu University Affiliated: Yes


A large share of the written content on the Internet consists of social media posts, articles written for content platforms, and user comments. Unlike content prepared for print media, these texts contain many writing errors. Automating the detection and correction of writing errors in content created for commercial purposes would reduce editing costs considerably. Although word-level language models perform well on analytic languages, they are not ideal for agglutinative languages such as Turkish, for which models built on smaller units such as morphemes or characters are more suitable. In this study, we propose a method that uses a character-level language model to correct writing errors in Turkish. Character-level text generation is used to calculate the probability of each candidate written form, and the most probable form is taken to be correct. The proposed method is applied to errors in writing the conjunction "de" and the suffix "-de".
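The scoring idea in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's implementation: a toy character trigram model with add-one smoothing stands in for the neural character-level model, and the names `CharNGramLM`, `candidates`, and `correct`, the five-sentence corpus, and the restriction to the "de"/"da" forms are all hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict

# Assumption: only the "de"/"da" forms are handled in this sketch.
CLITICS = ("de", "da")


class CharNGramLM:
    """Toy character n-gram model with add-one smoothing.

    A stand-in for a neural character-level language model.
    """

    def __init__(self, order=3):
        self.order = order
        self.counts = defaultdict(Counter)  # context -> next-character counts
        self.vocab = set()

    def fit(self, texts):
        for text in texts:
            padded = "^" * self.order + text + "$"
            self.vocab.update(padded)
            for i in range(self.order, len(padded)):
                self.counts[padded[i - self.order:i]][padded[i]] += 1
        return self

    def log_prob(self, text):
        """Sum of log P(char | previous `order` chars) over the whole text."""
        padded = "^" * self.order + text + "$"
        v = len(self.vocab) + 1  # one extra slot for unseen characters
        total = 0.0
        for i in range(self.order, len(padded)):
            ctx = self.counts.get(padded[i - self.order:i], Counter())
            total += math.log((ctx[padded[i]] + 1) / (sum(ctx.values()) + v))
        return total


def candidates(sentence):
    """Return the sentence plus variants that join or split one "de"/"da"."""
    words = sentence.split()
    variants = {" ".join(words)}
    for i, w in enumerate(words):
        if w in CLITICS and i > 0:
            # Separate form found: also try attaching it to the previous word.
            joined = words[:i - 1] + [words[i - 1] + w] + words[i + 1:]
            variants.add(" ".join(joined))
        elif len(w) > 2 and w[-2:] in CLITICS:
            # Attached form found: also try writing it as a separate word.
            split = words[:i] + [w[:-2], w[-2:]] + words[i + 1:]
            variants.add(" ".join(split))
    return sorted(variants)


def correct(sentence, lm):
    """Pick the candidate written form the model scores as most probable."""
    return max(candidates(sentence), key=lm.log_prob)


# Hypothetical miniature corpus; a real model would be trained on far more text.
corpus = ["ben de geldim", "sen de geldin", "o da geldi",
          "evde kaldım", "okulda kaldık"]
lm = CharNGramLM(order=3).fit(corpus)

for form in candidates("bende geldim"):
    print(f"{form!r}: {lm.log_prob(form):.3f}")
print("chosen:", correct("bende geldim", lm))
```

Because the joined and separate forms differ in length by one character, comparing raw sequence log-probabilities can slightly favor the shorter form; a fuller treatment might normalize the scores or require a minimum margin before changing the text.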