On constituent chunking for Turkish


ASLAN Ö., GÜNAL S., DİNÇER B. T.

INFORMATION PROCESSING & MANAGEMENT, vol. 54, no. 6, pp. 1262-1276, 2018 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 54, Issue: 6
  • Publication Date: 2018
  • DOI: 10.1016/j.ipm.2018.05.004
  • Journal Name: INFORMATION PROCESSING & MANAGEMENT
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus
  • Pages: pp. 1262-1276
  • Keywords: Chunking, Shallow parsing, Turkish, Constituent, Conditional random fields, Natural language processing, PARSER
  • Anadolu University Affiliated: No

Abstract

Chunking is the task of dividing a sentence into non-recursive structures; its primary aim is to identify chunk boundaries and classes. Although chunking generally refers to simple chunks, the concept can be customized. A simple chunk is a small structure, such as a noun phrase, whereas a constituent chunk is a structure that functions as a single unit in a sentence, such as a subject. For an agglutinative language with rich morphology, constituent chunking is a more significant problem than simple chunking. Most Turkish studies on this issue use the IOB tagging schema to mark chunk boundaries.
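The IOB schema turns chunking into per-token labeling. The first token of a chunk gets B-<class>, each following token inside it gets I-<class>, and tokens outside any chunk get O. The minimal sketch below assumes the IOB2 variant (every chunk opens with a B- tag) and shows how constituent chunks for one sentence are encoded into tags and decoded back. The sentence "Küçük çocuk kırmızı topu bahçede buldu ." ("The little child found the red ball in the garden."), its chunk segmentation, and the labels SUBJ, OBJ, ADJUNCT and PRED are hypothetical illustrations, not the tag set or data used in the paper.

```python
from typing import List, Optional, Tuple

# A chunk is (start_token_index, end_token_index_exclusive, label).
Chunk = Tuple[int, int, str]


def chunks_to_iob(n_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Encode non-overlapping chunks as IOB2 tags; tokens outside any chunk get 'O'."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = f"B-{label}"          # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens inside the chunk
    return tags


def iob_to_chunks(tags: List[str]) -> List[Chunk]:
    """Decode IOB2 tags back into (start, end, label) spans."""
    chunks: List[Chunk] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        continues_open_chunk = tag.startswith("I-") and tag[2:] == label
        if not continues_open_chunk:
            # Close the currently open chunk, if any.
            if start is not None:
                chunks.append((start, i, label))
                start, label = None, None
            # B- starts a chunk; a stray I- (no matching open chunk) also starts one.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    if start is not None:
        chunks.append((start, len(tags), label))
    return chunks


if __name__ == "__main__":
    # Hypothetical sentence and constituent labels, for illustration only.
    tokens = ["Küçük", "çocuk", "kırmızı", "topu", "bahçede", "buldu", "."]
    gold = [(0, 2, "SUBJ"), (2, 4, "OBJ"), (4, 5, "ADJUNCT"), (5, 6, "PRED")]
    tags = chunks_to_iob(len(tokens), gold)
    for tok, tag in zip(tokens, tags):
        print(f"{tok}\t{tag}")
    # Round trip: decoding the tags recovers the original chunks.
    assert iob_to_chunks(tags) == gold
```

Once boundaries are expressed this way, any sequence labeler can predict one tag per token. The decoder treats an I- tag that does not continue an open chunk of the same class as the start of a new chunk, which is one common convention for repairing ill-formed tag sequences.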