On constituent chunking for Turkish


ASLAN Ö., GÜNAL S., DİNÇER B. T.

INFORMATION PROCESSING & MANAGEMENT, vol.54, no.6, pp.1262-1276, 2018 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 54 Issue: 6
  • Publication Date: 2018
  • Doi Number: 10.1016/j.ipm.2018.05.004
  • Journal Name: INFORMATION PROCESSING & MANAGEMENT
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus
  • Page Numbers: pp.1262-1276
  • Keywords: Chunking, Shallow parsing, Turkish, Constituent, Conditional random fields, Natural language processing, PARSER
  • Anadolu University Affiliated: Yes

Abstract

Chunking is the task of dividing a sentence into non-recursive structures; its primary aim is to identify chunk boundaries and classes. Although chunking generally refers to simple chunks, the concept can be customized. A simple chunk is a small structure, such as a noun phrase, whereas a constituent chunk is a structure that functions as a single unit in a sentence, such as a subject. For an agglutinative language with rich morphology, constituent chunking is a more significant problem than simple chunking. Most Turkish studies on this issue use the IOB tagging schema to mark chunk boundaries.
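
As an illustration of the IOB tagging schema mentioned above, the following minimal Python sketch converts labeled chunks into per-token tags: B- marks the first token of a chunk, I- marks the tokens that continue it, and O marks tokens outside any chunk. The function names, the example sentence, and the labels SUBJ, OBJ and PRED are assumptions made for illustration only; they are not taken from the article and need not match its tag set.

```python
from typing import List, Tuple

# A chunk is (label, tokens). The label "O" marks tokens outside any chunk.
# SUBJ / OBJ / PRED are hypothetical constituent labels used only for this sketch.
Chunk = Tuple[str, List[str]]
TaggedToken = Tuple[str, str]


def chunks_to_iob(chunks: List[Chunk]) -> List[TaggedToken]:
    """Flatten labeled chunks into (token, IOB tag) pairs."""
    tagged: List[TaggedToken] = []
    for label, tokens in chunks:
        for position, token in enumerate(tokens):
            if label == "O":
                tag = "O"              # outside any chunk
            elif position == 0:
                tag = "B-" + label     # first token: chunk boundary
            else:
                tag = "I-" + label     # continuation of the same chunk
            tagged.append((token, tag))
    return tagged


def iob_to_chunks(tagged: List[TaggedToken]) -> List[Chunk]:
    """Recover chunks from IOB tags; each O token becomes its own entry."""
    chunks: List[Chunk] = []
    for token, tag in tagged:
        if tag == "O":
            chunks.append(("O", [token]))
        elif tag.startswith("B-") or not chunks or chunks[-1][0] != tag[2:]:
            # A B- tag, or an I- tag with no matching open chunk, starts a new chunk.
            chunks.append((tag[2:], [token]))
        else:
            chunks[-1][1].append(token)
    return chunks


# "Küçük çocuk kitabı okudu." (The little child read the book.)
example: List[Chunk] = [
    ("SUBJ", ["Küçük", "çocuk"]),
    ("OBJ", ["kitabı"]),
    ("PRED", ["okudu"]),
    ("O", ["."]),
]

iob_tags = chunks_to_iob(example)
for token, tag in iob_tags:
    print(f"{token}\t{tag}")
```

In this sketch the two-token subject receives B-SUBJ followed by I-SUBJ, so a chunk boundary falls wherever a B- tag appears; a sequence labeler such as a conditional random field would then be trained to predict one such tag per token.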