Statistical structure of printed Turkish, English, German, French, Russian and Spanish


Shamilov A., Yolacan S.

WSEAS Transactions on Mathematics, vol.5, no.6, pp.756-762, 2006 (Scopus)

  • Publication Type: Article / Full Article
  • Volume: 5 Issue: 6
  • Publication Date: 2006
  • Journal Name: WSEAS Transactions on Mathematics
  • Journal Indexes: Scopus
  • Pages: pp.756-762
  • Keywords: ANOVA, Coding theory, Entropy of language, Optimal language, Regression analysis, Semantic content, Shannon's measure
  • Anadolu University Affiliated: Yes

Abstract

Interest in the statistical properties of language, the basic tool of communication, has frequently contributed to developments in computer science, such as the construction of efficient binary codes. Language itself may also be regarded as a code for certain conceptual entities. From this point of view, this study examines the statistical structures of printed Turkish, English, German, French, Russian and Spanish on the basis of the probability distribution of letters for the same semantic content. The optimal language in the sense of coding theory is then determined by using Shannon's measure of entropy. During the analysis, we encountered some known difficulties in evaluating Shannon's measure. To overcome these difficulties, we established that regression analysis is a convenient method. A regression equation is therefore given for generalizing the entropy estimates, together with related interpretations. The main result of the paper is that the slope of the simple linear regression model gives an approximate value for the entropy of each language.
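
The two quantities named in the abstract can be illustrated with a minimal Python sketch. The first function computes Shannon's measure from the empirical letter distribution of a text. The abstract does not state which variables enter the paper's regression, so the second part is an assumption made only for illustration: it regresses the cumulative information content of the first n letters on n, in which case the least-squares slope approximates the entropy in bits per letter. The function names and the sample text are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def letter_entropy(text):
    """Shannon entropy (bits per letter) of the empirical letter distribution."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def entropy_by_regression(text):
    """Assumed setup (not taken from the paper): regress the cumulative
    information content of the first n letters on n and return the slope."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    xs, ys, running = [], [], 0.0
    for n, ch in enumerate(letters, start=1):
        # Information content of one letter: -log2 of its relative frequency.
        running += -math.log2(counts[ch] / total)
        xs.append(n)
        ys.append(running)
    return regression_slope(xs, ys)


# Hypothetical sample text; a real comparison would use parallel texts
# with the same semantic content in each language.
sample = "the quick brown fox jumps over the lazy dog " * 20
print(round(letter_entropy(sample), 3))
print(round(entropy_by_regression(sample), 3))
```

Under this assumed setup the two printed values should be close, since the cumulative information content grows roughly linearly in the number of letters with a rate equal to the entropy. Comparing languages in the spirit of the abstract would mean running the same computation on translations of one text and ranking the resulting estimates.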