Statistical structure of printed Turkish, English, German, French, Russian and Spanish

Shamilov, Aladdin; Yolacan, Senay

WSEAS Transactions on Mathematics, vol.5, no.6, pp.756-762, 2006 (Scopus)

Publication Type: Article / Full Article
Volume: 5 Issue: 6
Publication Date: 2006
Journal Name: WSEAS Transactions on Mathematics
Journal Indexes: Scopus
Pages: pp.756-762
Keywords: ANOVA, Coding theory, Entropy of language, Optimal language, Regression analysis, Semantic content, Shannon's measure
Affiliated with Anadolu University: Yes

Abstract

The statistical properties of language, the basic tool for communication, have frequently been used in the development of computer science, for example in the construction of efficient binary codes. Language itself may also be regarded as a code for certain conceptual entities. From this point of view, this study examines the statistical structures of printed Turkish, English, German, French, Russian and Spanish on the basis of the probability distribution of letters for the same semantic content. The optimal language in the sense of coding theory is then determined by using Shannon's measure of entropy. During the analysis, we encountered some known difficulties in evaluating Shannon's measure. We established that regression analysis is a convenient method for overcoming these difficulties. A regression equation is therefore given for generalizing the entropy estimates, together with related interpretations. The main result of the paper is that the slope of the simple linear regression model gives an approximate value for the entropy of the languages.
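
As an illustration of the quantity the abstract refers to, Shannon's measure for a letter distribution with probabilities p_i is H = -sum(p_i * log2(p_i)), in bits per letter. The following is a minimal sketch, not the authors' procedure: the function name `letter_entropy` and the preprocessing (lowercasing with Python's default rules and discarding non-letters) are assumptions made here for illustration, and the regression step described in the abstract is not reproduced because the abstract does not specify its variables.

```python
import math
from collections import Counter


def letter_entropy(text: str) -> float:
    """Shannon entropy, in bits per letter, of the empirical letter distribution.

    Assumption for this sketch: only alphabetic characters are counted, and
    case is folded with str.lower(), which ignores language-specific rules
    such as the Turkish dotted/dotless i.
    """
    letters = [ch.lower() for ch in text if ch.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    counts = Counter(letters)
    total = len(letters)
    # H = -sum(p * log2(p)) over the observed letters
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Applied to texts with the same semantic content in each language, a function of this kind would give the first-order, single-letter estimates that such a comparison starts from. Higher-order estimates that account for dependence between letters need considerably more data.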