Statistical structure of printed Turkish, English, German, French, Russian and Spanish

Shamılov, Aladdin; Yolacan, S.

Access

info:eu-repo/semantics/closedAccess

Date

2006

Authors

Shamılov, Aladdin
Yolacan, S.

Abstract

The statistical properties of language, the basic tool of communication, have frequently been used in the development of computer science, for example in the construction of efficient binary codes. Language itself may also be regarded as a code for certain conceptual entities. From this point of view, this study examines the statistical structures of printed Turkish, English, German, French, Russian and Spanish on the basis of the probability distribution of letters for the same semantic content. The optimal language in the sense of coding theory is then determined using Shannon's entropy measure. During the analysis, we encountered some known difficulties in evaluating Shannon's measure. To overcome these difficulties, we established that regression analysis is a convenient method. A regression equation is therefore given for generalizing the entropy estimates, together with related interpretations. The main result of the paper is that the slope of the simple linear regression model gives an approximate value for the entropy of the languages.
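
To make the quantities in the abstract concrete, here is a minimal Python sketch. It is an illustration only and is not the authors' procedure. The first part computes Shannon's entropy, H = -sum(p_i * log2(p_i)), from the empirical letter distribution of a text. The second part rests on an assumption, because the abstract does not say which variables enter the regression: it regresses the entropy of n-letter blocks on the block length n and reports the least-squares slope, which is one common way to approximate entropy per letter. All function names (`shannon_entropy`, `letter_entropy`, `block_entropy`, `least_squares_slope`, `entropy_slope`) are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def only_letters(text):
    """Lowercase the text and keep alphabetic characters only.

    Dropping spaces joins adjacent words, a simplification of this sketch.
    """
    return "".join(ch for ch in text.lower() if ch.isalpha())


def letter_entropy(text):
    """Entropy of the single-letter distribution of the text."""
    return shannon_entropy(Counter(only_letters(text)))


def block_entropy(text, n):
    """Entropy of the distribution of overlapping n-letter blocks."""
    letters = only_letters(text)
    blocks = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    return shannon_entropy(blocks)


def least_squares_slope(xs, ys):
    """Slope of the simple linear regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def entropy_slope(text, max_n=3):
    """ASSUMPTION: regress block entropy on block length n = 1..max_n.

    The abstract does not specify the regression variables; this choice is
    only one plausible reading.
    """
    ns = list(range(1, max_n + 1))
    entropies = [block_entropy(text, n) for n in ns]
    return least_squares_slope(ns, entropies)


if __name__ == "__main__":
    sample = "the same semantic content printed in several languages"
    print(letter_entropy(sample))
    print(entropy_slope(sample, max_n=3))
```

For a meaningful comparison between languages, the same functions would be applied to translations of one text, matching the abstract's "same semantic content" condition. Block entropies estimated from short samples are biased downward, so the sample should be long relative to the number of distinct blocks.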