Internal organization of languages: Decomposing "Ulysses"

Jarosław Kwapień 1, Stanisław Drożdż 1,2

1. Polish Academy of Sciences, Institute of Nuclear Physics (IFJ PAN), Radzikowskiego 152, Kraków 31-342, Poland
2. University of Rzeszów, Institute of Physics, Department of Complex Systems, Rejtana 16, Rzeszów 35-310, Poland

Abstract

Languages constitute a basis of social interactions. From the point of view of statistical physics, natural language can be viewed as a system that develops complex patterns of behaviour such as hierarchical structure, long-range correlations and scaling. These properties are universal for a large class of complex systems observed in nature. Thus, studying representative samples of natural language can also help us understand the structure and dynamics of complex systems in general. In our work we analyzed the rank-ordered distribution of words in "Ulysses" by James Joyce, revisiting the classic analysis carried out by G.K. Zipf. "Ulysses" is a text which, due to its unique diversity of literary styles and vocabulary, can be considered one of the most representative works of written language and, as such, is ideally suited to this kind of study. In his work Zipf showed that the frequency of a word's occurrences in a text is inversely proportional to the word's rank, an observation that led him to formulate the law known today as the Zipf law (or, in a modified version, the Zipf-Mandelbrot law). A real language is not an amorphous collection of individual words, as the output of a typewriting monkey would be, but rather a largely ordered mixture of functionally distinct parts of speech. Motivated by this fact, we decided to take a closer look at the word frequency ranking by grouping the words according to their grammatical function. We distinguished two types of words, nouns and verbs, and put all other words (adjectives, adverbs, pronouns etc.) into a third, joint group. We found that although the global behaviour of words is described approximately by the Zipf-Mandelbrot law, the words from different parts of speech do not necessarily follow this global picture: the functional dependence of frequency on rank differs significantly between nouns, verbs and other types of words. This may be a manifestation of a non-trivial internal organization of language that cannot be reproduced in full detail by a simple power-law relation of the Zipf-Mandelbrot type.
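As an illustration of the kind of analysis described above, the following is a minimal Python sketch and not the authors' actual procedure. It builds a rank-ordered frequency list, splits it by word class, and estimates the exponent of a pure Zipf relation, frequency proportional to 1/rank^alpha, by a least-squares fit in log-log coordinates. The function names, the simple regular-expression tokenizer, and the `word_class` lookup table are all assumptions made for this example; a real study would need a proper part-of-speech tagger and a fit of the full Zipf-Mandelbrot form, frequency proportional to 1/(rank + c)^alpha.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters and apostrophes.
TOKEN = re.compile(r"[a-z']+")


def rank_frequency(text):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    counts = Counter(TOKEN.findall(text.lower()))
    return sorted(counts.values(), reverse=True)


def grouped_rank_frequency(text, word_class):
    """Build a separate rank-frequency list for each word class.

    word_class is an assumed lookup table mapping a word to a label such as
    "noun" or "verb"; words missing from the table fall into "other".
    """
    groups = {}
    for word in TOKEN.findall(text.lower()):
        label = word_class.get(word, "other")
        groups.setdefault(label, Counter())[word] += 1
    return {label: sorted(c.values(), reverse=True) for label, c in groups.items()}


def zipf_exponent(frequencies):
    """Estimate alpha in f(r) ~ r**(-alpha) by least squares on log-log data."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope is negative for a decaying law, so flip the sign
```

Comparing `zipf_exponent` across the lists returned by `grouped_rank_frequency` gives a first, rough indication of whether nouns, verbs and the remaining words follow the same rank dependence. A single exponent cannot capture curvature in the log-log plot, which is why the sketch should be read only as a starting point.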

 


Related papers

Presentation: Oral at 3 Ogólnopolskie Sympozjum "Fizyka w Ekonomii i Naukach Społecznych", by Jarosław Kwapień
See On-line Journal of 3 Ogólnopolskie Sympozjum "Fizyka w Ekonomii i Naukach Społecznych"

Submitted: 2007-09-14 16:34
Revised:   2009-06-07 00:44