The probability distributions and the fluctuation scalings of the time series of keyword counts in nationwide blog data.
Hayafumi Watanabe 1, Yukie Sano 6, Hideki Takayasu 2,3,4, Misako Takayasu 5
1. The Institute of Statistical Mathematics, Tokyo 190-8562, Japan
Abstract
In analyses of social media data, one of the most basic and important objects is the time series of the number of appearances of a given keyword. Whereas most previous research has focused, for practical reasons, on "trends" in such time series (i.e., their nonrandom parts), we aim to describe their fluctuations precisely. To elucidate the nontrivial empirical statistical properties of the fluctuations of typical nonsteady time series representing the appearance of words in blogs, we investigated approximately 3 billion Japanese blog articles posted over a period of six years and analyzed several corresponding mathematical models. First, we introduce a solvable nonsteady extension of the random diffusion model, which can be derived by modeling the behavior of heterogeneous random bloggers. Next, we derive theoretical expressions for both the temporal and the ensemble fluctuation scalings of this model and demonstrate that these expressions can reproduce all the empirical scalings over eight orders of magnitude. Furthermore, we show that the model can reproduce other statistical properties of these time series, such as the functional forms of the probability density and the correlations in the total number of blogs. As an application, we quantify the abnormality of special nationwide events by measuring the fluctuation scalings of 1771 basic adjectives.

[1] Phys. Rev. E 94, 052317 (2016)
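The temporal fluctuation scaling mentioned in the abstract describes how the variance of a word's count series grows with its mean. It can be illustrated with a much simpler stand-in than the authors' nonsteady random diffusion model. The minimal sketch below assumes that each word's daily count is Poisson with a random, gamma-distributed intensity factor (a Cox, or doubly stochastic Poisson, process). Under that assumption the law of total variance gives Var = mean + c * mean**2, so the local scaling exponent moves from 1 at small means to 2 at large means. The function names, the noise level `c`, and the gamma choice are hypothetical illustrations, not the model of Ref. [1].

```python
import random
import statistics

# Simplified sketch (assumption): counts F ~ Poisson(scale * X(t)), with X(t)
# gamma-distributed, mean 1 and variance c. This is NOT the nonsteady random
# diffusion model of Ref. [1]; it only illustrates a variance-mean scaling.


def cox_variance(mean, c):
    """Variance of F when X has mean 1 and variance c.

    Law of total variance: Var[F] = E[scale*X] + Var[scale*X] = mean + c*mean**2.
    """
    return mean + c * mean ** 2


def local_exponent(mean, c):
    """Local slope d log Var / d log mean of cox_variance.

    Tends to 1 (Poisson-like) for mean << 1/c, to 2 (multiplicative-noise-like)
    for mean >> 1/c, and equals 1.5 at mean = 1/c.
    """
    return (1 + 2 * c * mean) / (1 + c * mean)


def poisson_sample(lam, rng):
    """Draw Poisson(lam) by counting unit-rate exponential arrivals in [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_counts(scale, c, steps, rng):
    """Hypothetical daily counts of one word; gammavariate(1/c, c) has mean 1, variance c."""
    return [poisson_sample(scale * rng.gammavariate(1.0 / c, c), rng)
            for _ in range(steps)]


def temporal_moments(series):
    """Temporal mean and population variance of one count series."""
    return statistics.fmean(series), statistics.pvariance(series)


if __name__ == "__main__":
    rng = random.Random(1)
    c = 0.01  # hypothetical noise level; crossover expected near mean = 1/c = 100
    for scale in (1, 10, 100, 1000):
        m, v = temporal_moments(simulate_counts(scale, c, 500, rng))
        print(f"mean={m:8.1f}  var={v:10.1f}  "
              f"model var={cox_variance(m, c):10.1f}  "
              f"local alpha={local_exponent(m, c):.2f}")
```

Each hypothetical word draws its own intensity factor here, which is enough for a per-word temporal scaling. Reproducing an ensemble scaling across words would additionally require a factor shared by all words, as in models driven by the total number of blogs.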
|
Presentation: Oral at Econophysics Colloquium 2017, Symposium C, by Hayafumi Watanabe
See On-line Journal of Econophysics Colloquium 2017
Submitted: 2017-03-20 03:01
Revised: 2017-03-20 03:44