Vittorio Loreto
Universita' di Roma - La Sapienza
Abstract
The enormous increase in the popularity and use of the WWW has led in
recent years to important changes in the ways people communicate. An
interesting example is provided by the now very popular
social annotation systems, through which users annotate resources
(such as web pages or digital photographs) with text keywords dubbed
tags. Collaborative tagging has been quickly gaining ground because of
its ability to recruit the activity of web users into effectively
organizing and sharing vast amounts of information. Understanding the
rich emerging structures resulting from the uncoordinated actions of
users calls for an interdisciplinary effort. In particular, concepts
borrowed from statistical physics, such as random walks, and the
complex networks framework can effectively contribute to the
mathematical modeling of social annotation systems. First I'll
introduce a stochastic model of user behavior embodying two main
aspects of collaborative tagging: (i) a frequency-bias mechanism
related to the idea that users are exposed to each other's tagging
activity; (ii) a notion of memory, or aging of resources, in the form
of a heavy-tailed access to the past state of the system. Remarkably,
this simple model accounts quantitatively for the observed
experimental features with surprisingly high accuracy. This
points in the direction of a universal behavior of users who, despite
the complexity of their own cognitive processes and the uncoordinated
and selfish nature of their tagging activity, appear to follow simple
activity patterns. Next I'll show how the process of social annotation
can be seen as a collective but uncoordinated exploration of an
underlying semantic space, pictured as a graph, through a series of
random walks. This modeling framework reproduces several so far
unexplained aspects of social annotation, including the peculiar
growth of the vocabulary used by the community and its complex network
structure, which represents an externalization of semantic structures
grounded in cognition and typically hard to access.
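
The two mechanisms described above can be illustrated with a minimal
sketch. It is not the speaker's implementation: the kernel form
1 / (x + tau), the parameter names p_new and tau, the ring-shaped toy
graph, and all function names are assumptions chosen for illustration.
The first function copies past tags with heavy-tailed access to the
history, so frequently used tags are more likely to be reused. The
second tracks how many distinct nodes a series of random walks on a
graph has visited, a stand-in for vocabulary growth.

```python
import random


def simulate_tag_stream(n_steps, p_new=0.05, tau=10.0, seed=0):
    """Toy tag stream with frequency bias and heavy-tailed memory.

    With probability p_new a brand-new tag is invented; otherwise the
    tag used x steps in the past is copied, with x drawn with weight
    proportional to 1 / (x + tau) (an assumed kernel).
    """
    rng = random.Random(seed)
    stream = [0]  # the stream starts with a single tag, id 0
    next_tag = 1
    for _ in range(n_steps - 1):
        if rng.random() < p_new:
            stream.append(next_tag)
            next_tag += 1
        else:
            t = len(stream)
            lags = range(1, t + 1)  # how far back to look
            weights = [1.0 / (x + tau) for x in lags]
            x = rng.choices(lags, weights=weights, k=1)[0]
            stream.append(stream[t - x])
    return stream


def ring_graph(n):
    """Hypothetical toy 'semantic space': n nodes arranged on a ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def random_walk_vocabulary(adjacency, n_walks, walk_length, seed=0):
    """Number of distinct nodes seen after each step of repeated walks.

    Each walk starts from a uniformly random node; every visited node
    counts as one word of the shared vocabulary.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    seen = set()
    growth = []
    for _ in range(n_walks):
        node = rng.choice(nodes)
        for _ in range(walk_length):
            seen.add(node)
            growth.append(len(seen))
            node = rng.choice(adjacency[node])
    return growth


if __name__ == "__main__":
    stream = simulate_tag_stream(500)
    print("distinct tags:", len(set(stream)))
    growth = random_walk_vocabulary(ring_graph(200), n_walks=20, walk_length=50)
    print("vocabulary size after all walks:", growth[-1])
```

Plotting tag frequency against rank for the first function, and
vocabulary size against step count for the second, is one way to
compare such toy runs with the qualitative features mentioned in the
abstract.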