Saturday, August 19, 2006
This image was generated using graphviz on a Mac, with default settings. Word co-occurrences with at least 5000 instances are represented as arcs (see previous
post for details). This is a portion of the complete image.
The full version is at
The AOL user query strings were analyzed to count the number of times each word appears with each other word in a query, i.e., counts of "co-occurrences". (Each part of a domain name or URL is treated as a separate word. Words with fewer than three characters, and the words "for", "the", "www" and "com", were ignored.) The result can be drawn as a graph, where nodes are words and an arc between two words means they co-occur at least 10000 times in the ~20 million-query database. This image shows the largest connected subgraph of the result, rendered using graphviz.
G. Pass, A. Chowdhury, C. Torgeson, "A Picture of Search," The First International Conference on Scalable Information Systems, Hong Kong, June 2006.
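The co-occurrence counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used to make the image: the function names (`tokenize`, `count_cooccurrences`, `to_dot`) are hypothetical, splitting on any non-alphanumeric character is a guess at how URL parts were separated, and each word pair is counted at most once per query, which the post does not specify.

```python
import re
from collections import Counter
from itertools import combinations

# The four ignored words named in the post.
STOP_WORDS = {"for", "the", "www", "com"}


def tokenize(query):
    """Return the set of distinct words in a query.

    Assumption: any run of non-alphanumeric characters separates words,
    so "www.google.com" yields "www", "google", "com" before filtering.
    """
    words = re.split(r"[^a-z0-9]+", query.lower())
    # Drop words shorter than three characters and the ignored words.
    return {w for w in words if len(w) >= 3 and w not in STOP_WORDS}


def count_cooccurrences(queries):
    """Count, for each unordered word pair, the queries containing both.

    Assumption: a pair is counted once per query even if a word repeats.
    """
    counts = Counter()
    for query in queries:
        # Sorting gives each pair a single canonical (a, b) ordering.
        for pair in combinations(sorted(tokenize(query)), 2):
            counts[pair] += 1
    return counts


def to_dot(counts, threshold):
    """Build an undirected graphviz graph with one edge per frequent pair."""
    lines = ["graph cooccurrence {"]
    for (a, b), n in sorted(counts.items()):
        if n >= threshold:
            lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)
```

Calling `to_dot(count_cooccurrences(queries), threshold=10000)` on the full list of query strings would produce DOT text that a graphviz layout program such as `dot` can render. Extracting the largest connected subgraph, as done for the image, is a separate step not shown here.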