May 3, 2009, 9:14 pm
The aim of this project is to characterize, study, and model the sources of bias that emerge from the complex network structure of the Web and from the use of search engines. The feedback loops among users searching for information, users creating content, and the search engine ranking algorithms that mediate between them lead to surprising results. We are studying how these systems and communities influence and feed on each other in a dynamic information ecology, and how these interactions affect both their evolution and their impact on the global processes of information discovery, retrieval, and utilization.
For example, by studying the relationship between Web traffic and PageRank, we have shown that, given the heterogeneity of topical interests expressed by search queries, search engines mitigate the popularity bias generated by the rich-get-richer structure of the Web graph. These results, which dispel the feared Googlearchy effect, have been published in Proc. Natl. Acad. Sci. USA, presented in the WAW 2006 keynote (slides), and attracted some media attention. Movies demonstrating the finding are also available. The result also inspired a robust rank-based model of scale-free network growth, published in Phys. Rev. Lett. (press release).
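The idea behind rank-based growth can be illustrated with a minimal Python sketch. It is not the code behind the Phys. Rev. Lett. paper; it assumes that nodes are ranked by age (the oldest node has rank 1) and that each new node links to m distinct existing nodes, picking each with probability proportional to rank^(-alpha). The function names, the complete seed graph, and the use of age as the ranking attribute are illustrative assumptions.

```python
import random


def rank_based_growth(n_nodes, m=1, alpha=1.0, seed=None):
    """Sketch of rank-based network growth (hypothetical helper).

    Assumption: nodes are ranked by age, so node id k has rank k + 1
    (the oldest node has rank 1). Each new node links to m distinct
    existing nodes, choosing each one with probability proportional
    to rank ** (-alpha).
    """
    rng = random.Random(seed)
    # Seed graph: a small complete graph on the first m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Attachment weight of every existing node, indexed by node id.
    weights = [(k + 1) ** (-alpha) for k in range(m + 1)]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:  # resample until m distinct targets
            targets.add(rng.choices(range(new), weights=weights, k=1)[0])
        edges.extend((new, t) for t in sorted(targets))
        # The newcomer is the youngest node: largest rank number, smallest weight.
        weights.append((new + 1) ** (-alpha))
    return edges


def degrees(n_nodes, edges):
    """Degree of every node in an undirected edge list."""
    deg = [0] * n_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = rank_based_growth(2000, m=2, alpha=1.0, seed=42)
deg = degrees(2000, edges)
print("max degree:", max(deg), "median degree:", sorted(deg)[len(deg) // 2])
```

Note that no node degree enters the attachment rule: only relative rank matters, which is what makes such a model robust to not knowing the exact values of the ranking attribute.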
We also study sources of bias that stem from legal, political, or economic factors. The CENSEARCHIP tool visualizes the differences between results returned by different search engines, or by different country versions of the same search engine. The tool, based on a technique described in a paper in First Monday, drew many reactions in the media and the blogosphere (press release).
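CENSEARCHIP's own technique is described in the First Monday paper and is not reproduced here. As a hedged illustration of the general idea of comparing two sets of search results, the Python sketch below measures URL overlap with the Jaccard coefficient and lists the snippet terms most over-represented in one result set relative to the other. The function names and the word-frequency metric are our assumptions, not the tool's method.

```python
import re
from collections import Counter


def jaccard(urls_a, urls_b):
    """Overlap of two result lists: shared distinct URLs over all distinct URLs."""
    a, b = set(urls_a), set(urls_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def term_frequencies(snippets):
    """Relative frequency of each lowercase word across result snippets."""
    counts = Counter()
    for text in snippets:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    total = sum(counts.values()) or 1
    return {term: c / total for term, c in counts.items()}


def contrasting_terms(snippets_a, snippets_b, top_k=5):
    """Terms most over-represented in A relative to B, and vice versa."""
    fa, fb = term_frequencies(snippets_a), term_frequencies(snippets_b)
    diff = {t: fa.get(t, 0.0) - fb.get(t, 0.0) for t in fa.keys() | fb.keys()}
    # Sort by frequency difference, breaking ties alphabetically for reproducibility.
    ranked = sorted(diff.items(), key=lambda kv: (kv[1], kv[0]))
    more_in_b = [t for t, d in ranked[:top_k] if d < 0]
    more_in_a = [t for t, d in reversed(ranked[-top_k:]) if d > 0]
    return more_in_a, more_in_b


engine_a = ["apple pie recipe", "apple orchard"]
engine_b = ["apple iphone", "apple computer store"]
print(contrasting_terms(engine_a, engine_b))
# -> (['recipe', 'pie', 'orchard'], ['computer', 'iphone', 'store'])
print(jaccard(["a.com", "b.com", "c.com"], ["b.com", "c.com", "d.com"]))
# -> 0.5
```

Terms shared equally by both engines (here "apple") cancel out, so the two lists highlight only what one set of results shows and the other does not.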
Project Participants

Fil Menczer

Sandro Flammini

Alex Vespignani

Santo Fortunato

Mark Meiss
Support
Opinions, findings, conclusions, recommendations or points of view of this group are those of the authors and do not necessarily represent the official position of the National Science Foundation, the Volkswagen Foundation, or Indiana University.
March 31, 2009, 4:54 pm
The goal of Sixearch.org is to provide an open-source platform for developing a context-aware, personalized, peer-to-peer (P2P) distributed information retrieval system. The application currently supports scalable collaborative Web search.
Sixearch models neighbor nodes by their content without assuming the presence of special directory hubs. As shown on the left, each peer is both a (limited) directory hub and a content provider: it has its own topical crawler, guided by its user’s information content, and its own local search engine. Peer communication is built on the JXTA platform. When a user submits a query, it is first matched against the local search engine and then routed to neighbor peers to obtain more results. Ideally, intelligent collaboration among peers should lead to the emergence of a clustered network topology. While traditional search engines such as Google and Yahoo provide access to very large document collections, the Sixearch P2P Web search application offers a complementary way for users to actively and collaboratively share their own document collections. At the same time, the Sixearch framework allows traditional search engines to be included naturally as peers; such peers would quickly emerge as reliable, trustworthy, and general authority nodes.
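The query-routing idea can be sketched in a few lines of Python. This is not Sixearch code (Sixearch runs over JXTA); it is a single-process illustration under our own assumptions: documents and queries are bags of words compared by cosine similarity, each peer keeps a content profile for every neighbor, a query is matched against the local collection first and then forwarded to the `fanout` neighbors whose profiles best match it, and a neighbor's profile absorbs the query terms whenever it returns results. The class `Peer`, its methods, and the `fanout` parameter are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (a deliberately simple text model)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


class Peer:
    """Hypothetical peer: a local collection plus learned neighbor profiles."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # doc_id -> text
        self.neighbors = {}  # neighbor Peer -> Counter profile of its content

    def add_neighbor(self, peer):
        self.neighbors[peer] = Counter()  # profile starts empty

    def search_local(self, query):
        """Score local documents against the query; keep nonzero matches."""
        q = vectorize(query)
        hits = [(cosine(q, vectorize(text)), doc_id) for doc_id, text in self.documents.items()]
        return sorted((h for h in hits if h[0] > 0), reverse=True)

    def search(self, query, fanout=2):
        """Match locally first, then route to the best-matching neighbors."""
        q = vectorize(query)
        results = [(s, self.name, d) for s, d in self.search_local(query)]
        # sorted() is stable, so ties keep the order in which neighbors were added.
        ranked = sorted(self.neighbors, key=lambda p: cosine(q, self.neighbors[p]), reverse=True)
        for peer in ranked[:fanout]:
            hits = peer.search_local(query)
            if hits:
                # Learning step (assumption): neighbors that answer absorb the query terms.
                self.neighbors[peer].update(q)
            results.extend((s, peer.name, d) for s, d in hits)
        return sorted(results, reverse=True)


alice = Peer("alice", {"a1": "jazz music history"})
bob = Peer("bob", {"b1": "python programming tutorial"})
carol = Peer("carol", {"c1": "jazz piano lessons"})
alice.add_neighbor(bob)
alice.add_neighbor(carol)
print(alice.search("jazz piano"))
```

With empty profiles, routing falls back to the order in which neighbors were added; as profiles fill in, each peer increasingly sends queries to the neighbors that answered similar ones before, a toy analogue of the clustered topology described above.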

A screenshot of peer interactions
The figure on the right shows a screenshot of queries being sent among peers. Peer interactions are visualized by the applet at viz.sixearch.org. The area of each node is proportional to the size of its Web index, and each edge represents queries exchanged between two peers. The connectivity of each peer is an indirect measure of its centrality, authority, and/or reliability as learned by the other peers.
Our work on Sixearch has been published in AI Magazine (preprint) and presented at Hypertext 2009 (demo), ACM SAC 2009 (paper), RIAO 2007 (demo), ACM CIKM P2PIR 2006 (paper), WTAS 2005 (paper), WWW 2005 (poster), and WWW 2004 (poster). Unfortunately, we have also been the victims of shameless plagiarism.
Visit Sixearch.org to learn more, download the application, or contribute to it!
Members & Collaborators

Fil Menczer

Le-Shin Wu

Ruj Akavipat

Namrata Lele

Rossano Schifanella

Ana Maguitman
Support
February 19, 2009, 7:29 pm
Networks & agents Network (NaN)
NaN is a research group exploring complex systems, adaptive agents, modeling, simulation, artificial life, and complex (information, biological, and social) networks. We especially focus on the Web as a complex information network in which we leave abundant traces of our social and semantic activities: what we do, what we are interested in, whom we talk to, what knowledge we acquire and contribute. Our research ranges from modeling the dynamic processes that occur on the Web (how information networks grow and evolve, how individual and collective traffic patterns emerge, how attention bursts are generated and shaped by social and search tools) to mining the Web to build better search, navigation, management, and recommendation tools (where ‘better’ means more intelligent, autonomous, robust, personalized, contextual, scalable, adaptive, and so on).
We have many ongoing collaborations with colleagues in the Complex Networks Lagrange Lab at the Institute for Scientific Interchange (ISI) Foundation in Torino, Italy.
Active NaN projects

GiveALink

Sixearch

Web Traffic

Network Flow Analysis

Web Security

Web Dynamics

Search Bias

Text & Link Modeling
Archived NaN projects

InfoSpiders

Web Topologies

IntelliShopper

ELSA

LEE

BioNets

ACE
March 30, 2008, 7:04 pm
This sabbatical is providing wonderful opportunities for me to present our work and to establish or strengthen collaborations with several groups in Italy. Recently I gave invited seminars on social search at the Department of Informatics of the University of Torino (hosts: Matteo Sereno and Mino Anglano) and on Web traffic at the Department of Mathematics of the University of Padova (host: Massimo Marchiori). In the next few weeks I will give a talk on social search at the Department of Informatics and Information Science of the University of Genova (host: Marina Ribaudo), and one on search engine bias and Web modeling at my old stomping ground, the Institute of Cognitive Sciences and Technologies of the National Research Council in Rome (host: my undergraduate advisor and mentor, Domenico Parisi).
November 18, 2007, 6:27 pm
I just got back from a visit to Yahoo! Research Silicon Valley. I gave two talks presenting our work on social search and Web traffic analysis, and met lots of interesting people. They have an amazing group and of course mountains of data to lust after. Hopefully this will lead to collaborations in the future, given our many intersecting research interests.
October 22, 2007, 6:07 pm
I will give a talk on social search at the Workshop on Social Data Mining and Knowledge Building, part III of the Mathematics of Knowledge and Search Engines program. The workshop, organized by IPAM, will be held 5–9 November 2007 at UCLA. Joining me as speakers are Luis Rocha and Stan Wasserman from IU and Santo Fortunato and Jose Ramasco from ISI/CNLL. Should be fun!
September 25, 2007, 5:29 pm
September 30, 2006, 10:07 am
Whether scientific or business, structured or unstructured, streaming or static, the need to effectively search for and use data is paramount. The Data and Search Institute at the Indiana University School of Informatics aims to meet that need. Funded by a planning grant from the National Science Foundation's Industry/University Cooperative Research Centers program, the institute will speed the flow of data and search research into industry and provide a framework in which scientists can engage in industry-relevant research.