As adaptive peer network systems become an increasingly important development in Web search technology, this research introduces an alternative model for peer-based Web search to address the scale problem of centralized search engines. Queries are first matched against the local engine and then routed to neighbor peers to obtain more results. Initially the network has a random topology (like Gnutella) and queries are routed randomly, as in the flood model. However, the protocol includes a learning algorithm by which each peer uses the results of its interactions with its neighbors to refine a model of the other peers. This model is used to dynamically route queries according to the predicted match with other peers’ knowledge. The network topology is thus modified on the fly based on learned contexts and current information needs.
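The learning step described above can be sketched roughly as follows. This is a minimal illustration, not the actual Sixearch algorithm: the class name `NeighborModel`, the per-term running-average update, the `learning_rate` parameter, and the top-`k` selection are all assumptions made for the example. Ties are broken randomly, so a peer with no learned scores routes queries at random, as in the initial state of the network.

```python
import random
from collections import defaultdict


class NeighborModel:
    """Hypothetical profile store: for each known peer, a learned score per query term."""

    def __init__(self, learning_rate=0.3):
        self.learning_rate = learning_rate  # assumed update weight, not from the source
        # scores[peer_id][term] estimates how well peer_id answers queries containing term
        self.scores = defaultdict(lambda: defaultdict(float))

    def update(self, peer_id, query_terms, result_quality):
        """Move each term's score toward the observed result quality (a value in 0..1)."""
        for term in query_terms:
            old = self.scores[peer_id][term]
            self.scores[peer_id][term] = old + self.learning_rate * (result_quality - old)

    def predict(self, peer_id, query_terms):
        """Predicted match: mean learned score over the query terms."""
        if not query_terms:
            return 0.0
        profile = self.scores[peer_id]
        return sum(profile.get(term, 0.0) for term in query_terms) / len(query_terms)

    def choose_neighbors(self, known_peers, query_terms, k=2):
        """Return the k peers with the highest predicted match; ties fall in random order."""
        candidates = list(known_peers)
        random.shuffle(candidates)  # random tie-break before the stable sort
        candidates.sort(key=lambda p: self.predict(p, query_terms), reverse=True)
        return candidates[:k]


model = NeighborModel(learning_rate=0.5)
model.update("A", ["python"], 1.0)  # peer A returned good results for "python"
model.update("B", ["python"], 0.2)  # peer B returned weak results
best = model.choose_neighbors(["A", "B", "C"], ["python"], k=2)
```

After these two interactions, peer A is preferred over B for the term "python", and the unknown peer C ranks last. Repeating this over many queries is what would let the effective topology drift away from the random starting point.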
The goal of Sixearch (carl.cs.indiana.edu/6S) is to provide an open-source platform for developing a context-aware, personalized peer-to-peer (P2P) distributed information retrieval system. The application currently supports scalable collaborative Web search.
Sixearch uses the idea of modeling neighbor nodes by their content but without assuming the presence of special directory hubs. As shown on the left, each peer is both a (limited) directory hub and a content provider; it has its own topical crawler guided by its user’s information content and local search engine. Peer communication is built on the JXTA platform. When a user submits a query, it is first matched against the local engine, and then routed to neighbor peers to obtain more results. Ideally, the peer network should lead to the emergence of a clustered topology through intelligent collaboration between the peers. While traditional search engines such as Google and Yahoo provide access to very large document collections, the Sixearch P2P Web search application provides a complementary way for users to actively and collaboratively share their own document collections. However, the Sixearch framework also allows traditional search engines to be included naturally as peers; such peers would quickly emerge as reliable, trustworthy, and general authority nodes.
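The query flow described above (local match first, then neighbors) might look like the following toy sketch. It runs entirely in one process and does not use JXTA; the `Peer` class, the term-count scoring, and the `ttl` hop limit are hypothetical simplifications introduced for illustration only.

```python
class Peer:
    """Toy in-process peer: a small local index plus a list of neighbor peers."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # maps URL -> page text (stands in for the local engine)
        self.neighbors = []

    def local_search(self, query):
        """Score each local document by how often the query terms occur in it."""
        terms = query.lower().split()
        hits = []
        for url, text in self.documents.items():
            words = text.lower().split()
            score = sum(words.count(term) for term in terms)
            if score > 0:
                hits.append((score, url, self.name))
        return hits

    def search(self, query, ttl=1, seen=None):
        """Match locally first, then forward to unvisited neighbors while ttl allows."""
        seen = set() if seen is None else seen
        seen.add(self.name)
        hits = self.local_search(query)
        if ttl > 0:
            for neighbor in self.neighbors:
                if neighbor.name not in seen:
                    hits.extend(neighbor.search(query, ttl - 1, seen))
        return sorted(hits, reverse=True)  # highest score first


alice = Peer("alice", {"u1": "peer search networks"})
bob = Peer("bob", {"u2": "peer to peer routing", "u3": "cooking recipes"})
alice.neighbors = [bob]
results = alice.search("peer")  # combines alice's local hit with bob's
```

Here every neighbor is asked; in the adaptive setting the text describes, a peer would forward only to the neighbors its learned model predicts to be a good match for the query.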
A screenshot of peer interactions
The figure on the right shows a screenshot of queries being sent among peers. Peer interactions are visualized by an applet. The area of each node is proportional to the size of its Web index. The edges represent the queries exchanged between two peers. The connectivity of each peer is an indirect measure of the centrality, authority, and/or reliability of the peer as learned by the other peers.
Le-Shin Wu and Ruj Akavipat have released the latest version (v.0.3) of sixearch.org, formerly known as 6S. This collaborative, social Web search network allows intelligent adaptive agents to collaborate in a peer network whose emergent structure evolves to discover semantic relationships between peer interests and knowledge. A review of the system should appear in a forthcoming special issue of AI Magazine on networks in AI. We hope people will download and use the Sixearch tool, as we need a critical mass of users to analyze its behavior and performance. We also ask volunteers to participate in a user study that will further aid in collecting data on usage. Try it!
On a side note, it’s a relief to report that the appalling plagiarism episodes of which we have been victims are finally being acknowledged and that measures are beginning to be taken to partially mitigate the damage we have suffered.
We won an IBM UIMA Innovation Award to incorporate UIMA into 6S. An Apache incubator project, the Unstructured Information Management Architecture is an open, industrial-strength platform for unstructured information analysis and search. It will be used to make it easy to develop different semantic search algorithms and deploy them on customized 6S peers. The award will support one graduate student for one year. Try our latest 6S prototype!