Scholarometer is becoming a more mature tool. The idea behind Scholarometer — crowdsourcing scholarly data — was presented at the Web Science 2010 Conference in Raleigh, North Carolina, along with some promising preliminary results. Recently added functionality includes a Chrome version, percentile calculations for all impact measures, export of bibliographic data in various standard formats, and heuristics to determine reliable tags and detect ambiguous names. Next up: an API to share annotation and impact data, and an interactive visualization of the interdisciplinary network.
The IEEE Spectrum piece Real-Time Search Stumbles Out of the Gate discusses the recent integration of real-time search features, such as Twitter and other microblog entries, into major search engines. Professor Filippo Menczer, CNetS associate director, comments in the article on the challenges posed by real-time search. Here is an excerpt of the interview:
IU’s Menczer suggests that with all this user-generated content, the environment is more complex than the one Google’s PageRank algorithm had to deal with. While search used to be about relationships between pages, he explains, now it’s about relationships between “people, tags, Web pages, ratings, votes, and direct social links…. It may not be that page A points to page B but rather that user John follows Mary and replies to the tweet of Jane and retweets it.” That makes it “a more complicated ecosystem,” he says, “but a very rich one,” and search engines will need “more sophisticated ways to extract data from these relationships” […]
CNetS graduate student Diep Thi Hoang and associate director Filippo Menczer have developed a tool (called Scholarometer, previously Tenurometer in beta version) for evaluating the impact of scholars in their field. Scholarometer uses the h-index, which combines the scholarly output with the influence of the work, but adds the universal h-index proposed by Radicchi et al. to compare the impact of research in different disciplines. This is enabled by a social mechanism in which users of the tool collaborate to tag the disciplines of the scholars. “We have computer scientists, physicists, social scientists, people from many different backgrounds, who publish in lots of different areas,” says Menczer. However, the various communities have different citation methods and different publishing traditions, making it difficult to compare the influence of a sociologist and a computer scientist, for example. The universal h-index controls for differences in the publishing traditions, as well as the amount of research scholars in various fields have to produce to make an impact. Menczer is especially excited about the potential to help show how the disciplines are merging into one another. More from Inside Higher Ed… (Also picked up by ACM TechNews and CACM.)
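To make the metric concrete: a scholar's h-index is the largest number h such that h of their papers have at least h citations each. The sketch below computes it, plus a simplified form of the discipline rescaling idea (dividing each paper's citations by the field's average before ranking) — an illustration of the approach of Radicchi et al., not Scholarometer's actual code.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def rescaled_h_index(citations, field_avg):
    """Simplified illustration of discipline rescaling: divide each
    paper's citation count by the field's average before ranking,
    so scholars in low-citation fields are not penalized."""
    return h_index([c / field_avg for c in citations])

# A scholar with per-paper citation counts:
papers = [25, 8, 5, 3, 3, 1, 0]
h = h_index(papers)   # → 3: three papers with at least 3 citations each
```

Because the rescaled counts are dimensionless, the same ranking procedure can compare, say, a sociologist with a computer scientist.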
Fil Menczer is one of the organizers of Hypertext 2009, the 20th ACM Conference on Hypertext and Hypermedia. The conference will be held June 29-July 1 at the Villa Gualino Convention Centre, on the hills overlooking Torino, Italy. Hypertext is the main venue for high-quality peer-reviewed research on “linking.” The Web, the Semantic Web, the Web 2.0, and Social Networks are all manifestations of the success of the link. With a 70% increase in submissions, Hypertext 2009 will have a strong and diverse technical program covering all research concerning links: their semantics, their presentation, the applications, as well as the knowledge that can be derived from their analysis and their effects on society. The conference will also feature demos, posters, a student research competition, four workshops, and keynotes by Lada Adamic and Ricardo Baeza-Yates.
We study the structure and dynamics of Web traffic and social media usage patterns. One source of data is a stream of HTTP requests made by users at Indiana University (our Web traffic (click) dataset is available!). Gathering anonymized requests directly from the network allows us to examine large volumes of traffic data while minimizing biases associated with other data sources. However, we also leverage data from server logs and browser instrumentation. Referrer information is used to reconstruct the subset of the Web graph actually traversed by users.
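The referrer-based reconstruction can be sketched as follows: each (referrer, target) pair in the request log with a non-empty referrer becomes a directed edge in the traversed graph, weighted by how many times the click occurred. The data layout here is illustrative, not our actual pipeline.

```python
def traversed_graph(requests):
    """Build the user-traversed Web graph from (referrer, target)
    pairs in an HTTP request log. Requests without a referrer
    (typed URLs, bookmarks) contribute no edge."""
    edges = {}
    for referrer, target in requests:
        if referrer:
            edges[(referrer, target)] = edges.get((referrer, target), 0) + 1
    return edges

log = [(None, "a.com"), ("a.com", "b.org"),
       ("a.com", "b.org"), ("b.org", "c.net")]
g = traversed_graph(log)
# → {('a.com', 'b.org'): 2, ('b.org', 'c.net'): 1}
```

The resulting weighted graph covers only the links users actually followed, a small subset of the full Web graph a crawler would see.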
Our goal is to develop a better understanding of user behavior online and to create more realistic models of Web and social media browsing. The potential applications of this analysis include improved designs for networks, sites, and server software; more accurate forecasting of traffic trends; classification of sites based on the patterns of activity they inspire; and improved ranking algorithms for search results.
Among our more intriguing findings are that server traffic (as measured by number of clicks) and site popularity (as measured by distinct users) both follow distributions so broad that they lack any well-defined mean. Actual Web traffic turns out to violate three assumptions of the random surfer model: users don’t start from any page at random, they don’t follow outgoing links with equal probability, and their probability of jumping is dependent on their current location. Search engines appear to be directly responsible for a smaller share of Web traffic than often supposed. These results were presented at WSDM2008 (paper | talk).
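For reference, the random surfer model whose three assumptions the traffic data violates can be simulated in a few lines. This is a minimal sketch of the classic model on a toy graph, not our analysis code; the assumptions are marked in comments.

```python
import random

def random_surfer(graph, steps, jump_prob=0.15, seed=0):
    """Classic random-surfer simulation. graph: page -> list of out-links."""
    rng = random.Random(seed)
    pages = list(graph)
    visits = {p: 0 for p in pages}
    page = rng.choice(pages)  # assumption 1: start from a page chosen at random
    for _ in range(steps):
        visits[page] += 1
        links = graph[page]
        # assumption 3: the jump probability is the same at every page
        if not links or rng.random() < jump_prob:
            page = rng.choice(pages)
        else:
            # assumption 2: outgoing links are followed with equal probability
            page = rng.choice(links)
    return visits

toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
visits = random_surfer(toy_web, steps=10_000)
```

Empirically, none of the three commented assumptions hold: real users start from a skewed set of pages, favor some out-links heavily, and jump at rates that depend on where they are.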
Another paper (also here; presented at Hypertext 2009) examined the conventional notion of a Web session as a sequence of requests terminated by an inactivity timeout. Such a definition turns out to yield statistics dependent primarily on the timeout value selected, which we find to be arbitrary. For that reason, we have proposed logical sessions defined by the target and referrer URLs present in a user’s Web requests.
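The logical-session idea can be sketched directly: a request joins the session that previously produced its referrer; a request with no matching referrer starts a new session. This is an illustrative simplification (real sessions form trees of pages per user), not the paper's implementation.

```python
def logical_sessions(requests):
    """Group (referrer, target) click records into logical sessions:
    a request whose referrer was a target of an earlier request joins
    that request's session; otherwise it starts a new session."""
    session_of = {}   # target URL -> session index
    sessions = []
    for referrer, target in requests:
        if referrer in session_of:
            idx = session_of[referrer]
        else:
            idx = len(sessions)
            sessions.append([])
        sessions[idx].append(target)
        session_of[target] = idx
    return sessions

clicks = [
    (None, "a.com"),        # typed-in start: new session
    ("a.com", "a.com/x"),   # followed link: same session
    (None, "b.org"),        # a second, interleaved session
    ("a.com/x", "a.com/y"),
]
# → [['a.com', 'a.com/x', 'a.com/y'], ['b.org']]
```

Note that the two sessions above are interleaved in time, so no choice of inactivity timeout could separate them; the referrer structure can.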
Inspired by these findings, we designed a model of Web surfing able to recreate not only the broad distribution of traffic, but also the basic statistics of logical sessions. Late-breaking results were presented at WSDM2009. Our final report on the ABC model was presented at WAW 2010.
Recent efforts aim to develop a general model of information foraging that could help understand how people decide when to browse, search, switch, or stop while consuming information, entertainment, and other resources online.
Mark Meiss was supported by the Advanced Network Management Laboratory, one of the Pervasive Technology Labs established at Indiana University with the assistance of the Lilly Endowment.
This research was also supported in part by the National Science Foundation (under awards 0348940, 0513650, and 0705676) and in part by the Institute for Information Infrastructure Protection research program. The I3P is managed by Dartmouth College and supported under Award Number 2003-TK-TX-0003 from the U.S. DHS, Science and Technology Directorate.
Opinions, findings, conclusions, recommendations or points of view of this group are those of the authors and do not necessarily represent the official position of the U.S. Department of Homeland Security, Science and Technology Directorate, I3P, National Science Foundation, or Indiana University.
The Web Dynamics group worked to build a better understanding of how the Web, Wikipedia, and similar large information networks grow and change over their lifetimes. Of particular interest is how nodes in these networks gain popularity.
Our work painted a picture of the Web as a place in which popularity is very dynamic and unpredictable. Surges in popularity for topics are similar to earthquakes and avalanches in terms of their unpredictability — both in when they will happen and on what scale. However, we found that spikes in popularity are often correlated with events in the news — as evidenced by a positive correlation between Google Trends data and traffic to bursty Wikipedia topics. Work on this project was presented at the SocialCom 2010 Symposium on Social Intelligence and Networking (SIN-10). A review of these issues with an emphasis on the modeling problem was also published in Physical Review Letters.
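The kind of correlation measured between search interest and article traffic is an ordinary Pearson coefficient over paired time series. A minimal sketch with toy weekly numbers (illustrative values, not the study's data):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Toy weekly series: search interest vs. Wikipedia page views
trends = [10, 40, 90, 30, 15]
wiki_views = [1200, 3900, 8800, 3100, 1500]
r = pearson(trends, wiki_views)   # strongly positive for these values
```

A value of r near +1 for bursty topics is what links the traffic spikes to news events.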
Finally, we studied the production of information in the attention economy — namely, how the production of new knowledge is associated with significant shifts of collective attention, which we take as a proxy for its demand. This is consistent with a scenario in which allocation of attention toward a topic stimulates the demand for information about it, and in turn the supply of further novel information.
The goal of Sixearch (carl.cs.indiana.edu/6S) is to provide an open-source platform for developing a context-aware, personalized, peer-to-peer (P2P) distributed information retrieval system. The application currently supports scalable, collaborative Web search.
Sixearch uses the idea of modeling neighbor nodes by their content, but without assuming the presence of special directory hubs. As shown on the left, each peer is both a (limited) directory hub and a content provider; it has its own topical crawler, guided by its user’s information content, and its own local search engine. Peer communication is built on the JXTA platform. When a user submits a query, it is first matched against the local engine, and then routed to neighbor peers to obtain more results. Ideally, the peer network should lead to the emergence of a clustered topology through intelligent collaboration among the peers. While traditional search engines such as Google and Yahoo provide access to very large document collections, the Sixearch P2P Web search application provides a complementary way for users to actively and collaboratively share their own document collections. However, the Sixearch framework allows traditional search engines to naturally be included as peers; such peers would quickly emerge as reliable, trustworthy, and general authority nodes.
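The local-match-then-forward flow can be sketched as below. The data structures and names are illustrative assumptions, not the actual JXTA-based protocol; a TTL bounds forwarding, and a seen-set prevents loops.

```python
def search(peer_id, query_terms, peers, ttl=2, seen=None):
    """Sixearch-style routing sketch: match the query against the
    local index, then forward it to neighbor peers and merge results.
    peers: id -> {"index": {doc: set(terms)}, "neighbors": [ids]}."""
    seen = set() if seen is None else seen
    seen.add(peer_id)
    peer = peers[peer_id]
    # local match: documents containing every query term
    results = {doc for doc, terms in peer["index"].items()
               if query_terms <= terms}
    if ttl > 0:
        for nbr in peer["neighbors"]:
            if nbr not in seen:
                results |= search(nbr, query_terms, peers, ttl - 1, seen)
    return results

peers = {
    "p1": {"index": {"d1": {"p2p", "search"}}, "neighbors": ["p2"]},
    "p2": {"index": {"d2": {"p2p", "search", "web"}}, "neighbors": ["p1"]},
}
hits = search("p1", {"p2p", "search"}, peers)   # → {'d1', 'd2'}
```

In the real system, a peer also learns which neighbors answer which topics well, which is what drives the clustered topology to emerge.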
The right figure displays a screenshot of the queries being sent among peers. Peer interactions are visualized by an applet. The area of each node is proportional to the size of its Web index. The edges represent the queries exchanged between two peers. The connectivity of each peer is an indirect measure of centrality, authority, and/or reliability of the peer as learned by the other peers.
Our work on Sixearch has been published in AAAI Magazine (preprint), and presented at Hypertext 2009 (demo), ACM SAC2009 (paper), RIAO2007 (demo), ACM CIKM P2PIR2006 (paper), WTAS2005 (paper), WWW2005 (poster), and WWW2004 (poster). Unfortunately, we have also been the victims of shameless plagiarism.
Visit Sixearch to learn more and download the application or contribute to it!
Members & Collaborators
The Sixearch project is based upon work supported by the National Science Foundation under awards IIS-0133124 and IIS-0348940 (CAREER: Scalable Search Engines Via Adaptive Topic-Driven Crawlers).
Recently, the Sixearch project received the IBM 2007 UIMA award for 6S: A Collaborative Web Search Network.
Networks & agents Network (NaN)
NaN is a research group exploring the modeling, simulation, and analysis of complex social and information networks, and the human and artificial agents who live in these networks. Broadly speaking our research spans network science, data science, web science, and computational social science. Recently our focus has been on modeling the dynamic processes that occur online (how information networks grow and evolve, how memes go viral, how social media can be manipulated for the spread of misinformation, how attention bursts and other traffic patterns emerge, etc.) and on the design of tools to make the Web and social media ‘better’ (more trustworthy, reliable, intelligent, autonomous, robust, personalized, contextual, scalable, adaptive, and so on). We collaborate with colleagues at the IU Network Science Institute (IUNI), ISI Foundation, Yahoo Research, and many other institutions.
Active NaN projects
Archived NaN projects
A PNAS paper on Growing And Navigating The Small World Web By Local Content was announced in press releases by PNAS News and UIowa. A radio interview for the program Science in Action was broadcast by BBC World Service (QuickTime | Flash | MP3). The paper received coverage in Technology Research News, ACM TechNews, Complexity Digest, Insight, @-web, Ascribe, Boston.com, E4, and ResearchBuzz.
Web pages cluster by content type (TRN News)