Congratulations to Rion Correia, who successfully defended his PhD dissertation on Prediction of Drug Interaction and Adverse Reactions, with data from Electronic Health Records, Clinical Reporting, Scientific Literature, and Social Media, using Complexity Science Methods. Dr. Correia’s research used network science, machine learning, and data science to uncover population-level associations of drugs and symptoms, useful for public health surveillance. His findings show that social media (Instagram and Twitter) and the electronic health records of an entire city in Southern Brazil are very useful for revealing how the drug interaction phenomenon varies across distinct groups. For instance, he identified gender biases and specific communities of interest in chronic disease (e.g., epilepsy and depression). In addition to Complex Networks and Systems, his dissertation contributes to the fields of biomedical informatics and precision public health by leveraging heterogeneous data sources at multiple levels to understand population and individual pharmacology differences and other public health problems.
Congratulations to Dimitar Nikolov, who successfully defended his PhD dissertation on Information Exposure Biases in Online Behaviors. Dr. Nikolov’s research explored the unintentional biases introduced by filtering, ranking, and recommendation algorithms that mediate our online consumption of information. His findings show that our reliance on modern online technologies limits exposure to diverse points of view and makes us vulnerable to misinformation. In particular, he analyzed two massive Web traffic datasets to quantify the popularity and homogeneity bias of several popular online platforms including social media, email, personalized news, and search engines. He also leveraged Twitter data to characterize the link between political partisanship and vulnerability to online pollution, such as fake news, conspiracy theories, and junk science. His dissertation contributes to the field of computational social science by putting the study of bias in information consumption and derived phenomena like political polarization, echo chambers, and online pollution on a firmer quantitative foundation.
To foster the study of the structure and dynamics of Web traffic networks, we are making available to the research community a large Click Dataset of 53.5 billion HTTP requests collected at Indiana University. Between 2006 and 2010, our system generated data at a rate of about 60 million requests per day, or about 30 GB/day of raw data. We hope that this data will help develop a better understanding of user behavior online and create more realistic models of Web traffic. The potential applications of this data include improved designs for networks, sites, and server software; more accurate forecasting of traffic trends; classification of sites based on the patterns of activity they inspire; and improved ranking algorithms for search results.
On December 16, Mark Meiss presented our paper “Modeling Traffic on the Web Graph” (with Bruno, José, Sandro, and Fil) at the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010), at Stanford. In this paper we introduce an agent-based model that explains many statistical features of aggregate and individual Web traffic data through realistic elements such as bookmarks, tabbed browsing, and topical interests.
Online popularity can be thought of as analogous to an earthquake; it is sudden, unpredictable, and the effects are severe. While shifts in online popularity are not inherently destructive – consider the unprecedented magnitude of online giving via Twitter following the disaster in Haiti – they indicate radical swings in society’s collective attention. Given the increasingly profound effect that large-scale opinion formation has on important phenomena like public policy, culture, and advertising profits, understanding this behavior is essential to understanding how the world operates.
In this paper by Ratkiewicz and colleagues, the authors put forth a web-wide analysis that includes large-scale data sets of the online behaviors of millions of people. The paper offers a novel model that is capable of reproducing all of the observed dynamics of online popularity through a mechanism that causes sudden, nonlinear bursts of collective attention. These results have been mentioned on the APS and PhysOrg websites.
NaN had a strong presence at Hypertext 2009 in Torino:
- Mark’s paper What’s in a session: tracking individual behavior on the web was nominated for the Best Paper Award.
- Heather presented the demo Incentives for social annotation about the prototype Firefox extension for GiveALink.org, and the tagging game. (Heather is also demoing at SIGIR’09 in Boston.)
- I presented the demo Sixearch.org 2.0 peer application for collaborative web search about the latest release of Sixearch.
- Ben presented his poster on A scalable, collaborative similarity measure for social annotation systems.
This week at NaN I’ll be running through a very preliminary version of my thesis talk, “Structural Mining of Large-Scale Behavioral Data from the Internet,” which is all about the things you can discover using network flow data and Web clicks. The main feedback I’m looking for has to do with organization and pruning: I’m not terribly confident of the order in which I present the material, and I have a LOT more things to say than time to say them in, so I can use some suggestions on what to keep and what to just point to the actual document for.
(And, yes, there will be cookies.)
This sabbatical is providing wonderful opportunities for me to present our work and establish/strengthen collaborations with several groups in Italy. Recently I have given invited seminars on social search at the Department of Informatics at the University of Torino (hosts Matteo Sereno and Mino Anglano) and on Web traffic at the Department of Math at the University of Padova (host Massimo Marchiori). In the next few weeks I will give a talk on social search at the Department of Informatics and Information Science at the University of Genova (host Marina Ribaudo) and one on search engine bias and Web modeling at my old stomping ground, the Institute of Cognitive Sciences and Technologies of the National Research Council in Rome (host my undergraduate advisor and mentor Domenico Parisi).
Mark Meiss presented our work on Web click traffic analysis at the First International Conference on Web Search and Data Mining (WSDM 2008) on the beautiful Stanford campus. His talk was very well received, as reported in Greg Linden’s blog. The quality of the conference itself was very good, with several excellent presentations. Those I found most interesting were Qiaozhu Mei’s talk on search log entropy, Nick Craswell on click position bias, Carlos Castillo on social media, and Paul Heymann on social bookmarks. I think future WSDM conferences should have an award for the best-delivered talk. This year I would have voted for Carlos Castillo. The next WSDM will be in Barcelona.
I just got back from a visit to Yahoo! Research Silicon Valley. I gave two talks presenting our work on social search and web traffic analysis, and met lots of interesting people. They have an amazing group and of course mountains of data to lust after. Hopefully this will lead to collaborations in the future, given the many intersecting research interests.