Congratulations to Rion Correia, who successfully defended his PhD dissertation on Prediction of Drug Interaction and Adverse Reactions, with data from Electronic Health Records, Clinical Reporting, Scientific Literature, and Social Media, using Complexity Science Methods. Dr. Correia’s research used network science, machine learning, and data science to uncover population-level associations of drugs and symptoms, useful for public health surveillance. His findings show that social media (Instagram and Twitter) and the electronic health records of an entire city in Southern Brazil are very useful for revealing how the drug interaction phenomenon varies across distinct groups. For instance, he identified gender biases and specific communities of interest in chronic disease (e.g., epilepsy and depression). In addition to Complex Networks and Systems, his dissertation contributes to the fields of biomedical informatics and precision public health by leveraging heterogeneous data sources at multiple levels to understand population and individual pharmacology differences and other public health problems.
Congratulations to Dimitar Nikolov, who successfully defended his PhD dissertation on Information Exposure Biases in Online Behaviors. Dr. Nikolov’s research explored the unintentional biases introduced by filtering, ranking, and recommendation algorithms that mediate our online consumption of information. His findings show that our reliance on modern online technologies limits exposure to diverse points of view and makes us vulnerable to misinformation. In particular, he analyzed two massive Web traffic datasets to quantify the popularity and homogeneity bias of several popular online platforms including social media, email, personalized news, and search engines. He also leveraged Twitter data to characterize the link between political partisanship and vulnerability to online pollution, such as fake news, conspiracy theories, and junk science. His dissertation contributes to the field of computational social science by putting the study of bias in information consumption and derived phenomena like political polarization, echo chambers, and online pollution on a more firm quantitative foundation.
Speaker: Ricardo Baeza-Yates, Universitat Pompeu Fabra, Spain & Universidad de Chile
Title: Data and Algorithmic Bias in the Web
Room: Info East 122
Abstract: The Web is the largest public big data repository that humankind has created. In this overwhelming data ocean we need to be aware of the quality and in particular, of biases that exist in this data, such as redundancy, spam, etc. These biases affect the algorithms that we design to improve the user experience. This problem is further exacerbated by biases that are added by these algorithms, especially in the context of search and recommendation systems. They include ranking bias, presentation bias, position bias, etc. We give several examples and their relation to sparsity, novelty, and privacy, stressing the importance of the user context to avoid these biases.
Bio: Ricardo Baeza-Yates’s areas of expertise are information retrieval, web search and data mining, data science, and algorithms. He was VP of Research at Yahoo Labs, based in Barcelona, Spain, and later in Sunnyvale, California, from January 2006 to February 2016. He is a part-time Professor at DTIC of the Universitat Pompeu Fabra, in Barcelona, Spain. Until 2004 he was Professor and founding director of the Center for Web Research at the Dept. of Computing Science of the University of Chile. He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. He is co-author of the best-selling textbook Modern Information Retrieval, published by Addison-Wesley in 2011 (2nd ed.), which won the ASIST 2012 Book of the Year award. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society, and in 2012 he was elected to the ACM Council. Since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow, among other awards and distinctions.
We are excited to announce that the ACM Web Science 2014 Conference will be hosted by our center on the beautiful IUB campus June 23–26, 2014. Web Science studies the vast information network of people, communities, organizations, applications, and policies that shape and are shaped by the Web, the largest artifact constructed by humans in history. Computing, physical, and social sciences come together, complementing each other in understanding how the Web affects our interactions and behaviors. Previous editions of the conference were held in Athens, Raleigh, Koblenz, Evanston, and Paris. The conference is organized on behalf of the Web Science Trust by general co-chairs Fil Menczer, Jim Hendler, and Bill Dutton. Follow us on Twitter and see you in Bloomington!
To foster the study of the structure and dynamics of Web traffic networks, we are making available to the research community a large Click Dataset of 53.5 billion HTTP requests collected at Indiana University. Between 2006 and 2010, our system generated data at a rate of about 60 million requests per day, or about 30 GB/day of raw data. We hope that this data will help develop a better understanding of user behavior online and create more realistic models of Web traffic. The potential applications of this data include improved designs for networks, sites, and server software; more accurate forecasting of traffic trends; classification of sites based on the patterns of activity they inspire; and improved ranking algorithms for search results.
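For readers planning storage or processing around the Click Dataset, the quoted figures allow a quick back-of-envelope estimate. The sketch below is only illustrative: it takes the approximate numbers from the announcement at face value, assumes decimal gigabytes (1 GB = 10^9 bytes), and uses hypothetical function names that are not part of any released tooling.

```python
# Back-of-envelope estimates from the approximate figures in the announcement.
# Assumption: "GB" means decimal gigabytes (1e9 bytes).
REQUESTS_PER_DAY = 60e6     # about 60 million requests per day
RAW_BYTES_PER_DAY = 30e9    # about 30 GB/day of raw data
TOTAL_REQUESTS = 53.5e9     # requests in the full dataset


def bytes_per_request(raw_bytes_per_day: float, requests_per_day: float) -> float:
    """Average raw bytes recorded per HTTP request."""
    return raw_bytes_per_day / requests_per_day


def days_at_rate(total_requests: float, requests_per_day: float) -> float:
    """Days of collection needed to reach the total at the quoted daily rate."""
    return total_requests / requests_per_day


def raw_size_terabytes(total_requests: float, per_request_bytes: float) -> float:
    """Implied raw size of the whole dataset, in decimal terabytes."""
    return total_requests * per_request_bytes / 1e12


if __name__ == "__main__":
    per_request = bytes_per_request(RAW_BYTES_PER_DAY, REQUESTS_PER_DAY)
    print(f"average raw record size: {per_request:.0f} bytes per request")
    print(f"collection time at quoted rate: {days_at_rate(TOTAL_REQUESTS, REQUESTS_PER_DAY):.0f} days")
    print(f"implied raw size: {raw_size_terabytes(TOTAL_REQUESTS, per_request):.2f} TB")
```

Because the rates are stated as "about", these results are best read as order-of-magnitude guidance rather than exact properties of the released files.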
We welcome the Web Science Lab to our center! This underscores our ongoing collaborations in the emerging discipline of Web Science. Since February 2012, CNetS has been a member of WSTNet, an international network bringing together world-class research laboratories to support the Web Science research and education program. The Web Science Network of Laboratories combines some of the world’s leading academic researchers in Web Science with academic programs that enhance the already growing influence of Web Science. The member labs, from institutions that also include USC, MIT, Northwestern, Oxford, and Southampton, among others, provide valuable support for the ongoing development of Web Science. Contributions from the labs include the organization and hosting of summer schools, workshops, and meetings, including the WebSci conference series.
On December 16, Mark Meiss presented our paper “Modeling Traffic on the Web Graph” (with Bruno, José, Sandro, and Fil) at the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010), at Stanford. In this paper we introduce an agent-based model that explains many statistical features of aggregate and individual Web traffic data through realistic elements such as bookmarks, tabbed browsing, and topical interests.
Online popularity can be thought of as analogous to an earthquake; it is sudden, unpredictable, and the effects are severe. While shifts in online popularity are not inherently destructive – consider the unprecedented magnitude of online giving via Twitter following the disaster in Haiti – they indicate radical swings in society’s collective attention. Given the increasingly profound effect that large-scale opinion formation has on important phenomena like public policy, culture, and advertising profits, understanding this behavior is essential to understanding how the world operates.
In this paper, Ratkiewicz and colleagues put forth a web-wide analysis that includes large-scale data sets of the online behaviors of millions of people. The paper offers a novel model that is capable of reproducing all of the observed dynamics of online popularity through a mechanism that causes sudden, nonlinear bursts of collective attention. These results have been mentioned on the APS and PhysOrg websites.
Scholarometer is becoming a more mature tool. The idea behind Scholarometer, crowdsourcing scholarly data, was presented at the Web Science 2010 Conference in Raleigh, North Carolina, along with some promising preliminary results. Recently added functionality includes a Chrome version, percentile calculations for all impact measures, export of bibliographic data in various standard formats, and heuristics to determine reliable tags and detect ambiguous names. Next up: an API to share annotation and impact data, and an interactive visualization for the interdisciplinary network.