The project, dubbed “Truthy,” uses complex computer models to analyze how information is shared on social media and to determine how popular sentiment, user influence, attention, social network structure, and other factors affect the way it is disseminated. Additionally, an important goal of the Truthy project is to better understand how social media can be abused.
In recent weeks, the Truthy project has come under criticism from some who have misrepresented its goals. Contrary to these claims, the Truthy project was not designed, and has not been used, to create a database of political misinformation for the federal government to monitor the activities of those who oppose its policies.
Truthy is neither intended nor able to determine whether a statement constitutes “misinformation.” Its goal is to study the structural patterns of information diffusion.
For example, an email sent simultaneously to a million addresses is likely spam, even if we have no automatic way to determine whether its content is true or false. The assumption behind the Truthy effort is that an understanding of the spreading patterns may facilitate the identification of abuse, independent from the nature or political color of the communication.
The Truthy platform is not informed by political partisanship. While it supports the study of the evolution of communication across all portions of the political spectrum, the machine learning algorithms used to identify suspicious patterns of information diffusion are entirely oblivious to any political partisanship of the messages.
8/28/2014 Update: Despite the clarifications in this post, Fox News and others have continued their attacks on our research project and on the PI personally. Their accusations are based on false claims, supported by bits of text and figures selectively extracted from our writings and presented completely out of context, in misleading ways. They did not bother to contact any of the researchers for comment before publishing these outlandish conspiracy theories. There is a good dose of irony in a research project that studies the diffusion of misinformation becoming the target of such a powerful disinformation machine.
9/3/2014 Update: David Uberti wrote an accurate account of recent events in Columbia Journalism Review.
First, a few words about what the Truthy research project is not:
- a political watchdog
- a government probe of social media
- an attempt to suppress free speech
- a way to define “misinformation”
- a partisan political effort
- a database tracking hate speech
And a few words about what it is:
- Truthy is a research project of the Center for Complex Networks and Systems Research at the IU School of Informatics and Computing. It aims to study how information spreads on social media, such as Twitter.
- The project has focused on domains such as news, politics, social movements, scientific results, and trending social media topics. Researchers develop theoretical computer models and validate them by analyzing public data, mainly from the Twitter streaming API.
- Social media posts available through public APIs are processed without human intervention or judgment to visualize and study the spread of memes. We aim to build a platform to make these analytic tools easily accessible to social scientists, reporters, and the general public.
- An important goal of the project is to help mitigate misuse and abuse of social media by better understanding how it can be abused. For example: when social bots are used to create the appearance of human-generated communication (hence the name “truthy”). We study whether it is possible to automatically differentiate between organic content and so-called “astroturf.”
- Examples of research to date include analyses of geographic and temporal patterns in movements like Occupy Wall Street, societal unrest in Turkey, the polarization of online political discourse, the use of social media data to predict election outcomes and stock market movements, and the geographic diffusion of trending topics.
- On the more theoretical side, we have studied how individuals’ limited attention span affects what information we propagate and what social connections we make, and how the structure of social networks can help predict which memes are likely to become viral.
- Hundreds of researchers across the U.S. and the world are studying similar issues based on the same data and with analogous goals — these topics were studied well before the advent of social media.
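The limited-attention mechanism mentioned above can be illustrated with a toy agent-based simulation. This is only a sketch, not the actual models from our papers: the parameters (`memory`, `p_new`) and the fully connected broadcast are hypothetical simplifications. Agents keep only a short "screen" of recently seen memes, and either invent a new meme or repost one from that finite memory; the limited memory alone is enough to make popularity highly skewed.

```python
import random

def simulate(n_agents=50, memory=5, steps=2000, p_new=0.1, seed=42):
    """Toy limited-attention meme diffusion: each agent keeps a short
    'screen' of recent memes; at every step one agent either posts a
    new meme or reposts one it has recently seen. Finite memory means
    older memes are forgotten, so only a few memes spread widely."""
    rng = random.Random(seed)
    screens = [[] for _ in range(n_agents)]  # each agent's recent memes
    counts = {}                              # meme id -> number of posts
    next_meme = 0
    for _ in range(steps):
        agent = rng.randrange(n_agents)
        if rng.random() < p_new or not screens[agent]:
            meme = next_meme                 # invent a brand-new meme
            next_meme += 1
        else:
            meme = rng.choice(screens[agent])  # repost from limited memory
        counts[meme] = counts.get(meme, 0) + 1
        # Broadcast to every screen (a fully connected network, for
        # simplicity), evicting the oldest item when memory is full.
        for s in screens:
            s.append(meme)
            if len(s) > memory:
                s.pop(0)
    return counts

counts = simulate()
popularity = sorted(counts.values(), reverse=True)
# The resulting popularity distribution is highly skewed:
# a handful of memes account for most of the posts.
```

Despite every meme being identical in "quality," competition for finite attention produces a few viral winners and a long tail of forgotten memes, which is the qualitative effect the theoretical work studies.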
Congratulations to Onur Varol, Emilio Ferrara, Chris Ogan, Fil Menczer, and Sandro Flammini for winning the ACM Web Science 2014 Best Paper Award with their paper Evolution of online user behavior during a social upheaval (preprint). In the paper, the authors study the pivotal role played by Twitter during the political mobilization of the Gezi Park movement in Turkey. By analyzing over 2.3 million tweets produced during 25 days of protest in 2013, the authors show that similarity in trends of discussion mirrors geographic cues. The analysis also reveals that the conversation becomes more democratic as events unfold, with a redistribution of influence over time in the user population. Finally, the study highlights how real-world events, such as political speeches and police actions, affect social media conversations and trigger changes in individual behavior.
Congratulations also go to Luca Aiello and Rossano Schifanella, both former visitors and members of CNetS, who won the Best Presentation Award with their talk on Reading the Source Code of Social Ties (preprint).
The DESPIC team at the Center for Complex Networks and Systems Research (CNetS) presented a demo of a new tool named BotOrNot at a DoD meeting held in Arlington, Virginia on April 23-25, 2014. BotOrNot (truthy.indiana.edu/botornot) is a tool to automatically detect whether a given Twitter user is a social bot or a human. Trained on Twitter bots collected by our lab and the infolab at Texas A&M University, BotOrNot analyzes over a thousand features from the user’s friendship network, content, and temporal information in real time and estimates the degree to which the account may be a bot. In addition to the demo, the DESPIC team (including colleagues at the University of Michigan) presented several posters on Scalable Architecture for Social Media Observatory, Meme Clustering in Streaming Data, Persuasion Detection in Social Streams, High-Resolution Anomaly Detection in Social Streams, and Early Detection and Analysis of Rumors. See more coverage of BotOrNot on PCWorld, IDS, BBC, Politico, and MIT Technology Review.
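As a rough illustration of the kind of feature-based scoring a system like BotOrNot performs, consider combining per-feature evidence into a single likelihood. This is not the actual system: the feature names, values, and weights below are hypothetical, and the real tool uses over a thousand features and trained classifiers rather than hand-set weights.

```python
def bot_score(features, weights):
    """Combine per-feature evidence into a score in [0, 1].
    Each feature value is assumed to be normalized to [0, 1],
    with higher values indicating more bot-like behavior."""
    total_w = sum(weights.values())
    return sum(weights[k] * features[k] for k in weights) / total_w

# Hypothetical feature values for one account, each scaled to [0, 1],
# drawn from the three feature families mentioned above:
account = {
    "tweet_rate": 0.9,        # temporal: very high posting frequency
    "content_entropy": 0.8,   # content: repetitive, low-entropy text
    "friend_follower": 0.3,   # network: friend/follower imbalance
}
weights = {"tweet_rate": 2.0, "content_entropy": 1.5, "friend_follower": 1.0}

score = bot_score(account, weights)
print(f"bot likelihood: {score:.2f}")  # prints "bot likelihood: 0.73"
```

A score near 1 flags the account as likely automated; in the real system the weighting is learned from labeled bot and human accounts rather than chosen by hand.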
Congratulations to Lilian Weng, who successfully defended her Informatics PhD dissertation titled Information diffusion on online social networks. The thesis provides insights into information diffusion on online social networks from three angles: the people who share information, the features of transmissible content, and the mutual effects between network structure and the diffusion process. The first part delves into the role of limited human attention. The second part of Dr. Weng’s dissertation investigates properties of transmissible content, particularly in the topic space. Finally, the thesis presents studies of how network structure, particularly community structure, influences the propagation of Internet memes, and how the information flow in turn affects social link formation. Dr. Weng’s work contributes to a better and more comprehensive understanding of information diffusion in online socio-technical systems and yields applications to viral marketing, advertisement, and social media analytics. Congratulations from her colleagues and committee members: Alessandro Flammini, YY Ahn, Steve Myers, and Fil Menczer!
On August 11, 2013, the New York Times published an article by Ian Urbina with the headline: I Flirt and Tweet. Follow Me at #Socialbot. The article reports on how socialbots (software simulating people on social media) are being designed to sway elections, to influence the stock market, even to flirt with people and one another. Fil Menczer is quoted: “Bots are getting smarter and easier to create, and people are more susceptible to being fooled by them because we’re more inundated with information.” The article also mentions the Truthy project and some of our 2010 findings on political astroturf.
Inspired by this, the writers of The Good Wife consulted with us on an episode in which the main character finds that a social news site is using a socialbot to bring traffic to the site, defaming her client. The episode aired on November 24, 2013, on CBS (Season 5 Episode 9, “Whack-a-Mole”). Good show!
A story in Nature discusses a recent paper (preprint) from CNetS members Jasleen Kaur, Filippo Radicchi and Fil Menczer on the universality of scholarly impact metrics. In the paper, we present a method to quantify the disciplinary bias of any scholarly impact metric, and use it to evaluate a number of established metrics. We also introduce a simple universal metric that makes it possible to compare the impact of scholars across scientific disciplines. Mohsen JafariAsbagh integrated this metric into Scholarometer, a crowdsourcing system developed by our group to collect and share scholarly impact data. The Nature story highlights how one can use normalized impact metrics to rank all scholars, as illustrated in the widget shown here.
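The intuition behind discipline normalization can be sketched as follows. This is a toy example only, not the metric defined in the paper: it simply rescales a scholar's h-index by the average h-index of a hypothetical set of disciplinary peers, so that scholars in fields with very different citation practices land on a comparable scale.

```python
def normalized_impact(h, peers_h):
    """Divide a scholar's h-index by the average h-index of peers in
    the same discipline, making scores comparable across fields."""
    return h / (sum(peers_h) / len(peers_h))

# Hypothetical peer groups: raw h-indices are inflated in fields with
# heavy citation traffic, but normalized scores are comparable.
physics_peers = [20, 30, 40, 50]  # average h = 35
math_peers = [5, 10, 15, 20]      # average h = 12.5

print(normalized_impact(35, physics_peers))  # prints 1.0 (an average physicist)
print(normalized_impact(25, math_peers))     # prints 2.0 (well above the field average)
```

Under this kind of rescaling, a mathematician with a modest raw h-index can correctly outrank a physicist with a larger one, which is the cross-disciplinary comparison the universal metric enables.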
Findings by CNetS researchers on social media indicators of election results received significant coverage in the national press. The paper More Tweets, More Votes: Social Media as a Quantitative Indicator of Political Behavior by Joseph Digrazia, Karissa McKelvey, Johan Bollen, and Fabio Rojas was presented at the 2013 Meeting of the American Sociological Association in NYC. It was covered by NPR, The Wall Street Journal, MSNBC, C-SPAN, The Washington Post, The Atlantic, and many other media.
Congratulations to Przemyslaw Grabowicz, Luca Aiello, and Fil Menczer for winning the WICI Data Challenge. A prize of $10,000 CAD accompanies this award from the Waterloo Institute for Complexity and Innovation at the University of Waterloo. The Challenge called for tools and methods that improve the exploration, analysis, and visualization of complex-systems data. The winning entry, titled Fast visualization of relevant portions of large dynamic networks, is an algorithm that selects subsets of nodes and edges that best represent an evolving graph and visualizes it either by creating a movie, or by streaming it to an interactive network visualization tool. The algorithm is deployed in the movie generation tool of the Truthy system, which allows users to create, in near-real time, YouTube videos that illustrate the spread and co-occurrence of memes on Twitter. Przemek and Luca worked on this project while visiting CNetS in 2011 and collaborating with the Truthy team. Bravo!
UPDATE: With legal review completed, we re-launched Kinsey Reporter V.2!
CNetS, in collaboration with The Kinsey Institute, has released Kinsey Reporter, a global mobile survey platform for collecting and sharing anonymous data about sexual and other intimate behaviors. The pilot project allows citizen observers around the world to use free applications now available for Apple and Android mobile platforms to not only report on sexual behavior and experiences, but also to share, explore and visualize the accumulated data.
This new platform will allow us to explore issues that have been challenging to study until now, such as the prevalence of unreported sexual violence in different parts of the world, or the correlation between sexual practices, such as condom use, and the cultural, political, religious, or health contexts of particular geographic areas.
The Kinsey Institute’s longstanding seminal studies of sexual behaviors created a perfect synergy with research going on at CNetS related to mining big data crowd-sourced from mobile social media. The sensitive domain — sexual relations — added an intriguing challenge in finding a way to share useful data with the community while protecting the privacy and anonymity of the reporting volunteers.
To foster the study of the structure and dynamics of Web traffic networks, we are making available to the research community a large Click Dataset of 53.5 billion HTTP requests collected at Indiana University. Between 2006 and 2010, our system generated data at a rate of about 60 million requests per day, or about 30 GB/day of raw data. We hope that this data will help develop a better understanding of user behavior online and create more realistic models of Web traffic. The potential applications of this data include improved designs for networks, sites, and server software; more accurate forecasting of traffic trends; classification of sites based on the patterns of activity they inspire; and improved ranking algorithms for search results.
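Logs at this scale are typically processed one record at a time rather than loaded into memory. The sketch below assumes a hypothetical tab-separated record format (timestamp, anonymized client ID, referrer host, target host); it is not the dataset's actual schema, just an illustration of streaming aggregation over request records.

```python
from collections import Counter

def top_targets(lines, k=3):
    """Count requests per target host from tab-separated log records:
    timestamp <TAB> client_id <TAB> referrer_host <TAB> target_host
    Processes records as a stream, so memory use is bounded by the
    number of distinct hosts rather than the number of requests."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed records
        counts[fields[3]] += 1
    return counts.most_common(k)

# Three hypothetical records in the assumed format:
sample = [
    "1270000000\ta1\tgoogle.com\texample.com",
    "1270000001\ta2\texample.com\texample.com",
    "1270000002\ta1\tbing.com\tindiana.edu",
]
print(top_targets(sample))  # prints [('example.com', 2), ('indiana.edu', 1)]
```

The same streaming pattern extends to the other applications mentioned above, such as building referrer-to-target traffic networks or aggregating per-site activity profiles for classification.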