Tag Archives: network

CNetS team winner in LinkedIn Economic Graph Challenge

The CNetS team

LinkedIn announced that YY Ahn and his team of Ph.D. students from the Center for Complex Networks and Systems Research, including Yizhi Jing, Azadeh Nematzadeh, Jaehyuk Park, and Ian Wood, are among the 11 winners of the LinkedIn Economic Graph Challenge.

Their project, “Forecasting large-scale industrial evolution,” aims to understand the macro-evolution of industries to track businesses and emerging skills. This data would be used to forecast economic trends and guide professionals toward promising career paths.

“This is a fascinating opportunity to study the network of industries and people with unprecedented details and size. All of us are very excited to collaborate with LinkedIn and our LinkedIn mentor, Mike Conover, who is a recent Informatics PhD alumnus, on this topic,” said Ahn.

Indiana University Network Science Institute

IUNI announcement in Science magazine

The new Indiana University Network Science Institute (IUNI) unites 100+ researchers at IU — building on their world-renowned multidisciplinary expertise toward further scientific understanding of the complex networked systems of our world. Through pioneering new approaches in mapping, representing, visualizing, modeling, and analyzing diverse complex networks across levels and disciplines, IUNI will lead the way. We keep track of the big picture — ever-changing and interconnected. We’re laying the groundwork for innovative research and discovery in the area of network science.

Truthy Team Wins WICI Data Challenge

Congratulations to Przemyslaw Grabowicz, Luca Aiello, and Fil Menczer for winning the WICI Data Challenge. A prize of $10,000 CAD accompanies this award from the Waterloo Institute for Complexity and Innovation at the University of Waterloo. The Challenge called for tools and methods that improve the exploration, analysis, and visualization of complex-systems data. The winning entry, titled “Fast visualization of relevant portions of large dynamic networks,” is an algorithm that selects the subsets of nodes and edges that best represent an evolving graph and visualizes them either by creating a movie or by streaming them to an interactive network visualization tool. The algorithm is deployed in the movie generation tool of the Truthy system, which allows users to create, in near-real time, YouTube videos that illustrate the spread and co-occurrence of memes on Twitter. Przemek and Luca worked on this project while visiting CNetS in 2011 and collaborating with the Truthy team. Bravo!

Truthy elections analytics tool

Science News cover

Truthy elections diffusion network

Research by our Truthy team was recently featured in New Scientist, USA Today, and the cover story of Science News. The Truthy project, developed by CNetS researchers and doctoral students, aims to study the factors affecting the spread of information — and misinformation — in social media.

The Truthy site charts tweet sentiment and volume related to themes such as social movements and news. It also monitors Twitter activity to build interactive networks that let visitors visualize the diffusion networks of memes, identify the most influential information spreaders, and explore those users' feeds and other information about their online activity, such as sentiment and language. Other tools let you map the geo-temporal diffusion of memes, generate YouTube movies that display how hashtags emerge and connect, and download data directly from Twitter. With these analytics, one can begin to ask questions such as: How does sentiment change in response to events and memes? What memes survive over time? Who are the most influential users on a particular topic?

For more press coverage go to the Truthy press page.

Folks in Folksonomies at WSDM 2010

Fil will present the paper Folks in folksonomies: Social link prediction from shared metadata (authored with Rossano Schifanella, Alain Barrat, Ciro Cattuto, and Ben Markines) at WSDM 2010 in New York on February 5. The paper discusses homophily, or more specifically the relationship between social connections and social tagging in folksonomies. We show that social similarity measures based on annotations can be effective predictors of friendship relationships. For the occasion, we are making our Last.fm dataset publicly available.
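As a rough illustration of predicting social links from annotations, the sketch below scores pairs of users by the cosine similarity of their tag usage and ranks the pairs, so that the highest-scoring pairs become candidate friendships. Cosine similarity is only a stand-in here; the specific measures evaluated in the paper are not described in this post, and the user names, tags, and function names are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(tags_a, tags_b):
    """Cosine similarity between two users' tag-frequency vectors."""
    a, b = Counter(tags_a), Counter(tags_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no tags is similar to nobody
    return dot / (norm_a * norm_b)

def rank_candidate_pairs(user_tags):
    """Return (similarity, user_u, user_v) tuples, most similar pair first."""
    users = sorted(user_tags)
    pairs = []
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            pairs.append((cosine_similarity(user_tags[u], user_tags[v]), u, v))
    return sorted(pairs, reverse=True)

# Hypothetical annotation data: the tags each user has applied.
user_tags = {
    "alice": ["jazz", "jazz", "blues"],
    "bob": ["jazz", "blues", "soul"],
    "carol": ["metal", "punk"],
}
ranked = rank_candidate_pairs(user_tags)
```

In this toy data, alice and bob share most of their tags and come out on top, while carol shares none and scores zero against both.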

Network Flow Analysis

degree vs. strength for different classes of network traffic
Network flow records are high-level descriptions of Internet connections that offer information about the endpoints and volume of data involved but not access to the actual data transferred. Because the collection and analysis of flow data are much more tractable for high-speed networks than deep packet inspection, flow data has become popular for a variety of applications in network management, especially anomaly and intrusion detection.

Most flow-based security products and research efforts use a relational model of flow data, in which each flow is simply another tuple. Our group’s research takes a different approach: we instead regard each network flow as contributing weight to a directed edge in a graph. For various forms of analysis, the nodes in these graphs may represent clients, servers, entire autonomous systems, or even individual TCP ports. We examine the structural properties and dynamics of graphs derived from flow data in an attempt to develop more robust forms of anomaly detection and improved models of Internet traffic. We believe this structural approach to flow analysis reveals patterns within Internet traffic difficult to discern through more traditional analysis.
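A minimal sketch of this graph view of flow data, assuming hypothetical flow records of the form (source, destination, bytes). The record layout, host names, and function names are illustrative and are not taken from the project's code. Each flow adds its byte count to the weight of one directed edge, and a node's degree (number of distinct destinations) and strength (total weight) follow directly.

```python
from collections import defaultdict

def build_flow_graph(flows):
    """Aggregate (src, dst, nbytes) flow records into weighted directed edges."""
    edges = defaultdict(int)
    for src, dst, nbytes in flows:
        edges[(src, dst)] += nbytes  # each flow contributes weight to one edge
    return dict(edges)

def out_degree_and_strength(edges):
    """Return {node: (out_degree, out_strength)} for every source node."""
    degree = defaultdict(int)
    strength = defaultdict(int)
    for (src, _dst), weight in edges.items():
        degree[src] += 1          # one distinct destination per edge
        strength[src] += weight   # total bytes sent along outgoing edges
    return {node: (degree[node], strength[node]) for node in degree}

# Hypothetical flow records: (client, server, bytes transferred)
flows = [
    ("client_a", "server_x", 1200),
    ("client_a", "server_x", 800),
    ("client_a", "server_y", 500),
    ("client_b", "server_x", 300),
]
edges = build_flow_graph(flows)
stats = out_degree_and_strength(edges)
```

Here the two flows from client_a to server_x collapse into a single edge of weight 2000, which is the key difference from a relational model that keeps every flow as a separate tuple.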

Our research has yielded a number of surprising results, chief among them the finding that the distribution of network traffic per host is so broad and so well characterized by a power law that standard threshold-based anomaly detection systems must choose their thresholds arbitrarily, making them inherently susceptible to either over- or under-reporting. We have also found that Web traffic exhibits superlinear scaling between degree and strength: the more servers contacted by a Web client, the more data that client tends to exchange with each server. Finally, we have been successful in using properties of traffic graphs to construct a taxonomy of network applications based on the behavior of their users, allowing us to classify unknown applications without resorting to packet inspection.
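To make the superlinear scaling claim concrete: if strength s grows with degree k as s ~ k^β, then the average traffic per server, s/k, grows as k^(β-1), so β > 1 means that clients contacting more servers also exchange more data with each one. The sketch below estimates β with an ordinary least-squares fit in log-log space. This is a simple estimator chosen for illustration, not necessarily the fitting procedure used in the research, and the data are synthetic.

```python
import math

def fit_scaling_exponent(degrees, strengths):
    """Least-squares slope of log(strength) against log(degree)."""
    xs = [math.log(k) for k in degrees]
    ys = [math.log(s) for s in strengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic data in which strength grows as degree ** 1.5 (made up for illustration).
degrees = [1, 2, 4, 8, 16, 32]
strengths = [k ** 1.5 for k in degrees]
beta = fit_scaling_exponent(degrees, strengths)
is_superlinear = beta > 1
```

On real, noisy traffic data a fit like this would typically be applied to binned averages rather than raw points, and the resulting exponent should be read as an estimate.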

Our more recent efforts in this area relate to understanding statistical biases in traffic graphs caused by the use of packet sampling in the routers that generate flow data, and to using spectral analysis of the connectivity matrices associated with traffic graphs to identify anomalous hosts.
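The text does not say which spectral quantities are used, so the following is only a generic sketch of one spectral technique: power iteration to approximate the dominant eigenvector of a symmetric, non-negative connectivity matrix, whose components can serve as per-host scores. The matrix, the function name, and the idea of inspecting the top-scoring host are assumptions made for illustration.

```python
def dominant_eigenpair(matrix, iterations=200):
    """Approximate the dominant eigenvalue and eigenvector of a symmetric,
    non-negative matrix A by power iteration on (A + I)."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        # Multiplying by (A + I) instead of A avoids oscillation on bipartite graphs.
        nxt = [vec[i] + sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    # Rayleigh quotient v^T A v (vec has unit length) estimates the eigenvalue of A.
    av = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * a for v, a in zip(vec, av))
    return eigenvalue, vec

# Hypothetical symmetric traffic matrix: host 0 exchanges heavy traffic with everyone.
traffic = [
    [0, 5, 5, 5],
    [5, 0, 1, 0],
    [5, 1, 0, 0],
    [5, 0, 0, 0],
]
eigenvalue, scores = dominant_eigenpair(traffic)
top_host = scores.index(max(scores))
```

A host whose score stands far apart from the rest is a candidate for closer inspection; deciding what counts as anomalous would require a baseline that this sketch does not model.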

Results from this project have been presented at WWW2005 (paper), the Workshop on Structure and Function of Complex Networks (slides), the IPAM Workshop on Random and Dynamic Graphs and Networks (slides), the Statphys 23 satellite meeting on Complex Networks: from Biology to Information Technology (paper), and NGDM2007. For an archival publication see our paper Properties and Evolution of Internet Traffic Networks from Anonymized Flow Data in ACM TOIT.

Active Participants

Mark Meiss
Fil Menczer, PI
Alex Vespignani

Support

Mark Meiss is supported by the Advanced Network Management Laboratory, which is one of the Pervasive Technology Labs established at Indiana University with the assistance of the Lilly Endowment.
This research is also supported in part by the National Science Foundation under awards 0348940 and 0513650.
Our primary source of anonymized network flow data is the Internet2 (Abilene) network.

Opinions, findings, conclusions, recommendations or points of view of this group are those of the authors and do not necessarily represent the official position of the National Science Foundation, Internet2, or Indiana University.

GiveALink

UPDATE: As of 2015, the GiveALink project has been archived and the GiveALink.org website is no longer operational.

Link analysis algorithms leverage hyperlinks created by authors as semantic endorsements between pages, while social bookmarks provide a way to leverage annotations by information consumers as a source of information about pages. This project explores a novel approach that is a synergy of the two: soliciting annotations from users about the content of pages, in a way that implicitly forms networks of relationships between and among resources and tags. These socially generated relationships are then aggregated to build bottom-up, global semantic similarity networks. Algorithms are developed to construct, analyze, and mine these networks in support of search and recommendation applications, exploratory navigation interfaces, resource management utilities, tag spam detection, and incentive games to accelerate the achievement of critical mass.

To extrapolate both annotations about content (tags) and semantic relationships (similarity) from single users to the “wisdom of the crowd,” the project investigates an information-theoretic model that extracts semantic assessments from information structures that many users are already maintaining, namely the bookmarks and tags they manage on their browsers or online. This entails the design and evaluation of several network-based measures and algorithms, such as similarity, novelty, centrality, and focus. Among the aims of this model are the exploration of the duality between resources (URLs) and concepts (tags or categories) and the integration of social annotation and collaborative filtering. One way to provide users with immediate value is to integrate client-based taxonomies and server-based folksonomies for social bookmark management. Both traditional users of browser bookmarks and social users of online bookmarks can take advantage of the same semantic maps while retaining the convenience of intuitive browser interfaces and centralized storage.
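The aggregation step can be sketched as follows, under loud assumptions: annotations arrive as hypothetical (user, URL, tag) triples, each user's view of how similar two URLs are is measured by the Jaccard overlap of the tags that user gave them, and the per-user values are summed into one global network. Jaccard overlap is a simple stand-in; the project's information-theoretic measure is not specified in this text and is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

def aggregate_similarity(triples):
    """Sum per-user Jaccard similarities between URLs into one global network."""
    # per_user[user][url] is the set of tags that user assigned to that URL.
    per_user = defaultdict(lambda: defaultdict(set))
    for user, url, tag in triples:
        per_user[user][url].add(tag)

    network = defaultdict(float)
    for url_tags in per_user.values():
        for url_a, url_b in combinations(sorted(url_tags), 2):
            tags_a, tags_b = url_tags[url_a], url_tags[url_b]
            jaccard = len(tags_a & tags_b) / len(tags_a | tags_b)
            if jaccard > 0:
                network[(url_a, url_b)] += jaccard  # one user's vote for this edge
    return dict(network)

# Hypothetical annotations: (user, URL, tag)
triples = [
    ("u1", "a.example", "python"),
    ("u1", "b.example", "python"),
    ("u1", "b.example", "web"),
    ("u2", "a.example", "code"),
    ("u2", "b.example", "code"),
    ("u2", "c.example", "news"),
]
network = aggregate_similarity(triples)
```

The two users never use the same tag, yet both relate a.example to b.example, so the edge between those URLs accumulates weight from each of them. Computing similarity within each user before aggregating is what lets agreement emerge without a shared vocabulary.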

Strategic collaborations to share data, accelerate evaluation, and maximize impact are under way with key groups in Europe through the TAGora Project and its partners at Rome Sapienza, Sony Paris, the ISI Foundation in Torino, and the BibSonomy group at Kassel University. GiveALink.org (supported by a wonderful computing and storage infrastructure) is an open social bookmarking platform developed to experiment with and demonstrate the ideas of this project. The algorithms and data generated by the project are made available to the Web community to facilitate analysis, the development of improved network algorithms, and integration with other Internet applications. Early results of this project have been presented at various conferences and workshops including LinkKDD2005, AAAI2006, and HT2008. More recent publications are listed below. To learn more, donate your bookmarks, play with our system, and download our data and applications please visit GiveALink.org.

Project Members

Fil Menczer (PI)
Lilian Weng
Dimitar Nikolov

Collaborators & Alumni:

Rossano Schifanella
Jacob Ratkiewicz
Heather Roinestad
Ben Markines
Ciro Cattuto
Katrina Panovich
Wouter Van den Broeck
John Burgoon
Mira Stoilova


We should also acknowledge Todd Holloway for his contributions to the early search engine; Luis Rocha and Ana Maguitman for suggesting the idea of ranking and searching by novelty; Mark Meiss, who thought of the catchy name for GiveALink; and Rob Henderson, quite possibly the greatest sysadmin around.

Dataset

Related Publications

Support

This project is supported by the National Science Foundation under award IIS-0811994: Social Integration of Semantic Annotation Networks for Web Applications. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.