LinkedIn announced that YY Ahn and his team of Ph.D. students from the Center for Complex Networks and Systems Research, including Yizhi Jing, Azadeh Nematzadeh, Jaehyuk Park, and Ian Wood, are among the 11 winners of the LinkedIn Economic Graph Challenge.
Their project, “Forecasting large-scale industrial evolution,” aims to understand the macro-evolution of industries to track businesses and emerging skills. This data would be used to forecast economic trends and guide professionals toward promising career paths.
“This is a fascinating opportunity to study the network of industries and people with unprecedented details and size. All of us are very excited to collaborate with LinkedIn and our LinkedIn mentor, Mike Conover, who is a recent Informatics PhD alumnus, on this topic,” said Ahn.
The new Indiana University Network Science Institute (IUNI) unites 100+ researchers at IU — building on their world-renowned multidisciplinary expertise toward further scientific understanding of the complex networked systems of our world. Through pioneering new approaches in mapping, representing, visualizing, modeling, and analyzing diverse complex networks across levels and disciplines, IUNI will lead the way. We keep track of the big picture — ever-changing and interconnected. We’re laying the groundwork for innovative research and discovery in the area of network science.
Research by our Truthy team was recently featured in New Scientist, USA Today, and the cover story of Science News. The Truthy project, developed by CNetS researchers and doctoral students, aims to study the factors affecting the spread of information — and misinformation — in social media.
The Truthy site charts tweet sentiment and volume related to themes such as social movements and news. It also monitors Twitter activity to build interactive networks that let visitors visualize the diffusion networks of memes, identify the most influential information spreaders, and explore those influential feeds and other information about their online activity, such as sentiment and language. Other tools let you map the geo-temporal diffusion of memes, generate YouTube movies that display how hashtags emerge and connect, and download data directly from Twitter. With these analytics, one can begin to ask questions such as: How does sentiment change in response to events and memes? What memes survive over time? Who are the most influential users on a particular topic?
Network flow records are high-level descriptions of Internet connections that offer information about the endpoints and volume of data involved but not access to the actual data transferred. Because the collection and analysis of flow data are much more tractable for high-speed networks than deep packet inspection, flow data has become popular for a variety of applications in network management, especially anomaly and intrusion detection.
Most flow-based security products and research efforts use a relational model of flow data, in which each flow is simply another tuple. Our group’s research takes a different approach: we instead regard each network flow as contributing weight to a directed edge in a graph. For various forms of analysis, the nodes in these graphs may represent clients, servers, entire autonomous systems, or even individual TCP ports. We examine the structural properties and dynamics of graphs derived from flow data in an attempt to develop more robust forms of anomaly detection and improved models of Internet traffic. We believe this structural approach to flow analysis reveals patterns within Internet traffic difficult to discern through more traditional analysis.
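As a rough illustration of the graph view described above, the following sketch aggregates flow records into a weighted directed graph. The `Flow` record, its field names, and the host labels are hypothetical stand-ins chosen for this example; they are not the group's actual data format or code.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow record (not a real NetFlow schema)."""
    src: str      # source endpoint
    dst: str      # destination endpoint
    nbytes: int   # volume of data carried by the flow


def build_traffic_graph(flows):
    """Sum flow volumes onto directed edges: graph[src][dst] = total bytes."""
    graph = defaultdict(lambda: defaultdict(int))
    for flow in flows:
        # Each flow contributes weight to the edge src -> dst.
        graph[flow.src][flow.dst] += flow.nbytes
    # Convert to plain dicts so later lookups cannot create edges by accident.
    return {src: dict(dsts) for src, dsts in graph.items()}


flows = [
    Flow("client-a", "server-1", 1200),
    Flow("client-a", "server-1", 800),
    Flow("client-a", "server-2", 500),
    Flow("client-b", "server-1", 300),
]
graph = build_traffic_graph(flows)
print(graph)
```

In the relational view these would be four independent tuples; in the graph view the two flows from `client-a` to `server-1` collapse into a single edge of weight 2000. The same aggregation could key on autonomous systems or TCP ports instead of hosts.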
Our research has yielded a number of surprising results, chief among them the finding that the distribution of network traffic per host is so broad and so well characterized by a power law that standard threshold-based anomaly detection systems must choose their thresholds arbitrarily, making them inherently susceptible to either over- or under-reporting. We have also found that Web traffic exhibits superlinear scaling between degree and strength: the more servers contacted by a Web client, the more data that client tends to exchange with each server. Finally, we have been successful in using properties of traffic graphs to construct a taxonomy of network applications based on the behavior of their users, allowing us to classify unknown applications without resorting to packet inspection.
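To make the degree and strength relationship concrete, here is a minimal sketch on synthetic data. It assumes that a client's degree is the number of distinct servers it contacts and its strength is the total volume it exchanges, and it estimates the scaling exponent with an ordinary least-squares fit in log-log space. That fit is a simple stand-in chosen for illustration, not necessarily the estimator used in this research, and the traffic values are invented.

```python
import math


def degree_and_strength(graph):
    """Return (degree, strength) per client from graph[client][server] = bytes."""
    return [(len(servers), sum(servers.values())) for servers in graph.values()]


def scaling_exponent(pairs):
    """Least-squares slope of log(strength) against log(degree)."""
    xs = [math.log(k) for k, _ in pairs]
    ys = [math.log(s) for _, s in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic clients: one contacting k servers sends 100 * sqrt(k) bytes to each,
# so its strength grows as k ** 1.5 (superlinear in degree by construction).
graph = {
    f"client-{k}": {f"server-{i}": 100.0 * math.sqrt(k) for i in range(k)}
    for k in (1, 2, 4, 8, 16)
}

beta = scaling_exponent(degree_and_strength(graph))
print(f"estimated exponent: {beta:.3f}")
```

An exponent above 1 means that per-server traffic rises with the number of servers contacted; an exponent of exactly 1 would mean each additional server adds the same amount of traffic on average.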
Our more recent efforts in this area relate to understanding the statistical biases in traffic graphs caused by packet sampling in the routers that generate flow data, and to using spectral analysis of the connectivity matrices associated with traffic graphs to identify anomalous hosts.
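One way to see why packet sampling biases a traffic graph: under the simplifying assumption that a router samples each packet independently with probability r, a flow of p packets appears in the flow data with probability 1 - (1 - r)^p, so small flows, and the edges they represent, tend to vanish. The sketch below is illustrative only. Real routers may use deterministic 1-in-N sampling, and the function names, flow sizes, and sampling rate are all hypothetical.

```python
def detection_probability(packets, sampling_rate):
    """Probability that at least one packet of a flow is sampled,
    assuming each packet is sampled independently."""
    return 1.0 - (1.0 - sampling_rate) ** packets


def expected_observed_degree(flow_sizes, sampling_rate):
    """Expected number of a host's edges that survive sampling,
    assuming one flow per destination."""
    return sum(detection_probability(p, sampling_rate) for p in flow_sizes)


# Hypothetical host with four outgoing flows, sizes given in packets.
flow_sizes = [1, 2, 5, 1000]
rate = 0.01  # assumed 1% sampling rate

true_degree = len(flow_sizes)
observed = expected_observed_degree(flow_sizes, rate)
print(f"true degree: {true_degree}, expected observed degree: {observed:.2f}")
```

Under these assumptions the large flow is almost always observed while the three small ones are almost always missed, so the sampled graph understates the host's degree far more than it understates its total traffic.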
This research is also supported in part by the National Science Foundation under awards 0348940 and 0513650.
Our primary source of anonymized network flow data is the Internet2 (Abilene) network.
Opinions, findings, conclusions, recommendations or points of view of this group are those of the authors and do not necessarily represent the official position of the National Science Foundation, Internet2, or Indiana University.
UPDATE: As of 2015 the GiveALink project has been archived and the GiveALink.org website is no longer operational.
Link analysis algorithms leverage hyperlinks created by authors as semantic endorsements between pages, while social bookmarks provide a way to leverage annotations by information consumers as a source of information about pages. This project explores a novel approach that is a synergy of the two: soliciting annotations from users about the content of pages, in a way that implicitly forms networks of relationships between and among resources and tags. These socially generated relationships are then aggregated to build bottom-up, global semantic similarity networks. Algorithms are developed to construct, analyze, and mine these networks in support of search and recommendation applications, exploratory navigation interfaces, resource management utilities, tag spam detection, and incentive games to accelerate the achievement of critical mass.
To extrapolate both annotations about content (tags) and semantic relationships (similarity) from single users to the “wisdom of the crowd,” the project investigates an information-theoretic model that extracts semantic assessments from information structures that many users are already maintaining, namely the bookmarks and tags they manage in their browsers or online. This entails the design and evaluation of several network-based measures and algorithms, such as similarity, novelty, centrality, and focus. Among the aims of this model are the exploration of the duality between resources (URLs) and concepts (tags or categories) and the integration of social annotation and collaborative filtering. One way to provide users with immediate value is to integrate client-based taxonomies and server-based folksonomies for social bookmark management. Both traditional users of browser bookmarks and social users of online bookmarks can take advantage of the same semantic maps while retaining the convenience of intuitive browser interfaces and centralized storage.
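The general idea of deriving resource similarity from aggregated user annotations can be sketched as follows. This example uses plain cosine similarity over tag counts as a deliberately simple stand-in; it is not the information-theoretic measure the project investigates. The (user, URL, tag) triples and all function names are invented for illustration.

```python
import math
from collections import Counter


def tag_profiles(annotations):
    """Aggregate (user, url, tag) triples into a tag-count profile per URL."""
    profiles = {}
    for _user, url, tag in annotations:
        profiles.setdefault(url, Counter())[tag] += 1
    return profiles


def cosine_similarity(a, b):
    """Cosine similarity between two tag-count profiles."""
    dot = sum(a[tag] * b[tag] for tag in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical annotations contributed by three users.
annotations = [
    ("alice", "http://example.org/python", "programming"),
    ("alice", "http://example.org/python", "python"),
    ("bob", "http://example.org/python", "programming"),
    ("bob", "http://example.org/rust", "programming"),
    ("bob", "http://example.org/rust", "rust"),
    ("carol", "http://example.org/pasta", "cooking"),
]

profiles = tag_profiles(annotations)
related = cosine_similarity(profiles["http://example.org/python"],
                            profiles["http://example.org/rust"])
unrelated = cosine_similarity(profiles["http://example.org/python"],
                              profiles["http://example.org/pasta"])
print(f"python vs rust: {related:.3f}, python vs pasta: {unrelated:.3f}")
```

Swapping the roles of URLs and tags in `tag_profiles` would give tag-to-tag similarity instead, which is one way to read the duality between resources and concepts mentioned above.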
Strategic collaborations to share data, accelerate evaluation, and maximize impact are under way with key groups in Europe through the TAGora Project and its partners at Rome Sapienza, Sony Paris, the ISI Foundation in Torino, and the BibSonomy group at Kassel University. GiveALink.org (supported by a wonderful computing and storage infrastructure) is an open social bookmarking platform developed to experiment with and demonstrate the ideas of this project. The algorithms and data generated by the project are made available to the Web community to facilitate analysis, the development of improved network algorithms, and integration with other Internet applications. Early results of this project have been presented at various conferences and workshops including LinkKDD2005, AAAI2006, and HT2008. More recent publications are listed below. To learn more, donate your bookmarks, play with our system, and download our data and applications, please visit GiveALink.org.
Collaborators & Alumni:
We should also acknowledge Todd Holloway for his contributions to the early search engine; Luis Rocha and Ana Maguitman for suggesting the idea of ranking and searching by novelty; Mark Meiss, who thought of the catchy name for GiveALink; and Rob Henderson, quite possibly the greatest sysadmin around.