All posts by tgholbro

Talk by Alessio Cardillo

Speaker: Alessio Cardillo, École Polytechnique Fédérale de Lausanne
Title: Automatic identification of relevant concepts in scientific publications
Date: 02/10/2017
Time: 12:15pm
Room: Informatics East 322

Abstract: Recently, scientists have devoted considerable effort to studying the organization and evolution of science by exploiting the textual information contained in articles, such as keywords and terms extracted from the title or abstract. However, only a few studies focus on the analysis of the core of an article, i.e., its body. Access to the whole text of documents makes it possible, instead, to study the organization of scientific knowledge using networks of similarity between articles based on their whole content.

I use the concepts extracted from the documents/articles available within the ScienceWISE platform to build the network of similarity between them. However, such a network has a remarkably high link density (36%). As a consequence, attempts to associate groups of documents (communities) with a given topic have limited success. The reason is that not all concepts are equally informative, and some are of little use for discriminating between articles. The presence of “generic concepts” gives rise to spurious similarities responsible for a large number of connections in the system.

To get rid of such concepts, I will introduce a method to gauge their relevance according to an information-theoretic approach. The significance of a concept $c$ is encoded by the distance between its maximum entropy, $S_{\max}$, and the observed one, $S_c$. After removing concepts whose entropy lies within a certain distance of the maximum, I rebuild the similarity network and analyze its community structure (topics). The consequences of this are twofold: both the number of links and the noise in the strength of similarities between articles decrease. Hence, the filtered network displays a better-defined community structure, where each community contains articles related to a specific topic. Finally, the method can be applied to any kind of document, and also works in a coarse-grained mode, since it is able to identify the relevant concepts for a certain set of articles, allowing the study of a document corpus at different scales.
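
The filtering idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how $S_c$ and $S_{\max}$ are computed, so the sketch makes a loud assumption: $S_c$ is the Shannon entropy of how a concept's occurrences are spread across the $N$ documents, and $S_{\max} = \ln N$ is the entropy of a perfectly even spread. The names `concept_entropy`, `filter_concepts`, and `threshold` are hypothetical and not taken from the talk.

```python
import math
from collections import Counter

def concept_entropy(counts):
    """Shannon entropy (in nats) of a concept's occurrences across documents."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def filter_concepts(doc_concepts, threshold):
    """Keep concepts whose entropy is at least `threshold` below S_max.

    doc_concepts: list of Counter objects, one per document, mapping each
    concept to its number of occurrences in that document.
    Returns a dict mapping each kept concept to its distance S_max - S_c.
    """
    n_docs = len(doc_concepts)
    # Assumption (not from the talk): S_max is the entropy of a uniform
    # spread over all documents, i.e. ln(N).
    s_max = math.log(n_docs)
    concepts = set().union(*doc_concepts)
    kept = {}
    for concept in concepts:
        counts = [doc[concept] for doc in doc_concepts]
        distance = s_max - concept_entropy(counts)
        if distance >= threshold:
            kept[concept] = distance
    return kept

# Toy corpus: "model" appears evenly everywhere (generic), the others do not.
docs = [
    Counter(model=3, graphene=5),
    Counter(model=3),
    Counter(model=3, neutrino=2),
    Counter(model=3),
]
kept = filter_concepts(docs, threshold=0.5)
print(sorted(kept))
```

In this toy corpus the evenly spread concept sits at the maximum entropy and is dropped, while the two concentrated concepts survive. The similarity network would then be rebuilt from the surviving concepts only.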

Bio: Alessio Cardillo is currently a postdoctoral research fellow at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland. His research interests focus on the analysis of the structure of networked systems such as urban mobility and street patterns, scientific collaborations, collections of documents, and multiplex networks. He is also interested in the emergence of collective behaviours such as cooperation or synchronization by means of coevolutionary dynamics.



Talk by Orion Penner


Speaker: Orion Penner, École polytechnique fédérale de Lausanne
Title: The Returns to Scientific Specialization
Date: 11/16/2016
Time: 12:30pm
Room: Informatics East 322

Abstract: While it is well established that researchers specialize, the extent to which they specialize has gone largely unexamined. We have developed an approach for measuring the extent to which a researcher is specialized and, in turn, use it to quantify the returns to specialization. For this we exploit a longitudinal dataset of 50,000+ researchers, each starting his or her career in 1975 or later. Analyzing this dataset, we find that there are significant returns to specialization. For example, at mean career age and publishing rate, a one-standard-deviation increase in specialization leads to a 20 per cent increase in citations. We further show that the returns to specialization are greatest early in a researcher’s career and decrease as a researcher ages. Similarly, returns are greater when publishing at a lower rate and decrease at higher rates of publishing.

Bio: Orion Penner is a Postdoctoral Researcher in the Chair of Innovation and IP Policy at the École polytechnique fédérale de Lausanne. His recent, current and future research largely focuses on the academic career trajectory. Prior to Switzerland, he spent three years at IMT Lucca carrying out research, broadly speaking, on the Economics of Science and Innovation. He earned his PhD in Physics from the University of Calgary, working on problems in complex systems, networks and bioinformatics. He currently holds a Swiss National Science Foundation Ambizione grant, having previously held a SSHRC Postdoctoral Fellowship at IMT Lucca and an NSERC Canada Graduate Scholarship during his PhD.


Talk by Woo Seong Jo


Speaker: Woo Seong Jo, Sungkyunkwan University
Title: Ph.D. Candidate
Date: 10/12/2016
Time: 11am
Room: Informatics East 322
Abstract: We use user-accessible profiles from a well-known web-based social networking service specialized in business and employment. Users often provide information about their work experience and their positions in the firms where they have worked, and also list their work skills. We first construct a bipartite network of users who work or worked at a specific company in 2013 and their skills. We then project this network onto the network of skills. From the time evolution of the skill network constructed for a company, we find that an interesting pattern emerges when the company starts a new business sector. In addition to the business strategies, we observe how skills are fused with others in the skill network.
Bio: Wooseong Jo is a Ph.D. Candidate in Statistical Physics at Sungkyunkwan University (supervisor: Beom Jun Kim). His interests are in the modeling and visualization of complex systems such as society and human dynamics, as well as equilibrium systems in statistical physics. He has worked on various subjects: fragility analysis of world-bank networks, the dynamics of spreading pests, and traditional problems such as spin systems and percolation.
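
The projection step described in the abstract above (from a user-skill bipartite network to a network of skills) can be sketched in a few lines of Python. This is only an illustration under assumptions: the abstract does not say how edges are weighted, so the sketch weights each skill pair by the number of users who list both skills. The function name `project_onto_skills` and the toy profiles are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def project_onto_skills(user_skills):
    """Project a user-skill bipartite network onto skills.

    user_skills: dict mapping each user to an iterable of skills.
    Two skills are linked if at least one user lists both. The edge weight
    (an assumption, not from the talk) counts the users sharing the pair.
    """
    weights = defaultdict(int)
    for skills in user_skills.values():
        # Sort so that each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(skills)), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical profiles for one company in one year.
profiles = {
    "u1": ["python", "sql"],
    "u2": ["python", "sql", "design"],
    "u3": ["design"],
}
skill_network = project_onto_skills(profiles)
print(skill_network)
```

Building one such skill network per year for the same company would give the time evolution that the abstract refers to.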

Talk by Lucas Jeub

Speaker: Lucas Jeub, Postdoctoral Fellow, School of Informatics & Computing, Indiana University
Title: Local Communities, Mesoscopic Structure, and Multilayer Networks
Date: 09/12/2016
Time: 11:30 am
Room: Informatics East 322
Abstract: There are many methods to detect dense “communities” of nodes in networks, and there are now several methods to detect communities in multilayer networks. One way to define a community is as a set of nodes that trap a diffusion-like dynamical process (usually a random walk) for a long time. In this view, communities are sets of nodes that create bottlenecks to the spreading of a dynamical process on a network. We analyze the local behavior of different random walks on synthetic and empirical monolayer and multiplex networks (the latter are multilayer networks in which different layers correspond to different types of edges). We show that bottlenecks to random walks can reveal interesting mesoscale structure in networks that goes beyond classical communities. There are different ways to generalize a random walk to multilayer networks. We show that they have very different bottlenecks that hence correspond to rather different notions of what it means for a set of nodes to be a good community. This has direct implications for the behavior of community-detection methods that are based on these random walks. The ill-defined nature of the community-detection problem makes it crucial to develop generative models of networks to use as a common test of community-detection tools. For monolayer networks, different types of benchmark models are available. We develop a family of benchmarks for detecting mesoscale structures in multilayer networks by introducing a generative model that can explicitly incorporate dependency structure between layers. Our benchmark provides a standardized set of null models, together with an associated set of principles from which they are derived, for studies of mesoscale structures in multilayer networks. We discuss the parameters and properties of our generative model, and we illustrate its use by comparing a variety of community-detection methods.
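
One common way to quantify how strongly a set of nodes acts as a bottleneck for a random walk is its conductance: the number of edges leaving the set divided by the smaller of the two edge volumes. The abstract does not state which measure the talk uses, so the following Python sketch is an assumption for illustration only, restricted to an undirected, unweighted monolayer graph. The function name `conductance` and the toy graph are hypothetical.

```python
def conductance(adj, nodes):
    """Conductance of `nodes` in an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbours.
    Assumes both `nodes` and its complement contain at least one edge end,
    so the denominator is non-zero.
    """
    inside = set(nodes)
    # Edges with exactly one endpoint inside the set.
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    vol_in = sum(len(adj[u]) for u in inside)
    vol_out = sum(len(adj[u]) for u in adj if u not in inside)
    return cut / min(vol_in, vol_out)

# Two triangles joined by the single edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(conductance(graph, {0, 1, 2}))  # one triangle: low conductance
print(conductance(graph, {0, 1}))     # part of a triangle: higher conductance
```

A low value means a random walker that starts inside the set rarely leaves it, which matches the "trapping" view of a community given in the abstract.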


Talk by Pan-Jun Kim

Speaker: Pan-Jun Kim, Leader of the Junior Research Group at the Asia Pacific Center for Theoretical Physics
Title: I Am My Genes, Wire, and Microbes
Date: 03/09/2016
Time: 1:00pm
Room: Informatics East 122
Abstract: A primary challenge in biology is to explain how complex phenotypes arise from individual molecules encoded in genes. Molecular interaction networks offer a key to understanding how genotypes are translated into phenotypes. For example, sleep/wake cycles in animals are generated by molecular circuits of interacting genes and gene products, called circadian clocks. Circadian clocks are important for plant life as well, and surprisingly, the plant clock circuitry is overwhelmingly composed of inhibitory, rather than activating, interactions between genes. We found that this unique structure facilitates the coordination of temporally-distant clock events that are sharply peaked at very specific times of day, suggesting a design principle of the plant clock machinery. However, considering only the genes in a given organism and its own molecular interaction networks may not be enough to understand the holistic picture of the organism’s phenotypes. For example, our resident gut microbial community, or gut microbiome, provides us with a variety of biochemical capabilities not encoded in our genes. This human gut microbiome is linked not only to our health, but also to various disorders such as obesity, cancer, and diabetes. We constructed the first literature-curated global interaction network of the human gut microbiome mediated by various chemicals. Using our network, we conducted a systematic analysis of the microbiomes in type 2 diabetes patients, and revealed the fundamental metabolic infrastructure of the entire gut ecosystem contributing to the pathology of type 2 diabetes. Our network framework shows promise for investigating complex microbe-microbe and host-microbe chemical cross-talk, and identifying disease-associated features.

Bio: Pan-Jun Kim is a Leader of the Junior Research Group at the Asia Pacific Center for Theoretical Physics (APCTP) and an Adjunct Professor in the Department of Physics at Pohang University of Science and Technology (POSTECH). He is enthusiastic about raising a broad range of scientific questions to tackle the complexity behind nature and society, and applies mathematical and computational methods to answer those questions. His recent work covers a new framework to explore the human body’s microbial ecosystem linked to our health and disease, a design principle of biological clockwork for circadian rhythms, technology evolution in society, and even nutritional structure of various foods. Kim received the Young Physicist Award from the Korean Physical Society, and worked as an Institute for Genomic Biology Fellow at the University of Illinois at Urbana-Champaign after earning his PhD in Statistical Physics from KAIST in 2008.

Talk by Michal B. Paradowski

Speaker: Michal B. Paradowski, Assistant Professor, Institute of Applied Linguistics, University of Warsaw
Title: Complexity phenomena in linguistics
Date: 03/11/2016
Time: 3:30pm
Room: Informatics East 122
Abstract: Throughout their history, the language sciences have dealt with numerous phenomena that are either inherently complex/dynamic systems or display characteristic properties of such systems. Within an individual, one can point to perceptual dynamics and categorisation in speech, the emergence of phonological templates, or word and sentence processing; across society, think of variations and typology, the rise of new grammatical constructions, semantic bleaching, language evolution in general, and the spread and competition of both individual expressions and entire languages.
A handful of language phenomena will be depicted that are known to exhibit such properties as hysteresis, phase transitions, bifurcations, attractor states, or power-law distributions. The multifaceted dynamism and complexity of the process of language acquisition will also be discussed, highlighting the importance of adopting designs with different timescales in order to trace language development as a process of change over time, the utility of time-series analyses, and the ability to determine optimal temporal integration windows, e.g. in analyses of dynamic motifs in human communication.
The talk will conclude with a presentation of the results of two small-scale projects applying social network analysis (SNA) to language phenomena. One explored the social propagation of neologisms in a microblogging service; the other investigated the impact of peer influence on second-language learning outcomes. Using the methods of complexity science, we can start from local, low-level interactions between individuals verbally communicating with one another and describe the processes underlying the emergence of more global systemic order and dynamics. Hypotheses will be presented which account for the novel findings.

Bio: Michał B. Paradowski is an assistant professor at the Institute of Applied Linguistics, University of Warsaw, a teacher and translator trainer, an ELT consultant for television, and currently a visiting scholar at the Department of Second Language Studies, Indiana University, Bloomington. His interests include issues relating to second and third language acquisition research, cross-linguistic influence, bi- and multilingualism, psycholinguistics, embodied cognition, and complexity science. His recent edited volumes are Teaching Languages off the Beaten Track (2014) and Productive Foreign Language Skills for an Intercultural World (2015).

Talk by Minsu Park

Speaker: Minsu Park, Cornell University
Title: Understanding Musical Diversity via Online Media
Date: 08/07/2015
Time: 2:00 pm
Room: Informatics East 122
Abstract: Musicologists and sociologists have long been interested in patterns of music consumption and their relation to socioeconomic status. In particular, the Omnivore Thesis examines the relationship between these variables and the diversity of music a person consumes. Using data from social media, and Twitter, we design and evaluate a measure that reasonably captures diversity of musical tastes. We use that measure to explore associations between musical diversity and variables that capture socioeconomic status, demographics, and personal traits such as openness and degree of interest in music (into-ness). Our musical diversity measure can provide a useful means for studies of musical preferences and consumption. Also, our study of the Omnivore Thesis provides insights that extend previous survey and interview-based studies.


Talk by Jisun An

Speaker: Jisun An, PhD candidate, University of Cambridge
Title: Analyzing Social Media for Designing Fit-For-Purpose Systems: From Politics to Business
Date: 11/19/2013
Time: 11am
Room: Informatics East 122
Abstract: Researchers in different disciplines have been studying human behavior in a variety of contexts, and have largely done so using small-scale data coming from surveys and ethnographic observations. Social media sites now offer a unique opportunity to study individual and social characteristics at scale, over long periods of time, and in unobtrusive ways. In this talk, I will focus on analyses done in two different contexts – political news sharing and micro-investment – and will show how to translate the corresponding insights into practical implications for the design of fit-for-purpose systems.
Bio: Jisun An is a PhD candidate in the Computer Laboratory at the University of Cambridge and a member of the NetOS group. Her research interest is in analyzing online social media and social networks with large-scale data and leveraging their properties to build platforms that help people make better choices in social, economic and political domains. Her research lies at the intersection of machine learning, network science, social science, and human-computer interaction. Her studies were funded by EPSRC, and she is now a recipient of the Google European Scholarship in social computing. Since starting her PhD, she has been fortunate to have opportunities to collaborate with pioneers in social network analysis (e.g., MPI-SWS (Germany), KAIST (South Korea), PARC (USA), and Yahoo! Barcelona (Spain)).

Talk by Zhong-Yuan Zhang

Speaker: Zhong-Yuan Zhang
Title: Semi-Supervised Community Structure Detection in Social Networks Based on Matrix De-noising
Date: 10/15/2012
Time: 1pm
Room: Informatics East 122
Abstract: Constrained clustering has been well studied in the unsupervised learning community. However, how to encode constraints into the community detection process for complex social networks remains a challenging problem. We propose a semi-supervised learning framework for community structure detection. This framework implicitly encodes the must-link and cannot-link constraints by modifying the adjacency matrix of the network, which can also be regarded as a de-noising process on the consensus matrix of the community structures. Our proposed method takes into account both the topology and the functions (background information) of the complex network, which improves the interpretability of the results. Comparisons performed on synthetic benchmarks and real-world networks show that the framework can significantly improve detection performance with few constraints, which makes it an attractive methodology for the analysis of complex social networks.
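
To make the idea of encoding constraints in the adjacency matrix concrete, here is a minimal Python sketch. The abstract does not give the exact update rule, so the sketch assumes the simplest one: a must-link pair gets its entry set to 1 and a cannot-link pair gets its entry set to 0. The function name `apply_constraints` and the toy matrix are hypothetical; any standard community-detection method could then be run on the modified matrix.

```python
def apply_constraints(adjacency, must_link, cannot_link):
    """Return a copy of a symmetric 0/1 adjacency matrix with constraints encoded.

    Assumption (not from the talk): must-link pairs are forced to 1 and
    cannot-link pairs are forced to 0. The input matrix is left untouched.
    """
    modified = [row[:] for row in adjacency]
    for i, j in must_link:
        modified[i][j] = modified[j][i] = 1
    for i, j in cannot_link:
        modified[i][j] = modified[j][i] = 0
    return modified

# Toy 4-node network: nodes 1 and 2 are linked, nodes 0 and 3 are not.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
A_mod = apply_constraints(A, must_link=[(0, 3)], cannot_link=[(1, 2)])
for row in A_mod:
    print(row)
```

Read this way, the constraints add edges that background information says should exist and remove edges that it marks as noise, which is the de-noising view mentioned in the abstract.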


Talk by Cosma Shalizi

Speaker: Cosma Shalizi, Carnegie Mellon University
Title: Homophily, Contagion, Confounding: Pick Any Three
Date: 11/27/2012
Time: 1pm
Room: Informatics East 130
Abstract: A person’s behavior can often be predicted from that of their neighbors in a social network. This is sometimes explained by homophily, the tendency to form social ties with others because we resemble them.  It is also sometimes explained by social contagion or social influence, the tendency to act like someone because they are our neighbor.  We show that, generically, these two mechanisms are confounded with each other, and with the causal effect of an individual’s attributes on their behavior. Distinguishing them requires strong assumptions on the parametrization of the social process or on the adequacy of the covariates used (or both). In particular, simple examples show that asymmetries in regression coefficients cannot identify causal effects, and that imitation (a form of social contagion) can produce substantial correlations between an individual’s enduring traits and their choices, even when there is no intrinsic affinity between them. We also suggest some possible constructive responses to these non-identifiability results.  (Joint work with Andrew Thomas)