Luis Rocha

Luis M. Rocha is Associate Professor of Informatics and Computing in the School of Informatics and Computing, a member of the Center for Complex Networks and Systems, and core faculty of the Cognitive Science Program at Indiana University, Bloomington, USA. He is also the director of the Computational Biology Collaboratorium and serves in the direction of the PhD program in Computational Biology at the Instituto Gulbenkian de Ciência, Portugal. His research is on complex systems, computational biology, artificial life, embodied cognition and bio-inspired computing. He received his Ph.D. in Systems Science in 1997 from the State University of New York at Binghamton. From 1998 to 2004 he was a permanent staff scientist at the Los Alamos National Laboratory, where he founded and led a Complex Systems Modeling Team during 1998-2002, and was part of the Santa Fe Institute research community. He has organized major conferences in the field such as the Tenth International Conference on the Simulation and Synthesis of Living Systems (Alife X) and the Ninth European Conference on Artificial Life (ECAL 2007). He has published many articles in scientific and technology journals, and has been the recipient of several scholarships and awards. At Indiana University, he received the School of Informatics and Computing Teaching Excellence Award for 2006 after developing a new syllabus for an introductory undergraduate course on Informatics and a new graduate course on biologically-inspired computing.

Information about my research group meetings, projects, and ongoing work is available on the CASCI Website. Additional information about my research, academic and personal activities is available on my web site.

Contact Information:

In the USA:
Center for Complex Networks and Systems Research
School of Informatics & Computing
Indiana University, 919 E. 10th St
Bloomington, IN 47408, USA

In Portugal:
FLAD Computational Biology Collaboratorium
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
Apartado 14, P-2781-901 Oeiras, Portugal

Course and research-related blogs

The Agent-Based T-Cell Cross-regulation Model for Document Classification

We have developed a bio-inspired solution for binary classification of textual documents based on T-cell cross-regulation in the vertebrate adaptive immune system, which is a complex adaptive system of millions of cells interacting to distinguish between self and nonself substances. By analogy, automatic document classification assumes that the interaction and co-occurrence of thousands of words in text can be used to identify conceptually-related classes of documents: at a minimum, two classes with relevant and irrelevant documents for a given concept (e.g. articles with protein-protein interaction information). Our agent-based method for document classification expands the analytical model of Carneiro et al. by allowing us to deal simultaneously with many distinct populations of antigen-specific T-cells and their collective dynamics. We have extended this model to produce a spam-detection system. We have also developed our agent-based model further to apply it to biomedical article classification, testing it on a dataset of biomedical articles provided by the BioCreative 2.5 challenge. Our results are useful for biomedical text mining, but they also help us understand T-cell cross-regulation as a potential general principle of classification available to collectives of molecules without a central controller. While there is still much to learn about the specifics of T-cell cross-regulation in adaptive immunity, Artificial Life allows us to explore alternative emergent classification principles while producing useful bio-inspired tools. Recently, we started expanding this algorithm to other forms of classification, such as sensor data from human-robot interactions, under an IUCRG project.


Project Members

Luis Rocha

Al Abi-Haidar

Ian Wood



Funding

Project partially funded by:


Selected Project Publications


Pathway modules in the Canalizing Dynamics of the Drosophila Segment Polarity Network

The paradigmatic example of a complex system is the web of biochemical interactions that make up life. We still know very little about the organization of life as a dynamical, interacting network of genes, proteins and biochemical reactions. How do biochemical networks, which contain many regulatory, signaling, and metabolic processes, achieve reliability and robustness? Cells function reliably despite noisy dynamic environments, which is all the more impressive given that control strategies implemented by intra- and inter-cellular processes cannot rely on a centralized, global view of the relevant networks. Are the resulting complex dynamics made up of relatively autonomous modules? If so, what is their functional role and how can they be identified? How robust is the collective computation performed by intra-cellular networks to mutations, delays and stochastic noise? To address these questions, we develop novel methodologies and informatics tools to study control and collective computation in the automata networks used to model gene regulation and biochemical signaling.

Our methodology identifies canalizing control patterns in discrete automata models of biochemical networks. We are currently working with models of genetic regulation in yeast, flowering in Arabidopsis thaliana, body segmentation in Drosophila, intracellular signal transduction in fibroblasts, biochemical pathways in granular leukemic lymphocytes, an integrated genome-scale transcriptional and metabolic network for E. coli, and others. Our approach allows us to model biochemical signaling, regulation, modularity [Kolchinsky and Rocha, 2011], robustness and emergent computation in the dynamics of complex networks modeled as automata networks.

Two-symbol Schemata Redescription of the Transition Look-Up tables of Automata

Schema Redescription in Cellular Automata

Schema redescription with two symbols is a method to eliminate redundancy in the transition tables of Boolean automata. One symbol captures redundancy of individual input variables, and another captures permutability in sets of input variables; together they fully characterize the canalization present in Boolean functions. Two-symbol schemata explain aspects of the behavior of automata networks that the characterization of their emergent patterns does not capture. We have used our method to show that, despite having very different collective behavior, CA rules can be very similar at the local interaction level [Marques-Pita and Rocha, 2011], which leads us to question the tendency in complexity research to pay much more attention to emergent patterns than to local interactions. We have also used schema redescription to obtain more amenable search spaces of CA rules for the Density Classification Task, obtaining some of the best known rules for this task [Marques-Pita and Rocha, 2008; Marques-Pita, Mitchell, and Rocha, 2008].
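The first of the two symbols can be illustrated with a small Python sketch. It is not the group's implementation: it assumes that wildcard schemata can be obtained by repeatedly merging look-up table entries that differ in a single input (a Quine-McCluskey style reduction), it writes the wildcard as `#`, and the function name `wildcard_schemata` is hypothetical. The second symbol, which captures permutability of inputs, is not covered here.

```python
from itertools import product

def wildcard_schemata(f, k, output=1):
    """Maximal wildcard schemata covering every input of f that yields `output`.

    f is a Boolean function of k inputs; '#' marks an input that is
    redundant given the values of the other inputs in the schema.
    """
    # Start from the look-up table entries that produce `output`.
    current = {bits for bits in product("01", repeat=k)
               if f(*map(int, bits)) == output}
    maximal = set()
    while current:
        merged, used = set(), set()
        for a in current:
            for b in current:
                diff = [i for i in range(k) if a[i] != b[i]]
                # Merge two schemata that differ in exactly one position,
                # where one holds 0 and the other holds 1.
                if len(diff) == 1 and {a[diff[0]], b[diff[0]]} == {"0", "1"}:
                    i = diff[0]
                    merged.add(a[:i] + ("#",) + a[i + 1:])
                    used.update((a, b))
        # Schemata that could not be merged any further are maximal.
        maximal |= current - used
        current = merged
    return sorted("".join(s) for s in maximal)

# Two-input AND: a single 0 on either input is enough to force output 0.
print(wildcard_schemata(lambda a, b: a & b, 2, output=0))  # ['#0', '0#']
```

For the AND function, the three table entries with output 0 collapse into two schemata, each stating that one input at 0 settles the outcome regardless of the other.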


Emergent Computation in the AND Rule

Origin of Representations in Evolving Cellular Automata

We have been interested in the problem of how information, symbols, representations and the like can arise from a purely dynamical system of many components. This is a topic of particular interest in Cognitive Science, where the notions of representation and symbol often divide the field into opposing camps. Often, in the area of Embodied Cognition, the idea of self-organization in dynamical systems leads many researchers to reject representational or semiotic elements in their models of cognition. This attitude seems not only excessive, but indeed absurd, as it ignores the informational processes so important for biological organisms. Therefore, we have been working both on a re-formulation of the concept of representation for embodied cognition and on simulations of dynamical systems (using Cellular Automata) where one can study the origin of representations.

The Evolving Cellular Automata experiments of Crutchfield, Mitchell et al. in the late 1990s were very exciting, as the ability of evolved cellular automata to solve non-trivial computation tasks seemed to provide clues about the origin of representations and information from dynamical systems [Mitchell, 1998] [Rocha, 1998b]. We conducted additional experiments which extended the density classification task with more difficult logical tasks [Rocha, 2000; Rocha, 2004]. Later, we proposed a re-formulation of the concept of representation in cognitive science and artificial life which is based on this work, but argues that the type of emergent computation observed in these experiments does not produce representations quite as rich as those observed in biology and cognition [Rocha and Hordijk, 2005]. These experiments allow us to think about how to evolve symbols from artificial matter in computational environments. The figure above depicts a space-time diagram and particle model of a CA rule evolved to solve the AND task. Some additional figures and experiment details of CA rules for logical tasks in our experiments are also available.
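To make the density classification task concrete, here is a minimal Python sketch. It assumes a one-dimensional binary CA on a ring with neighborhood radius 3, and it uses a plain local majority vote as the rule. That rule is only a baseline chosen for illustration, not one of the evolved rules discussed above, and the function names are hypothetical.

```python
def step(cells, rule, radius=3):
    """Apply `rule` to every cell's neighborhood on a ring (periodic boundary)."""
    n = len(cells)
    return [rule([cells[(i + d) % n] for d in range(-radius, radius + 1)])
            for i in range(n)]

def local_majority(neighborhood):
    """Baseline rule: a cell adopts the majority state of its neighborhood."""
    return 1 if 2 * sum(neighborhood) > len(neighborhood) else 0

def classify(cells, rule, steps=200, radius=3):
    """True if the CA settles on all 1s for a majority of 1s, else on all 0s."""
    target = 1 if 2 * sum(cells) > len(cells) else 0
    for _ in range(steps):
        cells = step(cells, rule, radius)
    return all(c == target for c in cells)

# A single 1 among 0s is erased, so this configuration is classified correctly.
print(classify([1] + [0] * 20, local_majority))
# Two large blocks are both locally stable, so the lattice never reaches consensus.
print(classify([1] * 10 + [0] * 11, local_majority))
```

The second call shows why the task is non-trivial: each cell sees only its own neighborhood, so a purely local vote freezes into blocks instead of propagating the global answer across the lattice.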



Project Members and Collaborators

Luis Rocha

Wim Hordijk

Melanie Mitchell

Santiago Schnell

Manuel Marques-Pita

Artemy Kolchinsky

Santosh Manicka

Marcio Mourao


Funding

Project partially funded by:

  • Fundação para a Ciência e a Tecnologia, Portugal. PTDC/EIA-CCO/114108/2009. Project title: “Collective Computation and Control in Complex Biochemical Systems.”
  • Fundação Luso-Americana para o Desenvolvimento (Portugal) and National Science Foundation (USA), 2012-2014. Project title: “Network Mining For Gene Regulation And Biochemical Signaling.” (171/11)

Selected Project Publications

Subnetwork of word co-occurrence proximity (with 34 words) for a specific document from the first BioCreative competition. The red nodes denote the words retrieved from a specific GO annotation (0007266: Rho, protein, signal, transduce). The blue nodes denote the words that co-occur very frequently with at least one of the red nodes: the co-occurrence neighborhood of the GO words. The green nodes denote the additional words discovered by our network algorithm as described in (Verspoor et al., 2005).

Much of the research presently conducted in the biomedical domain relies on the induction of correlations and interactions from data. Because we ultimately want to increase our knowledge of the biochemical and functional roles of genes and proteins in organisms, there is a clear need to integrate the associations and interactions among biological entities that have been reported and accumulate in the literature and databases. Biomedical literature mining is an important informatics methodology for large scale information extraction from repositories of textual documents, as well as for integrating information available in various domain-specific databases and ontologies, ultimately leading to knowledge discovery. It helps us tap into the biomedical collective knowledge, and uncover relationships and interactions buried in the literature and databases, and even those inferred from global information but unreported in individual experiments. Our approach to literature mining is based on bottom-up, data-driven or bio-inspired methods, which we have applied to automatic discovery, classification and annotation of protein-protein and drug-drug interactions, pharmacokinetic data, protein sequence family and structure prediction, functional annotation of transcription data, enzyme annotation publications, and so on. Examples of these are shown below, together with links to additional resources and publications.

PPI task: Decision structure on the protein-protein interaction article test data of BioCreative II, as produced by our Variable Trigonometric Threshold model. Abi-Haidar, A. et al. (2008)

Protein-Protein Interaction Discovery (PPI): Until now, literature mining has been applied essentially to help annotate and characterize molecular entities such as genes and proteins. In the next few years the field is expected to move to aid the discovery and automatic annotation of relationships among such entities, e.g. protein-protein and gene-disease interactions. Indeed, the BioCreative challenges II, II.5, and III, in which we participated [Abi-Haidar et al., 2008; Kolchinsky et al., 2010; Lourenco et al., 2011], include a series of tasks on extraction of protein-protein interaction information from the literature. As the field moves to uncovering relations rather than entities, our complex network approach to biomedical literature mining [Verspoor et al., 2005], which we tried on the first BioCreative competition, makes all the more sense. Additionally, since literature mining hinges on the quality of available sources of literature as well as their linkage to other electronic sources of biological knowledge, it is particularly important to study the quality of the inferences it can provide. We were among the most competitive teams in the PPI tasks of BioCreative II, II.5 and III. See our PIARE (Protein Interaction Abstract Relevance Evaluator) web tool for classification of documents relevant for protein-protein interaction, as well as supplementary materials for publications.

Estimated PK clearance parameter data from literature. Wang, Z., et al. (2009)

Estimation of pharmacokinetic numerical data from literature and drug-drug interaction extraction: Our objective is to mine drug-specific (e.g. Midazolam (MDZ)) pharmacokinetic (PK) clearance data (systemic and oral) from the literature. We achieved 88% precision and 92% recall, for an F-score of 90%, which outperforms a support vector machine (F-score of 68.1%). Further investigation on 7 other drugs showed comparable performance [Wang et al., 2009]. This is a collaboration with Indiana University’s Medical School and the group of Dr. Lang Li. Recently, we received funding for a project on “Drug-Drug Interaction Prediction from Large-scale Mining of Literature and Patient Records” from the Indiana University Collaborative Research Grants 2011.
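As a quick check on how the reported figures fit together, the sketch below computes the F-score from precision and recall. It assumes the reported value is the balanced F-score (the harmonic mean of precision and recall), which the text does not state explicitly; the function name is hypothetical.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision 88% and recall 92%, as reported above.
print(round(f_score(0.88, 0.92), 4))  # 0.8996, i.e. about 90%
```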

 

proteins voting in proportion to their cosine similarity to the target protein. Maguitman, A. et al (2006)

Protein Family Prediction (PFP): Since literature mining hinges on the quality of available sources of literature as well as their linkage to other electronic sources of biological knowledge, it is particularly important to study the quality of the inferences it can provide. We have been working on the large-scale validation of bibliome algorithms, and proposed a method that predicts a protein’s Pfam family correctly 76% of the time and 89% of the time ranks the correct family among its top 5 predictions [Maguitman et al., 2006].
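The voting idea in the caption above, where proteins vote in proportion to their cosine similarity to the target protein, can be sketched as follows. This is an illustration under stated assumptions rather than the published method: each protein is assumed to be represented by a sparse term-weight vector derived from its literature, every labeled protein votes, and the family names, terms and function names are hypothetical placeholders.

```python
from collections import defaultdict
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_families(target, labeled, top=5):
    """Rank families by the summed cosine similarity of their member proteins."""
    votes = defaultdict(float)
    for vector, family in labeled:
        votes[family] += cosine(target, vector)
    return sorted(votes, key=votes.get, reverse=True)[:top]

# Hypothetical toy data: (term-weight vector, family label).
labeled = [
    ({"kinase": 1.0, "atp": 1.0}, "FamA"),
    ({"kinase": 1.0}, "FamA"),
    ({"membrane": 1.0}, "FamB"),
]
print(predict_families({"kinase": 1.0}, labeled))  # ['FamA', 'FamB']
```

Returning a ranked list rather than a single label matches the way the results are reported, with separate figures for the top prediction and for the top 5.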

 

PSP task: Our novel combined method performs significantly better than either the original structure prediction or keyword-based prediction methods alone. The keyword method performs well even though the literature comes from sequences with little (BLAST) detectable sequence homology. Rechtsteiner, A., et al. (2006)

Protein Structure Prediction (PSP): Literature-mining prediction is comparable to the best ab-initio methods in the absence of sequence homology. Combining text mining with an ab-initio method leads to a 35% improvement over the ab-initio method alone. See [Rechtsteiner et al., 2006].

Rechtsteiner, A. (2005). PhD Dissertation.

Characterizing gene regulation: SVD (“eigen-clustering”) of microarray data produces sets of co-expressed genes, which were then characterized with annotations automatically extracted from the literature [Rechtsteiner, 2005].

 

Project Members

Luis M. Rocha, PI

Jon Duke

Lang Li

Predrag Radivojac

Hagit Shatkay

Analia Lourenco

Ana Maguitman

Al Abi-Haidar

Michael Conover

Mohsen JafariAsbagh

Jasleen Kaur

Artemy Kolchinsky

Azadeh Nematzadeh

Andreas Rechtsteiner

Tiago Simas

Zhiping (Paul) Wang


Funding

Project partially funded by

  • Indiana University Collaborative Research Grants 2011. Project title: “Drug-Drug Interaction Prediction from Large-scale Mining of Literature and Patient Records”.
  • Fundação Luso-Americana para o Desenvolvimento (Portugal) and National Science Foundation (USA), 2012-2014. Project title: “Network Mining For Gene Regulation And Biochemical Signaling.” (171/11)

 

Selected Project Publications

Complex Networks Collaboratory

Cx-Nets is a virtual collaboratory of three research groups that, despite their distant geographical locations, pursue the same research agenda in close collaboration. Active research areas include:

  • Network theory, structure and models
  • Information Networks
  • Epidemic modeling
  • Social systems
  • Infrastructures
  • Biological networks

The Cx-Nets website is also intended as an information exchange point with links to conferences, tools and references useful for the network science community.

Alex Vespignani (PI)

Sandro Flammini

Fil Menczer

Complex Adaptive Systems and Computational Intelligence

We are a research group at Indiana University and the Instituto Gulbenkian de Ciencia working on complex systems. We are particularly interested in the informational properties of natural and artificial systems which enable them to adapt and evolve. This means both understanding how information is fundamental for the evolutionary capabilities of natural systems, as well as abstracting principles from natural systems to produce adaptive information technology.

Our research projects (see below) are on computational and systems biology, complex networks, text and literature mining, evolutionary systems, adaptive search and recommendation, cognitive science, artificial life, and biosemiotics. Additional information available on Luis Rocha’s Website and our group page at the Instituto Gulbenkian de Ciencia.

For information on joining our group see our Academics page. As a group, we are closely interconnected with other research groups and networks: The Center for Complex Networks and Systems (CNets), Alife@IU, Biocomplexity Institute, Cognitive Science Program, Complex Systems & Networks, FLAD Computational Biology Collaboratorium, InfoVis Lab, Instituto Gulbenkian de Ciencia, Networks and Agents (NAN).

You are welcome to join our mailing list CASCI-L by either:

  • sending an e-mail to listserv@indiana.edu with subscribe CASCI-L in the body (with no subject), or
  • via the LISTSERV web interface: https://listserv.indiana.edu/cgi-bin/wa-iub.exe?HOME ; click Subscriber’s Corner at the top of the page, search for “CASCI-L”, select it, and click Submit.

CASCI projects

Literature Mining

Biomedical Literature Mining

Collective Dynamics in Complex Biochemical Networks

Models of RNA Editing

Artificial Immune Systems

 Semi-metric Network Analysis

Network Analysis of Weighted and Fuzzy Graphs

 The Adaptive Web and Bio-inspired designs for Recommendation Systems

The Adaptive Web

Microarray Analysis

Genomic Multivariate Analysis

 Biosemiotics: interplay between self-organization and selection

Biosemiotics

Agent-based modeling

Uncertainty and Generalized Information Theory

CNetS faculty manage informal research groups and labs of various size and scope, including faculty, graduate students, postdocs, and visitors. Given the interdisciplinary and collaborative nature of CNetS, many researchers belong to more than one group.

On Friday, April 18 the ISI Foundation and CRT Foundation announced the winners of the First Lagrange Prize: Brian Arthur and Yakov Sinai for their contributions to the science of complex systems, and Philip Ball for his contributions to the promotion of complexity as a popular science writer. The award ceremony took place in the gorgeous Stupinigi Royal Hunting Palace. In the photo, Brian Arthur is giving his acceptance address; the panel included Angelo Miglietta, Giovanni Ferrero and Andrea Comba of CRT Foundation, Mario Rasetti and Tullio Regge (General Secretary and President of ISI Foundation), and Enrico Bellone of Le Scienze. Following the ceremony there was a delicious dinner and a fun dance.

No, it’s not an Italian spin-off of the popular TV show. CSI Piemonte is organizing a meeting on Understanding Complexity: a Journey through Science to be held November 22-23 at the Lingotto Convention Center here in Torino. We will have demos and posters on 6S, GiveALink, and the egalitarian effect of search engines. I look forward in particular to seeing my good old friend Dario and my mentor, Domenico.