Luis Rocha

Luis M. Rocha is Associate Professor of Informatics and Computing, a member of the Center for Complex Networks and Systems, and core faculty of the Cognitive Science Program at Indiana University, Bloomington, USA. He is also the director of the Computational Biology Collaboratorium and part of the directorate of the PhD Program in Computational Biology at the Instituto Gulbenkian de Ciência, Portugal. His research is on complex systems, computational biology, artificial life, embodied cognition and bio-inspired computing. He received his Ph.D. in Systems Science in 1997 from the State University of New York at Binghamton. From 1998 to 2004 he was a permanent staff scientist at the Los Alamos National Laboratory, where he founded and led a Complex Systems Modeling Team from 1998 to 2002, and was part of the Santa Fe Institute research community. He has organized major conferences in the field, such as the Tenth International Conference on the Simulation and Synthesis of Living Systems (Alife X) and the Ninth European Conference on Artificial Life (ECAL 2007). He has published many articles in scientific and technology journals and has received several scholarships and awards. At Indiana University, he received the School of Informatics and Computing Teaching Excellence Award for 2006 after developing a new syllabus for an introductory undergraduate course on Informatics and a new graduate course on biologically-inspired computing.

Information about my research group's meetings, projects, and ongoing work is available on the CASCI Website. Additional information about my research, academic and personal activities is available on my web site.

Contact Information:

In The USA:
Center for Complex Networks and Systems Research
School of Informatics & Computing
Indiana University, 919 E. 10th St
Bloomington IN, 47408, USA

In Portugal:
FLAD Computational Biology Collaboratorium
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
Apartado 14, P-2781-901 Oeiras, Portugal

Course and research-related blogs

The Agent-Based T-Cell Cross-regulation Model for Document Classification

We have developed a bio-inspired method for binary classification of textual documents, based on T-cell cross-regulation in the vertebrate adaptive immune system, a complex adaptive system of millions of cells interacting to distinguish between self and nonself substances. By analogy, automatic document classification assumes that the interaction and co-occurrence of thousands of words in text can be used to identify conceptually related classes of documents: at a minimum, two classes of relevant and irrelevant documents for a given concept (e.g. articles with protein-protein interaction information). Our agent-based method for document classification expands the analytical model of Carneiro et al. by allowing us to deal simultaneously with many distinct populations of antigen-specific T-cells and their collective dynamics. We have extended this model to produce a spam-detection system. We have also developed our agent-based model further and applied it to biomedical article classification, testing it on a dataset of biomedical articles provided by the BioCreative 2.5 challenge. Our results are useful for biomedical text mining, and they also help us understand T-cell cross-regulation as a potential general principle of classification available to collectives of molecules without a central controller. While there is still much to learn about the specifics of T-cell cross-regulation in adaptive immunity, Artificial Life allows us to explore alternative emergent classification principles while producing useful bio-inspired tools. Recently, we started extending this algorithm to other forms of classification, such as sensor data from human-robot interactions, under an IUCRG project.
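To make the cross-regulation idea concrete, here is a minimal toy sketch, not our published agent-based model. It assumes the core three-body rules as we read them from Carneiro et al.: an effector (E) cell proliferates when it conjugates with an antigen-presenting cell (APC) carrying no regulatory (R) cell, and an R cell proliferates only when it shares an APC with an E cell. Each word is treated as an antigen with its own E and R populations. The class name ToyCrossRegulation, seeding relevant-document words with E cells and irrelevant-document words with R cells, the APC size (sites), the initial dose, and the scoring rule are all hypothetical choices for illustration; cell death and thymic export are omitted.

```python
import random
from collections import defaultdict


class ToyCrossRegulation:
    """Hypothetical toy E/R cross-regulation classifier (not the published model).

    Each word acts as an antigen with its own effector (E) and regulatory (R)
    cell counts. Cell death and thymic export are deliberately omitted.
    """

    def __init__(self, sites=2, seed=0):
        self.E = defaultdict(int)   # effector cells specific to each word
        self.R = defaultdict(int)   # regulatory cells specific to each word
        self.sites = sites          # conjugation sites per APC (assumed value)
        self.rng = random.Random(seed)

    def train(self, words, relevant, dose=5):
        # Assumption: relevant ("nonself") words seed E cells,
        # irrelevant ("self") words seed R cells.
        pool = self.E if relevant else self.R
        for w in dict.fromkeys(words):
            pool[w] += dose

    def present(self, word):
        """One APC displaying `word` conjugates with up to `sites` specific cells."""
        e, r = self.E[word], self.R[word]
        n_e = n_r = 0
        for _ in range(min(self.sites, e + r)):
            # Sample one cell without replacement, proportionally to the free counts.
            if self.rng.random() < e / (e + r):
                e, n_e = e - 1, n_e + 1
            else:
                r, n_r = r - 1, n_r + 1
        if n_e and not n_r:
            self.E[word] += n_e     # E proliferate when no R shares the APC
        elif n_e and n_r:
            self.R[word] += n_r     # R proliferate only alongside E; E are suppressed

    def score(self, words, rounds=10):
        """Mean (E - R) / (E + R) over known words after `rounds` of presentation."""
        known = [w for w in dict.fromkeys(words) if self.E[w] + self.R[w] > 0]
        for _ in range(rounds):
            for w in known:
                self.present(w)
        if not known:
            return 0.0
        return sum((self.E[w] - self.R[w]) / (self.E[w] + self.R[w])
                   for w in known) / len(known)

    def classify(self, words, rounds=10):
        """True means 'relevant' (E-dominated) under this toy scoring rule."""
        return self.score(words, rounds) > 0


model = ToyCrossRegulation(seed=1)
model.train(["protein", "interaction", "binding"], relevant=True)
model.train(["weather", "protein"], relevant=False)
print(model.classify(["interaction", "binding"]))  # True
print(model.classify(["weather"]))                 # False
```

In this toy, words seen only in relevant training documents stay E-dominated and words seen only in irrelevant ones stay R-dominated; the cross-regulation rules only matter for shared words such as "protein", where both populations compete on the same APCs.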


Project Members

Luis Rocha

Al Abi-Haidar

Ian Wood



Funding

Project partially funded by:


Selected Project Publications

An agent with separate codotype and editype components in its genotype, in our Evolutionary Model of Genotype Editing. Rocha et al. (2007)

Evolutionary models in theoretical biology at large, and in computational biology and artificial life in particular, rarely deal with ontogenetic, non-inherited alteration of genetic information because they are based on a direct genotype-phenotype mapping. In Nature, by contrast, several processes have been discovered that alter genetic information encoded in DNA before it is translated into amino-acid chains. Ontogenetically altered genetic information is not inherited, but it is used extensively in the regulation and development of phenotypes, giving organisms the ability to, in a sense, re-program their genotypes according to environmental cues. An example of post-transcriptional alteration of gene-encoding sequences is RNA Editing. Our latest agent-based model of genotype editing presents a novel architecture for evolving agents in which coding and non-coding genetic components are allowed to coevolve. Our goal is twofold: (1) to study the role of RNA Editing regulation in the evolutionary process, and (2) to investigate the conditions under which genotype editing improves the optimization performance of evolutionary algorithms. We have shown that genotype editing allows evolving agents to perform better on several classes of fitness functions, in both static and dynamic environments. We are also investigating how the indirect genotype/phenotype mapping resulting from genotype editing leads to a better exploration/exploitation compromise in the search process. In the past year we developed an entirely new modeling platform in Python to run experiments exploring the evolutionary advantages of RNA editing.

Some characteristics of our model of RNA Editing:

Genome contains both coding and non-coding portions: Codome and Editome (Editosome)

  • Agents with editome perform better in changing environments

Study of regulation via non-coding DNA

  • Observe emergence of regulation with promoter signals
  • Memory of previous environments

Bio-inspired algorithm for optimization

  • Outperforms traditional evolutionary algorithms on many classes of functions
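The codome/editome split can be illustrated with a small sketch. Everything here is a hypothetical encoding chosen for illustration, not the representation used in our model: the codome is a binary string, and each editor in the editome is a (pattern, insert) pair that inserts bases after every match of its pattern or, with an empty insert, deletes the base following the match. The phenotype is read from the edited sequence, while the codome itself is left untouched, so offspring would inherit only unedited genetic information. The names Editor, apply_editor, express and fitness are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Editor:
    """Hypothetical editor: recognizes `pattern` in the codome and edits after it."""
    pattern: str
    insert: str = ""   # base(s) inserted after a match; "" deletes the next base

    def __post_init__(self):
        if not self.pattern:
            raise ValueError("editor pattern must be non-empty")


def apply_editor(seq: str, ed: Editor) -> str:
    """Scan left to right, editing after each non-overlapping match of ed.pattern."""
    out, i, n = [], 0, len(ed.pattern)
    while i < len(seq):
        if seq.startswith(ed.pattern, i):
            out.append(ed.pattern)
            i += n
            if ed.insert:
                out.append(ed.insert)   # insertion edit
            elif i < len(seq):
                i += 1                  # deletion edit: skip the following base
        else:
            out.append(seq[i])
            i += 1
    return "".join(out)


def express(codome: str, editome: list) -> str:
    """Sequence used to build the phenotype: the codome after all editors act in order.

    Strings are immutable, so the inherited codome is never modified.
    """
    seq = codome
    for ed in editome:
        seq = apply_editor(seq, ed)
    return seq


def fitness(phenotype: str) -> int:
    # Hypothetical placeholder objective: count of 1s (OneMax-style).
    return phenotype.count("1")


codome = "0110"
print(express(codome, []))                   # 0110
print(express(codome, [Editor("01", "1")]))  # 01110
print(express(codome, [Editor("11")]))       # 011
```

Two agents with the same codome but different editomes thus express different phenotypes, which is the indirect genotype/phenotype mapping the model relies on.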

This research is described in greater detail in the separate Evolutionary Models of Genotype Editing page.

 

Project Members

Luis Rocha

Ana Maguitman

Chien-Feng Huang

Jonathan Frankel

Jasleen Kaur

Artemy Kolchinsky

 

Selected Project Publications

Subnetwork of word co-occurrence proximity (with 34 words) for a specific document from the first BioCreative competition. The red nodes denote the words retrieved from a specific GO annotation (0007266: Rho, protein, signal, transduce). The blue nodes denote the words that co-occur very frequently with at least one of the red nodes: the co-occurrence neighborhood of the GO words. The green nodes denote the additional words discovered by our network algorithm, as described in (Verspoor et al., 2005).

Much of the research presently conducted in the biomedical domain relies on the induction of correlations and interactions from data. Because we ultimately want to increase our knowledge of the biochemical and functional roles of genes and proteins in organisms, there is a clear need to integrate the associations and interactions among biological entities that are reported and accumulate in the literature and databases. Biomedical literature mining is an important informatics methodology for large-scale information extraction from repositories of textual documents, as well as for integrating information available in various domain-specific databases and ontologies, ultimately leading to knowledge discovery. It helps us tap into biomedical collective knowledge and uncover relationships and interactions buried in the literature and databases, including those inferred from global information but unreported in individual experiments. Our approach to literature mining is based on bottom-up, data-driven, or bio-inspired methods, which we have applied to automatic discovery, classification and annotation of protein-protein and drug-drug interactions, pharmacokinetic data, protein sequence family and structure prediction, functional annotation of transcription data, enzyme annotation publications, and so on. Examples of these are shown below, together with links to additional resources and publications.
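A word co-occurrence proximity network like the one in the figure caption above can be sketched in a few lines. We assume here a Jaccard-style proximity (the number of documents containing both words divided by the number containing either); the exact measure and thresholds used in Verspoor et al. (2005) may differ, and the names proximity, neighborhood and threshold are hypothetical. The neighborhood function mirrors the caption: starting from seed words (the red nodes), it returns the words that co-occur strongly with at least one seed (the blue nodes).

```python
from itertools import combinations


def proximity(docs):
    """Hypothetical Jaccard-style proximity for word pairs co-occurring in some doc.

    docs: list of token lists. Returns {(a, b): |D_a & D_b| / |D_a | D_b|} with a < b,
    where D_w is the set of document indices containing word w.
    """
    occ = {}
    for i, doc in enumerate(docs):
        for w in set(doc):
            occ.setdefault(w, set()).add(i)
    prox = {}
    for a, b in combinations(sorted(occ), 2):
        shared = len(occ[a] & occ[b])
        if shared:
            prox[(a, b)] = shared / len(occ[a] | occ[b])
    return prox


def neighborhood(prox, seeds, threshold=0.5):
    """Words (not in seeds) whose proximity to some seed word is >= threshold."""
    seeds = set(seeds)
    found = set()
    for (a, b), p in prox.items():
        if p < threshold:
            continue
        if a in seeds and b not in seeds:
            found.add(b)
        if b in seeds and a not in seeds:
            found.add(a)
    return found


docs = [["rho", "signal", "gtpase"], ["rho", "gtpase"],
        ["signal", "transduce", "kinase"], ["weather"]]
prox = proximity(docs)
print(sorted(neighborhood(prox, {"rho", "signal"})))  # ['gtpase', 'kinase', 'transduce']
```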

PPI task: Decision structure on the protein-protein interaction article test data of BioCreative II, as produced by our Variable Trigonometric Threshold model. Abi-Haidar, A. et al. (2008)

Protein-Protein Interaction Discovery (PPI): Until now, literature mining has been applied mainly to help annotate and characterize molecular entities such as genes and proteins. In the next few years the field is expected to move toward aiding the discovery and automatic annotation of relationships among such entities, e.g. protein-protein and gene-disease interactions. Indeed, the BioCreative challenges II, II.5, and III, in which we participated [Abi-Haidar et al., 2008], [Kolchinsky et al., 2010], [Lourenco et al., 2011], include a series of tasks on extracting protein-protein interaction information from the literature. As the field moves toward uncovering relations rather than entities, our complex network approach to biomedical literature mining [Verspoor et al., 2005], which we tested in the first BioCreative competition, becomes all the more relevant. Additionally, since literature mining hinges on the quality of available sources of literature as well as their linkage to other electronic sources of biological knowledge, it is particularly important to study the quality of the inferences it can provide. We were among the most competitive teams in the PPI tasks of BioCreative II, II.5 and III. See our PIARE (Protein Interaction Abstract Relevance Evaluator) web tool for classification of documents relevant to protein-protein interaction, as well as supplementary materials for publications.

Estimated PK clearance parameter data from the literature. Wang, Z., et al. (2009)

Estimation of pharmacokinetic numerical data from the literature and drug-drug interaction extraction: Our objective is to mine drug-specific (e.g. Midazolam (MDZ)) pharmacokinetic (PK) clearance data (systemic and oral) from the literature. We obtained 88% precision and 92% recall, for an F-score of 90%, outperforming a support vector machine (F-score of 68.1%). Further investigation on 7 other drugs showed comparable performance [Wang et al., 2009]. This is a collaboration with Indiana University’s Medical School and the group of Dr. Lang Li. Recently, we received funding from the Indiana University Collaborative Research Grants 2011 for a project on “Drug-Drug Interaction Prediction from Large-scale Mining of Literature and Patient Records”.
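As a quick consistency check, assuming the reported F-score is the standard balanced F1 (the harmonic mean of precision P and recall R), the precision and recall figures above agree with the stated 90%:

```latex
F_1 = \frac{2PR}{P+R}
    = \frac{2 \times 0.88 \times 0.92}{0.88 + 0.92}
    = \frac{1.6192}{1.80}
    \approx 0.90
```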

 

Proteins voting in proportion to their cosine similarity to the target protein. Maguitman, A. et al. (2006)

Protein Family Prediction (PFP): Since literature mining hinges on the quality of available sources of literature as well as their linkage to other electronic sources of biological knowledge, it is particularly important to study the quality of the inferences it can provide. We have been working on the large-scale validation of bibliome algorithms, and proposed a method that predicts a protein’s Pfam family correctly 76% of the time and includes the correct family among its top 5 predictions 89% of the time [Maguitman et al., 2006].
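The voting scheme in the figure caption above (proteins voting in proportion to their cosine similarity to the target protein) can be sketched as follows. We assume each protein is represented by a sparse keyword-weight vector built from its associated literature; the vector construction, the family labels, and the names cosine and rank_families are hypothetical, and the sketch does not reproduce how Maguitman et al. (2006) selected or weighted keywords. Ranking families by accumulated votes yields both a top-1 prediction and a top-5 list, the two quantities behind the 76% and 89% figures.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_families(target, annotated):
    """Hypothetical ranking: families ordered by summed cosine votes of their members.

    annotated: iterable of (keyword_vector, family_label) for proteins of known family.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    votes = defaultdict(float)
    for vec, family in annotated:
        votes[family] += cosine(target, vec)
    return sorted(votes, key=lambda f: (-votes[f], f))


annotated = [
    ({"kinase": 1.0, "atp": 1.0}, "kinase_family"),
    ({"kinase": 1.0}, "kinase_family"),
    ({"transporter": 1.0, "membrane": 1.0}, "transporter_family"),
]
ranking = rank_families({"kinase": 1.0, "atp": 0.5}, annotated)
print(ranking[0], ranking[:5])  # kinase_family ['kinase_family', 'transporter_family']
```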

 

PSP task: Our novel combined method performs significantly better than either the original structure prediction or keyword-based prediction methods alone. The keyword method performs well even though the literature comes from sequences with little BLAST-detectable sequence homology. Rechtsteiner, A., et al. (2006)

Protein Structure Prediction (PSP): Literature-mining prediction is comparable to the best ab-initio methods in the absence of sequence homology. Combining text mining with an ab-initio method leads to a 35% improvement over the ab-initio method alone [Rechtsteiner et al., 2006].

Rechtsteiner, A. (2005). PhD Dissertation.

Characterizing gene regulation: SVD (“eigen-clustering”) of microarray data produces sets of co-expressed genes, which are then characterized with annotations automatically extracted from the literature [Rechtsteiner, 2005].
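The “eigen-clustering” step can be illustrated with a dependency-free sketch. As an assumption for illustration, it computes only the leading singular mode, by power iteration on X Xᵀ for a genes × conditions matrix X with mean-centered rows, rather than a full SVD, and it groups genes by the sign of loadings whose magnitude exceeds a hypothetical threshold. The names center_rows, leading_gene_mode and eigen_cluster are hypothetical, and the literature-annotation step is not shown.

```python
import math


def center_rows(X):
    """Mean-center each gene's expression profile (row)."""
    centered = []
    for row in X:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    return centered


def leading_gene_mode(X, iters=100):
    """Leading left singular vector of X (one loading per gene), via power iteration on X X^T."""
    n, m = len(X), len(X[0])
    u = [1.0] * n
    for _ in range(iters):
        v = [sum(X[i][j] * u[i] for i in range(n)) for j in range(m)]  # v = X^T u
        u = [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]  # u = X v
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            break  # e.g. all-zero data, or a start vector orthogonal to the data
        u = [x / norm for x in u]
    return u


def eigen_cluster(X, genes, threshold=0.3):
    """Hypothetical rule: split genes with |loading| >= threshold into two groups by sign.

    The overall sign of a singular vector is arbitrary, so the two groups are
    co-expressed within themselves and anti-correlated with each other.
    """
    u = leading_gene_mode(center_rows(X))
    pos = [g for g, w in zip(genes, u) if w >= threshold]
    neg = [g for g, w in zip(genes, u) if w <= -threshold]
    return pos, neg


X = [[1, -1, 1, -1],
     [2, -2, 2, -2],
     [-1, 1, -1, 1],
     [3, 3, 3, 3]]   # flat profile: zero after centering
genes = ["g1", "g2", "g3", "g4"]
print(eigen_cluster(X, genes))  # (['g1', 'g2'], ['g3'])
```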

 

Project Members

Luis M. Rocha, PI

Jon Duke

Lang Li

Predrag Radivojac

Hagit Shatkay

Analia Lourenco

Ana Maguitman

Al Abi-Haidar

Michael Conover

Mohsen JafariAsbagh

Jasleen Kaur

Artemy Kolchinsky

Azadeh Nematzadeh

Andreas Rechtsteiner

Tiago Simas

Zhiping (Paul) Wang


Funding

Project partially funded by:

  • Indiana University Collaborative Research Grants 2011. Project title: “Drug-Drug Interaction Prediction from Large-scale Mining of Literature and Patient Records”.
  • Fundação Luso-Americana para o Desenvolvimento (Portugal) and National Science Foundation (USA), 2012-2014. Project title: “Network Mining For Gene Regulation And Biochemical Signaling.” (171/11)

 

Selected Project Publications

Complex Adaptive Systems and Computational Intelligence

We are a research group at Indiana University and the Instituto Gulbenkian de Ciência working on complex systems. We are particularly interested in the informational properties of natural and artificial systems that enable them to adapt and evolve. This means both understanding how information is fundamental to the evolutionary capabilities of natural systems and abstracting principles from natural systems to produce adaptive information technology.

Our research projects (see below) are on computational and systems biology, complex networks, text and literature mining, evolutionary systems, adaptive search and recommendation, cognitive science, artificial life, and biosemiotics. Additional information is available on Luis Rocha’s Website and our group page at the Instituto Gulbenkian de Ciência.

For information on joining our group, see our Academics page. As a group, we are closely interconnected with other research groups and networks: The Center for Complex Networks and Systems (CNets), Alife@IU, Biocomplexity Institute, Cognitive Science Program, Complex Systems & Networks, FLAD Computational Biology Collaboratorium, InfoVis Lab, Instituto Gulbenkian de Ciência, and Networks and Agents (NAN).

You are welcome to join our mailing list CASCI-L by either:

  • sending an e-mail to listserv@indiana.edu with subscribe CASCI-L in the body (with no subject), or
  • via the LISTSERV web interface: https://listserv.indiana.edu/cgi-bin/wa-iub.exe?HOME ; click Subscriber’s Corner at the top of the page, search for “CASCI-L”, select it, and click Submit.

CASCI projects

Literature Mining

Biomedical Literature Mining

Collective Dynamics in Complex Biochemical Networks

Models of RNA Editing

Artificial Immune Systems

 Semi-metric Network Analysis

Network Analysis of Weighted and Fuzzy Graphs

 The Adaptive Web and Bio-inspired designs for Recommendation Systems

The Adaptive Web

Microarray Analysis

Genomic Multivariate Analysis

 Biosemiotics: interplay between self-organization and selection

Biosemiotics

Agent-based modeling

Uncertainty and Generalized Information Theory