Summary

Systems biology is an emerging field that investigates biology at the system and multi-scale level. Once the biological entities in a specific environment have been identified, it remains to elucidate how they interact to carry out a particular biological function. The construction of predictive mathematical models is therefore a fundamental goal of the field. In this context, regulatory, metabolic and signaling networks are crucial for understanding complex biological systems. Importantly, they are involved in biomedical, bio-energy, bio-mining and agricultural processes. Hence, their control has a crucial impact on drug target identification, diagnosis, cancer research, bio-fuels and drought tolerance, among other applications. During the last decade, many efforts have been made to develop formalisms and modeling frameworks that take into account the specificities of such biological systems. In the absence of quantitative details, qualitative approaches based on graph theory or (Boolean) logical networks have become very popular.

Nowadays, some major challenges related to the identification, control and pruning of biological networks remain open. Further, dealing with uncertainty and its consequences is a fundamental issue in systems biology, particularly in the context of qualitative modeling approaches. In general, due to factors including experimental error, the limited amount of data available, the incompleteness of our prior knowledge, and inherent mathematical properties of the models, there are multiple “right” answers to the same question. Hence, how to turn a possibly very large number of answers into valuable insights for biologists is a fundamental and critical aspect of this subject.

In this course we will investigate these issues from the point of view of Knowledge Representation and Reasoning, and more specifically Answer Set Programming (ASP). ASP is a declarative problem-solving paradigm in which a problem is encoded as a set of logic rules such that its models (answer sets) represent solutions to the problem. In general, ASP can be used to solve hard combinatorial search and optimization problems. The distinctive features of ASP are its rich yet simple modeling language and the highly efficient solvers that are publicly available. Further, modern ASP solvers support several reasoning modes for assessing the multitude of solutions, among them regular and projective enumeration, intersection, union, and multi-criteria optimization. Therefore, after a decade of research and development, ASP is a very attractive computational approach for answering relevant questions arising in systems biology, such as those described above. In fact, some of these questions have been successfully addressed with ASP in various settings over the last few years. However, it remains to be elucidated whether ASP technology is mature enough to cope with real-life problem instances and, more importantly, to bring new insights to biologists.
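To make the reasoning modes mentioned above more concrete, here is a minimal Python sketch that brute-forces a toy problem instead of calling an ASP solver. The atoms `a`, `b`, `c` and the two constraints are hypothetical and chosen only for illustration. The sketch enumerates all solutions, then derives their intersection (atoms true in every solution), their union (atoms true in at least one solution), and a projection onto a single atom.

```python
from itertools import product

# Hypothetical toy problem: choose a subset of atoms subject to two constraints.
ATOMS = ["a", "b", "c"]


def is_solution(chosen):
    """Check step: 'a' must hold, and 'b' and 'c' exclude each other."""
    return "a" in chosen and not ("b" in chosen and "c" in chosen)


def enumerate_solutions():
    """Guess every subset of ATOMS and keep those that pass the check."""
    solutions = []
    for bits in product([False, True], repeat=len(ATOMS)):
        chosen = frozenset(atom for atom, bit in zip(ATOMS, bits) if bit)
        if is_solution(chosen):
            solutions.append(chosen)
    return solutions


solutions = enumerate_solutions()

# Intersection: atoms shared by all solutions (robust conclusions).
intersection = frozenset.intersection(*solutions)

# Union: atoms that appear in at least one solution (possible conclusions).
union = frozenset.union(*solutions)

# Projective enumeration: distinct solutions when only atom 'b' is of interest.
projected = {solution & {"b"} for solution in solutions}
```

A real solver avoids this exhaustive enumeration, but the meaning of each mode is the same: the intersection is what every feasible model agrees on, which is the kind of robust information a biologist can act upon.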

**Contents**

Motivation (2h)

- Bioinformatics / Systems Biology / Synthetic Biology
  - Bioinformatics: aims at understanding the isolated parts
  - Systems Biology: aims at understanding the systems
  - Synthetic Biology: aims at designing the systems

- Large amount of heterogeneous and noisy experimental data:
  - Transcriptomics
  - Metabolomics
  - Proteomics

- Large multi-scale biological systems
  - Regulatory networks
  - Metabolic networks
  - Signaling networks

- Experiments are time-consuming and expensive
  - Gene expression
  - Metabolite profiles
  - Phosphorylation activity

- Applications
  - Medicine: cancer, vaccines, drug discovery
  - Energy: bio-diesel, bio-ethanol
  - Agriculture: drought tolerance
  - Mining: bioleaching copper

- Approach
  - Reasoning over a complete family of feasible models instead of selecting one model
  - Identify robust information from the family of models which deserves a detailed dynamical study

Knowledge Representation and Reasoning in Systems Biology (1h)

- Knowledge
  - Incomplete
  - Contradictory
  - Incorrect
  - Vague

- Representation
  - Biological networks:
    - (un)signed and (un)directed graphs
    - Qualitative relations/interactions: activates, inhibits, regulates, co-expressed, produces, consumes

- Data discretization
  - Boolean: present/absent; up/down; active/inactive
  - (un)signed: {-1, 0, 1}
  - Linear: integer interval, e.g. [0,100]
  - Logarithmic: several levels, e.g. {0, 10, 100, 1000}
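The discretization schemes listed above can be sketched in a few lines of Python. This is only an illustration: the function names, thresholds, tolerance and value ranges are assumptions, not part of the course material, and in practice they depend on the experimental platform.

```python
def to_boolean(value, threshold):
    """Boolean level: True (present/active) if the value reaches the threshold."""
    return value >= threshold


def to_sign(change, tolerance=0.0):
    """Signed level in {-1, 0, 1}: down, unchanged, or up."""
    if change > tolerance:
        return 1
    if change < -tolerance:
        return -1
    return 0


def to_linear(value, lo, hi, levels=100):
    """Linear level: map a value in [lo, hi] to an integer in [0, levels]."""
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


def to_log_level(value, levels=(0, 10, 100, 1000)):
    """Logarithmic level: the largest listed level not exceeding the value."""
    return max((level for level in levels if level <= value), default=min(levels))
```

For instance, with an assumed tolerance of 0.5, a change of -1.2 maps to -1 while a change of 0.1 maps to 0. The coarser the discretization, the less sensitive the resulting facts are to noise, at the price of losing quantitative detail.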

- Reasoning
  - (in)consistency detection (Diagnosis & Repairing)
  - Model refinement / inference (Combinatorial optimization)
  - Robust predictions
  - Hypothesis generation
  - Experimental design

Answer Set Programming (1h)

- Logic Programming – AI – KRR
- General purpose problem solving paradigm
- Fully declarative
- Highly expressive
- High performance dedicated solvers
- Deduction, Abduction and Induction
- Defaults (closed-world assumption) and recursive definitions
- Applications: Robotics, Scheduling, Planning, Diagnosis, Configuration, …
- Related methods:
  - SAT: Boolean Satisfiability
  - ILP: Integer Linear Programming
  - CP: Constraint Programming
  - CLP(FD): Constraint Logic Programming over Finite Domains

ASP in a nutshell (3h)

- Workflows: declarative versus imperative programming
- Modeling methodology: Guess & Check (NP problems)
- Basic syntax and semantics
- The language gringo
  - Basic rules
  - Choice rules
  - Integrity constraints
  - Optimization

- The solver clasp:
  - Basic usage
  - Common options
  - Configurations
  - Reasoning modes

- Examples:
  - Graph coloring
  - N queens
  - Traveling Salesman Problem
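As an illustration of the Guess & Check methodology on the graph coloring example, the following Python sketch guesses every color assignment and checks that no edge joins two nodes of the same color. The graph instance is hypothetical. The comment shows how the same two steps could be written, roughly, as a choice rule and an integrity constraint in the gringo input language; it is a sketch, not the encoding used in the course.

```python
from itertools import product

# Roughly the same idea in gringo syntax (illustrative sketch):
#   1 { color(X,C) : col(C) } 1 :- node(X).      % guess: one color per node
#   :- edge(X,Y), color(X,C), color(Y,C).        % check: no monochromatic edge

# Hypothetical instance: a 4-cycle with one chord.
NODES = [1, 2, 3, 4]
EDGES = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
COLORS = ["r", "g", "b"]


def colorings(nodes, edges, colors):
    """Yield every node -> color assignment with no monochromatic edge."""
    for guess in product(colors, repeat=len(nodes)):  # guess step
        assignment = dict(zip(nodes, guess))
        if all(assignment[x] != assignment[y] for x, y in edges):  # check step
            yield assignment


all_colorings = list(colorings(NODES, EDGES, COLORS))
```

The declarative version states only what a solution is and leaves the search to the solver, whereas the Python loop spells out an (exponential) search procedure explicitly.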

- Potsdam Answer Set Solving Collection:
  - gringo
  - clasp
  - hclasp
  - unclasp
  - clingo
  - iclingo
  - clingcon
  - claspD

ASP for Systems Biology: problems description and their ASP encodings (3h)

- Sign consistency detection in regulatory networks
- Learning Boolean logic models of signaling networks
- Intervention strategies in logic models of signaling networks
- Metabolic network completion
- Precursor sets in metabolic networks
- The BioASP Python environment
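To give a flavor of the first problem in this list, here is a minimal Python sketch of sign consistency checking under a simplified rule: every non-input node must have a sign that is explained by at least one incoming influence, that is, the edge sign multiplied by the sign of its source. The network, the set of inputs, and the restriction to the labels +1 and -1 are assumptions made for illustration. The sketch uses brute force, not the ASP encoding covered in the course.

```python
from itertools import product

# Hypothetical signed influence graph: (source, target) -> sign of the influence.
NODES = ["a", "b", "c"]
EDGES = {("a", "b"): 1, ("b", "c"): -1, ("a", "c"): 1}
INPUTS = {"a"}  # inputs need no explanation


def is_consistent(labeling, edges, inputs):
    """Check that each non-input node's sign matches some incoming influence."""
    for node, sign in labeling.items():
        if node in inputs:
            continue
        influences = [
            edge_sign * labeling[source]
            for (source, target), edge_sign in edges.items()
            if target == node
        ]
        if sign not in influences:
            return False
    return True


def consistent_completions(observed, nodes, edges, inputs):
    """Guess signs for unobserved nodes and yield every consistent total labeling."""
    free = [node for node in nodes if node not in observed]
    for guess in product([1, -1], repeat=len(free)):
        labeling = {**observed, **dict(zip(free, guess))}
        if is_consistent(labeling, edges, inputs):
            yield labeling
```

With these definitions, the observations `{"a": 1, "c": 1}` admit a consistent completion, whereas `{"a": 1, "b": -1}` admit none: the only influence on `b` is an activation by an increased `a`, so a decrease of `b` cannot be explained and the network and data are inconsistent.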