The C++ sources of CANTATA can be downloaded here.
A precompiled Windows executable is available here.
A precompiled Mac OS X universal binary (PowerPC/Intel) is available here.
Windows and Mac OS X users can skip this step by downloading the above precompiled binary and extracting it to the desired location.
We recommend compiling the source code using the GCC compiler, or MinGW on Windows. On Mac OS X, use the GCC compiler delivered with Xcode Tools. The code also compiles with Microsoft Visual C++.
For older versions of GCC (prior to 4.2) or for Visual C++, you may additionally require the Boost libraries.
If using GCC/MinGW, the code can be compiled using the included Makefile:
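The following examples outline the usage of the program based on the mammalian cell cycle network [1]. The files used in this tutorial can be downloaded here. The ZIP archive contains three files:
cellcycle.txt: the original ("true") network model.
cellcycle_truncated.txt: a perturbed network draft in which the rule
CycB, (! Cdc20 & ! Cdh1)
is changed to
CycB, (! Cdc20)
That is, CycB lacks the dependency on Cdh1, which changes the dynamic behaviour of the model.
cellcycle_rules.txt: the rule file describing the expected attractors of the network: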
# The steady-state attractor
Attractor:
Initial condition: !CycD
State specifications:
!CycD Rb !E2F !CycE !CycA p27 !Cdc20 Cdh1 !UbcH10 !CycB
Each rule starts with the keywords Attractor: or Chain: specifying the type of the rule (attractor or time series).
# The 7-state cycle
Attractor:
Initial condition: CycD
State specifications:
CycD !Rb !E2F CycE CycA !p27 !Cdc20 !Cdh1 !UbcH10 !CycB
CycD !Rb !E2F !CycE CycA !p27 !Cdc20 !Cdh1 UbcH10 CycB
CycD !Rb !E2F !CycE CycA !p27 Cdc20 !Cdh1 UbcH10 CycB
CycD !Rb !E2F !CycE !CycA !p27 Cdc20 Cdh1 UbcH10 !CycB
CycD !Rb E2F !CycE !CycA !p27 !Cdc20 Cdh1 UbcH10 !CycB
CycD !Rb E2F CycE !CycA !p27 !Cdc20 Cdh1 !UbcH10 !CycB
CycD !Rb E2F CycE CycA !p27 !Cdc20 Cdh1 !UbcH10 !CycB
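The effect of dropping Cdh1 from the CycB rule can be made concrete by tabulating both versions of the function. The following Python sketch is purely illustrative and not part of CANTATA; the function names are hypothetical.

```python
from itertools import product


def cycb_original(cdc20: bool, cdh1: bool) -> bool:
    """CycB rule of the original network: (! Cdc20 & ! Cdh1)."""
    return (not cdc20) and (not cdh1)


def cycb_truncated(cdc20: bool, cdh1: bool) -> bool:
    """CycB rule of the perturbed draft: (! Cdc20)."""
    return not cdc20


def differing_inputs():
    """Return all (Cdc20, Cdh1) combinations on which the two rules disagree."""
    return [
        (cdc20, cdh1)
        for cdc20, cdh1 in product([False, True], repeat=2)
        if cycb_original(cdc20, cdh1) != cycb_truncated(cdc20, cdh1)
    ]


if __name__ == "__main__":
    for cdc20, cdh1 in differing_inputs():
        print(f"Cdc20={int(cdc20)} Cdh1={int(cdh1)}: "
              f"original={int(cycb_original(cdc20, cdh1))} "
              f"truncated={int(cycb_truncated(cdc20, cdh1))}")
```

The two rules disagree only when Cdc20 is inactive and Cdh1 is active: the draft then switches CycB on, whereas the original rule keeps it off.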
In the following, we assume that these three files have been extracted to a directory, and that cantata is in your search path.
First, we check whether the network models match these expectations, starting with the original ("true") network model. Open a shell/command prompt, change to the directory where the network files are located, and type:
cantata --validate -n cellcycle.txt -r cellcycle_rules.txt
Here, --validate tells the program to validate a model specified in parameter -n according to a list of rules specified in parameter -r.
Input network:
CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((E2F & !Rb & !Cdc20 & !(Cdh1 & UbcH10)) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = (!Cdc20 & !Cdh1)
Input network file: cellcycle.txt
Rule file: cellcycle_rules.txt
Random seed: 1315472969
Max. number of start states: 100000
Max. number of transitions: 1000
Violations of the rule set:
Rule 1: (no violations)
Rule 2: (no violations)
Finished
As expected, the true network obeys both rules, i.e. it has the two specified attractors.
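To make the notion of an attractor more tangible, the following Python sketch simulates the cell cycle network by hand. It is not CANTATA's implementation: the update functions are transcribed from the "Input network" listing, the names step and find_attractor are hypothetical, and synchronous updating (all genes recomputed at once from the previous state) is an assumption.

```python
# Gene order as printed in the network listing.
GENES = ["CycD", "Rb", "E2F", "CycE", "CycA", "p27", "Cdc20", "Cdh1", "UbcH10", "CycB"]


def step(state, truncated=False):
    """Apply one synchronous update to a 10-gene state tuple of 0/1 values.

    With truncated=True, the CycB rule of the perturbed draft (!Cdc20) is used.
    """
    CycD, Rb, E2F, CycE, CycA, p27, Cdc20, Cdh1, UbcH10, CycB = (bool(v) for v in state)
    nxt = (
        CycD,
        (not CycA and not CycB and not CycD and not CycE) or (p27 and not CycB and not CycD),
        (not Rb and not CycA and not CycB) or (p27 and not Rb and not CycB),
        E2F and not Rb,
        (E2F and not Rb and not Cdc20 and not (Cdh1 and UbcH10))
        or (CycA and not Rb and not Cdc20 and not (Cdh1 and UbcH10)),
        (not CycD and not CycE and not CycA and not CycB)
        or (p27 and not (CycE and CycA) and not CycB and not CycD),
        CycB,
        (not CycA and not CycB) or Cdc20 or (p27 and not CycB),
        (not Cdh1) or (Cdh1 and UbcH10 and (Cdc20 or CycA or CycB)),
        (not Cdc20) if truncated else (not Cdc20 and not Cdh1),
    )
    return tuple(int(v) for v in nxt)


def find_attractor(start, truncated=False):
    """Follow the trajectory from `start` until a state repeats; return the cycle."""
    seen = {}          # state -> position in the trajectory
    trajectory = []
    state = tuple(start)
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, truncated)
    return trajectory[seen[state]:]


if __name__ == "__main__":
    # State specification of the steady-state rule:
    # !CycD Rb !E2F !CycE !CycA p27 !Cdc20 Cdh1 !UbcH10 !CycB
    steady = (0, 1, 0, 0, 0, 1, 0, 1, 0, 0)
    for label, flag in (("original", False), ("truncated", True)):
        attractor = find_attractor(steady, truncated=flag)
        print(label, "attractor length:", len(attractor))
        for s in attractor:
            print(" ", " ".join(map(str, s)))
```

Under these assumptions, the specified steady state is a fixed point of the original network, whereas the same start state does not stay fixed once the Cdh1 dependency of CycB is removed.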
We now validate the perturbed network draft:
cantata --validate -n cellcycle_truncated.txt -r cellcycle_rules.txt -c 5
Input network:
CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((E2F & !Rb & !Cdc20 & !(Cdh1 & UbcH10)) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = !Cdc20
Input network file: cellcycle_truncated.txt
Rule file: cellcycle_rules.txt
Random seed: 1315484034
Max. number of start states: 100000
Max. number of transitions: 1000
Violations of the rule set:
Rule 1:
Attractor matching 1 (using alternative 1):
Attractor => Specifications
0 0 0 0 0 0 1 1 1 0 => 0 1 0 0 0 1 0 1 0 0 (State spec. 1)
0 1 1 0 0 1 0 1 1 0 => 0 1 0 0 0 1 0 1 0 0 (State spec. 1)
0 1 0 0 0 1 0 1 0 1 => 0 1 0 0 0 1 0 1 0 0 (State spec. 1)
0 0 0 0 0 0 1 0 0 1 => 0 1 0 0 0 1 0 1 0 0 (State spec. 1)
Violations caused by this matching:
State specification 1: Rb != 1
State specification 1: E2F != 0
State specification 1: p27 != 1
State specification 1: Cdc20 != 0
State specification 1: Cdh1 != 1
State specification 1: UbcH10 != 0
State specification 1: CycB != 0
Start states yielding this matching:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 1 0 0
(... further 507 states ...)
Rule 2:
Attractor matching 1 (using alternative 1):
Attractor => Specifications
1 0 0 0 0 0 1 1 1 0 => 1 0 0 0 0 0 1 1 1 0 (State spec. 4)
1 0 1 0 0 0 0 1 1 0 => 1 0 1 0 0 0 0 1 1 0 (State spec. 5)
1 0 1 1 0 0 0 1 0 1 => 1 0 1 1 0 0 0 1 0 0 (State spec. 6)
                    => 1 0 1 1 1 0 0 1 0 0 (State spec. 7)
                    => 1 0 0 1 1 0 0 0 0 0 (State spec. 1)
                    => 1 0 0 0 1 0 0 0 1 1 (State spec. 2)
1 0 0 1 1 0 1 0 0 1 => 1 0 0 0 1 0 1 0 1 1 (State spec. 3)
Violations caused by this matching:
State specification 3: CycE != 0
State specification 3: UbcH10 != 1
State specification 6: CycB != 0
Start states yielding this matching:
1 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1 0
1 0 0 0 0 0 0 0 1 1
1 0 0 0 0 0 0 1 0 0
(... further 507 states ...)
Finished
Here, the program first prints the attractors of the network and their optimal matchings with the rules. We can see that the perturbed network draft has two 4-state attractors, which are matched with the 1-state attractor and the 7-state attractor of the original network. Below each matching, the violations are listed; in this example, there are many. For each violation, CANTATA prints the number of the state specification and the violating gene. Furthermore, the start states that lead to the matching and cause the violations are printed. By specifying -c 5, we tell the program to print only the first 5 of these start states.
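The list of violating genes for Rule 1 can be reproduced by comparing each matched attractor state with its state specification. The Python sketch below does only this comparison; it is not CANTATA's matching procedure, and the helper names parse_state and violating_genes are hypothetical.

```python
# Gene order as printed in the network listing.
GENES = ["CycD", "Rb", "E2F", "CycE", "CycA", "p27", "Cdc20", "Cdh1", "UbcH10", "CycB"]


def parse_state(text):
    """Convert a printed state such as '0 1 0 0 0 1 0 1 0 0' into a tuple of ints."""
    return tuple(int(token) for token in text.split())


def violating_genes(attractor_states, spec):
    """Return the genes for which at least one matched attractor state differs from the spec."""
    return [
        gene
        for i, gene in enumerate(GENES)
        if any(state[i] != spec[i] for state in attractor_states)
    ]


if __name__ == "__main__":
    # The four attractor states that were matched to state specification 1 of Rule 1.
    attractor = [
        parse_state("0 0 0 0 0 0 1 1 1 0"),
        parse_state("0 1 1 0 0 1 0 1 1 0"),
        parse_state("0 1 0 0 0 1 0 1 0 1"),
        parse_state("0 0 0 0 0 0 1 0 0 1"),
    ]
    spec = parse_state("0 1 0 0 0 1 0 1 0 0")
    for gene in violating_genes(attractor, spec):
        print(f"State specification 1: {gene} != {spec[GENES.index(gene)]}")
```

For the Rule 1 matching, this yields the same seven genes that CANTATA reports (Rb, E2F, p27, Cdc20, Cdh1, UbcH10 and CycB).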
Now, let us try to reconstruct the true model from the disrupted network draft using the CANTATA optimization algorithm. This algorithm is started using the main option --optimize:
cantata --optimize -n cellcycle_truncated.txt -r cellcycle_rules.txt -o result.txt -ni 1000
The parameter -o result.txt tells the program to write the results to a file result.txt. We set the number of iterations to 1000 (which is the default value) using -ni 1000. When the optimization process is complete, the file result.txt contains a header summarizing the algorithm's configuration, followed by a list of candidate networks with their three objective values, e.g.
Input network file: cellcycle_truncated.txt
Rule file: cellcycle_rules.txt
Random seed: 1316096208
Population size: 100
Number of offspring: 200
Fract. of injected nets: 0.1
Neg. every i-th offspring: 50
Number of generations: 1000
Number of restarts: 1
Initial mutations: 1
Epsilon: 0.0005
Weights of topology scores: 0.25/0.25/0.5
Max. number of start states: 200
Max. number of transitions: 100
Best candidate networks:
CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((E2F & !Rb & !Cdc20 & !(Cdh1 & UbcH10)) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE | CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = (!Cdc20 & !Cdh1)
Fitness: 0 0.24914 0.098 Run: 1 Generation: 419
CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (!CycB & !CycD & p27))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((!Rb & !Cdc20 & !(Cdh1 & UbcH10) & E2F) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = (!Cdc20 & !Cdh1)
Fitness: 0 0.24914 0.108 Run: 1 Generation: 641
Finished
In the example printed here, the first fitness value of both resulting candidate network models is 0, which indicates that the networks match the rules perfectly. In this case, the second resulting network is equivalent to the true network, which means that the deleted dependency of CycB on Cdh1 was reconstructed and no further changes were applied. The first candidate also recovers this dependency, but changes an & to a | in the function for p27. Depending on the random initialization of the algorithm, you might get a different result when running the example.
The result file is not readable directly by BoolNet, as it contains multiple candidate network models and additional annotation (the header and objective scores). If the candidate networks should be analyzed in BoolNet, CANTATA can write them to separate network files. For example,
cantata --optimize -n cellcycle_truncated.txt -r cellcycle_rules.txt -o result.txt -on candidate_%d.txt -me 0
writes all candidates that match the rules perfectly to files candidate_0.txt, candidate_1.txt, candidate_2.txt, ..., i.e. the %d marker is replaced by a running number. The parameter -me sets a threshold for the first objective, i.e. only candidates whose score in the first objective is less than or equal to this value are written to files. As we set this threshold to 0 (which is the default value), only candidate network models that match the rule set perfectly are written to files.
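The selection performed by -on and -me can be mimicked on an existing result file, for example to post-process results outside of CANTATA. The Python sketch below assumes the result file layout shown in the example above (rule lines followed by a "Fitness:" line per candidate) and assumes that the running number counts only the candidates that are written; parse_candidates and select_files are hypothetical helper names, not part of CANTATA.

```python
def parse_candidates(text):
    """Split result-file text into (rules, objectives) pairs.

    Assumes each candidate is a block of 'Gene = function' lines
    terminated by a 'Fitness: a b c ...' line.
    """
    candidates, rules = [], []
    in_candidates = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Best candidate networks"):
            in_candidates = True
        elif in_candidates and line.startswith("Fitness:"):
            fields = line.split()
            objectives = tuple(float(x) for x in fields[1:4])
            candidates.append((rules, objectives))
            rules = []
        elif in_candidates and " = " in line:
            rules.append(line)
    return candidates


def select_files(candidates, pattern="candidate_%d.txt", max_error=0.0):
    """Map file names to rule lists for candidates whose first objective is <= max_error."""
    kept = [rules for rules, objectives in candidates if objectives[0] <= max_error]
    return {pattern % index: rules for index, rules in enumerate(kept)}


if __name__ == "__main__":
    with open("result.txt") as handle:
        selected = select_files(parse_candidates(handle.read()))
    for name, rules in selected.items():
        print(name, "->", len(rules), "rules")
```

Note that the rule lines in result.txt use the "Gene = function" notation of the result listing, not the "Gene, function" notation of the input network files, so the sketch only selects candidates and does not produce files that BoolNet can read.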
This tutorial covers only parts of the options available in CANTATA. A full description of the command line options and the file formats is available here.
[1] Fauré, A., Naldi, A., Chaouiya, C., and Thieffry, D. (2006). Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle. Bioinformatics, 22(14), e124-e131.
Our paper "CANTATA - prediction of missing links in Boolean networks using genetic programming" has been published in Bioinformatics.