# FVA Biomarkers Project
This project contains scripts to run Flux Variability Analysis (FVA) knockouts on a model of the human metabolic network and to create graphical representations of the results.
## Getting Started
### Prerequisites
- Python 3.6 or higher
- Cobrapy
- imgkit, xvfb and wkhtmltopdf to create the tables using create_table.py (Linux only)
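
To see which of these prerequisites are already installed, a small check like the one below can help. It is a sketch, not part of the project. It assumes `python3` is the interpreter you will run the scripts with (set `PYTHON` to use another one), and it only tests whether commands are on your `PATH` and whether Python modules can be imported. The `Xvfb` check is an addition: `xvfbwrapper` only drives the `Xvfb` binary, which on Debian/Ubuntu comes from the `xvfb` package.

```bash
# Interpreter used to run the scripts (assumption: python3; override with PYTHON=...).
PY="${PYTHON:-python3}"

# Print found/missing for a command on PATH; return 1 if it is missing.
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Print found/missing for a Python module; return 1 if it cannot be imported.
have_py_module() {
    if "$PY" -c "import $1" >/dev/null 2>&1; then
        echo "found:   python module $1"
    else
        echo "missing: python module $1"
        return 1
    fi
}

status=0

# Python 3.6 or higher
if "$PY" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)' >/dev/null 2>&1; then
    echo "found:   $PY >= 3.6"
else
    echo "missing: $PY >= 3.6"
    status=1
fi

# Cobrapy (imported as "cobra")
have_py_module cobra || status=1

# Only needed for create_table.py (Linux)
have_py_module imgkit || status=1
have_py_module xvfbwrapper || status=1
have_cmd wkhtmltopdf || status=1
have_cmd Xvfb || status=1

if [ "$status" -eq 0 ]; then
    echo "All prerequisites found."
else
    echo "Some prerequisites are missing (see above)."
fi
```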
### Installation
- **Cobrapy:**
  - If you're in a conda environment: `conda install cobra` ([link](https://anaconda.org/bioconda/cobra))
  - Otherwise, with pip: `pip install cobra` ([link](https://pypi.org/project/cobra/))
  - If you're using an IDE like PyCharm, you can install the package through the IDE interface. For example, with PyCharm:
    1. File > Settings
    2. Project > Project Interpreter
    3. Click on the + on the right
    4. Type cobra in the search bar (may take a while to load)
    5. Click on Install
- **imgkit, xvfb and wkhtmltopdf**:
  - `pip install imgkit`
  - `pip install xvfbwrapper`
  - `sudo apt install xvfb wkhtmltopdf` (requires root)
- **CPLEX:**
  - [Free license (Community version)](https://www.ibm.com/analytics/cplex-optimizer): this version limits the model size, so it can't be used for large metabolic networks such as Recon 2.2.
  - [Academic license](https://my15.digitalexperience.ibm.com/b73a5759-c6a6-4033-ab6b-d9d4f9a6d65b/dxsites/151914d1-03d2-48fe-97d9-d21166848e65/technology/data-science): you will need to create an IBM account with an academic email address.
    1. Once logged in, the site will redirect you to the previous page: scroll down to find "ILOG CPLEX Optimization Studio" and click its download button.
    2. You will be taken to a list of downloads. Scroll down until you find the file for your OS.
    3. Check the box(es) next to the download(s) you want, scroll down to the bottom of the page, check "I agree" and click on "Download Now". Your download will open with IBM Download Director (it may ask for permission: select "Yes").
    4. You will now have a file called `cplex_studio1210.win-x86-64.exe` on Windows, or `cplex_studio1210.linux-x86-64.bin` on Linux. On Windows, just double-click the .exe. On Linux, open a terminal in the folder where the .bin is located, make it executable if needed (`chmod +x cplex_studio1210.linux-x86-64.bin`), run `./cplex_studio1210.linux-x86-64.bin` and follow the instructions in the terminal.
    5. Finally, you need to add CPLEX to Python for Cobrapy to recognize it as a solver. IBM's instructions can be found [here](https://www.ibm.com/support/knowledgecenter/SSSA5P_12.7.1/ilog.odms.cplex.help/CPLEX/GettingStarted/topics/set_up/Python_setup.html), or you can follow my instructions:
       - **Windows**, if you're using PyCharm to run Interactive Python:
         - File > Settings
         - Build, Execution, Deployment > Console > Python Console
         - In Environment variables, click on the small rectangle with lines in it
         - Click on the + (on the right) to add a variable: type PYTHONPATH in the name column, and the path to your CPLEX Python installation in the second column (Ex: `C:\Program Files\IBM\ILOG\CPLEX_Studio1210\cplex\python\3.6\x64_win64`)
       - **Linux**, either:
         - Use the setup.py script located in the CPLEX Python directory: `python setup.py install` (this didn't work for me)
         - Or set the PYTHONPATH:
           - Temporarily: type `export PYTHONPATH="/opt/ibm/ILOG/CPLEX_Studio1210/cplex/python/3.7/x86-64_linux"` into the console
           - Permanently: add the same line to the end of your `~/.bash_profile` (or `~/.bashrc`, depending on which file your shell reads), either with `echo 'export PYTHONPATH="/opt/ibm/ILOG/CPLEX_Studio1210/cplex/python/3.7/x86-64_linux"' >> ~/.bash_profile` or by opening the file with your favourite text editor (Vim, Vi, Nano...) and adding the line yourself. You may need to restart your terminal.
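
For the permanent option, the sketch below appends the export line only if that exact line is not already in the file, so running it twice does not duplicate it. Two choices differ from the plain line above and are assumptions of this sketch, not requirements: it keeps any existing `PYTHONPATH` entries instead of replacing them, and `append_once` is a hypothetical helper name, not part of this project. Adjust the CPLEX version (`1210`) and Python version (`3.7`) in the path to match your installation.

```bash
# CPLEX Python directory from the Linux example above (adjust versions as needed).
CPLEX_PY_DIR="/opt/ibm/ILOG/CPLEX_Studio1210/cplex/python/3.7/x86-64_linux"

# Line to add: prepends CPLEX and keeps any existing PYTHONPATH entries.
# Produces: export PYTHONPATH="<dir>${PYTHONPATH:+:$PYTHONPATH}"
EXPORT_LINE="export PYTHONPATH=\"$CPLEX_PY_DIR\${PYTHONPATH:+:\$PYTHONPATH}\""

# Hypothetical helper: append line $2 to file $1 unless it is already there.
append_once() {
    touch "$1" || return 1
    # Nothing to do if the exact line is already present.
    grep -qxF -- "$2" "$1" && return 0
    # Make sure the file ends with a newline before appending.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
    printf '%s\n' "$2" >> "$1"
}
```

To use it, run `append_once "$HOME/.bash_profile" "$EXPORT_LINE"` (or target `~/.bashrc`), open a new terminal, and check that `python3 -c "import cplex"` runs without an `ImportError`.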
## Test files
- Model: currently, the scripts only work with Recon 2.2 and Recon 1.
  - [Recon 2.2](http://www.ebi.ac.uk/biomodels-main/MODEL1603150001) (SBML)
  - [Recon 1](http://bigg.ucsd.edu/models/RECON1) (SBML, JSON)

A clean\* JSON version of Recon 2.2 can be found in the test files provided with this project.

\*Clean: blocked reactions removed, reaction names repaired (removal of `_LPAREN_` and `_RPAREN_`), and pathways linked from the notes section to the subsystem section.
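
To check whether a model file still contains the `_LPAREN_`/`_RPAREN_` artifacts that the clean version removes, a plain text search is enough, since those strings appear literally in the SBML or JSON file. The sketch below uses a hypothetical helper, `check_model_names`, which is not part of the project; it does not check the other cleaning steps (blocked reactions, subsystem links).

```bash
# Hypothetical helper: succeed if the model file contains no _LPAREN_/_RPAREN_
# strings, fail (and say so) otherwise. Plain text search, so SBML or JSON.
check_model_names() {
    if [ ! -f "$1" ]; then
        echo "$1: file not found"
        return 2
    fi
    if grep -qE '_LPAREN_|_RPAREN_' "$1"; then
        echo "$1: still contains _LPAREN_/_RPAREN_"
        return 1
    fi
    echo "$1: no _LPAREN_/_RPAREN_ found"
}
```

For example: `check_model_names path/to/model.json`.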
## Usage examples
### knockout_multiple.py:
**Parameters:**
- `-m <input model file>`: *required*: A metabolic network model file (SBML or JSON).
- `-k <KO reactions file | List of KO reactions>`: *default=None*: Reaction IDs to knock out separately. If empty, all reactions except exchange and transport reactions are each knocked out. Can be a file: IDs to be knocked out separately go on separate lines, and IDs to be knocked out simultaneously go on the same line, separated by spaces.
- `-g`: Genes: Use this flag to knock out genes instead of reactions (supply gene IDs instead of reaction IDs with -k).
- `-x <List of exchange reactions>`: *default=None*: List of exchange reactions to measure fluxes for, separated by spaces. If empty, uses all exchange reactions from the model.
- `-f <float>`: *0 <= f <= 1, default=0*: Fraction of optimum: the fraction of the optimal objective value that must be achieved.
- `-e <int>`: *default=10*: Epsilon: The flux value to force through the reactions to knock out in the WT case.
- `-t <float>`: *default=0.1*: Threshold: Significance threshold for flux change between WT and KO conditions.
- `-u`: Unite: Use this flag to unite the forward and backward WT flux ranges to form the WT fluxes. Otherwise, it uses the forward ranges.
- `-p <int>`: *default=4*: Processors: Number of processors to use.
- `-s`: *default=','*: Separator to use in output file.
- `-o`: *required*: Path and name of the output file.
- `-b`: Progress bar: Use this flag to show the progress bar in the console.
- `-c`: Clean SBML: Use this flag to clean the SBML file: imports it using a modified cobrapy function, removes blocked reactions, links the NOTES section correctly, links the subsystem from NOTES to the subsystem in the reaction object, and fixes the `_LPAREN_`/`_RPAREN_` name errors.
- `-n`: Metabnames: Use this flag to use metabolite names instead of IDs in the output table.
- `-r <float>`: Reduce: Instead of knocking out (flux = 0), use a value (0 < r < 1) to decrease the flux through the reaction in the mutant case. For example, a value of 0.3 reduces the flux to 30% of the WT flux.
- `-recon3D`: Use this flag to indicate that you're using Recon3D (used only for importing and cleaning the model if SBML).
- `-write`: Test feature: Use this flag to write the raw data to a file.
- `-notjustexchanges`: Use this flag to signal that you're inputting non-exchange reactions to be analysed.
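
The sketch below writes a KO reactions file in the format described for `-k` (one knockout per line, simultaneous knockouts on the same line separated by spaces) and prints how each line groups reactions. `RXN_A`, `RXN_B`, `RXN_C` and `RXN_D` are placeholder IDs, not reactions from Recon; replace them with reaction IDs (or gene IDs with `-g`) from your model. The loop only illustrates the format and is not the script's own parser.

```bash
# Placeholder reaction IDs: replace them with IDs from your model.
# One line = one knockout; IDs on the same line are knocked out together.
cat > ko_reactions.txt <<'EOF'
RXN_A
RXN_B
RXN_C RXN_D
EOF

# Print each knockout set and how many reactions it contains.
while read -r line; do
    set -- $line    # split the line on whitespace into $1, $2, ...
    echo "KO set of $# reaction(s): $*"
done < ko_reactions.txt
```

This prints one knockout set of one reaction for each of the first two lines, and a set of two reactions (`RXN_C RXN_D`) for the third. The file can then be passed with `-k ko_reactions.txt`.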
**Basic test replicating Shlomi's run (using a different model version):**
```bash
python fvabiomarkers/scripts/knockout_multiple.py \
-m fvabiomarkers/test_files/fva_test/input/model/Recon2.2_reimported2.json \
-k fvabiomarkers/test_files/fva_test/input/recon2v2_shlomi_test/all_aa_ko_rxn_ids.txt \
-x "EX_his_L_e EX_ile_L_e EX_leu_L_e EX_lys_L_e EX_met_L_e EX_phe_L_e EX_thr_L_e EX_trp_L_e EX_val_L_e EX_cys_L_e EX_glu_L_e EX_tyr_L_e EX_ala_L_e EX_asp_L_e EX_gly_e EX_arg_L_e EX_gln_L_e EX_pro_L_e EX_ser_L_e EX_asn_L_e" \
-o results/recon2v2_shlomi_ko.csv
```
### create_table.py:
The output csv file from knockout_multiple.py can then be turned into a graphical representation similar to Figure 2 in Shlomi *et al.*'s article using create_table.py.

**Parameters:**
- `-i <input csv file>`: *required*: Input csv table filename from knockout_multiple.py.
- `-o <output prefix>`: *required*: Output filename prefix.
- `-s`: Use this flag to recreate Shlomi's table (with OMIM +'s and -'s).
- `-r <recon1 or recon2v2>`: *required*: Recon version.
- `-f <html or png>`: Output file format.
- `-n`: Use this flag to show the metabolite names.
- `-c`: Use this flag to use different colours for (++) and (+), and for (--) and (-).
- `-w`: Use this flag to write the data table to csv.
- `-W`: Use this flag to write the data table to csv only, without outputting a png or html file.
- `-u`: Use this flag to use `;` and `#` separators, and to write unique metabolite IDs instead of names.
**Example:**
```bash
python fvabiomarkers/scripts/create_table.py \
-i results/recon2v2_shlomi_ko.csv \
-o results/recon2v2_shlomi_ko_fig2 \
-s \
-r "recon2v2" \
-f "png" \
-n
```
### only_create_table.R:
Edit the file to set the input file paths and the output PDF heatmap path.
This script requires:
- The raw data output from knockout_multiple.py
- The table created by create_table.py with the `-w` or `-W` parameter
- pathway_names.csv: a file containing every possible pathway from the results and its corresponding category (manually extracted from KEGG). This file is provided in the test files, but may not contain all of the pathways in your results.
- compartments.csv: a file containing every possible compartment and a corresponding colour (manually chosen). This file is also included in the project test files.
- category_names_col.csv: a file containing the pathway categories and a corresponding colour (manually chosen). This file is also included in the project test files.
- rxn_compartments.csv: a file containing the compartment for each reaction. This file is also included in the project test files.
- The `pheatmap` R library.
Juliette Cooke
committed
You will then be able to run the script and it will output a pdf of the heatmap.
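
Before running the R script, a quick pre-flight check can save a failed run. The sketch below is hypothetical and not part of the project: it assumes the four helper CSV files sit in one directory (pass that directory as the argument, matching the paths you set inside only_create_table.R), and it does not check the knockout_multiple.py and create_table.py outputs, whose names you choose yourself.

```bash
# Hypothetical pre-flight check for only_create_table.R.
# Return 1, listing what is missing, if any helper CSV file named above
# is not in the given directory.
check_heatmap_inputs() {
    _missing=0
    for _f in pathway_names.csv compartments.csv category_names_col.csv rxn_compartments.csv; do
        if [ ! -f "$1/$_f" ]; then
            echo "missing: $1/$_f"
            _missing=1
        fi
    done
    return "$_missing"
}

# Return 1 if the R package cannot be loaded (or Rscript is not installed).
have_r_package() {
    Rscript -e "suppressMessages(library($1))" >/dev/null 2>&1
}
```

For example: `check_heatmap_inputs path/to/csv_dir && have_r_package pheatmap && Rscript only_create_table.R` (run from the directory containing the script). If `pheatmap` is missing, it can be installed with `Rscript -e 'install.packages("pheatmap", repos = "https://cloud.r-project.org")'`.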
## Acknowledgements
Examples based on Shlomi, Tomer, Moran N Cabili, and Eytan Ruppin. ‘Predicting Metabolic Biomarkers of Human Inborn Errors of Metabolism’. Molecular Systems Biology 5, no. 1 (January 2009): 263.
Certain sections of code are taken from [ReScience](https://github.com/ReScience-Archives/Mondeel-Ogundipe-Westerhoff-2018/tree/b5f423cb1fc62b77de99d5827f8158c182d41e74) by Thierry D.G.A. Mondeel, Vivian Ogundipe, and Hans V. Westerhoff: '[Re] Predicting metabolic biomarkers of human inborn errors of metabolism'.