Reactant/Product Pair Prediction Using FindPrimaryPairs

This tutorial will go over how to use the primarypairs function in PSAMM. This function can be used to predict reactant/product pairs in metabolic models.

Materials

For information on how to install PSAMM and the associated requirements, as well as how to download the materials required for this tutorial, you can reference the Installation and Materials section of the tutorial.

Note

Graphviz download: https://www.graphviz.org/download/

Graphviz python bindings: https://pypi.org/project/graphviz/ or (psamm-env) $ pip install graphviz

For this part of the tutorial, we will be using a modified version of the E. coli core metabolic model that has been used in the other sections of the tutorial. This model has been modified to add in a new pathway for the utilization of mannitol as a carbon source. To access this model and the other files needed you will need to go into the tutorial-part-4 folder located in the psamm-tutorial folder.

(psamm-env) $ cd <PATH>/tutorial-part-4/

Once in this folder, you should see a folder called E_coli_yaml which contains all of the files required to run the commands in this tutorial.

To run the following tutorials, go into the E_coli_yaml/ directory:

(psamm-env) $ cd E_coli_yaml/

Reactant/Product Pair Prediction using PSAMM

Metabolism can be broken down into individual metabolic reactions, which transfer elements between different metabolites. Take the following reaction as an example:

Acetate + ATP <=> Acetyl-Phosphate + ADP

This reaction is catalyzed by the enzyme Acetate Kinase, which converts acetate to acetyl-phosphate through the addition of a phosphate group from ATP. A basic understanding of phosphorylation and the biological role of ATP makes it possible to manually predict that the primary transfers of non-hydrogen elements are as follows:

Reactant/Product Pair          Element Transfer
Acetate -> Acetyl-Phosphate    carbon backbone
ATP -> ADP                     carbon backbone and phosphates
ATP -> Acetyl-Phosphate        phosphate group
Acetate -> ADP                 None

While manually inferring this for one or two simple reactions is possible, genome scale models often contain hundreds or thousands of reactions, making manual reactant/product pair prediction impractical. In addition, reaction mechanisms are often not known, and patterns of element transfer are not available for most metabolic reactions.

To address this problem the FindPrimaryPairs algorithm [Steffensen17] was developed and implemented within the PSAMM function primarypairs.

FindPrimaryPairs is an iterative algorithm that predicts element transferring reactant/product pairs in genome scale models. It relies on two sources of information that are generally available in genome scale models: reaction stoichiometry and metabolite formulas. From this information, FindPrimaryPairs can make a global prediction of element transferring reactant/product pairs without any additional information about reaction mechanisms.

Basic Use of the primarypairs Command

The primarypairs command in PSAMM can be used to perform an element transferring pair prediction using the FindPrimaryPairs algorithm. The basic command can be run as follows:

(psamm-env) $ psamm-model primarypairs --exclude @../additional_files/exclude.tsv

This function often requires a file to be provided through the --exclude option. This file is a single column list of the IDs of any reactions the user wants to remove from the model when doing the reactant/product pair prediction. The file path should be included in the command with a ‘@’ preceding it. Typically, this file should contain any artificial reactions that might be in the model, such as biomass objective reactions, macromolecule synthesis reactions, etc. While these reactions can be left in the model, their fractional stoichiometries and artificial metabolites can cause the algorithm to take much longer to find a solution. In this example of the E. coli core model the only reaction like this is the biomass reaction Biomass_Ecoli_core_w_GAM, so it is the only reaction listed in the exclude.tsv file.
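The exclude file layout described above (one reaction ID per line) can be produced by hand or with a few lines of Python. The following is a minimal sketch, not part of PSAMM: the helper name write_exclude_file is hypothetical, and the demo writes to a temporary directory rather than to the tutorial's own ../additional_files/exclude.tsv.

```python
import tempfile
from pathlib import Path


def write_exclude_file(path, reaction_ids):
    """Write one reaction ID per line (hypothetical helper, not a PSAMM API)."""
    # Drop empty entries and surrounding whitespace so the file stays single-column
    cleaned = [rid.strip() for rid in reaction_ids if rid.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned


# Demo in a temporary directory so no tutorial file is overwritten
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "exclude.tsv"
    write_exclude_file(demo_path, ["Biomass_Ecoli_core_w_GAM"])
    demo_text = demo_path.read_text()

print(demo_text, end="")
```

The resulting file would then be passed to the command with the ‘@’ prefix, as in the example above.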

Note

The FindPrimaryPairs algorithm relies on metabolite formulas to make its reactant/product pair predictions. If any reaction contains a metabolite that does not have a formula, that reaction will be ignored.

The output of the above command will look like the following:

INFO: Model: Ecoli_core_model
INFO: Model version: 3ac8db4
INFO: Using default element weights for fpp: C=1, H=0, *=0.82
INFO: Iteration 1: 79 reactions...
INFO: Iteration 2: 79 reactions...
INFO: Iteration 3: 8 reactions...
GLNS    nh4_c[c]        h_c[c]  H
FBA     fdp_c[c]        g3p_c[c]        C3H5O6P
ME2     mal_L_c[c]      nadph_c[c]      H
MANNI1PDEH      manni1p[c]      nadh_c[c]       H
PTAr    accoa_c[c]      coa_c[c]        C21H32N7O16P3S
....

Basic information about the model name and version is provided in the first few lines. In the next line, the element weights used by the FindPrimaryPairs algorithm are listed. Then, as the algorithm goes through multiple iterations, it prints the iteration number and the number of reactions whose pairing is still unresolved. A four column table is then printed that contains the following columns from left to right: reaction ID, reactant ID, product ID, and elements transferred.
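For downstream analysis it can be convenient to load the four column table into Python. The sketch below assumes the table has been captured as text, that the columns are separated by whitespace, and that compound and reaction IDs contain no spaces. The names PairRow and parse_primarypairs are hypothetical and are not part of PSAMM.

```python
from typing import NamedTuple


class PairRow(NamedTuple):
    reaction: str
    reactant: str
    product: str
    transfer: str


def parse_primarypairs(text):
    """Collect four-column rows, skipping INFO log lines and other text."""
    rows = []
    for line in text.splitlines():
        # "INFO: Model version: 3ac8db4" also splits into four fields,
        # so log lines are filtered by prefix as well as by field count
        if line.startswith("INFO:"):
            continue
        fields = line.split()
        if len(fields) != 4:
            continue
        rows.append(PairRow(*fields))
    return rows


sample = (
    "INFO: Model version: 3ac8db4\n"
    "INFO: Iteration 3: 8 reactions...\n"
    "GLNS\tnh4_c[c]\th_c[c]\tH\n"
    "FBA\tfdp_c[c]\tg3p_c[c]\tC3H5O6P\n"
    "....\n"
)
rows = parse_primarypairs(sample)
for row in rows:
    print(row.reaction, row.reactant, "->", row.product, row.transfer)
```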

From this output, the Acetate Kinase reaction from the above example can be compared to the manual prediction of the element transfer. The reaction ID for this reaction is ACKr:

ACKr    atp_c[c]        adp_c[c]        C10H12N5O10P2
ACKr    atp_c[c]        actp_c[c]       O3P
ACKr    ac_c[c] actp_c[c]       C2H3O2

From this result it can be seen that the prediction contains the same three element transferring pairs as the above manual prediction: ATP -> ADP, ATP -> Acetyl-Phosphate, and Acetate -> Acetyl-Phosphate.
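One way to sanity check a prediction like this is to confirm that the transferred formulas add up to the formulas of the compounds involved. The sketch below does this for ACKr. The compound formulas are assumptions chosen to be consistent with the transfers shown above and may differ from the formulas stored in the model files. The check is only meaningful here because every compound has a stoichiometry of 1 and all of its atoms appear in the listed pairs.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Turn 'C2H3O2' into Counter({'H': 3, 'C': 2, 'O': 2})."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


# Predicted ACKr transfers, copied from the output above
transfers = [
    ("atp_c[c]", "adp_c[c]", "C10H12N5O10P2"),
    ("atp_c[c]", "actp_c[c]", "O3P"),
    ("ac_c[c]", "actp_c[c]", "C2H3O2"),
]

# Assumed compound formulas (not read from the model)
formulas = {
    "atp_c[c]": "C10H12N5O13P3",
    "adp_c[c]": "C10H12N5O10P2",
    "ac_c[c]": "C2H3O2",
    "actp_c[c]": "C2H3O5P",
}


def totals(rows, column):
    """Sum the transferred atoms per compound in the given column (0 or 1)."""
    sums = {}
    for row in rows:
        sums.setdefault(row[column], Counter()).update(parse_formula(row[2]))
    return sums


outgoing = totals(transfers, 0)  # atoms leaving each reactant
incoming = totals(transfers, 1)  # atoms arriving at each product

balanced = {}
for compound, formula in formulas.items():
    moved = outgoing.get(compound) or incoming.get(compound)
    balanced[compound] = moved == parse_formula(formula)
    print(compound, balanced[compound])
```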

This basic usage of the primarypairs command allows for quick and accurate prediction of element transferring pairs in any of the reactions in a genome scale model. The function also has a few other options that can be used to refine and adjust how the pair prediction works.

Modifying Element Weights

The metabolite pair prediction relies on a parameter called element weight to inform the algorithm about what chemical elements should be considered more or less important when determining metabolite similarity. An example of how this might be used can be seen in the default element weights that are reported when running primarypairs.

INFO: Using default element weights for fpp: C=1, H=0, *=0.82

These element weights are the default weights used when running primarypairs with the FindPrimaryPairs algorithm. In this case, a weight of 1 is given to carbon. Because carbon forms the structural backbone of many metabolites this element is given the most weight. In contrast, hydrogen is not usually a major structural element within metabolites. This leads to a weight of 0 being given to hydrogen, meaning that it is not considered when comparing formulas between two metabolites. By default, all other elements are given an intermediate weight of 0.82.
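To build intuition for how such weights shape a formula comparison, the sketch below computes a simple weighted overlap score between two formulas. This is an illustrative stand-in and is not the scoring function that PSAMM or FindPrimaryPairs actually uses. The function names and the compound formulas are assumptions made for the example.

```python
import re
from collections import Counter

# Default weights as reported in the log line above; "*" is the fallback
DEFAULT_WEIGHTS = {"C": 1.0, "H": 0.0}
FALLBACK_WEIGHT = 0.82


def parse_formula(formula):
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def weighted_overlap(formula_a, formula_b):
    """Illustrative score: weighted shared atoms over weighted union atoms.

    NOT the PSAMM implementation; only a sketch of how weights could act.
    """
    a, b = parse_formula(formula_a), parse_formula(formula_b)
    shared = union = 0.0
    for element in set(a) | set(b):
        weight = DEFAULT_WEIGHTS.get(element, FALLBACK_WEIGHT)
        shared += weight * min(a[element], b[element])
        union += weight * max(a[element], b[element])
    return shared / union if union else 0.0


# Assumed formulas for ATP, ADP, and acetate
atp, adp, acetate = "C10H12N5O13P3", "C10H12N5O10P2", "C2H3O2"
score_atp_adp = weighted_overlap(atp, adp)
score_ac_adp = weighted_overlap(acetate, adp)
print(score_atp_adp > score_ac_adp)
```

With hydrogen weighted at 0, two formulas that differ only in hydrogen count score as identical in this sketch, which mirrors the statement above that hydrogen is not considered when comparing formulas.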

These default element weights can be adjusted using the --weights command line argument. For example, to adjust the weight of the element nitrogen while keeping the other elements the same as the default settings, you could use the following command:

(psamm-env) $ psamm-model primarypairs --weights "N=0.2,C=1,H=0,*=0.82" --exclude @../additional_files/exclude.tsv
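The weight string in this command is a comma separated list of element=value entries, with ‘*’ standing for all elements not named explicitly. The sketch below shows one way such a string could be interpreted. It is not PSAMM's own parser, and the function names parse_weights and weight_for are hypothetical.

```python
def parse_weights(spec):
    """Parse 'N=0.2,C=1,H=0,*=0.82' into a dict of floats (illustrative only)."""
    weights = {}
    for item in spec.split(","):
        element, _, value = item.partition("=")
        weights[element.strip()] = float(value)
    return weights


def weight_for(element, weights):
    """Look up an element, falling back to the '*' entry when it is not listed."""
    if element in weights:
        return weights[element]
    # Assumption: a '*' entry is always supplied, as in the example command
    return weights["*"]


weights = parse_weights("N=0.2,C=1,H=0,*=0.82")
for element in ("C", "H", "N", "O", "P"):
    print(element, weight_for(element, weights))
```

Here nitrogen takes its explicit weight of 0.2, while oxygen and phosphorus fall back to the ‘*’ weight of 0.82.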

In the case of a small model like the E. coli core model, the results of primarypairs will likely not change unless the weights are drastically altered. However, changes could be seen in larger models, especially if the models include many reactions related to non-carbon metabolism such as sulfur or nitrogen metabolism.

Report Element

By default, the primarypairs result is not filtered to show transfers of any specific element. In certain situations it might be desirable to get only the subset of results in which the reactant/product pair transfers a target element. To do this, the option --report-element can be used. For example, to report only carbon transferring reactant/product pairs, run the following on the E. coli model:

(psamm-env) $ psamm-model primarypairs --report-element C --exclude @../additional_files/exclude.tsv

If the predicted pairs are looked at for one of the mannitol pathway reactions, MANNIDEH, the following can be seen:

MANNIDEH        manni[c]        fru_c[c]        C6H12O6
MANNIDEH        nad_c[c]        nadh_c[c]       C21H26N7O14P2

If this result is compared to the results without the --report-element C option, it can be seen that there are additional transfers in this reaction, but they only involve hydrogen.

MANNIDEH        manni[c]        nadh_c[c]       H
MANNIDEH        manni[c]        h_c[c]  H
MANNIDEH        manni[c]        fru_c[c]        C6H12O6
MANNIDEH        nad_c[c]        nadh_c[c]       C21H26N7O14P2
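The same filtering can be reproduced on an unfiltered result after the fact. The sketch below keeps only rows whose transferred formula contains a given element. It is an approximation of what the --report-element option reports rather than PSAMM's implementation, and the helper name contains_element is hypothetical.

```python
import re


def contains_element(formula, element):
    """True if the element symbol occurs in the formula (e.g. 'C' in 'C6H12O6')."""
    symbols = [symbol for symbol, _ in re.findall(r"([A-Z][a-z]?)(\d*)", formula)]
    return element in symbols


# Unfiltered MANNIDEH rows, copied from the output above
rows = [
    ("MANNIDEH", "manni[c]", "nadh_c[c]", "H"),
    ("MANNIDEH", "manni[c]", "h_c[c]", "H"),
    ("MANNIDEH", "manni[c]", "fru_c[c]", "C6H12O6"),
    ("MANNIDEH", "nad_c[c]", "nadh_c[c]", "C21H26N7O14P2"),
]

carbon_rows = [row for row in rows if contains_element(row[3], "C")]
for row in carbon_rows:
    print("\t".join(row))
```

Matching on whole element symbols rather than on substrings means that a formula containing chlorine (Cl) but no carbon is not mistaken for a carbon transfer.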

Pair Prediction Methods

Two reactant/product pair prediction algorithms are implemented in the PSAMM primarypairs command. The default algorithm is the FindPrimaryPairs algorithm. The other algorithm that is implemented is the Mapmaker algorithm. These algorithms can be chosen through the --method argument.

(psamm-env) $ psamm-model primarypairs --method fpp
or
(psamm-env) $ psamm-model primarypairs --method mapmaker