ProQuant® is precise, but is it accurate as well?
Our patented DDA bottom-up proteomics protocol makes several changes to a traditional bottom-up proteomics method, particularly in how the MS and bioinformatics methods combine to quantify precursor ions. Throughout development we focussed primarily on the precision (reproducibility) of the method, which is exceptional compared with other DDA methods.
But there’s a difference between precision and accuracy. There’s a good summary on Wikipedia if you want to delve a bit deeper, but put simply…
Precision is a measure of how close repeat measurements are to each other.
Accuracy is a measure of how close measurements are to the true value.
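To make the distinction concrete, here is a minimal sketch in Python. It assumes precision is summarised as a coefficient of variation and accuracy as relative bias; these are common choices, not necessarily the metrics used for ProQuant®. The function names and the numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

def precision_cv(measurements):
    """Coefficient of variation (%): spread of the repeats relative to their mean."""
    return 100 * stdev(measurements) / mean(measurements)

def accuracy_bias(measurements, true_value):
    """Relative bias (%): how far the mean of the repeats sits from the true value."""
    return 100 * (mean(measurements) - true_value) / true_value

# Hypothetical repeat measurements of a protein whose true concentration is 100 units.
TRUE_VALUE = 100.0
precise_but_inaccurate = [120.0, 121.0, 119.0, 120.0]  # tight cluster, wrong place
accurate_but_imprecise = [80.0, 120.0, 90.0, 110.0]    # right on average, scattered

for name, data in [("precise but inaccurate", precise_but_inaccurate),
                   ("accurate but imprecise", accurate_but_imprecise)]:
    print(f"{name}: CV = {precision_cv(data):.1f}%, "
          f"bias = {accuracy_bias(data, TRUE_VALUE):+.1f}%")
```

The first series has a small CV but a large bias, the second the reverse, which is why a precise method still needs checking against an independent reference.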
By being extremely precise, our patented DDA method gives you reproducible and reliable measurements of protein and peptide abundances, PTM fractional modifications and fractional cleavage. But how close are these measurements to the true value? This is where things get difficult, because the true value of anything we measure in science is rarely known. Our approach was to compare the results from one of our non-hypothesis-driven analyses of human serum with the circulating concentrations of the same proteins, taken from the scientific literature.

It is no surprise that these two datasets are associated, since they aim to measure the same things. What the data does show, however, is the superior performance of the data obtained using our bottom-up DDA method compared with the data in two established online databases:
- PAXdb: a comprehensive absolute protein abundance database maintained by the Bioinformatics / Systems Biology group at the University of Zurich.
- PeptideAtlas: a compendium of results from MS proteomics datasets published by the Human Proteome Organization.


Details of the methods used
- A non-hypothesis-driven analysis of human serum using ProQuant® following depletion of seven abundant serum proteins. As in our standard methodology, only proteins with at least two unique peptides were included.
- The “H.sapiens – Serum, SC (Peptideatlas,jul,2021)” dataset downloaded from the PAXdb protein abundance database on 06 Dec 2022.
- The “Human Plasma 2021-07 build” dataset downloaded from the PeptideAtlas protein abundance database on 06 Dec 2022.
The concentrations of proteins in the serum proteome were collated from the scientific literature by a scientist who had no access to any of the above datasets (Thank you Becca!). One of the key sources of data was the Geigy Scientific Tables (vol 3), but where data was not available there, a brief review of the scientific literature was undertaken. Where multiple sources gave similar serum concentrations for a given protein, the median value was taken. Where there were clearly considerable discrepancies in the literature, or the data was hard to find, that protein was excluded.
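The collation rule can be sketched as follows. This is an illustration only: the article gives no numeric definition of a "considerable discrepancy", so the fold-spread threshold below is an assumption, and the function name is hypothetical.

```python
from statistics import median

# Assumption: sources "disagree considerably" when the highest and lowest
# reported concentrations differ by more than this fold-change. The value
# 3.0 is illustrative; the original collation used expert judgement.
MAX_FOLD_SPREAD = 3.0

def consensus_concentration(values, max_fold_spread=MAX_FOLD_SPREAD):
    """Return the median of the literature values, or None to exclude the protein.

    values: positive concentrations for one protein, all in the same units.
    """
    if not values:
        return None  # no data found: exclude
    if max(values) / min(values) > max_fold_spread:
        return None  # sources disagree too much: exclude
    return median(values)

# Hypothetical values for illustration.
print(consensus_concentration([1.0, 1.2, 1.5]))  # sources agree: median is used
print(consensus_concentration([1.0, 50.0]))      # 50-fold spread: excluded (None)
```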
Two groups of proteins were then excluded from the comparison. Firstly, proteins that are composed of multiple polypeptide chains, or that can be found in multiple forms comprising different polypeptide chains, were excluded. For example, proteomics data for haemoglobin are output as HBA_HUMAN and HBB_HUMAN, whereas the literature combines these into a single ‘haemoglobin’ protein concentration. Similarly, members of the complement and clotting cascades are found in multiple forms in plasma, with the predominant form often comprising only some of the regions of the protein detected by proteomics methods.
Secondly, the ProQuant® analysis of the human serum proteome was carried out following abundant protein depletion, so the depleted proteins were excluded from the analysis.
At this point the data was filtered to complete cases only: proteins for which we had a ‘predicted concentration’ from the literature and data from all three proteomics datasets. This left 137 proteins in the analysis.
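A complete-case filter of this kind can be sketched in a few lines of Python. The data layout (one dictionary per dataset, keyed by protein ID) and the function name are assumptions for illustration. Apart from HBA_HUMAN, which is mentioned above, the IDs are placeholders, and all values are made up.

```python
def complete_cases(literature, datasets, excluded=()):
    """Return the protein IDs present in the literature table and in every dataset.

    literature: dict mapping protein ID -> literature concentration
    datasets:   list of dicts mapping protein ID -> measured abundance
    excluded:   IDs removed beforehand (multi-chain or depleted proteins)
    """
    shared = set(literature) - set(excluded)
    for dataset in datasets:
        shared &= set(dataset)  # keep only IDs that this dataset also reports
    return sorted(shared)

# Hypothetical toy data.
literature = {"PROT_A": 1.0, "PROT_B": 2.0, "PROT_C": 3.0, "HBA_HUMAN": 4.0}
proquant = {"PROT_A": 10.0, "PROT_B": 20.0, "HBA_HUMAN": 40.0}
paxdb = {"PROT_A": 5.0, "PROT_B": 6.0, "PROT_C": 7.0, "HBA_HUMAN": 8.0}
peptideatlas = {"PROT_A": 9.0, "PROT_C": 11.0, "HBA_HUMAN": 12.0}

kept = complete_cases(literature, [proquant, paxdb, peptideatlas],
                      excluded={"HBA_HUMAN"})
print(kept)  # ['PROT_A']
```

Restricting every dataset to the same set of proteins means the correlations are computed on identical proteins and so can be compared directly.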
The r² values shown on the graphs are Pearson’s r², calculated on the log-transformed data.
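For readers who want to reproduce this kind of statistic on their own data, a minimal sketch follows. It assumes base-10 logarithms and strictly positive values; the article does not state which log base was used, although Pearson's r² is the same for any base. The function name and the example numbers are hypothetical.

```python
import math

def pearson_r2_log(x, y):
    """Pearson's r² between log10(x) and log10(y).

    x, y: equal-length sequences of strictly positive values, for example
    literature concentrations and measured abundances for the same proteins.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    return sxy ** 2 / (sxx * syy)

# Hypothetical values spanning several orders of magnitude.
literature_conc = [1.0, 10.0, 100.0, 1000.0]
measured = [2.0, 20.0, 200.0, 2000.0]  # exactly proportional to the literature values
print(round(pearson_r2_log(literature_conc, measured), 6))
```

Working on the log scale matters because serum protein concentrations span many orders of magnitude; on the raw scale a handful of highly abundant proteins would dominate the correlation.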