Introduction

Mass spectrometry (MS) is an analytical technique used in proteomics to study protein structure and post-translational modifications. Raw data produced by a mass spectrometer are analyzed in a multistep process that yields a list of identified entities and their quantification. The protocol followed at the Complex Carbohydrate Research Center (CCRC) for protein identification from MS data is typical of proteomics research. This high-throughput process may generate more than 500 data files from a single sample. The analysis was originally conducted manually, by transferring data across distributed systems and then invoking software tools. The scientists, who were responsible for keeping track of each result file across multiple projects, often spent long, frustrating hours searching for a previous result or trying to correlate results using handwritten notes.

We automated this analytical process end to end as a scientific workflow built from semantic Web services (Web services annotated with ontological concepts) and orchestrated with the Taverna workflow engine [http://taverna.sourceforge.net/]. Many prior efforts have automated scientific protocols, so workflow automation in itself is not novel; what is new is the support for semantic provenance. To help the scientists manage the large volumes of data using provenance information, we next developed the ProPreO proteomics provenance ontology (described in the next section). We then implemented a set of semantic provenance creation services that plug in at each intermediate step of the workflow. This infrastructure is called Semantic Provenance Annotation of Data in protEomics (SPADE).
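
The idea of plugging a provenance service in after each intermediate workflow step can be illustrated with a minimal Python sketch. It is not the SPADE implementation and uses neither Taverna nor ProPreO: the names `ProvenanceRecord`, `ProvenanceLog`, and `run_workflow`, as well as the toy steps, are hypothetical and exist only to show the pattern of recording which step produced which output from which input.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class ProvenanceRecord:
    """One provenance entry (hypothetical structure, not a ProPreO term)."""
    step: str           # name of the workflow step that ran
    input_value: Any    # data the step received
    output_value: Any   # data the step produced


@dataclass
class ProvenanceLog:
    """In-memory stand-in for a provenance creation service (assumption)."""
    records: List[ProvenanceRecord] = field(default_factory=list)

    def record(self, step: str, input_value: Any, output_value: Any) -> None:
        self.records.append(ProvenanceRecord(step, input_value, output_value))


def run_workflow(
    steps: List[Tuple[str, Callable[[Any], Any]]],
    data: Any,
    log: ProvenanceLog,
) -> Any:
    """Run the steps in order, recording provenance after each one."""
    for name, func in steps:
        result = func(data)
        log.record(name, data, result)  # provenance hook between steps
        data = result
    return data


if __name__ == "__main__":
    # Toy steps standing in for real analysis tools (placeholders only).
    toy_steps = [
        ("double", lambda x: x * 2),
        ("increment", lambda x: x + 1),
    ]
    log = ProvenanceLog()
    final = run_workflow(toy_steps, 5, log)
    for rec in log.records:
        print(rec.step, rec.input_value, rec.output_value)
    print("final:", final)
```

Because each record links a step to its input and output, a later result can be traced back through the chain of steps that produced it, which is the kind of lookup the scientists previously did from handwritten notes.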