The Samuel Roberts Noble Foundation, Inc.   The Sumner Group: MSFACTs
 

MSFACTs
Duran, A.L., Yang, J., Wang, L. and Sumner, L.W. (2003). Metabolomics Spectral Formatting, Alignment and Conversion Tools (MSFACTs). Bioinformatics 19(17): 2283-2293.

Abstract
Motivation: The amplified interest in metabolic profiling has generated the need for additional tools to assist in the rapid analysis of complex data sets.

Results: A new program, metabolomics spectral formatting, alignment and conversion tools (MSFACTs), is described here for the automated import, reformatting, alignment, and export of large chromatographic data sets to allow more rapid visualization and interrogation of metabolomic data.

MSFACTs incorporates two tools: one for the alignment of integrated chromatographic peak lists and another for extracting information from raw chromatographic ASCII formatted data files. MSFACTs is illustrated in the processing of GC/MS metabolomic data from different tissues of the model legume plant, Medicago truncatula. The results document that various tissues such as roots, stems, and leaves from the same plant can be easily differentiated based on metabolite profiles. Further, similar types of tissues within the same plant, such as the first to eleventh internodes of stems, could also be differentiated based on metabolite profiles.

Availability: Freely available upon request for academic and non-commercial use.

To access the software package (~80MB), please direct your browser to bioinfo.noble.org/download/ and complete the registration process. Upon completion an automated reply will be sent to you with another URL for downloading the software. Please feel free to direct your colleagues to this site as well; however we ask that you not distribute the software directly. This will allow us to better track usage and impact.

MSFACTs is a standard Java/Swing application that imports, aligns, and reformats spectral and chromatographic data using two tools: RTAlign and RICExtract. MSFACTs accepts and converts integrated peak lists, composed of chromatographic retention times and peak areas, using the RTAlign tool. Alternatively, raw spectral or chromatographic data exported as ASCII formatted text can be processed with the RICExtract tool. RICExtract also allows binning of ASCII formatted data from most UV, IR, and NMR instruments.

About MSFACTs

Software Agreement

Introduction

System Requirements

Installation

Start Up Menu Commands

RTAlign

RICExtract

Troubleshooting

Statistics

MSFACTs
(Metabolomics Spectral Formatting, Alignment, and Conversion Tools)

v1.0

Version date: August 21, 2002

The Samuel Roberts Noble Foundation, Inc.
2510 Sam Noble Parkway
Ardmore OK 73402

http://www.noble.org

Credit:
Jian Yang
Liangjiang Wang
Anthony L. Duran
Lloyd W. Sumner

http://www.noble.org/PlantBio/MS/index.html

The Noble Foundation Bioinformatics group is responsible for software development: lwang@noble.org

The Noble Foundation Metabolomics group is responsible for applications: lwsumner@noble.org

Send comments and recommendations to lwsumner@noble.org


MSFACTs

MSFACTs is available through a licensing agreement at no cost to academic and nonprofit organizations. MSFACTs is also available to for-profit and commercial organizations through a similar licensing agreement.


1.0 INTRODUCTION

MSFACTs v1.0 is a collection of tools created to assist researchers in converting and formatting metabolite profile data for further analysis. Version 1.0 contains two tools, RTAlign and RICExtract.


2.0 SYSTEM REQUIREMENTS

Download and install the appropriate Java 2 runtime environment (JRE) for your platform from http://www.java.sun.com. JREs are available for Windows, Linux, and Solaris. J2SE™ v1.4.0_01 is the current release. Note: MSFACTs requires version 1.4.0 or higher.


3.0 INSTALLATION - Windows

Double click the executable file.

Read the license agreement.

Use the browse button to select the destination folder. Note: We recommend the default installation folder for optimum performance and ease of troubleshooting.

Press the install button to begin installation.

Locate the folder where the program was installed. Hint: Add a shortcut to the desktop for quick access.

Double click the msfacts.bat file to run the program.

 


4.0 Start up Menu Commands

File Menu

RTAlign - Chooses the RTAlign tool setup window.

RICExtract - Chooses the RICExtract tool setup window.

Exit - Exits the program

Help Menu

Content - Contains information to assist in running the tools.

About - Contains information about the program.


5.0 RTAlign - Note: Test data, labeled RTAlign*.txt, are located on the installation CD.

RTAlign is a tool that aligns integrated peak lists, or report area counts, based upon a user-defined retention time interval.

Source Files - The source files are input data in the form of chromatogram integration reports in ASCII format and can be from many different instrument types as in the examples below. The software requires a TIC or FILENAME heading followed by the filename. The PEAK NUMBER, RETENTION TIME, and AREA COUNT data columns must be in sequential order. Any number and type of columns are allowed between these columns.

HP ChemStation Integration Report

Bruker Integration Report

Add - Input files are chosen for processing by selecting the Add button. Note: Integration reports can be in individual files (a single chromatogram integration report in one file) and/or in a large batch file (multiple chromatogram integration reports in one file).

Delete - Deletes the highlighted source files.

Output File - The reformatted output is saved as tab delimited data that is easily read by Microsoft® Excel (spreadsheet software), Infometrix Pirouette® (chemometrics modeling software), or SAS® (statistical software). Note: Be sure to select a destination directory; otherwise the file will be saved to the default installation directory.

Select - Output is saved in the default installation directory unless a directory is chosen with the Select button.

Start - Begins data processing.

Runtime Parameters - These are processing parameters that can be adjusted to optimize clustering and alignment.

Interval - Sets the window width (default 0.08 minutes) for alignment.

Hint: Try running several window widths to get a feel for how data sets will respond. This will also provide a better understanding of the collision handling technique described below.
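The effect of the Interval setting can be explored with a small sketch. The code below is not the RTAlign algorithm; it assumes, for illustration only, that windows form a fixed grid starting at zero, and the names bin_peaks and count_collisions and the sample data are hypothetical.

```python
import math
from collections import defaultdict

def bin_peaks(runs, interval=0.08):
    """Group peaks into fixed-width retention time windows.

    runs maps a run name to a list of (retention_time, area) tuples.
    Returns {window_index: {run_name: [(retention_time, area), ...]}}.
    Assumption: windows are a fixed grid of width `interval` starting at 0.
    """
    windows = defaultdict(lambda: defaultdict(list))
    for run_name, peaks in runs.items():
        for rt, area in peaks:
            index = math.floor(rt / interval)
            windows[index][run_name].append((rt, area))
    return windows

def count_collisions(windows):
    """Count (window, run) cells that hold more than one peak."""
    return sum(
        1
        for per_run in windows.values()
        for peaks in per_run.values()
        if len(peaks) > 1
    )

# Hypothetical data: two runs with retention times in minutes.
runs = {
    "run_a": [(5.05, 1200.0), (5.10, 800.0), (7.30, 450.0)],
    "run_b": [(5.06, 1100.0), (7.31, 500.0)],
}

# Try several window widths and compare the number of collisions.
for width in (0.04, 0.08, 0.16):
    print(width, count_collisions(bin_peaks(runs, width)))
```

With this sample data the two closely spaced peaks in run_a stay separate at the narrowest width and share a window at the wider ones, which is the kind of behavior the hint suggests exploring.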

Collision Handling - Provides a tactic for dealing with multiple peaks that fall within a single time window resulting in ambiguity. This type of condition occurs commonly due to co-eluting peaks, etc. The collision handling algorithm highlights these problematic areas, which are described below in the Output Control section.

Neighbor - If a collision occurs, this algorithm first attempts to push the lower retention time and area into the left (low retention time) cell. If that cell is occupied, an attempt is then made to push the higher retention time and area into the right (high retention time) cell. If a push is successful, the cell is marked with a Forced Fit marker. If both the low and high cells are already occupied, the areas remain combined and are marked with a collision cell marker.

Split - Attempts to split the collided retention time and areas into two individual cells. Values that cannot be split will be given a symbol defined by the Collision option of the Output Control.

None - No attempt is made to find individual cells for values that fall within the same window. Colliding cells are marked with the symbol defined by the Collision option of the Output Control.
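The Neighbor option described above can be sketched for a single run as follows. This is an illustration under stated assumptions, not the MSFACTs source: the function name resolve_neighbor, the markers "*" and "#", and the choice to sum combined areas are all hypothetical, and only two-peak collisions are pushed.

```python
def resolve_neighbor(cells, forced_mark="*", collision_mark="#"):
    """Apply a Neighbor-style pass to the cells of one run.

    cells maps a window index to a list of (retention_time, area) peaks.
    Returns {window_index: (area, marker)} where marker is "" for an
    ordinary cell, forced_mark for a pushed peak, or collision_mark for
    peaks that stay combined.
    """
    # A cell counts as occupied if it already holds any peak for this run.
    occupied = {index for index, peaks in cells.items() if peaks}
    resolved = {}
    for index in sorted(cells):
        peaks = sorted(cells[index])  # ascending retention time
        if not peaks:
            continue
        if len(peaks) == 1:
            resolved[index] = (peaks[0][1], "")
            continue
        low, high = peaks[0], peaks[-1]
        if len(peaks) == 2 and index - 1 not in occupied:
            # Push the lower retention time peak into the left cell.
            resolved[index - 1] = (low[1], forced_mark)
            resolved[index] = (high[1], "")
            occupied.add(index - 1)
        elif len(peaks) == 2 and index + 1 not in occupied:
            # Left cell is taken, so push the higher peak to the right.
            resolved[index + 1] = (high[1], forced_mark)
            resolved[index] = (low[1], "")
            occupied.add(index + 1)
        else:
            # Assumption: combined areas are summed in the collided cell.
            total = sum(area for _, area in peaks)
            resolved[index] = (total, collision_mark)
    return resolved

# Hypothetical cells for one run.
cells = {
    62: [(5.00, 300.0)],
    63: [(5.05, 1200.0), (5.10, 800.0)],  # both neighbors occupied
    64: [(5.15, 100.0)],
    91: [(7.30, 450.0), (7.34, 50.0)],    # left neighbor free
}
resolved = resolve_neighbor(cells)
for index in sorted(resolved):
    print(index, resolved[index])
```

In this example window 63 stays combined and receives the collision marker, while the lower peak of window 91 is pushed into window 90 and receives the forced fit marker.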

Output Control

Select Item - For choosing the type of output data to save.

Retention - Saves only the retention time to file.

Area - Saves only the peak area to file.

Both - Saves both the retention time and associated area to file.

Orientation - For orienting output data.

Horizontal - Saves data in row orientation.

Vertical - Saves data in column orientation.

Oversize - For working with large data sets.

One - Saves all output in one file.

Split/254 - Saves output in multiple files with a maximum of 254 columns each. The user provided output filename will automatically be appended with sequential numbering (e.g., Out.out becomes Out_0.out, Out_1.out, and so on). Hint: Use this split function when the data will be viewed or processed in software such as Microsoft® Excel, which has a limitation of 255 columns.
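The naming scheme for split output can be sketched as below. The helper split_output_names is hypothetical, and whether MSFACTs counts a label column toward the 254 column limit is not stated here, so this sketch simply chunks the data columns.

```python
from pathlib import Path

def split_output_names(filename, n_columns, max_columns=254):
    """Plan how n_columns data columns are spread over numbered files.

    Returns a list of (output_name, first_column, end_column) tuples,
    where end_column is exclusive.
    """
    path = Path(filename)
    chunks = []
    for part, start in enumerate(range(0, n_columns, max_columns)):
        end = min(start + max_columns, n_columns)
        # Out.out becomes Out_0.out, Out_1.out, ...
        name = f"{path.stem}_{part}{path.suffix}"
        chunks.append((name, start, end))
    return chunks

for name, start, end in split_output_names("Out.out", 600):
    print(name, start, end)
```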

Forced Fit - Annotates areas that are forced into the left (low retention time) or right (high retention time) cell when the collision handling Neighbor button is selected under Runtime Parameters.

Mark with - Allows user to choose a symbol to mark forced fit values. Note: Be aware that markers may interfere with downstream processing. Hint: Try using the marker as a delimiter in Microsoft® Excel to avoid downstream interference.

Collision - Marks areas that have been combined to fit within a single cell because their retention times fall within the same interval window.

Mark with - Allows user to choose a symbol to highlight collided values. Note: Be aware that markers may interfere with downstream processing.

Empty Field - Inserts a user specified value into fields that are absent.

Filled with - Allows user to define an empty field filler. Hint: We suggest a value that is approximately one-half the baseline noise.
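As an illustration of the Empty Field option, the sketch below writes aligned clusters as tab delimited text and inserts a filler wherever a run has no peak. The function to_tab_delimited, the one-row-per-run layout, and the filler value "25" are assumptions made for this example, not the MSFACTs output format.

```python
def to_tab_delimited(clusters, run_names, empty_fill="0"):
    """Format aligned areas as tab delimited text.

    clusters maps a cluster retention time to {run_name: area}.
    Fields with no area for a run are written as empty_fill.
    """
    times = sorted(clusters)
    header = ["run"] + [f"{t:.3f}" for t in times]
    lines = ["\t".join(header)]
    for run in run_names:
        row = [run]
        for t in times:
            area = clusters[t].get(run)
            row.append(empty_fill if area is None else f"{area:g}")
        lines.append("\t".join(row))
    return "\n".join(lines)

# Hypothetical aligned data: run_b has no peak near 7.30 minutes.
clusters = {
    5.05: {"run_a": 1200.0, "run_b": 1100.0},
    7.30: {"run_a": 450.0},
}
table = to_tab_delimited(clusters, ["run_a", "run_b"], empty_fill="25")
print(table)
```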

Cluster Cutoff - Allows user to choose whether or not to remove data that fails to meet a defined threshold.

Enabled - Data that do not meet the threshold value are deleted. For example, if the user processes 30 integrated runs with the cutoff set to 20 and only 10 runs have areas at a retention time of 15.001 minutes, then all values at 15.001 minutes are discarded. Note: The default value of zero means all values are written to file.
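The cutoff example above can be expressed as a short filter. This is a sketch only: apply_cluster_cutoff is a hypothetical name, and treating a cluster with exactly the cutoff number of runs as passing is an assumption.

```python
def apply_cluster_cutoff(clusters, cutoff=0):
    """Drop clusters reported by fewer than `cutoff` runs.

    clusters maps a cluster retention time to {run_name: area}.
    With the default cutoff of zero every cluster is kept.
    """
    return {
        rt: areas
        for rt, areas in clusters.items()
        if len(areas) >= cutoff
    }

# Hypothetical data from 30 runs: one cluster is seen in 25 runs,
# the other in only 10.
clusters = {
    12.500: {f"run_{i}": 1000.0 + i for i in range(25)},
    15.001: {f"run_{i}": 200.0 + i for i in range(10)},
}
kept = apply_cluster_cutoff(clusters, cutoff=20)
print(sorted(kept))
```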

 

Status Bar - Provides details about the files being processed.

Status - Provides information about the processed data.

Runs Processed - Displays the number of integrated chromatograms that have been successfully processed.

Peaks - Displays the number of integrated peaks that have been processed.

Clusters - Displays the number of clusters or groups created by the alignment process.


6.0 RICExtract - Note: Test data, labeled RIC*.txt, are located on the installation CD.

RICExtract is a tool that extracts reconstructed ion chromatographic (RIC) information from standard system chromatographic data files. The extracted information can be used to reconstruct chromatograms in data analysis software.

Source Files - The source files are ASCII converted chromatographic data files. This is a standard format conversion from most chromatographic system files. A commercial package such as MASSTransit® can also carry out the conversion.

Add - Input files are chosen for processing by selecting the add button. Note: Integration reports can be in individual files (a single chromatogram integration report in one file) and/or in a large batch file (multiple chromatogram integration reports in one file).

Delete - Deletes the highlighted source files.

Output File - The reformatted output is saved as tab delimited data that is easily read by Microsoft® Excel (spreadsheet software), Infometrix Pirouette® (chemometrics modeling software), or SAS® (statistical software). Note: Be sure to select a destination directory; otherwise the file will be saved to the default installation directory.

Select - Output is saved in the default installation directory unless a directory is chosen with the Select button.

Start - Begins data processing.

Status Bar - Provides details about the files being processed.

Status - Provides information about the current condition of the program and processed data.

Files Processed - Displays the number of chromatograms that have been successfully processed.

RIC Extracted - Displays the total number of data points extracted.


7.0 Troubleshooting - Check website for additional readme.html updates. http://www.noble.org/PlantBio/MS/index.html

Problem: MSFACTs fails to start, or the program crashes with specific Runtime Parameters or Output Controls.

Solution: Make sure you are using JRE 1.4.0 or higher. Change the system path so that JRE 1.4.0 precedes older JREs.


 

8.0 Statistics - MSFACTs RTAlign allows adjustment of the retention time interval used for classification; however, no single interval size can be large enough to account for variations in retention time shifts and yet small enough to achieve total separation of consecutive peaks within a single run. A collision occurs when more than one peak (Pt) from the same run (Rx) is assigned to the same cluster.

Lt < RxPt, RxPt' < Ht => collision

While there is no simple way to avoid such collisions, RTAlign provides facilities to help resolve the problems with subsequent data processing. The collision resolution is not a remedy for the limitation of the classification process, but merely an improvement in output. MSFACTs includes three approaches for resolving collisions during data alignment: Neighbor, Split, and None (see Collision Handling under RTAlign).

A major factor affecting collisions is the interval width. An appropriate interval size can be approximated through statistical analyses of representative data. For example, 267 consecutive GC/MS analyses acquired over 17 days contained approximately 32,270 data points that yielded a standard deviation (σ) in retention time of 0.0146 minutes for all peaks. Applying a confidence level of 99.7%, we can statistically assume that 99.7% of measurements would be within ±3.0σ, a total window width of 6σ or 0.0877 minutes, and set our window accordingly to 0.0877 minutes. Similarly, a 95% confidence level would be set at ±1.96σ and a 99.9% confidence level at ±3.29σ, based on user preference.
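The window arithmetic in this example can be checked with a few lines of code. The sketch assumes the window spans the full range from minus z to plus z standard deviations, so its width is 2 x z x standard deviation; the function name window_width is hypothetical. With the rounded standard deviation of 0.0146 minutes the 99.7% window comes out at about 0.0876 minutes, which differs from the quoted 0.0877 only in the last digit, presumably because the quoted figure was computed from an unrounded standard deviation.

```python
# Standard deviation in retention time (minutes) from the example.
SIGMA = 0.0146

# Two-sided z multipliers quoted in the text for each confidence level.
Z_VALUES = {"95%": 1.96, "99.7%": 3.0, "99.9%": 3.29}

def window_width(sigma, z):
    """Full window width covering the range from -z*sigma to +z*sigma."""
    return 2.0 * z * sigma

for level, z in Z_VALUES.items():
    print(f"{level}: {window_width(SIGMA, z):.4f} minutes")
```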

As the interval width increases, so does the frequency of collisions. This is illustrated in the plot below. Collision frequencies for the above-mentioned dataset were 0.14% at the 95% confidence level window setting and 0.75% at the 99.7% confidence level window setting. Most collisions that remain after this step consist of two closely adjacent peaks.

 


© 1997-2008 by The Samuel Roberts Noble Foundation, Inc.