In all expression data files, any whitespace (spaces and/or tabs) is
considered a delimiter between adjacent fields. Every line of text is
either the header line or contains all the measurements for a
particular gene. No name conversion is applied to expression data
files.
The names given in the first column of the expression data file should
match exactly the names used elsewhere (i.e. in SIF or GML files).
The first line is a header line with one of the following three formats:
<text> <text> cond1 cond2 ... cond1 cond2 ... [NumSigConds]
<text> <text> cond1 cond2 ...
<tab><tab>RATIOS<tab><tab>...LAMBDAS
The first format specifies that both expression ratios and
significance values are included in the file. The first two text
tokens contain names for each gene. The next token set specifies the
names of the experimental conditions; these columns will contain ratio
values. This list of condition names must then be duplicated exactly,
each spelled the same way and in the same order. Optionally, a final
column with the title NumSigConds may be present. If present, this
column will contain integer values indicating the number of conditions
in which each gene had a statistically significant change according to
some threshold.
The second format is similar to the first except that the duplicate
column names are omitted, and there is no NumSigConds field. This
format specifies data with ratios but no significance values.
The third format specifies an MTX header, which is a commonly used
format. Two tab characters precede the RATIOS token. This token is
followed by a number of tabs equal to the number of conditions,
followed by the LAMBDAS token. This format specifies both ratios and
significance values.
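As a rough illustration of how the three header formats could be told
apart, the following Python sketch classifies a header line. It is not
Cytoscape's own parser: the function name classify_header and the keys
of the returned dictionary are hypothetical, and detecting the MTX
header by looking for both the RATIOS and LAMBDAS tokens is an
assumption made for this example.

```python
def classify_header(line):
    """Classify an expression-file header line (illustrative sketch only)."""
    # Assumption: a header containing both tokens is the MTX-style header.
    if "RATIOS" in line and "LAMBDAS" in line:
        between = line.split("RATIOS", 1)[1].split("LAMBDAS", 1)[0]
        # The number of tabs between the two tokens equals the number of conditions.
        return {"kind": "mtx", "n_conditions": between.count("\t")}

    tokens = line.split()      # any run of spaces/tabs is one delimiter
    names = tokens[2:]         # skip the two gene-name column titles
    has_num_sig = bool(names) and names[-1] == "NumSigConds"
    if has_num_sig:
        names = names[:-1]

    half = len(names) // 2
    # First format: the condition list appears twice, identically and in order.
    if names and len(names) % 2 == 0 and names[:half] == names[half:]:
        return {"kind": "ratios+significance",
                "conditions": names[:half],
                "num_sig_conds": has_num_sig}

    # Second format: ratios only. NumSigConds is not expected here; it is
    # simply reported as found.
    return {"kind": "ratios", "conditions": names, "num_sig_conds": has_num_sig}
```

Note that this heuristic cannot distinguish a ratios-only file whose
condition list happens to repeat itself from a file in the first
format; a real parser would need some other way to resolve that case.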
Each line after the first is a data line with the following format:
FormalGeneName CommonGeneName ratio1 ratio2 ... [lambda1 lambda2 ...] [numSigConds]
The first two tokens are gene names. The names in the first column are
the keys used for node name lookup; these names should be the same as
the names used elsewhere in Cytoscape (i.e. in the SIF or GML files).
By convention in the gene expression microarray community, which
defined these file formats, the first token is the formal name of the
gene (in systems that have a formal naming scheme for genes) and the
second is a synonym for the gene commonly used by biologists.
Cytoscape does not make use of the common name column. The next
columns contain floating point values
for the ratios, followed by columns with the significance values if
specified by the header line. The final column, if specified by the
header line, should contain an integer giving the number of
significant conditions for that gene.
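The data-line layout described above can be sketched in Python as
follows. This is a minimal example under stated assumptions, not the
parser Cytoscape uses: the function name parse_data_line, its
parameters, and the returned dictionary keys are all hypothetical, and
the caller is assumed to already know from the header how many
conditions there are and which optional columns are present.

```python
def parse_data_line(line, n_conditions, has_significance, has_num_sig):
    """Split one data line into gene names, ratios and optional columns."""
    tokens = line.split()  # whitespace-delimited, as in the file format

    expected = 2 + n_conditions
    if has_significance:
        expected += n_conditions
    if has_num_sig:
        expected += 1
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} fields, found {len(tokens)}")

    pos = 2
    ratios = [float(t) for t in tokens[pos:pos + n_conditions]]
    pos += n_conditions

    significance = None
    if has_significance:
        significance = [float(t) for t in tokens[pos:pos + n_conditions]]
        pos += n_conditions

    num_sig_conds = int(tokens[pos]) if has_num_sig else None

    return {
        "formal_name": tokens[0],   # key used for node name lookup
        "common_name": tokens[1],   # not used by Cytoscape
        "ratios": ratios,
        "significance": significance,
        "num_sig_conds": num_sig_conds,
    }
```

For instance, with two conditions and both optional column groups
present, a line such as "YAL001C TFC3 0.5 -1.2 12.0 3.4 1" (values
invented for illustration) would yield two ratios, two significance
values and a count of 1.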
Missing values are not allowed and will confuse the parser. For
example, using two consecutive tabs to indicate a missing value will
not work; the parser will regard both tabs as a single delimiter and
be unable to parse the line correctly.
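The effect can be reproduced with any tokenizer that treats runs of
whitespace as a single delimiter. The short Python sketch below uses
str.split() with no argument, which behaves that way; it only mimics
the behaviour described here and is not Cytoscape's actual parser. The
helper name tokenize and the sample values are made up for the
example.

```python
def tokenize(line):
    # With no argument, str.split() collapses any run of spaces and tabs
    # into one delimiter, so an "empty" field between two tabs disappears.
    return line.split()

complete = tokenize("YAL001C\tTFC3\t0.5\t-1.2\t1.7")
missing = tokenize("YAL001C\tTFC3\t0.5\t\t1.7")  # second ratio left blank

print(len(complete), len(missing))  # prints: 5 4
```

The line with the blank field yields one token fewer than expected,
and the third ratio silently shifts into the second ratio's position.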
Optionally, the last line of the file may be a special footer line
with the following format:
NumSigGenes int1 int2 ...
This line specifies the number of genes that were significantly
differentially expressed in each condition. The first text token must
be spelled exactly as shown; the rest of the line should contain one
integer value for each experimental condition.
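A footer line of this form could be recognised and read as in the
following Python sketch. The function name parse_footer and its
n_conditions parameter are hypothetical, and returning None for a
line that is not a footer is a design choice made for this example
only.

```python
def parse_footer(line, n_conditions):
    """Return the per-condition counts, or None if the line is not a footer."""
    tokens = line.split()
    # The first token must be spelled exactly "NumSigGenes".
    if not tokens or tokens[0] != "NumSigGenes":
        return None
    counts = [int(t) for t in tokens[1:]]
    # One integer is expected for each experimental condition.
    if len(counts) != n_conditions:
        raise ValueError(
            f"expected {n_conditions} counts, found {len(counts)}"
        )
    return counts
```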