High-throughput RNAseq experiments provide quantitative readouts as count data. The read count of a gene has been found to be approximately linearly related to the abundance of its transcript. Comparing read counts between biological conditions is therefore of particular interest in RNAseq experiments. In the simplest case, the read counts of each gene are compared between the two classes to find genes with large fold changes.
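The simplest comparison described above can be sketched in a few lines of Python. This is only an illustration, not the method used by the web application: the gene names and counts are made up, the helper `log2_fold_change` is hypothetical, the pseudocount of 1 is an assumption to avoid division by zero, and differences in sequencing depth between samples are ignored.

```python
import math

def log2_fold_change(count_a, count_b, pseudocount=1.0):
    """Return log2(B / A) after adding a pseudocount to both counts.

    The pseudocount is an illustrative choice that keeps the ratio
    defined when one of the counts is zero.
    """
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))

# Hypothetical read counts for three genes in conditions A and B.
counts = {
    "gene1": (100, 400),
    "gene2": (50, 50),
    "gene3": (0, 7),
}

fold_changes = {gene: log2_fold_change(a, b) for gene, (a, b) in counts.items()}

# List genes by decreasing absolute log2 fold change.
for gene, lfc in sorted(fold_changes.items(), key=lambda kv: -abs(kv[1])):
    print(f"{gene}\t{lfc:+.2f}")
```

A ranking like this treats every gene the same regardless of how many reads support it, which is why a statistical model of the counts is needed.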
Under the assumption that reads are sampled independently from a population with fixed fractions of genes, the read counts would follow a multinomial distribution, which can be approximated by the Poisson distribution. In practice, counts from biological samples vary more than the Poisson model allows, so the negative binomial distribution is used instead. The negative binomial distribution is uniquely determined by its mean and variance. However, if the number of replicates in a data set is too small, neither parameter can be estimated reliably. Our script can be used to calculate a list of differentially expressed genes even if you do not have biological replicates in one or even both conditions. Strong conclusions should not be drawn from such an analysis, but the results may still be useful for exploration and hypothesis generation.
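Because the negative binomial distribution is determined by its mean and variance, the two can be converted into the common (size, prob) parameterization and back. The sketch below shows that conversion; the function names are hypothetical, and it assumes the convention in which the mean is size * (1 - prob) / prob. It does not show how the script estimates the mean and variance from data.

```python
def nb_params_from_moments(mean, variance):
    """Convert mean and variance to negative binomial (size, prob).

    Assumes the parameterization with mean = size * (1 - prob) / prob.
    The variance must exceed the mean, otherwise no negative binomial
    distribution with these moments exists.
    """
    if mean <= 0:
        raise ValueError("mean must be positive")
    if variance <= mean:
        raise ValueError("variance must be larger than the mean")
    size = mean ** 2 / (variance - mean)
    prob = mean / variance
    return size, prob

def nb_moments(size, prob):
    """Return (mean, variance) for the same parameterization."""
    mean = size * (1 - prob) / prob
    variance = size * (1 - prob) / prob ** 2
    return mean, variance

# Hypothetical gene with mean count 10 and variance 30.
size, prob = nb_params_from_moments(10.0, 30.0)
print(size, prob)
print(nb_moments(size, prob))
```

The requirement that the variance exceeds the mean reflects the overdispersion of count data relative to a Poisson model, where both would be equal.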
RNASeq Count Analyzer
To use our script for count analysis, preprocess the count data from each sample as follows (an example data set can be found here):
- Load count data into Excel
- Select the two columns containing the unique identifier of each gene and the count (or frequency) values.
- Make a new table from these columns.
- Further columns containing e.g. gene symbols may be added to this table.
- Save this table as a tab-delimited txt file.
- Choose your tables in the web-frontend and select the group each sample came from (A or B).
- Upload your data.
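The preprocessing steps in the list above can also be done without Excel. The following Python sketch writes one sample as a tab-delimited table with an identifier column and a count column. The function name `write_count_table`, the column names and the presence of a header row are assumptions made for illustration; check the example data set for the exact layout the upload expects.

```python
import csv
import io

def write_count_table(rows, handle, header=("gene_id", "count")):
    """Write (identifier, count) rows to `handle` as tab-delimited text.

    Pass header=None to omit the header row. Duplicate identifiers are
    rejected because each gene needs a unique identifier.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    seen = set()
    for gene_id, count in rows:
        if gene_id in seen:
            raise ValueError(f"duplicate gene identifier: {gene_id}")
        seen.add(gene_id)
        writer.writerow([gene_id, count])

# Hypothetical counts for one sample.
sample = [("geneA", 120), ("geneB", 0), ("geneC", 37)]

buffer = io.StringIO()
write_count_table(sample, buffer)
print(buffer.getvalue(), end="")
```

To produce a file for upload, pass a file handle opened with `open("sample.txt", "w", newline="")` instead of the in-memory buffer; the file name here is only a placeholder.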
After the upload, the calculation starts. Please wait until the script has finished; the result of the analysis is then shown automatically. In the result viewer you can filter the results by minimum fold change (FC, on a log2 scale), p-value (from the nbinom test, corrected by the Benjamini-Hochberg method), or minimum count in at least one sample. Finally, you can save the results by clicking the download link.
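The same kind of filtering can be reproduced on a downloaded result table. The sketch below implements the Benjamini-Hochberg correction and a filter with the three criteria named above. The field names (`log2fc`, `padj`, `counts`), the default thresholds and the use of the absolute fold change are assumptions for illustration and may differ from the columns and behaviour of the result viewer.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted

def filter_results(rows, min_abs_log2fc=1.0, max_padj=0.05, min_count=10):
    """Keep rows that pass all three (hypothetical) thresholds."""
    return [
        row for row in rows
        if abs(row["log2fc"]) >= min_abs_log2fc
        and row["padj"] <= max_padj
        and max(row["counts"]) >= min_count
    ]

# Hypothetical results for four genes.
raw_p = [0.01, 0.04, 0.03, 0.5]
padj = benjamini_hochberg(raw_p)
rows = [
    {"gene": "gene1", "log2fc": 2.0, "padj": padj[0], "counts": (100, 400)},
    {"gene": "gene2", "log2fc": 0.2, "padj": padj[1], "counts": (50, 57)},
    {"gene": "gene3", "log2fc": -1.5, "padj": padj[2], "counts": (3, 1)},
    {"gene": "gene4", "log2fc": 3.0, "padj": padj[3], "counts": (10, 80)},
]
for row in filter_results(rows, max_padj=0.1):
    print(row["gene"], row["log2fc"], round(row["padj"], 3))
```

Filtering on the adjusted rather than the raw p-value controls the expected proportion of false positives among the reported genes.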
By using the web application you agree to the processing of your data and confirm that you have taken note of the notice in the Contact / Imprint (privacy policy).
Inference about differential gene expression is based on the ratio of measured expression levels in two conditions. However, a given fold change in measured expression may have to be interpreted differently for a gene with low absolute counts than for a gene with high counts in both conditions.
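A small calculation illustrates why the same fold change carries different weight at different count levels. The sketch below assumes pure Poisson sampling noise and uses the delta-method approximation that the variance of the natural log of a count is about one over its mean. This is a simplification chosen for illustration, not the model used by the script, and the function name and counts are hypothetical.

```python
import math

def log2_ratio_with_se(count_a, count_b):
    """Return log2(B / A) and its approximate standard error.

    Assumes independent Poisson counts, for which the variance of
    ln(count) is roughly 1 / count (delta method). Both counts must
    be positive.
    """
    log2fc = math.log2(count_b / count_a)
    se = math.sqrt(1.0 / count_a + 1.0 / count_b) / math.log(2)
    return log2fc, se

# Two hypothetical genes with the same fourfold change.
low = log2_ratio_with_se(2, 8)
high = log2_ratio_with_se(200, 800)

print(f"low counts:  log2FC = {low[0]:.2f}, SE = {low[1]:.2f}")
print(f"high counts: log2FC = {high[0]:.2f}, SE = {high[1]:.2f}")
```

Both genes show the same log2 fold change, but the uncertainty of the low-count estimate is ten times larger, so the change may well be due to sampling noise alone.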
Implementation Details
This software is implemented using the following tools:
- R (version 3.3)
- PHP (version 7.0)
- DESeq (version 1.24.0 [Anders and Huber, Genome Biology 2010])
Contact
Prof. Dr. rer. nat. Hans A. Kestler (Dipl.-Ing.)
Institute of Medical Systems Biology
University of Ulm
For internal use only!