“School of Biological Sciences”
Paper IPM / Biological Sciences / 14424
Abstract:
RNA-seq technology has been widely used as an alternative approach to traditional
microarrays in transcript analysis. Gene expression profiling by sequencing, which
generates RNA-seq data sets, may sometimes yield missing read counts. These missing values can
adversely affect downstream analyses. Most of the methods for analysing the RNA-seq data
sets require a complete matrix of RNA-seq data. In the past few years, researchers have
been putting a great deal of effort into presenting evaluations of the different imputation
algorithms for microarray gene expression data sets. However, such work is limited
for RNA-seq data sets, and a comparative study investigating the performance of
missing value imputation for RNA-seq data is essential. In this paper, we propose the use
of some parametric models such as regression imputation, the Bayesian generalized linear
model, the Poisson mixture model, the EM approach, Bayesian Poisson regression, Bayesian
quasi-Poisson regression and the bootstrap versions of the latter two for single imputation of
missing values in RNA-seq count data sets. The approaches are also applied for identifying
differentially expressed genes in the presence of missing values. Multiple imputation,
proposed by Rubin (1978), is also applied to missing RNA-seq counts.
This approach allows appropriate assessment of imputation uncertainty for missing values.
The performance of the single and multiple imputations is investigated using some
simulation studies. Also, some real data sets are analyzed using the proposed approaches.
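
The abstract does not spell out how results from multiple imputations are combined. As a minimal sketch, assuming the standard pooling rules associated with Rubin's multiple imputation, the following shows how an estimate computed on each of m imputed data sets could be combined into one estimate whose variance reflects imputation uncertainty. The function name `pool_rubin` and the example numbers are hypothetical and do not come from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool m per-imputation estimates using Rubin's combining rules.

    estimates: one point estimate per imputed data set (hypothetical input).
    within_variances: the estimated variance of each of those estimates.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations with matching variances")

    pooled = mean(estimates)            # average of the m estimates
    within = mean(within_variances)     # average within-imputation variance
    between = variance(estimates)       # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Illustrative numbers only: three imputed data sets.
pooled, total = pool_rubin([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

The between-imputation term is what a single imputation cannot provide: when the imputed values disagree across data sets, the total variance grows accordingly.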