A multivariate prediction model for microarray cross-hybridization

Date
2006
Author
Chen, Yian A.
Chou, Cheng-Chung
Lu, Xinghua
Slate, Elizabeth H.
Peck, Konan
Xu, Wenying
Voit, Eberhard O.
Almeida, Jonas S.
Abstract
Background: Expression microarray analysis is one of the most popular molecular diagnostic
techniques in the post-genomic era. However, this technique faces the fundamental problem of
potential cross-hybridization. This is a pervasive problem for both oligonucleotide and cDNA
microarrays; it is considered particularly problematic for the latter. No comprehensive multivariate
predictive modeling has been performed to understand how multiple variables contribute to
(cross-) hybridization.
Results: We propose a systematic search strategy using multiple multivariate models [multiple
linear regressions, regression trees, and artificial neural network analyses (ANNs)] to select an
effective set of predictors for hybridization. We validate this approach on a set of DNA
microarrays with cytochrome P450 family genes. The performance of our multiple multivariate
models is compared with that of a recently proposed third-order polynomial regression method
that uses percent identity as the sole predictor. All multivariate models agree that the 'most
contiguous base pairs between probe and target sequences,' rather than percent identity, is the
best univariate predictor. The predictive power is improved by inclusion of additional nonlinear
effects, in particular target GC content, when regression trees or ANNs are used.
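
As an illustration of the sequence features named above, the following minimal Python sketch computes percent identity, the longest contiguous run of matching bases, and GC content for a probe/target pair. It is not the authors' implementation: the function names are hypothetical, the two sequences are assumed to be already aligned, equal in length, and gap-free, and the paper may define these features differently (for example, from a local alignment).

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def percent_identity(probe: str, target: str) -> float:
    """Percentage of positions with identical bases.

    Assumption: probe and target are pre-aligned, equal length, ungapped.
    """
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    if not probe:
        return 0.0
    matches = sum(p == t for p, t in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)


def longest_contiguous_match(probe: str, target: str) -> int:
    """Length of the longest run of consecutive identical positions.

    Assumption: same ungapped alignment as in percent_identity.
    """
    best = run = 0
    for p, t in zip(probe.upper(), target.upper()):
        run = run + 1 if p == t else 0  # a mismatch resets the current run
        best = max(best, run)
    return best


if __name__ == "__main__":
    # Made-up sequences for illustration only, not data from the study.
    probe = "ACGTACGTAC"
    target = "ACGTTCGTAC"
    print(percent_identity(probe, target))
    print(longest_contiguous_match(probe, target))
    print(gc_content(target))
```

Two pairs with the same percent identity can differ widely in their longest contiguous match, depending on where the mismatches fall, which is one way to see why the two predictors can carry different information.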
Conclusion: A systematic multivariate approach is provided to assess the importance of multiple
sequence features for hybridization and of relationships among these features. This approach can
easily be applied to larger datasets, which will allow future development of generalized hybridization
models able to correct for false-positive cross-hybridization signals in expression
experiments.