Quality assessment of preprocessed signals

by Henrik Bengtsson on 2013-06-18 16:10:37 CEST.

Data

Data file containing raw scan signals

TabularTextFile:
Name: Nelly qPCR_full data_1902 expt_new settings
Tags: 
Full name: Nelly qPCR_full data_1902 expt_new settings
Pathname: src/Nelly qPCR_full data_1902 expt_new settings.csv
File size: 821.21 kB (840922 bytes)
RAM: 0.03 MB
Number of data rows: 4608
Columns [24]: 'row', 'column', 'assayID', 'label', 'conc', 'ct', 'normInit', 'normFinal', 'normRange', 'r2', 'chi2', 'straightR2', 'efficiency', 'outlier', 'f0', 'flags', 'asymmetry', 'a', 'b', 'c', 'd', 'e', 'ampRatio', 'ctMinusMean'
Number of text lines: 4609
File checksum: 17619f4f2a208fc7f87872271c0c0c8a

Data overview

cycle thresholds:

 num [1:8, 1:2, 1:32, 1:3, 1:3] 21.8 20.8 22.1 22.4 22.4 ...
 - attr(*, "dimnames")=List of 5
  ..$ sample       : chr [1:8] "Negative ctrl" "siRNA anti-FOXA1" "siRNA anti-GATA2" "siRNA anti-MLL2" ...
  ..$ condition    : chr [1:2] "R" "V"
  ..$ assayID      : chr [1:32] "AB GapDH" "ABCC4" "ABHD2" "AR" ...
  ..$ vialReplicate: chr [1:3] "R1" "R2" "R3"
  ..$ scanReplicate: chr [1:3] "S1" "S2" "S3"

Quality assessment

Spatial representation

Spatial image of raw cycle counts.

Figure: Spatial image of artificially colored cycle thresholds (Ct).

Scan replicate

X vs Y plot. M vs A plot.

Figure: Pairwise comparison of scan replicates i and j across 8 samples and 32 genes in 2 conditions with 3 replicates each (1536 data points in total). Top panels: pairwise cycle thresholds (Ct). Bottom panels: pairwise log-ratio (M) versus log-intensity (A).
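
The report does not spell out how M and A are computed from the cycle thresholds. A minimal sketch, assuming the conventional definitions applied directly to Ct (M as the difference between two replicates, A as their average), could look as follows. The vectors `ct_i` and `ct_j` hold hypothetical toy values, not data from this experiment.

```r
# Toy cycle thresholds for two replicates (hypothetical values)
ct_i <- c(21.8, 20.8, 22.1, 22.4, NA)
ct_j <- c(21.9, 20.7, 22.1, 22.6, 30.1)

# Assumed definitions: M is the pairwise difference, A the pairwise average
ma <- data.frame(
  A = (ct_i + ct_j) / 2,
  M = ct_i - ct_j
)

# Pairs where either replicate is missing cannot be displayed
ma <- ma[complete.cases(ma), ]
print(ma)
```

Plotting `ma$M` against `ma$A` then gives one bottom panel; well-behaved replicates scatter around M = 0 without a trend in A.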

Vial replicate

X vs Y plot. M vs A plot.

Figure: Pairwise comparison of vial replicates i and j across 8 samples and 32 genes in 2 conditions with 3 replicates each (1536 data points in total). Top panels: pairwise cycle thresholds (Ct). Bottom panels: pairwise log-ratio (M) versus log-intensity (A).

Averaging replicated signals

We first average the scan replicates (S1, S2, S3) and then the vial replicates (R1, R2, R3). Averaging is done using robust PCA: small differences in offset and scale factors between scans are adjusted for first, and the median cycle threshold is then calculated, cf. aroma.light::calibrateMultiscan().
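
As a rough illustration of the median step only, the sketch below collapses the last (replicate) dimension of a toy array. It is a simplified stand-in and not what aroma.light::calibrateMultiscan() does: the offset and scale adjustment is skipped entirely, and the handling of missing values is an assumption. The helper `collapse_last_dim()` and the toy values are hypothetical.

```r
# Collapse the last dimension of an array by the median, ignoring NAs.
# A cell stays NA only if all of its replicates are NA.
collapse_last_dim <- function(x) {
  d <- dim(x)
  keep <- seq_len(length(d) - 1L)
  apply(x, MARGIN = keep, FUN = function(v) {
    if (all(is.na(v))) NA_real_ else median(v, na.rm = TRUE)
  })
}

# Toy array: 2 samples x 2 vial replicates x 3 scan replicates
ct <- array(NA_real_, dim = c(2, 2, 3), dimnames = list(
  sample        = c("Negative ctrl", "siRNA anti-FOXA1"),
  vialReplicate = c("R1", "R2"),
  scanReplicate = c("S1", "S2", "S3")
))
ct[, , "S1"] <- c(21.8, 20.8, 22.1, 22.4)
ct[, , "S2"] <- c(21.9, 20.7, 22.0, NA)
ct[, , "S3"] <- c(21.7, 20.9, 22.3, NA)

# Average over scan replicates; the result is sample x vialReplicate
avg <- collapse_last_dim(ct)
str(avg)
```

Calling `collapse_last_dim(avg)` once more would average over the vial replicates in the same way, mirroring the two-stage order described above.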

Vial replicates after averaging scan replicates

X vs Y plot. M vs A plot.

Figure: Pairwise comparison of vial replicates i and j across 8 samples and 32 genes in 2 conditions with 3 replicates each (1536 data points in total). Top panels: pairwise cycle thresholds (Ct). Bottom panels: pairwise log-ratio (M) versus log-intensity (A).

Data after averaging scan replicates:

 num [1:8, 1:2, 1:32, 1:3] 21.8 20.8 22.1 22.3 22.4 ...
 - attr(*, "dimnames")=List of 4
  ..$ sample       : chr [1:8] "Negative ctrl" "siRNA anti-FOXA1" "siRNA anti-GATA2" "siRNA anti-MLL2" ...
  ..$ condition    : chr [1:2] "R" "V"
  ..$ assayID      : chr [1:32] "AB GapDH" "ABCC4" "ABHD2" "AR" ...
  ..$ vialReplicate: chr [1:3] "R1" "R2" "R3"

Comparison to pre-calculated average signals

Data file containing signals after averaging raw scans

The SmartChip qPCR software flags outliers and calculates average scan signals, which are stored in the following file:

TabularTextFile:
Name: Nelly qPCR_replicate_1902 expt_new settings
Tags: 
Full name: Nelly qPCR_replicate_1902 expt_new settings
Pathname: src/Nelly qPCR_replicate_1902 expt_new settings.csv
File size: 117.47 kB (120289 bytes)
RAM: 0.02 MB
Number of data rows: 1536
Columns [11]: 'label', 'assayID', 'n', 'outliers', 'rejected', 'totalReps', 'ct', 'ctSD', 'efficiency', 'conc', 'flags'
Number of text lines: 1537
File checksum: 1ce6d67ae0645dd15e675f24d62ff3ef

 num [1:8, 1:2, 1:32, 1:3] 21.9 20.9 22.1 22.3 22.5 ...
 - attr(*, "dimnames")=List of 4
  ..$ sample       : chr [1:8] "Negative ctrl" "siRNA anti-FOXA1" "siRNA anti-GATA2" "siRNA anti-MLL2" ...
  ..$ condition    : chr [1:2] "R" "V"
  ..$ assayID      : chr [1:32] "AB GapDH" "ABCC4" "ABHD2" "AR" ...
  ..$ vialReplicate: chr [1:3] "R1" "R2" "R3"

X vs Y plot.

Figure: Comparison of average scan cycle thresholds (across S1, S2, S3) as calculated by “us” versus the pre-calculated ones. Correlation is 0.9994.
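
The correlation quoted in the caption can be computed along the following lines. This is a sketch under the assumption that it is a Pearson correlation over the cells that are non-missing in both data sets; `ours` and `theirs` are hypothetical toy vectors standing in for the two flattened arrays.

```r
# Toy averaged cycle thresholds (hypothetical values)
ours   <- c(21.8, 20.8, 22.1, 22.3, 22.4, NA)
theirs <- c(21.9, 20.9, 22.1, 22.3, 22.5, 30.2)

# Pearson correlation using only cells present in both vectors (assumption)
r <- cor(ours, theirs, use = "complete.obs")
print(round(r, 4))
```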

Missing values

The pre-calculated averaged data have 93 (6.05%) missing cycle thresholds, whereas our averaged data have 61 (3.97%). The two data sets have 61 missing values in common. The 32 missing values rescued in our averaged data are distributed as:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  29.98   32.49   33.06   32.90   33.46   35.56 

compared to all signals:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  19.65   25.47   27.18   27.16   28.99   35.56 
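
The missing-value counts and the summary of the rescued values reported above can be derived with a few logical masks. The following is a minimal sketch on hypothetical toy vectors (`ours`, `theirs`), not the code used to produce this report.

```r
# Toy averaged cycle thresholds (hypothetical values)
ours   <- c(21.8, NA, 22.1, 33.1, 32.5, NA)
theirs <- c(21.9, NA, 22.1, NA,   NA,   NA)

n_ours   <- sum(is.na(ours))                  # missing in our averages
n_theirs <- sum(is.na(theirs))                # missing in pre-calculated averages
n_common <- sum(is.na(ours) & is.na(theirs))  # missing in both

# "Rescued" cells: missing in the pre-calculated data but present in ours
rescued <- is.na(theirs) & !is.na(ours)

# Percentage of missing values, rounded as in the text
pct_theirs <- round(100 * mean(is.na(theirs)), 2)

# Distribution of the rescued values versus all of our values
print(summary(ours[rescued]))
print(summary(ours))
```

Applied to the figures in the text, `round(100 * 93 / 1536, 2)` gives 6.05 and `round(100 * 61 / 1536, 2)` gives 3.97, consistent with the reported percentages.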

After averaging scan and vial replicates

Data after averaging vial replicates as well:

Our averages:

 num [1:8, 1:2, 1:32] 21.8 20.8 22.1 22.2 23 ...
 - attr(*, "dimnames")=List of 3
  ..$ sample   : chr [1:8] "Negative ctrl" "siRNA anti-FOXA1" "siRNA anti-GATA2" "siRNA anti-MLL2" ...
  ..$ condition: chr [1:2] "R" "V"
  ..$ assayID  : chr [1:32] "AB GapDH" "ABCC4" "ABHD2" "AR" ...

Pre-calculated averages:

 num [1:8, 1:2, 1:32] 21.9 20.9 22.1 22.2 23 ...
 - attr(*, "dimnames")=List of 3
  ..$ sample   : chr [1:8] "Negative ctrl" "siRNA anti-FOXA1" "siRNA anti-GATA2" "siRNA anti-MLL2" ...
  ..$ condition: chr [1:2] "R" "V"
  ..$ assayID  : chr [1:32] "AB GapDH" "ABCC4" "ABHD2" "AR" ...

X vs Y plot.

Figure: Comparison of average vial cycle thresholds (across R1, R2, R3) as calculated by “us” versus the pre-calculated ones. Correlation is 0.9981.

Missing values

The pre-calculated averaged data have 22 (4.3%) missing cycle thresholds, whereas our averaged data have 16 (3.12%). The two data sets have 16 missing values in common. The 6 missing values rescued in our averaged data are distributed as:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  32.25   33.03   33.36   33.34   33.62   34.45 

compared to all signals:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  19.72   25.43   27.21   27.20   29.02   34.45 

Conclusions

The pre-calculated signal averages are very similar to the ones we calculate via robust PCA. The only disadvantage of the former is that they introduce 6 additional missing values. We choose to rescue those by averaging using our approach. However, because of their high cycle thresholds (low biological signals), those signals are unlikely to be of much value in the end. The remaining missing values cannot be rescued, because no signal was obtained for them in the first place.

Appendix

Session information

R version 3.0.1 Patched (2013-06-14 r62965)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] aroma.core_2.9.6   markdown_0.5.4     R.cache_0.6.10     plyr_1.8          
 [5] aroma.light_1.31.3 matrixStats_0.8.5  R.devices_2.3.0    R.filesets_2.0.4  
 [9] R.rsp_0.9.7        R.utils_1.24.4     R.oo_1.13.7        R.methodsS3_1.4.4 

loaded via a namespace (and not attached):
[1] digest_0.6.3 PSCBS_0.35.3 tools_3.0.1