Read sequence quality plots

Figure 1 - Base quality values summarised by position for Read1 (A) and Read2 (B). The green and solid orange lines represent the mean and median quality, respectively. Dashed orange lines represent the 25th and 75th percentiles of quality. The red line represents coverage (the proportion of reads extending to at least that position). These plots are reproduced in supplementary file 1_quality_profiles.pdf.

Error models

## 111294200 total bases in 556471 reads from 5 samples will be used for learning the error rates.
## 103420350 total bases in 459646 reads from 4 samples will be used for learning the error rates.
## Read1 error convergence:  47, 0.44, 0.025, 0.0038, 0.00027, 0
## Read2 error convergence:  50, 0.55, 0.038, 0.0061, 0.0029, 0
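The two learnErrors() messages and the convergence series can be sanity-checked with simple arithmetic. The minimal Python sketch below is not part of the original pipeline. It parses the console lines shown above and divides total bases by reads. The results are exactly 200 and 225 bases, which suggests reads were truncated to fixed lengths. It also compares each total with `nbases`, which defaults to 1e8 in dada2's learnErrors(). Because samples are added until that target is reached, this may explain why one read direction used 5 samples and the other 4. The sketch further assumes that each convergence value is the change in the estimated error rates between successive self-consistency iterations, so a final 0 indicates convergence. The helper names are hypothetical.

```python
import re

# Console lines reported above, copied without the leading "## ".
LEARN_ERRORS_LOG = [
    "111294200 total bases in 556471 reads from 5 samples will be used for learning the error rates.",
    "103420350 total bases in 459646 reads from 4 samples will be used for learning the error rates.",
]
# Convergence series from the report (assumed: change in error rates per iteration).
CONVERGENCE = {
    "Read1": [47, 0.44, 0.025, 0.0038, 0.00027, 0],
    "Read2": [50, 0.55, 0.038, 0.0061, 0.0029, 0],
}
NBASES_TARGET = 1e8  # default value of learnErrors(nbases = ...) in dada2

LOG_PATTERN = re.compile(r"(\d+) total bases in (\d+) reads from (\d+) samples")


def parse_learn_errors_line(line):
    """Return (total_bases, reads, samples) parsed from a learnErrors() message."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return tuple(int(group) for group in match.groups())


def iterations_to_converge(changes, tol=0.0):
    """1-based index of the first iteration whose change is <= tol, else None."""
    for i, delta in enumerate(changes, start=1):
        if delta <= tol:
            return i
    return None


for line in LEARN_ERRORS_LOG:
    bases, reads, samples = parse_learn_errors_line(line)
    print(f"{samples} samples: mean read length {bases / reads:.1f} bases, "
          f"nbases target reached: {bases >= NBASES_TARGET}")

for read, changes in CONVERGENCE.items():
    print(f"{read}: converged at iteration {iterations_to_converge(changes)}")
```

Both series reach 0 on their sixth value. This is within the iteration cap of learnErrors(), which is controlled by its `MAX_CONSIST` argument.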

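Figure 2 - Log10 error probability versus quality score for each type of base transition in Read1 (A) and Read2 (B). Points show observed error rates, the black line shows the fitted error model, and the red line shows the trend expected from the nominal definition of the quality score.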
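The "expected" trend in Figure 2 follows from the Phred definition of a quality score, P(error) = 10^(-Q/10), so log10 P(error) = -Q/10. Every 10 quality points therefore lowers the expected error rate tenfold. The short Python sketch below only illustrates that relationship. It does not reproduce dada2's plotting code, and details such as how the nominal rate is split across substitution types are not modelled. The function name is hypothetical.

```python
import math


def nominal_error_probability(q):
    """Phred definition: probability that a base call with quality q is wrong."""
    return 10 ** (-q / 10)


# Expected error probability and its log10 (the y-axis of Figure 2) for a few scores.
for q in (10, 20, 30, 40):
    p = nominal_error_probability(q)
    print(f"Q{q}: P(error) = {p:g}, log10 P(error) = {math.log10(p):.1f}")
```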

Amplicon sequence variant (ASV) length analysis

Figure 3 - Histogram of ASV lengths. The histogram is provided in supplementary file 5_reads_per_seqlen.pdf.

0 of 36835 ASVs were removed for being shorter than 0 or longer than 600 bases.
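The length filter keeps ASVs whose sequence length lies within the inclusive range 0 to 600 bases and discards the rest. In R this is commonly written as a column subset on `nchar(colnames(seqtab))`, as in the DADA2 tutorial. The Python sketch below mirrors that logic on a dictionary of sequence-to-count pairs. The bounds are taken from the sentence above. The toy sequences and the helper name are hypothetical.

```python
def filter_by_length(asv_counts, min_len=0, max_len=600):
    """Split {sequence: count} into (kept, removed) by length, bounds inclusive."""
    kept = {seq: n for seq, n in asv_counts.items() if min_len <= len(seq) <= max_len}
    removed = {seq: n for seq, n in asv_counts.items() if seq not in kept}
    return kept, removed


# Hypothetical toy ASVs: the 601-base sequence falls outside the 0-600 range.
asv_counts = {"ACGTACGT": 10, "A" * 601: 3, "TTGCA": 5}
kept, removed = filter_by_length(asv_counts)
print(f"{len(removed)} of {len(asv_counts)} ASVs were removed for being "
      f"outside the length range.")
```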

29584 of 36835 ASVs were identified as putatively chimeric and removed, leaving 7251 non-chimeric ASVs.
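In DADA2 workflows, chimeras commonly account for a large share of ASVs but a much smaller share of reads, because chimeric sequences tend to be low in abundance. The Python sketch below makes that contrast concrete using figures already in this report. The ASV counts are pooled across samples, while the read counts come from one sample (A001-1, Merged and NoChimera columns of the tracking table). The comparison is therefore illustrative rather than like-for-like.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


# ASV totals from the report (pooled across all samples).
ASVS_TOTAL, ASVS_CHIMERIC = 36835, 29584
# Read counts for sample A001-1 from the tracking table.
MERGED_A001_1, NOCHIM_A001_1 = 146706, 136105

asvs_kept = ASVS_TOTAL - ASVS_CHIMERIC
print(f"Non-chimeric ASVs kept: {asvs_kept}")
print(f"ASVs flagged as chimeric: {percent(ASVS_CHIMERIC, ASVS_TOTAL):.1f}%")
print(f"Reads lost to chimera removal in A001-1: "
      f"{percent(MERGED_A001_1 - NOCHIM_A001_1, MERGED_A001_1):.1f}%")
```

Roughly 80.3% of ASVs were flagged, whereas A001-1 lost about 7.2% of its merged reads.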

Tracking table of read counts per sample at each stage of the analysis

Sample Input Filtered DenoisedR1 DenoisedR2 Merged NoChimera
A001-1 245617 151980 150535 151076 146706 136105
A001-2 261367 165102 163369 164155 159042 146309
A002-1 143686 87940 87460 87641 86063 85453
A002-2 88526 54624 53946 54290 52074 51756
A003-1 159341 96825 96053 96486 94318 92912
A004-1 110850 67240 66956 67025 64893 64717
A004-2 295158 177173 175603 176293 167402 153553
A046-1 181781 112119 109238 110630 103597 89515
A046-2 184956 113073 110509 111505 105484 95749
B015-1 264369 162295 160697 161388 155894 149718
B015-2 225185 134128 133051 133495 129105 123808
B016-1 240872 153419 151392 152248 145309 128386
B016-2 317736 200209 197021 198288 184595 156334
C007-1 49382 31049 30338 30594 28643 27148
D008-1 111923 59358 57490 58167 53143 47190
D008-2 243310 137407 134062 134919 125485 106735
D010-1 230518 143817 141310 142517 133011 119077
D010-2 240750 156020 153816 154789 145279 129642
D011-1 125207 78600 77811 78069 75711 73476
D011-2 259237 152757 149542 151103 142104 132907
D012-1 209680 128346 124512 126131 113280 102929
D012-2 189598 119607 117671 118343 110734 103891
D018-1 243025 151461 150597 150845 145409 133595
D018-2 274818 163443 162032 162396 152300 130040
E013-1 179601 110316 106814 108469 97979 85915
E013-2 64290 39180 38616 38603 36740 36068
E014-1 176920 103323 100944 102018 95921 89943
E014-2 66745 43205 42317 42589 40474 39309
F017-2 82760 48442 47127 47623 44727 42763
F024-1 246471 158846 155982 157012 149377 133868
F024-2 207399 127617 124167 125308 116673 101892
G021-1 251268 162778 159541 160671 150716 130660
G021-2 191024 119075 116642 117756 111290 101468
G022-1 250753 149921 147129 148303 141386 124914
G022-2 193428 113591 111450 112154 107239 96486
G025-1 158840 96751 94862 95675 89472 81015
G025-2 189929 121341 119581 120460 115378 108466
H026-1 221197 136921 135976 136215 130779 118407
H026-2 264901 167227 166131 166373 159388 134786
I029-1 228039 141249 139804 140532 136294 129762
I030-1 192717 118940 117347 118078 112314 108335
I031-1 129690 78518 78058 78141 75411 75104
I031-2 300876 188610 184511 186548 173418 144685
I032-1 127471 76336 74488 75248 70478 67662
I033-1 312087 193144 189305 191115 178055 160202
I033-2 254404 158340 155886 157292 149125 131498
I034-1 239829 144659 142235 143097 132683 116118
J036-1 146798 85683 83400 84517 79159 76322
J037-1 178776 108624 106556 107555 101932 99024
J038-1 207682 128120 126401 127182 122620 117361
J039-1 86261 55138 53387 54185 49552 45884
J040-1 153073 97379 95299 96362 90358 86709
J040-2 242836 152338 149401 150836 142032 134337
K043-1 244709 153289 151805 152427 145794 136308
M049-1 214579 136628 133805 134971 127906 113029
M049-2 284329 178271 174876 175891 168316 148692
M050-1 215724 131588 129212 130243 121200 106041
N052-1 199480 120894 118454 119404 109586 95256
N052-2 257783 159851 158715 159284 154204 151017
N053-1 50772 31803 31499 31571 30923 30542
N053-2 80362 50491 49305 49812 46355 43354
N054-1 192490 119667 118564 119130 115726 113016
N055-1 216655 135400 133296 134241 124477 114004
N055-2 279564 177989 175436 176580 165903 151225
N056-1 184723 115633 112905 113977 107975 95332
N056-2 245655 160874 157973 159353 152402 137301
N058-1 308145 188990 184784 187276 170974 153381
N058-2 225890 140075 139016 139533 135670 129676
N059-1 256408 159763 156054 157885 148119 132854
povestrainmix 230916 146960 146690 146525 141420 128410

This table is provided as supplementary file 7_tracking_table.tsv.
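A common use of the tracking table is to compute, per sample, the percentage of input reads that survive to the final NoChimera stage. The minimal Python sketch below assumes whitespace-separated columns named as in the table above. It tolerates a header that omits the sample-name column, as R's write.table() does when row names are written with default settings. The parser and helper names are hypothetical, and the two example rows are copied from the table.

```python
def read_tracking_table(text):
    """Parse a read-tracking table into {sample: {stage: count}}."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if header[0].lower() == "sample":
        header = header[1:]  # drop an explicit label for the sample-name column
    table = {}
    for fields in rows:
        sample, counts = fields[0], [int(x) for x in fields[1:]]
        if len(counts) != len(header):
            raise ValueError(f"column count mismatch for sample {sample}")
        table[sample] = dict(zip(header, counts))
    return table


def percent_retained(table, stage="NoChimera", reference="Input"):
    """Per-sample percentage of `reference` reads that remain at `stage`."""
    return {s: 100 * c[stage] / c[reference] for s, c in table.items()}


example = """\
Input Filtered DenoisedR1 DenoisedR2 Merged NoChimera
A001-1 245617 151980 150535 151076 146706 136105
C007-1 49382 31049 30338 30594 28643 27148
"""
tracking = read_tracking_table(example)
for sample, pct in percent_retained(tracking).items():
    print(f"{sample}\t{pct:.1f}% of input reads retained")
```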

The non-chimeric ASV counts table is provided in supplementary file 9_asvs.txt, and the accompanying taxon assignments are in 9_taxa.txt. A transposed ASV counts table (ASVs in rows, samples in columns) is provided in 6_seqtab_nochim_t.txt.
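The two orientations hold the same counts, and summing one sample's counts across all ASVs gives that sample's total non-chimeric reads. If no filtering was applied after chimera removal (the length filter above removed no ASVs), those totals would be expected to match the NoChimera column of the tracking table. That expectation is an assumption and was not verified here. The Python sketch below uses nested dictionaries with hypothetical ASV and sample identifiers. It does not parse the actual supplementary file formats, which are not described in this report.

```python
def transpose(table):
    """Swap orientation of a nested-dict table: {row: {col: v}} -> {col: {row: v}}."""
    flipped = {}
    for row, cols in table.items():
        for col, value in cols.items():
            flipped.setdefault(col, {})[row] = value
    return flipped


def reads_per_sample(asv_by_sample):
    """Column sums of an ASV x sample table, i.e. total ASV reads per sample."""
    return {sample: sum(counts.values())
            for sample, counts in transpose(asv_by_sample).items()}


# Hypothetical toy table with ASVs in rows and samples in columns; counts are made up.
asv_by_sample = {
    "ASV1": {"S1": 120, "S2": 0},
    "ASV2": {"S1": 30, "S2": 75},
    "ASV3": {"S1": 0, "S2": 5},
}
sample_by_asv = transpose(asv_by_sample)  # samples in rows, ASVs in columns
print(sample_by_asv["S2"])               # {'ASV1': 0, 'ASV2': 75, 'ASV3': 5}
print(reads_per_sample(asv_by_sample))   # {'S1': 150, 'S2': 80}
```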

The ASV counts table for the positive control samples is provided in 8_mock_asvs_raw.txt, and the accompanying taxon assignments are in 8_mock_taxa_raw.txt.

Library versions

## R version 4.5.0 (2025-04-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
## [1] C
## 
## time zone: Europe/London
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggpubr_0.6.0    lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1  
##  [5] dplyr_1.1.4     purrr_1.0.4     readr_2.1.5     tidyr_1.3.1    
##  [9] tibble_3.3.0    ggplot2_3.5.2   tidyverse_2.0.0 dada2_1.36.0   
## [13] Rcpp_1.0.14    
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1            farver_2.1.2               
##  [3] Biostrings_2.76.0           bitops_1.0-9               
##  [5] fastmap_1.2.0               GenomicAlignments_1.44.0   
##  [7] digest_0.6.37               timechange_0.3.0           
##  [9] lifecycle_1.0.4             pwalign_1.4.0              
## [11] magrittr_2.0.3              compiler_4.5.0             
## [13] rlang_1.1.6                 sass_0.4.10                
## [15] tools_4.5.0                 yaml_2.3.10                
## [17] knitr_1.50                  ggsignif_0.6.4             
## [19] labeling_0.4.3              S4Arrays_1.8.1             
## [21] interp_1.1-6                DelayedArray_0.34.1        
## [23] plyr_1.8.9                  RColorBrewer_1.1-3         
## [25] abind_1.4-8                 ShortRead_1.66.0           
## [27] BiocParallel_1.42.1         withr_3.0.2                
## [29] hwriter_1.3.2.1             BiocGenerics_0.54.0        
## [31] grid_4.5.0                  stats4_4.5.0               
## [33] latticeExtra_0.6-30         colorspace_2.1-1           
## [35] scales_1.4.0                dichromat_2.0-0.1          
## [37] SummarizedExperiment_1.38.1 cli_3.6.5                  
## [39] rmarkdown_2.29              crayon_1.5.3               
## [41] generics_0.1.4              RcppParallel_5.1.7         
## [43] httr_1.4.7                  reshape2_1.4.4             
## [45] tzdb_0.5.0                  cachem_1.1.0               
## [47] parallel_4.5.0              XVector_0.48.0             
## [49] matrixStats_1.5.0           vctrs_0.6.5                
## [51] Matrix_1.7-3                carData_3.0-5              
## [53] jsonlite_2.0.0              car_3.1-2                  
## [55] IRanges_2.42.0              hms_1.1.3                  
## [57] S4Vectors_0.46.0            rstatix_0.7.2              
## [59] jpeg_0.1-10                 jquerylib_0.1.4            
## [61] glue_1.8.0                  codetools_0.2-20           
## [63] cowplot_1.1.3               stringi_1.8.7              
## [65] gtable_0.3.6                GenomeInfoDb_1.44.0        
## [67] deldir_2.0-4                GenomicRanges_1.60.0       
## [69] UCSC.utils_1.4.0            pillar_1.10.2              
## [71] htmltools_0.5.8.1           GenomeInfoDbData_1.2.14    
## [73] R6_2.6.1                    evaluate_1.0.3             
## [75] lattice_0.22-5              Biobase_2.68.0             
## [77] backports_1.5.0             png_0.1-8                  
## [79] Rsamtools_2.24.0            broom_1.0.5                
## [81] bslib_0.9.0                 SparseArray_1.8.0          
## [83] xfun_0.52                   MatrixGenerics_1.20.0      
## [85] pkgconfig_2.0.3