Notes
| Methodology:
Genomic DNA extracted from pooled libraries was processed for Tn-seq (see manuscript), and sequencing was performed on an Illumina MiSeq system with 150 cycles. Briefly, filtered and trimmed reads were mapped against the HB27 assembly GCA_000008125 using Bowtie2 2.5.1 (with the --very-sensitive option). We considered only Tn-seq insertions derived from reads mapping within the 10-90% interval of each gene, to avoid issues with truncated or chimeric proteins. Tn insertion counts were derived from read mapping coordinates and normalized by the total number of mapped reads in coding regions. We then applied a log2 transformation with pseudocounts and scaled the data with the R function scale() to obtain a Z-score per gene, which facilitates comparison between samples. To analyze the gene scores, we compared the PAM, k-means and DBSCAN clustering methods using the R packages cluster, factoextra and dbscan, respectively. The best result, judged by the highest average silhouette width, was obtained with k-means clustering into two main groups. The larger group, which had the lower insertion rate, was then split in two by shortlisting the 20% of its genes with the lowest Z-scores, yielding three groups: Less Permissive, Intermediate and Highly Permissive genes. Finally, gene insertion rates were analyzed in the context of gene conservation, available data on essential genes in other prokaryotic organisms, and Thermus thermophilus transcription data. |
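The per-gene scoring steps described above (10-90% window filter, normalization, log2 transformation, Z-score, and the 20% split) can be illustrated with a minimal sketch. This is not the authors' code: the original analysis was done in R, and this sketch uses plain Python only for illustration. The function names, the per-million scaling and the pseudocount of 1 are assumptions made for the example; the methodology does not state these values. The k-means step is not reproduced here, so the low-insertion group is taken as a given input.

```python
import math
import statistics


def in_core_window(pos, start, end, lo=0.10, hi=0.90):
    """Return True if coordinate `pos` lies within the central 10-90% of a gene.

    `start` and `end` are 1-based inclusive gene coordinates (assumed convention).
    """
    length = end - start + 1
    frac = (pos - start) / length
    return lo <= frac <= hi


def gene_z_scores(counts, total_coding_reads, pseudocount=1.0, per=1e6):
    """Convert per-gene insertion counts into Z-scores.

    counts: dict mapping gene -> insertion count inside the core window.
    total_coding_reads: total mapped reads in coding regions (normalization factor).
    pseudocount and per: hypothetical values, chosen only so log2(0) is avoided.
    """
    logged = {
        gene: math.log2(count / total_coding_reads * per + pseudocount)
        for gene, count in counts.items()
    }
    values = list(logged.values())
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1), the same convention as R's scale().
    sd = statistics.stdev(values)
    return {gene: (value - mean) / sd for gene, value in logged.items()}


def split_lowest_fraction(z_scores, genes, fraction=0.20):
    """Split `genes` into the lowest `fraction` by Z-score and the remainder.

    The number of shortlisted genes is rounded down.
    """
    ranked = sorted(genes, key=lambda gene: z_scores[gene])
    n_low = int(len(ranked) * fraction)
    return ranked[:n_low], ranked[n_low:]
```

In this sketch, `split_lowest_fraction` would be called on the genes of the low-insertion cluster only: the first list it returns corresponds to the Less Permissive group and the second to the Intermediate group, while the genes of the other cluster form the Highly Permissive group.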