What is BACTOME?

BACTOME is an interactive bacterial genome database providing detailed information about a large collection of clinical Pseudomonas aeruginosa isolates. This includes gene sequence information (DNA sequencing, raw data available at NCBI), transcriptome profiles (RNA sequencing, raw data available at NCBI) as well as phenotypic data.

Using BACTOME you can:

BACTOME was published in the Database Issue of Nucleic Acids Research and should be cited as follows (1):

Hornischer, K., Khaledi, A., Pohl, S., Schniederjans, M., Pezoldt, L., Casilag, F., … Häussler, S. (2018). BACTOME — a reference database to explore the sequence- and gene expression-variation landscape of Pseudomonas aeruginosa clinical isolates. Nucleic Acids Research, 1–5. http://doi.org/10.1093/nar/gky895

Phenotype Section


The phenotype section comprises recorded hospital-derived data for each clinical isolate, such as the geographic origin or the infection site origin. Additionally, four clinically relevant phenotypic categories were screened and recorded for all isolates: antibiotic susceptibility, colony morphology, biofilm formation and virulence.

Antibiotic susceptibility

The minimal inhibitory concentration (MIC) values were determined by agar dilution, at least in triplicate. Breakpoints for antibiotic resistance and susceptibility were defined according to CLSI (Clinical and Laboratory Standards Institute) guidelines as follows (2):

Antibiotic       S      I      R
Ceftazidime      ≤8     16     ≥32
Ciprofloxacin    ≤1     2      ≥4
Colistin         ≤2     4      ≥8
Meropenem        ≤2     4      ≥8
Tobramycin       ≤4     8      ≥16

S = susceptible, I = intermediate, R = resistant; MIC values in µg/mL.
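
The breakpoints above can be applied to a measured MIC value with a few lines of code. The following Python sketch is illustrative only and is not part of BACTOME; the names BREAKPOINTS and classify_mic are hypothetical, and the MIC is assumed to be given in the same units as the table.

```python
# Illustrative only: BREAKPOINTS and classify_mic are hypothetical names,
# not part of BACTOME. Values are copied from the breakpoint table.
BREAKPOINTS = {
    # antibiotic: (susceptible if MIC <= first value, resistant if MIC >= second value)
    "Ceftazidime": (8, 32),
    "Ciprofloxacin": (1, 4),
    "Colistin": (2, 8),
    "Meropenem": (2, 8),
    "Tobramycin": (4, 16),
}


def classify_mic(antibiotic: str, mic: float) -> str:
    """Return 'S', 'I' or 'R' for a MIC value of the given antibiotic."""
    s_max, r_min = BREAKPOINTS[antibiotic]
    if mic <= s_max:
        return "S"
    if mic >= r_min:
        return "R"
    # Anything strictly between the two breakpoints is intermediate.
    return "I"
```

For example, classify_mic("Ceftazidime", 16) falls between the two breakpoints and is therefore reported as intermediate.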

Colony morphology

Colony morphology was assayed on Columbia Blood Agar plates after 24h incubation at 37°C. Five clearly distinguishable parameters were recorded and categorized:

Colony size ranging from 0 (very small (SCV)) to 3 (large)
Spreading of the colony/colony margin ranging from 0 (none) to 2 (large)
Surface appearance ranging from 0 (rough surface) through 1 (smooth and shiny) to 2 (clearly mucoid)
Iridescent metallic sheen ranging from 0 (none) to 2 (clearly visible throughout the bacterial lawn)
Hemolytic activity ranging from 0 (none) to 2 (enhanced)

Biofilm formation

Biofilms were stained with a live/dead stain (Syto9 and propidium iodide) and imaged after 48h of incubation at 37°C using an automated confocal laser scanning microscope. For a number of isolates, biofilm categories were assigned manually by visual inspection of the images according to specific structural features.

Virulence

For virulence evaluation, Galleria mellonella larvae were infected with bacterial cells and incubated at 37°C. Larval death was assayed after 24h and 48h by the lack of movement and melanization of the cuticle. Categorization was done ranging from 1 (avirulent, survival rate at least 75%) to 4 (virulent, survival rate below 25%).

Phenotype Distribution

The phenotype distribution section places the phenotypic information of a single isolate in the context of all isolates. Start by selecting either a particular phenotype or a clinical isolate (by typing in the isolate ID or clicking on it). Both options lead to a depiction of the variation of each phenotype among the clinical isolates. Click on the different tabs to switch between phenotypes. If you started by defining a particular isolate, your isolate will be highlighted within the distribution. Alternatively, you can highlight a particular isolate afterwards by entering its isolate ID (e.g. CH2591).

The virulence display additionally highlights, on mouse-over, the two corresponding data points from the 24h and 48h time points of each isolate. Click on them to obtain more detailed information on the phenotype characteristics of the chosen isolate (this section can also be accessed through the ISOLATE SELECTION or by clicking the ID of the already selected isolate in the top left corner).

Isolate Selection

The isolate selection section shows all phenotypic characteristics for one particular clinical isolate in detail or filters for a group of isolates with similar categorization in one or more phenotypes. Select a particular isolate by inserting its ID (e.g. CH2707) or clicking on it in the zoom-able, interactive phylogenetic tree.

Click on the different tabs to switch between phenotypes. You can subsequently switch isolates by entering a new isolate ID. Alternatively, select one specific phenotypic characteristic, or a combination of these, to retrieve lists of isolates with similar phenotypic properties. The lists can be downloaded and used for further genetic analysis, e.g. via the group comparison tools.

Genome Section

BACTOME comprises genomic data of all clinical isolates. Sequence information was created by DNA sequencing on an Illumina HiSeq or MiSeq platform.

Pangenome

The pangenome refers to the gene repertoire of a particular species in a dataset of “n” genomes. Based on a set of orthologous genes, the gene repertoire can be categorized as core (conserved genes present in all “n” genomes), accessory (mobile genes present in more than one but fewer than “n” genomes) and singletons (strain-specific genes present in a single genome). This Pseudomonas aeruginosa pangenome was created based on 101 genomes (99 clinical isolates plus two reference genomes: PAO1 and UCBPP-PA14).
Get more information on the abundance of your gene of interest within the species by inserting a UCBPP-PA14 or PAO1 gene name or gene ID (e.g. ampC).
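
The three pangenome categories follow directly from the number of genomes in which a gene is found. The Python sketch below illustrates this rule under simple assumptions; classify_pangenome and the toy gene and genome names are hypothetical, and the orthology detection that BACTOME performs beforehand is not shown.

```python
# Illustrative only: classify_pangenome and the input layout are assumptions.
# `presence` maps each orthologous gene to the set of genomes that carry it.
def classify_pangenome(presence: dict[str, set[str]], n_genomes: int) -> dict[str, str]:
    """Label every gene as 'core', 'accessory' or 'singleton'."""
    categories = {}
    for gene, genomes in presence.items():
        count = len(genomes)
        if count < 1 or count > n_genomes:
            raise ValueError(f"{gene}: found in {count} of {n_genomes} genomes")
        if count == n_genomes:
            categories[gene] = "core"        # present in all n genomes
        elif count == 1:
            categories[gene] = "singleton"   # strain-specific
        else:
            categories[gene] = "accessory"   # 1 < count < n
    return categories


# Toy input with three genomes (names are placeholders, not real data).
toy = {
    "geneA": {"PAO1", "UCBPP-PA14", "isolate1"},
    "geneB": {"PAO1", "isolate1"},
    "geneC": {"isolate1"},
}
print(classify_pangenome(toy, n_genomes=3))
```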

Consensus Sequence

The consensus sequence of core genes was created based on a position-wise sequence alignment from which the most frequently occurring nucleotide in each position was taken. Insert a UCBPP-PA14 gene ID or gene name (e.g. ampC) for the consensus sequence of your gene of interest and compare sequence similarities and differences between the clinical isolates and the two reference strains. Note: Currently only core genes which are identical in length in all of the genomes are used.
At positions where variations occur, the respective nucleotide is shown on top of the consensus nucleotide. Overall diversity at a specific position is indicated by grey shaded squares with shading corresponding to the extent of diversification. The consensus sequence and the individual sequences of the clinical isolates can be scrolled, downloaded or text-copied from the depicted SVG.
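
The position-wise majority rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BACTOME's implementation: the function names are hypothetical, the input sequences are assumed to be already aligned and of equal length (as noted for the core genes used here), and the way ties are broken (first nucleotide encountered) is an arbitrary choice.

```python
from collections import Counter


def consensus(sequences: list[str]) -> str:
    """Most frequent nucleotide at each position of equal-length sequences."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one column (one alignment position) at a time.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


def variable_positions(sequences: list[str]) -> list[int]:
    """0-based positions at which more than one nucleotide is observed."""
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]


# Toy sequences, not real data.
toy = ["ACGT", "ACGA", "TCGA"]
print(consensus(toy))            # ACGA
print(variable_positions(toy))   # [0, 3]
```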

SNP extraction

This tool extracts information on mutations (single nucleotide polymorphisms (SNPs), insertions and deletions) for any genes and isolates of interest. The mutation information is derived from mapping the sequencing reads against the UCBPP-PA14 genome as a reference. Insert a list of genes, select your isolates and define the analysis options before processing. You may adjust the SNP quality score, the minimal read coverage, as well as the extent of gene flanking regions included in the SNP detection.
Example: retrieve mutations in ampC, parC and gyrA in the clinical isolates (e.g. CH2582, CH2591, CH2597, CH2657, CH2658, CH2675, CH2677).

Further specifications such as SNP location or effect can be defined by additional filtering options. SNP scores are derived from SAMtools and can range between 1 and 222 (with 222 representing the highest confidence). SNPs which lead to amino acid exchanges or could have an effect on predicted RNA folding structures or transcription factor binding sites, as well as insertions and deletions, are marked in red in the output table.

SNP Comparison

This comparison tool uses Fisher’s exact test as implemented in R to search for significant accumulations of mutations in groups of isolates (3). Please define your isolates of interest and choose between the three comparison modes:

  • SNPs (position-wise) searches for frequently mutated nucleotides
  • SNPs (gene-wise) accumulates SNPs within one gene to extract frequently mutated genes
  • Stop codons searches for intragenic stop sites

Further comparison options, such as the minimal read coverage or the SNP quality score at the considered positions, can be customized. Please mouse over the different parameters for more information. If ‘Join “Position SNPs”’ is marked, SNPs at the same nucleotide position are analyzed jointly, regardless of which nucleotide exchange occurs at that position. To estimate relevant p-value cut-offs for the obtained results, the analysis may be repeated with randomly permuted datasets by choosing the option “Run statistics with permutated data matrix”. While the analysis usually uses Bonferroni-corrected p-values, the option to use uncorrected p-values is available for intragenic stop codons, as they are rare throughout the genome.
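
To make the statistics more concrete, the following Python sketch computes a two-sided Fisher’s exact test for a single position or gene and applies a Bonferroni correction. It is a standard-library re-implementation for illustration only; BACTOME itself uses R. The 2x2 table layout (isolates with and without the mutation in each group), the function names and the number of tests in the example are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = group 1 / group 2,
    columns = isolates with / without the mutation.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x mutated isolates in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Toy example: 3 of 3 isolates mutated in group 1, 0 of 3 in group 2.
p = fisher_exact_two_sided(3, 0, 0, 3)
print(p, bonferroni(p, n_tests=5))
```

With such small groups even a perfect separation gives a modest p-value, which is one reason why comparing against a permuted data matrix helps in choosing a sensible cut-off.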

Example groups for comparison are available here and here.

Transcriptome Section

BACTOME comprises RNA sequencing data of all integrated clinical isolates. The transcriptomes were recorded in LB broth at an OD = 2 and read data was mapped to UCBPP-PA14 as a reference.

Relative Expression Distribution

The database features an interactive gene browser containing the transcriptomes of all isolates. Expression lines of the isolates are colored according to their phylogenetic relatedness in two groups, PA14-like (red) and PAO1-like (blue). Optionally, the expression line of a selected isolate (e.g. CH2500) can be colored in black. Additionally, the median expression of each group and the total median expression are depicted.

Choose to display only PAO1-like isolates, only PA14-like isolates or “all”, and select a genome location to start with (gene name, e.g. ampC, UCBPP-PA14 locus tag or genomic position). “Relative Display” adjusts the y-axis scale to the maximum transcription values while scrolling; “Absolute Display” preserves the y-axis scale. The initial display window (x-axis) can be selected to cover 100-5000 nt. Depending on the zoom level, additional features such as operon structure or predicted transcription factor binding sites are available (see “Features”). The data can be exported in several text or graphic formats for further processing.

Gene Expression Extraction

This function retrieves gene expression values (as log2 fold changes) for any gene of interest in any isolate. Insert a list of genes and select your isolates before processing. Note that there are only single values available for the transcriptomes, which are thus not supported by p-values.
Example: retrieve the expression of ampC, gyrA and parC in CH2500, CH2527, CH2582, CH2598, CH2639 and CH2660.

Gene Expression Comparison

The gene expression group comparison tool uses Student’s t-test to search for significant differential gene expression between two groups of isolates using logarithmic, normalized reads per kilobase (lnRPK) (3, 4), normalized with DESeq2 and by gene length (5). Simply define your two groups of interest (e.g. isolates clustered according to their phenotypes) to use the tool. Optionally, you may also run your comparison with a permuted dataset to get an idea of reasonable p-value cut-offs for your analysis.
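
As an illustration of the underlying statistic, the Python sketch below computes Student’s t statistic (equal-variance form with pooled variance) and its degrees of freedom for one gene. It is a minimal example and not BACTOME’s code: the function name student_t and the toy expression values are hypothetical, and the conversion of the statistic into a p-value (from the t distribution with the returned degrees of freedom) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group1: list[float], group2: list[float]) -> tuple[float, int]:
    """Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    df = n1 + n2 - 2
    # Pooled sample variance across both groups.
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    if pooled == 0:
        raise ValueError("no variance within the groups")
    t = (mean(group1) - mean(group2)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


# Toy expression values for one gene in two groups of isolates (not real data).
t, df = student_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(round(t, 4), df)  # -1.2247 4
```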

Example groups for comparison are available here and here.


Publications analysing the clinical isolates

Multiple analyses using these clinical isolates have already been published (6–10).

References

1. Hornischer,K., Khaledi,A., Pohl,S., Schniederjans,M., Pezoldt,L., Casilag,F., Muthukumarasamy,U., Bruchmann,S., Thöming,J., Kordes,A., et al. (2018) BACTOME—a reference database to explore the sequence- and gene expression-variation landscape of Pseudomonas aeruginosa clinical isolates. Nucleic Acids Research, gky895, 10.1093/nar/gky895.
2. CLSI (2017) Performance Standards for Antimicrobial Susceptibility Testing. 27th ed. CLSI supplement M100. Wayne, PA: Clinical and Laboratory Standards Institute.
3. R Core Team (2017) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
4. Dötsch,A., Eckweiler,D., Schniederjans,M., Zimmermann,A., Jensen,V., Scharfe,M., Geffers,R. and Häussler,S. (2012) The Pseudomonas aeruginosa transcriptome in planktonic cultures and static biofilms using RNA sequencing. PLoS One, 7, e31092, 10.1371/journal.pone.0031092.
5. Love,M.I., Huber,W. and Anders,S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15, 550, 10.1186/s13059-014-0550-8.
6. Kordes,A., Grahl,N., Koska,M., Preusse,M., Arce-Rodriguez,A., Abraham,W.-R., Kaever,V. and Häussler,S. (2019) Establishment of an induced memory response in Pseudomonas aeruginosa during infection of a eukaryotic host. The ISME journal, 13, 2018–2030, 10.1038/s41396-019-0412-1.
7. Erdmann,J., Thöming,J.G., Pohl,S., Pich,A., Lenz,C. and Häussler,S. (2019) The Core Proteome of Biofilm-Grown Clinical Pseudomonas aeruginosa Isolates. Cells, 8, 1129, 10.3390/cells8101129.
8. Muthukumarasamy,U., Preusse,M., Kordes,A., Koska,M., Schniederjans,M., Khaledi,A. and Häussler,S. (2020) Single-nucleotide polymorphism-based genetic diversity analysis of clinical Pseudomonas aeruginosa isolates. Genome Biology and Evolution, 12, 396–406, 10.1093/gbe/evaa059.
9. Thöming,J.G., Tomasch,J., Preusse,M., Koska,M., Grahl,N., Pohl,S., Willger,S.D., Kaever,V., Müsken,M. and Häussler,S. (2020) Parallel evolutionary paths to produce more than one Pseudomonas aeruginosa biofilm phenotype. npj Biofilms and Microbiomes, 6, 2, 10.1038/s41522-019-0113-6.
10. Khaledi,A., Weimann,A., Schniederjans,M., Asgari,E., Kuo,T.-H., Oliver,A., Cabot,G., Kola,A., Gastmeier,P., Hogardt,M., et al. Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning-enabled molecular diagnostics. EMBO Molecular Medicine, e10264, 10.15252/emmm.201910264.