We present bacterial 16S rRNA gene datasets derived from stool samples of 44 patients with diarrhea indicative of a Clostridioides difficile infection. For 20 of these patients, C. difficile infection was confirmed by clinical evidence. Stool samples were taken from patients in Germany, Ghana, and Indonesia and subjected to DNA isolation. DNA was likewise isolated from stool samples of 35 asymptomatic control individuals. The bacterial community structure was assessed by 16S rRNA gene analysis (V3-V4 region). Metadata from patients and control individuals include gender, age, country, presence of diarrhea, concomitant diseases, and results of microbiological tests to diagnose C. difficile presence. We provide initial data analysis and a dataset overview. Paired-end reads were merged and quality-filtered, primer sequences were removed, and the reads were truncated to 400 bp and dereplicated. Singletons were removed, and the remaining sequences were sorted by cluster size and clustered at 97% sequence similarity; chimeric sequences were discarded. Taxonomy was assigned to each operational taxonomic unit (OTU) by BLASTn searches against SILVA database 123.1, and an OTU table was constructed.
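The truncation, dereplication, singleton-removal, and sorting steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `dereplicate` is hypothetical, reads are plain strings, and discarding reads shorter than the truncation length is an assumption that the text does not state.

```python
from collections import Counter


def dereplicate(reads, trunc_len=400):
    """Truncate, dereplicate, drop singletons, and sort by abundance.

    reads: iterable of merged, quality-filtered, primer-free sequences.
    Assumption: reads shorter than trunc_len are discarded.
    Returns a list of (sequence, count) pairs, most abundant first.
    """
    # Truncate every sufficiently long read to a common length.
    truncated = [r[:trunc_len] for r in reads if len(r) >= trunc_len]
    # Dereplicate: count how often each identical sequence occurs.
    counts = Counter(truncated)
    # Remove singletons (sequences observed only once).
    kept = [(seq, n) for seq, n in counts.items() if n > 1]
    # Sort by decreasing abundance; ties are broken alphabetically
    # so that the output order is deterministic.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept
```

The sorted, singleton-free sequences would then be passed to a clustering tool for OTU picking at 97% similarity and chimera removal, which this sketch does not attempt.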

Figure 1: Bacterial community composition at family level of human stool samples analysed in this study.

The bacterial community profiles are based on operational taxonomic unit (OTU, defined at 97% genetic identity) frequency in stool samples of 44 patients with diarrhea indicative of C. difficile infection and 35 asymptomatic control individuals (n=79). One stool sample per patient was used, and amplicon PCRs were performed in triplicate for this analysis. Families with an abundance below 1% in the entire dataset were summarized as rare taxa. The relative abundance of C. difficile (Peptoclostridium difficile in SILVA database 123.1) is displayed separately; the corresponding sequences exhibited highest similarity to Clostridioides difficile strain 630 delta erm (accession number CP016318). Occurrence of diarrhea in patients is indicated by plus (patient exhibited diarrhea) and minus (no diarrhea). Results from microbiological diagnosis of C. difficile infection (C. d. m. t.) are shown below (plus, tested positive for C. difficile; minus, tested negative for C. difficile). Presence and absence of C. difficile in amplicon data (C. d. NGS) are indicated by plus (present) and minus (absent). Data processing and employed tools are described in detail in the methods section.
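The grouping of low-abundance families into rare taxa can be sketched as follows. This is an illustration only, not the code behind Figure 1: the function name `collapse_rare_families` and the nested-dictionary input are hypothetical, and "abundance in the entire dataset" is assumed here to mean a family's share of all reads summed over all samples.

```python
def collapse_rare_families(table, threshold=0.01, rare_label="rare taxa"):
    """Convert counts to relative abundances and pool rare families.

    table: dict mapping sample name -> dict of family name -> read count.
    Assumption: a family is rare if its share of all reads in the whole
    dataset is below `threshold`.
    Returns dict mapping sample name -> dict of family -> relative abundance.
    """
    # Sum read counts per family across all samples.
    totals = {}
    for counts in table.values():
        for family, n in counts.items():
            totals[family] = totals.get(family, 0) + n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {sample: {} for sample in table}

    rare = {f for f, n in totals.items() if n / grand_total < threshold}

    collapsed = {}
    for sample, counts in table.items():
        sample_total = sum(counts.values())
        profile = {}
        if sample_total > 0:
            for family, n in counts.items():
                # Rare families are pooled under a single label.
                key = rare_label if family in rare else family
                profile[key] = profile.get(key, 0.0) + n / sample_total
        collapsed[sample] = profile
    return collapsed
```

A family that is rare overall is pooled in every sample, even in a sample where it happens to be locally abundant; a per-sample threshold would be a different design choice.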


Figure 2: Multivariate analysis of the bacterial community from human stool samples.

Non-metric multidimensional scaling (NMDS) based on weighted UniFrac [12] was used to display the bacterial community structure in 79 stool samples at the same sequencing effort (10,000 reads per sample). Samples from patients who…