Metagenomics and Whole Genome Sequencing in Clinical Microbiology: A Narrative Review
Correspondence Address :
Dr. Preeti Thakur,
Assistant Professor, Department of Microbiology, Lady Hardinge Medical College, New Delhi-110001, India.
E-mail: drpreetithakur@yahoo.com
The role of Whole Genome Sequencing (WGS) in clinical microbiology is increasing and is not restricted to molecular epidemiology alone. With the advent of third-generation sequencing technologies, its applications in infectious disease diagnostics have further expanded to include the direct identification of microorganisms from primary clinical samples, prediction of antimicrobial resistance (including antiviral resistance) by detecting resistance genes, and detection of virulence genes. Untargeted metagenomic sequencing of clinical samples can provide a promising platform for the comprehensive diagnosis of infections. This article outlines the applications and scope of WGS and metagenomic sequencing in routine clinical microbiology, along with the challenges and practical issues in their implementation.
Bioinformatics, Genomic library, Infectious disease
WGS is an emerging method for assembling the genomes of novel microbial pathogens and completing the genomes of known organisms. De novo WGS involves assembling a genome without the use of a reference genome and is often used to sequence novel microbial genomes (1). Microbial whole-genome re-sequencing involves sequencing the entire genome of bacteria, viruses, and other microbes and comparing the sequence to known reference sequences (2).
Over the years, WGS has been applied mainly in specific niches such as molecular epidemiology; however, as technology progresses, the role of WGS in clinical microbiology is increasing. Applications in infectious disease diagnostics include the direct identification of microorganisms from primary clinical samples, the prediction of antimicrobial (including antiviral) resistance by characterising resistance genes, and the detection of virulence genes (3). Sequencing the entire bacterial, viral, and other microbial genomes is important for microbial identification in multiple clinical samples and for comparative genomic studies.
For sequencing isolates from clinical samples, the optimum sequencing method is determined by the type of sample and prior information on the organisms of interest (4). For this purpose, WGS of DNA or Whole-Transcriptome Sequencing (WTS) of RNA can be performed. These methods can detect known targets or offer a hypothesis-free analysis of the sample. The targeted sequencing approach is recommended when the target organism is either known or suspected. In targeted resequencing, a subset of genes is isolated and sequenced. This approach includes techniques such as DNA or RNA amplicon sequencing and enrichment sequencing for known and suspected pathogens, respectively. With amplicon sequencing, the many different amplicons produced by amplifying a sample are sequenced together. Target enrichment captures the genomic regions of interest by hybridisation to target-specific biotinylated probes, and the captured regions are then isolated (5),(6). When the target organism is unknown, approaches such as shotgun metagenomic or metatranscriptomic sequencing are used (7). WGS mainly relies on newer technologies such as Next-Generation Sequencing (NGS). NGS allows rapid sequencing of a large amount of genetic material compared to the original Sanger sequencing method. The typical NGS workflow consists of genomic library preparation, sequencing, and data analysis.
The objective of this review was to discuss the application and scope of WGS, including metagenomic sequencing, in routine clinical microbiology.
Application in Infectious Disease Diagnostics
Conventional testing for pathogens, mainly bacteria and fungi, in clinical samples consists of the identification of microorganisms growing in culture by phenotypic tests or automated methods, followed by antimicrobial susceptibility testing. The entire process usually takes 2-4 days for the final report, although it can take longer for fungi and slow-growing bacteria. This workflow is less suited to viruses, for which organism-specific markers are more commonly detected by antigen or antibody methods in conjunction with molecular approaches such as Polymerase Chain Reaction (PCR) (8).
Untargeted metagenomic sequencing of clinical samples can provide a promising platform for the comprehensive diagnosis of infections. In general, nearly all pathogens, including viruses, bacteria, fungi, and parasites, can be identified in a single assay in much less time than with conventional methods. Metagenomic sequencing is appealing because of its culture-independent approach, which reduces the chance of false-negative results and may be used to accurately detect organisms that are non-cultivable or difficult to culture, including bacteria and viruses (4),(9). With the introduction of NGS, many organisms can be sequenced simultaneously. Furthermore, NGS can detect low-frequency variations and genomic rearrangements that would otherwise be missed or too costly to detect using conventional methods (10). Newer technologies have made it possible to decrease the turnaround time while increasing the throughput. As it is a single-tube assay, it not only helps in diagnostic identification but also offers additional information, such as subtyping of strains, prediction of antimicrobial resistance, and virulence profiling (11). The workflows of conventional methods and metagenomic sequencing are compared in (Table/Fig 1).
One of the earliest known examples in which metagenomic sequencing was employed is the case of a 14-year-old boy with severe combined immunodeficiency who was diagnosed with neuroleptospirosis by NGS (12). Since then, several groups have successfully established the use of metagenomic sequencing for the diagnosis of infections, including meningitis or encephalitis, sepsis, and pneumonia (Table/Fig 2) (13),(14),(15),(16),(17),(18).
Metagenomic sequencing can also be used in cases where the pathogen is completely novel or is a variant of a known pathogen that gives false-negative results with existing tests. A classic example of this is the identification of SARS-CoV-2 in 2020 and the monitoring of its variants during the pandemic. Using metagenomic RNA sequencing of bronchoalveolar lavage fluid samples from patients, independent teams of Chinese scientists discovered that the causative agent of the severe pneumonia was a betacoronavirus that had never been seen before (19). A few studies have also established clinical metagenomic pipelines for the identification of SARS-CoV-2 and its variants of concern (20),(21). One group from India applied an amplicon-based targeted sequencing method to sequence and further characterise the SARS-CoV-2 virus from clinical samples collected from all over the country (22). These target-based approaches can further provide insight into the evolution of the virus during the pandemic.
The genotypic detection and prediction of drug resistance have revolutionised the clinical microbiology laboratory, as the organism can be identified and its drug resistance genes detected simultaneously. Understanding and identifying horizontal gene transfer among pathogenic and non-pathogenic species may aid in determining the mechanisms underlying the transmission and dissemination of resistance. The use of high-throughput sequencing technologies has made antibiotic resistance gene sequence analyses feasible and accessible. Numerous studies have utilised sequencing platforms for antimicrobial resistance testing (23),(24),(25),(26). The prediction of antibiotic resistance genes in clinical samples has led to improvement in patient outcomes (27).
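To make the idea of genotypic resistance detection concrete, the following is a minimal sketch, not a validated method. It assumes resistance genes can be represented by short marker sequences and that a read "hits" a gene when it contains the marker on either strand. The function names, gene names, and sequences are hypothetical placeholders introduced for illustration; real workflows rely on curated resistance databases and alignment tools rather than exact substring matching.

```python
# Toy screen of sequencing reads against a catalogue of resistance-gene markers.
# All gene names and sequences below are made-up placeholders, not real genes.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def screen_reads(reads, markers):
    """Count the reads that contain each marker on either strand."""
    hits = {name: 0 for name in markers}
    for read in reads:
        read = read.upper()
        for name, marker in markers.items():
            if marker in read or reverse_complement(marker) in read:
                hits[name] += 1
    return hits


markers = {
    "hypothetical_gene_A": "ATGGCTAAGCTT",
    "hypothetical_gene_B": "TTGACCGGTACA",
}
reads = [
    "CCATGGCTAAGCTTGG",  # carries marker A on the forward strand
    "CCAAGCTTAGCCATGG",  # carries marker A on the reverse strand
    "GGGGGGGGGGGGGGGG",  # carries no marker
]
hit_counts = screen_reads(reads, markers)
print(hit_counts)
```

Checking both strands matters because a sequencer may read either strand of a fragment. A hit in such a screen only indicates the presence of a gene; it does not by itself establish phenotypic resistance.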
The importance of the microbiome and its likely involvement in both acute and chronic disease states cannot be ignored (28). Various studies have concluded that the skin microbiome is a possible key factor in antibiotic-resistant Staphylococcus aureus infections, and that the nasal microbiome interacts with the pneumococcus population to influence its carriage patterns following vaccination programs (29),(30),(31).
Many researchers now use metagenomic sequencing instead of targeted sequencing of the 16S rRNA gene for in-depth characterisation of the microbiome (32). One study concluded that alterations in the gut microbiota precede bacteraemia caused by vancomycin-resistant Enterococcus faecium, and that monitoring at-risk patients could allow infection to be avoided through early intervention (33). Irrational use of broad-spectrum antibiotics and recent gastrointestinal surgery are risk factors for opportunistic infection of the gut by Clostridium difficile (C. difficile). Hence, microbiome analysis holds significance in the management of C. difficile-associated disease, as evidenced by the 80-90% effectiveness of faecal microbiota transplantation in curing the disease. The development of a bacterial probiotic mixture for prophylaxis or treatment of C. difficile-associated disease has become possible due to various studies using metagenomic sequencing for microbiome analysis (34).
Another application of microbiome analysis is the study of bacterial diversity, which can help differentiate between infectious and non-infectious causes of illness. This has been demonstrated by a study that used metagenomic sequencing to identify pathogens in patients with pneumonia. It concluded that individuals with culture-proven infection had significantly less diversity in their respiratory microbiome (35).
Another study analysed sputum from cystic fibrosis patients and concluded that Whole-Genome Next-Generation Sequencing (WG-NGS) characterises bacterial and viral components in a single analysis, which is more comprehensive than culture and PCR and better covers taxonomic diversity. It revealed the anaerobic flora, fastidious bacteria, and viruses that are not screened for in routine diagnostics, potentially leading to increased detection of fastidious organisms (36).
Lastly, expression profiling of genes using RNA-seq has been used to characterise several infections, including staphylococcal infections, Lyme disease, candidiasis, tuberculosis, and influenza. A whole-blood tuberculosis risk signature identified people at risk of developing active tuberculosis, opening the possibility of targeted intervention to prevent the disease (37). Knowledge of the human host response obtained through transcriptome analysis may help identify risk factors and enable early intervention. Although no RNA-seq-based assay has been clinically validated for use in patients, the potential clinical impact of RNA-seq analyses is high.
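As an illustration of how microbiome diversity can be quantified from sequencing output, the sketch below computes the Shannon diversity index from per-taxon read counts. The choice of the Shannon index is an assumption made here for illustration; the text above does not state which diversity measure the cited study used. The taxon names and counts are invented.

```python
import math


def shannon_diversity(read_counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero reads."""
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    index = 0.0
    for count in read_counts.values():
        if count > 0:
            proportion = count / total
            index -= proportion * math.log(proportion)
    return index


# Invented counts: an evenly mixed community versus one dominated by a single taxon.
even_sample = {"taxon_1": 25, "taxon_2": 25, "taxon_3": 25, "taxon_4": 25}
dominated_sample = {"taxon_1": 97, "taxon_2": 1, "taxon_3": 1, "taxon_4": 1}

print(shannon_diversity(even_sample))
print(shannon_diversity(dominated_sample))
```

A sample dominated by one organism yields a lower index than an evenly mixed one, which is the kind of pattern a reduced-diversity finding refers to.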
Practical Issues in Implementing WGS in Routine Diagnostic Microbiology
Infrastructure and technical requirements: The sequencing of pathogens primarily requires two components: experimental manipulations (wet lab) and bioinformatic analysis (dry lab). The wet lab manipulations include sample pretreatment, nucleic acid extraction, library construction, and sequencing. The dry lab bioinformatic analysis includes quality control of data, removal of human sequences, alignment of microbial sequences, and analysis of output data. The workflow of metagenomic sequencing consists of the following components: sample collection and preprocessing, nucleic acid extraction, library preparation, sequencing, bioinformatic analysis, and reporting. For the routine implementation of WGS for the diagnosis of infectious diseases, various factors need to be taken into account, such as the clinical relevance of the organism in mixed sample types, the impact of sample variation, infrastructure and technical requirements, cost issues, quality control, and interpretation of data (Table/Fig 3).
Impact of human host background and sample type variation: The samples collected for sequencing can come from different sites depending on the clinical presentation of the patient and may be contaminated with human host DNA. Human host DNA in the background can reduce the sensitivity of the test. Another limitation of metagenomic sequencing is decreased sensitivity due to a low pathogen load in the initial sample or the presence of mixed flora, as in stool. To overcome these challenges, host depletion methods, also known as dehosting, have been developed. For RNA libraries, these methods include the use of RNA probes followed by RNase H treatment or CRISPR-Cas9-based approaches (38),(39). For DNA libraries, effective enrichment methods include differential lysis of human cells followed by degradation of background DNA with DNase I, and methods based on DNA methylation (40). Detection efficiency is also low for intracellular pathogens such as Mycobacterium tuberculosis and for fungi, so dedicated lysis methods are used to break the cell walls of these organisms.
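The effect of host background on sensitivity can be illustrated with simple arithmetic. The sketch below assumes a deliberately simplified model in which reads are sampled in proportion to the nucleic acid present, and host depletion removes a fixed fraction of host material without touching anything else. The function name and all numbers are illustrative assumptions, not measurements from any study.

```python
def expected_pathogen_reads(total_reads, host_fraction, pathogen_fraction,
                            depletion_efficiency=0.0):
    """Expected pathogen read count after removing part of the host material.

    host_fraction and pathogen_fraction are proportions of the starting
    nucleic acid; depletion_efficiency is the share of host material removed.
    """
    if host_fraction + pathogen_fraction > 1.0:
        raise ValueError("fractions cannot sum to more than 1")
    if not 0.0 <= depletion_efficiency <= 1.0:
        raise ValueError("depletion_efficiency must be between 0 and 1")
    # Material left after depletion, relative to the starting amount.
    remaining = 1.0 - host_fraction * depletion_efficiency
    return total_reads * pathogen_fraction / remaining


# Illustrative scenario: 10 million reads, 99% host, 0.01% pathogen.
without_depletion = expected_pathogen_reads(10_000_000, 0.99, 0.0001)
with_depletion = expected_pathogen_reads(10_000_000, 0.99, 0.0001,
                                         depletion_efficiency=0.9)
print(round(without_depletion), round(with_depletion))
```

Under these assumptions, removing most of the host material lets the same sequencing depth yield several times more pathogen reads, which is why dehosting improves sensitivity in host-rich samples.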
Cost Issues
There has been a substantial reduction in the cost of sequencing over the last decade. Earlier, with only low-capacity systems available, the throughput of machines was low while costs were very high. However, with the advent of new sequencing techniques with less complex protocols, more and more laboratories are adopting this technology. Prices and turnaround times will most likely decrease further due to competition between current and emerging sequencing platforms. Additionally, more technical improvements and automation are required during sample processing to increase throughput and decrease costs.
Quality Control
As with every analytical assay, quality control plays an important role in maintaining the diagnostic accuracy of these assays. Important steps in quality control may include initial sample quality checks, library parameters, sequence data generation, retrieval of internal controls, and performance of external controls. Well-characterised reference standards and controls, along with universally standardised and validated protocols, are needed to ensure assay quality (3). In addition to this, the recruitment of well-qualified and trained personnel is paramount due to the complexity of the procedure and to avoid errors and cross-contamination.
Challenges in Bioinformatic Pipeline
Bioinformatics pipelines for metagenomic sequencing utilise various algorithms, which are regularly updated. A typical bioinformatics pipeline analyses the raw input from FASTQ files. The analysis protocol includes procedures such as adapter trimming, human host reduction, alignment to reference databases, sequence assembly, and taxonomic classification of individual reads. For this purpose, the bioinformatics software should be user-friendly and have high diagnostic accuracy. Such software should also be validated before it is applied in research and diagnostics (41).
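The steps named above can be sketched as a toy pipeline. This is an illustration only: it uses exact adapter matching and shared k-mers in place of the aligners and classifiers used in practice, assumes Phred+33 quality encoding, and works on invented sequences. All function names, reference names, thresholds, and the adapter string are assumptions introduced for this example.

```python
import io
from collections import Counter


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


def trim_adapter(sequence, quality, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    position = sequence.find(adapter)
    if position == -1:
        return sequence, quality
    return sequence[:position], quality[:position]


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def kmers(sequence, k):
    """Set of all substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def classify(sequence, references, k):
    """Assign the read to the reference sharing the most k-mers, or None."""
    read_kmers = kmers(sequence, k)
    scores = {name: len(read_kmers & kmers(ref, k))
              for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def run_pipeline(fastq_text, adapter, host_reference, references,
                 k=8, min_quality=20):
    """Trim, quality-filter, remove host reads, then classify what is left."""
    host_kmers = kmers(host_reference, k)
    counts = Counter()
    for _read_id, sequence, quality in parse_fastq(io.StringIO(fastq_text)):
        sequence, quality = trim_adapter(sequence, quality, adapter)
        if len(sequence) < k or mean_phred(quality) < min_quality:
            counts["filtered_low_quality"] += 1
        elif kmers(sequence, k) & host_kmers:
            counts["removed_host"] += 1
        else:
            counts[classify(sequence, references, k) or "unclassified"] += 1
    return dict(counts)


def make_record(name, sequence, quality_char):
    """Build one FASTQ record with a uniform quality character."""
    return f"@{name}\n{sequence}\n+\n{quality_char * len(sequence)}\n"


# Invented sequences, for illustration only.
ADAPTER = "AGATCGGAAGAGC"
HOST = "ACGTACGTTTGACCAGGATCCA"
REFERENCES = {
    "organism_X": "GGCTTAACCGGTATCGATCGGA",
    "organism_Y": "TTATTCCGGAGAGCTCTAGACC",
}
fastq_text = (
    make_record("read1", "GGCTTAACCGGTATCG" + ADAPTER, "I")  # organism X + adapter
    + make_record("read2", "ACGTACGTTTGACCAGG", "I")         # host-derived
    + make_record("read3", "TTATTCCGGAGAGCTC", "#")          # low quality
    + make_record("read4", "CACACACACACACACA", "I")          # matches nothing
)
summary = run_pipeline(fastq_text, ADAPTER, HOST, REFERENCES)
print(summary)
```

The order of the steps reflects a common design choice: trimming and quality filtering come first so that adapter sequence and unreliable bases do not produce spurious host or pathogen matches later on.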
Interpretation and the Clinical Relevance of Data
Finally, the main challenge for WGS is likely not technical but resides in the interpretation of the results by physicians. Metagenomic sequencing identifies all agents present in a sample, regardless of their association with the disease, and it is often unclear whether a detected microorganism is a contaminant, coloniser, or actual pathogen. Interpretation requires appropriate training and is easier in normally sterile samples, such as blood, than in samples with mixed flora.
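One heuristic sometimes used to help separate likely contaminants from organisms truly present in a sample is to compare normalised read counts against a negative (no-template) control processed alongside it. The sketch below illustrates that idea only; it is not described in the sources cited above, the ratio threshold is an arbitrary assumption, and the taxon names and counts are invented. It cannot distinguish a coloniser from a pathogen, which still requires clinical judgement.

```python
def reads_per_million(taxon_reads, total_reads):
    """Normalise a taxon's read count to reads per million (RPM)."""
    return taxon_reads * 1_000_000 / total_reads


def flag_taxa(sample_counts, sample_total, control_counts, control_total,
              min_ratio=10.0):
    """Return taxa whose sample RPM exceeds the control RPM by min_ratio."""
    flagged = []
    for taxon, reads in sample_counts.items():
        sample_rpm = reads_per_million(reads, sample_total)
        control_rpm = reads_per_million(control_counts.get(taxon, 0),
                                        control_total)
        # A floor of 1 RPM avoids division by zero when the control is clean.
        ratio = sample_rpm / max(control_rpm, 1.0)
        if ratio >= min_ratio:
            flagged.append(taxon)
    return sorted(flagged)


# Invented counts: taxon_B is as abundant in the control as in the sample.
sample_counts = {"taxon_A": 400, "taxon_B": 40}
control_counts = {"taxon_B": 30}
flagged = flag_taxa(sample_counts, 2_000_000, control_counts, 1_000_000)
print(flagged)
```

Here taxon_A is absent from the control and is flagged, whereas taxon_B appears at a similar level in the control and is treated as probable background.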
There is no denying that WGS is a very attractive alternative for the identification and characterisation of pathogens from clinical samples, and it outperforms traditional approaches in terms of throughput, turnaround time, and culture independence. However, considerable barriers still affect its sensitivity and diagnostic accuracy, so it cannot be used as a sole diagnostic test. Although many studies have established its utility in the diagnosis of infectious diseases, large prospective clinical trials are still required to address issues regarding sensitivity, interpretation, laboratory workflow, and the overall cost of implementing these technologies. In the near future, with further advances in technology and decreases in cost, this platform has the potential to revolutionise infectious disease diagnostics.
DOI: 10.7860/JCDR/2023/65580.18779
Date of Submission: May 31, 2023
Date of Peer Review: Aug 11, 2023
Date of Acceptance: Oct 14, 2023
Date of Publishing: Dec 01, 2023
AUTHOR DECLARATION:
• Financial or Other Competing Interests: None
• Was informed consent obtained from the subjects involved in the study? NA
• For any images presented appropriate consent has been obtained from the subjects. NA
PLAGIARISM CHECKING METHODS:
• Plagiarism X-checker: May 31, 2023
• Manual Googling: Aug 16, 2023
• iThenticate Software: Nov 24, 2023 (24%)
ETYMOLOGY: Author Origin
EMENDATIONS: 5