Microbial identification should be commensurate with the microbial contamination risk to the product (1). Taxonomy, the science of naming and classifying organisms, is a component of systematics. Microbial taxonomy is the means by which microorganisms are grouped together: organisms that are similar with respect to the criteria used are placed in the same group and are separated from groups of microorganisms that have different characteristics. Classification, nomenclature and identification are the three facets of taxonomy. Classification is the arrangement of organisms into groups (taxa) sharing similar morphologic, physiologic and genetic traits. Nomenclature refers to the assignment of names to taxonomic groups. Identification refers to the determination of the taxon to which a microbial isolate belongs. The two main functions of taxonomy are to describe as completely as possible the basic units of taxonomy, the “species”, and to create an appropriate way of arranging and cataloguing these units (2). In this article, microbial identification is discussed mainly from the standpoint of bacterial identification. Although some of the concepts still apply to fungi, fungal identification has its own complexities and is not covered in detail.
For higher organisms, the biological concept of species can be used, which defines a species from the reproductive point of view as a group of organisms capable of interbreeding. This concept cannot be applied to bacteria, because genetic recombination is very common in them and because they reproduce differently from higher organisms, typically by binary fission. A bacterial species is instead defined based on properties such as biochemical reactions, cell morphology, genetic characteristics and immunological properties. Identifying a species presents the most difficult and challenging aspect of taxonomy. Bergey’s Manual (3) defined a species as follows: “A bacterial species may be regarded as a collection of strains that share many features in common and differ considerably from other strains”. The technical definition based on DNA-DNA hybridization studies was that organisms belong to the same species if they show approximately 70% or greater DNA-DNA relatedness, with a ΔTm of 5°C or lower for the stability of the heteroduplex molecules. The current definition of species is transitioning to a sequence-based taxonomy (4).
To understand the concept of microbial species, it is important to understand the concept of “strains”. A strain is a population of microbes descended from a single individual or pure culture. A collection of microbial strains that share many properties and differ significantly from other groups of strains forms a “species”. In microbial identification, species are identified by comparison with known “type strains”, which are well-characterized pure cultures. Type strains can be obtained from culture collections such as the American Type Culture Collection (ATCC) and the German Collection of Microorganisms and Cell Cultures (Leibniz Institute DSMZ). Strains belonging to the same species can still differ from each other in certain characteristics, such as pathogenicity and antimicrobial profiles.
Microbial speciation is always in a state of flux, as both the microorganisms and the science of systematics (the study of evolutionary relationships and history) are constantly evolving. Traditionally, microbes were identified based on phenotypic characteristics. However, the advent of analytical technologies (gas chromatography, mass spectrometry), together with the discovery of DNA and its associated technology (sequencing), has contributed immensely to the science of microbial taxonomy and systematics. Figure 1 illustrates the evolution of bacterial identification in the microbiology laboratory. The future of microbial identification is Whole Genome Sequencing (WGS), the new gold standard: whole-genome average nucleotide identity (ANI) is emerging as a robust method, with organisms belonging to the same species typically showing ≥95% ANI among themselves (4, 5). ANI represents the average nucleotide identity of all orthologous genes shared between any two genomes and offers robust resolution between strains of the same or closely related species. It also closely reflects the traditional microbiological concept of DNA–DNA hybridization relatedness for defining species (5). In other words, ANI is the accepted in silico version of the DNA-DNA hybridization assay, which has been the gold standard for demarcation of new bacterial species.
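The ANI criterion described above can be illustrated with a minimal Python sketch. It assumes that per-fragment percent identities between two genomes have already been produced by an alignment tool; the function names and the example values are hypothetical, and real ANI tools differ in how they select and weight fragments. Only the ≥95% threshold comes from the text.

```python
from statistics import mean

# Species boundary cited in the text (4, 5), in percent.
ANI_SPECIES_THRESHOLD = 95.0


def average_nucleotide_identity(fragment_identities):
    """Unweighted mean percent identity over shared orthologous fragments."""
    if not fragment_identities:
        raise ValueError("no shared orthologous fragments to average")
    return mean(fragment_identities)


def same_species(ani_percent, threshold=ANI_SPECIES_THRESHOLD):
    """Apply the typical >=95% ANI rule of thumb for species membership."""
    return ani_percent >= threshold


# Hypothetical per-fragment identities (percent) for two genome pairs.
ani_ab = average_nucleotide_identity([98.0, 97.0, 99.0, 98.0])
ani_ac = average_nucleotide_identity([88.0, 90.0, 86.0, 88.0])

print(ani_ab, same_species(ani_ab))
print(ani_ac, same_species(ani_ac))
```

Because the threshold is described as typical rather than absolute, a value close to 95% is better treated as a prompt for further review than as a definitive call.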
Fig 1: Evolution of microbial identification systems
The correct identification of microorganisms is of fundamental importance to microbial taxonomists. In clinical and diagnostic microbiology, accurate identification of microbes plays a critical role in the timely diagnosis and treatment of diseases. In the era of a global antibiotic resistance crisis, accurate identification has become increasingly important for combating infections in a timely and efficient manner. It is equally important in the pharmaceutical and medical device industries, where it helps with: a. source tracking of a contaminant within a manufacturing process, b. understanding environmental monitoring trend data, c. selecting challenge organisms for disinfectant efficacy studies, d. growth promotion testing, e. method suitability and validation, f. assessment of microorganisms of concern/objectionable organisms, and g. antibiotic susceptibility testing of isolates recovered from nonsterile drug products (6). Thus, the accuracy of microbial identification is also important for these industries to ensure product quality and enhance patient safety.
Microbial identification is primarily based on phenotypic, chemotaxonomic, and genotypic/phylogenetic characteristics of the microbes. Colony morphology, Gram staining (bacterial), cell morphology and biochemical characteristics fall under the category of phenotypic identification. The first-generation conventional phenotypic characterizations were manual laboratory tests using test tubes, glass slides, API strips and BBL Crystal panels with their associated reagents. The second-generation phenotypic tests were semi-automated or automated biochemical tests such as the MicroScan® WalkAway, Phoenix®, Biolog® and Vitek® 2 systems. The chemotaxonomic systems are Fatty Acid Methyl Ester (FAME) analysis (MIDI-Sherlock®) and MALDI-TOF (Vitek® MS, MALDI Biotyper® Microflex LT, MALDI micro MX®). The genotypic systems are all nucleic acid sequence-based technologies whose foundation is DNA sequencing; they target either specific molecular markers, such as 16S rRNA, the Internal Transcribed Spacer (ITS), D2, rpoD and gyrB, to name a few, or the whole genome sequence of the isolate. ITS and D2 markers are widely used for fungal identification. Examples of commercial genotypic systems are MicroSeq®, Riboprinter® and DiversiLab®. The MicroSeq® system is used for both bacterial and fungal species-level identification, whereas the Riboprinter® and DiversiLab® systems are used for strain typing. The sequencing-based technologies are the third-generation ID systems. Another valuable sequence-based methodology is multilocus sequence typing (MLST). MLST is an unambiguous procedure for characterizing isolates of bacterial species using the sequences of internal fragments of (usually) seven house-keeping genes (typically constitutive genes that are required for the maintenance of basic cellular function). Internal fragments of approximately 450-500 bp of each gene are used, as these can be accurately sequenced on both strands using an automated DNA sequencer.
For each house-keeping gene, the different sequences present within a bacterial species are assigned as distinct alleles and, for each isolate, the alleles at each of the seven loci define the allelic profile or sequence type (https://pubmlst.org/general.shtml). The 16S rRNA gene has been by far the most commonly used house-keeping gene, for several reasons: it is present in all bacteria, often existing as a multigene family or in operons; its function has not changed over time; and it is large enough (1,500 bp) for bioinformatics purposes. MALDI-TOF is the most recent technology and belongs to the fourth generation of identification platforms. It is emerging as an efficient, accurate and cost-effective alternative for microbial identification. Whole genome sequencing will probably be considered the fifth generation of systems in the near future. Whole-genome ANI offers several important advantages, such as higher resolution among closely related genomes. In a taxonomic study by Potter et al., two strains (BP-1 and BP-2) recovered from ICU room surfaces could not be identified using MALDI-TOF but were identified by whole-genome sequencing as Superficieibacter electus (7). In a routine QC microbiology laboratory, the most commonly encountered systems are Vitek® 2, MicroSeq® and Vitek® MS (as well as other MALDI-TOF MS systems). The choice of identification platform usually depends on the needs, budget and throughput of the laboratory. Each of the above-mentioned systems has its limitations and, as mentioned earlier, microbial speciation is in a state of flux, so it is important to understand which factors influence the accuracy of microbial identification. Like any other microbial data, microbial identification data must be interpreted and not accepted at face value (8).
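The MLST bookkeeping described above (one allele number per locus, seven allele numbers per profile, one sequence type per profile) can be sketched in Python as follows. The locus names, toy sequences, allele tables and sequence-type table are all invented for illustration; a real scheme would take them from a curated database, and the fragments would be roughly 450-500 bp long.

```python
# Hypothetical locus names standing in for the seven house-keeping genes.
LOCI = ("locus1", "locus2", "locus3", "locus4", "locus5", "locus6", "locus7")


def allelic_profile(isolate_seqs, allele_tables):
    """Map each locus sequence to its allele number (None if not in the table)."""
    return tuple(allele_tables[locus].get(isolate_seqs[locus]) for locus in LOCI)


def sequence_type(profile, st_table):
    """Look up the sequence type of a complete profile (None if unassigned)."""
    if None in profile:
        return None
    return st_table.get(profile)


# Toy reference data: two known alleles per locus and two known sequence types.
allele_tables = {locus: {"ACGT": 1, "ACGA": 2} for locus in LOCI}
st_table = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 1, 1, 1, 1, 1, 2): 2,
}

# Toy isolate that carries allele 2 at the last locus.
isolate = {locus: "ACGT" for locus in LOCI}
isolate["locus7"] = "ACGA"

profile = allelic_profile(isolate, allele_tables)
print(profile, sequence_type(profile, st_table))
```

In this sketch a sequence that is absent from the allele table yields `None` instead of a guess, which mirrors the idea that a new sequence is a distinct allele that must first be assigned a number.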
Accuracy of Microbial Identification
The accuracy of microbial identification depends on the following factors: a. data quality, b. data assembly and c. data interpretation (quality of the reference library). The process flow (Fig 2) for accurate microbial identification consists of the ID platform used, the quality of the data, the quality and robustness of the reference library and, finally, the data interpretation. Analyst technique also plays a role in obtaining an accurate ID. For the platform in use, it is important to understand how the data are generated, how the identification method works and what controls should be in place to generate quality data. Regarding data quality, one of the most important first steps, irrespective of the platform used (Vitek® 2, Biolog®, MicroSeq®, MALDI-TOF), is to obtain a pure culture of the target isolate. Mixed cultures can critically affect the accuracy of microbial identification, which is why pure culture techniques are important. Before attempting to identify a culture on any platform, it is important to confirm that the culture is pure, because in nature microorganisms prefer to live in communities and not as single pure cultures. Figure 3 represents colony forming units (CFU) on a typical microbial test plate. In this respect, Gram-negative bacteria such as species belonging to the families Enterobacteriaceae and Pseudomonadaceae are more problematic than others. Because microorganisms prefer to live in communities, a microbiology laboratory also periodically encounters difficulty in sub-culturing an isolate into a pure culture. The microorganism grows into a colony on the original test plate but fails to grow as a pure culture when sub-cultured onto a fresh media plate, because it has been removed from the vicinity of neighboring microbes that are essential for its growth.
Fig 2: The process flow for accurate microbial identification
If the platform of choice is based on biochemical characteristics, such as Vitek®, it is important not only to have a pure culture of the bacterial isolate but also to control the age of the culture. Gram-positive organisms tend to shed their cell wall as the culture ages, so they might appear Gram-negative, leading to use of the wrong Vitek card for the identification. Using the wrong card can affect the accuracy of the data. Gram stain technique also plays a crucial role, so it is important that the analyst performing it is well trained and that all associated reagents and supplies are adequate. As a best practice, the Gram stain should be performed in replicate and the microscopic result should be verified by a second analyst, especially for Gram-variable microorganisms. For both the Biolog® and Vitek® 2 systems, the concentration and uniformity of the suspension used for inoculation play a critical role in the accuracy of the ID. For each microbial type, it is important to follow the manufacturer’s instructions closely for the optical density (OD) range of the suspension; a suspension outside the recommended concentration range can affect the accuracy of the data. Species belonging to the genus Bacillus are problematic in this regard. If the system requires the use of laboratory incubators, the incubation time and temperature also play an important role in the accuracy of the ID.
Fig 3: Microbiological Media Plate with Colony Forming Units (CFU)
If the platform of choice is genotypic, then in addition to the purity of the culture, the purity and quantity of the extracted nucleic acid play an important role in the accuracy of the ID. Nucleic acid of sufficient quality and quantity needs to be recovered from the samples of interest. In this regard, the microbial type (Gram-negative or Gram-positive), and hence the cell wall structure, plays an important role. For Bacillus species in particular, the age of the culture is also important, because disrupting bacterial endospores and extracting DNA of PCR-amplifiable quality and quantity are not trivial tasks. For genotypic methods, it is also important to have the right QC parameters for the raw sequence and the sequence assembly in order to generate a reliable ID. Issues such as the number of position ambiguities, sequence gaps and the use of gapped and/or non-gapped programs for sequence evaluation and analysis can affect the accuracy of the ID. Possible chimeric molecule formation can also affect final identifications (9).
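As an illustration of the sequence QC parameters mentioned above, the following Python sketch counts position ambiguities and gaps in an assembled sequence and applies pass/fail limits. The limits (`min_length`, `max_ambiguous_fraction`, `max_gaps`) are placeholder assumptions, not values from any vendor or guideline, and each laboratory would set and justify its own. Chimera detection is outside the scope of the sketch.

```python
from dataclasses import dataclass

# IUPAC nucleotide codes other than A, C, G and T mark ambiguous positions.
AMBIGUITY_CODES = set("RYSWKMBDHVN")


@dataclass
class SequenceQC:
    length: int      # number of bases, excluding gap characters
    ambiguous: int   # positions called with an ambiguity code
    gaps: int        # '-' characters in the assembled sequence
    passed: bool


def qc_sequence(seq, min_length=1300, max_ambiguous_fraction=0.01, max_gaps=0):
    """Summarize ambiguities and gaps; the default limits are illustrative only."""
    seq = seq.upper()
    gaps = seq.count("-")
    ambiguous = sum(1 for base in seq if base in AMBIGUITY_CODES)
    length = len(seq) - gaps
    passed = (
        length >= min_length
        and gaps <= max_gaps
        and ambiguous <= max_ambiguous_fraction * length
    )
    return SequenceQC(length, ambiguous, gaps, passed)


# Toy sequences, far shorter than a real 16S rRNA read.
clean = qc_sequence("ACGT" * 10, min_length=40)
noisy = qc_sequence("ACGT" * 10 + "N" + "AC-GT", min_length=40)
print(clean)
print(noisy)
```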
The next important criterion for generating an accurate ID is the reference library. Some key questions to ask about a reference library are: How large and robust is it? How was it originally created, and which reference method was used to create it? For example, although 16S rRNA gene sequencing is highly useful for bacterial taxonomy, it has low phylogenetic power at the species level and poor discriminatory power for some genera and groups, such as Enterobacteriaceae, rapid-growing Mycobacteria, Bacillus, Stenotrophomonas, Pseudomonas and Actinomyces. Another problem with the resolution of 16S rRNA gene sequencing concerns sequence identity or very high similarity scores between closely related species (9). Therefore, if the reference library was created with organisms initially identified by 16S rRNA sequencing, it is important that the sequence information was of high quality and derived from well-characterized strains (type strains). The usefulness of 16S rRNA gene sequencing as a tool in microbial identification depends on two key elements: deposition of complete, unambiguous nucleotide sequences into public or private databases, and application of the correct “label” to each sequence (9). Is the reference library clinically driven or environmentally driven? Many reference libraries consist only of clinical isolates because they are biased towards clinical microbiology, and so might not be adequate for accurate identification of pharmaceutical isolates. How often is the reference library updated or curated? As mentioned earlier, microbial taxonomy is in a state of flux: new microorganisms are discovered every day, and known microorganisms undergo taxonomic (name) changes. Can the lab add data from its own environmental isolates to the existing library to make it more relevant to its facility?
The quality and robustness of the reference library are among the most important factors in generating accurate microbial identification (10). Even with MALDI-TOF MS, a limitation of the technology is that identification of new isolates is possible only if the spectral database contains peptide mass fingerprints of the type strains of the specific genera/species/subspecies/strains (7, 11). For example, in the study by Seiffert et al. (12), a clinical isolate was initially identified as Klebsiella oxytoca by MALDI-TOF MS, whereas WGS identified the pathogen as K. michiganensis (95.24% query coverage, 96.7% reference coverage).
The last, but not the least important, criterion is data interpretation, which is directly related to data quality and the reference library. Another aspect of data interpretation is understanding the generated data and what to make of it. In this respect, the acceptance criteria are important. How has the lab or the vendor set the acceptance criteria that determine the confidence level of the ID (whether an ID is at the genus, species or strain level)? How does the lab resolve a questionable ID, such as an ID only to the family or genus level, or an ID with more than one match? For sequence-based data, does the lab take advantage of phylogenetic trees for assessing and resolving the accuracy of the ID? Taking everything into account, it is important to understand how accurate and valid the ID is. The performance qualification (PQ) of the system plays an important role in understanding the robustness of the reference library as well as the data interpretation. In this regard, it is important not only to use QC strains but also to incorporate environmental isolates (accurately identified by a different platform or laboratory) in the PQ studies. As mentioned above, identification of a new isolate is possible only if the database contains fingerprints of the type strains of the specific genera/species/subspecies/strains.
In several instances, irrespective of the platform used, the system fails to generate an accurate ID of the target isolate. The lab can encounter this issue with phenotypic systems, chemotaxonomic systems or even the gold-standard genotypic systems. For example, discrimination of certain taxonomic groups, such as the Bacillus cereus complex, Burkholderia cepacia complex, Escherichia coli and Shigella group, Enterobacter cloacae complex and Pseudomonas putida complex, remains a challenge for routine MALDI-TOF MS analysis as well as 16S rRNA gene sequencing (13). Therefore, it is important to check the accuracy or validity of the ID generated. In this regard, it is beneficial to use “polyphasic taxonomy” to verify the ID. Polyphasic taxonomy, a term coined by Colwell (14), is the integration of all the above-mentioned data (phenotypic, chemotaxonomic and genotypic/phylogenetic) and is used by modern taxonomists for identification of novel microorganisms (14, 15, 16). In other words, it is “taxonomy that assembles and assimilates many levels of information from molecular, physiological, morphological, serological, or ecological sources to classify a microorganism” (16). In practice, when the lab encounters a questionable ID (or an incomplete ID or no ID at all), it can use a second platform to resolve the issue. It is advisable for the lab to always have access to more than one type of identification platform so that the accuracy/validity of an ID can be checked when required. For example, if a laboratory uses Vitek®, Biolog® or MALDI-TOF as its primary identification platform, it could either have a genotypic platform in-house or qualify a third-party lab with that capability. Sometimes, using a second laboratory to check the accuracy of the ID can also be helpful.
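The cross-platform check described above can be pictured as a comparison of the calls made by each platform, reporting the deepest taxonomic rank on which they agree. The Python sketch below is a simplified illustration with hypothetical names (`agreed_rank` and the result labels); it is not a substitute for a full polyphasic assessment. The example reuses the Klebsiella result from Seiffert et al. (12) cited in the text.

```python
def agreed_rank(calls):
    """Return the deepest rank on which all platform calls agree.

    `calls` maps a platform name to a (genus, species) tuple, where species
    is None when the platform reported a genus-level ID only.
    """
    if len(calls) < 2:
        raise ValueError("need calls from at least two platforms to compare")
    genera = {genus for genus, _ in calls.values()}
    if len(genera) != 1:
        return "discordant"
    species = {sp for _, sp in calls.values()}
    if len(species) == 1 and None not in species:
        return "species"
    return "genus"


# MALDI-TOF MS and WGS calls for the clinical isolate in Seiffert et al. (12).
klebsiella_calls = {
    "MALDI-TOF MS": ("Klebsiella", "oxytoca"),
    "WGS": ("Klebsiella", "michiganensis"),
}
print(agreed_rank(klebsiella_calls))
```

A result other than "species" would be a trigger for further investigation, not an answer in itself.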
The accuracy of the ID becomes critical when the laboratory has to perform a sterility failure investigation, an assessment of objectionable organisms or other microbiology-related investigations. Therefore, a QC microbiology laboratory should have a strategy in place that involves more than one technology. Even though the future of taxonomy is sequence/genome based, the phenotypic traits of organisms will always be valuable, especially in the pharmaceutical industry, where it is important to determine the impact of an organism on product quality and patient safety. Biochemical systems such as Vitek® and Biolog® can be very useful for determining the impact of the organism on the quality and stability of the product.
A contract laboratory was using the Vitek 2 Compact identification system to perform a QC check (identity) on a commercially obtained, known ATCC culture of a Bacillus species before using the strain in the laboratory for different applications. While performing the identification on the pure culture according to the laboratory SOP, the lab was unable to obtain an accurate species-level identification for the organism. This was the first time the lab had encountered this issue with this known ATCC strain, which it routinely uses as a QC strain.
An investigation was initiated to determine the root cause of the issue. As part of the investigation, the analyst, instrument, culture purity, method/SOP and reagents were each reviewed. A second analyst also performed an investigational test. None of these activities led to a root cause or a probable root cause.
A more in-depth investigation was then initiated, in which the accuracy of the inoculum density was reviewed in detail. The review determined that, for the GP card, a density within a range of 0.5 to 0.63 McFarland is required per the updated manufacturer’s instruction manual. However, the SOP contained the information from the previous version of the manual, which had a different inoculum density requirement. The laboratory was unaware of the manual update and the new requirement because the information had either not been communicated effectively or had been miscommunicated due to a language barrier between the instrument company and the QC laboratory. Therefore, it is always important to perform a robust investigation for a microbial data deviation.
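The root cause in this case study amounts to two checks that are easy to state explicitly: whether a measured suspension falls inside the manufacturer's current range, and whether the range written in the SOP still matches the manual. The Python sketch below uses the 0.5 to 0.63 McFarland range quoted above for the GP card. The superseded SOP range is not given in the case study, so the value used here is invented purely for illustration, as are the function names.

```python
# Range quoted in the case study for the GP card (McFarland units).
GP_CARD_RANGE = (0.50, 0.63)


def density_in_range(mcfarland, allowed=GP_CARD_RANGE):
    """Check a measured suspension density against an inclusive range."""
    low, high = allowed
    return low <= mcfarland <= high


def sop_matches_manual(sop_range, manual_range=GP_CARD_RANGE):
    """Flag an SOP whose stated range differs from the current manual."""
    return tuple(sop_range) == tuple(manual_range)


# Hypothetical superseded range, NOT taken from the case study or any manual.
outdated_sop_range = (0.40, 0.50)

print(density_in_range(0.55))                  # inside the current range
print(density_in_range(0.45))                  # passes only under the invented old SOP
print(sop_matches_manual(outdated_sop_range))  # SOP needs revision
```

A periodic comparison of SOP parameters against the current manufacturer's documentation would surface this kind of mismatch before it shows up as a failed QC identification.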
There are multiple options when it comes to microbial ID, and there are several factors to consider when assessing the accuracy of the ID. So, the most important question we need to ask is “What is enough?” The answer is, “it depends”. It depends on what we are monitoring, what results we are evaluating and what our application is. In other words, microbial identification should be commensurate with the microbial contamination risk to the product. In some cases, just performing an elementary Gram stain on the pure culture might be enough. General guidance in the area of microbial identification can be found in USP informational chapter <1113>.
- Griffin, M., Reber, D. (2012). Microbial Identification: the keys to a successful program. PDA. Davis Healthcare International Publishing. Illinois, USA.
- Bisen, P.S. (2012). Microbes in Practice. IK International, New Delhi, India.
- Garrity, G., Boone, G., Castenholz, D.R. (2001). Bergey’s Manual of Systematic Bacteriology. Volume 1. Springer. New York, USA.
- Garrity, G. (2016). A genomic driven taxonomy of Bacteria and Archaea: are we there, yet? Journal of Clinical Microbiology, 54, 1956-1963. doi:10.1128/JCM.00200-16
- Jain, C., Rodriguez-R, L.M., Phillippy, A.M., Konstantinidis, K.T., Aluru, S. (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications, 9, 5114. doi: 10.1038/s41467-018-07641-9
- Sandle, T. (2017). Microbial identification strategy for pharmaceutical microbiology. Journal of GXP Compliance, 21, 4.
- Potter, R.F., D’Souza, A.W., Wallace, M.A., Shupe, A., Patel, S., Gul, D., Kwon, J.H., Beatty, W., Andleeb, S., Burnham, C.D., Dantas, G. (2018). Superficieibacter electus gen. nov., sp. nov., an extended-spectrum β-lactamase possessing member of the Enterobacteriaceae family, isolated from intensive care unit surfaces. Frontiers in Microbiology, 9, 1629.
- USP <1117> Microbiology Best Laboratory Practices, USP 41-NF 36, 7325.
- Janda, J.M., Abbott, S.L. (2007). 16S rRNA Gene Sequencing for Bacterial Identification in the Diagnostic Laboratory: Pluses, Perils, and Pitfalls. Journal of Clinical Microbiology, 45, 2761-2764.
- Saha, R., Wheeler, S., Bestervelt, L., Donofrio, R., Saha, N., Farrance, C., Verghese, B., Hong, S. (2016). Microbial hotspots and diversity on common household surfaces. Charles River, Technical Notes (https://www.criver.com/sites/default/files/resources/MicrobialHotspotsan...)
- Singhal, N., Kumar, M., Kanaujia, P.K., Virdi, J.S. (2015). MALDI-TOF mass spectrometry: an emerging technology for microbial identification and diagnosis. Frontiers in Microbiology, 6, 791.
- Seiffert, S.N., Wuethrich, D., Gerth, Y., Egli, A., Kohler, P., Nolte, O. (2019). First clinical case of KPC-producing Klebsiella michiganensis in Europe. New Microbes and New Infections. https://doi.org/10.1016/j.nmni.2019.100516
- Rahi, P., Prakash, O., Shouche, Y.S. (2016). Matrix-assisted laser desorption/ionization time-of-flight mass-spectrometry (MALDI-TOF MS) based microbial identifications: challenges and scope for microbial ecologists. Frontiers in Microbiology, 7, 1359. doi.org/10.3389/fmicb.2016.01359
- Vandamme, P., Pot, B., Gillis, M., De Vos, P., Kersters, K., Swings, J. (1996). Polyphasic taxonomy, a consensus approach to bacterial systematics. Microbiological Reviews, 60, 407-438.
- Saha, R., Farrance, C.E., Verghese, B., Hong, S., Donofrio, R.S. (2013). Klebsiella michiganensis sp. nov., a new bacterium isolated from a tooth brush holder. Current Microbiology, 66, 72-78.
- USP <1113> Microbial Characterization, Identification and Strain Typing, USP 41-NF 36.