When word of the gut microbiome first emerged, scientists focused on profiling the bacterial species that inhabited the gut. These efforts helped us unearth an extraordinarily diverse group of bacteria, down to the species level, not just in the gut but in organs across the body as well. Since then, we have uncovered even more genetic diversity at the strain level (read more about strains here), both within a single person and between people. Along the way, we also developed ways to characterize the genes being expressed, the proteins being produced, and even the metabolites being circulated in cells.
The mountains of data generated in microbiome research have been instrumental for developing new ways to treat disease and enhance human health. But with so much data available, researchers also need tools and techniques that integrate the data to glean more useful insights.
That’s where machine learning comes in.
Even before the AI craze, researchers had been using AI-based tools to screen for novel therapeutics to combat disease and improve health. Machine learning can identify a host of disease-associated features, such as microbes producing disease-associated metabolites in a patient. It can also help scientists distinguish between people’s microbiomes and delineate the physiological processes carried out by the microbes that dwell in them.
But how can we develop these models? What kinds of data do we need to execute such a study properly? And how can they lead to products that help humanity? That’s where Noah Zimmerman, Chief Technology Officer of Verb Biotics, comes in. I had the pleasure of chatting with him after his talk on this very topic. So read on to learn more about the ways that Verb Biotics is processing big data to learn big things about the gut microbiome.
PN: What prompted you to go beyond genomics to profile the human microbiota? How does doing so help us get a more complete picture of how our bodies work?
NZ: Let me begin by talking a bit about the many components that help cells run. For some context, cells comprise biomolecules that work together to help them function as they should. Scientists can profile these molecules in cells through what we call the ‘omics. There are many of these, but for our discussion, we will delve into four. I also associate each one with a single word to help visualize how the cell works in full at the biochemical level. With that, here are the four ‘omics:
- Genomics (potential): DNA is the basic building block of genes. It contains the information that cells need to be who they are. However, DNA alone cannot tell us what the cells will be at a given point in time. That’s why I believe that genomic profiling gives us a cell’s potential.
- Transcriptomics (likelihood): The first step in the central dogma of genetics is transcription. Here, our cells transcribe the DNA into messenger RNA (mRNA). Most often, the mRNA carries the information for the proteins and other regulatory elements that the cells will produce with their machinery. However, mRNA processing steps can affect what kinds of transcripts are produced and the chances they will be used. That’s why I believe that transcriptomics represents the likelihood of different cellular phenotypes.
- Proteomics (tools): A cell’s proteins make up the machinery that helps it operate as it should. These proteins have many functions, from regulating physiological processes to keeping the cell intact and protecting it from stress. That’s why I consider proteomics the study of the tools that a cell uses to live.
- Metabolomics (function): The metabolites are the manifestations of cellular function. They are the final downstream product of any perturbation that occurs within a cell. Metabolites can be by-products of a chemical reaction or be essential building blocks for cells. Because they indicate a cell’s behaviour, I consider metabolomics the means to study function.
Researchers may be tempted to skip genomics and transcriptomics and characterize activity directly with proteomics and metabolomics. However, metabolites can come and go in mere milliseconds, which makes them hard to capture. Furthermore, next-generation sequencing can be done affordably to obtain useful data on what cells can express.
Most importantly, our cells integrate all four components to survive and grow with each other. It’s this and the many possibilities to integrate big data and interrogate cellular activity that piqued my interest in going beyond genomics to appreciate how our bodies work.
PN: I’m certain that integrating data from the four omics will not be easy. How do you analyze so much data all at once?
NZ: As a preface, I’m a user of the data and the approaches used to analyze them. Flux balance analysis (FBA) is one such approach we employ. If you recall the central dogma of genetics, a protein is typically encoded by a DNA sequence, which is transcribed into an RNA sequence. Thus, we can link a protein and the metabolites it processes with specific genes and transcripts. Then in FBA, we use machine learning tools to build a network of the complete array of metabolic reactions in an organism and map the genes encoding each enzyme that mediates those reactions. These efforts began with single microorganisms but have since expanded to profiling the metabolic activities of diverse microbiotas.
With machine learning efforts, we also run multiple iterations to refine the predictions. Sometimes, we get fluxes that make sense and others that don’t. For the ones that don’t, we can go back to retrain and reassess the metabolic pathways that our models say are taking place. To best know whether our models are robust, we do follow-up experiments in the lab to confirm that those metabolic processes are occurring.
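For readers who want to see the idea in code: FBA is commonly formulated as a linear program that picks reaction fluxes v to maximize an objective (often a biomass or product flux) subject to steady state, S v = 0, and per-reaction bounds. The sketch below is a minimal illustration under stated assumptions, not Verb Biotics' pipeline. The two-metabolite network, the reaction names, and the bounds are all hypothetical, and the optimum is found by brute-force enumeration of vertices so that the example needs no external solver. Real genome-scale models would use a dedicated LP solver instead.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical toy network (not from the interview): 2 metabolites, 4 reactions.
REACTIONS = ["uptake", "convert", "secrete", "biomass"]
# Rows are metabolites (A, B); columns are reactions; entries are stoichiometric coefficients.
S = [
    [1, -1, -1, 0],  # A: produced by uptake, consumed by convert and secrete
    [0, 1, 0, -1],   # B: produced by convert, consumed by biomass
]
BOUNDS = [(0, 10), (0, 6), (0, 1000), (0, 1000)]  # assumed finite (lower, upper) limits
OBJECTIVE = [0, 0, 0, 1]  # maximize the biomass flux


def solve_square(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination; return None if singular."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]


def flux_balance(S, bounds, objective):
    """Maximize objective . v subject to S v = 0 and lower <= v <= upper.

    Assumes S has linearly independent rows and all bounds are finite, so an
    optimum sits at a vertex where (reactions - metabolites) fluxes hit a bound.
    """
    n_met, n_rxn = len(S), len(S[0])
    best_value, best_flux = None, None
    for basic in combinations(range(n_rxn), n_met):
        pinned = [j for j in range(n_rxn) if j not in basic]
        for sides in product((0, 1), repeat=len(pinned)):
            v = [Fraction(0)] * n_rxn
            for j, side in zip(pinned, sides):
                v[j] = Fraction(bounds[j][side])
            # Move the pinned fluxes to the right-hand side, then solve for the rest.
            a = [[S[i][j] for j in basic] for i in range(n_met)]
            b = [-sum(S[i][j] * v[j] for j in pinned) for i in range(n_met)]
            x = solve_square(a, b)
            if x is None:
                continue
            for j, val in zip(basic, x):
                v[j] = val
            if any(not (lo <= val <= hi) for val, (lo, hi) in zip(v, bounds)):
                continue  # this vertex violates a flux bound
            value = sum(c * val for c, val in zip(objective, v))
            if best_value is None or value > best_value:
                best_value, best_flux = value, v
    return best_value, best_flux


value, fluxes = flux_balance(S, BOUNDS, OBJECTIVE)
print("max biomass flux:", float(value))
print({name: float(f) for name, f in zip(REACTIONS, fluxes)})
```

In this toy network the conversion step is capped at 6 units, so the biomass flux cannot exceed 6 no matter how much substrate is taken up. That is the kind of bottleneck a predicted flux distribution can flag for follow-up in the lab.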
PN: Conducting an FBA seems to take a lot of resources and effort. What would one need to best leverage FBAs and other machine learning tools in the lab?
NZ: I would say that having complete data is the most important thing any researcher will need to work with FBAs. Above all, you need complete genomes of any microorganism you anticipate will be present in the environment you’re sampling. I’m not just talking about species-level identification either. Microbes are incredibly diverse within a species, comprising many strains with unique genomic content. That’s why researchers must use complete genome databases that document microorganisms to the strain level.
Once you have a complete genome database, you can then hire the personnel you need to develop FBAs and other machine-learning models to profile potential microbial interactions and metabolic pathways. Along the pipeline, you would also need to collect longitudinal data. This way, scientists can better understand how cells and organisms interact with each other over time.
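To make the strain-level point concrete, here is a minimal sketch of how gene content can decide which reactions a model is allowed to use. Everything in it is an assumption for illustration: the gene names, the strain names, and the gene-to-reaction map are hypothetical and do not come from any real database.

```python
# Hypothetical map: each reaction lists the genes whose enzymes it requires.
REACTION_GENES = {
    "uptake": {"geneA"},
    "convert": {"geneB", "geneC"},  # assumed to need both enzyme subunits
    "biomass": {"geneD"},
}

# Hypothetical strains of one species with different gene content.
STRAIN_GENOMES = {
    "strain_1": {"geneA", "geneB", "geneC", "geneD"},
    "strain_2": {"geneA", "geneB", "geneD"},  # lacks geneC
}


def available_reactions(genome, reaction_genes):
    """Return the reactions whose required genes are all present in the genome."""
    return sorted(r for r, genes in reaction_genes.items() if genes <= genome)


for strain, genome in STRAIN_GENOMES.items():
    print(strain, available_reactions(genome, REACTION_GENES))
```

Two strains that look identical at the species level end up with different reaction sets here, which is why an incomplete or species-only genome database could lead a model to predict pathways a given strain cannot actually run.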
PN: You have such a complex workflow, but I can see it being needed to analyze so much data all at once. Have your efforts helped generate a candidate biotic for enhancing well-being? Tell me more about the fruits of your efforts!
NZ: I’m happy to say that we’re in the process of launching our products now. Among these products are strains that may improve sleep quality and reduce feelings of anxiety and stress. They do so by producing a neurotransmitter, a class of molecules that carries chemical messages along our nervous systems. In doing so, these strains operate through our gut-brain axis, a hot topic of discussion since it links the gut microbiome with our mental health. Most impressively, the strains we identified produce about three times the amount of neurotransmitter as the leading strains on the market. As such, we are seeking to further develop these strains through clinical trials. That way, we can confirm that the strains work as they should in humans before we launch the product.
Beyond the strains we’re working on, I’m certain that machine learning tools such as FBAs can help us refine the groups of people to whom biotics will be most useful. For example, we could tailor a biotic to people who adopt specific diets, such as vegetarians or people eating a typical Western diet. We can see whether some cohorts or populations respond better to treatments than others before we send the product through a clinical trial.
PN: Congratulations on the progress you’ve made at Verb Biotics with machine-learning models! What were you hoping your listeners took away after your talk at the summit?
NZ: The biggest takeaway for my listeners is that now is the time we can actually build functionally important biotics for consumers. Before, we could only have dreamed of generating and actually using so much bioinformatics data to comprehensively profile the human microbiome. Now that we have the data, it’s a matter of integrating all the dry lab and wet lab data to extract useful information and refine the biotics that we develop to produce health benefits.