Journal Club

Journal Club: Metagenomic binning and association of plasmids with bacterial host genomes using DNA methylation

Amid the holiday rush last month, I was gratified to see a publication describing a new computational method that had been on my mind. 

The paper is "Metagenomic binning and association of plasmids with bacterial host genomes using DNA methylation", published in Nature Biotechnology on December 11, 2017. 

The Concept

Bacteria do something funny to their DNA that you may not have heard of. It's called covalent DNA modification, and it often consists of adding a small methyl group to the DNA molecule itself. This added methyl group (or hydroxyl, or hydroxymethyl, etc.) doesn't interfere with the normal operation of DNA (making RNA, and more DNA). Instead it serves as a marker that differentiates self-DNA from invading DNA (such as from a phage, plasmid, etc.).

For example, one species may methylate the motif ACCGAG (at one specific base within it), while another might methylate CTGCAG. If the first species encounters an ACCGAG motif that lacks the appropriate methyl group, it treats it as invading DNA and chews it up.
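To make that self versus non-self check concrete, here is a toy sketch in Python. The sequence, the choice of which base in the motif carries the methyl group (`meth_offset`), and the function name are all illustrative assumptions, and it only scans one strand.

```python
def unmethylated_sites(seq, motif, meth_offset, methylated_positions):
    """Return start positions of motif occurrences whose target base lacks a methyl mark."""
    sites = []
    start = seq.find(motif)
    while start != -1:
        # The base expected to carry the methyl group sits meth_offset into the motif.
        if start + meth_offset not in methylated_positions:
            sites.append(start)
        start = seq.find(motif, start + 1)
    return sites


# Hypothetical sequence with two ACCGAG sites (starting at positions 2 and 10).
dna = "TTACCGAGGGACCGAGTT"

# "Self" DNA: both sites are methylated, so nothing is flagged.
self_hits = unmethylated_sites(dna, "ACCGAG", 4, {6, 14})

# "Invading" DNA: the second site is unmethylated, so it is flagged for cutting.
foreign_hits = unmethylated_sites(dna, "ACCGAG", 4, {6})

print(self_hits, foreign_hits)
```

A real restriction-modification system also checks the reverse strand and cuts at a defined position relative to the motif; this sketch only captures the recognition logic.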

The ongoing arms race with mobile DNA elements has helped maintain a diversity of DNA modifications, such that many different species of bacteria each have a distinctive methylation profile. 

The interesting trick here is that we now have a way to read out the methylation patterns in DNA in addition to the sequence. This is most notably available via PacBio sequencing, which typically will generate a smaller number of much longer sequence fragments than other genome sequencing methods. Bringing it all together, the authors of this paper were able to use the methylation patterns from PacBio sequencing to much more accurately reconstruct the microbes present in a set of environmental samples (such as the human microbiome). 

The Data

The approach of the authors was to first assemble the PacBio sequences, and then bin those larger genome fragments together based on a common epigenetic signature. 
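As a rough illustration of what "binning on a common epigenetic signature" can look like, here is a minimal sketch. It is not the authors' algorithm: I am assuming each contig can be summarized as the fraction of sites methylated for each motif, and I am swapping in a simple greedy distance threshold for whatever clustering the paper actually uses. The motifs, counts, and threshold are made up.

```python
import math


def methylation_profile(motif_counts, motifs):
    """Fraction of each motif's occurrences that are methylated on one contig."""
    profile = []
    for m in motifs:
        methylated, total = motif_counts.get(m, (0, 0))
        profile.append(methylated / total if total else 0.0)
    return profile


def bin_contigs(profiles, max_dist=0.5):
    """Greedy binning: a contig joins the first bin whose seed profile is within max_dist."""
    bins = []  # list of (seed_profile, [contig names])
    for name, prof in profiles.items():
        for seed, members in bins:
            if math.dist(seed, prof) <= max_dist:
                members.append(name)
                break
        else:
            bins.append((prof, [name]))
    return [members for _, members in bins]


motifs = ["ACCGAG", "CTGCAG"]
# Hypothetical (methylated sites, total sites) per motif for three contigs.
contigs = {
    "contig_1": {"ACCGAG": (48, 50), "CTGCAG": (1, 40)},
    "contig_2": {"ACCGAG": (29, 30), "CTGCAG": (0, 25)},
    "contig_3": {"ACCGAG": (2, 60), "CTGCAG": (57, 60)},
}
profiles = {name: methylation_profile(c, motifs) for name, c in contigs.items()}
result = bin_contigs(profiles)
print(result)
```

Contigs 1 and 2 share a signature (ACCGAG methylated, CTGCAG not) and land in one bin, while contig 3 has the opposite pattern and gets its own.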

Figure 2c shows the binning of genome fragments in a sample containing a known mixture of bacteria.

Figure 2e shows the binning of genome fragments in a gut microbiome sample containing a mixture of unknown bacteria (and viruses, fungi, etc.).

In general, this seems like a very interesting approach. After my read of the paper, it appears that the bacteria present in the microbiome contain a distinct enough set of epigenetic patterns to enable the deconvolution of many different species and strains. I look forward to seeing how this method stacks up against other binning approaches in the future. 

Final note

For those of you interested in phage and mobile genetic elements, I wanted to point out that the authors also explore a topic that has been studied somewhat by others – the linkage of phage to their hosts via epigenetic signatures. The idea here is that it can be computationally difficult to match a phage genome or plasmid with its host. One experimental method that accomplishes this is Hi-C, which takes advantage of the physical proximity of the mobile element to the host genome within an intact cell. In contrast, the PacBio method does not require intact cells, and can be used to link phage or plasmids with their host based on a shared epigenetic signature. 
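The host-linking idea can be sketched in a few lines: given a methylation profile for a plasmid and one for each candidate host, pick the host with the most similar profile. This is a simplified illustration under my own assumptions (profiles as per-motif methylated fractions, Euclidean distance as the similarity measure, invented numbers), not the scoring used in the paper.

```python
import math


def closest_host(plasmid_profile, host_profiles):
    """Return the host whose methylation profile is nearest to the plasmid's."""
    return min(host_profiles, key=lambda h: math.dist(host_profiles[h], plasmid_profile))


# Hypothetical per-motif methylated fractions for two host genome bins.
host_profiles = {
    "host_A": [0.95, 0.02, 0.01],
    "host_B": [0.03, 0.90, 0.88],
}
plasmid = [0.05, 0.85, 0.93]

best = closest_host(plasmid, host_profiles)
print(best)
```

In practice you would also want a cutoff so that a plasmid with no convincing match is left unassigned rather than forced onto the nearest host.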

My hope is that this type of data starts to become more widely available. There are clearly a number of computational tools that need to be refined in order to get full use of all this information, but it does seem to hold a good deal of promise. 

Journal Club: A role for bacterial urease in gut dysbiosis and Crohn’s disease

It's nice when you see a new paper describing the type of science that you find the most exciting and relevant, and doubly so when it's written by a group of people that you know and respect. Full disclosure: this paper is from the group that I did my graduate work with at the University of Pennsylvania, and there's quite a lot to talk about. 

Ni, J. et al. (2017) ‘A role for bacterial urease in gut dysbiosis and Crohn’s disease’, Science Translational Medicine, 9(416), p. eaah6888. doi: 10.1126/scitranslmed.aah6888.

Building a model for microbiome research

Before talking about what the authors did, I want to describe how they did it. They started (in a previous study) with a cohort of patients being treated for a common disease with no known cure (Crohn's) and looked for differences in the genomic content of their microbiome compared to healthy control subjects. That initial analysis indicated that one particular metabolic pathway was enriched in patients with the disease. In this study they followed up on that finding by analyzing the metabolites produced by those microbes. The combination of those two orthogonal analyses (genomic DNA and fecal metabolites) suggested that microbes in the gut were producing one particular chemical that may have a role in disease. To test that hypothesis they moved into a mouse model where they could run controlled experiments, not only adding and removing microbes but also tracking the passage of metabolites via isotopic labelling. That model system provided crucial data that supported the findings from the human clinical samples – that a specific bacterial enzyme may have a role in human disease. After this study, I can only imagine that the next step would be to move back into humans and see if that target can be used to generate a useful therapeutic. 

The microbiome field is relatively new, and I believe that its first batch of groundbreaking therapeutics is just over the horizon. I wouldn't be surprised if the most influential studies in this field follow a trajectory that is similar to what I describe above: generating hypotheses in humans, testing them in animal models, and moving back to humans to test and deliver therapeutics. 

Ok, enough predictions, let's get into the paper.

The Microbiome – Who's Who and What's What

Fig. 2. Associations between bacterial taxa abundance ascertained by fecal shotgun metagenomic sequencing and the fecal metabolome in healthy pediatric subjects and those with Crohn’s disease.

The figure above shows the association between who is in the microbiome (bacterial genera, vertical axis) and what is in the microbiome (metabolites, horizontal axis). The point here is that certain microbes are associated with certain metabolites, with a big focus on the amino acids that are being produced. The prior set of experiments suggested that the microbes in Crohn's patients had an increased capacity for producing amino acids, while this figure (and Figure 1) goes further to show that there is a subset of microbes associated with higher actual levels of amino acids in those subjects.
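If you have not worked with this kind of taxon-by-metabolite association before, each cell of a heatmap like this boils down to a correlation between one genus's abundance and one metabolite's level across subjects. Here is a minimal sketch using a Spearman rank correlation; I am assuming a rank-based statistic for illustration (I have not checked which statistic the authors used), and the numbers are invented.

```python
def ranks(values):
    """Rank values from 1..n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(x, y):
    """Spearman rank correlation, valid when there are no tied values."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n**2 - 1))


# Hypothetical data: relative abundance of one genus and fecal lysine level in five subjects.
genus_abundance = [0.01, 0.05, 0.20, 0.10, 0.40]
lysine_level = [1.2, 2.0, 6.5, 3.1, 5.9]

rho = spearman(genus_abundance, lysine_level)
print(round(rho, 2))
```

Repeating this for every genus and metabolite pair, and correcting for the number of tests, gives the grid of associations.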

Tracking metabolism in the microbiome

Here's a deceptively simple figure describing a powerful finding. 

Fig. 3. In vivo heavy isotope assays using 15N-labeled urea to determine the effect of bacterial urease on nitrogen flux in the murine gut microbiota.

This experiment uses urea labelled with the heavy isotope 15N to measure the production of lysine by microbes in the gut. The central question here is: what is producing the extra amino acids in Crohn's disease? In this experiment the authors added the 15N-labelled urea so that they could track how much lysine was being produced from that urea. Crucially, they found that adding either antibiotics or a defined set of microbes (called "ASF") reduced the amount of lysine that was produced from that urea, which supports the hypothesis that microbes in the gut are directly metabolizing urea. 

Tying it all together

Fig. 6. Effect of E. coli urease on colitis in a T cell adoptive transfer mouse model of colitis

I've skipped over a lot of interesting control experiments so that I could get to the grand finale. Everything I've told you up until now has established that (a) Crohn's patients have higher levels of certain amino acids in their stool, (b) those high amino acid levels are associated with particular bacteria, and (c) bacteria in the gut are able to produce amino acids from urea. To bring it all together the authors went to a mouse model of colitis to see whether adding or removing a single gene would have an effect on disease. They found that E. coli with urease (an enzyme that metabolizes urea) caused significantly more disease than E. coli without urease. This brings it all back to the action of a single gene on a model of human disease, which is the classic goal of reductionist molecular biology, but the gene of interest is encoded by the human microbiome. 

I think that's pretty cool. 

From here it's easy to imagine that people might be interested in designing a drug that targets microbial urease in order to reduce human disease, although that's got to be a pretty difficult task and I have no idea how diverse bacterial urease enzymes are. 

Bringing it back to my first point, this seems like the best case example of how to advance our understanding of the human microbiome with the goal of treating human disease. I hope to see many more like it in the years to come.

Journal Club: Strains, functions and dynamics in the expanded Human Microbiome Project

I must have been really busy these last few weeks to have gone so long without posting about this paper. For anyone interested in the microbiome this is a hugely important paper to read. If you want a more comprehensive summary, you can find a few in the popular press.

At the risk of rambling on and on, I want to talk about a few things from this paper that really caught my eye. Let's start with Figure 1a.

Figure 1a: Personalization, niche association, and reference genome coverage in strain-level metagenomic profiles. a, Mean phylogenetic divergences (ref. 17) between strains of species with sufficient coverage at each targeted body site (minimum 2 strain pairs)

The statistic being plotted is the "mean distance of strains," meaning that they computed the exact genome sequence of each of the strains of all of the dominant organisms in every sample(!) and then calculated how different the strains within each species were between different samples. That process is (in my opinion) a difficult task to pull off well, and I find myself in the position once again of being very much in awe of the Huttenhower group for their skill and hard work. Ok, now how about the biology? This figure tells us that people harbor strains of microbes in their microbiome that are distinct from other people's strains, and that those strains stick around over time to some degree. Not only that, but the degree to which those strains stick around varies by body site, with the stool (and gut) likely having the most persistent set of strains. 
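To give a feel for the kind of comparison behind that statistic, here is a toy sketch that contrasts the mean distance between strains from the same person with the mean distance between strains from different people. I am assuming short aligned sequences and a simple fraction-of-mismatches distance; the paper uses phylogenetic divergence, and the subjects and sequences below are invented.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_distances(samples):
    """samples: list of (subject, sequence). Returns (within-subject mean, between-subject mean)."""
    within, between = [], []
    for (s1, q1), (s2, q2) in combinations(samples, 2):
        (within if s1 == s2 else between).append(hamming_fraction(q1, q2))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Hypothetical strain sequences for one species, two time points per subject.
samples = [
    ("subject_1", "ACGTACGTAC"),
    ("subject_1", "ACGTACGTAT"),
    ("subject_2", "ACGAACGTTC"),
    ("subject_2", "ACGAACGTTC"),
]

within_mean, between_mean = mean_distances(samples)
print(within_mean, between_mean)
```

A within-person mean that is much smaller than the between-person mean is the signature of personalized, persistent strains.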

Ok, so now we've covered the fact that people have different strains of the same microbial species in their microbiomes, so let's go into a bit more depth with more of figure 1.

Figure 1, continued. b, Individuals tended to retain personalized strains, as visualized by a principal coordinates analysis (PCoA) plot for Actinomyces sp. oral taxon 448, in which lines connect samples from the same individual. d, PCoA showing niche association of Haemophilus parainfluenzae, showing subspecies specialization to three different body sites. e, PCoA for Eubacterium siraeum. 

Let's break this out:

  • Figure 1b: The exact strain of (one of the) bacteria in your dental plaque sticks around from day to day, even though people are (presumably) brushing their teeth!
  • Figure 1d: A bacterial species that is found all over the body, H. parainfluenzae, is genetically distinct (to some degree) by body site. Most intriguingly, it is only partially distinct by body site, which raises all kinds of questions about its evolutionary history.
  • Figure 1e: An organism that we call a single species (E. siraeum) seems to form three completely distinct genetic groupings. Note that the horizontal axis accounts for ~50% of the total genetic variation. That's huge. Is it a single species? Is it three? What is a "species"? Does it matter?

Reflections:

I don't want to try to sum up this paper with a single take-home message. I think this is a paper to read and reread and think about. However, there are a few aspects of the methods used that I want to point out for those who may not think about this type of analysis very often. The first is that the authors defined a single strain for each sample (using StrainPhlAn). Do we think that there is only one strain of each species present at a single time? How could we test that hypothesis or even deal with a sample containing multiple, closely related strains? The next is that their most in-depth characterization of strain differences hinged on comparing the samples to known reference genomes. What about the variation that has never been captured in a reference genome? How would we even approach that data?

Lastly, I'll say that I really think this work is important because I think that strain-level variability in the microbiome is a crucial factor in human health and disease. This paper provides strong evidence that strain-level variation is extensive, and the authors have provided powerful tools for characterizing that variation. The next question is, how do we apply this type of data in a way that uncovers the biological mechanisms underlying human health? In other words, how do we use microbiome profiling to generate some experimentally testable hypotheses? It seems so clear to me that this avenue of research is going to uncover important biological mechanisms of the human microbiome, but I can also see that we're going to need some creative, collaborative problem solving in order to realize this system's full potential. 

UPDATE:

On October 30, 2017, the Pollard Lab posted a preprint in which they specifically tackle the challenge of analyzing multiple strains per species in a given sample. You can read the preprint here. There's a lot of detail there that really deserves its own blog post, so stay tuned.

Journal Club: Potential role of intratumor bacteria in mediating tumor resistance to the chemotherapeutic drug gemcitabine

The biggest reason that I was struck by this paper was not the specific finding that they made (as important as that may be), but rather the general theme that it supports: that microbes impact human health via the specific genes that are encoded by individual strains. 

The (very) short summary I would give for this paper is that the presence of specific bacterial strains decreases the effectiveness of an anticancer drug (gemcitabine) because those bacterial strains encode an enzyme that chemically modifies and deactivates that drug. The authors did a lot of careful and intricate work demonstrating that these bacteria are present within tumors, and that the effect can be linked to one particular enzyme. The figure below presents some microscopy used to detect bacteria in human tumor samples.

Fig. 4. Characterization of bacteria in human pancreatic ductal adenocarcinomas.

After acknowledging the hard work done by the authors, and the significant contribution that this paper will undoubtedly make to the field, I'd like to point out a larger theme. The authors found that a particular enzyme was responsible for this phenotype, which was the decreased efficacy of an anticancer drug. However, they absolutely did not find that this enzyme was encoded by a particular genus or species of bacteria. Instead, the enzyme was found to be present and active sporadically across the entire range of Proteobacteria, a phylum found commonly across the human microbiome. In other words, the name that we give to bacteria (their taxonomic identity) is not as important as the particular set of genes in their genome.
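Here is a small sketch of why a taxonomic label is a poor stand-in for gene content. The genus names, strain list, and gene calls are entirely hypothetical; the point is only that when a gene is carried by some strains and not others within the same genus, knowing the genus does not tell you whether the drug-inactivating enzyme is there.

```python
from collections import defaultdict


def carriage_by_taxon(strains):
    """Map each taxon to (number of strains carrying the gene, total strains)."""
    counts = defaultdict(lambda: [0, 0])
    for taxon, has_gene in strains:
        counts[taxon][0] += int(has_gene)
        counts[taxon][1] += 1
    return {taxon: tuple(c) for taxon, c in counts.items()}


# Hypothetical (genus, carries_enzyme_gene) calls for six strains.
strains = [
    ("genus_X", True),
    ("genus_X", False),
    ("genus_X", True),
    ("genus_Y", False),
    ("genus_Y", True),
    ("genus_Z", False),
]

carriage = carriage_by_taxon(strains)
# A taxon is "mixed" if some, but not all, of its strains carry the gene.
mixed = [t for t, (n_with, n_total) in carriage.items() if 0 < n_with < n_total]
print(carriage, mixed)
```

For the mixed genera, only strain-level gene content (not the name) predicts the phenotype.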

The world of microbiome research is more exciting than ever because of findings just like this one. We know that microbes are important for our health, and studies like this are starting to show us exactly why that is.