What's Big about Small Proteins?

Something interesting has been happening in the world of microbiome research, and it’s all about small proteins.

What’s New?

There was a paper in my weekly roundup of microbiome publications which caught my eye:

Petruschke, H., Schori, C., Canzler, S. et al. Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome. Microbiome 9, 55 (2021).

Reading through the abstract, the authors have “a particular focus on the discovery of novel small proteins with less than 100 amino acids.” While this may seem to be a relatively innocuous statement, I was very interested to see what they found because of some recent innovations in the computational approaches used to study the microbiome.

What’s the Context?

When people study the microbiome, they often only have access to the genome sequences of the bacteria that are present. This is very much the case for the type of metagenomic analysis I focus on, as it is for any approach that takes advantage of the massive amounts of data genome sequencing instruments can generate.

When analyzing bacterial genomes, we are able to predict what genes are contained in each genome using annotation tools designed for this purpose. The most commonly used tool for this task is Prokka, made by Torsten Seemann. Recently, researchers have started to realize that there are some bacterial proteins which were being missed by these types of approaches, since the experimental data used to build the predictive models did not include a whole collection of small proteins.
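To make the idea concrete, here is a toy sketch of how a length cutoff can hide short genes. This is not how Prokka or any real gene caller works: the naive forward-strand scan, the function names `find_orfs` and `split_by_length`, and the 100 amino acid cutoff are all assumptions I am making purely for illustration.

```python
# Toy illustration only: a naive forward-strand ORF scan.
# Real annotation tools use trained statistical models, not this logic.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna):
    """Return (start, end, n_amino_acids) for each ATG...stop ORF, forward strand only."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                n_aa = (i - start) // 3  # codons before the stop, start codon included
                orfs.append((start, i + 3, n_aa))
                start = None
    return orfs


def split_by_length(orfs, min_aa=100):
    """Split ORFs into those at or above a (hypothetical) length cutoff and those below it."""
    kept = [orf for orf in orfs if orf[2] >= min_aa]
    dropped = [orf for orf in orfs if orf[2] < min_aa]
    return kept, dropped


# A made-up sequence containing one short ORF followed by one long ORF.
small_gene = "ATG" + "GCT" * 10 + "TAA"
large_gene = "ATG" + "GCT" * 120 + "TGA"
kept, dropped = split_by_length(find_orfs(small_gene + large_gene))
print("kept:", kept)
print("dropped:", dropped)
```

In this made-up example the short ORF lands in `dropped`, even though nothing about its sequence says it is not a real gene. The point is only that whatever a pipeline is built or trained to ignore will never show up in the results.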

Then, in 2019 Dr. Ami Bhatt’s group at Stanford published a high-profile paper making the case that microbiome analyses were systematically omitting small bacterial proteins:

Sberro H, Fremin BJ, Zlitni S, Edfors F, Greenfield N, Snyder MP, Pavlopoulos GA, Kyrpides NC, Bhatt AS. Large-Scale Analyses of Human Microbiomes Reveal Thousands of Small, Novel Genes. Cell. 2019 Aug 22;178(5):1245-1259.e14. doi: 10.1016/j.cell.2019.07.016. Epub 2019 Aug 8. PMID: 31402174; PMCID: PMC6764417.

Around the same time, other groups were publishing studies that used different experimental approaches and supported the same idea: bacteria encode these small genes, and the genes are transcribed and translated into bona fide proteins (a few quick examples).

What’s the Point?

I think this story is worth mentioning because it shines light on part of the foundation of microbiome research. When we conduct a microbiome experiment, we can only make a limited number of measurements. We then do the best job we can to infer the biological features that are relevant to our experimental question. Part of the revolution in microbiome research over the last ten years has been the explosion of metagenomic data now available. This research is particularly interesting because it shows us how our analysis of that data may have been missing an entire class of genetic elements: genes that encode proteins fewer than 100 amino acids in length.

At the end of the day, the message is a positive one: with improved experimental techniques we can now generate more useful and accurate data from existing datasets. I am looking forward to seeing what we are able to find as the field continues to explore this new area of the microbiome!