diff --git a/kb/communities/Cheese_Rind_InSitu_InVitro_Model_Community.yaml b/kb/communities/Cheese_Rind_InSitu_InVitro_Model_Community.yaml
index 401a1f41..2baab3ea 100644
--- a/kb/communities/Cheese_Rind_InSitu_InVitro_Model_Community.yaml
+++ b/kb/communities/Cheese_Rind_InSitu_InVitro_Model_Community.yaml
@@ -206,4 +206,79 @@ associated_datasets:
     evidence_source: IN_VIVO
     snippet: Sequencing of 137 different rind communities across 10 countries
     explanation: Supports the dataset scope and sampling scale.
+related_ingredients:
+- preferred_term: valine
+  chebi_term:
+    id: CHEBI:27266
+    label: valine
+  relevance: >
+    Branched-chain amino acid degraded by the rind community, contributing to
+    volatile (sweaty) aroma compounds.
+  evidence:
+  - reference: PMID:25036636
+    supports: SUPPORT
+    evidence_source: COMPUTATIONAL
+    snippet: Pathways for valine, leucine, and isoleucine degradation
+    explanation: Names valine among the branched-chain amino acids degraded by the community.
+- preferred_term: leucine
+  chebi_term:
+    id: CHEBI:25017
+    label: leucine
+  relevance: >
+    Branched-chain amino acid degraded by the rind community, contributing to aroma.
+  evidence:
+  - reference: PMID:25036636
+    supports: SUPPORT
+    evidence_source: COMPUTATIONAL
+    snippet: Pathways for valine, leucine, and isoleucine degradation
+    explanation: Names leucine among the branched-chain amino acids degraded by the community.
+- preferred_term: isoleucine
+  chebi_term:
+    id: CHEBI:24898
+    label: isoleucine
+  relevance: >
+    Branched-chain amino acid degraded by the rind community, contributing to aroma.
+  evidence:
+  - reference: PMID:25036636
+    supports: SUPPORT
+    evidence_source: COMPUTATIONAL
+    snippet: Pathways for valine, leucine, and isoleucine degradation
+    explanation: Names isoleucine among the branched-chain amino acids degraded by the community.
+- preferred_term: cysteine
+  chebi_term:
+    id: CHEBI:15356
+    label: cysteine
+  relevance: >
+    Sulfur-containing amino acid whose metabolism by the community yields pungent
+    volatile sulfur compounds characteristic of cheese rinds.
+  evidence:
+  - reference: PMID:25036636
+    supports: SUPPORT
+    evidence_source: COMPUTATIONAL
+    snippet: Cysteine and methionine metabolism
+    explanation: Names cysteine among the sulfur amino acids metabolized for aroma.
+- preferred_term: methionine
+  chebi_term:
+    id: CHEBI:16811
+    label: methionine
+  relevance: >
+    Sulfur-containing amino acid metabolized by the community to volatile sulfur compounds.
+  evidence:
+  - reference: PMID:25036636
+    supports: SUPPORT
+    evidence_source: COMPUTATIONAL
+    snippet: Cysteine and methionine metabolism
+    explanation: Names methionine among the sulfur amino acids metabolized for aroma.
+- preferred_term: methanethiol
+  chebi_term:
+    id: CHEBI:16007
+    label: methanethiol
+  relevance: >
+    Volatile sulfur compound enriched in these rind communities, a key cheese aroma product.
+  evidence:
+  - reference: PMID:25036636
+    supports: SUPPORT
+    evidence_source: COMPUTATIONAL
+    snippet: methanethiol, are enriched in these communities
+    explanation: Anchors methanethiol as a volatile sulfur product of the rind community.
 metal_relevance: NOT_APPLICABLE
diff --git a/kb/communities/hCom2_Complex_Gut_Microbiome.yaml b/kb/communities/hCom2_Complex_Gut_Microbiome.yaml
index 6e767cb2..f933350b 100644
--- a/kb/communities/hCom2_Complex_Gut_Microbiome.yaml
+++ b/kb/communities/hCom2_Complex_Gut_Microbiome.yaml
@@ -57,3 +57,127 @@ environmental_factors:
     colonization resistance.
   evidence:
   - *id001
+related_ingredients:
+- preferred_term: arginine
+  chebi_term:
+    id: CHEBI:29016
+    label: arginine
+  relevance: >
+    Arginine is fermented by hCom2 members via the arginine deiminase pathway,
+    a defined amino-acid-utilization route in the community.
+  evidence:
+  - reference: PMID:36070752
+    supports: SUPPORT
+    evidence_source: IN_VITRO
+    snippet: conversion of arginine to ornithine plus CO2 and two equivalents of ammonium
+    explanation: Anchors arginine as the substrate of the community's arginine deiminase pathway.
+- preferred_term: ornithine
+  chebi_term:
+    id: CHEBI:18257
+    label: ornithine
+  relevance: >
+    Ornithine is the product of arginine catabolism via the arginine deiminase pathway in hCom2.
+  evidence:
+  - reference: PMID:36070752
+    supports: SUPPORT
+    evidence_source: IN_VITRO
+    snippet: conversion of arginine to ornithine plus CO2 and two equivalents of ammonium
+    explanation: Anchors ornithine as the product of arginine deiminase activity.
+- preferred_term: carbon dioxide
+  chebi_term:
+    id: CHEBI:16526
+    label: carbon dioxide
+  relevance: >
+    CO2 is released during arginine fermentation by the community.
+  evidence:
+  - reference: PMID:36070752
+    supports: SUPPORT
+    evidence_source: IN_VITRO
+    snippet: conversion of arginine to ornithine plus CO2 and two equivalents of ammonium
+    explanation: Anchors CO2 as a product of the arginine deiminase pathway.
+- preferred_term: ammonium
+  chebi_term:
+    id: CHEBI:28938
+    label: ammonium
+  relevance: >
+    Ammonium is generated during arginine fermentation by the community.
+  evidence:
+  - reference: PMID:36070752
+    supports: SUPPORT
+    evidence_source: IN_VITRO
+    snippet: conversion of arginine to ornithine plus CO2 and two equivalents of ammonium
+    explanation: Anchors ammonium as a product of the arginine deiminase pathway.
+- preferred_term: methionine
+  chebi_term:
+    id: CHEBI:16811
+    label: methionine
+  relevance: >
+    Methionine is among the amino acids whose removal affected growth of hCom2 strains,
+    indicating community amino-acid utilization.
+  evidence:
+  - reference: PMID:36070752
+    supports: SUPPORT
+    evidence_source: IN_VITRO
+    snippet: methionine, histidine, isoleucine, arginine, valine, and tyrosine removal
+    explanation: Names methionine among the amino acids hCom2 strains depend on.
+- preferred_term: histidine
+  chebi_term:
+    id: CHEBI:27570
+    label: histidine
+  relevance: >
+    Histidine is among the amino acids whose removal affected growth of hCom2 strains.
+  evidence:
+  - reference: PMID:36070752
+    supports: SUPPORT
+    evidence_source: IN_VITRO
+    snippet: methionine, histidine, isoleucine, arginine, valine, and tyrosine removal
+    explanation: Names histidine among the amino acids hCom2 strains depend on.
+- preferred_term: isoleucine
+  chebi_term:
+    id: CHEBI:24898
+    label: isoleucine
+  relevance: >
+    Isoleucine is among the amino acids whose removal affected growth of hCom2 strains.
+  evidence:
+  - reference: PMID:36070752
+    supports: SUPPORT
+    evidence_source: IN_VITRO
+    snippet: methionine, histidine, isoleucine, arginine, valine, and tyrosine removal
+    explanation: Names isoleucine among the amino acids hCom2 strains depend on.
+- preferred_term: valine
+  chebi_term:
+    id: CHEBI:27266
+    label: valine
+  relevance: >
+    Valine is among the amino acids whose removal affected growth of hCom2 strains.
+  evidence:
+  - reference: PMID:36070752
+    supports: SUPPORT
+    evidence_source: IN_VITRO
+    snippet: methionine, histidine, isoleucine, arginine, valine, and tyrosine removal
+    explanation: Names valine among the amino acids hCom2 strains depend on.
+- preferred_term: tyrosine
+  chebi_term:
+    id: CHEBI:18186
+    label: tyrosine
+  relevance: >
+    Tyrosine is among the amino acids whose removal affected growth of hCom2 strains.
+  evidence:
+  - reference: PMID:36070752
+    supports: SUPPORT
+    evidence_source: IN_VITRO
+    snippet: methionine, histidine, isoleucine, arginine, valine, and tyrosine removal
+    explanation: Names tyrosine among the amino acids hCom2 strains depend on.
+- preferred_term: bile acid
+  chebi_term:
+    id: CHEBI:3098
+    label: bile acid
+  relevance: >
+    Primary and secondary bile acids in the gut are transformed by the community;
+    fecal bile-acid levels were used to compare hCom2 to a native microbiome.
+  evidence:
+  - reference: PMID:36070752
+    supports: SUPPORT
+    evidence_source: IN_VIVO
+    snippet: primary and secondary bile acid levels in feces
+    explanation: Anchors bile acids as community-transformed gut metabolites.
diff --git a/references_cache/PMID_25036636.md b/references_cache/PMID_25036636.md
index 00ca2c8e..c0f465cf 100644
--- a/references_cache/PMID_25036636.md
+++ b/references_cache/PMID_25036636.md
@@ -24,3 +24,8 @@ Key snippets used in curated records:
 - "62% final moisture of the medium compared to 83% in plates that were not dried"
 - "fresh cheeses have a relatively low pH before rind development occurs"
 - "onto cheese curd agar"
+
+
+Full text (re-fetched 2026-06-17 via NCBI BioC PMC):
+
+Cheese rind communities provide tractable systems for in situ and in vitro studies of microbial diversity SUMMARY Tractable microbial communities are needed to bridge the gap between observations of patterns of microbial diversity and mechanisms that can explain these patterns. We developed cheese rinds as model microbial communities by characterizing in situ patterns of diversity and by developing an in vitro system for community reconstruction. Sequencing of 137 different rind communities across 10 countries revealed 24 widely distributed and culturable genera of bacteria and fungi as dominant community members. Reproducible community types formed independent of geographic location of production. Intensive temporal sampling demonstrated that assembly of these communities is highly reproducible. Patterns of community composition and succession observed in situ can be recapitulated in a simple in vitro system.
Widespread positive and negative interactions were identified between bacterial and fungal community members. Cheese rind microbial communities represent an experimentally tractable system for defining mechanisms that influence microbial community assembly and function. INTRODUCTION While the importance of microbial communities for ecosystem function and human health is becoming increasingly clear, the task of dissecting the formation and function of these communities remains extremely difficult. Microbial communities are often challenging to manipulate experimentally due to high species diversity, low culturability, and an inability to easily simulate their natural environment. As a result, the mechanisms that underlie the assembly of microbial communities remain elusive. Thus, in addition to advances in the direct study of complex microbial communities in situ, identification and characterization of experimentally tractable model ecosystems could facilitate work towards a mechanistic understanding of community formation in much the same way that the study of model organisms such as Escherichia coli and Saccharomyces cerevisiae has allowed mechanistic insight into molecular and cellular biology. One set of potential model ecosystems are the multi-species microbial communities that form during the production of fermented foods. Foods such as beer, wine, bread, pickled vegetables, chocolate, and cheese all involve the reproducible metabolism of substrates by microbial communities (as reviewed in). These communities often form under controlled conditions in discrete units, which allow for the measurement and manipulation of migration into the community, environmental conditions, and growth substrates. Many replicate communities are produced and are easily sampled at various stages, which can allow study of temporal dynamics of community formation.
Finally, since these communities are reproducibly cultivated on a known substrate, conditions for isolating community members and recreating community formation in the lab can be designed to closely resemble conditions in situ. In the production of traditionally-aged cheeses, a biofilm, commonly known as a rind, forms on the surface of the cheese as it ages (Figures 1 and S1). Previous work has provided a preliminary view of the microbial diversity of rinds from a few artisan cheeses; these rinds are made up of a collection of bacterial and fungal species that come from raw milk, starter cultures added by the cheesemakers, the aging environment, and in some cases, unknown sources. Rind biofilms have similar properties to the multi-species biofilms that colonize the surfaces of diverse environments, and provide an opportunity to study the processes and mechanisms involved in multi-species biofilm formation. Because humans manipulate both stochastic (e.g. dispersal) and deterministic mechanisms (e.g. biotic and abiotic factors) of cheese rind microbial community assembly, we hypothesized that these communities could be developed for the study of microbial diversity in situ and experimental dissection of patterns of diversity in vitro. Here we present a large-scale in situ characterization and in vitro reconstruction of the microbial communities from cheese rinds. We use high-throughput sequencing of these multi-species communities to examine taxonomic diversity and functional potential, and to reveal temporal patterns of community assembly. We demonstrate that these communities are composed of phylogenetically diverse bacteria and fungi that can be easily cultured. 
Using a culture-based system that mimics the normal conditions of community formation, in vitro communities can be manipulated based on environmental changes predicted from in situ measurements, co-culture experiments reveal widespread bacterial-fungal interactions, and the temporal dynamics of community assembly can be reconstructed using a minimal set of species. Collectively, our work suggests that this system has the potential to bridge in situ and in vitro studies of microbial diversity to better understand the patterns and underlying mechanisms of microbial community assembly and function. RESULTS Rind type and moisture, not geography, correlate with microbial diversity of rind communities Because cheesemaking spans continents and encompasses a variety of cheese styles, widespread sampling of in situ patterns of rind microbial diversity could reveal major factors influencing community formation across geographic and environmental gradients. We used PCR-based amplicon sequencing to characterize the bacterial and fungal diversity of 137 different cheeses made in 10 different countries across Europe and the United States. For each cheese type, triplicate wheels were sampled (n=362), and data on sample origin (geography, animal), milk treatment (raw or pasteurized), pH, moisture, and salinity were recorded (Table S1). Across all communities sampled, only 14 bacterial and 10 fungal genera were found at greater than 1% average abundance (Figure 2, S2 and Table S2A). The number of these dominant genera (those >1% average abundance) per sample is on average 6.5 bacterial genera (range: 1–13) and 3.2 fungal genera (range: 1–7). Given the dominance of a limited number of genera, it might be expected that the majority of the community would originate from starter cultures (Figure S1). 
However, on average across all samples, we find that at least 60% of the bacteria and 25% of the fungi present are not starter cultures, and therefore originate from environmental sources (Table S2A). For most uninoculated microbial groups, their function in the context of the community or in the production of cheese is largely unexplored. For example, we identified two bacterial genera, Yaniella and Nocardiopsis, which have never been reported in food microbial ecosystems. We also find that halotolerant γ-Proteobacteria such as Vibrio, Halomonas, and Pseudoalteromonas that are typically associated with marine environments, are widespread in cheese communities (Figure 2). Previous studies identified these γ-Proteobacteria in individual cheeses, but our large-scale survey demonstrates that they occur in cheeses made in all of the geographic regions where we sampled. One possible source of these marine microbes is the sea salt used in cheese production, as marine γ-Proteobacteria have been detected both in brine tanks of cheese production facilities and in sea salt-producing areas in Korea. Many of the 24 dominant genera that we identified are widely distributed across the samples, but their abundance within each rind community is variable (Figure 2). This divergence in community composition is best explained by the rind type of the cheese (washed, bloomy, and natural; PERMANOVA pseudo-F=16.64, P<0.001) (Figure 3A), while country of origin, milk treatment, or milk source are only weakly associated with community divergence (Figure S3A). These three rind types are a result of three main approaches to aging cheese (Figure S1). Bloomy rind cheeses, such as Brie and Camembert, are heavily inoculated with fungi to create a dense rind that is usually white in appearance (Figure 1A). Natural rind cheeses (Figure 1B), such as clothbound cheddars, St. Nectaire, and Tomme de Savoie, are largely untouched during aging. 
Washed rind cheeses (Figure 1C), such as Taleggio, Gruyere, and Epoisses, are initially produced in a manner similar to bloomy or natural rind cheeses, but are then washed repeatedly during aging with a salt solution. The hybridization of styles in washed rind cheese aging may explain why the composition of these cheeses is interspersed throughout the bloomy and natural rind communities (Figure 3A). If the microbes that colonize rind communities are dispersal limited, diversity could be shaped in part by stochastic processes; cheeses made and aged in the same geographic regions would have more similar community composition than those aged further apart. However, across our entire dataset for Europe and the United States, community composition is not significantly correlated with geographic distance (Mantel r=0.04, P=0.07; Figure S3C). In fact, cheeses made in geographically distant parts of the world can have strikingly similar rind communities (Figure S3B), demonstrating that these microbial communities can assemble reproducibly regardless of the cheesemaking region. In contrast to a limited role for geography, environmental conditions do correlate with variation in community composition. During the process of aging cheeses, surface moisture, pH, and salinity are carefully controlled, and some of these variables are significantly different across the three rind types (Figure 3B). We find that moisture is the best predictor of rind community composition, with principal coordinate one (PC1) being significantly associated with the gradient in surface moisture measured across natural, washed and bloomy rind cheeses (r2=0.35, P<0.0001; Figure 3C). This association between moisture and composition is also supported by Mantel tests, where rind biofilm moisture (Mantel r=0.21, P<0.001), and to some extent pH (Mantel r=0.06, P=0.02), but not salinity (Mantel r=−0.02, P=0.57) are correlated with community composition. 
Dominant genera show contrasting responses to this gradient in surface moisture. The fungus Galactomyces and four genera of Proteobacteria, both found in high abundance on moist bloomy rinds (Figure 2), are positively correlated with moisture (Figure 3D) while several molds, Actinobacteria, and Staphylococcus, which are abundant on dry natural rinds (Figure 2), are negatively associated with moisture (Figure 3D). Abiotic conditions have a strong influence on rind community diversity, but interactions between microbes could also play a role. We used our independent bacterial and fungal amplicon datasets to quantify co-occurrence patterns across individual bacterial and fungal genera and found evidence for both strong positive and negative associations (Figure 3E). These could be explained by positive or negative interactions between species and/or shared environmental niches. Total community composition, as measured by Bray-Curtis dissimilarity, is correlated between our bacterial and fungal datasets (Mantel r=0.20, P<0.001). A measure of community richness, the total number of bacterial and fungal community operational taxonomic units (OTUs, or clusters of closely related sequences) is also correlated (r2=0.13, P<0.001; Figure 3F), suggesting that species interactions or environmental factors select for communities with similar compositions. Metagenomics reveals putative functions of uninoculated organisms in cheese rind microbial communities In addition to understanding how taxonomic diversity varies with rind type, we investigated functional potential amongst cheese rind communities. Shotgun metagenomic data revealed an uneven distribution of genes from fungi vs. bacteria across the three rind types, with lower abundance of fungi in washed rind cheeses (Figure S4A). As with taxonomic diversity, functional potential clusters by rind type and is correlated with moisture (Figure 4A). Unlike taxonomic diversity, functional potential is also correlated with pH. 
We identified metabolic pathways that were enriched in either washed, natural or bloomy rind cheeses (Figure 4B, Table S4A). We found several pathways associated with flavor production significantly enriched in washed rind cheeses. This group of cheeses is notorious for having particularly pungent aromas. Cysteine and methionine metabolism, known to contribute to the production of volatile sulfur compounds such as methanethiol, are enriched in these communities. Pathways for valine, leucine, and isoleucine degradation, which can contribute to sweaty and putrid aromas, are also enriched in washed rind cheeses (Table S4A). The widespread distribution and high abundance of marine-associated γ-Proteobacteria, enriched in both washed and bloomy rind cheeses (Table S2B), was an unexpected finding in our survey of taxonomic diversity. We used our shotgun metagenomic data to explore the functional potential of these microbes in cheese rind communities. The enzyme that converts methionine to methanethiol, methionine-gamma-lyase (MGL) [EC:4.4.1.11], is a key step in the production of sulfur compounds in cheese. However, to date, a cheese-associated MGL gene has only been identified in Brevibacterium linens. Our metagenomic sequencing uncovered novel mgl sequences with high sequence similarity to various γ-Proteobacteria in cheeses from both Europe and North America (Table S4B). Most of these novel mgl sequences belonged to Pseudoalteromonas spp. (Figure 4C, Figure S4B). We mapped metagenomic reads from 3 cheeses where Pseudoalteromonas was abundant to the reference genome of Pseudoalteromonas haloplanktis (99.5% pairwise identity, Table S4C). Additionally, Pseudoalteromonas is known to have many cold-adapted enzymes that function in the polar seawater where this bacterium typically grows. These enzymes could be advantageous in the cold environments where cheeses are aged and stored.
From our metagenomic reads, we identified homologs of a previously characterized secreted cold-adapted lipase and protease (Table S4C), which could contribute to lipolysis and proteolysis and subsequent flavor formation in cheeses. Collectively, these metagenomic insights into the potential function of Pseudoalteromonas suggest that this and other uninoculated yet abundant microbes could play key roles in cheese rind microbial communities. Cheese rind communities are highly culturable One major limitation in the study of most microbial communities is the difficulty in culturing all abundant taxa and recreating ecologically relevant conditions for use in experimental systems. A number of cheese rind-associated genera have previously been cultured using standard lab media. We plated serial dilutions of the three rind types from representative cheeses. From these samples, we were able to culture at least one representative isolate from each of the 24 dominant genera observed by sequencing (Table S1). Reconstruction of communities in vitro confirms the importance of abiotic manipulations on community divergence Our amplicon survey of microbial diversity suggested that a common pool of cheese rind microbes exists across many geographic regions and that local manipulations of the abiotic environment by cheesemakers select for specific microbial communities. Many studies have shown that the composition of microbial communities is strongly associated with environmental variables. However, experimentally demonstrating that abiotic factors lead to divergence in community structure is difficult, often because not all microbes in the community are culturable or because the community is prohibitively complex. Using the culture collection described above, we proceeded to test whether divergent communities could develop from a common pool of species using an in vitro system.
We inoculated the surface of replicate in vitro cheeses with a community consisting of approximately 200 cells each of 6 bacterial species and 5 fungal species (Table S5). These represent the most abundant taxa present on each of the three rind community types. We manipulated the environment of these initially identical in vitro communities by applying four treatments: 1) a bloomy rind treatment where the fungus Galactomyces was added at 50 times higher initial inoculum to simulate the high fungal inoculum added to bloomy rind cheeses, 2) a washed rind treatment where the community was washed twice a week with a sterile 20% NaCl solution, 3) a natural rind treatment where the communities were subjected to a drier environment (62% final moisture of the medium compared to 83% in plates that were not dried), and 4) a control group in which no manipulations were carried out after initial inoculation. After a four-week incubation at 12°C, the washed and natural communities diverged in composition from the control (PERMANOVA pseudo-F=19.23, P<0.05; Figure 5A), while the bloomy treatment did not (P=0.20), suggesting that our abiotic manipulations had a greater impact on community composition than altering initial inputs of a dominant fungal component of the bloomy rind community (Galactomyces). Some patterns of microbial abundance observed in the final in vitro communities mirror patterns observed in our amplicon survey. Moisture was strongly associated with variation in community composition, with the dry environment of the natural rind treatment enriching for the yeast Debaryomyces, the bacterium Staphylococcus, and the mold Penicillium (Figure 5B, S5B, Table S5), which are all highly abundant in natural rind communities (Figure S2C). The decrease in moisture leads to an increase in salt concentration both in situ (Figure S3D) and in vitro (Figure S5A), and these bacteria and yeasts are all known to tolerate high salt and low moisture conditions. 
The fungus Galactomyces was only present in the higher moisture in vitro samples (Figure 5A and S5B), matching patterns observed on cheese rinds where this microbe was positively correlated with moisture (Figure 3D). Consistently lower fungal abundance was observed in situ in washed rind communities (Figure S4B), and a similar pattern of reduced fungal abundance was observed in our in vitro experiments (Figure 5C). Bacterial-fungal interactions are widespread between community members Our analysis of in situ sequencing data suggested widespread positive and negative associations between bacterial and fungal genera (Figure 3E). To experimentally measure the frequency, type (positive or negative), and strength of bacterial-fungal interactions, we co-cultured the most abundant fungi and bacteria in pairwise combinations in vitro. Many bacteria had strong growth responses, both positive and negative, to the presence of fungi (Figure 6A). Several bacteria grew poorly without the presence of a fungal partner (Corynebacterium, Halomonas, Pseudomonas, Pseudoalteromonas, Vibrio), demonstrating that the cheese environment may be unable to support growth of some bacteria, a deficiency rectified by the presence of several different genera of fungi. Since both our in vitro medium and fresh cheeses have a relatively low pH before rind development occurs, usually in the range of pH 5.0, we examined the role of pH in the growth of our isolates in vitro. A large body of previous work has measured the increase in the pH of cheese as a result of rind development, with a number of species, mainly yeasts, identified as playing a major role in deacidification. In agreement with this previous work, we observed that yeasts and filamentous fungi deacidify the cheese curd medium (Figure 6B). This deacidification could be a mechanism underlying the positive growth responses of many bacteria to the presence of a fungus.
For example, those bacteria that had the strongest growth responses to fungi (Corynebacterium, Halomonas, Pseudomonas, Pseudoalteromonas, and Vibrio) also show strong positive growth responses on cheese curd agar where the pH had been adjusted to neutral (Figure 6C). The growth of several other bacterial species (Staphylococcus, Arthrobacter, Brevibacterium, Brachybacterium, Serratia) was inhibited by the presence of fungi. In contrast, most of the fungi did not demonstrate a measurable growth response to the presence of bacteria (Figure 6D). An exception is the response to the bacterium Arthrobacter, which inhibited the growth of all three filamentous fungi, but not the three yeast species, demonstrating the presence of specific bacterial-fungal interactions related to fungal taxonomy. Interestingly, a reddish-pink pigment was secreted into the cheese curd agar medium when this bacterium grew in the presence of filamentous fungi (Figure 6E, S6B, and S6C) but not yeast. Comparison of the results of bacterial-fungal interactions from our in vitro assays to the predictions based on correlations from in situ patterns reveals many discrepancies (Table S6). A number of possible reasons could explain these differences. For example, in the case of interactions that were predicted from in situ data but were not observed in vitro, co-occurrence may be explained by the environmental preferences of individual species, instead of direct interactions. Alternatively, pair-wise interaction assays may not recapitulate patterns observed in situ where other species present could contribute to or modulate interactions. Temporal dynamics of rind community assembly are reproducible and can be recapitulated in vitro Understanding the temporal dynamics of microbial community diversity is essential for dissecting community assembly. In our survey of rind microbial communities, we considered only one time point for each cheese (Figure 2). 
However, the rind biofilm changes visibly over time (Figure S7A and S7D), and previous studies have observed patterns of microbial succession on the surface of aging cheeses. We used amplicon sequencing to measure temporal patterns of bacterial and fungal diversity of one natural rind community from a cheese made and aged in Vermont. Intensive sampling of three batches of cheese over a 63-day aging period demonstrates that patterns of succession are highly reproducible (Figure 7A and 7B; Table S7). At the first time point, the community consisted primarily of Proteobacteria, the bacterium Leuconostoc, and the yeast Candida, which can be found at low levels in raw milk. While Candida persisted in the fungal portion of the community, the Proteobacteria were succeeded by Staphylococcus within the first seven days. As the rinds matured, bacterial taxa Brevibacterium and Brachybacterium and fungal taxa Penicillium and Scopulariopsis emerged consistently as a significant fraction of the community (on average, >1% in mature cheeses). Principal coordinate analysis shows a reproducible trajectory of all three communities over time (Figure 7B), with the most rapid changes in composition occurring at early timepoints, which is consistent with previous observations of primary succession. In order to work towards defining the mechanisms that govern succession in a rind community, we recapitulated succession using in vitro rind communities. We identified six core members of the in situ natural rind community (Staphylococcus, Brevibacterium, Brachybacterium, Candida, Penicillium, and Scopulariopsis), defined as taxa present in situ at an average abundance of >1% in at least 50% of the time points sampled (Table S7). We inoculated approximately equal numbers of each together onto cheese curd agar and followed membership of the in vitro rind communities over time. 
As we observed in situ, Staphylococcus and Candida dominated in vitro rind communities at early time points, and Brevibacterium, Brachybacterium, Penicillium, and Scopulariopsis grew to detectable levels at later time points (Figure 7C and 7D). Like succession in situ, succession in vitro was highly reproducible amongst replicates and exhibited rapid changes in community population size and diversity at early time points (Figure 7B, 7D, and S7E). However, we did observe a few notable differences between succession in an in situ and an in vitro natural rind community. Succession appears to proceed much more quickly in vitro, with late-blooming taxa Brevibacterium and Penicillium appearing at 6–8 days instead of 21 days as observed in situ. The early appearance of these taxa may be the result of a much higher initial ratio of late- to early-blooming community members than occurs in situ. Moreover, Penicillium and Brachybacterium grow to represent a much higher percentage of the community in vitro than they do in situ. These differences in succession could reflect differences in environment, or again, differences in the initial amounts of these species. The pH of the in vitro cheese rose to 7 by day 10, while in situ, pH equilibrated at ~6.5 (Figure S7B and S7E). DISCUSSION Our work presents cheese rind microbial communities as an experimentally tractable system for exploring fundamental questions about how microbial communities assemble and function. Rind communities are widespread and accessible, and our in situ work shows that reproducible communities of bacteria and fungi form in geographically distant parts of the world. Our in vitro experiments demonstrate that we can culture community members and then recreate and easily manipulate communities in the lab. 
This in situ – to – in vitro approach enabled us to observe major patterns of community composition, potential interactions, and patterns of community succession, and then experimentally reconstruct communities and begin to test the role of the abiotic environment and identify species interactions. The tractability of this system can be leveraged in future studies to dissect important unresolved questions in microbial ecology including the molecular mechanisms of species interactions within communities, factors that influence the stability of communities, the causes and consequences of evolution within microbial communities, and the role of stochastic and deterministic forces in community formation. Previous work on the microbial diversity of spontaneous food fermentations suggests that the tractable properties of this system likely exist across many fermented foods, which may provide additional microbial systems that can link in situ analysis of patterns of diversity to in vitro dissection of community structure and function. Comparison of the in situ observations (Figures 2, 3E and 7A) to in vitro experimental results (Figures 5, 6 and 7C) reveals several qualitative differences in the results of community composition, succession, and species interactions. A number of factors could explain these differences. For example, we assumed equal initial population sizes of all species for in vitro communities, but it is unlikely that all community members are present in equal numbers at the beginning of community formation in situ, as is clear from our sampling of a nascent rind community (Figure 7A). We generally chose a single strain to represent each genus observed in situ, and we excluded rare taxa from our communities; however, some of these additional members or alternative strains may have important functional traits that can impact community composition.
In terms of the abiotic environment, the surface to volume ratio is much higher in our in vitro system than on a wheel of cheese, which likely raises the effective concentration of metabolites excreted by community members and accelerates the rate at which nutrient sources are exhausted and pH is modified. Future work that focuses on the systematic manipulation of community membership, growth substrate, and the environment could reveal the impact of these factors on community formation. Due to differences in the type of data collected by the in situ (sequencing-based) and in vitro (culture-based) approaches, a direct, quantitative comparison of the data was not possible in this study, but could lead to a better understanding of the commonalities and differences between in situ and in vitro communities. Our culture collection and in vitro system provide an opportunity to examine the extent, nature, and mechanisms of species interactions within communities. Future studies to dissect the mechanisms of bacterial-fungal interactions could provide comparative insights into interactions that occur in less tractable systems where similar pairs of bacteria and fungi co-occur. For example, many of these microbial communities have substantial taxonomic similarity with the microbial communities that form on human skin surfaces, suggesting that mechanisms of community formation discovered in this cheese system could directly apply to other microbial biofilms. This work will be facilitated by genetic tools and other resources that exist for many of the dominant genera that we detected (e.g.). Cheese rinds are dependent on human intervention in order to form, and some of the microbes in cheese ecosystems have signatures of domestication. However, the domesticated nature of these communities does not detract from their potential to provide important insight into microbial community ecology. 
First, many of the microbes that co-occur within rind communities also co-occur in their source environment, such as the teat of a cow or a cave environment and have likely undergone evolutionary processes in the same biotic and abiotic context. The mixing of coevolved community members with novel species, such as the marine γ-Proteobacteria, can provide opportunities to explore evolutionary dynamics between novel partners within microbial communities. Just as the study of different varieties of organisms with domesticated genomes has pointed to key phenotype-genotype correlations in genome biology, the study of these semi-natural microbial communities can provide key links between variation across microbial communities and the underlying forces shaping this diversity. EXPERIMENTAL PROCEDURES Sample Collection For the survey of rind diversity, a total of 362 cheese rind samples were collected across 137 different types of cheeses by scraping the rind surface with a sterile razor blade. When possible, we sampled up to three wheels for each cheese to account for wheel-to-wheel variation. To observe natural rind succession, a total of 90 rind samples representing triplicate wheels from three batches at 10 timepoints were collected. Rind samples were stored at −20°C until they were analyzed. Microelectrodes were used to measure pH and salt, and percent moisture was measured by weighing a small sample of rind before and after drying for 48 hours at 60°C. Amplicon and shotgun metagenomic sequencing DNA was extracted from rind samples using a PowerSoil DNA extraction kit (MoBio, Carlsbad, CA). Bacterial amplicon libraries were prepared by amplifying the V4 region of 16S rRNA as previously described. The same approach was used for generating ITS amplicons of fungi except that the primers ITS1f and ITS2 were used to target the ITS1 region. Amplicon data were analyzed using QIIME and LEfSe. 
Shotgun libraries were prepared using an Apollo 324 system (IntegenX, Pleasanton, CA) with NEXTflex DNA barcodes (Bioo Scientific, Austin, TX). For each cheese, 5.2 million sequences were uploaded to MG-RAST for annotation. The KEGG database was used to generate annotations. LEfSe was used to identify pathways that were enriched in particular samples. In vitro rind community experiments For each in vitro experiment, rind isolates were pooled in PBS and inoculated onto the surface of 10% cheese curd agar poured into wells of 96-well microplates, which were then sealed with a sterile, breathable film, incubated at room temperature for 2 days, and then at 12°C for the remainder of the incubation period. To monitor growth, cheese curd agar plugs were removed from the 96-well plates, homogenized in PBS supplemented with 0.05% Tween 80, serially diluted, and plated onto appropriate media to determine the number of CFUs for each species. Plates were incubated at room temperature, and after colony and foci formation, CFUs of each strain were counted. To create divergent rind communities in vitro, approximately 200 cells each of 7 bacterial species and 4 fungal species that represented the most abundant taxa of each of the three major rind types were used. One fourth of the experimental communities were not manipulated (‘Control’). One fourth of the communities were inoculated with fifty-fold more Galactomyces to approximate the treatment of bloomy rind cheeses (‘Bloomy’). One fourth of the experimental communities were put into chambers with a desiccant to simulate the drier environment of natural rind cheeses (‘Dry’). The remaining fourth of the experimental communities were washed twice a week with a 20% NaCl solution (‘Washed’). To measure the impacts of co-culturing bacteria and fungi in pairwise combinations, approximately 500 CFUs of each strain were inoculated onto cheese curd agar to measure growth alone.
To measure growth responses to a partner, we added 500 CFUs of an ‘interacting strain’ to wells with 500 CFUs of a ‘responder strain.’ To measure the impacts of pH on growth, each species was grown separately in the same in vitro conditions described above with standard cheese curd agar (pH 5) and pH-adjusted cheese curd agar (pH 7). To track in vitro succession, 200 CFUs each of representatives of the six most abundant bacterial and fungal genera observed during in situ natural rind succession were used. Statistical Analysis Standard statistical analyses were conducted in XLStat (v.2013.5.09), PAST (v.2.17c), and the Hmisc package in R (v.2.15.2). All data in figures or in text are presented as mean +/− one standard error of the mean unless otherwise indicated. Supplementary Material ACCESSION NUMBERS Reference sequences of strains have been deposited in GenBank, and amplicon and shotgun metagenomic data have been deposited in MG-RAST (see Table S2A for accession numbers). SUPPLEMENTAL INFORMATION Supplemental Information includes Extended Experimental Procedures, seven figures, and seven tables. AUTHOR CONTRIBUTIONS B.E.W., J.E.B., and R.J.D. designed experiments, analyzed data, and wrote the manuscript. B.E.W., J.E.B., and R.J.D. cultured isolates used in all experiments and shown in Table S2A. B.E.W. and J.E.B. developed methods for in vitro community reconstruction and analysis. B.E.W. developed methods for analysis of fungal amplicon data, and performed experiments and/or contributed to data in Figures 1–7. J.E.B. performed experiments and/or contributed to data in Figures 5–7. M.S. performed experiments and/or contributed to data in Figures 2 and 7.
Microbial communities form on the surfaces of naturally aged cheeses Cross-sections through naturally aged cheeses show rind biofilms growing on the surface of the cheese curd. (A) A bloomy rind biofilm, (B) a natural rind biofilm, and (C) a washed rind biofilm. See also Figure S1. Distribution of abundant genera across cheese rind communities Each column represents averaged data for multiple wheels of an individual cheese. Top panel shows bacterial (16S rDNA) data and bottom panel shows fungal (internal transcribed spacer or ITS) data. Columns show relative abundance of genera within each cheese.
Communities were clustered using a UPGMA tree and asterisks indicate clusters that were supported with >70% jackknife support. Only those genera that had an average abundance of 1% or greater across all samples are indicated; genera less than 1% abundance are combined and shown in black. See also Figure S2, Table S1, and Table S2. Abiotic and biotic drivers of rind community composition (A) A combined dataset of 16S rDNA and ITS amplicons was processed in QIIME, and Bray-Curtis dissimilarity was used to generate a principal coordinate analysis of rind microbial communities. Each green, orange, or blue circle represents averaged community composition data for each natural, washed, or bloomy rind cheese sampled. Separation of rind communities is driven by genera that are specifically enriched in each of the three rind types (Table S2B). (B) Bloomy, natural and washed rind cheeses have different surface environments. Bars represent mean (+/−SEM). A double asterisk indicates significant differences (P<0.005) in an ANOVA. NS = not significant (P>0.05). (C) Plots of PC1 versus three environmental variables show that moisture is significantly correlated with rind type. (D) Taxonomic groups show different responses to gradients in moisture across cheese rinds. A plot of Pearson’s r depicts significant (P<0.05, with false discovery rate correction) negative and positive correlations between abundance of particular genera and % moisture. (E) Spearman rank correlations of OTUs highlight non-random associations between bacterial and fungal genera. Significant (P<0.05, adjusted for multiple comparisons using Holm’s method) positive and negative associations are indicated with a bold boundary. (F) Fungal and bacterial richness are positively correlated across cheese rind communities. Each dot represents mean fungal and bacterial richness per cheese. See also Figure S3, Table S1, and Table S3. 
Functional diversity of cheese rind microbial communities (A) Procrustes analysis shows similar clustering of samples (M2=0.391) using either taxonomic (amplicon) or functional (whole genome shotgun sequencing) data from 22 rind metagenomes. Plots of principal coordinate one versus environmental data (smaller ordination plots) reveal significant relationships between functional composition and both moisture (r2=0.29, P<0.01) and pH (r2=0.41, P<0.001). (B) Relative abundance of 56 KEGG pathways identified by LEfSe as significantly enriched in bloomy, natural, or washed rind cheeses. Plotted are those pathways that were >1% abundance across the entire dataset. Each column represents a pathway and plotted is the relative distribution of that pathway across the three different rind types. (C) Maximum likelihood phylogeny of amino acid sequences of methionine gamma-lyase (MGL), methionine-alpha-deamino-gamma-mercaptomethane-lyase (MGMML), cysteine gamma-lyase (CGL), cysteine beta-lyase (CBL), and cysteine gamma-synthase from prokaryotic and eukaryotic organisms. Colored dots indicate habitats where organisms are found. Node labels indicate bootstrap support; only those with >60% support are shown. Three novel MGL sequences with high similarity to the marine bacterium Pseudoalteromonas haloplanktis, were recovered from three cheese metagenomes (highlighted in grey box). See also Figure S4 and Table S4. Reconstruction of divergent rind communities in vitro (A) Principal coordinates analysis of replicate in vitro communities demonstrates that rind microbial communities diverge in composition when exposed to abiotic manipulations (PERMANOVA pseudo-F=19.23, P<0.0001). The bloomy treatment (addition of 50 times more Galactomyces CFUs to initial inoculum) did not significantly alter community composition compared to control communities (P=0.20). 
(B) Relative abundance of the fungal (top) and bacterial (bottom) taxa in the initial inoculum added to all treatments and at the time of harvest for control, bloomy, natural, and washed rind treatments. (C) CFUs of fungi (top) and bacteria (bottom) of the final communities. A double asterisk indicates significant differences (P<0.005) based on Tukey’s honestly significant difference test. Bars represent mean (+/−SEM). See also Figure S5 and Table S5. Bacterial-fungal interactions among cheese rind microbial species (A) Bacterial responses to co-culture with one of 6 fungal species. Red asterisks indicate a statistically significant (Dunnett’s test, P<0.05) decrease in abundance in the two-species co-culture treatment relative to growth alone (black bars). Green asterisks indicate statistically significant (P<0.05) increase in abundance in the two-species co-culture treatment relative to growth alone (black bars). Bars represent mean (+/−SEM). (B) pH of the cheese curd medium when different fungal species were grown. The pH of the medium in all fungal treatments was significantly higher compared to the uninoculated control (Dunnett’s test, P<0.0001). (C) Response of bacterial and fungal species grown alone on cheese curd agar at pH 5 and pH7. Bars represent mean (+/−SEM). (D) Fungal responses to co-culture with one of 11 bacterial species. Asterisk colors correspond to same system used in (A). (E) Photograph of select wells of pair-wise interaction assay. Wells with fungi grown alone and fungi grown with the bacterium Arthrobacter are shown. Panels on the top show top-down views of the 96-well plate where the surface of the microbial biofilm can be seen. Bottom panel shows the underside of each well for the corresponding top-down views showing how pigment production can easily be observed. See also Figure S6 and Table S6. 
Succession within a natural rind community is highly reproducible (A and B) Reproducible succession in an in situ rind community was observed as three batches of a natural rind cheese aged (1–63 days). (A) Relative abundance of community members was determined by amplicon sequencing of bacterial 16S rDNA (top panel) and the fungal ITS region (bottom panel). Each column represents the average of three wheels from the same batch. (B) The combined dataset of 16S rDNA and ITS amplicons was processed in QIIME, and Bray-Curtis dissimilarity was used to generate a principal coordinates analysis of rind microbial communities. Principal coordinate 1 was plotted versus time. Each point represents the average of triplicate wheels +/− standard deviation. (C and D) Reproducible succession was observed as in vitro natural rind communities aged (0–63 days). Colony forming units (CFUs) of each species were determined by plating serial dilutions of homogenized in vitro cheeses in triplicate at each timepoint. (C) CFUs were used to determine the relative abundance of bacterial (top panel) and fungal (bottom panel) community members. (D) Growth curves were plotted for each community member. Each point represents the average of triplicates, +/− standard deviation. See also Figure S7 and Table S7. 
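The Bray-Curtis dissimilarity named in the captions above can be written out directly. The sketch below is a plain-Python illustration of the standard formula (sum of absolute differences divided by the sum of all counts); it is not the QIIME implementation, and the two example samples use invented counts chosen only to make the arithmetic easy to follow.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples.

    Each sample maps a taxon name to a non-negative count (or abundance).
    Returns 0.0 for identical composition and 1.0 when no taxa are shared.
    """
    taxa = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(t, 0) - sample_b.get(t, 0)) for t in taxa)
    denominator = sum(sample_a.get(t, 0) + sample_b.get(t, 0) for t in taxa)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical early and late rind samples (counts are illustrative only).
early = {"Proteobacteria": 60, "Leuconostoc": 30, "Candida": 10}
late = {"Brevibacterium": 50, "Penicillium": 40, "Candida": 10}
print(bray_curtis(early, late))
```

A matrix of such pairwise values is what a principal coordinates analysis takes as input; the ordination step itself is left out of this sketch.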
Highlights Twenty-four genera of culturable bacteria and fungi dominate cheese rinds Abiotic factors, and not geography, correlate with community composition A reproducible succession of species leads to communities in situ and in vitro In vitro assays reveal extensive species interactions between community members \ No newline at end of file diff --git a/references_cache/PMID_36070752.md b/references_cache/PMID_36070752.md new file mode 100644 index 00000000..b0172ff1 --- /dev/null +++ b/references_cache/PMID_36070752.md @@ -0,0 +1,7 @@ +# PMID:36070752 + + + +Full text (re-fetched 2026-06-17 via NCBI BioC PMC): + +Design, construction, and in vivo augmentation of a complex gut microbiome SUMMARY Efforts to model the human gut microbiome in mice have led to important insights into the mechanisms of host-microbe interactions. However, the model communities studied to date have been defined or complex but not both, limiting their utility. Here, we construct and characterize in vitro a defined community of 104 bacterial species composed of the most common taxa from the human gut microbiota (hCom1). We then used an iterative experimental process to fill open niches: germ-free mice were colonized with hCom1 and then challenged with a human fecal sample. We identified new species that engrafted following fecal challenge and added them to hCom1, yielding hCom2. In gnotobiotic mice, hCom2 exhibited increased stability to fecal challenge and robust colonization resistance against pathogenic Escherichia coli. Mice colonized by hCom2 versus a human fecal community are phenotypically similar, suggesting that this consortium will enable mechanistic interrogation of species and genes on microbiome-associated phenotypes. In brief The development of a complex community of bacteria that represent the most common taxa from the human microbiome enables further mechanistic study of how genes, pathways, and species influence host physiology and health.
Graphical Abstract INTRODUCTION Experiments in which a microbial community is transplanted into germ-free mice have opened the door to studies of mechanism and causality in the microbiome. These efforts fall into two categories based on the nature of the transplanted community: complete, undefined communities (i.e., fecal samples) versus incomplete but defined communities (i.e., synthetic communities). Fecal transplantation studies have shown that the microbiome plays a role in a variety of host phenotypes including the response to cancer immunotherapy, caloric harvest, colonization resistance to enteric pathogens, and neural development. While illuminating, a limitation of this format is that it is difficult to ‘fractionate’ an undefined community, making it challenging to discover which species are involved in a phenotype of interest. Synthetic communities are less well developed as model systems for the gut microbiome. Pioneering efforts have shown that a synthetic community can model the impact of diet on the microbiome, identified genes required for Bacteroides thetaiotaomicron growth in the mouse intestine in the presence of a 15-member community, and demonstrated that complex communities composed of species isolated from a single donor can stably colonize mice. More recent studies with defined communities have revealed mechanistic insights into immune modulation, glycan consumption, and other complex phenotypes driven by the microbiome. Although synthetic communities enable precise control over composition and manipulations such as strain dropouts and gene knockouts, the communities used are typically of low complexity (<20 strains), limiting their ability to model the biology of a native-scale microbiome. 
An ideal model system for the gut microbiome would capture the advantages of both approaches: near-native complexity would allow a model microbiome to capture properties of an ecosystem that are missing from simpler model systems, including emergent phenomena such as resilience to perturbation and cooperative metabolism. Moreover, complex consortia are a promising starting point for in vivo studies of the gut microbiome, for which they are better suited to model community-level phenomena such as immune modulation and the formation of structured multispecies biofilms. Complete definition (i.e., communities composed entirely of known organisms) would enable reductionist experiments to probe mechanism. The ability to construct communities with defined composition is especially relevant in the context of experiments testing whether phenotypes can be transferred to germ-free mice via fecal transplant. At present, since transplanted communities are typically undefined, it is difficult to uncover the mechanisms underlying these phenomena. A defined model system of sufficient complexity would enable reductionist follow-up experiments, bringing the gut microbiome in line with other model systems in which mechanistic studies are possible. To this end, we sought to create a community that is defined, enabling precise manipulations, and complex enough to exhibit emergent features of a complete community such as stability upon engraftment and colonization resistance. We started by constructing a complex defined community that contains the most prevalent bacterial species in the human gut microbiome (hCom1). We demonstrate that the assembly of this 104-member community is reproducible even for very low abundance species. By systematically perturbing this community and its growth medium, we uncover strain-nutrient and strain-strain (e.g. syntrophic) interactions that underlie its composition. 
We then colonize germ-free mice with hCom1, showing that it adopts a stable, highly reproducible configuration in which its constituent species span six orders of magnitude of relative abundance. We augment the community by filling open niches using an iterative, ecology-based process, and show that the enlarged community (hCom2) is more resilient to perturbation and resistant to pathogen colonization. Finally, we demonstrate that mice colonized by hCom2 are phenotypically similar to mice harboring an undefined human fecal sample, suggesting that our consortium and augmentation process lay the foundation for developing complete, defined models of the human gut microbiome. RESULTS Designing and building a complex synthetic community We set out to design a community composed of the most common bacterial species in the human gut microbiome. We analyzed metagenomic sequence data from the NIH Human Microbiome Project (HMP) to determine the most prevalent organisms—those that were present in the largest proportion of subjects, regardless of abundance. Although the HMP is not broadly representative of microbiomes from diverse geographies and ethnicities, this data set was well suited to our purposes since it was sequenced at very high depth, enabling us to identify low-abundance organisms that are nevertheless highly prevalent. After rank-ordering bacterial strains by prevalence, we found that ~20% (166/844) were present in >45% of the HMP subjects. Of these 166 strains, we were able to obtain 99 from culture collections or individual laboratories (Figure 1A; omitted strains are listed in Table S1). The profiled strains of three additional species were unavailable, so we used alternative strains of the same species (Lactococcus lactis subsp. lactis Il1403, Bacteroides xylanisolvens DSM 18836, and Megasphaera sp. DSM 102144). 
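The prevalence ranking described above (prevalence is the share of subjects in which a strain is detected, regardless of abundance, with a cutoff of >45%) can be sketched as follows. The function name `prevalent_strains`, the boolean presence/absence input, and the toy data are assumptions for illustration; the source does not specify how detection was called in the HMP data. For reference, 166/844 is about 19.7%, which matches the "~20%" quoted in the text.

```python
def prevalent_strains(presence, min_prevalence=0.45):
    """Rank strains by prevalence and keep those above the cutoff.

    presence maps a strain name to a list of booleans, one per subject,
    where True means the strain was detected in that subject.
    Returns (strain, prevalence) pairs sorted from most to least prevalent.
    """
    ranked = []
    for strain, detected in presence.items():
        prevalence = sum(detected) / len(detected)
        ranked.append((strain, prevalence))
    # Sort by descending prevalence, breaking ties alphabetically.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return [(strain, p) for strain, p in ranked if p > min_prevalence]


# Hypothetical detection calls across four subjects (illustrative only).
toy_presence = {
    "strain_A": [True, True, True, False],
    "strain_B": [True, False, False, False],
    "strain_C": [True, True, False, False],
}
print(prevalent_strains(toy_presence))
```

Abundance plays no role in this filter, which is consistent with the text's point that a strain can be highly prevalent while present at low abundance.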
We added two additional strains to enable downstream experiments: Ruminococcus bromii ATCC 27255, a keystone species in polysaccharide utilization; and Clostridium sporogenes ATCC 15579, a model gut Clostridium species for which genetic tools are available. These 104 strains—a community termed ‘hCom1’—are prevalent and abundant in Western human gut communities (Data S1). Notably, unlike other defined communities used to model the gut microbiome, our consortium is within ~2-fold of the estimated number of species in a typical human gut (STAR Methods). A streamlined strain growth protocol simplified the assembly of hCom1 and single-strain dropouts (STAR Methods). We found that each of our 104 strains can be propagated in Mega Medium (MM), Chopped Meat Medium (CMM), or both (Key Resources Table). Growth rates, carrying capacities, and time of entry into stationary phase varied widely across strains and media. To simplify the process of community assembly while ensuring that slow-growing strains were actively dividing, each strain was inoculated from a frozen stock into liquid medium and passaged every 24 h for a total of 2–3 days. Before mixing individually cultured strains, we adjusted the volumes of each culture to achieve similar optical densities. A subset of the strains did not reach the diluted culture density of the remaining strains (STAR Methods); we added these cultures undiluted. We confirmed that our starting cultures were pure using metagenomic sequencing and high accuracy read mapping, as described in the next section. Development of a highly accurate metagenomic read-mapping pipeline Having assembled a community of 104 species, we next addressed how to quantify the abundance of each strain accurately, a major challenge given our expectation that some strains would be present at low abundance. Various strains in the community have identical 16S hypervariable sequences in the V3-V4 region, ruling out 16S amplicon-based methods. 
We considered designing a custom amplicon-based pipeline, but such an approach would require the design and validation of new primer sets for future communities. As an alternative, we sought to use metagenomic sequencing to quantify community composition. To test the performance of existing metagenomic analysis tools, we generated three ‘ground truth’ data sets. The first two consisted of simulated reads generated from the assembled genome sequences of each strain: one in which all 104 strains were equally abundant (to test sensitivity and specificity), and another in which strain abundance varied over six orders of magnitude (to test dynamic range). The third set consisted of actual reads derived from sequencing each strain individually using the same protocol as in subsequent community analyses. This data set allowed us to account for biases introduced by library construction and sequencing. We found that metagenomic read mappers based on a combination of Bowtie2 and SAMtools were sensitive but inaccurate: there was substantial mis-mapping of reads from one strain to others, such that whole-genome sequencing data from an individual strain was often interpreted as having arisen from multiple strains. Read mis-mapping from any abundant strain could therefore create noise that exceeds signal from low-abundance strains, degrading accuracy. In contrast, algorithms that focus on a few universal genes or unique k-mers such as MetaPhlAn2, MIDAS, Kraken2/Bracken, IGGsearch, or Sourmash were generally accurate to the species level, but since they only use a small fraction of the reads (<1%), their ability to detect low-abundance or closely related strains is limited. To address these challenges, we developed a new algorithm, NinjaMap (Data S2). Taking advantage of the fact that every strain in our community has been sequenced (Table S2), NinjaMap can quantify strain abundances with high accuracy across six orders of magnitude (STAR Methods).
In brief, NinjaMap considers every read from a sample. If a read does not match perfectly to any of the genomes in the community (typically 3–4% of the reads), it is tabulated but not assigned. If a read has a perfect match to only one strain, it is assigned unambiguously to that strain. If a read matches more than one strain perfectly, it is temporarily placed in escrow. After all unambiguous assignments are made, an initial estimate of the relative abundance of each strain is computed. Reads in escrow are then fractionally assigned in proportion to the relative abundance of each strain, normalized by the total size of the genomic regions available for unique mapping to avoid bias in favor of strains with large or phylogenetically distinct genome sequences. Finally, relative abundances are computed. To assess the performance of NinjaMap, we conducted two tests. First, we assessed the degree of read mis-mapping from and into each strain’s ledger. We quantified how many reads from strain 1 were mis-assigned to strains 2–104 (which would underestimate the abundance of strain 1 in a community), and how many reads from strains 2–104 were mis-assigned to strain 1 (which would overestimate the abundance of strain 1). For simulated reads, most instances of these two types of read mis-mapping collectively resulted in relative abundance errors < ~10−5 (Data S2, STAR Methods). For actual reads, mis-mapping was more frequent but still typically below a threshold of 10−4 (i.e., 0.01% relative abundance); mis-mapping likely arose either from deviations between the database genome sequence and the actual sequence of the strain in our collection, or from the process of sample preparation and sequencing (Data S2, STAR Methods). The expected contribution to relative abundance from mis-mapping in a community context can be even lower for some strains (Data S2). Second, we used NinjaMap to analyze simulated reads from a 104-strain community.
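The read-assignment steps described above can be illustrated with a minimal sketch. It is not the NinjaMap implementation: the function name `ninjamap_sketch`, its input format, and the exact normalization (unique reads per uniquely mappable base, used as the weight for escrowed reads) are assumptions made for illustration only.

```python
from collections import defaultdict


def ninjamap_sketch(read_hits, unique_region_size):
    """Illustrative escrow-based read assignment (hypothetical helper).

    read_hits: list of sets; each set holds the strains a read matches perfectly.
    unique_region_size: dict strain -> bases available for unique mapping
        (an assumed input; the source only says such a normalization is used).
    Returns (relative_abundance, n_unassigned_reads).
    """
    unique_counts = defaultdict(float)
    escrow = []
    unassigned = 0

    for hits in read_hits:
        if not hits:
            unassigned += 1              # no perfect match: tabulated, not assigned
        elif len(hits) == 1:
            (strain,) = hits
            unique_counts[strain] += 1   # unambiguous assignment
        else:
            escrow.append(hits)          # ambiguous: hold in escrow

    # Initial estimate, normalized by uniquely mappable genome size (assumption).
    density = {s: unique_counts[s] / unique_region_size[s] for s in unique_counts}

    totals = defaultdict(float, unique_counts)
    for hits in escrow:
        weights = {s: density.get(s, 0.0) for s in hits}
        norm = sum(weights.values())
        if norm == 0:
            unassigned += 1              # no unique evidence for any candidate strain
            continue
        for s, w in weights.items():
            totals[s] += w / norm        # fractional assignment

    assigned = sum(totals.values())
    rel = {s: c / assigned for s, c in totals.items()} if assigned else {}
    return rel, unassigned


# Toy usage: three reads unique to A, one unique to B, one shared, one unmatched.
reads = [{"A"}, {"A"}, {"A"}, {"B"}, {"A", "B"}, set()]
abundances, n_unassigned = ninjamap_sketch(reads, {"A": 100, "B": 100})
```

In this toy example the shared read is split between A and B according to their initial estimates, so a strain with more unambiguous evidence receives a larger share of the escrowed reads.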
We found that this tool can accurately quantify strains with abundances as low as 10−6 in the context of a mixed community of known composition (Data S2), in agreement with the analysis of single-isolate samples. Thus, NinjaMap is capable of quantifying strains accurately over a wide dynamic range of relative abundances. Community construction is highly reproducible We began by measuring the degree of reproducibility in community composition data by constructing and propagating the 104-member community multiple times in vitro. We included technical replicates to assess variation in bacterial growth, DNA extraction, and sequencing, and biological replicates to determine the impact of differences in the preparation of the inocula. We propagated the communities for 48 h and extracted DNA for sequencing at 0, 12, 24, and 48 h. The range of cell densities at t=0 spanned multiple orders of magnitude (Figure 1B), with a mean log10(relative abundance) of −2.5±0.8 for all detectable strains. 95/104 strains were detectable at t=0; the remaining strains, which grew poorly when cultured individually, were below the limit of detection or had abundances that could potentially be explained by read mis-mapping. The communities reached a relatively stable configuration by 12 h (Figure 1B), with a remarkable degree of reproducibility among biological replicates (Figure 1C). Notably, very low-abundance strains (<10−4) were only slightly more variable than high-abundance strains. Technical replicates were even more similar (Figure 1D), indicating that community growth, DNA extraction, and sequencing contributed only modestly to variability. Taken together, these results indicate that community composition is robust to experimental variation. A nutrient drop-out screen to map strain-nutrient interactions in the community We next sought to explore the network of strain-nutrient interactions in the community. 
Although much is known about polysaccharide foraging by gut commensals, far less is known about amino acid utilization, so we performed the experiment in a defined growth medium (SAAC, STAR Methods) from which we could remove one amino acid at a time. Since amino acids are often utilized in pairs, eliminating one at a time from a complete background rather than adding one at a time to a null background has greater potential to reveal phenotypes relevant to community function. Moreover, performing this screen in the context of a diverse community (as opposed to the traditional practice of analyzing the growth of isolated strains) enables the potential study of community-dependent effects such as nutrient competition or mutualism-dependent nutrient utilization. To map strain-amino acid interactions, we constructed the 104-member community (STAR Methods) and used it to inoculate 20 defined growth media, each deficient in a single amino acid, as well as complete SAAC (Figure 2A). Samples were taken at 48 h and metagenomic sequencing data were analyzed to determine the impact of amino acid deficiency on the relative abundance of each strain. Global analysis of strain-amino acid interactions To identify strain-amino acid interactions, we tabulated strains whose relative abundance deviated significantly from the mean across conditions, taking advantage of the fact that most amino acid dropouts had little effect on most strains (Figure 2B, STAR Methods). When the community was propagated in the complete defined medium, relative abundances spanned >6 orders of magnitude. 36% of the strains were present at 10−4–10−2 relative abundance, 8 strains were >10−2 and 50 were <10−4 (Figure 2B). In agreement with simulated results, NinjaMap was sensitive to strains with relative abundances as low as 10−6, enabling us to quantify the 56% of strains that were below the 10−3 limit of detection commonly used for metagenomic analyses. 
Our system is therefore capable of studying low-abundance microbes, some of which are known to have large biological impacts. To identify significant responses, we calculated the standard deviation of the relative abundance of each strain across experiments and computed z-scores (Figure 2C, STAR Methods). Strain-amino acid interactions that were previously identified in monoculture studies were also observed in our community format. Anaerostipes caccae, whose growth is stimulated by methionine, decreased in relative abundance in a community grown in methionine-deficient medium (z=−3.48). Likewise, C. sporogenes expansion was impeded by the absence of leucine (z=−2.56), a substrate it oxidatively decarboxylates to isovalerate to generate electrons. These observations demonstrate that even though >100 strains are competing for the same nutrients, the effects of eliminating one amino acid on the growth of one strain are readily observable in the context of a complex and diverse community. Most strains responded to amino acid removal in ≤4 cases (Figure 2B). Moreover, relative abundances displayed low variability, with a mean standard deviation of log10(relative abundance) across strains <0.43. Only three strains, all of which are Firmicutes, were responsive to removal in >4 cases: Lactococcus lactis DSM 20729, Clostridium sporogenes ATCC 15579, and Lactobacillus ruminis ATCC 25644 (Data S3, Table S3). Thus, under these growth conditions, most strains are largely insensitive to amino acid removal while a small minority are highly responsive. We note that the response of a strain to amino acid removal may be direct (e.g. due to utilization for energy) or indirect (e.g. amino acid removal impacts an interacting strain). Amino acids varied widely in terms of their impact on community composition (Figure 2D). More than half of the strains responded to cysteine removal, likely due to its effect as a reducing agent. 
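The z-score calculation used above to flag strain-amino acid interactions can be sketched as follows. This is a minimal illustration under stated assumptions: abundances are log10-transformed before scoring, the sample standard deviation is used, and the function name `dropout_zscores` is hypothetical; the exact procedure is given in the STAR Methods.

```python
import math
import statistics


def dropout_zscores(rel_abundance):
    """Z-score of one strain's log10 relative abundance in each medium.

    rel_abundance: dict condition -> relative abundance of a single strain
        (e.g. complete medium plus each single-amino-acid dropout).
    Assumes log10 transformation and sample standard deviation.
    """
    logs = {c: math.log10(v) for c, v in rel_abundance.items()}
    mean = statistics.mean(logs.values())
    sd = statistics.stdev(logs.values())
    if sd == 0:
        return {c: 0.0 for c in logs}    # strain did not vary across conditions
    return {c: (x - mean) / sd for c, x in logs.items()}


# Toy usage with made-up abundances for one strain across four media.
scores = dropout_zscores(
    {"complete": 1e-2, "-Met": 1e-2, "-Leu": 1e-4, "-Lys": 1e-2}
)
```

A strongly negative score for a dropout condition, as for the leucine-deficient medium in this toy input, marks a candidate interaction of the kind reported for C. sporogenes.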
More than 5% of the strains responded to methionine, histidine, isoleucine, arginine, valine, and tyrosine removal, while for eight amino acids there were no significant changes to the community at all (Figure 2D). Interestingly, there were large differences among similar amino acids: no strains responded to lysine removal, while 10.6% and 7.6% of the strains responded to histidine and arginine removal, respectively. The removal of isoleucine, leucine, and arginine had a particularly large impact on community structure: C. sporogenes and L. lactis, the two most abundant strains when grown in complete defined medium, decreased >500-fold in relative abundance when any of these amino acids was removed (Figure 2E); this sensitivity was also observed in a biological replicate experiment (Data S3). Taken together, our data suggest that certain amino acids are ‘keystone’ nutrients that play an important role in determining community composition. C. sporogenes uses arginine to generate ATP Among the 86 candidate strain-amino acid interactions revealed by our screen, we were particularly intrigued by those involving C. sporogenes. Although C. sporogenes can oxidize and reduce aromatic amino acids, its relative abundance was unaffected by the removal of phenylalanine, tyrosine, or tryptophan (Data S3). In contrast, the removal of leucine, isoleucine, and arginine each had a large impact on the fitness of C. sporogenes in the community. The second strongest phenotype was a decrease in relative abundance in the absence of arginine (Figures 2E, S2C); while C. sporogenes is known to metabolize arginine, no impact of arginine on growth or energy metabolism had been observed in prior work. To validate and characterize this interaction, we compared C. sporogenes growth in complete defined versus arginine-deficient medium. Although C.
sporogenes grew well in complete defined medium, it exhibited a large growth defect in the absence of arginine (Figure 2F), indicating that this amino acid is an important substrate for growth. C. sporogenes can use other amino acids as substrates to support ATP synthesis. Hypothesizing that the same is true for arginine, we incubated wild-type C. sporogenes in a culture medium deficient in substrates for ATP synthesis. Upon addition of arginine, intracellular ATP levels rose sharply (Figure 2G), indicating that C. sporogenes generates ATP (directly or indirectly) from arginine. To identify the enzymes involved in this process, we parsed the C. sporogenes genome for pathways known to capture energy from arginine. This search yielded candidate genes for each of the three steps in the arginine deiminase pathway (Figure 2H), which catalyzes the net conversion of arginine to ornithine plus CO2 and two equivalents of ammonium, generating one equivalent of ATP. Using a method we recently developed to construct scarless deletions in C. sporogenes, we generated strains deficient in the putative arginine deiminase (CLOSPO_00894, Δadi) or ornithine carbamoyltransferase (CLOSPO_02415, Δotc). The Δotc mutant was unable to generate ATP in response to arginine provision, consistent with a role for the arginine deiminase pathway in C. sporogenes energy production (Figure 2G). In contrast, the Δadi mutant showed no defect in arginine-induced ATP production (Data S3), suggesting the possibility of an alternative pathway to generate citrulline from arginine. Consistent with these observations, the Δotc mutant (but not the Δadi mutant) was growth-deficient in complete defined medium (Figure 2F, Data S3). The deficiency was partial, suggesting that an alternative pathway can generate energy from arginine under these conditions.
Together, these results show that arginine metabolism by the arginine deiminase pathway contributes directly to the cellular ATP pool, augmenting our understanding of how amino acid metabolic pathways contribute to the fitness of a gut commensal within a complex community. Attributes of a complex defined community in gnotobiotic mice Our central goal in designing hCom1 was to enable mechanistic studies of the microbiome in the context of host colonization. As a starting point for in vivo work, we colonized germ-free Swiss-Webster (SW) mice with hCom1 (Figure 3A), which we prepared by propagating each strain individually and mixing OD-normalized cultures (STAR Methods). We sampled fecal pellets from the mice weekly for eight weeks, enumerated community composition in the inoculum and each fecal sample by metagenomic sequencing, and performed read analysis using NinjaMap. Our analysis yielded two main conclusions. First, almost all strains in the inoculum colonized the mouse gut (Figure 3B-C). We confirmed the presence of 103/104 strains in the inoculum; of these, 101 strains were detected in the mice at least once. The three strains we failed to detect in mice—Ethanoligenens harbinense YUAN-3, Clostridium methylpentosum DSM 5476, and Ruminococcus albus 8—were slow-growing and difficult to cultivate. While strain relative abundances spanned >6 orders of magnitude, nearly all strains exhibited low variation across 20 mice in four cages, with coefficient of variation (CV, standard deviation/mean) <0.4. Second, the community quickly reached a stable configuration (Figure 3D). Averaged across mice, relative abundances remained largely constant two weeks after colonization, with Pearson’s correlation coefficient >0.95 at each time point with respect to the composition in week 8. After the first week, relative abundances stayed within a narrow range for the duration of the experiment (mean CV<0.2 across the 96 strains that remained above the limit of detection). 
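The two summary statistics used above, the coefficient of variation (standard deviation/mean) and Pearson's correlation against the week 8 composition, can be sketched as follows. The function names are hypothetical, the sample standard deviation is an assumption, and whether abundances were log-transformed before correlation is not stated here, so the caller decides what values to pass in.

```python
import statistics


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return statistics.stdev(values) / statistics.mean(values)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


# Toy usage with made-up numbers: one strain's abundance across mice,
# and per-strain abundances at an earlier week versus week 8.
cv = coefficient_of_variation([0.010, 0.012, 0.011, 0.009])
r = pearson_r([-1.0, -2.5, -4.0, -5.5], [-1.1, -2.4, -4.2, -5.3])
```

The CV is computed per strain across mice, whereas the correlation is computed per time point across strains, which is why the two statistics answer different questions about stability.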
Large shifts in relative abundance were rare: only 27/312 (8.7%) week-to-week strain-level changes were >10-fold. An ecology-based process to fill open niches in the community Although hCom1 is composed of prevalent species from the human gut microbiome, it is not as complex or phylogenetically rich as a human fecal community; the process that dictated its membership was not designed to ensure completeness by any functional or ecological criteria. To create a defined community that better models the gut microbiome, we sought to augment hCom1 by increasing the number of niches it fills in the gastrointestinal tract (Figure 4A). We designed an experimental strategy based on the principle of colonization resistance, an ecological phenomenon in which resident organisms exclude invading species from occupied niches. We colonized germ-free mice for four weeks with hCom1, presumably filling the metabolic and anatomical niches in which its species reside. We then challenged these mice with one of three undefined fecal samples (Hum1–3), reasoning that invading species that would otherwise occupy a niche already filled by hCom1 would be excluded, whereas invading species whose niche was unfilled would be able to cohabit with hCom1. After four additional weeks, we used metagenomic sequencing to analyze community composition from fecal pellets. To determine which species from each fecal sample colonized in the presence of hCom1, we analyzed the composition of fecal pellets collected in weeks 5–8 to assign species as ‘input’ (hCom1-derived) or ‘invader’ (fecal sample-derived). For this analysis we used MIDAS, an enumeration tool that—unlike NinjaMap—does not require prior knowledge of the constituent strains. MIDAS and NinjaMap reported highly concordant relative abundance profiles using sequencing reads from hCom1-colonized mice, although—as expected—MIDAS was less sensitive since it utilizes only 1% of sequencing reads (STAR Methods, Data S4).
We used MIDAS for subsequent analyses of samples that were partially or completely undefined. Using MIDAS, we cannot determine whether a strain present both pre- and post-challenge was derived from hCom1 (i.e., the original strain colonized persistently) or the fecal sample (i.e., a new strain displaced the original strain). To gain further insight into strain displacement versus persistence, we recruited reads from samples taken four weeks post-challenge (week 8) to a database composed of the hCom1 genome sequences, using only reads that were 100% identical to one or more of the genomes. We focused our analysis on genomes with high depth of coverage (≥10X). More than 60% of these strains were covered broadly (≥95%) by perfectly matching reads, indicating that most strains present pre- and post-challenge were either hCom1-derived or a closely related strain (Data S4). As expected, mice challenged by saline instead of a fecal sample showed no evidence of new species post-challenge (Figure 4B). In hCom1-colonized mice challenged by a fecal sample, an average of 89% of the genome copies from week 8 (and 58% of the MIDAS bins, a rough proxy for species) derived from hCom1 (Figure 4B). The remaining 11% of the genome copies (and 42% of the MIDAS bins) represent new species that joined hCom1 from one of the fecal samples. Despite the addition of new species, the architecture of the community remained intact (Figure 4C): the relative abundances of the hCom1-derived species present post-challenge were highly correlated with their pre-challenge levels (Pearson’s r >0.85) (Figure 4D). Thus, hCom1 is broadly but not completely resilient to a human fecal challenge. 
Designing and constructing an augmented community The observation that only a small fraction of the post-challenge communities was composed of new species led us to hypothesize that we could improve the colonization resistance of hCom1 by adding the invading species, thereby improving its ability to fill niches in the gut. Twenty-four bacterial species entered hCom1 from ≥2 of the 3 fecal samples used as a challenge (Table S4); we focused on these species, reasoning that they were more likely to fill conserved niches in the community. We were able to obtain 22/24 from culture collections and we included all of them in the new community (hCom2). At the same time, we omitted seven species that either failed to colonize initially or were displaced in all three groups of mice (Figure S4), reasoning that they were incompatible with the rest of hCom1 or incapable of colonizing the mouse gut under the dietary conditions in which the experiment was performed. Thus, the new community contains 97 strains from hCom1 plus 22 new strains, for a total of 119 (Figure 4A, Figure S1, Table S2). These 22 strains are primarily Firmicutes or species of Alistipes. Many represent taxa that are phylogenetically under-represented in hCom1, suggesting that they might be able to occupy niches left open by the members of hCom1 (Figure S1). We colonized four groups of germ-free SW mice with hCom2, collecting fecal pellets weekly (Figure 4A). As before, we measured community composition by analyzing metagenomic sequencing data with NinjaMap (Figure 5A, Table S4). The gut communities of hCom2-colonized mice rapidly reached a stable configuration (Pearson’s r with respect to week 8 >0.97) (Figure S2). 100 of the 119 strains were above the limit of detection; hCom1-derived strains colonized at similar relative abundances in the context of the augmented community (with similarly low CVs across mice) (Figure 5B). 
The species that were new to hCom2 exhibited a wide range of relative abundances; Bacteroides rodentium became the most abundant species, whereas the least abundant of the new species, Blautia sp. KLE 1732, had a mean abundance ~10−4 (Figure 5B). The augmented community is more resilient to human fecal challenge Our goal in constructing hCom2 was to improve its completeness as assessed by its ability to occupy niches in the gut. To test whether hCom2 is more complete than hCom1, we challenged hCom2-colonized mice at the beginning of week 5 with the same fecal samples used to challenge hCom1, enabling us to compare results between the challenge experiments. Importantly, the 22 strains used to augment hCom1 were obtained from culture collections rather than the fecal samples themselves, reducing the likelihood that hCom2 and the fecal samples have overlapping membership at the strain level (Garud et al. 2019). Indeed, by recruiting sequencing reads to the genomes of the new organisms in hCom2, we found that 17/22 were covered broadly (≥95%) by perfectly matching reads, consistent with the view that they were derived from hCom2 and not the fecal challenge (Data S4). An average of 96% of the genome copies (and 81% of the MIDAS bins) from week 8 derived from the strains in hCom2 (Figure 5C), demonstrating that the colonization resistance of hCom2 is markedly improved over hCom1 (Figure 5D). The remaining 4% of reads (and 19% of MIDAS bins) represent species that engrafted in the presence of hCom2 (Figures 5D, S2). Strikingly, nearly all of the species that invaded hCom2 also invaded hCom1 (Figure 5E, Table S4); we were either unable to obtain an isolate for inclusion in hCom2 or the species invaded hCom1 from only 1 of the 3 fecal samples used as a challenge, falling below our threshold for inclusion. These species represented virtually all of the remaining genome copies. 
We conclude that more extensive augmentation, based on the results of the first challenge experiment, would likely have enhanced colonization resistance further. Moreover, compared to hCom1, the composition of hCom2 post-challenge was more similar to its pre-challenge state (Pearson’s r >0.95, Figure 5F). Taken together, these data show that hCom2 is more stable and complete than hCom1, and that the augmentation process is robust and fault-tolerant in identifying species that can occupy unfilled niches. In the previous experiment, we challenged hCom2-colonized mice with Hum1–3, the same fecal communities used in the initial augmentation experiment (Figure 4). We next sought to determine whether hCom2 is resilient to challenge by unrelated fecal communities. hCom2-colonized mice were challenged with Hum4–6, which are compositionally distinct from Hum1–3 (Figure 4A). hCom2 was somewhat less stable to challenge by unrelated fecal samples: an average of 81% of the genome copies from week 8 (and 58% of the MIDAS bins) derived from hCom2 (Figure 5D). Thus, hCom2 is broadly but not completely resilient to challenge by unrelated fecal samples. The architecture of hCom2 resembles that of a complete, undefined human fecal consortium Our original goal in building a complex defined community was to develop a model system for the gut microbiome. Having demonstrated that hCom2 is stable and resilient to invasion, we sought to assess whether it has the functional attributes of a model system. We started by asking how its architecture—the relative abundances of its constituent taxa— compares to that of a human fecal community. We colonized germ-free mice with three human fecal samples (Hum1–3; hereafter, ‘humanized’) and compared their community compositions to those of mice colonized with hCom2. The gut communities of hCom2-colonized and humanized mice were similar in three ways (Figures 5G-H, S3). 
First, relative abundances spanned at least five orders of magnitude, with some strains consistently colonizing at >10% and others at <0.001%. Second, the distribution of log relative abundances was centered at ~0.01%, indicating that the majority of strains in the community would be missed by enumeration tools that have a limit of detection of 0.1%. Third, relative abundances by taxon are similar down to the genus level (Figure S3). Thus, the architecture of hCom2 resembles that of a human fecal community in the mouse gut. Reproducibility of colonization We next addressed the question of biological reproducibility, which is a threshold requirement for an experimental model system. We started by analyzing data from the second fecal challenge experiment (with Hum1–3) to assess the technical reproducibility of community composition in mice colonized by hCom2. At week 4, strain abundances in 20 mice across 4 cages colonized by the same hCom2 inoculum were highly similar (pairwise Pearson’s correlation coefficients 0.96±0.01, Data S5). Biological reproducibility was a greater concern. Given the complexity of hCom1 and hCom2, variability in the growth of individual strains could lead to substantial differences in the composition of inocula constructed on different days. To determine the extent to which this variability affects community architecture in vivo, we compared community composition in four groups of mice colonized by replicates of hCom2 constructed independently on different days (Figure 6A-B). The communities displayed a striking degree of similarity in relative abundance profiles after 4 weeks (Pearson’s correlation coefficient >0.95 between all pairs of biological replicates). We conclude that a relatively constant nutrient environment enables input communities with widely varying relative abundances to reach the same steady state configuration, consistent with ecological observations in other microbial communities. 
This high degree of biological reproducibility will be enabling for the use of complex defined communities as experimental models. To further investigate the potential for hCom2 to function as a model microbiome, we assessed its composition in a second strain of mice. Since the experiments to develop hCom2 used outbred SW mice, we chose 129/SvEv, an inbred mouse strain. We colonized germ-free 129/SvEv mice with hCom2 and collected fecal pellets after 4 weeks of colonization. Community composition was highly correlated with that of SW mice (Pearson correlation coefficient >0.95) (Data S5). These data indicate that hCom2, like the human gut microbiome, is robust to changes in host genotype. hCom2-colonized mice are phenotypically similar to humanized mice We performed three additional experiments to determine the degree to which hCom2-colonized mice resemble germ-free mice colonized by a human fecal community. Since our defined communities are composed of human fecal isolates, we colonized germ-free mice with hCom2 or an undefined human fecal community and assayed phenotypes after 4 weeks (Figure 6A). First, fecal pellets from each mouse were serially diluted and plated on Columbia blood agar to estimate the bacterial cell density in each community. Each group contained 10^11–10^12 colony forming units per gram of feces (Figure 6C), similar to previously reported estimates from humans and from conventional and humanized mice. Thus, hCom2 colonizes the mouse gut to a similar extent as a normal murine or human fecal community. Next, we sought to determine whether mice colonized by hCom2 harbor a similar immune cell profile to that of humanized mice. We extracted and stained colonic immune cells and assayed them by flow cytometry.
Most immune cell subtypes, including CD4+ T cells, IgA+ B cells, macrophages, CD11b+ dendritic cells, and monocytes, were similarly abundant in humanized and hCom2-colonized mice (Figure 6D, Data S5), indicating that—at least in broad terms—hCom2-colonized mice are immunologically comparable to humanized mice. Finally, to determine whether hCom2-colonized and humanized mice harbor a similar profile of microbiome-derived metabolites, we analyzed fecal pellets and urine samples using targeted metabolomics. Aromatic amino acid metabolite levels in urine (Figure 6E) and primary and secondary bile acid levels in feces (Figure 6F) were comparable between hCom2-colonized and humanized mice. Taken together, these data suggest that hCom2 is a reasonable model of gut microbial metabolism. hCom2 exhibits robust colonization resistance against pathogenic Escherichia coli To demonstrate its utility as a model system, we used hCom2 to study an emergent property of gut communities: their ability to resist colonization by pathogens and pathobionts. To test whether hCom2 exhibits colonization resistance, we studied invasion by Escherichia coli ATCC 43894, an enterohemorrhagic E. coli (EHEC). We chose this strain for three reasons. First, EHEC is responsible for life-threatening diarrheal infections and hemolytic uremic syndrome, and enteric colonization by other E. coli strains has been linked to malnutrition and inflammatory bowel disease. Second, colonization resistance to E. coli and other Enterobacteriaceae has been studied in detail, but the commensal strains responsible and mechanisms by which they act are incompletely understood. Finally, hCom2 harbors no Enterobacteriaceae and only three species of Proteobacteria (Desulfovibrio piger, Bilophila wadsworthia, and Burkholderiales bacterium 1–1-47), so resistance to E. coli colonization would require a mechanism other than exclusion by a close relative occupying the same niche. 
To test whether hCom2 is capable of resisting EHEC engraftment, we colonized germ-free SW mice with hCom2 or one of two other communities: a 12-member community (12Com) similar to one used in previous studies or an undefined fecal community from a healthy human donor (Figure 7A). hCom2 and 12Com do not contain any Enterobacteriaceae. To test whether non-pathogenic Enterobacteriaceae enhance colonization resistance to EHEC, we colonized two additional groups of mice with variants of hCom2 and 12Com to which a mixture of seven non-pathogenic Enterobacteriaceae strains were added (Escherichia coli MITI 27, Escherichia coli MITI 117, Escherichia coli MITI 135, Escherichia coli MITI 139, Escherichia coli MITI 255, Escherichia coli MITI 284, and Enterobacter cloacae MITI 173; termed ‘Enteromix’). After four weeks, we challenged with EHEC and assessed invasion by selective plating under aerobic growth conditions (Figure 7A). Consistent with previous reports, the undefined human fecal community conferred robust resistance against EHEC colonization (Figure 7B-C). In contrast, 12Com allowed much higher levels of EHEC growth; the addition of Enteromix to 12Com improved the phenotype but did not restore full EHEC resistance (Figure 7B). Despite lacking Enterobacteriaceae, hCom2 exhibited a similar level of EHEC resistance to that of an undefined fecal community (Figure 7B). Thus, hCom2 is sufficiently complete to exhibit comparable levels of colonization resistance to a native fecal community. As a starting point for identifying which species in hCom2 are responsible for EHEC colonization resistance, we constructed four communities in which we dropped out, in turn, all of the species in the phyla Firmicutes, Verrucomicrobia, Actinobacteria, and Proteobacteria. We colonized mice with these phylum dropout communities and then challenged them with EHEC (Figure 7D). 
The ΔActinobacteria (missing 10 strains) and ΔVerrucomicrobia communities (missing 1 strain, Akkermansia muciniphila) resisted EHEC comparably to hCom2 (Figure 7E-F). However, the ΔProteobacteria and ΔFirmicutes communities were more susceptible. Thus, despite the lack of Enterobacteriaceae in hCom2, the absence of the three more distantly related species of Proteobacteria was sufficient to confer sensitivity to EHEC invasion. The ΔFirmicutes community was highly sensitive to EHEC invasion (Figure 7E); the defect resulted in a large survival difference between hCom2-colonized and ΔFirmicutes-colonized mice (Figure 7E, right). These results indicate either that Firmicutes play a role in EHEC resistance or that a change in community architecture induced by their removal renders the community sensitive to invasion. Further studies with more precise strain dropout experiments could uncover strains that confer resistance and may enable more targeted microbial therapy against EHEC colonization and infection. DISCUSSION By developing a community that is both defined and reasonably complex, we have generated a model system that captures much of the biology of a native microbiome. Future refinements are needed, including additional bacterial strains to occupy unfilled niches as well as archaea, fungi, and viruses, all of which are important components of the native ecosystem. The computational pipeline we developed for read mapping makes it possible to analyze complex defined communities with high precision and sensitivity. Community structure can be quantified across six orders of magnitude in relative abundance, enabling the interrogation of low-abundance community members that play important roles in community function and dynamics. The degree of technical and biological reproducibility (Figure 6B) is remarkable in a system this complex, which bodes well for future experimental efforts.
The process by which we augmented a defined community revealed two unexpected findings. First, a community composed of strains from >100 distinct donors can be stable in vivo. It remains to be seen whether there are appreciable differences in stability, or in fine-scale genomic and phenotypic adaptation, between communities composed of isolates from a single donor (in which strains have coexisted for years) versus multiple donors (in which strains have no prior history together). If a collection of strains with no common history can form a stable consortium, it will be interesting to determine the role of priority effects (i.e., order of arrival) and spatial and metabolic niche occupancy. Second, the process we introduce here for filling open niches is surprisingly robust and fault tolerant. Most notably, nearly all of the fecal community-derived strains that invaded hCom1 (Alistipes, Blautia, Bilophila, Oscillibacter, and Proteobacteria) were under-represented phylogenetically within hCom1 (Figure S1). Moreover, most of the strains that invaded hCom2 had previously invaded hCom1, indicating that niche filling is deterministic. Importantly, the augmentation process caused relatively little perturbation to the structure of the existing community (notable exceptions are shown in Table S4), suggesting that it will result in a progressive improvement of the community. While the augmentation process can only fill niches that are conserved from mice to humans, the observation that most of our human strains engrafted suggests that many niches are conserved. If we had broadened our strain inclusion criteria, there is a reasonable likelihood we could have improved colonization resistance further after just one round of augmentation. To further enhance niche filling and stability, it would help to subject hCom2 to further rounds of augmentation using fecal samples from additional donors, ideally in the presence of a varying diet.
It might also be possible to improve niche occupancy in a particular setting, for example intestinal inflammation, by performing the augmentation process in a murine model of inflammatory bowel disease. There is a pressing need for a common model system for the gut microbiome that is completely defined and complex enough to capture much of the biology of a full-scale community. We showed that hCom2 is a reasonable starting point for such a system: in spite of its complexity, it colonizes mice in a highly reproducible manner. Moreover, hCom2 faithfully models the carrying capacity, immune cell profile, and metabolic phenotypes of humanized mice. There remain some modest differences in metabolic and immune profiles, and the community is still missing certain taxa that will likely be important to add. Nonetheless, taken together, our findings suggest that hCom2 is a reasonable starting point for a model of the gut microbiome. One of the most interesting possibilities for such a system would be to enable reductionist experiments downstream of a community transplantation experiment (e.g., to identify strains responsible for a microbiome-linked phenotype). Although we did not identify the strains responsible for colonization resistance to EHEC, we did find that removing species of Proteobacteria or Firmicutes rendered the community EHEC-sensitive. Follow-up experiments in which one or several strains at a time are eliminated from the community could narrow further from the phylum level to individual strains. Efforts to identify the strains responsible for other microbiome-linked phenotypes, including response to cancer immunotherapy, caloric harvest, and neural development, would be of great interest. Limitations of the study Our study has three important limitations. First, while hCom2 is stable to challenge with the fecal communities used to augment it, it is less stable to challenge with unrelated fecal communities.
These data suggest that subsequent rounds of backfill, using a variety of unrelated fecal samples in series or in parallel, are a promising path toward an even more stable variant of hCom2. Second, it is unclear how many more bacterial strains (or other components) may be necessary to model the full functional capacity of a native human microbiome. Prior estimates of the number of species in a typical human microbiome range from ~150–300. Nonetheless, the observation that a defined community of just 119 strains exhibits remarkable stability bodes well for future efforts. We estimate that hCom2 is within 2-fold of native-scale complexity (STAR Methods), so a full-scale system is experimentally feasible. As a starting point for efforts to build such a system, hCom2 will provide a standard for assessing the genomic and functional completeness of model communities, with the ultimate goal of modeling native-scale human microbiomes. Third, strain-level variation among communities underlies some of the phenotypic differences conferred on the host by the microbiome. hCom2 represents just one consortium of strains, so neither hCom2 nor any other single community can model the impact of strain-level variation on host phenotype. However, we think that a defined community is a promising starting point for probing strain-level differences: a collection of communities that are identical but harbor different strains of a species of interest would be an ideal way to probe the impact of strain variation, or even individual genes, on phenotype. STAR★METHODS RESOURCE AVAILABILITY Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Michael Fischbach (fischbach@fischbachgroup.org). Materials availability C. sporogenes strains are available on request. The strains used in this study are available from the sources listed in the Key Resources Table.
Data and code availability Metagenomic and whole-genome sequencing datasets generated for this study are available at the Sequence Read Archive. The ninjamap code used in this study can be found at the following github location: https://github.com/FischbachLab/ninjaMap/releases/tag/cheng_et_al and the associated docker containers are available at https://hub.docker.com/repository/docker/fischbachlab/ninjamap. EXPERIMENTAL MODEL AND SUBJECT DETAILS Bacterial strains and culture conditions Bacterial strains were selected based on HMP sequencing data. We obtained all species from publicly available repositories; the mean relative abundance and prevalence of each strain were quantified using the 81 samples from healthy human patients from North America. The 166 strains that appeared in ≥37 of the 81 samples were considered for inclusion in the community. We were able to obtain 104 of these strains from public repositories and academic laboratories; the origin of each strain is listed in the Key Resources Table. Preparation of synthetic community for storage and for experiments For all community experiments, strains were cultured in anaerobic conditions (10% CO2, 5% H2, 85% N2) in 2-mL 96-well plates for 24–48 h in their respective growth media (Key Resources Table): Mega Medium supplemented with 400 μM vitamin K2, or Chopped Meat Medium supplemented with Mega Medium carbohydrate mix and 400 μM vitamin K2. For strain storage, 200 μL of liquid culture were aliquoted 1:1 into sterile 50% glycerol in a 1-mL 96-well plate. The plate was covered with an airtight silicone fitted plate mat, edges were sealed with O2-impervious yellow vinyl tape, and the plate was frozen at −80 °C. Each storage plate includes 3–4 “sentinel” wells containing only growth medium that were used to monitor potential contamination during revival. 
Preparation of synthetic community for in vitro experiments From frozen stocks in 96-well plates, 100 μL of each strain were used to inoculate 900 μL of fresh autoclave-sterilized media of the appropriate type for each strain in 2.2-mL 96-well deep well plates (Thomas Scientific, Cat. #1159Q92). All culturing was done in an anaerobic chamber (Coy Laboratories) at 10% CO2, 5% H2, and 85% N2 atmosphere. Strains were diluted 1:10 every 24 h for 2 days into fresh growth medium in 2.2-mL deep well plates, and then diluted 1:10 into 4 mL of the appropriate medium in 5-mL 48-well deep well plates (Thomas Scientific, Cat. #1223T83). After 24 h, the optical density at 600 nm (OD600) of each well was measured. As the spectrophotometer does not accurately measure OD values >1, individual strain cultures were diluted 1:10 to quantify OD600. Stocks were diluted to a final OD600 of 0.1 using fresh growth medium. Equal volumes of each stock were pooled to create a 104-member synthetic community. The community was centrifuged at 5000 × g for 5 min, washed, and resuspended in an equivalent volume of PBS to generate the pooled community working stock. SAAC medium was made containing all amino acids at 1 mM concentration except for cysteine, which was added at 4.126 mM (Table S6). Twenty similar media were made in which one amino acid at a time was removed. 1.6 mL of each medium were aliquoted in triplicate and inoculated with the pooled community at 1:100 dilution. Four 100-μL aliquots of each culture were collected at 48 h and processed for metagenomic sequencing. Preparation of synthetic community for in vivo experiments For all germ-free mouse experiments, strains were cultured and pooled in the following manner: From frozen stocks in 96-well plates, 100 μL of each strain were used to inoculate 900 μL of fresh autoclave-sterilized media of the appropriate type for each strain in 2.2-mL 96-well deep well plates (Thomas Scientific, Cat. #1159Q92). 
All culturing was done in an anaerobic chamber (Coy Laboratories) at 10% CO2, 5% H2, and 85% N2 atmosphere. Strains were diluted 1:10 every 24 h for 2 days into fresh growth medium in 2.2-mL deep well plates, and then diluted 1:10 into 4 mL of the appropriate medium in 5-mL 48-well deep well plates (Thomas Scientific, Cat. #1223T83). After 24 h, the OD600 of each well was measured after diluting individual strain cultures 1:10. Based on these measurements of OD600 and enumeration of colony forming units (CFUs), we found that an OD600 of 1.3 corresponds to ~10⁹ cells/mL for E. coli. Using this estimate, we pooled appropriate volumes of each culture corresponding to 2 mL at OD600=1.3, centrifuged for 5 min at 5000 × g, and resuspended the pellet in 2 mL of 20% glycerol that had been pre-reduced for at least 48 h. For each inoculum preparation cycle, up to 18 of the 119 strains did not reach OD600~1.3. For these strains, the entire 4-mL culture volume was used for pooling (the following paragraph contains details on these 18 strains). Volumes were scaled up accordingly if more inoculum was required for an experiment. Following pooling and preparation, 1.2 mL of the synthetic community were aliquoted into 2-mL Corning cryovials (Corning, Cat. #430659), removed from the anaerobic chamber, and transported to the vivarium where each vial was uncapped and its contents orally gavaged into mice within 1 min of uncapping. Each mouse received 200 μL of the mixed community inoculum. For the initial augmentation experiments, we used freshly prepared inoculum; for all subsequent experiments, the inoculum was frozen in cryovials at −80 °C. On the day of the experiment, the inoculum was defrosted and administered by oral gavage. The target for the inoculation procedure was that each mouse should receive ~10⁸ cells of each bacterial strain in a 200 μL volume, for a total of ~10¹⁰ bacterial cells since hCom1 and hCom2 harbor 104 and 119 strains, respectively.
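As a rough check on the dosing target above, the arithmetic can be written out explicitly. This is a minimal sketch that assumes every strain reaches OD600 = 1.3 and that the E. coli calibration (about 1e9 cells/mL at OD600 = 1.3) applies to all strains, which is only an approximation given differences in cell size and shape; the function and constant names are illustrative and not taken from the study.

```python
# Back-of-the-envelope inoculum dose, using the values stated in the text.
CELLS_PER_ML_AT_OD_1_3 = 1e9   # E. coli calibration; assumed to hold for every strain
POOLED_ML_PER_STRAIN = 2.0     # volume of each culture pooled at OD600 = 1.3
RESUSPENSION_ML = 2.0          # pooled pellet resuspended in 2 mL of 20% glycerol
GAVAGE_ML = 0.2                # 200 uL administered per mouse


def dose_per_mouse(n_strains: int) -> tuple:
    """Return (cells of each strain, total cells) delivered per mouse."""
    cells_per_strain = CELLS_PER_ML_AT_OD_1_3 * POOLED_ML_PER_STRAIN
    per_strain_conc = cells_per_strain / RESUSPENSION_ML  # cells/mL in the inoculum
    per_strain_dose = per_strain_conc * GAVAGE_ML
    return per_strain_dose, per_strain_dose * n_strains


for name, n in (("hCom1", 104), ("hCom2", 119)):
    per_strain, total = dose_per_mouse(n)
    print(f"{name}: {per_strain:.1e} cells/strain, {total:.1e} cells total")
```

Under these assumptions each mouse receives on the order of 1e8 cells per strain and 1e10 cells in total, which agrees with the stated targets to within an order of magnitude.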
Eighteen of the 119 strains did not always grow to a high enough OD to match the post-dilution OD of the other strains. We added these mono-cultures undiluted to the mixed culture. Of these 18 strains, four never reached the target culture density (Ethanoligenens harbinense DSMZ 18485, Slackia heliotrinireducens DSM 20476, Ruminococcus albus strain 8, and Ruminococcus flavefaciens FD-1). The remaining 14 strains (Clostridium sp. L2–50, Clostridium sp. M62/1, Clostridium leptum DSM 753, Butyrivibrio crossotus DSM 2876, Blautia hydrogenotrophica DSM 10507, Veillonella dispar ATCC 17748, Collinsella stercoris DSM 13279, Megasphaera sp. DSMZ 102144, Prevotella buccae D17, Slackia exigua ATCC 700122, Adlercreutzia equolifaciens DSM 19450, Alistipes ihumii AP11, Burkholderiales bacterium 1_1_47, and Blautia sp. KLE 1732) exhibited variable growth. When they did not meet the target OD, we added the complete undiluted monoculture to the pooled community mixture. Of note, normalization by OD can be fraught given differences in cell size and shape. A titration curve relating CFUs to optical density would be more accurate. However, even with the OD-based method we used, our community data were reproducible in vitro (Figure 1C-D) and in vivo (Figure 6B). Collection and preservation of human fecal samples For all experiments, human fecal samples were preserved in the same manner for inoculation into germ-free or hCom1/2-colonized mice. Specifically, freshly voided human feces was collected in a sterile container and transported into the anaerobic chamber within 5–10 min. The fecal sample was weighed, mixed 1:1 with an equivalent volume of pre-reduced PBS, and stored at −80 °C. Preparation of human fecal samples For human fecal challenge experiments, a fecal mixture was defrosted in the anaerobic chamber and diluted 1:100 into pre-reduced PBS. 
One milliliter was aliquoted into pre-reduced 2-mL Corning cryovials, removed from the anaerobic chamber, and transported to the vivarium, where each vial was uncapped and orally gavaged into mice within 1 min of uncapping. Each mouse received 200 μL of the bacterial mixture. Feces contains ~10¹¹ colony forming units per gram of feces; based on the dilutions performed, we estimate that each mouse received 10⁸–10¹⁰ bacterial cells in the fecal challenge. For all non-challenge fecal colonization experiments, the preserved fecal mixture was defrosted in the anaerobic chamber and diluted 1:2 into pre-reduced PBS. One milliliter of the resulting mixture was aliquoted into pre-reduced 2-mL Corning cryovials, removed from the anaerobic chamber, and transported to the vivarium, where each vial was uncapped and orally gavaged into mice within 1 min of uncapping. Each mouse received 200 μL of the bacterial mixture, equivalent to 10¹⁰–10¹¹ bacterial cells per mouse. Preparation of 12Com Cultures of the 12 strains in 12Com (Bacteroides thetaiotaomicron VPI-5482, Bacteroides caccae ATCC 43185, Bacteroides ovatus ATCC 8483, Bacteroides uniformis ATCC 8492, Bacteroides vulgatus ATCC 8482, Clostridium scindens ATCC 35704, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Eggerthella lenta DSM 2243, Eubacterium rectale ATCC 33656, Parabacteroides distasonis ATCC 8503, and Ruminococcus torques ATCC 27756) were prepared in their respective growth media and propagated anaerobically for 24 h to OD600~1.3. Two milliliters of each strain were pooled, centrifuged for 5 min at 5000 × g, and the pellet was resuspended in 2 mL of 20% pre-reduced glycerol and frozen in 1-mL aliquots in 2-mL Corning cryovials.
Preparation of Enteromix Six strains of non-pathogenic Escherichia coli (strains MITI 27, MITI 117, MITI 135, MITI 139, MITI 255, MITI 284) and one strain of Enterobacter cloacae (MITI 173) were isolated from the fecal sample of a healthy human donor by mass spectrometry-guided enrichment culture. Strains were stored at −80 °C in 25% glycerol. To prepare cultures for mouse colonization, strains were grown overnight in BHI broth (Fisher Scientific, Cat. #B99070), diluted 1:10 into 5 mL BHI broth, and cultured to OD600=1.3. Two milliliters of each strain were pooled, centrifuged for 5 min at 5000 × g, and the pellet was resuspended in 200 μL of 20% pre-reduced glycerol. One hundred microliters of this mixture were added to a tube containing 1 mL of previously prepared hCom2 or 12Com inoculum to create hCom2+Enteromix or 12Com+Enteromix, respectively. Each mouse was orally gavaged with 220 μL of the appropriate community. The estimated amount of each Enteromix strain administered to mice was 10⁹ cells per 20 μL dose. METHOD DETAILS Metagenomic sequencing The same experimental pipeline was used for sequencing bacterial isolates and synthetic communities. Bacterial cells were pelleted by centrifugation under anaerobic conditions. Genomic DNA was extracted using the DNeasy PowerSoil HTP kit (Qiagen) and quantified in 384-well format using the Quant-iT PicoGreen dsDNA Assay Kit (Thermofisher). Sequencing libraries were generated in 384-well format using a custom low-volume protocol based on the Nextera XT process (Illumina). Briefly, the concentration of DNA from each sample was normalized to 0.18 ng/μL using a Mantis liquid handler (Formulatrix). If the concentration was <0.18 ng/μL, the sample was not diluted further. Tagmentation, neutralization, and PCR steps of the Nextera XT process were performed on a Mosquito HTS liquid handler (TTP Labtech), leading to a final volume of 4 μL per library.
During the PCR amplification step, custom 12-bp dual unique indices were introduced to eliminate barcode switching, a phenomenon that occurs on Illumina sequencing platforms with patterned flow cells (Sinha et al. 2017). Libraries were pooled at the desired relative molar ratios and cleaned up using Ampure XP beads (Beckman) to achieve buffer removal and library size selection. The cleanup process was used to remove fragments <300 bp or >1.5 kbp. Final library pools were quality-checked for size distribution and concentration using a Fragment Analyzer (Agilent) and qPCR (BioRad). Sequencing reads were generated using a NovaSeq S4 flow cell or a NextSeq High Output kit, in 2×150 bp configuration. 5–10 million paired-end reads were targeted for isolates and 20–30 million paired-end reads for communities. Constructing high quality genome assemblies We obtained the latest RefSeq assembly for each strain in our community and assessed its quality based on contig statistics from Quast v. 5.0.2 and SeqKit v. 0.12.0, using GTDB-tk v. 1.2.0 for taxonomic classification. A ‘combination score’ was calculated as a linear combination of the completeness and contamination scores (completeness–5×contamination) derived from the CheckM v. 1.1.2 lineage workflow; such a score has been used previously, along with the metrics described here (https://gtdb.ecogenomic.org/faq#gtdb_selection_criteria), to include or exclude genomes in the GTDB release 89 database. Genomes that contained any number of Ns, >100 contigs, GTDB lineage warnings, multiple matches, or had CheckM completeness <90, contamination >10, and combination score <90 were resequenced and reassembled. Our hybrid assembly pipeline contains a workflow for de novo and reference-guided genome assembly using both Illumina short reads and PacBio or Nanopore long reads. The workflow has three main steps: read pre-processing, hybrid assembly, and contig post-processing. 
Read pre-processing included 1) quality trimming/filtering (bbduk.sh adapterFile=“adapters,phix” k=23, hdist=1, qtrim=rl, ktrim=r, entropy=0.5, entropywindow=50, entropyk=5, trimq=25, minlen=50), with adapters and phix removed with kmer right trimming, kmer size of 23, Hamming distance 1 (allowing one mismatch), quality trimming of both sides of the read, filtering of reads with an average entropy <0.5 with entropy kmer length of 5 and a sliding window of 50, trimming to a Q25 quality score, and removal of reads with length <50 bp; 2) deduplication (bbdupe.sh); 3) coverage normalization (bbnorm.sh min=3) such that depth <3x was discarded; 4) error correction (tadpole.sh mode=correct); and 5) sampling (reformat.sh). All pre-processing was carried out using BBtools v. 38.37 for short reads. For long reads, we used filtlong v. 0.2.0 (filtlong --min_length 1000 --keep_percent 90 --length_weight 10) to discard any read <1 kb and the worst 10% of read bases, as well as to weight read length as more important when choosing the best reads. Hybrid assembly was performed by Unicycler v. 0.4.8 with default parameters using pre-processed reads. After assembly, the contigs from the assembler were scaffolded by LRScaf v. 1.1.9 with default parameters. If the initial assembly did not produce the complete genome, gaps were filled with long reads using TGS-GapCloser v. 1.0.1 with default parameters. If no long reads were available, short paired-end reads were assembled de novo using SPAdes v. 3.13.1 with the --careful option to reduce the number of mismatches and short indels during assembly of small genomes. Assembly quality was assessed based on the CheckM v. 1.1.2 lineage workflow. If contamination was detected, contigs corresponding to the genome of interest were extracted from the contaminated assembly using MetaBAT2 v. 2.2.14 with default parameters.
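The genome quality criteria used in this section (the combination score, completeness − 5 × contamination, together with the contig, N-content, and GTDB checks) can be expressed as a small gate. This is a sketch rather than the pipeline's code: the `GenomeQC` record and its field names are hypothetical, and the list of criteria is read here as "flag the genome if any single criterion fails", which is an assumption about the intended logic.

```python
from dataclasses import dataclass


@dataclass
class GenomeQC:
    """Hypothetical per-genome summary; field names are illustrative."""
    n_count: int          # number of N bases in the assembly
    n_contigs: int        # number of contigs
    gtdb_warning: bool    # GTDB lineage warning or multiple matches
    completeness: float   # CheckM completeness (%)
    contamination: float  # CheckM contamination (%)


def combination_score(qc: GenomeQC) -> float:
    """Completeness minus five times contamination, as defined in the text."""
    return qc.completeness - 5.0 * qc.contamination


def needs_reassembly(qc: GenomeQC) -> bool:
    """Assumed reading: any single failed criterion triggers resequencing."""
    return (
        qc.n_count > 0
        or qc.n_contigs > 100
        or qc.gtdb_warning
        or qc.completeness < 90
        or qc.contamination > 10
        or combination_score(qc) < 90
    )


good = GenomeQC(n_count=0, n_contigs=12, gtdb_warning=False,
                completeness=99.2, contamination=0.5)
poor = GenomeQC(n_count=0, n_contigs=40, gtdb_warning=False,
                completeness=95.0, contamination=2.0)
print(round(combination_score(good), 1), needs_reassembly(good))
print(round(combination_score(poor), 1), needs_reassembly(poor))
```

The second example shows why the combined score is useful: a genome can pass the completeness and contamination thresholds individually and still fall below a combination score of 90.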
Finally, the assembled genomes were evaluated using the same criteria as the RefSeq assemblies, and the assembly for each species with the best overall quality metrics was chosen as the reference assembly. This procedure resulted in the replacement of eight genomes: two from a PacBio/Illumina hybrid assembly, one from a Nanopore/Illumina hybrid assembly, one from a reference-guided Illumina assembly, and four from short-read assemblies of the respective isolate samples followed by binning (Table S2). Generating and normalizing the NinjaMap database The first step in the pipeline was to assess the uniqueness of each genome in the community. We generated error-free in silico reads such that each genome was uniformly covered at 10x depth. Each such genome read set was aligned to all genomes in the community. The uniqueness of a genome was defined as the fraction of the genome that did not have reads cross-mapped from another strain; uniqueness values were between 0 and 1, such that more unique genomes have a value closer to 1. The uniqueness value of a strain was used to normalize its final relative abundance in any community sample. All genome sequences were combined into one fasta file and a Bowtie2 v. 2.3.5.1 index was computed for future alignments. The database and strain weights were recomputed each time the community or a genome was updated. NinjaMap alignment scoring A primary goal of the NinjaMap algorithm is to analyze and tabulate every input read. A successful match was defined as a read aligned to a genome at 100% identity across 100% of the read length. If a read was uniquely matched to a single strain, its mate pair was also recruited as long as it had at least one match to the same strain. If exactly 1 strain was a perfect match for both reads, the pair was considered a “primary pair” and a score of 1 was given for each read. If >1 or 0 strains were a match for both reads, both reads were placed in escrow and analyzed separately as described below. 
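The pair-level rule above can be sketched as a small function. This is a simplified illustration, not the NinjaMap source: `classify_pair` and its inputs (for each mate, the set of strains to which it aligns at 100% identity over its full length) are hypothetical names introduced here.

```python
def classify_pair(hits_r1: set, hits_r2: set):
    """Classify a read pair from the strains each mate matches perfectly.

    Returns ("primary", strain) when exactly one strain is a perfect match
    for both mates, otherwise ("escrow", None).
    """
    shared = hits_r1 & hits_r2
    if len(shared) == 1:
        return "primary", next(iter(shared))
    return "escrow", None


print(classify_pair({"A"}, {"A", "B"}))       # mate recruited to the uniquely matched strain
print(classify_pair({"A", "B"}, {"A", "B"}))  # ambiguous between two strains
print(classify_pair({"A"}, {"B"}))            # no strain matches both mates
```

Taking the intersection of the two hit sets also covers the mate-recruitment case: when one mate matches a single strain and the other matches that strain among several, exactly one strain is shared and the pair is scored as primary.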
By prioritizing paired-read scoring, noise was significantly reduced while ensuring that as many reads as possible were considered for abundance estimates. Once preliminary strain abundances were calculated based on primary pairs, reads in escrow were then assigned fractionally to the strains to which they aligned perfectly. The fractional assignment was calculated based on the primary read abundances of each strain, normalized by the size of the unique region of each genome within the database, such that the total contribution for a read was 1. In some cases, an individual escrowed read matched to a strain without any matches to primary pairs; such reads were discarded and not used in the final estimates. Finally, the total score for each strain in the database was normalized by the number of reads that aligned to the database, so that the relative abundances of all strains summed to 1. Generating simulated sequencing reads In silico data were generated to evaluate the NinjaMap algorithm in the absence of genome assembly errors and sequencing quality issues. Grinder v. 0.5.4 was applied to each genome to generate error-free reads with the following parameters: -read_distribution 140, -insert_size 800, -mate_orientation FR, -delete_chars ‘-~*NX’, -mutation_dist uniform 0, -random_seed 1712, -abundance_model uniform, -qual_levels 33 31, -fastq_output 1. The -coverage_fold parameter was adjusted based on the cases described below. Uniform abundance isolate dataset This dataset was created to test the sensitivity and specificity of the algorithm against our database of genomes. In silico data were generated for each genome with uniform coverage of 10x or 100x. Variable abundance community dataset In silico reads were generated for each genome at 10x, 0.1x, and 0.001x uniform coverage. Three datasets of mixed community reads were generated including every genome at a coverage randomly selected from the three levels.
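To see what range of expected relative abundances such a mixture spans, the calculation can be sketched as follows. It assumes, for simplicity, that all genomes have equal length, so that the expected read fraction of a genome is its coverage divided by the summed coverage; the 104-genome database size and the cyclic (rather than random) assignment of coverage levels are likewise illustrative choices made to keep the example deterministic.

```python
COVERAGE_LEVELS = (10.0, 0.1, 0.001)  # coverage levels stated in the text
N_GENOMES = 104                       # assumed database size

# Cycle through the levels so the sketch is deterministic; the study
# drew each genome's level at random.
coverage = [COVERAGE_LEVELS[i % 3] for i in range(N_GENOMES)]

# With equal genome lengths (an assumption), the expected read fraction
# of a genome is proportional to its coverage.
total = sum(coverage)
expected = [c / total for c in coverage]

print(f"min expected relative abundance: {min(expected):.1e}")
print(f"max expected relative abundance: {max(expected):.1e}")
```

With roughly a third of the genomes at each level, the expected values span about four orders of magnitude, from a few parts per million to a few percent.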
The observed relative abundance of each genome in our database was calculated using the NinjaMap algorithm and compared to the expected relative abundance based on coverage level, which ranged from ~3×10⁻⁶ to 0.03. Augmenting the NinjaMap database The additional genomes added to hCom1 to create hCom2 were evaluated using the same criteria as the RefSeq assemblies, and the assembly for each species with the best overall quality metrics was chosen as the reference assembly. This procedure resulted in the replacement of 85 genomes: two obtained from a PacBio/Illumina hybrid assembly, 69 from a Nanopore/Illumina hybrid assembly, one from a reference-guided Illumina assembly, and seven from short-read assemblies of the respective isolate samples followed by binning (Table S2). Metagenomic read mapping Paired-end reads from each sample were aligned to the hCom1 or hCom2 database using Bowtie2 with maximum insert length (--maxins) set to 3000, maximum alignments (-k) set to 300, suppressed unpaired alignments (--no-mixed), suppressed discordant alignments (--no-discordant), suppressed output for unaligned reads (--no-unal), required global alignment (--end-to-end), and using the “--very-sensitive” alignment preset (command: --very-sensitive --maxins 3000 -k 300 --no-mixed --no-discordant --end-to-end --no-unal). The output was piped into Samtools v. 1.9, which was used to convert the alignment output from a SAM output stream to BAM format and then sort and index the BAM file by coordinates. Alignments were filtered to only keep those with >99% identity for the entire length of the read. The median percentage of unaligned reads was 4.95% (range 4.10%–8.35%). To assess the origin of these reads, we performed a BLAST v. 2.11.0+ search through the ncbi/blast:latest docker image with parameters “-outfmt ‘6 std qlen slen qcovs sscinames staxids’ -dbsize 1000000, -num_alignments 100” from a representative sample against the ‘NCBI - nt’ database from 2021-02-16.
We then filtered the BLAST results to obtain the top hits for a given query. Briefly, the script defined top hits as ones that had an e-value ≤1e-30, percent identity ≥99% and were within 10% of the best bit score for that query. To visualize and summarize the output, we used the ktImportTaxonomy script from the Krona package with default parameters. Reads were aggregated by NCBI taxon ID and separately by genus. We found that most of the hits were from taxa that are closely related to the organisms in our community, while others were from the mouse genome. We conclude that our experiments did not suffer from any appreciable level of contamination. Sensitivity of NinjaMap Our data provide several quantitative estimates of the sensitivity of NinjaMap: First, when considering the mismapping of sequencing data for a single isolate to other strains, error rates were typically 10⁻⁵–10⁻⁴ for both simulated and actual (Data S2) data. The expected contribution to relative abundance from mismapping in a community as calculated from the mismapping rates of isolates was also typically ~10⁻⁵–10⁻⁴ (Data S2). Thus, for a strain in a 100-member community with average relative abundance of 10⁻², the contribution to relative abundance from mismapping is likely to be even lower (10⁻⁷–10⁻⁶). Second, in strain dropout experiments that are not included in this version of the manuscript, strains with average relative abundance ~10⁻⁵ (e.g., A. stercorihominis, S. heliotrinireducens, C. stercoris, A. putredinis) displayed similar coefficients of variation (standard deviation/mean) as more abundant strains, indicating that noise due to mismapping was small. In addition, these strains were not detected by NinjaMap in their own dropouts, indicating that the sensitivity to them was well below 10⁻⁵. The maximum level of a strain in its own dropout that we think is real signal is 10⁻⁶.
Third, as our in silico data show (Data S2), mismapping does occur (for instance, due to inaccuracies in some genome assemblies such that a missing/contaminated sequence will result in the strain 1 assembly mapping to other strains that contain those sequences). In most cases we expect, based on our isolate sequencing data, that mismapping will contribute a very low fraction of a species’ reported relative abundance. With those estimates in mind, we have set a permissive lower threshold for the NinjaMap data (10⁻⁷) and have adjusted all of our plots to make that the lower limit. We acknowledge that it is possible, in rare cases, for an abundant strain that displays an unusually high degree of mismapping to introduce noise that would interfere with real low-abundance strain signals. We expect that this problem will abate as some of our lower-quality genome assemblies are improved. Amino acid dropout experiment and data analysis Strains were passaged by diluting 1:10 into fresh growth medium every 24 h for 2–3 days. The day before amino acid dropout experiments, cultures were diluted 1:10 into 1 mL of fresh medium and grown for 24 h as inoculation working stocks. Strains were diluted 1:10 into 150 μL of the appropriate culture medium and a plate reader was used to measure absorbance at 600 nm. Stocks were diluted to a final OD600 of 0.1 using fresh growth medium. If a culture did not reach an OD600 of 0.1, the entire culture was used as the working stock for community assembly. Equal volumes of each stock were pooled to create a 104-member synthetic community. The community was centrifuged at 5000 × g for 5 min, washed, and resuspended in an equivalent volume of PBS to generate the pooled community working stock. SAAC medium was made containing all amino acids at 1 mM concentration except for cysteine, which was added at 4.126 mM (Table S5). Twenty similar media were made in which one amino acid at a time was removed.
1.6 mL of each medium were aliquoted in triplicate and inoculated with the pooled community at a 1:10 or 1:100 dilution. Four 100-μL aliquots of each culture were collected at 48 h and processed for metagenomic sequencing. Read fractions were rescaled to sum to 1, thereby reflecting the relative abundances of reads mapped to one of the 104 genomes in our database. The effect of removal of an amino acid on a strain was estimated by calculating the z-score $z_{j,k} = (R_{k,j} - \mu_k)/\sigma_k$, where $R_{k,j}$ is the log10(relative abundance) of strain k in sample j and $\mu_k$ and $\sigma_k$ are the mean and standard deviation, respectively, of log10(relative abundance) for strain k across all samples except the cysteine dropout. The cysteine dropout sample was excluded from the calculation of $\mu_k$ and $\sigma_k$ because this sample was an obvious outlier. We expect that the outlier effect of cysteine dropout is likely due to its role in maintaining redox balance. We used z-scores rather than a direct comparison to the complete medium because most strains exhibited only small variations in relative abundance in most conditions. Data points that could be explained by mismapping were removed. Putative interactions were identified based on $|z_{j,k}| > 2$, i.e., amino acid dropouts that changed the log10(relative abundance) of strain k by ≥2 standard deviations relative to its mean. A few strains varied in relative abundance by several orders of magnitude; as a result, $\sigma_k$ was large, so putative interactions would be missed using z-scores. To identify clusters of strains that responded similarly or amino acids that elicited a similar response, we normalized $R_{k,j}$ for each strain across samples by subtracting $\mu_k$ and performed hierarchical clustering of both strains and amino acid dropouts on a dataset including strains that were detected in all 20 amino acid dropout samples and in complete SAAC medium. Constructing C. sporogenes mutants C.
sporogenes deletion mutants were constructed using a previously reported protocol; the strains and primers used for each mutant are listed in Table S6. In brief, from plasmids CS_OTC and CS_ADI, which harbor targeting and repair templates unique to each gene, we amplified DNA sequences encoding the gRNA locus (the gRNA plus adjacent elements and the repair template) and ligated the amplicon into the pMTL82254 backbone. These repair templates consist of 700- to 1200-bp sequences flanking the 40- to 100-bp sequence targeted for excision. To construct the Δadi strain, a gRNA fragment was purchased from Quintara and amplified with primers fwd_pMTL82254_NotI and rev_gRNA_flank1. The two flanking regions were amplified from C. sporogenes genomic DNA using the primers 5rev_flank1 and 5fwd_flank1_flank2 for flank 1 and 5rev_flank1_flank2 and 5fwd_flank1_flank2 for flank 2. Next, the flanking regions were joined by amplifying with primers fwd_gRNA_flank1 and rev_flank2. The amplified gRNA fragment was attached to the joined flank construct by amplifying with primers fwd_pMTL82254_NotI and rev_pMTL82254_AscI. Finally, the pMTL82254 plasmid and the construct containing the gRNA, flank1, and flank2 regions were digested with NotI and AscI and ligated with T4 ligase (NEB). The final construct was named CS_ADI. To make the Δotc strain, the gRNA fragment was purchased from Quintara and amplified with fwd_pMTL82254_NotI and rev_OTC_gRNA_flank1. The two flanking regions were amplified from C. sporogenes genomic DNA using the primers fwd_OTC_gRNA_flank1 and rev_OTC_flank1_flank2 for flank 1 and fwd_OTC_flank1_flank2 and rev_OTC_flank2 for flank 2. Next, the flanking regions were joined by amplifying with the primers fwd_OTC_gRNA_flank1 and rev_OTC_flank2. The amplified gRNA fragment was attached to the joined flank construct by amplifying with fwd_pMTL82254_NotI and rev_pMTL82254_AscI. 
Finally, the pMTL82254 plasmid and the construct containing the gRNA, flank1, and flank2 regions were digested with NotI and AscI and ligated with T4 ligase (NEB). The final construct was named CS_OTC. CS_OTC or CS_ADI was electroporated into Escherichia coli S17 cells and conjugated into C. sporogenes strain ATCC 15579 using a previously described method. In brief, a single colony of wild-type C. sporogenes was used to inoculate 2 mL of TYG broth (3% (w/v) tryptone, 2% (w/v) yeast extract, 0.1% (w/v) sodium thioglycolate) and incubated anaerobically in an atmosphere consisting of 10% CO2, 5% H2, and 85% N2. E. coli S17 cells with CS_OTC or CS_ADI were grown in LB broth supplemented with 250 μg/mL erythromycin at 30 °C with shaking at 225 rpm. After 17–24 h, 1 mL of this culture was centrifuged at 1000 × g for 1 min and washed twice with 500 μL of PBS (40 mM potassium phosphate, 10 mM magnesium sulfate, pH 7.2). The pellet was transferred into the anaerobic chamber and 250 μL of C. sporogenes overnight culture were added and mixed with the cell pellet. Thirty-microliter aliquots of the mixture were plated on a pre-reduced TYG agar plate in eight spots. The plate was tilted to coalesce the spots and incubated for 24 h. Biomass from the plate was scraped using a sterile inoculation loop and suspended in 250 μL of pre-reduced PBS. One hundred microliters of the cell suspension were plated on TYG agar containing 10 μg/mL erythromycin and 250 μg/mL D-cycloserine to isolate single colonies. One colony was picked, sequence verified, and used as the starting point for the next conjugation. In the second conjugation, E. coli S17 cells containing pMTL83153_fdx_Cas9 were grown in LB broth supplemented with 25 μg/mL chloramphenicol at 30 °C with shaking at 225 rpm. After washing, the pellet was moved into the anaerobic chamber and 250 μL of an overnight culture of C. sporogenes harboring the CS_OTC vector were thoroughly mixed with the E. coli cell pellet. 
Thirty-microliter aliquots of the mixture were plated on a pre-reduced TYG agar plate in eight spots. The plate was tilted to coalesce the spots and incubated for 72 h. Biomass from the plate was scraped using a sterile inoculation loop and resuspended in 250 μL of pre-reduced PBS. One hundred microliters of the cell suspension were plated on each of two pre-reduced TYG agar plates containing 10 μg/mL erythromycin, 15 μg/mL thiamphenicol, and 250 μg/mL D-cycloserine. C. sporogenes colonies typically appeared after 36–48 h, and 8–10 colonies were re-streaked on pre-reduced TYG agar plates containing 10 μg/mL erythromycin, 15 μg/mL thiamphenicol, and 250 μg/mL D-cycloserine to isolate single colonies. The isolated colonies were used to inoculate pre-reduced TYG broth supplemented with 10 μg/mL erythromycin and 15 μg/mL thiamphenicol, and genomic DNA was isolated using a Quick DNA fungal/bacterial kit (Zymo Research). Primers ADI_532_fwd and ADI_22_rev or OTC_5_up_fwd and OTC_930_down_rev (Table S6) were used to verify deletions. ATP assay An aliquot from a frozen stock of C. sporogenes was used to inoculate 5 mL of TYG broth and grown to stationary phase (~24 h). Cells were diluted 1:1000 into 20 mL of TYG broth and grown to late-log phase (~16 h). Cells were harvested by centrifugation (5,000 × g for 10 min at 4 °C) and washed twice with 20 mL of pre-reduced PBS. One hundred microliters of cells were seeded into rows of a 96-well microtiter plate (12 wells per condition). Two hundred microliters of pre-reduced 2 mM substrate (arginine) in phosphate washing buffer, or 200 μL of buffer alone, were dispensed into rows of a separate 96-well microplate. At t=0, 100 μL of substrate or buffer were added to the cells and mixed gently by pipetting. At t=−5 min, −1 min, 30 s, 1 min, 2 min, 5 min, 10 min, 20 min, 30 min, 45 min, 60 min, and 90 min, 10 μL of cells were extracted and mixed with 90 μL of DMSO to quench the reaction and liberate cellular ATP. 
For the time points t=−5 min and −1 min (prior to the addition of buffer or substrate), 5 μL of cell suspension were harvested and 5 μL of either buffer or substrate were added to the cell-DMSO mixture to bring the total volume to 100 μL. The ATP content from 10 μL aliquots of lysed cells was measured using a luminescence-based ATP determination kit (Invitrogen, Cat. #A22066). Absolute ATP levels were calculated using a calibration curve with known concentrations of ATP. Reproducibility and colonization experiments Groups of five 6- to 8-week-old female germ-free SW mice were colonized for 4 weeks with hCom1 or hCom2 and fecal pellets were sampled after 4 weeks. These fecal pellets were subjected to DNA extraction, metagenomic sequencing, and NinjaMap read mapping to estimate strain relative abundances. Augmentation experiment Individual strains were cultured in their respective media (Key Resources Table), normalized, and pooled to form the synthetic community as described in ‘Preparation of bacterial synthetic community.’ Mice were orally gavaged with a freshly prepared culture of the synthetic community three days in a row and were sampled weekly for 4 weeks. After 4 weeks, mice were orally gavaged with a fecal sample from one of three healthy human donors (one donor per 5 mice) or PBS as a control. For the fecal challenge experiment with samples Hum4–6, mice were orally gavaged only once with a frozen, then thawed culture of hCom2. MIDAS analyses MIDAS was run using the database v. 1.2 with default parameters on each library. To determine which invading species to use in augmenting hCom1, a relative abundance threshold of 10⁻⁴ and minimum read count of 2 were applied. A species was selected to augment hCom1 if it was present above the threshold in ≥2 of the 3 challenge groups. For all other analyses, the MIDAS output was used without any filtering (STAR Methods).
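The species-selection rule in the MIDAS analyses paragraph can be illustrated with a minimal Python sketch. It is not the authors' script: the function name, the record layout, and the example values are hypothetical, and two details the text leaves open are assumed here, namely that the abundance threshold is inclusive and that each challenge group contributes one summarized record per species.

```python
from collections import defaultdict

ABUNDANCE_THRESHOLD = 1e-4  # relative abundance threshold stated in the text
MIN_READ_COUNT = 2          # minimum read count stated in the text
MIN_GROUPS = 2              # species must pass in >=2 of the 3 challenge groups


def select_augmenting_species(records):
    """Return species that pass both filters in at least MIN_GROUPS groups.

    records: iterable of (group, species, relative_abundance, read_count).
    This layout is a hypothetical summary, not the native MIDAS output format.
    """
    groups_by_species = defaultdict(set)
    for group, species, rel_abund, reads in records:
        # Assumption: thresholds are inclusive (>=).
        if rel_abund >= ABUNDANCE_THRESHOLD and reads >= MIN_READ_COUNT:
            groups_by_species[species].add(group)
    return sorted(
        species
        for species, groups in groups_by_species.items()
        if len(groups) >= MIN_GROUPS
    )


# Hypothetical example values, for illustration only.
records = [
    ("Hum1", "species_A", 2e-3, 40),
    ("Hum2", "species_A", 5e-4, 12),
    ("Hum3", "species_A", 1e-6, 1),
    ("Hum1", "species_B", 3e-4, 9),
    ("Hum2", "species_B", 2e-5, 3),
    ("Hum3", "species_B", 8e-5, 1),
    ("Hum1", "species_C", 1.2e-4, 1),  # passes abundance, fails read count
    ("Hum2", "species_C", 4e-4, 7),
    ("Hum3", "species_C", 6e-4, 15),
]
selected = select_augmenting_species(records)
print(selected)
```

In this toy input, species_A and species_C each pass both filters in two groups, whereas species_B passes in only one group and is excluded.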
MIDAS sensitivity analysis To determine the sensitivity of MIDAS for analyses of strains in our communities, we generated error-free 150-bp paired-end reads in silico for each genome. Each simulated read set was individually processed by MIDAS. While most genomes were identified correctly and assigned to a single MIDAS bucket, 22 strains from hCom1 and hCom2 cross-mapped to multiple buckets. As expected, MIDAS was unable to separate closely related strains, with 14 MIDAS buckets from hCom1 and 17 from hCom2 recruiting reads from more than one strain (Table S7). Analyzing strain displacement versus persistence To determine the coverage of genomes from hCom1 and hCom2 in week 8 samples after a fecal challenge, reads were aligned to two Bowtie2 databases, hCom1 (version SCv1.2) and hCom2 (version SCv2.3). Each alignment file was filtered to only include alignments with 99% or 100% identity at 100% alignment length. Alignments at 99% identity were performed to recruit reads from any strain that was very similar but not identical. The breadth of coverage (i.e., the percentage of the genome covered by at least 1 read) and the depth of coverage (the average number of reads covering positions in the genome) was calculated for each organism in each sample at both identity thresholds. Results from the MIDAS analysis of each sample were combined with MIDAS bucket strain contributions from the sensitivity analysis and strain coverage metrics. Most of the high abundance strains had high coverage depth and breadth of coverage at 99% and 100% identity, suggesting that the original strains (or highly similar variants) were present in the samples at week 8. Bacterial load estimates Six to 8-week-old female germ-free SW mice were colonized for 4 weeks with hCom1, hCom2, or one of two human fecal samples, and fecal pellets were sampled after 4 weeks. Female germ-free and conventional SW mice of the same age were sampled at the same time. Each colonization cohort contained 5 mice. 
For each mouse, two fecal pellets were collected in a pre-weighed 1.5-mL Eppendorf tube containing 200 μL of transport medium. After collection and weighing, the mass of the tube prior to sampling was subtracted to calculate fecal weight. Samples were transferred into the anaerobic chamber and each pellet was crushed with a 1-mL pipette tip and vortexed at maximum speed for 30 s to create a homogenous mixture. This mixture was serially diluted 1:10 twelve times; each dilution was plated on pre-reduced Columbia blood agar plates and incubated at 37 °C. After 24 h, colonies were counted for each dilution. Fecal pellets were also subjected to DNA extraction, metagenomic sequencing, and NinjaMap analysis to estimate strain relative abundances. Immune profiling Six to 8-week-old female germ-free C57BL/6 mice were colonized for 2 weeks with hCom2, a human fecal sample, or PBS as a negative control and fecal pellets were collected after 2 weeks. Mice were then sacrificed, colonic tissue was dissected, and immune cells were isolated using the Miltenyi Lamina Propria kit and Gentle MACS dissociator. Immune cells were stained using the antibodies listed in the Key Resources Table at 1:200 dilution and assessed using a LSRII flow cytometer. Fecal pellets were subjected to DNA extraction, metagenomic sequencing, and NinjaMap analysis to estimate strain relative abundances. Metabolomics Cohorts of 6–8-week-old female germ-free SW mice were colonized for 4 weeks with hCom1, hCom2, or one of two human fecal samples. Urine and fecal pellets were sampled after 4 weeks. Female germ-free and conventional SW mice of the same age were sampled at the same time. Fecal pellets were subjected to DNA extraction, metagenomic sequencing, and NinjaMap analysis to estimate strain relative abundances. 
Sample preparation for LC/MS analysis For urine samples, 5 μL of urine were diluted 1:10 with ddH2O and mixed with 50 μL of internal standard water solution (20 μM 4-chloro-L-phenylalanine and 2 μM d4-cholic acid). After centrifugation for 15 min at 4 °C and 18,000 × g, 50 μL of the resulting mixture were used for quantification of creatinine using a Creatinine Assay Kit (Abcam, Cat. #ab204537) as described in the manufacturer’s protocol. The remaining 50 μL were filtered through a Durapore PVDF 0.22-μm membrane using Ultrafree centrifugal filters (Millipore, UFC30GV00), and 5 μL were injected into the LC/MS. For fecal pellets, ~40 mg wet feces were pre-weighed into a 2-mL screw top tube containing six 6mm ceramic beads (Precellys® CK28 Lysing Kit). Six hundred microliters of a mixture of ice-cold acetonitrile, methanol, and water (4/4/2, v/v/v) were added to each tube and samples were homogenized by vigorous shaking using a QIAGEN Tissue Lyser II at 25 Hz for 10 min. The resulting homogenates were subjected to centrifugation for 15 min at 4 °C and 18,000 × g. One hundred microliters of the supernatant were combined with 100 μL of internal standard water solution (20 μM 4-chloro-L-phenylalanine and 2 μM d4-cholic acid). The resulting mixtures were filtered through a Durapore PVDF 0.22-μm membrane using Ultrafree centrifugal filters (Millipore, UFC30GV00), or a MultiScreen Solvinert 96 Well Filter Plate (Millipore, MSRLN0410), and 5 μL were injected into the LC/MS. Liquid chromatography/mass spectrometry (LC/MS) For aromatic amino acid metabolites, analytes were separated using an Agilent 1290 Infinity II UPLC equipped with an ACQUITY UPLC BEH C18 column (1.7 μm, 2.1 mm × 150 mm, Waters Cat. 
#186002352 and #186003975) and detected using an Agilent 6530 Q-TOF equipped with a standard atmospheric-pressure chemical ionization (APCI) source or dual Agilent jet stream electrospray ionization (AJS-ESI) source operating under extended dynamic range (EDR 1700 m/z) in negative ionization mode. For the APCI source, the parameters were as follows: gas temperature, 350 °C; vaporizer, 350 °C; drying gas, 6.0 L/min; nebulizer, 60 psig; VCap, 3500 V; corona, 20 μA; and fragmentor, 135 V. For the AJS-ESI source, the parameters were as follows: gas temperature, 350 °C; drying gas, 10.0 L/min; nebulizer, 40 psig; sheath gas temperature, 300 °C; sheath gas flow, 11.0 L/min; VCap, 3500 V; nozzle voltage, 1400 V; and fragmentor, 130 V. Mobile phase A was H2O with 6.5 mM ammonium bicarbonate, and B was 95% MeOH with 6.5 mM ammonium bicarbonate. Five microliters of each sample were injected via autosampler into the mobile phase, and chromatographic separation was achieved at a flow rate of 0.35 mL/min with a 10 min gradient condition (t=0 min, 0.5% B; t=4 min, 70% B; t=4.5 min, 98% B; t=5.4 min, 98% B; t=5.6 min, 0.5% B). For bile acids, compounds were separated using an Agilent 1290 Infinity II UPLC equipped with a Kinetex C18 column (1.7 μm, 2.1 mm × 100 mm, Phenomenex, Cat. #00D-4475-AN) and detected using an Agilent 6530 Q-TOF equipped with a dual Agilent jet stream electrospray ionization (AJS-ESI) source operating under extended dynamic range (EDR 1700 m/z) in negative ionization mode. The parameters of the AJS-ESI source were as follows: gas temperature, 300 °C; drying gas, 7.0 L/min; nebulizer, 40 psig; sheath gas temp, 350 °C; sheath gas flow, 10.0 L/min; VCap, 3500 V; nozzle voltage, 1400 V; and fragmentor, 200 V. Mobile phase A was H2O with 0.05% formic acid, and B was acetone with 0.05% formic acid. 
Five microliters of each sample were injected via autosampler into the mobile phase and chromatographic separation was achieved at a flow rate of 0.35 mL/min with a 32 min gradient condition (t=0 min, 25% B; t=1 min, 25% B; t=25 min, 75% B; t=26 min, 100% B; t=30 min, 100% B; t=32 min, 25% B). Online mass calibration was performed using a second ionization source and a constant flow (5 μL/min) of reference solution (119.0363 and 966.0007 m/z). The MassHunter Quantitative Analysis Software (Agilent, v. B.09.00) was used for peak integration based on retention time (tolerance of 0.2 min) and accurate m/z (tolerance of 30 ppm) of chemical standards. Quantification was based on a 2-fold dilution series of chemical standards spanning 0.05 to 100 μM (aromatic amino acid metabolites) or 0.001 to 100 μM (bile acids) and measured amounts were normalized by weights of extracted tissue samples (pmol/mg wet tissue) or creatinine level in the urine sample (μM/mM creatinine). The MassHunter Qualitative Analysis Software (Agilent, version 7.0) was used for targeted feature extraction, allowing mass tolerances of 30 ppm. E. coli colonization resistance 6–8-week-old female germ-free SW mice were orally gavaged with 200 μL of hCom1, hCom2, a fecal sample from a healthy human donor, or 12Com, or with 220 μL of hCom2+Enteromix or 12Com+Enteromix, and fecal pellets were sampled weekly for 4 weeks. After 4 weeks, mice were orally gavaged with a 200-μL mixture containing 10⁹ CFUs of EHEC and fecal pellets were sampled on days 0 (pre-EHEC infection), 2, 4, 6, and 14. After collection, all fecal samples were prepared aerobically. Specifically, fecal pellets were weighed and 10X (w/v) PBS was added to the tube. Each pellet was crushed with a 1-mL pipette tip and vortexed at maximum speed for 30 s to create a homogenous mixture. This mixture was serially diluted 1:10 six successive times and 5 μL of each dilution were plated on MacConkey-Sorbitol agar.
Plates were incubated at 37 °C for 16–18 h. The resulting colonies were enumerated and verified to be EHEC by metagenomic sequencing. Fecal pellets were also subjected to DNA extraction, metagenomic sequencing, and NinjaMap analysis to estimate strain relative abundances. Estimation that hCom2 is within two-fold of native-scale complexity We came to this estimate in two ways, both of which have important caveats but generally support our claim. A compilation of estimates from the literature. Historic (1970–1980s) estimates were based on traditional culture-based techniques (Guarner and Malagelada, 2003). For example, Moore et al attempted to Gram-stain and culture (aerobically and anaerobically) all of the organisms from 20 healthy human stool samples (Holdeman, 1975). This attempt yielded 1147 unique strains and 113 morphologically and metabolically distinct organisms, which (per their statistical estimate) accounted for 94% of the viable cells in volunteer stool biomass. More recent metagenomic sequencing analyses have expanded upon these diversity estimates. One study performed metagenomic sequencing on 124 European volunteers with species-level resolution, and uncovered 1000–1150 unique bacterial species, 18 of which were detected in all individuals, 57 in ≥90% and 75 in ≥50% of individuals (the authors termed these the ‘common bacterial core species’). An analysis of the human microbiome metagenomic sequencing database involving 81 healthy US volunteers with strain-level resolution showed that there were 79 shared strains in 100% of individuals and 525 unique strains. Interestingly an analysis of the supplemental data showed that the 79 shared strains from the analysis in encompass all 75 strains of the set of “common bacterial core species” in. Further analysis of the supplemental information and tables from showed that metagenomic sequencing uncovered 108–348 unique strains per individual. 
These metagenomic observations have been recapitulated with 16S sequencing. Faith et al performed low-error amplicon 16S sequencing (LEA-Seq) of the V4 region in combination with metagenomic sequencing of 37 stool microbiomes from healthy US individuals. This study had strain-level resolution, and review of the supplemental information and tables showed that study individuals harbored 195–243 unique strains; the authors posited that “…on average 60% of the approximately 200 microbial strains harbored in each adult’s intestine is retained in their host over the course of a five-year sampling period.” The caveats of these estimates are that three elements varied in each case: 1) the samples assessed, 2) the methods used to make the estimate, and 3) the level of resolution at which a taxon was called. Thus, the literature examples lack internal consistency. Our own estimate. Using MIDAS, we performed an analysis of the average number of species-level bins in each of the samples included in this study, as shown below:

Sample                      Number of MIDAS bins
hCom1                       59
hCom2                       79
H1-FMT (humanized mice)     85
H2-FMT (humanized mice)     87
H3-FMT (humanized mice)     94
H1-fecal (fecal sample)     145
H2-fecal (fecal sample)     199
H3-fecal (fecal sample)     180

The number of MIDAS bins identified in fecal samples from mice colonized with hCom1 or hCom2 was between 63% (59/94) and 93% (79/85) of the number of MIDAS bins in mice colonized with Hum1–3, and between 30% (59/199) and 54% (79/145) of the number of MIDAS bins in Hum1–3 fecal samples. The most important caveat of this analysis is that it is based on the taxonomic resolution of a MIDAS ‘bin’, which corresponds roughly to the species level. As a consequence, strain-level variation (including multiple strains of a species) is not taken into account, and any species that are not present in the MIDAS database are not counted.
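The percentage ranges quoted above follow directly from the listed bin counts, and the arithmetic can be checked with a short Python sketch. The dictionary keys and helper names are illustrative choices, and rounding to the nearest whole percent is an assumption about how the quoted figures were obtained.

```python
# MIDAS bin counts as listed in the text.
midas_bins = {
    "hCom1": 59,
    "hCom2": 79,
    "H1-FMT": 85,
    "H2-FMT": 87,
    "H3-FMT": 94,
    "H1-fecal": 145,
    "H2-fecal": 199,
    "H3-fecal": 180,
}


def percent(numerator, denominator):
    """Ratio as a whole-number percentage (assumed rounding convention)."""
    return round(100 * numerator / denominator)


def ratio_range(numerators, denominators):
    """Smallest and largest possible ratio between the two groups."""
    lowest = percent(min(numerators), max(denominators))
    highest = percent(max(numerators), min(denominators))
    return lowest, highest


synthetic = [midas_bins["hCom1"], midas_bins["hCom2"]]
humanized = [midas_bins[k] for k in ("H1-FMT", "H2-FMT", "H3-FMT")]
fecal = [midas_bins[k] for k in ("H1-fecal", "H2-fecal", "H3-fecal")]

vs_humanized = ratio_range(synthetic, humanized)  # (59/94, 79/85)
vs_fecal = ratio_range(synthetic, fecal)          # (59/199, 79/145)
print(vs_humanized, vs_fecal)
```

The two tuples reproduce the 63–93% and 30–54% ranges stated in the text.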
Having noted those caveats, both estimates are consistent with the view that hCom2 is within ~2-fold of the species-level complexity of a native community. QUANTIFICATION AND STATISTICAL ANALYSIS For the analysis of communities in vitro, the statistical details of experiments can be found in the figure legends. Reported n values are the total samples (cultures) per group. Unless otherwise stated, p-values were not corrected for multiple hypothesis testing. Benjamini-Hochberg corrections, hypergeometric tests, Student’s t-tests (unpaired or two-tailed), and Kruskal-Wallis tests were performed in MATLAB. For the analysis of communities in vivo, relative abundances were calculated from the output of NinjaMap or MIDAS without rarefying the total number of reads across samples. Relative abundances at each time point were averaged across the 4–5 mice that were co-housed in the same isolator and subjected to the same fecal challenge. Correlation coefficients were calculated after setting undetected bins to a minimum value (10⁻⁶ and 10⁻⁷ for MIDAS and NinjaMap, respectively) and performing a log10 transformation. Mice were not considered in fecal challenge analyses if sequence reads in a sample from any week were of poor quality or abnormally variable. This filtering affected one of five mice in all groups except for fecal challenge experiment 1, Hum3 (2 mice affected) and fecal challenge experiment 2, Hum1 (0 mice affected). Further details of statistical analyses can be found in the corresponding figure legends. All statistical analyses and tests were performed in MATLAB, and scripts for analyses are available at https://github.com/FischbachLab. DECLARATION OF INTERESTS The other authors have no competing interests.
REFERENCES Grinder: a versatile amplicon and shotgun sequence simulator High-throughput cultivation of stable, diverse, fecal-derived microbial communities to model the intestinal microbiota SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing Model microbial communities for ecosystems biology Microbiota-mediated colonization resistance against intestinal pathogens Precision microbiome reconstitution restores bile acid mediated resistance to Clostridium difficile Dissecting the contribution of host genetics and the microbiome in complex behaviors Bacterial metabolism of bile acids promotes generation of peripheral regulatory T cells GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database Biosynthesis and metabolism of arginine in bacteria Depicting the composition of gut microbiota in a population with varied ethnic origins but shared geography Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation A gut bacterial pathway metabolizes aromatic amino acids into nine circulating metabolites Predicting a human gut microbiota’s response to diet in gnotobiotic mice The long-term stability of the human gut microbiota Identifying gut microbe-host phenotype relationships using combinatorial communities in gnotobiotic mice Identifying personal microbiomes using metagenomic codes A metabolic pathway for bile acid dehydroxylation by the gut microbiome Emergent simplicity in microbial community assembly Identifying genetic determinants needed to establish a human gut symbiont in its habitat Extensive personal human gut microbiota culture collections characterized and
manipulated in gnotobiotic mice Gut microbiome modulates response to anti-PD-1 immunotherapy in melanoma patients Depletion of microbiome-derived molecules in the host using Clostridium genetics QUAST: quality assessment tool for genome assemblies Regional variation limits applications of healthy gut microbiome reference ranges and disease models The effects of micronutrient deficiencies on bacterial species from the human gut microbiota Genetic manipulation of gut microbes enables single-gene interrogation in a complex microbiome MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies The prevalence of species and strains in the human microbiome: a resource for experimental efforts Fast gapped-read alignment with Bowtie 2 Intestinal colonization resistance Rationally designed bacterial consortia to treat chronic immune-mediated colitis and restore intestinal homeostasis Ecological and evolutionary forces shaping microbial diversity in the human intestine Commensal Enterobacteriaceae Protect against Salmonella Colonization through Oxygen Competition The Sequence Alignment/Map format and SAMtools Bracken: estimating species abundance in metagenomics data Bacteroides in the infant gut consume milk oligosaccharides via mucus-utilization pathways The devil lies in the details: how variations in polysaccharide fine-structure impact the physiology and evolution of gut microbes The commensal microbiome is associated with anti-PD-1 efficacy in metastatic melanoma patients The impact of a consortium of fermented milk strains on the gut microbiome of gnotobiotic mice and monozygotic twins Effects of diet on resource utilization by a model human gut microbiota containing Bacteroides cellulosilyticus WH2, a symbiont with an extensive glycobiome Mouse models of Escherichia coli O157:H7 infection and shiga toxin injection Microbial syntrophy: interaction for the common good An integrated metagenomics pipeline for strain 
profiling reveals novel patterns of bacterial transmission and biogeography New insights from uncultivated genomes of the global human gut microbiome Recovery of the Gut Microbiota after Antibiotics Depends on Host Diet, Community Context, and Environmental Reservoirs The Stickland reaction Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation A multidimensional perspective on microbial interactions Adherent-invasive Escherichia coli in inflammatory bowel disease CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life A complete domain-to-species taxonomy for Bacteria and Archaea Interspecies Competition Impacts Targeted Manipulation of Human Gut Bacteria by Fiber-Derived Glycans Gut microbiota alteration is characterized by a proteobacteria and fusobacteria bloom in kwashiorkor and a bacteroidetes paucity in marasmus A human gut microbial gene catalogue established by metagenomic sequencing Lrscaf: improving draft genomes using long noisy reads Gut microbiota from twins discordant for obesity modulate metabolism in mice Environment dominates over host genetics in shaping human gut microbiota Gut microbiome influences efficacy of PD-1-based immunotherapy against epithelial tumors Human Gut Microbiota from Autism Spectrum Disorder Promote Behavioral Symptoms in Mice SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation Dissimilatory amino Acid metabolism in human colonic bacteria The ancestral and industrialized gut microbiota and implications for human health Vitamin Biosynthesis by Human Gut Butyrate-Producing Bacteria and Cross-Feeding in Synthetic Microbial Communities Pathogenic and non-pathogenic Escherichia coli colonization and host inflammatory response in a defined microbiota mouse model sourmash: a library for MinHash sketching of 
DNA MetaPhlAn2 for enhanced metagenomic taxonomic profiling Quantitative microbiome profiling links gut community variation to microbial load Endogenous Enterobacteriaceae underlie variation in susceptibility to Salmonella infection Deciphering microbial interactions in synthetic human gut microbiome communities Regulation of the arginine dihydrolase pathway in Clostridium sporogenes To engraft or not to engraft: an ecological framework for gut microbiome modulation with live microbes Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads Challenges in microbial ecology: building predictive understanding of community function and dynamics Fermentation of isoleucine and arginine by pure and syntrophic cultures of Clostridium sporogenes Improved metagenomic analysis with Kraken 2 Genetic determinants of in vivo fitness and diet responsiveness in multiple human gut Bacteroides The altered schaedler flora: continued applications of a defined murine microbial community Social interaction in synthetic and natural microbial communities TGSGapCloser: fast and accurately passing through the Bermuda in large genome using error-prone third-generation long reads Ruminococcus bromii is a keystone species for the degradation of resistant starch in the human colon A complex gut bacterial community. (A) A phylogenetic tree of the 104 strains in the community based on a multiple sequence alignment of conserved single-copy genes. The community was designed by identifying the most prevalent strains in sequencing data from the NIH Human Microbiome Project (HMP). Colored squares indicate the phylum of each strain: Firmicutes = red, Actinobacteria = blue, Verrucomicrobia = orange, Bacteroidetes = green, and Proteobacteria = purple. Also shown are the prevalence and relative abundances of each strain in the data set from the NIH HMP (n=81 subjects). The prevalence is the fraction of subjects in which the strain was detected. 
The distribution of log10(relative abundance) across subjects is shown with the mean denoted by a white line for each strain. Ruminococcus bromii ATCC 27255 and Clostridium sporogenes ATCC 15579 were added to the community despite low prevalence in the HMP samples. (B) The community reaches a stable configuration quickly. The community was propagated in vitro in SAAC medium to test the stability of its composition. Each dot is an individual strain; the collection of dots in a column represents the community at a single time point. Strains are colored according to their rank-order abundance in the community at 48 h. By 12 h, the relative abundances of strains in the community spanned six orders of magnitude and remained largely stable through 48 h. (C) Communities generated from two inocula prepared on different days (i.e., biological replicates) have a similar architecture at 48 h. (D) Communities generated from the same inoculum (i.e., technical replicates) have a nearly identical composition at 48 h. In (C) and (D), the color of each circle represents the phylum of the corresponding species, and circles with gray outlines and faint colors represent strains whose presence could be explained by read mis-mapping. Systematic analysis of strain-amino acid interactions. (A) Schematic of the amino acid dropout experiment. Frozen stocks of the 104 strains were used to inoculate cultures that were grown for 24 h, diluted to similar optical densities (to the extent possible), and pooled. The mixed culture was used to inoculate one of twenty defined media lacking one amino acid at a time. After 48 h, communities were sequenced and analyzed by NinjaMap to determine changes relative to growth in the complete defined medium. (B) Community composition is impacted by amino acid dropout. Each dot is an individual strain; the collection of dots in a column represents the community at a single time point. 
Strains are colored according to their rank-order abundance in the community grown in complete defined medium (SAAC). Strains whose relative abundance could be explained by read mis-mapping from a more abundant strain in the same sample are plotted with a gray outline. Undetected strains were set to 10^−7 for visualization. (C) Heat map showing the hierarchically clustered z-scores for each strain (x-axis) across amino acid dropouts (y-axis). The z-score was calculated based on the standard deviation of strain abundance across all samples except the cysteine dropout (STAR Methods). The Firmicutes L. lactis, C. sporogenes, and L. ruminis grew less robustly in the absence of Leu and Ile. Strains whose abundances could be explained by mis-mapping from a higher-abundance strain were not shown. (D) The effect of amino acid removal varies widely across amino acids. The fraction of strains with |z|>2 is shown for each amino acid dropout (n=66). (E) The absence of leucine or arginine leads to a large decrease in C. sporogenes relative abundance. Strains are colored according to their rank-order abundance in the community grown in complete defined medium. Only strains that were detected in at least one of the three samples were included (n=92). C. sporogenes is highlighted in black. L. lactis is highlighted in white. Undetected strains were set to 10^−7 for visualization. (F) C. sporogenes growth in complete defined medium is dependent on the presence of arginine (Arg), and ornithine transcarbamoylase (otc) is partially responsible for Arg metabolism. Wild type C. sporogenes and a Δotc mutant were grown in complete defined medium +/− Arg. Growth curves depict the mean of 3 replicates. Error bars represent 1 standard deviation. (G) C. sporogenes requires otc to produce ATP from arginine. Intracellular ATP levels in C. sporogenes incubated in PBS containing 2 mM Arg are shown. (H) A proposed pathway for Arg metabolism in C. sporogenes.
Based on these data, we propose that Arg is converted to citrulline by the putative Arg deiminase CLOSPO_00894; citrulline is then hydrolyzed to ornithine and carbamoyl phosphate by the putative ornithine transcarbamoylase CLOSPO_02415, leading to the production of ATP. Colonizing germ-free mice with a complex gut bacterial community. (A) Schematic of the experiment. Frozen stocks of the 104 strains were used to inoculate cultures that were grown for 24 h, diluted to similar optical densities (to the extent possible, STAR Methods), and pooled. The mixed culture was used to colonize germ-free Swiss-Webster (SW) mice by oral gavage. Fecal samples were collected weekly at weeks 1–5 and week 8, subjected to metagenomic sequencing, and analyzed by NinjaMap to measure the composition of the community at each time point. (B) Relative abundances for most strains are tightly distributed. Each column depicts the relative abundance of an individual strain across all mice at week 4. (C) Average relative abundances of the inoculum versus the communities at week 4. Strains in the community spanned >6 orders of magnitude of relative abundance when colonizing the mouse gut. Dots are colored by phylum according to the legend in panel B. Data represent the average of all mice in the experiment. (D) hCom1 reaches a stable configuration by week 2. Each dot is an individual strain; the collection of dots in a column represents the community at a single time point averaged over 5 mice co-housed in a cage. Strains are colored according to their rank-order relative abundance at week 4. Challenging hCom1 with human fecal communities to identify strains that fill open niches. (A) Schematic of the experiment. Mice were colonized by freshly prepared hCom1 and housed for four weeks, presumably filling the metabolic and anatomical niches accessible to the strains in the community. 
At the beginning of week 5, the mice were challenged with one of three fecal communities from a healthy human donor or with PBS as a control; we reasoned that fecal strains that would otherwise occupy a niche already filled by hCom1 would be excluded, whereas fecal strains whose niche was unfilled would be able to cohabit with hCom1. After four additional weeks, we used metagenomic sequencing coupled with MIDAS to analyze community composition from fecal pellets collected at weeks 1–5 and 8. We then identified strains that colonized in the presence of hCom1 to augment the community to create hCom2, which were then used for another round of challenge experiments (Figure 5). (B) hCom1 is broadly but not completely resistant to fecal challenge. All plots represent MIDAS bins, a rough proxy for species-level taxa. Top row: blue squares in the waffle plots indicate species that derive from hCom1, and gray squares represent species from the fecal communities. Bottom row: pie charts representing the total relative abundance of MIDAS bins that derive from hCom1 versus the fecal communities. An average of 89% of the genome copies from week 8, comprising 58% of the MIDAS bins, derived from hCom1. The remaining 11% of the genome copies, and 42% of the MIDAS bins, represent new species that joined hCom1 from one of the fecal samples. (C) Despite the addition of new strains, the architecture of the community remains intact. Each dot is an individual strain; the collection of dots in a column represents the community at a single time point averaged over the 5 co-housed mice that were challenged with fecal community Hum1. Strains are colored according to their rank-order relative abundance at week 4. Gray circles represent invading species derived from fecal community Hum1, defined as any species not present in weeks 1–4 in the group of mice shown. 
(D) The relative abundances of the hCom1-derived species present post-challenge are highly correlated with their pre-challenge levels. Pearson’s correlation coefficient with respect to the average relative abundance in weeks 2 and 3 are shown for the PBS control and 3 fecal community challenges, averaged across mice that received the same challenge. Correlation coefficients are shown for the 104 hCom1 species (solid lines) and for all species including invaders (dashed lines). An augmented community with improved resilience to fecal challenge. (A) Comparing the architecture and strain-level relative abundances of hCom1 and hCom2. Each column depicts the relative abundance of an individual strain from hCom2 across all samples at week 4. 100 of the 119 strains were detected; those that are new to hCom2 are colored red. (B) Averaged relative abundances of the strains in hCom1 versus hCom2 at week 4. Strains that are new to hCom2 are indicated by a gray outline. Dots are colored by phylum according to the legend in panel B. (C) The architecture of hCom2 is largely unaffected by fecal challenge with Hum1–3. Each dot is an individual strain; the collection of dots in a column represents the community at a single time point averaged over the 5 co-housed mice that were challenged with fecal community Hum1. Strains are colored according to their rank-order relative abundance at week 4. Gray circles represent invading species, defined as any species not present in weeks 1–4 in the group of mice shown. (D) Left: hCom2 is more resilient to fecal challenge than hCom1. Top row: blue squares in the waffle plots indicate MIDAS bins that derive from hCom2; gray squares represent MIDAS bins from the fecal communities. Bottom row: pie charts representing the percentage of MIDAS bins that derive from hCom2 versus the fecal communities. 
An average of 96% of the genome copies (and 81% of the MIDAS bins) come from hCom2 in the Hum1–3 challenges, demonstrating that the resilience of the community was improved markedly by augmentation with strains identified from the initial challenge (Figure 4). Right: hCom2 is broadly resilient to challenge by unrelated fecal samples (Hum4–6). In these challenges, an average of 81% of the genome copies (and 58% of the MIDAS bins) come from hCom2. (E) Nearly all invading strains at week 8 were repeat invaders from the first fecal challenge (Table S4). The dots representing invading strains are shown in full color; dots representing hCom2-derived strains are partially transparent. Dots that represent repeat invaders from the first fecal challenge experiment have a thick black border. (F) The relative abundances of the hCom2-derived species present post-challenge are highly correlated with their pre-challenge levels. Pearson’s correlation coefficient with respect to the average relative abundance in weeks 3 and 4 are shown for the PBS control and 3 fecal community challenges, averaged across mice that received the same challenge. Correlation coefficients are shown for the 119 species in hCom2 (solid lines) and for all species including invaders (dashed lines). (G) hCom2 resembles a fecal consortium more closely than hCom1. Averaged relative abundances of MIDAS bins are shown for hCom1- and hCom2-colonized mice versus mice colonized by a fecal community from one of three healthy human donors (Hum1–3). The phylum-level architecture of hCom2 is more closely correlated to that of humanized mice than hCom1 (Figure S3). (H) Pairwise correlation coefficients of phylum-level relative abundance vectors were higher between hCom2-colonized and Hum1–3 humanized mice than between hCom1-colonized and Hum1–3 humanized mice. hCom2-colonized mice are phenotypically similar to humanized mice. (A) Schematic of the experiment. 
Germ-free SW mice were colonized with freshly prepared hCom2 or a fecal sample from a healthy human donor. One cohort of mice was sacrificed at two weeks for immune cell profiling; another was sacrificed at four weeks for targeted metabolite analysis. (B) The architecture of hCom2 in mice is highly reproducible. Left: community composition is highly similar across four biological replicates. Each dot is an individual strain; the collection of dots in a column represents the community at 4 weeks averaged over 5 mice co-housed in a cage. Strains are colored according to their average rank-order relative abundance across all samples. Right: Pearson’s pairwise correlation coefficients for technical and biological replicates. (C) hCom2-colonized, hCom1-colonized, and humanized mice have similar bacterial cell densities in vivo. Fecal samples from hCom2-colonized, hCom1-colonized, humanized, specific pathogen-free (SPF), or germ-free (GF) mice were homogenized and plated anaerobically on Columbia Blood Agar to enumerate colony forming units. (D) Immune cell types and numbers were broadly similar between hCom2-colonized and humanized mice. Colonic immune cells were extracted from hCom2-colonized, humanized, or germ-free mice (all C57BL/6), stained for cell surface markers, and assessed by flow cytometry. Statistical significance was assessed using a Student’s two tailed t-test (**: p<0.05). (E) hCom2-colonized mice and humanized mice have a similar profile of microbiome-derived metabolites. Urine samples from hCom2-colonized and humanized mice were analyzed by targeted metabolomics to measure a panel of aromatic amino acid metabolites by LC-MS. Statistical significance was assessed using a Student’s two tailed t-test (*: p<0.05; **: p<0.001). (F) Bile acids were extracted from fecal pellets collected from hCom2-colonized and humanized mice and were quantified by LC-MS. Statistical significance was assessed using a Student’s two tailed t-test (*: p<0.05; **: p<0.001). 
hCom2 exhibits colonization resistance against enterohemorrhagic E. coli. (A) Schematic of the experiment. We colonized germ-free SW mice with freshly prepared hCom2 or one of two other communities: a 12-member synthetic community (12Com) or a fecal community from a healthy human donor. hCom2 and 12Com do not contain any Enterobacteriaceae; to test whether non-pathogenic Enterobacteriaceae enhance colonization resistance to EHEC, we colonized two additional groups of mice with variants of hCom2 and 12Com to which a mixture of seven non-pathogenic Enterobacteriaceae strains was added (six E. coli and Enterobacter cloacae, Enteromix (EM)). After four weeks, we challenged with 10^9 colony forming units of EHEC and assessed the degree to which it colonized in two ways: by EHEC-selective plating under aerobic growth conditions, and by metagenomic sequencing with NinjaMap analysis. (B) hCom2 exhibits a similar degree of EHEC resistance to that of a fecal community in mice. Colony forming units of EHEC in mice colonized by the four different communities are shown. As expected, the fecal community conferred robust colonization resistance while 12Com did not. The addition of EM moderately improved the EHEC resistance of 12Com. Despite lacking Enterobacteriaceae, hCom2 exhibited a similar level of EHEC resistance to that of an undefined fecal community. (C) The architecture of hCom2 is stable following EHEC challenge. Each dot is an individual strain; the collection of dots in a column represents the community at a single time point averaged over four co-housed mice. Strains are colored according to their phylum; EHEC is shown in black and members of the Enteromix community are shown in gray. (D) Schematic of the phylum dropout experiment. We colonized germ-free SW mice with four variants of hCom2, each one missing all species from the phyla Actinobacteria, Firmicutes, Proteobacteria, or Verrucomicrobia.
After four weeks, we challenged with 10^9 colony forming units of EHEC and assessed the degree to which it colonized by EHEC-selective plating under aerobic growth conditions, and by metagenomic sequencing with NinjaMap analysis. (E) The ΔActinobacteria and ΔVerrucomicrobia communities retain the ability to resist EHEC invasion, while the ΔFirmicutes and ΔProteobacteria communities are sensitive to EHEC invasion. Right: a large survival difference in ΔFirmicutes-colonized mice compared with hCom2-colonized mice. (F) The architecture of the phylum dropout communities remains stable following EHEC challenge. Each dot is an individual strain; the collection of dots in a column represents the community at a single time point averaged over four co-housed mice. Strains are colored according to their phylum; EHEC is shown in black. KEY RESOURCES TABLE REAGENT or RESOURCE SOURCE IDENTIFIER Antibodies Myeloid cells: anti-mouse Ly6c (HK1.4), FITC BioLegend Cat. #128006; RRID:AB_1186134 Myeloid cells: anti-mouse CD11b (M1/70), PerCP/Cy5.5 BioLegend Cat. #101228; RRID: AB_893232 Myeloid cells: anti-mouse CD103 (2E7), PE BioLegend Cat. #121406; RRID: AB_1133989 Myeloid cells: anti-mouse CD11c (N418), PE-Cy7 BioLegend Cat. #117318; RRID: AB_493568 Myeloid cells: anti-mouse CD317 (129C1), Alexa647 BioLegend Cat. #127106; RRID: AB_2067120 Fixable Viability dye, APC-eFluor 780 eBioscience 65-0865-14 Anti-mouse IgA (RMA-1), Biotin BioLegend Cat. #407004; RRID: AB_315079 Streptavidin, BV421 BioLegend 405225 Myeloid cells: anti-mouse I-A/I-E (M5/114.15.2), BV510 BioLegend Cat. #107636; RRID: AB_2734168 T cells and epithelial cells: anti-mouse CD45 (30-F11), BV605 BioLegend Cat. #103155; RRID: AB_2650656 Myeloid cells: anti-mouse F4/80 (BM8), BV650 BioLegend Cat. #123149; RRID: AB_2564589 anti-mouse CD16/32 (2.4G2), FC block BD Bioscience Cat. #553141; RRID: AB_394655 T cells: anti-mouse Helios (22F6), FITC BioLegend Cat.
#137214; RRID: AB_10662745 B and T cells: anti-mouse CD62L (MEL-14), PerCP/Cy5.5 Biolegend Cat. #104432; RRID: AB_2285839 T cells: anti-mouse IL22 (Poly5164), PE BioLegend Cat. #516404; RRID: AB_2124255 T cells: anti-mouse Foxp3 (FJK-16s), PE-Cy7 eBioscience Cat. #25-5773-82; RRID: AB_891552 T cells: anti-mouse RORgt (B2D), APC eBioscience Cat. #17-6981-82; RRID: AB_2573254 T cells: anti-mouse CD44 (IM7), BV421 BioLegend Cat. #103040; RRID: AB_2616903 T cells: anti-mouse CD4 (RM4-5), BV510 BioLegend Cat. #100559; RRID: AB_2562608 T cells: anti-mouse CD3e (145-2C11), BV605 BioLegend Cat. #100351; RRID: AB_2565842 B cells: anti-mouse CD8a (53.6.7), BV650 BioLegend Cat. #100742; RRID: AB_2563056 Myeloid cells: anti-mouse Ly6c (HK1.4), FITC BioLegend Cat. #128006; RRID:AB_1186134 Myeloid cells: anti-mouse CD11b (M1/70), PerCP/Cy5.5 BioLegend Cat. #101228; RRID: AB_893232 Myeloid cells: anti-mouse CD103 (2E7), PE BioLegend Cat. #121406; RRID: AB_1133989 Bacterial and Virus Strains Strain Name Source Media Alistipes putredinis DSM 17216 DSMZ Chopped Meat Medium Anaerotruncus colihominis DSM 17241 DSMZ Mega Medium Bacteroides caccae ATCC 43185 ATCC Mega Medium Bacteroides coprophilus DSM 18228 DSMZ Mega Medium Bacteroides dorei 5_1_36/D4 BEI Mega Medium Bacteroides eggerthii DSM 20697 DSMZ Mega Medium Bacteroides finegoldii DSM 17565 DSMZ Mega Medium Bacteroides fragilis 3_1_12 BEI Mega Medium Bacteroides intestinalis DSM 17393 DSMZ Mega Medium Bacteroides sp. 1_1_6 BEI Mega Medium Bacteroides sp. 2_1_22 BEI Mega Medium Bacteroides sp. 3_1_19 BEI Mega Medium Bacteroides sp. 9_1_42FAA BEI Mega Medium Bacteroides sp. 2_1_16 BEI Mega Medium Bacteroides sp. 
D2 BEI Mega Medium Bacteroides thetaiotaomicron VPI-5482 ATCC Mega Medium Bacteroides xylanisolvens DSMZ 18836 DSMZ Mega Medium Bacteroides uniformis ATCC 8492 ATCC Mega Medium Bacteroides pectinophilus ATCC 43243 ATCC Chopped Meat Medium Bacteroides plebeius DSM 17135 DSMZ Chopped Meat Medium Bacteroides coprocola DSM 17136 DSMZ Chopped Meat Medium Bacteroides stercoris ATCC 43183 DSMZ Mega Medium Coprococcus eutactus ATCC 27759 ATCC Chopped Meat Medium Eubacterium dolichum DSM 3991 DSMZ Mega Medium Ruminococcus gnavus ATCC 29149 BEI Mega Medium Eubacterium rectale ATCC 33656 ATCC Mega Medium Clostridium methylpentosum DSM 5476 DSMZ Mega Medium Clostridium nexile DSM 1787 DSMZ Mega Medium Clostridium scindens ATCC 35704 ATCC Mega Medium Clostridium sp. L2-50 BEI Chopped Meat Medium Clostridium sp. M62/1 BEI Chopped Meat Medium Clostridium asparagiforme DSM 15981 DSMZ Mega Medium Clostridium bolteae ATCC BAA-613 ATCC Mega Medium Clostridium hathewayi DSM 13479 DSMZ Mega Medium Clostridium leptum DSM 753 DSMZ Chopped Meat Medium Dorea formicigenerans ATCC 27755 DSMZ Mega Medium Dorea longicatena DSM 13814 DSMZ Mega Medium Coprococcus comes ATCC 27758 ATCC Mega Medium Blautia hansenii DSM 20583 DSMZ Mega Medium Bryantella formatexigens DSM 14469 DSMZ Mega Medium Butyrivibrio crossotus DSM 2876 DSMZ Chopped Meat Medium Ruminococcus torques ATCC 27756 ATCC Mega Medium Parabacteroides merdae ATCC 43184 DSMZ Mega Medium Subdoligranulum variabile DSM 15176 DSMZ Mega Medium Parabacteroides johnsonii DSM 18315 DSMZ Chopped Meat Medium Roseburia intestinalis L1-82 ATCC Mega Medium Ruminococcus obeum ATCC 29174 DSMZ Mega Medium Eubacterium ventriosum ATCC 27560 DSMZ Mega Medium Faecalibacterium prausnitzii A2-165 DSMZ Chopped Meat Medium Parabacteroides sp. 
D13 BEI Mega Medium Eubacterium hallii DSM 3353 DSMZ Chopped Meat Medium Roseburia inulinivorans DSM 16841 DSMZ Chopped Meat Medium Prevotella buccalis ATCC 35310 DSMZ Chopped Meat Medium Ruminococcus lactaris ATCC 29176 ATCC Chopped Meat Medium Eubacterium eligens ATCC 27750 DSMZ Mega Medium Holdemania filiformis DSM 12042 DSMZ Mega Medium Bacteroides ovatus ATCC 8483 ATCC Mega Medium Bacteroides vulgatus ATCC 8482 ATCC Mega Medium Clostridium spiroforme DSM 1552 DSMZ Chopped Meat Medium Eubacterium biforme DSM 3989 DSMZ Mega Medium Blautia hydrogenotrophica DSM 10507 DSMZ Chopped Meat Medium Clostridium saccharolyticum WM1 DSMZ Mega Medium Parabacteroides distasonis ATCC 8503 ATCC Mega Medium Eubacterium siraeum DSM 15702 DSMZ Chopped Meat Medium Eggerthella lenta DSM 2243 DSMZ Chopped Meat Medium Anaerostipes caccae DSM 14662 DSMZ Mega Medium Bacteroides cellulosilyticus DSM 14838 DSMZ Mega Medium Clostridium hylemonae DSM 15053 DSMZ Mega Medium Acidaminococcus sp. D21 BEI Mega Medium Catenibacterium mitsuokai DSM 15897 DSMZ Mega Medium Collinsella aerofaciens ATCC 25986 ATCC Mega Medium Acidaminococcus fermentans DSM 20731 DSMZ Mega Medium Clostridium bartlettii DSM 16795 DSMZ Mega Medium Ethanoligenens harbinense YUAN-3 DSMZ Chopped Meat Medium Veillonella dispar ATCC 17748 DSMZ Chopped Meat Medium Collinsella stercoris DSM 13279 DSMZ Chopped Meat Medium Prevotella buccae D17 BEI Chopped Meat Medium Mitsuokella multacida DSM 20544 DSMZ Mega Medium Olsenella uli DSM 7084 DSMZ Chopped Meat Medium Slackia heliotrinireducens DSM 20476 DSMZ Chopped Meat Medium Bifidobacterium longum infantis ATCC 55813 BEI Mega Medium Dialister invisus DSM 15470 DSMZ Mega Medium Prevotella copri DSM 18205 DSMZ Chopped Meat Medium Veillonella sp. 
6_1_27 BEI Chopped Meat Medium Slackia exigua ATCC 700122 DSMZ Chopped Meat Medium Streptococcus thermophilus LMD-9 ATCC Chopped Meat Medium Desulfovibrio piger ATCC 29098 DSMZ Chopped Meat Medium Lactobacillus ruminis ATCC 25644 ATCC Mega Medium Akkermansia muciniphila ATCC BAA-835 DSMZ Mega Medium Bifidobacterium adolescentis L2-32 BEI Mega Medium Bifidobacterium pseudocatenulatum DSM 20438 DSMZ Mega Medium Solobacterium moorei DSM 22971 DSMZ Chopped Meat Medium Anaerofustis stercorihominis DSM 17244 DSMZ Mega Medium Lactococcus lactis DSMZ 20729 DSMZ Mega Medium Granulicatella adiacens ATCC 49175 DSMZ Mega Medium Clostridium sporogenes ATCC 15579 ATCC Mega Medium Bacteroides dorei DSM 17855 DSMZ Mega Medium Bifidobacterium catenulatum DSM 16992 DSMZ Mega Medium Ruminococcus albus strain 8 Laboratory of Robert Mackie Chopped Meat Medium Ruminococcus flavefaciens FD 1 Laboratory of Robert Mackie Chopped Meat Medium Ruminococcus bromii ATCC (L2-63) ATCC Chopped Meat Medium Veillonella sp. 3_1_44 BEI Chopped Meat Medium Bifidobacterium breve DSM 20213 DSMZ Mega Medium Megasphaera sp. DSMZ 102144 DSMZ Mega Medium Adlercreutzia equolifaciens DSM 19450 DSMZ Chopped Meat Medium Alistipes finegoldii DSM 17242 DSMZ Mega Medium Alistipes ihumii AP11 Laboratory of Emma Allen Vercoe Chopped Meat Medium Alistipes indistinctus YIT 12060 DSMZ Mega Medium Alistipes onderdonkii DSM 19147 DSMZ Chopped Meat Medium Alistipes senegalensis JC50 DSMZ Chopped Meat Medium Alistipes shahii WAL 8301 DSMZ Chopped Meat Medium Bacteroides rodentium DSM 26882 DSMZ Chopped Meat Medium Bilophila wadsworthia ATCC 49260 ATCC Chopped Meat Medium Blautia sp. 
KLE 1732 BEI Chopped Meat Medium Blautia wexlerae DSM 19850 DSMZ Mega Medium Burkholderiales bacterium 1_1_47 Laboratory of Emma Allen Vercoe Chopped Meat Medium Butyricimonas virosa DSM 23226 DSMZ Mega Medium Clostridiales bacterium VE202-03 Laboratory of Kenya Honda Mega Medium Clostridiales bacterium VE202-14 Laboratory of Kenya Honda Mega Medium Clostridiales bacterium VE202-27 Laboratory of Kenya Honda Chopped Meat Medium Clostridium sp. VPI C48-50 ATCC Chopped Meat Medium Intestinimonas butyriciproducens DSM 26588 DSMZ Mega Medium Odoribacter splanchnicus DSM 20712 DSMZ Chopped Meat Medium Oscillibacter sp. KLE 1728 BEI Chopped Meat Medium Ruminococcus gauvreauii DSM 19829 DSMZ Mega Medium Subdoligranulum sp. 4_3_54A2FAA Laboratory of Emma Allen Vercoe Chopped Meat Medium Escherichia coli ATCC 43894 ATCC BHI Escherichia coli MITI 27 Laboratory of Michael Fischbach BHI Escherichia coli MITI 117 Laboratory of Michael Fischbach BHI Escherichia coli MITI 135 Laboratory of Michael Fischbach BHI Escherichia coli MITI 139 Laboratory of Michael Fischbach BHI Escherichia coli MITI 255 Laboratory of Michael Fischbach BHI Escherichia coli MITI 284 Laboratory of Michael Fischbach BHI Enterobacter cloacae MITI 173 Laboratory of Michael Fischbach BHI Escherichia coli S17-1 λ-pir Laboratory of Michael Fischbach BHI Clostridium sporogenes ATCC 15579 Δotc Laboratory of Michael Fischbach Mega Medium Clostridium sporogenes ATCC 15579 Δadi Laboratory of Michael Fischbach Mega Medium Chemicals, Peptides, and Recombinant Proteins PBS Gibco 10010023 Tryptone peptone Difco 211921 Bacto yeast extract Difco 212750 Magnesium sulfate heptahydrate Sigma M2773 Sodium bicarbonate Sigma S5761 Calcium chloride Sigma C7902 Resazurin Sigma R7017 Agar Difco DF0140-01-0 Sodium acetate Sigma S2889 Meat extract Sigma 70164 D-glucose Sigma 47829 L-cystine HCl Sigma C7477 Potassium phosphate monobasic Sigma P5655 Potassium phosphate dibasic Sigma P3786 Vitamin K3 Sigma M5625 Hematin Sigma H3281 Tween
80 Sigma P4780 Vitamin mix ATCC MD-VS Trace mineral supplement ATCC MD-TMS D-(+)-cellobiose Sigma C7252 D-(+)-maltose monohydrate Sigma M5885 D-(−)-fructose Sigma F0127 Acetic acid, glacial Sigma A6283 Propionic acid Sigma P5561 Butyric acid Sigma B103500 Isovaleric acid Sigma 129542 Sterilized rumen fluid Bar Diamond Ranch #SRF Chopped meat media Hardy Diagnostics K219 Vitamin K2 Sigma V9378 Ammonium sulfate Sigma A4418 Nitrilotriacetic acid Sigma N9877 Manganese(II) chloride tetrahydrate Sigma M5005 Cobalt (II) hexahydrate Sigma C8661 Calcium chloride dihydrate Sigma 223506 Zinc chloride Sigma Z0152 Copper chloride Sigma 451665 Sodium molybdate dihydrate Sigma M1651 Boric acid Sigma B6768 Sodium selenite Sigma 214485 Nickel chloride hexahydrate Sigma N6136 Sodium tungstate dihydrate Sigma 72069 L-alanine Sigma A7469 L-arginine Sigma A5006 L-asparagine Sigma A4159 L-aspartic Acid Sigma A8949 L-glutamic Acid Sigma 49449 L-glutamine Sigma 49419 L-glycine Sigma G7126 L-histidine Fisher BP382 L-isoleucine TCI I0181 L-leucine TCI L0029 L-lysine Sigma L5751 L-methionine Sigma 64319 L-phenylalanine Sigma P5482 L-proline Sigma 81709 L-serine Sigma S4500 L-threonine Sigma 89179 L-tryptophan Sigma T0254 L-tyrosine Sigma 93829 L-valine Sigma 94619 T4 ligase NEB M0202T AscI NEB R0558 NotI NEB R0189 Bacto tryptone Thermo Fisher 211701 Sodium thioglycolate Sigma 1066910500 D-cycloserine Sigma C6880 Erythromycin Sigma 114-07-8 Thiamphenicol Sigma T0261 Luria Broth agar Fisher BP1425-500 MacConkey agar Sigma M7408 MacConkey sorbitol agar Sigma 88902 Columbia agar with 5% sheep blood BD 221165 Brain Heart Infusion broth Fisher CM1136B Horse blood, defibrinated Fisher 50863761 Glycerol Fisher PRH5433 Potassium chloride Sigma P9541 Magnesium chloride Sigma M1028 Sodium phosphate dibasic Sigma S3264 Sodium chloride Sigma S3014 Uric acid Sigma U2625 Glutathione Sigma G4251 D-tryptophan Sigma T9753 DMEM Thermo Fisher 10566024 Percoll Sigma GE17-5445-01 Methanol Fisher A456 Formic acid 
Sigma 426229 Ammonium bicarbonate Sigma 9830 Ammonium formate Sigma 70221 Acetonitrile Fisher A955 4-chloro-L-phenylalanine Carbosynth FC13398 d4-cholic acid Sigma 614149 Durapore PVDF 0.22-μm membrane Millipore UFC30GV00 MultiScreen Solvinert 96 Well Filter Plate Millipore MSRLN0410 Lithocholic acid Sigma L6250 Murocholic acid Steraloids C0910-000 Ursodeoxycholic acid Sigma U5127 Hyodeoxycholic acid Sigma H3878 Chenodeoxycholic acid Sigma c9377 Deoxycholic acid Sigma D2510 7-oxocholic acid Sigma SMB00806 Omegamuricholic acid Steraloids C1888-000 Alphamuricholic acid Steraloids C1890-000 Betamuricholic acid Steraloids C1895-000 Gammamuricholic acid Steraloids C1850-000 Cholic acid Sigma C1129 7-betacholic acid TRC U849900 Cholic acid-2,2,4,4-d4 Sigma 614149 Taurolithocholic acid Sigma T7515 Tauroursodeoxycholic acid Sigma 580549 Taurohyodeoxycholic acid Steraloids C0890-000 Taurochenodeoxycholate Sigma T6260 Taurodeoxycholic acid Sigma T0557 Taurobetamuricholic acid Steraloids C1899-000 Tauroomegamuricholic acid Steraloids C1889-000 Taurocholic acid Sigma 86339 Critical Commercial Assays DNeasy Power Soil Kit Qiagen 12955-4 Illumina NextSeq Kit Illumina NextSeq 500/550 v2.5 Illumina NovaSeq kit Illumina NovaSeq 6000 S4 Reagent Kit v1.5 Pico488 dsDNA quantification reagent Lumiprobe 92010 ATP Determination Kit Invitrogen A22066 Quick-DNA Fungal/Bacterial Miniprep Kit Zymogen D6005 GentleMACS Lamina Propria Kit Miltenyi Biotec 130-097-410 Macs SmartStrainers (100 um) Miltenyi Biotec 130-110-917 GentleMACS C tubes Miltenyi Biotec 130-096-334 MACS Buffer Miltenyi Biotec 130-091-222 CK28 Hard Tissue Homogenizing Kit, Beads VWR 10144-556 Foxp3/Transcription Factor Staining eBioscience 00-5523-00 Creatinine Assay Kit Abcam ab204537 Deposited Data To be updated with public accession numbers Experimental Models: Organisms/Strains Mouse: C57BL/6 GF Taconic Biosciences N/A Mouse: SW GF Taconic Biosciences N/A Software and Algorithms NinjaMap This study Quast v.
5.0.2 SeqKit v. 0.12.0 GTDB-tk v. 1.2.0 GTDB release 89 (database) CheckM v. 1.1.2 BBtools https://jgi.doe.gov/data-andtools/bbtools/bbtools-user-guide/ v. 38.37 Unicycler v. 0.4.8 LRScaf v. 1.1.9 TGS-GapCloser v. 1.0.1 SPAdes v. 3.13.1 MetaBAT2 v. 2.2.14 Grinder v. 0.5.4 Bowtie2 v. 2.3.5.1 Samtools Samtools MetaPhlan2 MetaPhlan2 Midas Midas Kraken2 Kraken2 Bracken Bracken Matlab https://www.mathworks.com/products/matlab.html Other 2.2-mL 96-well deep-well plates Thomas Scientific 1159Q92 Silicone fitted plate mat Thomas Scientific SMX-DW96S20 Corning 96-Well Clear Flat Bottom, Polystyrene, sterile Corning 3370 Vinyl Tape Coy 1600330w ACQUITY UPLC BEH C18 Column, 130Å, 1.7 μm, 2.1 mm×100 mm Waters 186002352 ACQUITY UPLC BEH C18 VanGuard Pre-column, 130 Å, 1.7 μm, 2.1 Waters 186003975 ACQUITY UPLC BEH Amide VanGuard Pre-column, 130 Å, 1.7 μm, 2.1 Waters 186004799 Waters ACQUITY UPLC BEH Amide Column, 130Å, 1.7 μm, 2.1 mm×150 mm Waters 186004802 Kinetex C18 column (1.7 μm, 2.1×100 mm) Phenomenex N/A Agilent 1290 Infinity II UPLC Agilent 1290 Infinity II UPLC N/A HIGHLIGHTS We introduce hCom1, a defined community of 104 gut bacterial species We fill open niches in vivo to form hCom2, a defined community of 119 species In gnotobiotic mice, hCom2 exhibited robust colonization resistance against E. coli Mice colonized by hCom2 versus a human fecal community are phenotypically similar \ No newline at end of file