Bianca Calderon
Harvard College 2009

Most genomes are made up of substantial portions of repetitive DNA. In humans, for example, as much as 49% of the genome consists of such repeats. Tandem repeats, sequences that are repeated head-to-tail at one specific locus within the genome, are especially interesting because of their high level of variability. The focus of this study is on tandem repeats that occur in the promoter regions of genes in Saccharomyces cerevisiae (brewer’s yeast). Here, I show that repeats in promoters are indeed hyper-variable and often differ between evolutionarily closely related sub-populations of yeast (natural yeast strains). To test the effect of repeats in promoter regions, I conducted experiments with two candidate genes: SDT1 and YKL071w. I created constructs with varying numbers of repeats in the promoter regions, studied the effect of repeat size on the transcriptional activity of the respective genes, and found that changes in the number of repeat units in these promoters correspond to variation in transcriptional activity. To test whether these changes in transcription levels resulted in phenotypic changes, growth assays were conducted with SDT1 strains in the presence of 6-azauracil. These assays yielded differences in the length of the lag phase that corresponded with the gene expression of the respective strain. In YKL071w, binding sites for the stress-response transcription factor Yap1 overlap with the variable tandem repeats. Studies of the YKL071w mutant series comparing the gene expression of strains with a deletion of the YAP1 gene and strains with an intact YAP1 open reading frame show a significant reduction in expression in the strains with the deletion, suggesting that transcription factor binding regulates transcription in YKL071w.
Together, these results indicate that just as variable repeats located within coding regions allow swift evolution of protein function, repeats in promoters may allow quick evolution of gene expression levels to changing environments and selective pressures.

Introduction

Most genomes contain significant portions of so-called “repetitive DNA” or “repeats”.  As much as 45% of the human genome, for example, consists of such repeats.1 Despite their abundance, repeats have been historically regarded as nonfunctional “junk” DNA.2 While many repeats are found in “gene deserts” (i.e. parts of the genome without any obvious function), some repeats are located within promoter regions of genes and even within open reading frames.3 The prevalence of repeats in genomes suggests that these sequences may have a biological function.4 This study will attempt to characterize the variability and function of repeats, specifically tandem repeats, in the promoter regions of the yeast genome.

Tandem Repeats

Tandem repeats, also known as satellite DNA, are a type of repetitive DNA sequence.  This study will focus on these repeats. Tandem repeats are made up of sequences which are repeated head-to-tail at one specific locus within the genome. For example, the DNA sequence CGACGACGACGA is a tandem repeat made up of four units of three nucleotides (Figure 1).  These sequences were named “satellites” because during separation of genomic DNA in a buoyant density gradient, repetitive DNA sequences form a secondary, or “satellite” band on account of the different density of repetitive DNA compared to the rest of the genome.5
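The head-to-tail definition above can be made concrete with a short sketch. The function name `count_tandem_units` is hypothetical and is not part of the analysis pipeline used in this study; it simply counts how many consecutive copies of a unit begin a sequence.

```python
def count_tandem_units(sequence: str, unit: str) -> int:
    """Count consecutive head-to-tail copies of `unit` at the start of `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    count = 0
    position = 0
    # Advance one unit at a time for as long as the next copy is present.
    while sequence.startswith(unit, position):
        count += 1
        position += len(unit)
    return count


# The example from the text: four units of the three-nucleotide unit CGA.
print(count_tandem_units("CGACGACGACGA", "CGA"))  # 4
```

Real repeat tracts often contain imperfect or partial units, which this exact-match sketch does not handle.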

Tandem repeats can be divided into two classes: microsatellites and minisatellites. The first human minisatellites were discovered by A.R. Wyman and R. White.6 Minisatellites have repeat units that are 10-150 bp in length and are typically more stable than microsatellites or short tandem repeats (STRs), which are so named because these sequences have a repeat unit length of 1-9 bp, and repeat length of less than 150 bp.7 Minisatellites are often found near centromeres and telomeres, while microsatellites are distributed throughout the entire genome.8, 9

Three percent of the human genome consists of tandem repeats, which is a larger percentage than that composed of coding regions (Table 1). Tandem repeats, however, have been traditionally regarded as junk DNA for a number of reasons. The majority of repetitive sequences are located in intergenic regions rather than in coding regions.4 For example, large satellite regions (greater than 100 kb) are normally found near the centromere.8 In humans, alphoid DNA, a satellite region, occurs at the centromere of each chromosome.10 Alphoid DNA has a repeat unit of 170 bp, and makes up 3% of the DNA sequence in each chromosome. The modest number of repeats in coding regions led to the belief that repeats were not useful sequences because they do not code for proteins in most cases.11 This idea was strengthened by the unsuccessful attempts of many scientists to find a function for these sequences.12 The lack of complexity in these repeat regions also led to questions about their biological usefulness or function. It is easy to assume that repeats are too simple to be of any value. In addition, repeats are extremely unstable and variable. Some argued that if repetitive DNA were indeed functional, then it would presumably be more conserved. This study, however, will attempt to counter these arguments.

Figure 1
Figure 1. Types of repeats. In this example, the repeat unit CGA is considered a tandem repeat because it is repeated head-to-tail at one specific locus.
Table 1
Table 1. Repeats are more prevalent than coding regions in the human genome. The significant portion of the genome that contains repeats suggests that these sequences serve a biological role.

Tandem Repeats in Promoter Regions

Regulatory (promoter) regions also contain tandem repeats. Repeats located within promoters and other regulatory sequences are less well characterized than repeats located within coding regions. Although these repeats are found in non-coding regions, they can still serve a biological function. In humans, short alleles of the tandem repeat found in the 1f exon of the gene NOS1 are said to increase the risk of Alzheimer’s disease.13 In Saccharomyces cerevisiae (S288c), tandem repeats occur between the UAS region and the TATA box in four MAL promoters.14 The tandem repeats reduce expression of maltose permease relative to maltase, thereby preventing possible toxic effects that can occur through over-expression of the permease gene. These examples suggest that repeats in promoter regions can have a biological purpose and merit further study.

Table 2
Table 2. Differences between natural yeast strains in genes containing tandem repeats in the promoter region. 5695 genes were analyzed in the yeast strains S288c and RM11-1a. 21% of the genes with tandem repeats in the promoter region showed differences in the number of repeat units between the two strains, demonstrating that tandem repeats are naturally variable. In 10% of the genes with tandem repeats in the promoter region, the tandem repeats overlap with at least one transcription factor binding site. One hundred forty-three of the genes with tandem repeats in the promoter region did not have enough sequence information available to determine whether the tandem repeats were variable or conserved.

Central Hypothesis and Goals of This Study

We hypothesize that, just as repeats located within coding regions allow swift evolution of protein function, repeats in promoter regions may allow quick adaptation and evolution of gene expression levels to changing environments and selective pressures. We focus on the model eukaryote Saccharomyces cerevisiae because the relative compactness of the genome, the available genetic toolbox, and the extraordinary knowledge about this organism make it a good model for this study. Initial studies completed by the Verstrepen group indicate that more than 1000 S. cerevisiae promoters contain tandem repeats (Table 2).

First, we explored the differences between tandem repeats within promoter regions found in naturally occurring yeast strains to establish whether repeats in promoters are hyper-variable in nature. We hypothesized that there would be significant variation in the repeat regions between these evolutionarily closely related sub-populations of yeast.

To investigate the role of repeats in transcriptional regulation, I created constructs with varying numbers of repeats in the promoter regions and studied the effect on the transcriptional activity of the respective genes. For this experimental study, I focused on a few specific genes that carry repeats in their promoter regions. Interesting candidate genes with intra-promoter repeats identified in the in silico screen previously performed by the Verstrepen group include:

YKL071w, which is induced by stress and contains binding sites for the stress-response transcription factor Yap1 in the promoter region that overlap with the variable tandem repeats (Table 3).15

SDT1, a stress-induced pyrimidine nucleotidase whose overexpression suppresses the 6-Azauracil (6-AU) sensitivity of the transcription elongation factor S-II and confers resistance to other pyrimidine derivatives (Table 3).16

In order to examine the mechanisms by which tandem repeats affect gene expression, I studied the dependence of gene expression levels for YKL071w on Yap1. I hypothesized that the expression levels of YKL071w are dependent on Yap1 and that tandem repeats control gene expression through transcription factor binding in this gene.

Together, these three specific aims will reveal how repeats confer variability and “evolvability” of transcriptional regulation in eukaryotes.

Table 3
Table 3. Genes utilized in this study and their functions. A description of the repeat unit, repeat size, and function of the wild type for the genes used in this study (Legendre et al., 2007; Saccharomyces Genome Database).

Materials and Methods

Strains and Growth conditions

The strains used are listed in Supplementary Table 1. YPD medium contained 2% glucose (Sigma-Aldrich), 2% peptone (Difco), and 1% yeast extract (Difco); YPD plates contained 2% glucose, 2% peptone, 1% yeast extract, and 2% agarose (Invitrogen); hygromycin B plates contained 2% glucose, 2% peptone, 1% yeast extract, 2% agarose, and 200 μg/mL hygromycin B (Sigma-Aldrich); Synthetic Complete (SC) medium contained 0.67% yeast nitrogen base without amino acids and with ammonium sulphate (VWR (BD)), 2% glucose, and 0.08% CSM (Dropout mix; Sunrise Science); SC Ura- plates contained 0.67% yeast nitrogen base without amino acids and with ammonium sulphate, 2% glucose, 0.08% CSM (Ura- Dropout mix; Sunrise Science), and 2% bacto agar (VWR (BD)); 5FOA plates contained 0.67% yeast nitrogen base without amino acids and with ammonium sulphate, 0.08% CSM (Dropout mix), 0.1% 5-fluoroorotic acid (Toronto Research Chemicals Inc.), 0.05% uracil (Sigma chemical), 2% glucose, and 2% bacto agar. Yeast cultures were grown in 3 mL of YPD for 16-20 hours at 30 °C in a rotating wheel unless otherwise noted. Plated cultures were incubated at 30 °C for 3 days.

Transformations

Standard procedures for S. cerevisiae transformations with DNA were used.27 The PCR enzyme utilized was TAKARA ExTaq (TAKARA), and producer guidelines were followed. All oligonucleotides (Sigma-Genosys) used are listed in Supplementary Table 2.

PCR-Based Transformation Strategy

Hph, a gene conferring resistance to the antibiotic hygromycin B, was inserted upstream or downstream of the gene containing variable repeats in S. cerevisiae through transformation and directed integration. Cells containing the hph insertion were then selected for on hygromycin B plates. PCR was carried out on genomic DNA from the strains with the hph insertion to create a product with a different number of tandem repeats than the wild type. This PCR product was transformed into a wild-type strain of S. cerevisiae and selected for on hygromycin B plates. Repeat length in these new strains was determined through PCR and gel electrophoresis. This method allowed us to control the number of repeat units added or deleted.

“Loopout” Transformation Strategy

The gene URA3 was inserted within the tandem repeats of the promoter region of the chosen gene in S. cerevisiae by transformation, so that the gene was flanked on both sides by the repeats. The correct transformants were selected for on SC Ura- plates. URA3 was then selected against on 5-fluoroorotic acid (5FOA) plates. 5FOA forms a toxic metabolite in strains containing URA3.28 URA3 was looped out due to natural recombination events between the flanking tandem repeats. Loss of the URA3 marker was confirmed by PCR.

RNA Isolation

RNA was extracted from cells by first spheroplasting yeast cells for 1 hour at 37 °C using Solution A (Zymolyase, 1mg/mL (MP Biomedicals); sorbitol, 0.9 M; EDTA pH 7.5, 0.1 M, mercaptoethanol, 14 mM) and then using an ABI 6100 Nucleic Acid Prep Station and reagents (Applied Biosystems).

cDNA Synthesis and Gene Expression

cDNA was prepared using the AffinityScript QPCR cDNA Synthesis Kit (STRATAGENE). Random primers supplied in the kit were used. Real-time PCR using the ABI 7500 system (Applied Biosystems) was carried out with the appropriate enzymes and chemicals from Applied Biosystems as recommended by the supplier. All oligonucleotides (Sigma-Genosys) used are listed in Supplementary Table 2.

Growth Assays

Overnight cultures of the SDT1 mutant series were grown in YPD. These cultures were transferred to SC medium with 0.03 mg/mL 6-azauracil (Sigma-Aldrich). Using the Bioscreen C MBR system (Oy Growth Curves Ab Ltd.), these cultures were incubated for 30 hours at 30 °C with constant shaking. Growth curves were created for each strain from the optical density recorded by the Bioscreen C MBR system using a 600-nm filter.

Results

Tandem Repeats within Promoter Regions are Hyper-Variable in Natural Strains

Tandem repeats within coding regions are known to be variable in nature.4 We hypothesized that tandem repeats found in promoter regions would be naturally variable as well. In order to test this hypothesis, we located and compared the promoter sequences in six naturally occurring yeast strains (S288c, RM11-1a, W303, Sigma, D273, and Y55). Prior to my arrival, the entire S. cerevisiae genome was scanned for tandem repeats using sequence from the Saccharomyces Genome Database (SGD) and the TRF algorithm.17, 18, 19 Of the 5695 genes analyzed in S288c, 25% (1456 genes) contained a tandem repeat in the promoter region (Table 2). Promoter regions were defined as the 1000 nucleotides found upstream (fewer if there is another gene in this region) of the start codon in the open reading frame of the gene. We then investigated the differences between the promoter regions of these genes in the strains S288c and RM11-1a (version 1; Saccharomyces cerevisiae RM11-1a Sequencing Project. Broad Institute of Harvard and MIT). Repeats were classified as variable if the number of complete units differed by at least one between the two strains. 21% (307 genes) of the genes with repeats were found to have variable tandem repeats (Table 2). This percentage may actually be higher, as 143 of the genes containing repeats did not have enough sequence information available to determine whether the tandem repeats were variable or conserved. 69% (1006 genes) of the genes with repeats were found to be conserved.
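The counts reported above can be checked with a few lines of arithmetic, and the classification rule can be written as a small function. This is a sketch only: the function names `is_variable` and `percent` are hypothetical, and taking the floor of the unit count is an assumed reading of "complete units".

```python
import math

# Gene counts as reported in the text and Table 2.
genes_analyzed = 5695
genes_with_repeat = 1456
variable = 307
conserved = 1006
undetermined = 143


def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


def is_variable(units_strain_a, units_strain_b):
    """Variable if the number of complete repeat units differs by at least one."""
    return abs(math.floor(units_strain_a) - math.floor(units_strain_b)) >= 1


# The three categories account for every gene with a promoter repeat.
assert variable + conserved + undetermined == genes_with_repeat

print(round(percent(genes_with_repeat, genes_analyzed), 1))  # about 25.6
print(round(percent(variable, genes_with_repeat)))           # 21
print(round(percent(conserved, genes_with_repeat)))          # 69
```

If all 143 undetermined genes turned out to be variable, the variable share would rise to 450 of 1456 genes, or roughly 31%, which bounds how much higher the true percentage could be.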

To find the most variable repeats and to see how conserved promoter regions are, the promoter regions of thirty-three of the variable genes found in the in silico screen, chosen based on their interesting functions, were sequenced in the strains S288c, W303, Sigma, D273, and Y55. The sequences of these five strains, along with the published S. cerevisiae sequence from SGD, were aligned for each gene using ChromasPro (Technelysium Pty Ltd). Through these alignments, I found that the tandem repeat regions were highly variable between the different strains (Figure 2). The gene YKL071w, whose twelve-nucleotide repeat unit is ATTAGTAATGAG, contains an extra repeat unit in the strain Y55 (Table 3; Figure 2a). In VPS55, with repeat unit GT, the strains S288c and W303 contain 6.5 additional units (Table 3; Figure 2a). The gene PRE8, whose repeat unit is TTA, is extremely variable (Table 3). In comparison to the strain Y55, the strains D273, Sigma, and W303 and the SGD reference sequence have 9.66, 11.66, 12.66, and 14.66 additional repeat units, respectively (Figure 2a). These alignments were also used to see if variability was confined to the tandem repeat region. We found that outside of the repeat, the promoter regions were indeed highly conserved between strains (Figure 2b).

Figure 2a
Figure 2a. Tandem repeats are hyper-variable in natural strains. a. Partial alignments of the YKL071w, VPS55, and PRE8 promoter sequences of five naturally occurring S. cerevisiae strains in addition to the respective promoter sequence published on the Saccharomyces Genome Database (SGD; added to confirm that our sequencing of the same strain (S288c) matched the already published sequence) show that the number of repeat units varies between the strains (Goffeau et al., 1996). Each tandem repeat unit is marked by a red box. Sequence that is conserved between strains is highlighted black, and sequence that differs between strains is white. In YKL071w, the strain Y55 has an extra copy of the repeat unit. The strains S288c and W303 in VPS55 have 6.5 additional repeat units compared to the other strains. Compared to the strain Y55 in PRE8, the strains D273, Sigma, W303, and SGD have 9.66, 11.66, 12.66, and 14.66 additional repeat units, respectively (not all shown here). These differences demonstrate that these tandem repeats vary naturally between strains.
Figure 2b
Figure 2b. An alignment of the YKL071w promoter sequences. Sequences that are conserved between strains are highlighted pink, and sequences that differ between strains are white. Each tandem repeat unit is boxed and numbered. Outside of the tandem repeat region, the sequences between strains are very well conserved.

Generation of Tandem Repeat Variance in an Isogenic Strain Background

To study the role of repeats in transcriptional regulation, I created constructs with varying numbers of repeats in the promoter regions from the progenitor strain BY4742 using two different transformation strategies (Supplementary Table 1; Figure 3). Promoters were chosen for this experiment based on their variability between the natural strains. I was able to create constructs with repeat length variations from total deletion to addition of repeat units (Figure 3b,d,e).

The “Loopout” strategy is based on natural recombination (Figure 3a). The gene URA3 is inserted through transformation within the tandem repeats of the promoter region of the chosen gene in S. cerevisiae. URA3 insertions were selected for by growing strains in SC minus uracil medium. Once stably integrated, URA3 is selected against in medium containing 5-fluoroorotic acid (5FOA), which forms a toxic metabolite in strains containing the URA3 gene. In the strains that were able to grow in the 5FOA medium, URA3 was looped out due to natural recombination events within the tandem repeats. When URA3 was looped out, part of the repeated region was sometimes looped out with it. I used gel electrophoresis to determine if the size of the repeat region in these new strains differed from the wild type. In the genes CDC14 and PRE8, I was able to create a number of mutants with a repeat region that varied from the respective wild type (Figure 3b). This method, however, did not allow me to control the size of the repeat alteration, and often yielded strains with a tandem repeat identical to that of the parental strain.

We then developed a PCR-based strategy that allowed us to control the number of repeat units added or deleted (Figure 3c). In this method, hph, a gene conferring resistance to the antibiotic hygromycin B, is inserted upstream or downstream of the gene containing variable repeats in S. cerevisiae. Cells containing the hph insertion are then selected for in hygromycin-containing medium. PCR is carried out on genomic DNA from the strains with the hph insertion to create a product with a different number of tandem repeats than the wild type. This PCR product, which also contains the inserted hph gene, is then transformed into a wild-type strain of S. cerevisiae. Using this strategy, strains were created with repeats ranging from a full deletion of the original repeat to a length double that of the wild-type repeat (Figure 3d,e). The number of repeat units in each strain was determined through gel electrophoresis and sequencing of the tandem repeat region.  With this method, I created mutant series (multiple strains with different numbers of repeat units in the promoter region) in YKL071w and SDT1 (Figure 3e). These genes were chosen because of their interesting functions and successful repeat alterations using this strategy (Table 3).

Figure 3a
Figure 3. Strains created by the PCR-based and “Loopout” strategies. a. In the “Loopout” strategy, the gene URA3 is inserted within the tandem repeats of the promoter region of the chosen gene in S. cerevisiae. URA3 is then selected against in 5FOA medium. URA3 is looped out due to natural recombination events within the tandem repeats. When URA3 is looped out, it may also loop out some of the repeated region.
Figure 3b
Figure 3b. PCR products of the tandem repeat regions used to size repeat changes in strains generated using the “Loopout” strategy. 1-6 are mutants of the gene CDC14. Number 1 is wild type of CDC14. 7-11 are mutants of the gene PRE8. Number 11 is the wild type of PRE8.
Figure 3c
Figure 3c. In the PCR-based strategy, the Hygromycin resistance gene (hph) is inserted upstream or downstream of the desired gene in S. cerevisiae. The hph insertion is then selected for in Hygromycin medium. PCR is carried out on genomic DNA from the strains with the hph insertion to create a product with a number of tandem repeats that differs from wild type. This PCR product is then transformed into a wild-type strain of S. cerevisiae.
Figure 3d
Figure 3d. PCR products from strains generated using the PCR-based strategy. 1-2 are the wild type and complete tandem repeat deletion mutant of the gene SDT1, respectively. 3-4 are the wild type and complete tandem repeat deletion mutant of the gene VPS55, respectively. 5-6 are the wild type and complete tandem repeat deletion mutant of the gene WHI5, respectively. 7-8 are the wild type and complete tandem repeat deletion mutant of the gene YKL071w, respectively.

Figure 3e

YKL071w: Transcriptional Regulation through Transcription Factor Binding

Variation in Tandem Repeat Units Affects Gene Expression

To investigate whether altering the number of repeat units affects gene expression, we performed quantitative real-time PCR (QPCR) on the mutant series of YKL071w, which ranges from wild type to a total deletion of the repeat (Figure 4a). Expression was induced through exposure to hydrogen peroxide, which stimulates the oxidative stress response. Gene expression levels were determined by averaging the data from eight experiments for each strain and calculating the standard error. All of the gene expression levels are relative values, normalized to the constitutively expressed gene ACT1. KV1028, where the open reading frame of YKL071w has been completely deleted, shows an expression of 4.44×10⁻¹¹ (±8.75×10⁻¹²) (Figure 4b). KV834, with a complete tandem repeat deletion, has an expression level of 4.14×10⁻⁸ (±4.25×10⁻⁹). KV835 has a deletion of ½ of its tandem repeat and an expression of 1.18×10⁻⁷ (±2.90×10⁻⁸). KV833, where ⅓ of the tandem repeat has been deleted, has an expression level of 2.49×10⁻⁷ (±2.46×10⁻⁸). The wild type (KV830), with its tandem repeat intact, shows an expression of 3.34×10⁻⁷ (±2.32×10⁻⁸). These results suggest that the expression level rises as the number of tandem repeat units increases in the promoter region of YKL071w.
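As a rough illustration of the summary statistics described above, the sketch below computes a mean and standard error from replicate measurements and then checks that the reported mean expression levels increase with repeat length. The helper `mean_and_sem` and the placeholder replicate values are hypothetical; only the four mean values in `reported` come from the text.

```python
import math
import statistics


def mean_and_sem(values):
    """Return the mean and the standard error of the mean of replicate values."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Placeholder replicate values for demonstration only, NOT data from this study.
placeholder_replicates = [1.0, 2.0, 3.0]
mean, sem = mean_and_sem(placeholder_replicates)

# Mean ACT1-normalized expression as reported in the text,
# ordered from shortest to longest tandem repeat.
reported = [
    ("KV834 (repeat deleted)", 4.14e-8),
    ("KV835 (1/2 deleted)", 1.18e-7),
    ("KV833 (1/3 deleted)", 2.49e-7),
    ("KV830 (wild type)", 3.34e-7),
]
levels = [level for _, level in reported]
increases_with_repeat_length = all(a < b for a, b in zip(levels, levels[1:]))
print(increases_with_repeat_length)  # True
```

The ORF deletion strain KV1028 is left out of the ordering because it serves as a background control rather than as a member of the repeat-length series.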

Changing the Number of Yap1 Binding Sites Alters Gene Expression

In 10% of the genes containing intra-promoter repeats, the tandem repeats overlap with at least one transcription factor binding site (Table 2). The repeats in YKL071w overlap with the TTAC/GTAA binding motif for Yap1, a transcription factor involved in the oxidative stress response (Figure 5).20 Between natural yeast strains, there is variation in the number of Yap1 binding sites in the promoter region. In YJM789 and Y55, an extra repeat unit results in six Yap1 binding sites in the promoter region as opposed to the five Yap1 binding sites found in the strains RM11‑1a, S288c, Sher (D273), Sigma, and W303. We hypothesized that Yap1 is involved in the transcriptional regulation of YKL071w.
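The relationship between repeat number and binding-site number can be sketched by counting the Yap1 motif on both strands of an idealized repeat tract. The sketch assumes a perfect repeat built from the unit ATTAGTAATGAG given earlier, which is a simplification, since repeat units in the actual promoter need not all be identical. The function names are hypothetical.

```python
def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(sequence))


def count_motif_sites(sequence, motif="TTAC"):
    """Count occurrences of `motif` on either strand of `sequence`."""
    reverse = reverse_complement(motif)  # GTAA for the default TTAC
    width = len(motif)
    count = 0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if window == motif or window == reverse:
            count += 1
    return count


repeat_unit = "ATTAGTAATGAG"  # YKL071w repeat unit from the text
print(count_motif_sites(repeat_unit * 5))  # 5
print(count_motif_sites(repeat_unit * 6))  # 6
```

Under this idealization each repeat unit contributes exactly one GTAA site, so gaining or losing a unit changes the number of candidate Yap1 sites by one, in line with the five versus six sites described above.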

As the number of Yap1 binding sites decreases (corresponding to decreasing repeat units), gene expression decreases in YKL071w (Figure 6). In order to determine if Yap1 actually has an effect on transcription, we deleted the YAP1 gene in each strain of the YKL071w mutant series and performed QPCR on these new strains. Gene expression was induced through exposure to hydrogen peroxide. The data were normalized to ACT1 and then normalized again according to the appropriate wild type so that the expression levels from the strains with an intact YAP1 and from the strains with the YAP1 deleted could be compared. Expression is reported as a fraction of the respective wild type. KV834, with a complete tandem repeat deletion, has expression 0.12 (±0.01), while the strain containing the YAP1 deletion has expression 0.05 (±0.003). KV835, containing a ½ deletion of the repeat, shows expression 0.35 (±0.09), while its YAP1 mutant shows expression 0.18 (±0.11). KV833 has a deletion of ⅓ of its tandem repeat and an expression of 0.75 (±0.07), and its YAP1 mutant has an expression level of 0.04 (±0.006). The wild-type strain, KV830, has expression 1, while its YAP1 mutant shows expression 0.08 (±0.03). In the strains without YAP1, expression is significantly decreased in comparison to the strains containing the gene, regardless of the size of the tandem repeats. These two phenomena suggest that tandem repeats alter gene expression in YKL071w through transcription factor (specifically, Yap1) binding.
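The second normalization step can be illustrated with the reported values. The sketch below divides each ACT1-normalized mean by the wild-type mean, which reproduces the relative values quoted above, and then computes how many fold expression drops when YAP1 is deleted. Variable names are hypothetical, and the fold values are computed from rounded means without error propagation.

```python
# ACT1-normalized mean expression reported for the strains with intact YAP1.
act1_normalized = {
    "KV830": 3.34e-7,  # wild type
    "KV833": 2.49e-7,  # 1/3 of the repeat deleted
    "KV835": 1.18e-7,  # 1/2 of the repeat deleted
    "KV834": 4.14e-8,  # entire repeat deleted
}
wild_type = act1_normalized["KV830"]
relative_to_wild_type = {
    strain: value / wild_type for strain, value in act1_normalized.items()
}

# Relative expression reported in the text, with and without YAP1.
yap1_intact = {"KV830": 1.00, "KV833": 0.75, "KV835": 0.35, "KV834": 0.12}
yap1_deleted = {"KV830": 0.08, "KV833": 0.04, "KV835": 0.18, "KV834": 0.05}
fold_reduction = {
    strain: yap1_intact[strain] / yap1_deleted[strain] for strain in yap1_intact
}

for strain in yap1_intact:
    print(strain, round(relative_to_wild_type[strain], 2), round(fold_reduction[strain], 1))
```

On these rounded means the reduction is largest for the strains with the most repeat units (roughly 12-fold for the wild type and 19-fold for KV833) and smallest for KV835 and KV834 (roughly 2-fold).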

Figure 4a
Figure 4. Expression levels of the YKL071w mutant strains. a. The strains created for YKL071w are: KV830-wild type; KV833-deletion of ⅓ of the tandem repeat; KV835-deletion of ½ of the tandem repeat; KV834-deletion of the entire tandem repeat; KV1028-deletion of the YKL071w open reading frame.
Figure 4b
Figure 4b. Gene expression levels of the YKL071w mutant series were found through quantitative real-time PCR. Expression was induced through exposure to hydrogen peroxide. Because these strains only differ in the number of tandem repeats, these differences in expression suggest that varying the number of tandem repeats results in expression level differences. The expression levels were normalized with the expression levels of the gene ACT1. Error bars denote standard error. *p ≤ 0.05 compared with wild type.
Figure 5
Figure 5. YKL071w tandem repeats overlap with Yap1 binding sites. An alignment of the DNA sequence found in the promoter region of YKL071w in seven natural yeast strains. Each tandem repeat unit is marked in red. The repeat units containing the binding motif for Yap1 (TTAC/GTAA) are marked (Fernandes et al., 1997). The strains YJM789 and Y55 contain an extra repeat unit, and therefore an extra binding site for Yap1. Natural variation between these strains alters the number of Yap1 binding sites in the promoter region.
Figure 6
Figure 6. Variation in the number of Yap1 binding sites and deletion of YAP1 alter gene expression levels in YKL071w. Gene expression levels of the YKL071w mutant series were found through quantitative real-time PCR. Expression was induced through exposure to hydrogen peroxide. The expression levels were normalized according to ACT1 and the appropriate wild type. Decreasing the size of the tandem repeat in the promoter region of YKL071w also reduces the number of Yap1 binding sites in that region. In the strains where only the tandem repeat has been altered (blue), expression decreases as the number of Yap1 binding sites decreases. The gene YAP1 was deleted in four YKL071w strains (red): KV830-wild type; KV833-deletion of ⅓ of the tandem repeat; KV835-deletion of ½ of the tandem repeat; KV834-deletion of the entire tandem repeat. Expression is significantly decreased in the strains without YAP1 in comparison to the strains containing the gene. Transcription factor binding, therefore, may regulate transcription in YKL071w. Error bars denote standard error normalized by the respective wild type. *p ≤ 0.05 compared with its equivalent YAP1+ strain.

SDT1: Tandem Repeats Affect Phenotype

Variation in Tandem Repeat Units Affects Gene Expression

To determine whether altering the number of repeat units affects gene expression in SDT1, we performed QPCR on the mutant series, which ranges from a tandem repeat double the length of wild type to a total deletion of the repeat unit (Figure 7a). Expression was induced through exposure to the alkylating agent methyl methanesulfonate (MMS). Gene expression levels were determined by averaging the data from three experiments for each strain and calculating the standard error. All of the gene expression levels are relative values, normalized to the constitutively expressed gene ACT1. KV782, where the tandem repeat of SDT1 has been completely deleted, shows an expression of 0.08 (±0.01) (Figure 7b). KV981, with a ⅔ tandem repeat deletion, has an expression level of 0.11 (±0.02). KV975 has a deletion of ½ of its tandem repeat and an expression of 0.156 (±0.02). KV979, where ⅓ of the tandem repeat has been deleted, has expression 0.162 (±0.005). The wild type (KV534), with its tandem repeat intact, shows an expression of 0.13 (±0.02). KV1380, which contains two copies of the entire tandem repeat, has expression 0.07 (±0.006). These results show that, in SDT1, gene expression increases as the number of tandem repeat units in the promoter region increases until the repeat reaches an optimal size (slightly smaller than that of wild type, exact size unknown). Once the tandem repeat becomes larger than this optimal size, gene expression seems to decrease.

Length of Lag Phases Corresponds to Levels of Gene Expression

To examine the effect of tandem repeats on phenotype, growth assays in the presence of the stress reagent 6-azauracil (resistance to which requires the gene SDT1) were completed with the mutant series of SDT1 (Figure 7c). The length of the lag phase exhibited in each strain was then calculated by determining the time at which exponential growth was initiated. KV534, the wild type, had a lag phase of 16 hours. The strain with two full repeats, KV1380, yielded a 21.25-hour lag phase. KV979, the strain with a ⅓ deletion of the tandem repeat, exhibited a lag time of 13.5 hours. The lag phase of the strain with ½ of its tandem repeat deleted, KV975, was 15.75 hours. KV981, containing a ⅔ repeat deletion, had a lag time of 23.5 hours. The mutant with a complete deletion of the repeat, KV782, yielded a 23.25-hour lag phase. From the expression data, we see that expression increases as the number of tandem repeat units in the promoter region increases until the repeat reaches a size slightly smaller than that of wild type, beyond which expression decreases (Figure 7b). The lag times mirror this trend inversely: strains with higher SDT1 expression generally have shorter lag phases, and the similar expression levels of KV981 and KV782 correspond to comparable lag times. This suggests that an alteration in the number of tandem repeats confers a phenotypic change (Figure 7c).
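The correspondence between expression and lag time can be quantified from the six reported strain means. The sketch below computes a Pearson correlation coefficient in plain Python; this calculation is an illustration added here and was not part of the original analysis. With only six strains, and with means rather than replicate-level data, the coefficient should be read as descriptive rather than as a formal test.

```python
import math

# Mean SDT1 expression (ACT1-normalized) and lag time in hours, as reported in the text.
strains = {
    "KV782": (0.08, 23.25),   # entire repeat deleted
    "KV981": (0.11, 23.5),    # 2/3 of the repeat deleted
    "KV975": (0.156, 15.75),  # 1/2 of the repeat deleted
    "KV979": (0.162, 13.5),   # 1/3 of the repeat deleted
    "KV534": (0.13, 16.0),    # wild type
    "KV1380": (0.07, 21.25),  # repeat doubled
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


expression = [values[0] for values in strains.values()]
lag_hours = [values[1] for values in strains.values()]
r = pearson(expression, lag_hours)
print(round(r, 2))  # about -0.86
```

The strongly negative coefficient is consistent with the inverse relationship between expression level and lag time.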

Figure 7a
Figure 7. Expression levels and growth assays for the SDT1 mutant series. a. The strains created for SDT1 are: KV534-wild type; KV979-deletion of ⅓ of the tandem repeat; KV975-deletion of ½ of the tandem repeat; KV981-deletion of ⅔ of the tandem repeat; KV782-deletion of the entire tandem repeat; KV1380-addition of the entire tandem repeat.
Figure 7b
Figure 7b. Gene expression levels of the SDT1 mutant series found through quantitative real-time PCR. Expression was induced through exposure to the alkylating agent methyl methanesulfonate (MMS). Because these strains only differ in the number of tandem repeats, these differences in expression suggest that varying the number of tandem repeats results in expression level differences. The expression levels were normalized to the expression levels of the gene ACT1. Error bars denote standard error normalized by the respective wild type. *p ≤ 0.05 compared with wild type.
Figure 7c
Figure 7c. Growth curves for the SDT1 mutant series in the presence of 0.03 mg/mL 6-azauracil (6-AU). The length of the lag phase exhibited in each strain compared to the others corresponds to the gene expression levels of the strains relative to each other. This suggests that varying the number of tandem repeats results in phenotypic differences in addition to expression level differences.

Discussion

The results presented here suggest that tandem repeats in promoter regions are hyper-variable and affect transcriptional activity through multiple mechanisms. Natural strains of S. cerevisiae show significant differences in the size of their tandem repeats, whereas the DNA sequences immediately surrounding the repeats are largely conserved, indicating that the repeats themselves are variable. Expression levels change as the number of repeat units is altered, which suggests that tandem repeats are involved in the regulation of transcriptional activity. In addition, these variations in expression correspond to phenotypic differences. One possible mechanism through which tandem repeats may regulate gene expression is transcription factor binding, as shown by the reduced expression seen in YKL071w strains in which the gene coding for the transcription factor Yap1 has been deleted. The variable nature of tandem repeats allows for fluctuation of gene expression levels and quick evolution to novel environments.

Variability in S. cerevisiae Promoter Regions

For millions of years, the majority of genes in different S. cerevisiae strains have been conserved, making any difference in DNA sequence between these strains noteworthy.21 Our finding that many of the genes with tandem repeats in the promoter region varied in the number of repeat units between strains was therefore unexpected, given how conserved the strains are otherwise (Table 2). This discovery suggests that variability may actually be advantageous in these promoters, and the unstable nature of tandem repeats provides an excellent vehicle by which the promoter region can expand and contract.4 A possible compensating benefit of the instability of these repeat regions could be swift adaptation of gene expression levels to changing environments and selective pressures. In this fashion, tandem repeats in promoter regions may provide functional diversity in S. cerevisiae similar to that brought about by tandem repeats in coding regions in multiple species, including bacteria, dogs, yeast, and humans.4, 22, 23

Transcriptional Regulation through Tandem Repeat Variability

Tandem repeats in coding regions are known to confer variability in proteins.8 This leads one to hypothesize that tandem repeats in promoter regions might also serve a biological function. Indeed, this study shows that changes in repeat size correlate with differences in transcriptional activity. In YKL071w, gene expression levels rise as the number of repeat units increases. Because these strains were created in an isogenic background, the only difference between them is the number of repeat units, suggesting that the alteration of the repeat region is the cause of the deviation in gene expression from that of the wild type. Similarly, the SDT1 mutant series exhibits gene expression levels that increase as the number of tandem repeat units increases until the repeat reaches an optimal size, and then decrease as the repeat is enlarged beyond this size. These strains also differ only in the number of tandem repeats in the promoter region, indicating that variation in the repeat region drives the change in transcriptional activity. Growth assays in 6-AU with the SDT1 strains showed that the length of the lag phase in each strain corresponded to the pattern seen in the gene expression levels. This correlation suggests that the difference in repeat size also affects the phenotypic fitness of the respective strain. A shorter lag phase corresponded to a higher level of expression, which may indicate that transcription is activated earlier in these strains than in strains exhibiting a longer lag phase and lower gene expression. Therefore, a different number of repeat units may allow certain strains to adapt more quickly to environmental stress through rapid transcriptional activation.

Mechanisms by which Tandem Repeats Regulate Transcription

Transcription factor binding motifs overlap with tandem repeats in 10% of the genes containing intra-promoter repeats (Table 2). YKL071w, whose repeat region overlaps with binding sites for the transcription factor Yap1, is one of these genes. YAP1 is a vital gene in the oxidative stress response of cells and is activated by oxidizing agents, such as hydrogen peroxide.20, 24 While the exact function of YKL071w is unknown, it is thought to also be involved in the oxidative stress response and shows high induction in the presence of hydrogen peroxide.15 As the number of repeat units in the YKL071w promoter region increases, and with it the number of Yap1 binding sites, gene expression levels rise. This suggests that Yap1 binds the motifs found in the repeat region, and that as more transcription factors are bound, transcriptional activity is enhanced. Strains containing a deletion of YAP1 had significantly lower expression than the respective strains with an intact YAP1 open reading frame, strengthening the claim that Yap1 binding is necessary for high levels of gene expression in YKL071w. These results suggest that transcription factor binding is a potential mechanism by which tandem repeats regulate transcription.

Because the repeats overlap with transcription factor binding sites in only 10% of these genes, there must be other mechanisms by which tandem repeats influence transcriptional activity. The variability of tandem repeats may affect transcription spatially, as a sort of on/off switch. Changing the number of repeat units could alter the distance between regulatory elements in the promoter region and thereby turn transcription on or off in the respective gene. Another likely mechanism by which tandem repeats could regulate gene expression is nucleosome positioning.25 Variation in the size of the repeat region may affect histone binding, allowing a gene to be turned on or off depending on how tightly the region is bound. If the DNA is bound tightly, transcription would be hindered, whereas a loose interaction would allow uninhibited transcription. Studies are currently under way to investigate these mechanisms.

Conclusion

Little is known about the role of tandem repeats in promoter regions. The results of this study begin to characterize the biological function of these repeats. This thesis demonstrates that repeats confer variability and “evolvability” of transcriptional regulation in S. cerevisiae. I show that intra-promoter tandem repeats are hyper-variable in natural strains, and that this variability affects the transcriptional activity of the respective gene, potentially through transcription factor binding along with other mechanisms. The findings outlined above suggest that repeat regions are desirable in certain genes because they allow quick evolution and adaptation to external conditions.

Continued study is needed to determine the precise role of repeats in the regulation of transcription. These results could provide insights into the function of intra-promoter repeats in other eukaryotes, including humans. In human fragile X syndrome, for example, expansion of the tandem repeat in the promoter region of FMR1 upregulates the gene until the repeat reaches a critical size, beyond which expression is silenced.26 A similar pattern was seen in SDT1. Increased knowledge of the mechanism behind this phenomenon could be crucial in understanding and perhaps even preventing such diseases.

References

1. “Human Genome Project.” (March 26, 2008). 1 May 2008. <http://www.ornl.gov/sci/techresources/Human_Genome/project/info.shtml>.

2. Orgel, L.E., and F.H. Crick. “Selfish DNA: The Ultimate Parasite.” Nature 284 (1980): 604-07.

3. Verstrepen, Kevin J., Todd B. Reynolds, and Gerald R. Fink. “Origins of Variation in the Fungal Cell Surface.” Nat Rev Micro 2.7 (2004): 533-40.

4. Verstrepen, Kevin, et al. “Intragenic Tandem Repeats Generate Functional Variability.” Nat Genet 37.9 (2005): 986-90.

5. John, B. Heterochromatin Molecular and Structural Aspects. Ed. R.S. Verma: Cambridge University Press, 1988. 1-147.

6. Wyman, A.R., and R. White. “A Highly Polymorphic Locus in Human DNA.” Proc Natl Acad Sci USA 77.11 (1980): 6754-8.

7. Thomas, Elizabeth E. “Short, Local Duplications in Eukaryotic Genomes.” Current Opinion in Genetics & Development 15 (2005): 640-44.

8. Csink, A.K., and S. Henikoff. “Something from Nothing: The Evolution and Utility of Satellite Repeats.” Trends Genet 14.5 (1998): 200-4.

9. Blackburn, E.H., and J.G. Gall. “A Tandemly Repeated Sequence at the Termini of the Extrachromosomal Ribosomal RNA Genes in Tetrahymena.” J Mol Biol. 120.1 (1978): 33-53.

10. Tyler-Smith, C., and H.F. Willard. “Mammalian Chromosome Structure.” Curr. Opin. Genet. Dev. 3 (1993): 390-7.

11. Morgante, M. “Plant Genome Organisation and Diversity: The Year of the Junk!” Curr Opin Biotechnol 17.2 (2006): 168-73.

12. Epplen, J.T., W. Mäueler, and E.J. Santos. “On GATAGATA and Other ‘Junk’ in the Barren Stretch of Genomic Desert.” Cytogenet Cell Genet 80.1-4 (1998): 75-82.

13. Galimberti, Daniela, et al. “Association of a NOS1 Promoter Repeat with Alzheimer’s Disease.” Neurobiology of Aging In Press, Corrected Proof (2007).

14. Bell, P.J., et al. “Tandemly Repeated 147 bp Elements Cause Structural and Functional Variation in Divergent MAL Promoters of Saccharomyces cerevisiae.” Yeast 13.12 (1997): 1135-44.

15. Gasch, A.P., et al. “Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes.” Mol Biol Cell 11.12 (2000): 4241-57.

16. Nakanishi, Toshiyuki, and Kazuhisa Sekimizu. “SDT1/SSM1, a Multicopy Suppressor of S-II Null Mutant, Encodes a Novel Pyrimidine 5’-Nucleotidase.” J. Biol. Chem. 277.24 (2002): 22103-06.

17. Legendre, M., et al. “Sequence-Based Estimation of Minisatellite and Microsatellite Repeat Variability.” Genome Res. 17.12 (2007): 1787-96.

18. Goffeau, A., et al. “Life with 6000 Genes.” Science 274.5287 (1996): 546, 63-7.

19. Benson, G., et al. “Tandem Repeats Finder: A Program to Analyze DNA Sequences.” Nucleic Acids Res. 27 (1999): 573-80.

20. Fernandes, L., C. Rodrigues-Pousada, and K. Struhl. “Yap, a Novel Family of Eight bZIP Proteins in Saccharomyces cerevisiae with Distinct Biological Functions.” Mol Cell Biol. 17.12 (1997): 6982-93.

21. Kellis, Manolis, et al. “Sequencing and Comparison of Yeast Species to Identify Genes and Regulatory Elements.” Nature 423.6937 (2003): 241-54.

22. Martin, P., et al. “Microsatellite Instability Regulates Transcription Factor Binding and Gene Expression.” Proc Natl Acad Sci USA 102.10 (2005): 3800-4.

23. Fondon, J.W., and H.R. Garner. “Molecular Origins of Rapid and Continuous Morphological Evolution.” Proc Natl Acad Sci USA 101.52 (2004): 18058-63.

24. Kuge, S., N. Jones, and A. Nomoto. “Regulation of YAP-1 Nuclear Localization in Response to Oxidative Stress.” EMBO J. 16.7 (1997): 1710-20.

25. Wang, Y.H., et al. “Long CCG Triplet Repeat Blocks Exclude Nucleosomes: A Possible Mechanism for the Nature of Fragile Sites in Chromosomes.” J Mol Biol. 263.4 (1996): 511-6.

26. Usdin, K. “The Biological Effects of Simple Tandem Repeats: Lessons from the Repeat Expansion Diseases.” Genome Res. 18.7 (2008): 1011-9.

27. Gietz, R.D., and R.H. Schiestl. “Transforming Yeast with DNA.” Methods in Mol. Cell. Biol. 5 (1995): 255-69.

28. Boeke, J.D., et al. “5-Fluoroorotic Acid as a Selective Agent in Yeast Molecular Genetics.” Methods Enzymol. 154 (1987): 164-75.
