The Dynamic Genome and Transposable Elements
Transcript of Part 2: How TEs Amplify throughout the Genome
00:00:00.00 My name is Sue Wessler. I am a Professor of Genetics at the University of California, Riverside. 00:00:04.20 And my lab studies transposable elements. 00:00:07.08 The title of these two presentations is The Dynamic Genome. 00:00:11.12 In the first talk I introduced transposable elements by describing their discovery by Barbara McClintock, 00:00:21.02 how they move, and how that discovery over the years was recognized as a major revolution in biology 00:00:28.25 as it became appreciated that transposable elements 00:00:31.27 are the major component of most of the genomes of higher eukaryotes. 00:00:37.16 In this talk I am going to go into detail about how my lab 00:00:41.20 studies the evolutionary impact of transposable elements on genomes. 00:00:47.02 And how we develop strategies to identify elements 00:00:51.14 that have an impact on genome evolution. 00:00:54.26 I have divided this talk into three parts. 00:00:57.26 In the first part I talk about the transition from genetic approaches to genomic approaches 00:01:04.14 in order to identify elements that in fact impact genome evolution. 00:01:11.19 The elements that were discovered in my lab are called MITEs, 00:01:15.27 and I will tell you about that discovery in the second part of this talk. 00:01:20.01 And in the final part of the talk I will tell you about how MITEs 00:01:25.22 are able to increase their copy number in the genome without harming the host significantly. 00:01:32.22 So to review the first part, we talked about the genetic analysis of transposable elements, 00:01:41.21 how genetic analysis led to the discovery of transposable elements, 00:01:47.06 and I used this spotted corn kernel as an example to tell you really how powerful the genetic analysis was. 00:01:53.12 So when you see a spotted corn kernel like this, 00:01:57.05 we know, the geneticist knows, that the reason that kernel is spotted 00:02:02.07 is because there are active transposable elements. 
00:02:06.08 There are... that is the spots reflect the movement of transposable elements. 00:02:10.05 The other thing the genetics tells us is exactly where in the genome that active transposable element is. 00:02:18.23 So for example, here when we are looking at spotted corn kernels, 00:02:22.13 we know that there is an active transposable element, in other words, one that is capable of moving 00:02:27.11 in a gene responsible for kernel pigmentation. 00:02:30.14 The other thing that the genetics tells us is the type of element that's there. 00:02:39.02 And I described at the beginning of the first talk the difference between autonomous elements, 00:02:45.18 that is, ones that encode transposase, and non-autonomous elements. 00:02:48.25 Those are the elements that don't make transposase 00:02:51.27 but are able to move if there is an autonomous element in the genome. 00:02:55.24 McClintock and others were able to deduce this just by looking at the behavior 00:03:02.15 of transposable elements in crosses. 00:03:05.11 Now that is the good news about the genetic analysis. 00:03:09.15 Unfortunately the genetic analysis is limited in its scope. 00:03:14.26 And that is that by its very nature genetics depends on the analysis of mutant alleles, 00:03:20.23 and so the transposable elements that were being studied were the ones causing mutations. 00:03:28.02 They were mutagenic elements. 00:03:31.04 Now because these elements cause mutations, there aren't many copies of them in the genome. 00:03:36.28 So, I mentioned in the first talk that genomes are up to 50-80% 00:03:42.28 of the genome sequence is derived from transposable elements. 00:03:45.04 However, the elements that cause mutations are not those elements. 00:03:51.04 And you can understand that an element that causes mutation is eventually, 00:03:55.17 if its copy number increases too high, will kill the host. 00:03:59.00 So these are.... these are a special class of transposable elements that cause mutations. 
00:04:02.15 And as such these elements really have a minimal impact on genome evolution. 00:04:10.12 They're bad. They are really bad. 00:04:12.13 So McClintock as I also described in the first talk, 00:04:18.07 not only discovered transposable elements, 00:04:21.11 but she hypothesized that they were also tools that diversify organisms. 00:04:29.14 And to review, she hypothesized that transposable elements that are in the genome 00:04:35.15 do not move around frequently, 00:04:38.02 that there are conditions, such as changes in climate for example, 00:04:42.09 that could activate transposable elements. 00:04:45.21 that this activation would generate genetic diversity in the population 00:04:51.05 by increasing the frequency of mutation, 00:04:57.15 and that some of these transposable element mutations may be adaptive. 00:04:59.21 I will come back to this scenario at the end of the talk 00:05:02.16 when I show you how the elements that we have identified 00:05:08.00 in plant genomes fit this scenario very, very nicely. 00:05:11.08 So to review, I described in the last talk the two general classes of transposable elements in the genome. 00:05:19.23 The first class, which is called Class 2, 00:05:24.02 are DNA transposons. These are the elements that were discovered by McClintock genetically. 00:05:30.09 We know these elements have a typical structure of terminal inverted repeats, 00:05:34.27 that they encode a single protein necessary for the movement of the element, and that is transposase. 00:05:40.02 The other class of elements which I am not going to go into extensively in this talk, 00:05:46.06 and nor did I in the last talk, are called retrotransposons. 00:05:49.26 These are elements that encode reverse transcriptase, 00:05:53.20 and they move through an RNA intermediate, again by mechanisms that were described in the last talk. 
00:05:59.11 I also want to re-introduce you to transposable element families 00:06:04.02 because we are going to revisit families in this talk. 00:06:07.08 Transposable element families contain autonomous elements. 00:06:11.01 That is, the elements that encode transposase. 00:06:13.13 And non-autonomous elements; these are elements that don't encode transposase, 00:06:18.28 but are able to move utilizing the transposase that's encoded by the autonomous element. 00:06:24.26 Something else that I discussed in the first talk is just how prevalent transposable elements are in genomes. 00:06:33.00 So this was a human gene where I showed you the exons that are in the gene, 00:06:39.21 and this is really a pretty typical gene. And what we find is that 00:06:44.15 in the non-coding regions, there are many, many, many transposable elements, 00:06:49.07 that some human genes have over a hundred transposable elements in their introns. 00:06:54.14 Now the element that I am referring to, most of the elements in the human genome are called Alu. 00:07:01.13 They are a Class 1 retrotransposon, present at an astonishing copy number of over a million copies. 00:07:09.13 It's almost ten percent of the human genome. 00:07:12.25 Now one of the things, if we are interested in the evolutionary impact of a particular transposable element, 00:07:18.25 one question we could ask is, so for example, if we picked out one of these elements, 00:07:24.29 we could ask what happened when it inserted? 00:07:28.01 Did it change the expression of the gene? 00:07:30.08 These are the questions, if we are interested in the evolutionary impact, 00:07:33.03 these are the questions that we would like to be able to address. 00:07:35.29 Unfortunately we can't do that with the human elements, most of the human elements. 00:07:42.02 And that is that the insertions may have changed gene expression, 00:07:47.20 but we have no way to address that now. 
And the reason is 00:07:51.02 first of all in the human population, virtually all of us, 99.99% of us, have these insertions, 00:08:00.22 have exactly these insertions. That's because these elements moved millions of years ago. 00:08:06.21 So what that means is that if we want to know how did the insertion 00:08:11.01 of a particular element change the expression of a gene, if at all, we are too late. 00:08:16.05 So what we want to do is identify a group of organisms 00:08:23.13 where these high copy number elements are actively transposing. 00:08:29.00 And I am going to talk about that strategy- that's exactly what this talk is about. 00:08:32.28 So here is our strategy for analyzing the impact of transposable elements on genome evolution. 00:08:41.03 And this is a figure from the previous talk which is sort of a typical region of a genome, a grass genome, 00:08:49.21 and this is from the barley genome, 00:08:51.28 and the blue boxes are genes and the triangles are transposable elements. 00:08:55.25 So in barley about 85% of the genome is derived from transposable elements. 00:09:01.16 So the strategy that we would like to do to identify evolutionarily relevant transposons 00:09:07.10 is to find a species that is in the midst of genome expansion. 00:09:12.04 So where these high copy number elements are moving, are increasing their copy number. 00:09:17.24 And we would then go ahead and identify and isolate an active element. 00:09:22.28 So this is not one of the mutagens that was identified by the geneticists, 00:09:26.09 but in fact these are the high copy number elements that are now increasing in copy number. 00:09:32.27 Ok, so we would then ask the question, 00:09:35.27 how is this element able to increase its copy number so extensively without harming the host? 00:09:43.23 What are its strategies for success? 
And success in this case is defined by being able to increase 00:09:50.21 your copy number without killing or harming the host, and possibly even by benefiting the host in some way. 00:09:57.17 And we are going to address all of those issues in this talk. 00:10:00.02 So in the first talk we talked about the discovery of transposable elements in maize. 00:10:06.23 Well, maize is a member of a larger group of organisms. It's a grass. 00:10:12.21 These are the most important organisms for human health, for the human diet. 00:10:18.29 More calories come from members of the grass clade than any other group of organisms on this planet. 00:10:25.11 We are familiar with maize. The other members of the grass clade are rice, 00:10:30.06 which actually is the most important source of human calories, 00:10:34.05 sorghum, which is also a very important crop plant especially in Africa, 00:10:38.13 and finally barley. Another member of this family, which I am not showing, is wheat. 00:10:45.00 So what you would notice here, those numbers are the size of the genome. 00:10:48.28 The maize genome is about the same size as the human genome, at 2500 megabasepairs. 00:10:55.03 The rice genome is much, much smaller. It is almost ten-fold smaller. 00:10:59.21 And what is remarkable is that here are these plants that are so incredibly similar, 00:11:05.15 yet their genome sizes differ dramatically, by more than ten-fold. 00:11:11.11 So these organisms diverged from a common ancestor only about 70 million years ago. 00:11:17.19 And the main reason for this difference in genome size is this dramatic amplification, 00:11:23.19 expansion of transposable elements. 00:11:26.09 And this slide helps explain in part how that can happen. 00:11:30.27 These organisms, the grasses, in fact have about the same gene number. 00:11:37.05 They have about 30,000 genes, give or take a few thousand. 00:11:41.09 And so the genomes of these organisms are largely syntenous. The genes are mostly in the same order. 
00:11:48.16 So in rice you could see, with the smallest genome, I've shown three genes there in pink, yellow and blue 00:11:53.24 that are pretty close together. In maize those genes are further apart. 00:12:00.03 And in barley those genes are even further apart. 00:12:04.22 So what's happening here is transposable elements, 00:12:08.03 which are the squares and circles and ellipses in between, 00:12:14.12 transposable elements are inserting massively between the genes and expanding the genome. 00:12:21.09 So this is largely responsible for the difference in genome size. 00:12:27.19 And these are the safe havens where transposable elements can go without harming the host. 00:12:33.13 So I want to show you a little bit at a higher resolution 00:12:39.01 to show you what elements are involved. 00:12:44.21 So I introduced to you before that there were two types of transposable elements. 00:12:49.11 There were Class 1 elements which are retrotransposons, 00:12:52.18 and then Class 2 elements which are DNA transposons. 00:12:55.24 The retrotransposons are generally these big elements that make RNA copies. 00:13:01.12 That RNA copy is then made into double stranded DNA. 00:13:04.22 The double stranded DNA can insert back into the genome. 00:13:08.02 It almost makes copies like a printing press, like an old fashioned mimeograph machine, 00:13:12.05 which most of you probably never experienced. 00:13:14.24 So what you see here is that the huge blocks, which could be hundreds of kb, 00:13:20.13 that separate some genes in the grass genome 00:13:23.07 are largely retrotransposons that are inserted into each other, literally driving the genes apart. 00:13:29.08 So as we say it's almost like genes sitting in a sea of transposable elements. 00:13:33.13 Now these are not the elements that we are going to talk about today. 00:13:36.13 Instead we are going to talk about elements that I think are probably more involved in diversifying the genome. 
00:13:44.08 And that is, so what I have done here is I've blown up the area of one gene 00:13:50.10 and broken it up into its exons and introns. 00:13:53.03 And I've shown you that sitting in plant genes, much like the Alu elements that are inserted in human genes, 00:13:59.25 are little elements, little transposable elements called MITEs. 00:14:03.06 These are DNA transposons. They are non-autonomous elements 00:14:07.06 and in the next few slides I am going to tell you a little bit more about MITEs 00:14:09.20 because they were discovered in my lab at least 2 decades ago. 00:14:14.16 So the way the first MITE was discovered was it was an insertion sitting in a mutant gene. 00:14:23.28 And this is the work of several people in my lab, especially Tom Bureau and Rita Varagona. 00:14:30.15 What you see here is a maize gene, and sitting in it is a little DNA transposon, 00:14:36.29 disrupting the gene, causing a mutation. 00:14:39.12 When Tom Bureau isolated this transposon, like I said, 00:14:45.19 this was back in the early 90s, when he isolated the transposon, 00:14:47.20 he took the sequence and compared it to several other maize genes or plant genes 00:14:53.08 that had been deposited into databases. This was before you did BLAST searches. 00:14:57.17 This was in the early 90s when there were some wildtype genes that had been sequenced. 00:15:02.16 And what he found was that in fact sequences like this little element 00:15:06.00 were present in many of the other wildtype genes. 00:15:09.28 And I am showing that here. So what you see is here are some wildtype genes 00:15:13.07 that were revealed by this computer search. 00:15:17.10 And so for example, this gene has an insertion in an intron. 00:15:21.17 So these are normal genes. So these insertions are in non-coding regions. 00:15:24.19 They are not affecting the expression of the gene. 00:15:26.13 This one is in the 5' promoter region. 00:15:28.13 And the one at the end here is in the 3' region. 
00:15:30.16 So he discovered these elements, they are called MITEs, which stands 00:15:35.09 for miniature inverted repeat transposable elements. 00:15:38.09 What is similar about these elements is their structure. 00:15:42.04 Their sequence may not be similar and is not similar when you go from organism to organism. 00:15:46.10 But these are the most predominant transposable element type associated with the genes of plants. 00:15:53.05 So let me tell you where MITEs fit into a transposable element family. 00:15:59.03 I told you before about autonomous elements. I told you about non-autonomous elements. 00:16:03.21 MITEs are non-autonomous elements, 00:16:08.24 but they have no coding capacity. So in this case I have shown them. 00:16:14.07 They look like they are a deletion derivative of the autonomous element, 00:16:18.02 but have none of the coding sequence of the transposase. 00:16:21.01 That is one possibility. The other thing, first of all, MITEs are very, very small. 00:16:24.24 Very short. And they can attain very high copy numbers. 00:16:29.07 So whereas for most non-autonomous elements there may be five or ten of them in the genome, 00:16:33.24 maybe up to 50, there can be thousands of MITEs. 00:16:37.03 I am going to talk more about that in a bit. 00:16:40.01 Here's an example of MITEs that look like an autonomous element that's in the genome. 00:16:44.25 Its terminal inverted repeats, its ends, are very, very similar. 00:16:48.21 So we can look at this and say, Ah! This autonomous element must move these MITEs. 00:16:52.15 We've done lots of experiments over the years to validate that. 00:16:55.22 There are other MITEs in the genome that don't look like any other element that's in the genome except for the ends, 00:17:03.20 except for the terminal inverted repeats. 
00:17:05.04 And I think you might remember from the first talk that the terminal inverted repeats 00:17:08.18 are critical because that is where the transposase binds and facilitates the transposition of the element. 00:17:15.05 So MITEs are short, miniature inverted repeat transposable elements, 00:17:21.07 I won't say that anymore, I'll just say MITEs from now on. 00:17:23.15 They are short elements. They can attain very, very high copy numbers unlike most other DNA transposons. 00:17:31.25 And that is what is relevant here. It's the high copy number elements that I will argue 00:17:37.04 are the ones that have an impact in diversifying genomes. 00:17:40.28 Not the lower copy number elements that McClintock and others had discovered cause mutations. 00:17:46.14 So fortunately, MITEs are not restricted to plant genomes. 00:17:53.12 And I say fortunately because a lot of my work over the years was funded by the National Institutes of Health, 00:17:57.26 and if you are working on a plant system, it's nice to be able to say that what you find in this plant 00:18:04.23 will be relevant to human health. And we all try to say that. 00:18:08.24 So in this case this is the transposable element composition of a mosquito genome, Aedes aegypti. 00:18:17.12 And it turns out that about 16% of this genome is due to... is derived from MITEs. 00:18:27.05 From transposable elements. 00:18:28.12 So what I would say in my grant application is that by understanding how MITEs move, 00:18:33.23 and how they increase their copy number, 00:18:36.03 in a plant genome we can extrapolate... 00:18:40.06 we can use that information to understand how they expand in animal genomes. 00:18:46.06 And there are MITEs in zebrafish, and in most higher eukaryotes, but none of them to date have been shown to be moving. 00:18:53.05 So the other thing about MITEs that's really relevant is that they are preferentially in genic regions. 
00:19:02.19 Now remember again from the first talk we said that very small percentage of the genomes of plants are genic. 00:19:11.12 So it may just be like 10 or 20% are where genes are, but MITEs preferentially go into genic regions. 00:19:19.09 And we will talk a little bit later about that preference. 00:19:22.08 So we have this situation here where we have these very high copy numbers, 00:19:28.01 so we have elements that expand, that increase their copy number, 00:19:32.22 and they go into genes, but they are not killing the host. 00:19:36.07 So how do they do that? So not only do they not kill the host, but they seem to be beneficial. 00:19:44.26 So what I've shown here are two examples of wildtype genes. 00:19:48.29 In the first one we see the red exons, and what I am showing is that transcription, 00:19:55.02 the sequences that initiate transcription, 00:19:59.08 are actually derived from the MITE sequence that is in the promoter region. 00:20:02.22 Similarly, there are other MITE examples of wildtype genes 00:20:07.22 that have MITEs that in fact carry the sequences for transcription termination. 00:20:11.12 So MITEs carry in some cases, regulatory sequences that are used by genes. 00:20:17.12 MITEs also contribute to allelic diversity. 00:20:22.08 So what I have shown here is the same gene 00:20:24.00 or alleles of the same gene, one with a MITE and one without. 00:20:27.24 So here would be a wonderful example to be able to say, okay, 00:20:32.16 I have a gene without a MITE, I have one with a MITE, what is... how do they differ in expression? 00:20:37.14 And that may be able to tell us what is the impact of insertion of that MITE. 00:20:42.26 Unfortunately, when we look at a database, and we harvest all of these related sequences and genes 00:20:52.14 it turns out that these genes don't just differ, the alleles don't just differ by the presence or absence of the MITE sequence. 
00:21:00.25 They have many other single nucleotide polymorphisms, indels, that differentiate them. 00:21:06.19 And this indicates that this insertion happened a very long time ago. 00:21:12.08 So these are essentially dead elements, and really we can't tell by 00:21:17.22 comparing the expression of these two genes what the impact of the MITE was 00:21:22.07 because there are lots of other differences between those two genes. 00:21:25.00 So as I said, these are old insertions. 00:21:28.28 Okay, so what we need are active MITEs in order to understand how they... 00:21:34.23 and we need to catch them in the act of increasing their copy number. 00:21:38.15 So what we need is a situation like this, and I am going to come back to this later, 00:21:42.06 and that is two genes that differ only in the presence or absence of the MITE. 00:21:47.05 Then we can compare the two genes and say, okay, 00:21:49.21 if this one for example is expressed in the roots and the leaves, and this one is just expressed in the roots, 00:21:54.17 we can say that the MITE sequences allowed this gene to be expressed in a different tissue. 00:22:01.03 It diversified its expression. 00:22:02.23 Ok. So we want alleles that only differ by the presence or absence of the MITE. 00:22:08.28 So now I am going to.... the next part of the talk is that quest, 00:22:12.19 that search for active MITEs. 00:22:15.23 So what I am showing here is a phylogenetic tree. 00:22:21.19 And it's called a star phylogeny. So the way we interpret these, 00:22:26.17 so what's done is you take all of the MITE sequences that are in a genome, 00:22:30.22 you put it into a computer program, and it generates a tree 00:22:35.18 that tells you how these sequences are related to each other. 00:22:38.11 So this is, what you see, what this tells us, the story that this tree tells us 00:22:43.25 is sometime long, long ago, there was a single element or a couple of related elements. 00:22:48.14 They increased their copy number. 
They were all identical. They increased their copy number. 00:22:54.18 And then somehow transposition stopped and over the thousands or millions of years these sequences drifted. 00:23:03.04 They accumulated mutations and now you see this star phylogeny. 00:23:06.29 So that's a story that this tree tells us. 00:23:09.14 And that's a typical MITE family tree. 00:23:12.29 Okay, so what it tells us is that MITEs amplify rapidly from one or a few nearly identical copies. 00:23:21.23 Ok. So if we look at a typical, the genome of a higher plant or animal, 00:23:30.25 what we see are lots of bursts. 00:23:35.22 And I have, for convenience, I am showing the same tree that I have cut and pasted 00:23:40.13 but in fact each of these trees should be different 00:23:43.04 since it is a different MITE that started out as one copy and then increased their copy number. 00:23:47.06 So forgive me because these trees are not easy to draw. 00:23:50.04 So you see there are just, genomes are filled with MITEs, tens of thousands of them 00:23:56.14 that all started from a few elements, and burst, but over evolutionary time. 00:24:02.06 So in order to understand how those elements amplified without killing the host 00:24:09.26 we need to, as I said before, catch a MITE in the act of bursting, 00:24:13.17 and that is that central region, this red circle here, 00:24:17.06 where the element is rapidly amplifying, and that is what the rest of this talk is about. 00:24:24.08 So in order to identify active MITEs, my lab had to switch directions. 00:24:31.25 And I think this happens frequently that sometimes the organism you work on 00:24:36.13 isn't ideal for the questions that you want to address. 00:24:40.02 And so this was from the first talk I showed you- 00:24:42.23 this was the first maize genetics group, or the maize genetics group, 00:24:45.15 R. A. Emerson's group at Cornell. And there is a picture of Barbara McClintock at the end. 
00:24:51.03 And here's the spotted kernels that she used to discover transposable elements. 00:24:56.14 And I mentioned that this is a shed in the Cornell plantation. 00:25:00.21 And this picture was taken in 1929. 00:25:03.00 Well, what we did in 2002 is we got a new group of researchers together, 00:25:09.02 involving Susan McCouch at Cornell, Sean Eddy, who at that time was at WashU, 00:25:16.05 Zhirong Bao, many other people, 00:25:18.27 and we took a picture in front of the same shed in the Cornell plantation. 00:25:23.28 And so this is our collaborative group that focused on identifying active MITEs, 00:25:30.01 and to do that we had to switch organisms, and we switched to rice. 00:25:35.02 And I will tell you in a second why we did that. I think I mentioned that maize has this really, really large genome 00:25:41.15 of 2500 megabasepairs. That it's about the same size as the human genome, very dynamic, very complex. 00:25:49.26 The rice genome is significantly smaller, almost, about 6 fold smaller. 00:25:55.17 And it is of this group of grasses that I talked about before. 00:26:01.02 It has the smallest genome of the cereal grasses, 350 megabases. 00:26:06.24 And for that reason, and plus because it is so important to human health, 00:26:12.07 it was the first grass genome that was completely sequenced. 00:26:14.29 And for us to do this project we required the complete genomic sequence in order to identify the active MITE. 00:26:23.22 We weren't going to use a genetic approach. We were going to use a computational approach, 00:26:26.24 and that is what I am going to talk about. 00:26:28.17 So here is the strategy. 00:26:32.17 We had the complete genome sequence of rice, and when I say we, 00:26:37.12 the person mainly responsible for this- two people- Zhirong Bao who is a graduate student in the lab of Sean Eddy, 00:26:43.28 who, as I said, was at WashU at that time. 00:26:45.26 And Ning Jiang who was a graduate student in my lab. 
00:26:49.25 Zhirong Bao had devised a computer program called RECON. 00:26:56.03 What this program does is it takes.... so what we are looking for... 00:27:01.08 We are trying to find an element, a MITE, in the genome sequence, that has the features of an active transposon. 00:27:08.13 What are those features? First of all, a transposable element will have many copies. 00:27:13.20 So we are looking for something that has multiple copies, 00:27:15.24 but we are looking for something where those copies were generated very recently. 00:27:24.04 So those copies, remember, we are looking at the red region of that phylogenetic tree, 00:27:28.13 those copies should be identical or nearly identical. 00:27:31.18 Because when an element duplicates the two copies are identical, and over time they drift, 00:27:37.07 and that is what the star phylogeny is there. 00:27:39.25 So what was done, we used, we took the rice genome sequence, 00:27:43.05 compared it to itself, and in that way identified high copy number repeats. 00:27:49.15 And about three thousand repeats were identified. These could be genes. 00:27:53.23 These could be transposable elements. 00:27:55.03 Now then the human being has to come in, or the human being came in devising the RECON protocol, 00:28:01.16 but what you have to do, and this was done by Ning Jiang, 00:28:04.28 she manually searched each of these three thousand families to try to find a sequence that looked like a MITE. 00:28:13.22 And she found one, obviously, or I wouldn't be talking about it now. 00:28:16.24 Okay, so what she found was a family that had fifty-one nearly identical copies in the sequenced genome, 00:28:25.00 which is called Nipponbare. That is the name of the strain. 00:28:27.28 There were 51 copies in this genome. 00:28:29.25 And it had the structure of a MITE. 00:28:31.18 And here it is. It is called mPing, and it is 430 basepairs in length. 
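[Editor's sketch] The screening criteria described above (many copies, nearly identical copies, short length, terminal inverted repeats) can be illustrated with a small filter. This is a minimal sketch, not the RECON program or the lab's actual procedure: the `RepeatFamily` type, the thresholds, and the toy sequences are all assumptions made for illustration, and copies are assumed to be pre-aligned and of equal length.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical container for one repeat family found by a genome self-comparison.
@dataclass
class RepeatFamily:
    name: str
    copies: list[str]  # assumed pre-aligned, equal-length copies


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(seq: str, tir_len: int) -> bool:
    """True if the first tir_len bases equal the reverse complement of the last tir_len bases."""
    return seq[:tir_len] == reverse_complement(seq[-tir_len:])


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_identity(copies: list[str]) -> float:
    """Average identity over all pairs of copies (requires at least two copies)."""
    pairs = list(combinations(copies, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def candidate_active_mites(families, min_copies=3, min_identity=0.99,
                           max_length=600, tir_len=5):
    """Keep families that are multi-copy, nearly identical, short, and TIR-flanked.

    All thresholds are illustrative assumptions, not values from the talk.
    """
    hits = []
    for fam in families:
        if len(fam.copies) < min_copies:
            continue  # too few copies to look like a recent burst
        if len(fam.copies[0]) > max_length:
            continue  # MITEs are short
        if not all(has_terminal_inverted_repeats(c, tir_len) for c in fam.copies):
            continue  # no MITE-like ends
        if mean_identity(fam.copies) < min_identity:
            continue  # copies have drifted, so the burst is old
        hits.append(fam.name)
    return hits


# Toy data: one "young" family of identical copies and one "old", drifted family.
young = RepeatFamily("young", ["GGCATTTTTTTTTATGCC"] * 3)
old = RepeatFamily("old", ["GGCATTTTTTTTTATGCC",
                           "GGCATTATTCTTTATGCC",
                           "GGCATTTGTTTATATGCC"])
single = RepeatFamily("single", ["GGCATTTTTTTTTATGCC"])
candidates = candidate_active_mites([young, old, single])
```

In this toy run only the family whose copies are still identical survives the filter, which mirrors the reasoning in the talk: identical copies imply recent amplification. A candidate found this way would still need experimental validation.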
00:28:37.24 So the problem though is that when you do computational analysis, 00:28:44.17 and you identify something that looks like it should be active, 00:28:48.20 you've got to go back to the bench and prove that it is active. 00:28:52.11 This is what we call a candidate. It's a candidate in development. 00:28:55.11 You've got to then do an experiment that validates, that shows it moving around. 00:29:00.26 So what Ning did was she took cells that were in the freezer for 4 years. 00:29:06.27 She popped these cells out, and we had a cell culture. 00:29:11.11 So cell culture is an environment where DNA... where transposable elements have been shown to move around 00:29:19.06 in other situations. So what we had is we had the DNA from the plant, two plants, 00:29:28.14 one before cell culture, and then after cell culture. 00:29:31.28 The question we are asking is can we see the movement of the mPing element. 00:29:36.00 And we use a technique called transposon display. This is what we used years ago. 00:29:40.22 Now all you have to do is sequence the genome. And we'll talk about that later. 00:29:44.17 But this was the technology available to us at the time. 00:29:47.01 And what you do is you make a primer, 00:29:49.07 and here we made a primer that was near the end of the mPing element. 00:29:54.08 And then what we are going to do is we are going to take genomic DNA, 00:29:57.10 we are going to cut it up with a restriction enzyme. 00:29:59.26 And we are going to put adaptors at the end of the genomic DNA fragments. 00:30:05.13 We are then going to do PCR using the primer from the end of the genomic fragment 00:30:11.10 and a primer from the mPing element. 00:30:14.15 And you resolve this on a gel, and that is what you see here. 00:30:18.16 So what you see in lanes 1 and 2 is a situation where it is a rice plant in one, 00:30:25.01 the DNA of the rice plant before cell culture, 00:30:26.18 and 2 is the DNA from the cell culture. 
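[Editor's sketch] The transposon display steps just described (cut the genomic DNA, then amplify between a primer at the element end and the cut site) can be mimicked on toy sequences. This is a simplified illustration under stated assumptions: the element-end sequence, the restriction site, and the genomes are made up, adaptor length is ignored, and only one strand and one element end are considered.

```python
def display_bands(genome: str, element_end: str, site: str) -> list[int]:
    """Predict band sizes for a toy transposon display.

    For every occurrence of the element-end primer, the band size is the
    distance from the primer start to the next downstream restriction site.
    Adaptor length is ignored (an assumption made to keep the sketch short).
    """
    bands = []
    pos = genome.find(element_end)
    while pos != -1:
        cut = genome.find(site, pos + len(element_end))
        if cut != -1:
            bands.append(cut - pos)
        pos = genome.find(element_end, pos + 1)
    return sorted(bands)


def new_bands(before: list[int], after: list[int]) -> list[int]:
    """Bands present after cell culture but absent before: candidate new insertions."""
    return sorted(set(after) - set(before))


# Hypothetical sequences, not real mPing or rice data.
ELEMENT_END = "GGCCAGT"   # stands in for a primer site near the element end
SITE = "TTAA"             # stands in for a four-base restriction site

genome_before = "AAAA" + ELEMENT_END + "CCCCC" + SITE + "GGGG"
# After "cell culture": one extra copy of the element end inserted downstream.
genome_after = genome_before + ELEMENT_END + "CC" + SITE

bands_before = display_bands(genome_before, ELEMENT_END, SITE)
bands_after = display_bands(genome_after, ELEMENT_END, SITE)
gained = new_bands(bands_before, bands_after)
```

Each band corresponds to one insertion site, so identical band lists in two lanes mean no detectable movement, while extra bands in the later sample point to new insertions.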
00:30:30.18 And all of the bands, and nothing is happening. You see one and two, they look exactly the same. 00:30:35.18 That's because the mPing element is not moving around in that cell culture. 00:30:40.28 For each of the bands, at one end of the band 00:30:45.13 is the mPing primer, which you see over here. 00:30:48.08 At the other end of the band is the region in the genomic DNA where the adaptor sequence is. 00:30:55.11 Okay, this is a modification of a technique called RFLP. 00:30:58.11 I am sorry, AFLP. 00:31:00.27 So anyway, lanes three and four are far more interesting. 00:31:04.23 And this is something that... there are... 00:31:09.08 Science is really slow, but there are these days you have that you always remember. 00:31:13.01 And this was a day. It was a Sunday morning and I turned on my computer and Ning had sent me this picture 00:31:19.10 that showed that the mPing element was moving around in the rice genome, 00:31:23.27 and it was, needless to say, it made my day. 00:31:25.14 So what you see in lane three is the plant before it went into cell culture. 00:31:32.01 And there are only a couple of copies of mPing in that strain. 00:31:35.14 However after cell culture there are hundreds of copies. 00:31:39.01 So what we can actually do is cut out those bands, re-amplify them, and sequence them, 00:31:43.28 and determine the position of insertion of mPing in the cell culture DNA. 00:31:49.10 So what I want to show you here is where mPing fits in. 00:31:56.05 mPing is a MITE. It is a non-autonomous element as are all MITEs. 00:32:01.13 It doesn't code for anything. It is only 430 basepairs in length. 00:32:03.28 Xiaoyu Zhang who was a graduate student in my lab took the mPing sequence 00:32:10.03 and BLASTed it, compared it to the entire rice genome sequence. 00:32:13.22 And he found a single transposon that looked like it was the autonomous element for this family. 00:32:22.23 So that is called Ping. 00:32:23.23 So we have Ping. We have mPing. 
00:32:25.18 And there was only a single copy of that element in the entire Nipponbare genome. 00:32:30.24 And we went on over the years to show that the transposase from Ping is able to move mPing, 00:32:36.11 and I am not going to talk about that in this talk. 00:32:38.19 So what I want to tell you about is the copy number of mPing. 00:32:44.16 Because I've told you that MITEs are great because they attain these really high copy numbers 00:32:49.22 of hundreds to thousands. 00:32:51.09 And yet I am bragging about some element that has fifty copies. 00:32:55.16 That's not a whole lot. 00:32:56.10 And in fact when we look at a lot of strains... 00:32:58.17 what I've shown here is the Nipponbare, the sequenced genome, which has 51 copies 00:33:03.27 as Nb. And then when we look at a lot of other Japonica strains, 00:33:07.22 and this was done in collaboration with Susan McCouch. 00:33:10.06 We obtained the strain collection. 00:33:12.25 We see that there are very low numbers of mPing from 25 up to 38. 00:33:17.07 That's not the burst that I've talked about, which is... 00:33:23.22 So fortuitously another... two other groups in Japan had identified the mPing element, 00:33:33.24 but they identified it in plants, in living plants. 00:33:38.04 And I am going to talk about those plants in a second, but when we analyzed those plants, 00:33:42.26 we found that these four related land races, as I've shown here, had over 600 copies of the mPing element. 00:33:51.04 So this really said to us that here's an element that is capable and has increased its copy number very, very quickly. 00:34:00.04 But the next question, remember one of the things we are interested in, 00:34:03.13 is how can the element do this. 00:34:08.22 Here we have 4 strains where the mPing element had increased tremendously to 600 copies. 00:34:14.11 The question, what I want to show you in the next slide, 00:34:16.18 is just how closely related all of these strains are. 
00:34:20.23 How the major difference between these strains is the mPing insertion sites. 00:34:27.10 And so what we can do, and again, this is another transposon display 00:34:32.17 where in the first lane, the Nb is Nipponbare, 00:34:36.07 with its sequenced genome, and where there are 50 copies of mPing. 00:34:40.27 Those are fewer than 50 copies, which I am not going to go into 00:34:43.26 why that is. It's the way the experiment is done. 00:34:45.28 The next three lanes are three of the four land races where the mPing element... where there are 600 copies of the mPing element. 00:34:51.21 And what you'll notice first of all is that the patterns are very, very different. 00:34:56.04 So the insertion sites for the elements are very, very different for each of those. 00:35:00.08 So from that we concluded that the mPing element had, at least at some stages, 00:35:04.08 amplified independently. These strains at some point, there probably was one strain 00:35:09.09 in which the mPing element had amplified, had activated, had become active. 00:35:13.02 And then land races are strains of rice that farmers have grown in particular areas. 00:35:20.00 So these strains, they were then grown in particular areas, 00:35:23.00 they were kept separate. And now we are bringing them back together in the lab. 00:35:27.14 So what you see is the different patterns, but you can see that there are many more bands. 00:35:32.27 So we say that they burst independently in these three strains. 00:35:36.10 What I want to show in the next slide, not next slide, 00:35:38.10 but the next experiment is just how similar these strains all are. 00:35:42.19 So what you see here, is using, in the first, in the gel on the left, we've used the mPing element 00:35:51.21 the primer from the mPing element, remember I showed you that before, 00:35:55.12 in PCR. What we are using in this second panel is a primer from a different element. 
00:36:02.04 So exactly the same genomic DNA preparations, 00:36:04.15 but the difference is one of the primers in PCR. 00:36:07.24 This is a primer from a retrotransposon called Dasheng, 00:36:11.12 which was also identified in my lab. There are about 1500 copies of this in the genome, 00:36:16.05 but what you see, again using exactly the same DNA preparations, 00:36:21.01 is that the pattern for all of those strains is exactly the same. 00:36:23.14 So what that, or virtually the same. 00:36:26.05 So what that says is the major difference between these strains is the different insertion sites of mPing. 00:36:32.26 Now what I want to tell you is I want to show you 00:36:37.12 the experiment we devised, and this is Eunyoung Cho in my lab, devised this experiment 00:36:43.04 to see if the mPing element was still transposing in these high copy number strains. 00:36:50.13 And this is a, people in this area of transposon biology, 00:36:54.23 people will look at one organism and see transposons, and look at another and see them in different places. 00:37:00.06 And they'll say, okay, the transposon is moving. 00:37:02.28 They are probably not. What they are seeing is just polymorphism. 00:37:05.23 The only way you can really see transposition to say something is transposing, 00:37:09.25 is if you see it right before your eyes. 00:37:12.18 And that is what I am going to show you right here. 00:37:14.06 So what's done is we have ten. We have one plant that was grown up, one rice plant, 00:37:19.19 and from that plant Eunyoung took ten seeds. 00:37:22.29 Rice is a selfer. It self-pollinates, so all those seeds are virtually identical. 00:37:28.28 She planted the seeds, she grew them up. She isolated genomic DNA. 00:37:32.18 She did transposon display using the mPing primer. 00:37:36.24 And what you see are the ten lanes of the 10 individual plants there. 00:37:40.27 And you'll see differences. 00:37:42.23 You'll see differences, and I'll explain that in a minute. 
00:37:46.25 So she took the last plant here, the very last one. 00:37:49.18 She took 10 seeds from that plant and she grew the next generation. 00:37:56.03 And she did the same thing, transposon display. 00:37:58.11 She took seed from the last plant here, I am sorry, ten seeds from that plant. 00:38:04.17 She grew it up. Here is the last generation. 00:38:06.23 So by comparing these three panels and by looking at the white arrows, the white empty arrows, 00:38:13.28 what you'll notice is that there may be... there's a band say in the F1. 00:38:20.12 Umm, that then in the next generation it is segregated. 00:38:25.02 Okay. So you see that band. That is a heritable insertion of mPing. 00:38:28.25 That is now segregating because when it inserts in the F1 generation, 00:38:32.28 it is heterozygous. And in the next generation we can see it segregating. 00:38:39.05 So essentially by looking at this what you can see is what I think is pretty remarkable. 00:38:41.25 That is this rapid increase in the mPing copy number over a very, very short period of time. 00:38:49.12 A couple of generations. 00:38:50.17 And so we can actually, just by simply counting bands, 00:38:53.22 we can determine how many new insertions there are per generation. 00:38:58.02 And that is what I will show you here. 00:38:59.26 So we had basically found that there were approximately forty new insertions per plant, per generation, 00:39:05.17 and that 80 percent of these insertions are heritable. 00:39:08.26 This blew us away. We had no idea that a transposable element could increase its copy number so rapidly. 00:39:17.28 So now we have the material to address the questions that I had... 00:39:23.09 one of the questions that I posed at the very beginning, 00:39:25.17 which is how do transposable elements amplify without killing their host? 00:39:28.26 And then the second question I will end the talk with is does amplification actually benefit the host? 
00:39:35.25 So the first one is the way we're addressing this first question is where are the elements going? 00:39:43.03 Where are they in the genome? 00:39:44.17 So the way we... experimentally the way we address that question 00:39:49.15 was to take genomic DNA from these plants and essentially amplify out the ends of the element, 00:40:00.09 much like in transposon display, but we are not going to run a gel. 00:40:03.28 So we are taking genomic DNA, and you'll see in a second, 00:40:06.13 we are not taking it from one plant. We are taking it from a small population. 00:40:09.16 We are using the primer from the mPing element 00:40:12.18 and a flanking primer in an adapter and amplifying up all of these flanking regions, 00:40:20.24 and then instead of running it on a gel because we don't want to look at a few insertions, 00:40:24.16 we want to look at all the insertions, 00:40:26.06 we use high-throughput sequencing. In this case we used 454 sequencing 00:40:32.07 to sequence tens of thousands of... hundreds of thousands, flanking regions, regions flanking the mPing insertion sites 00:40:42.17 to try to find out where mPing is. 00:40:45.07 But more importantly, and this was an experiment done by Ken Naito 00:40:49.07 when he was a postdoc in my lab. 00:40:52.05 What he did is he figured out that the capacity for sequencing 00:40:57.29 is so great that we don't have to restrict ourselves to one plant. 00:41:01.21 He can actually determine where mPing is in a small population of plants. 00:41:06.25 And he chose the number 24, so we took 24 rice plants, and he was able 00:41:11.14 to barcode the PCR reaction so we could tell which plant 00:41:16.00 the PCR products came from. 00:41:18.19 And so, what he did, so what you see on the left in the gel, 00:41:26.09 is, you'll see for example some of the PCR patterns, products, 00:41:30.26 from each of, from a subset of the 24 plants. 
And you can see that in blue I am showing you 00:41:36.09 the bands that are shared by all of the plants. 00:41:39.19 In pink I am showing you the bands that are only present in one plant. 00:41:45.29 The ability to look independently at shared and unshared insertions is very powerful. 00:41:53.23 Because what we want to know is not just what are the insertions that are present in all 24 plants. 00:42:02.03 We want to know... those we will call "old" insertions. 00:42:06.08 We want to know what are the new insertions. What are the ones that are just happening. 00:42:10.17 And the reason is, it is possible that the old insertions have been filtered by selection. 00:42:15.14 So that over generations insertions that are in places that are in exons or whatever 00:42:23.11 have been removed because they have been detrimental. 00:42:27.02 So by looking at shared and unshared we are able to get the whole spectrum of new and old insertions. 00:42:33.29 Old is also... old is not really old. These are insertions that happened, this initial burst really 00:42:40.12 happened maybe over the last 50 to 75 years. 00:42:43.01 But these unshared insertions happened in our greenhouse. 00:42:49.25 So essentially what he was able to sequence, so we know 00:42:54.21 that if all of the bar-coded plants or some of them show the same insertion, 00:43:00.13 we call that shared. If only one of the plants shows an insertion, we call that unshared. 00:43:04.18 So he was able to determine 928 shared insertion sites, 00:43:09.24 and 736 unshared insertion sites. 00:43:14.06 Ok, so, as I said before, these unshared insertion sites are de novo insertions. They just happened. 00:43:19.24 And they are heterozygous. Heterozygous because when insertion happens it goes into one of the two alleles. 00:43:25.16 And heterozygous is important because if it is a detrimental insertion, 00:43:31.09 if it's a detrimental mutation, it is likely to be a recessive mutation. 
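The shared/unshared rule described above (an insertion seen in more than one barcoded plant is "shared"; an insertion seen in exactly one plant is "unshared") can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the lab's actual analysis pipeline: the function name `classify_insertions`, the `(barcode, site)` input format, and the toy site labels are all hypothetical.

```python
from collections import defaultdict

def classify_insertions(reads):
    """Split insertion sites into shared and unshared sets.

    reads: iterable of (plant_barcode, insertion_site) pairs
           (a hypothetical format chosen for this sketch).
    """
    plants_by_site = defaultdict(set)
    for barcode, site in reads:
        plants_by_site[site].add(barcode)

    # Shared: the same site shows up in two or more barcoded plants.
    shared = {s for s, plants in plants_by_site.items() if len(plants) > 1}
    # Unshared: the site shows up in exactly one plant (candidate de novo insertion).
    unshared = {s for s, plants in plants_by_site.items() if len(plants) == 1}
    return shared, unshared

# Toy data: three plants, made-up site labels.
toy_reads = [
    ("plant01", "chr4:1200"),
    ("plant02", "chr4:1200"),
    ("plant03", "chr4:1200"),
    ("plant02", "chr1:88000"),
]
shared_sites, unshared_sites = classify_insertions(toy_reads)
```

With real data the same grouping would be applied to the barcoded flanking sequences after mapping them to the genome; the mapping step is not shown here.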
00:43:36.29 So. This is a... I don't expect you to see this. It is just really to impress you. 00:43:43.23 What you are seeing is each of these graphs is a different rice chromosome 00:43:48.23 and the blue... I am going to blow it up in a second. 00:43:51.20 The shared insertions are shown in blue and the unshared insertions are shown in red. 00:43:58.05 And it essentially shows that the mPing insertion can insert throughout the 12 chromosomes 00:44:02.16 of rice. It is on every single chromosome. 00:44:04.26 And this shows a single chromosome, chromosome 4, 00:44:09.04 and we are looking at the insertions on chromosome 4. 00:44:11.21 And chromosome 4 has a very large region of heterochromatin. 00:44:16.26 And most of the transposable elements, the DNA transposons, don't insert into heterochromatin. 00:44:22.18 They insert near genes which are euchromatin. 00:44:25.02 But even, but we do have a few insertions that are in or near heterochromatin. 00:44:30.09 So what we know, what we have learned from this analysis, 00:44:34.10 first of all the distribution of shared and unshared insertions is the same. 00:44:38.10 It is exactly the same. There is no difference in the insertion preference of either class. 00:44:43.20 That, remember I said before, that MITEs prefer to insert into single copy regions, intergenic regions, 00:44:50.08 and sure enough, 91% of the insertions that we found were in single copy sequences. 00:44:57.09 The genome average of single copy sequences is about 54%. 00:45:02.06 So this is saying that MITEs do prefer to insert into single copy sequences. 00:45:07.08 And that we found that even when it does insert into heterochromatin, 00:45:10.28 it's actually inserting near genes that are in the heterochromatin. 00:45:14.05 So now we have looked at a gross scale throughout the chromosomes. 00:45:20.16 Let's look more closely and see where is mPing inserting in and around genes. 00:45:25.24 So here is a summary, with a very surprising result. 
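The two percentages quoted above (91% of mPing insertions in single copy sequence, against a genome average of about 54%) can be turned into a fold enrichment with one division. The sketch below is illustrative arithmetic only; the function name `fold_enrichment` is an assumption for the example, and no statistical test is implied.

```python
def fold_enrichment(observed_fraction, expected_fraction):
    """Ratio of the observed fraction of insertions to the fraction expected by chance."""
    if expected_fraction <= 0:
        raise ValueError("expected_fraction must be positive")
    return observed_fraction / expected_fraction

# Figures quoted in the talk: 91% observed, about 54% expected.
single_copy_enrichment = fold_enrichment(0.91, 0.54)
print(f"single copy enrichment: {single_copy_enrichment:.2f}-fold")
```

A ratio above 1 means the class is over-represented among insertions; a ratio below 1 means it is under-represented.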
00:45:31.25 So what we see, we are looking at insertions that are in the 5' untranslated region, 00:45:38.16 in the exon sequences, in intron sequences, and in the 3' UTR. 00:45:42.19 And we are looking at again at a summary of a very large number of genes, 00:45:48.03 and what you are seeing is the percentage of insertions. 00:45:51.04 And so what we find is that the grey here is the expected number, 00:45:58.17 is the expected number of insertions given the composition of the genome. 00:46:03.23 The pink are the unshared insertions and the blue are the shared insertions. 00:46:06.23 And the only thing that really stands out on this histogram is this. 00:46:12.13 We find that there are far fewer insertions into exon sequences than we would expect by chance. 00:46:19.11 Almost ten fold fewer insertions. 00:46:22.09 Both the shared and the unshared. 00:46:25.14 And the unshared is significant again because it tells us that mPing prefers not to insert into exon sequences. 00:46:35.04 How it does this, how it knows exon from intron we can only speculate at this point, 00:46:41.15 and I'll speculate a little later. 00:46:42.11 So mPing has another insertion preference. 00:46:45.23 And so here is genic regions, so what you are seeing at the top is around the gene. 00:46:51.18 And what we are seeing on the X axis is the percentage of insertions. 00:46:56.05 And the blue or purple bar is the actual mPing insertions and the grey dotted line is our control. 00:47:04.25 So if we just sampled genome sequences what we would expect to see in those regions. 00:47:09.23 And what we are seeing where the dotted line is 00:47:12.08 is the transcriptional start site, and we go upstream from there to -1, -3, and that's in kb. 00:47:20.17 And what you see is that there is a spike. 00:47:22.25 There is a preference for mPing inserting within 1 kb of the transcription start site. 00:47:29.03 And when we take this together, and you say how could this possibly happen? 
00:47:32.03 We don't know. And that is obviously an area of intensive interest. 00:47:36.15 One of the ideas is that in plant genomes, and in other genomes, 00:47:41.06 it is known that the region just upstream of the transcription start site 00:47:46.08 has fewer nucleosomes, as do exons. Exons have fewer nucleosomes, I'm sorry, exons have more nucleosomes. 00:47:55.22 And so, introns have fewer nucleosomes compared to exons, 00:48:01.04 so it seems that the mPing is avoiding insertion into dense chromatin regions, 00:48:10.05 or relatively dense chromatin regions. 00:48:11.02 But this is pure speculation at this point. 00:48:12.20 It is the only thing that is consistent with this pattern of insertion, 00:48:17.01 that is avoiding exons and preference for insertion near the transcription start site. 00:48:22.06 So let me summarize at this point, and that is we find that 91% or so 00:48:30.17 of the mPing insertions are in single copy regions near genes. 00:48:36.10 That exon insertions are 10 fold under-represented. So the element is avoiding insertion into exons. 00:48:43.09 And that insertions within 1 kb of the transcription start site are also enriched. 00:48:49.05 Ok, and finally that the distributions of shared and unshared insertions are indistinguishable 00:48:54.15 meaning that this is the insertion preference of the mPing element. 00:48:58.05 I said we don't understand it, but this is the preference. 00:49:01.04 So at the beginning of this section I posed two questions that we want to answer with these experiments. 00:49:08.23 The first one is how do transposable elements amplify without killing their host? 00:49:12.08 And the answer to that question for mPing is the following. 00:49:17.21 And that is the rapid amplification of a successful element, 00:49:20.13 and mPing is a successful element, has really a more modest impact on the host than previously thought. 
00:49:28.27 So this is actually, when I first saw this data I was kind of disappointed 00:49:31.19 because I was like, "boy, it is just not doing a whole lot." 00:49:34.21 And then it kind of dawned on me that that is what successful elements have to do. 00:49:39.00 I mean in order to be successful, and success again 00:49:42.02 is defined as being able to attain very, very high copy number, 00:49:45.25 it has to do little harm. 00:49:48.21 So the second question that is more difficult to address is does the amplification actually benefit the host? 00:49:57.19 And we have some data that suggests that it does, and it is experiments that we are pursuing now, 00:50:03.07 and I am going to tell you what those results are. 00:50:05.03 And really what we want to do is we want to look at the impact of mPing insertions on host transcription. 00:50:11.03 So I showed you at the beginning of this talk the situation that we wanted in order to... 00:50:23.23 the experimental material we needed in order to address the question 00:50:29.25 of what is the impact of insertion on diversifying gene expression for example. 00:50:36.19 And what we needed were alleles that differ by the presence or absence of a transposable element. 00:50:44.00 Now we have lots of those examples. 00:50:46.09 So as I said, we wanted alleles that only differ by the presence or absence of the MITE. 00:50:51.12 And this summarizes really what we have now. 00:50:54.24 We have 710 genes, and EG4, I didn't show this before, 00:51:00.05 EG4 is one of the land races. It is a strain for which we determined the mPing insertion sites. 00:51:06.09 So by comparing the mPing insertions in EG4, with the same genes in Nipponbare, 00:51:13.04 we now have essentially 710 genes that have alleles that differ largely by the presence of this mPing element. 00:51:21.17 Of those 710 genes, almost 400 have insertions within the promoter region, 00:51:28.12 the 5' untranslated... I'm sorry, the promoter region. 
00:51:31.17 120 or so have insertions within the gene. 00:51:37.23 And 193 have insertions downstream of the gene. 00:51:40.11 So the question we are asking is what is the impact of insertion on transcription. 00:51:47.14 To do that, Ken Naito when he was a postdoc in the lab 00:51:50.26 compared the transcription of the 710 alleles in EG4 versus Nipponbare 00:51:57.02 initially under normal growth conditions in the greenhouse. 00:52:00.06 So to do this he used a microarray of rice genes. He did microarray analysis of 31,000+ rice genes. 00:52:11.05 He isolated RNA from Nipponbare and EG4 seedlings. 00:52:16.03 And essentially he determined that for a significant percentage of these alleles, 00:52:26.15 there was no difference in gene transcription. 00:52:29.10 So for 78% the transcript levels for these genes 00:52:35.04 were the same for Nipponbare as they were largely the same in EG4. 00:52:38.25 So this is a pretty benign effect on host transcription. 00:52:43.00 So what you see in this slide is a comparison of the expression of the remaining alleles, 00:52:51.22 that is those where we did see a difference 00:52:53.02 between the transcription in EG4 and Nipponbare. 00:52:56.16 And for three quarters, approximately three quarters of those alleles, 00:53:00.23 three quarters of those alleles, we saw upregulation in EG4. 00:53:05.24 That is the presence of the mPing element was correlated with increased transcription of the gene. 00:53:13.11 And most of that difference, or what you see here is, 00:53:17.27 most of that difference were insertions that were in the 5' upstream regions. 00:53:22.09 So this is upstream of the transcription start site. 00:53:25.26 We do see also differences in insertion in introns. 00:53:29.04 That many of the intron insertions, many meaning of the remaining 25% that show an effect, 00:53:35.27 most of those in fact were upregulated. 
00:53:39.26 There weren't many that were downregulated except the few that were in exons, and that is understandable. 00:53:44.00 So what I want to do now is to show you how we confirm this microarray analysis. 00:53:50.21 So what you'll see... I'll just take one of these over here. 00:53:56.05 To the left. What you see is a particular allele, and this is OS... some long number. 00:54:02.01 The insertion of mPing is at -2497. So it is 2.5 kb upstream from the transcription start site. 00:54:09.25 And yellow here is transcription in EG4. Gray is transcription in Nipponbare. 00:54:16.02 So what we are seeing... what we are doing here is we are isolating RNA and instead of using the microarrays, 00:54:21.23 we are doing PCR, quantitative PCR. 00:54:24.02 And what we find in every case we check, the EG4 allele for the particular alleles we are looking at, 00:54:30.19 the ones that showed upregulation by microarrays, 00:54:33.14 we're able to confirm that result using quantitative PCR. 00:54:38.11 Yes, indeed there is more transcription from the EG4 alleles. 00:54:43.03 Now we have a problem. 00:54:46.00 And this, I'll try to make this as simple as possible. 00:54:48.10 There's another difference between Nipponbare alleles and EG4 alleles 00:54:55.23 besides the presence or absence of mPing. 00:54:59.01 And that is that the allele in Nipponbare, which doesn't have the transposon in it 00:55:05.12 is in a genome that only has 50 mPings. 00:55:09.12 Whereas the EG4 alleles, all of them, are in genomes that have, 00:55:13.29 I show here a thousand, but 500-1000 mPing elements. 00:55:18.19 So it is possible that the differences that we are seeing in transcription between EG4 and Nipponbare 00:55:26.13 is due to that load of 1000 elements in the background. 00:55:30.03 What we need is a control. 00:55:31.15 We need a control where we can compare the alleles with and without mPing 00:55:36.25 in the same type of genomic background. 
00:55:39.23 And again, I wouldn't be telling you that we need this control if we didn't have one. 00:55:44.00 And so I mentioned at the beginning that EG4 is one of a couple of land races 00:55:49.16 that have... in which the mPing element has burst, where we have many copies of mPing. 00:55:55.13 And recall I showed this transposon display, and so what I am showing here is EG4 is one of these land races. 00:56:03.22 Another is A123, and another is A157. 00:56:06.27 So, and the other thing you'll notice is that the mPing insertions in those strains, 00:56:11.12 just by looking at the patterns you can see the patterns on the transposon display differ. 00:56:15.18 This means that the insertions are different. They are in different places in the genome. 00:56:20.02 So this allows us to have valid controls. 00:56:24.25 So here we see alleles that differ between Nipponbare and EG4. 00:56:30.01 The control is we can identify in for example, A123, we can identify A123 that has the Nipponbare allele in it. 00:56:40.10 Okay. So in that way we are able to compare A123 and Nipponbare to EG4 00:56:46.27 and in that way we sort of eliminate the complication of 1000 mPing insertions in the background. 00:56:53.29 And I'll show you that data now. 00:56:56.00 So here we have, what I am showing is a histogram, this is a particular gene Os... so on... It's a rice gene. 00:57:02.26 An annotated rice gene. 00:57:05.09 The -600 means that it has insertion of mPing 600 basepairs upstream of the transcription start site. 00:57:11.12 And that is shown in this schematic below here. 00:57:15.04 So what you see is that in Nipponbare, which is the gray, 00:57:20.07 we see a level of transcription which is set arbitrarily as 1. 00:57:24.17 In EG4 we see about 5-fold more transcription. Transcripts. 00:57:28.24 Now we also have to compare, we have the blue. 00:57:35.01 The blue is the A123, which is another land race where there is no mPing 00:57:40.16 in that position. 
So A123 has the Nipponbare allele 00:57:44.24 and despite having that allele and that background. No, not that ... Despite. 00:57:50.23 Even with the 1000 copies of mPing in the background, 00:57:53.15 we still see reduced expression of the gene. So the alleles, the Nipponbare allele, 00:57:58.29 and A123 allele, we are getting about the same level of transcription. 00:58:04.02 A154 has the same insertion as EG4. 00:58:08.19 And as you see, we get increased transcription. 00:58:12.00 So this clearly tells us that the difference in transcription is due to the mPing, somehow. 00:58:16.20 We don't know how. It is due to the insertion of the mPing at -600 from the transcription start site of this gene. 00:58:24.14 Here's another experiment. I am not going to show you all of them. Don't worry. 00:58:27.06 Here's another gene it is Os01g0.... whatever. 00:58:31.02 This insertion is at -2.5 kb from the transcription start site. 00:58:35.17 Again, so what we have here, Nipponbare, the negative next to Nipponbare means 00:58:40.13 that there is no mPing in that gene. The + means that there is. 00:58:45.06 So here EG4 and A123 have that allele with mPing. 00:58:50.06 A157 doesn't. Again, the expression has to do with the presence of mPing, 00:58:55.00 in this case 2.5 kb upstream from the transcription start site. 00:58:58.00 So, just to summarize this part: the impact of mPing insertion on nearby gene transcription. 00:59:08.03 In the vast majority of alleles we see no impact. 00:59:11.17 No effect. This would be a neutral mutation. 00:59:15.11 Of the 710 alleles we are comparing, for 111 we see upregulation of the nearby gene. 00:59:23.12 And for 45 we see downregulation of the nearby gene. 00:59:27.11 Now the question that we are going to ask is does the presence 00:59:34.12 of mPing affect transcription in a different way? 00:59:38.06 Does it in this case confer stress inducibility on nearby genes? 
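The counts in this summary are consistent with the 78% "no difference" figure given earlier, which a short tally makes explicit. The sketch below uses only the numbers stated in the talk (710 alleles compared, 111 upregulated, 45 downregulated); the function name `summarize_alleles` and the dictionary layout are assumptions made for the example.

```python
def summarize_alleles(total, upregulated, downregulated):
    """Return counts and percentages for each expression class.

    'unchanged' is derived as whatever is left after removing the
    upregulated and downregulated alleles from the total.
    """
    unchanged = total - upregulated - downregulated
    counts = {
        "upregulated": upregulated,
        "downregulated": downregulated,
        "unchanged": unchanged,
    }
    percents = {name: 100.0 * n / total for name, n in counts.items()}
    return counts, percents

# Numbers quoted in the talk.
counts, percents = summarize_alleles(total=710, upregulated=111, downregulated=45)
for name in counts:
    print(f"{name}: {counts[name]} alleles ({percents[name]:.0f}%)")
```

Under these numbers 554 alleles are unchanged, which rounds to 78% of the 710 compared, and the upregulated alleles make up roughly 71% of the 156 that changed, in line with the "approximately three quarters" quoted for the microarray comparison.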
00:59:41.02 Now remember from McClintock's scenario she mentioned the possibility 00:59:45.19 that transposable elements are induced by stress. 00:59:50.00 So here we are going to look at something a little different. 00:59:51.09 We are going to ask does the presence of a transposable element 00:59:53.17 cause the nearby gene to be stress inducible? 00:59:56.20 So this experiment... I'll lead you through this here.... 01:00:00.27 is we are looking at three different stresses: cold, high salt and desiccation, 01:00:06.24 dryness. So this is a gene which has an mPing element at -55. 01:00:16.07 So 55 basepairs from the transcription start of this gene. 01:00:19.13 And this is a gene, you'll see the control under normal conditions, 01:00:23.18 it's one of those vast majority, 78%, that show no effect under normal growth conditions. 01:00:29.20 That is what you see in the control there. 01:00:31.17 However, when we subject these plants to cold, and we meaning Ken Naito again. 01:00:36.29 Cold or salt, we see that the strain EG4, which has mPing at -55, we see increased transcription. 01:00:45.27 Not much, but we see reproducible increased transcription, 01:00:49.25 whereas the other high copy strains that don't have this allele with mPing do not respond. 01:00:57.26 So here's another example. This is an mPing element in gene Os02... 01:01:04.23 it has an insertion at -41, 41 basepairs upstream of the transcription start. 01:01:10.01 What we see is that the alleles... here EG4, in blue, and A123, in yellow, have the mPing containing allele. 01:01:20.27 And you see those are the ones that are induced by cold and salt. 01:01:24.18 What's nice here is we are not seeing any effect of desiccation. 01:01:29.06 We see a consistent effect of cold and salt. 01:01:32.04 We don't know the mechanism for this. It is under investigation. So then the question is, 01:01:40.02 how... so one of the things you might wonder is how is the transposon affecting transcription? 
01:01:47.10 Is it acting as an enhancer? 01:01:48.23 Or is it acting as a new promoter? A site of transcription initiation? 01:01:54.23 So we have several intron insertions and we can do the same experiment. 01:01:58.25 Here we have Os... another gene... which has an mPing element only in EG4 in an intron. 01:02:05.19 And when we do the same experiment, under room temperature, RT is room temperature, 01:02:09.23 normal conditions, there is no difference in the transcription of the allele with and without mPing. 01:02:15.08 However when we look at the situation in the cold, 01:02:18.20 we see that it is cold-inducible. 01:02:20.21 So here the transposon in a distant intron is affecting the inducibility of this gene. 01:02:27.06 And we see that in the next slide also. This is another gene, a very large gene, with an mPing element in an intron. 01:02:35.22 And these introns are in the EG4 allele. I am sorry, these mPing insertions are in the EG4 allele and in the A123. 01:02:45.12 And again, we see that those two are inducible, suggesting that mPing sequences 01:02:49.18 are in some way acting to enhance transcription under cold conditions. 01:02:54.21 We didn't do this experiment under... we didn't test dry and salt. 01:03:02.00 Okay, so let me give you conclusions from this part of the talk. 01:03:07.29 The first thing is that we found surprisingly, or maybe we were surprised, but that is why you do experiments, 01:03:16.10 to get surprised, and then you see the results and you say, "Oh that makes sense". 01:03:19.15 That massive amplification is largely benign. 01:03:23.09 And when I say up to a point, we've caught this element in the act of amplifying. 01:03:28.24 Obviously at some point if the number of elements transposing gets too great 01:03:33.25 it is going to start causing some damage, 01:03:36.04 and that is one of the things we are really, really interested in. 01:03:38.15 When does this activation stop? What happens? 01:03:42.15 And we don't know yet. 
01:03:43.18 That the amplification has a subtle impact on the expression of many genes. 01:03:49.21 It causes stress induction. It induces the expression of some genes, 01:03:55.10 but it really is tweaking them. Most of the expression we see is maybe a two-fold, three-fold increase. 01:04:00.16 And again, it produces stress inducible networks. And I say cold and salt. 01:04:06.20 Others, I'll give you a few tastes of where this experiment, where this is going. 01:04:12.16 And the other thing that is significant is that it generates dominant alleles. 01:04:17.04 So if you think about a population. Remember I said that when these elements insert they are heterozygous. 01:04:22.02 That... if it caused a phenotypic change, that overexpression will be a phenotypic change 01:04:28.25 that can be seen possibly in a heterozygous organism. 01:04:33.05 So we don't have to wait for this to become homozygous. 01:04:38.12 So I want to go back to McClintock's scenario. 01:04:41.19 Again, and that transposable elements... her scenario for how transposable elements can function as tools 01:04:49.16 to generate diversity. 01:04:51.03 Transposable elements usually don't move around, and we know that now. 01:04:53.27 We know that the vast majority of transposons in the genome are inactive, 01:04:57.12 even though genomes are 50-80%, 20-50-80% derived from transposable elements, 01:05:06.11 that most are inactive, that they are inactive because they have accumulated mutations. 01:05:15.00 Or the few that are active are being epigenetically restrained by the host. 01:05:20.07 That it is possible that somehow stress conditions may activate transposons. 01:05:27.10 Now I haven't shown you that. We started with the strain mPing, the EG4 strain, where... 01:05:33.13 the system was active already. We don't know how it became active. 01:05:38.15 That obviously is something that we are very interested in. 
01:05:41.09 And we think that it is possible because EG4 and mPing are present in most rice strains, 01:05:45.28 that in most rice strains these elements are epigenetically silenced. 01:05:49.19 But that somehow in these few strains, these landraces and EG4, that the element became activated. 01:05:57.15 We do not know how. That obviously is an area of future research, and that is a critical area 01:06:02.08 because that's how we think most genomes are sort of poised. Many genomes. 01:06:06.02 They have the ability for active transposable elements to start amplifying. 01:06:12.16 But how that switches... what is the switch and how is it thrown is the subject of future research. 01:06:19.15 Again the movement of transposable elements generates genetic diversity, increases the mutation frequency. 01:06:28.27 McClintock looked at mutagens. She looked at elements that... geneticists look at mutants. 01:06:35.09 These are, as I said at the beginning, these are not insertions that will benefit the organism. 01:06:42.07 However, we have been able to identify an element where most of the insertions are benign. 01:06:47.25 And, as we said, a rare TE-induced mutation may be adaptive. 01:06:55.01 So I want to sort of speculate a little bit. 01:06:58.07 And tell you about how we sort of fit mPing into this model 01:07:02.12 that somehow a stress could have induced Ping. Ping is the autonomous element. 01:07:06.18 Again, this is a black box. We do not know. We weren't there when this happened. 01:07:10.06 We came upon the strain, or our Japanese collaborators came upon the strain 01:07:13.18 when it was already active. 01:07:15.19 This leads to the massive and rapid amplification of mPing 01:07:20.17 that we're seeing. It is still in progress. 01:07:22.17 This generates tens of thousands of new alleles. 01:07:26.27 Now we looked at 24 plants, but imagine a field of a thousand plants. 01:07:33.14 mPing accumulating 25-40 new insertions per plant. 
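The "tens of thousands of new alleles" figure follows from simple multiplication of the two numbers just mentioned. A minimal sketch of that arithmetic, assuming every plant in the field gains insertions at the stated rate and that insertions in different plants land at different sites (the function name is hypothetical, introduced only for illustration):

```python
def new_insertions_in_field(plants, per_plant_low, per_plant_high):
    """Return (low, high) bounds on new mPing insertions across a field.

    Assumption (not from the talk): each plant contributes independently,
    so the field total is simply plants * insertions per plant.
    """
    return plants * per_plant_low, plants * per_plant_high


# A field of a thousand plants, 25-40 new insertions per plant.
low, high = new_insertions_in_field(1000, 25, 40)
print(f"{low:,} to {high:,} new insertions")
```

Under these assumptions the field carries on the order of 25,000 to 40,000 new insertions in a single generation.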
01:07:37.17 What is really interesting, a point I haven't made, is that rice is a selfer. 01:07:43.05 So it selfs. There is no new genetic information coming into populations. 01:07:49.21 The same genetic information is being scrambled up by recombination or whatever. 01:07:55.16 mPing is a way, transposons, are a way to dramatically diversify 01:08:00.28 the genetic material without introducing... without having gene flow into the population. 01:08:09.28 So this, what we are hypothesizing, we see it at the transcription level that this amplification 01:08:17.05 creates transcriptional changes and we are hypothesizing that these changes can lead to quantitative variation. 01:08:26.04 So changes in cold tolerance, changes in drought tolerance. Changes in desiccation. 01:08:32.04 But that is the point we are now testing. 01:08:35.04 And I am going to end by telling you about, very briefly, 01:08:40.00 about the experiments that we are currently doing to address the question. 01:08:44.20 Really the smoking gun question. 01:08:47.26 And that is what are the phenotypic consequences of the mPing burst on EG4? 01:08:56.14 Does this... we've talked about transcriptional changes, 01:09:00.05 but are there phenotypic changes that go along with those? 01:09:04.13 And the way we are doing that is a number of ways. 01:09:07.21 The first thing, and again we are taking advantage of the wonderful 01:09:10.27 new high-throughput sequencing technologies. 01:09:14.01 So one of the things that has allowed this progress... this project to move forward, 01:09:21.08 and I think for most of us in molecular biology, 01:09:24.23 is the technology that really drives the questions that we ask. 01:09:29.08 And that we can get deeper and deeper into a particular problem as the technology changes. 01:09:35.08 And the availability of high-throughput sequencing 01:09:36.29 has allowed us to address questions that we didn't even dream about, 01:09:42.24 you know as recently as 5 or 6 years ago. 
01:09:45.28 In this case what we can do is, as shown here, we can... Well, first of all, 01:09:52.18 we know that Nipponbare and EG4 differ phenotypically in several characteristics. 01:09:59.18 They have different flowering time. They have different average height. 01:10:04.05 They differ in some of the stress responses. 01:10:06.10 We want to know for example, if any of those differences are due to one or more mPing insertions. 01:10:12.27 In order to figure this out the first thing we have to do is to figure out the... 01:10:16.01 we have to know is there more going on in EG4 and the landraces than just mPing amplifying. 01:10:22.01 Because remember when we sequenced the insertion sites we did an approach 01:10:28.08 where we used PCR primers and only amplified the element and flanking sequences. 01:10:34.10 Well again, we were limited before by the technology. 01:10:37.03 Now we can actually sequence the entire genome, and in fact we've done that. 01:10:40.04 So EG4 is currently being re-sequenced using next generation technology 01:10:46.09 so that we can see is the mPing amplification the only thing, the only transposon that is amplifying in the genome. 01:10:52.19 And so far the preliminary answer to that is yes. 01:10:58.16 It appears that mPing is the only transposon that is amplifying at this time. 01:11:03.05 The other thing is we are doing transcriptomics. 01:11:05.07 Rather than looking at particular individual genes, we are looking in a strand-specific way 01:11:10.18 at the entire genome of mPing, of, I'm sorry, of EG4 and Nipponbare. 01:11:15.11 And this is done in collaboration with Tom Brutnell's group at Cornell 01:11:19.19 where they have developed a really nice protocol to look at single strand, do single strand RNAseq. 01:11:25.14 So now how do you... the way that is traditionally used 01:11:32.10 to find the regions of the genome that are responsible for quantitative traits 01:11:38.10 is a mapping population or a recombinant inbred population. 
01:11:41.20 And our collaborators at Kyoto University in Tanisaka Okumoto's lab have over the last decade 01:11:50.12 been developing this incredibly valuable resource. 01:11:53.13 So what they did over ten years ago was to cross EG4 with Nipponbare. 01:11:59.15 Now what I want to point out is these are inbred lines. So all of their... 01:12:02.15 you know... they have two copies of exactly the same gene at every single locus. 01:12:07.29 So we have EG4 crossed with Nipponbare. We have our F1 progeny. 01:12:14.26 Many, many F1's. Those F1's are then selfed. 01:12:19.28 Self-crossed for ten generations. 01:12:22.16 So we now have growing in this country 275 recombinant inbred lines. 01:12:30.06 These lines have mosaic chromosomes that are derived from EG4 or Nipponbare. 01:12:36.25 And they are displaying different traits. So we are phenotyping them now, and I'll talk about that in a second. 01:12:43.19 So the question really and I think on the next slide I go into more here. 01:12:48.21 So we are looking at these recombinant inbred lines. We are assaying... RILs for recombinant inbred lines. 01:12:53.01 for morphological traits and stress responses. 01:12:55.25 And we are doing something that again I would never have thought would be possible 01:13:01.16 even in the grant application I wrote a couple of years ago. 01:13:03.22 I didn't even write that we would do this because it wasn't affordable, 01:13:07.15 but now again technology, the costs have come down. 01:13:09.18 We are actually re-sequencing all 275 RILs 01:13:13.06 to find out exactly the mosaic structure of their chromosomes. What part came from Nipponbare, 01:13:18.04 with its mPings? What parts came from EG4 with its mPings? 01:13:21.21 So that we can ultimately correlate the mPing insertions with the various phenotypes. 01:13:28.07 Now again, this is just correlative at this point. 
We then will have to prove that the candidates 01:13:32.25 that we find, the mPing insertions and alleles are the ones responsible for that phenotypic difference. 01:13:39.03 So many of us and many of us in the field think of Barbara McClintock 01:13:43.14 as the first genomicist. The first person who thought of the genome as an entity 01:13:49.20 not just of single genes. And there is a quote from her Nobel lecture which I want to end with. 01:13:55.24 And it is her thinking about the genome, and it really presents the challenge 01:14:00.04 that I have at least felt and have gone with with my lab. 01:14:05.07 And that is, "In the future, attention undoubtedly will be centered on the genome, 01:14:08.02 with greater appreciation of its significance as a highly sensitive organ 01:14:12.04 of the cell that monitors genome activities and corrects common errors, 01:14:16.13 senses unusual and unexpected events, and responds to them, 01:14:21.07 often by restructuring the genome." 01:14:23.11 She ends by saying, "We know about the components of genomes that could be 01:14:27.23 made available for such restructuring." 01:14:29.22 In part the transposable elements that she discovered. 01:14:32.10 "We know nothing, however, about how the cell senses danger 01:14:35.12 and instigates responses to it that are truly remarkable." 01:14:38.19 I'd like to say that we are beginning to understand that black box 01:14:42.28 of the connection between the outside world and the genomic changes. 01:14:46.26 And I think transposable elements are certainly part of that.