Bacteriophages: Genes and Genomes
Transcript of Part 2: Bacteriophages: Genomic insights.
00:00:01.02 Hi. My name is Graham Hatfull. I am a professor at the University of Pittsburgh 00:00:04.15 and a Howard Hughes Medical Institute professor. 00:00:07.03 In part two we are going to talk about some of the insights that we can gain from comparing 00:00:14.02 the genomes of bacteriophages and perhaps learn something about how they are constructed and how they have evolved. 00:00:22.03 In part one we saw some morphologies of bacteriophages, what they look like in the electron microscope, 00:00:32.29 and I showed you some different types of structures 00:00:37.23 that could arguably reflect different ways in which those bacteriophages have evolved. 00:00:44.27 But we have to be very careful about interpreting differences in virion morphology, what the viruses look like, 00:00:53.09 and their evolutionary relationships and how the genomes compare to each other. 00:00:58.09 I've illustrated that in this particular slide 00:01:01.00 where I have shown five examples of bacteriophages. 00:01:05.06 These would all be classified according to their long flexible tails 00:01:10.14 as being members of the Siphoviridae, the Sipho viruses, 00:01:15.03 each with their heads and their tails attached. 00:01:18.18 It might be tempting to look at these and say well they all look very similar to each other, 00:01:25.06 almost indistinguishable, perhaps they all are genetically similar. 00:01:30.11 In fact this is an example where these five share essentially little or no sequence similarity at the genomic level whatsoever. 00:01:40.01 So if we want to understand how genomes have evolved 00:01:44.00 and how they are related to each other from a phylogenetic perspective, 00:01:48.23 we need to go in, isolate the DNA, and sequence those genomes and then compare them. 00:01:54.24 There are various ways in which we can compare the genomic sequences. 
00:02:00.22 We can compare them by looking at the similarities of the nucleotide sequences, 00:02:07.26 essentially sequencing the DNA or if it's RNA, the RNA, 00:02:13.05 and then comparing them one to another and seeing what is shared. 00:02:16.08 A second way of doing that would be to look at the genes 00:02:21.09 and comparing them through their predicted amino acid sequence similarities of the proteins that are encoded by those genes. 00:02:28.22 Right here I am showing you an example of what it looks like if we take two bacteriophages 00:02:34.09 and compare their nucleotide sequences. 00:02:37.20 And this particular representation is referred to as a dot plot. 00:02:42.05 And what we have done is to take two bacteriophage genomes, in this case Fruitloop and Boomer, 00:02:49.10 and we have aligned the two sequences, and we are going to slide one next to the other computationally, 00:02:58.03 and ask if there are segments that are similar to each other within a particular window of comparison. 00:03:06.07 And every time we see sequence similarity, a dot is presented on this dot plot. 00:03:12.11 And what you can see here is that 00:03:15.12 there's a rather complex series of relationships reflecting a quasi diagonal line 00:03:25.08 from the top left to the bottom right of this representation. 00:03:29.21 So where you can see a relatively solid line that means that there is a segment of DNA 00:03:35.24 which is substantially similar between the two. 00:03:38.23 Where you fail to see a line, such as in the top left hand corner, is a region where the two genomes 00:03:47.14 appear to be substantially dissimilar. 00:03:50.10 They don't have shared nucleotide sequences. 00:03:52.15 And then there's all sorts of complicated interruptions and shifts in the diagonal line as you look between these. 00:04:02.13 And this tells us an important aspect, a component, of what we see when we compare these types of genomes. 
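The sliding-window comparison behind a dot plot can be sketched in a few lines of Python. This is only an illustration: the function name `dot_plot`, the `window` and `min_matches` parameters, and the toy sequences are assumptions made for this sketch, not the software or settings used for the Fruitloop and Boomer comparison.

```python
def dot_plot(seq_a, seq_b, window=10, min_matches=8):
    """Return (i, j) pairs where a window of seq_a starting at i and a
    window of seq_b starting at j agree at min_matches or more positions."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical bases between the two windows.
            matches = sum(
                1
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
                if a == b
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


# Toy sequences (hypothetical): a shared segment flanked by unrelated DNA.
shared = "GATTACAGGCTA"
genome_a = "CCCC" + shared
genome_b = shared + "TTTT"

for i, j in dot_plot(genome_a, genome_b, window=6, min_matches=6):
    print(i, j)
```

Each printed pair is one dot. A run of dots along a diagonal corresponds to a shared segment, and an offset between runs corresponds to a shift in the diagonal. The nested loops are quadratic in genome length, so this form is suitable only for short sequences.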
00:04:11.09 And that is, they are not simply completely similar from end to end or completely dissimilar from end to end. 00:04:18.20 But quite commonly we see these interrupted portions 00:04:22.22 where different segments of the genomes are related to each other in different ways 00:04:27.20 as though different parts of the genome have different evolutionary histories, 00:04:33.13 different ways of arriving in the genomes as we see them in Fruitloop and Boomer today. 00:04:40.14 So from this type of analysis and looking at a number of bacteriophage genomes, 00:04:47.08 we can see the following general conclusions. 00:04:51.15 First of all, the DNA that is isolated from these particular virions, these double stranded DNA virus types, 00:05:01.12 that the genomes are linear. So they have a left end and a right end. 00:05:08.23 They tend to form predominantly two types of groups that we can see when we look at the linear genomes. 00:05:17.06 There are those that have defined ends. 00:05:20.03 That means that if you isolate the molecules from a million particles 00:05:25.10 of a particular phage type, each of the million DNA molecules that you get out 00:05:30.08 have the same left and right ends. In other cases that is not true. 00:05:35.26 The DNAs have the same overall genetic constitution, 00:05:41.14 but the specific physical ends of the left and the right can be positioned in different places. 00:05:47.27 And therefore they are referred to as being circularly permuted. 00:05:52.18 They are not circular. They are linear, 00:05:54.21 but they represent different positions of the ends relative to the genetic information. 00:06:01.28 Often these viruses also contain terminal redundancies, 00:06:06.22 which means that one segment of the genome is duplicated at both ends. 
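The difference between defined ends and circularly permuted, terminally redundant ends can be made concrete with a toy string model. This is a minimal sketch under stated assumptions: the function `packaged_molecule`, the eight-letter "genome", and the idea of reading each molecule from a chosen start point around the genetic map are illustrative choices for this sketch, and the transcript does not describe how such ends arise in the cell.

```python
def packaged_molecule(unit_genome, start, redundancy):
    """Read one linear molecule from a genetic map treated as a circle.

    start      -- position on the map where this molecule's left end falls
    redundancy -- number of bases repeated at both ends of the molecule
    """
    n = len(unit_genome)
    length = n + redundancy
    # Wrap around the map so every molecule carries the full gene set.
    return "".join(unit_genome[(start + k) % n] for k in range(length))


genetic_map = "ABCDEFGH"  # hypothetical genome, one letter per gene

# Three molecules from the same phage stock: same genes, different ends.
for start in (0, 3, 6):
    molecule = packaged_molecule(genetic_map, start, redundancy=2)
    print(molecule, "left end:", molecule[:2], "right end:", molecule[-2:])
```

Every molecule is linear and contains all eight genes, but the physical ends fall at different places on the map, and the first and last two letters of each molecule are the same, which is the terminal redundancy. A phage with defined ends would correspond to every molecule having the same `start` and no wrap-around.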
00:06:11.19 And so these two major types of genomes that you see either have defined ends 00:06:17.27 or terminally redundant and circularly permuted ends, 00:06:20.24 and there are other viruses that have different variations on these themes. 00:06:25.22 The sizes of bacteriophage genomes vary enormously. 00:06:31.05 There are those that are as small as perhaps 5000 bases, and there are those that are as large as 500 kilobases, 00:06:39.02 which is quite amazing when you think that 500 kilobases 00:06:44.22 is about the same size as the smaller of the free living bacterial genomes. 00:06:50.14 And so there are examples of viruses that are the same size genomically 00:06:55.00 and have as many genes as, or more genes than, small bacterial genomes. 00:07:00.15 The phage genomes tend to be densely packed with genes. 00:07:05.19 And so most of the DNA is encoding genes. 00:07:10.16 And as I mentioned before in this section, the phages infecting bacteria from different genera 00:07:17.05 tend to be unrelated at the DNA level. 00:07:21.13 So this slide shows an example of what we see when we take a DNA sequence of a particular phage, 00:07:34.00 in this case it is a phage called Giles, 00:07:36.26 and we use computational approaches and a bioinformatic strategy to identify 00:07:44.07 the protein coding genes that are present within the virus. 00:07:49.02 And so the genome is largely filled with protein coding genes, 00:07:54.22 and they are shown here by these boxes, either colored or in white. 00:08:00.21 The genome is represented by what looks like this railroad track here 00:08:05.24 which has markers every kilobase and every 100 bases. 00:08:10.21 And the genome for Giles is linear with defined ends, 00:08:14.27 and so in this representation it begins in the top left hand corner and goes to the bottom right hand corner, 00:08:22.14 and each of the genes is shown in these boxes represented either above or below the DNA.
00:08:30.00 Genes that are shown above the DNA are transcribed in the rightwards direction, 00:08:36.03 coming this way, and those that are shown below, such as a couple or three genes in the top left hand corner, 00:08:44.23 are transcribed in the leftwards direction. 00:08:47.19 So those are the standards that we use for presenting the genes 00:08:52.25 and illustrating the direction that they are transcribed 00:08:56.17 relative to the overall genome structure. 00:08:59.19 You can see here from these genes that they are densely packed into this particular genome. 00:09:06.10 There are few non-coding spaces between the genes. 00:09:10.19 They essentially represent 95% or more of the genetic information that's available. 00:09:19.12 In this particular representation we have colored the genes in such a way 00:09:26.12 as to reflect the relationships that some of these genes share with genes that you find in other bacteriophages. 00:09:34.27 The genes that are shown in white, and you can see some across the top here, 00:09:40.02 are simply genes for which we don't have any other close relatives in any of the databases. 00:09:46.03 And this illustrates the point that phages such as this can be replete with genes 00:09:52.11 that are not closely related to known genes 00:09:55.24 and for which we have rather little idea as to what they do. 00:09:59.04 I mentioned that when we compare the nucleotide sequences of phages we can see that it looks as though the parts 00:10:12.01 have evolved differently from each other. 00:10:15.00 And this leads to the idea that phage genomes are characteristically mosaic. 00:10:21.09 They are constructed architecturally from segments which have been put together in a particular way. 00:10:29.01 Modules if you like. And each of these modules is in effect mobile 00:10:35.08 or can move around the population of bacteriophages 00:10:38.28 such that you can find it in more than one or perhaps several different genomic contexts.
00:10:45.27 And this slide illustrates how this might look when you see mosaicism 00:10:52.15 at the level of nucleotide sequence comparisons. 00:10:56.00 So this is showing a small segment of three phage genomes. 00:11:01.19 The one at the top, PG1. Rosebush in the middle, and Qyrzula towards the bottom. 00:11:08.17 You can see the genome represented by the markers in the railroad tracks for each of these. 00:11:14.19 The genes that are encoded are shown by the color boxes with their gene names inside the boxes, 00:11:19.11 and where these genomes contain and share nucleotide sequence similarity 00:11:25.04 there is a color coded area shading between the two such as you can see here. 00:11:32.01 Now Rosebush and Qyrzula have very evident and strong nucleotide sequence similarity 00:11:39.22 both in the left part here and over here in the right part as well. 00:11:44.09 PG1 and Rosebush and Qyrzula have no sequence similarity that is evident by this comparison, 00:11:53.22 in this example, because there is no color shading over on this left part. 00:11:58.29 Nonetheless, in this middle segment things are different. 00:12:04.23 There appears to be very little sequence similarity between Rosebush and Qyrzula 00:12:10.16 because there is no shading in that area, 00:12:15.22 however, when we compare PG1 and Rosebush we can see that in this central segment right here 00:12:22.22 that there is indeed a purple color shading that reflects strong sequence similarity 00:12:29.17 between these two genomes, PG1 and Rosebush, in this center portion. 00:12:35.04 So this is really important because it illustrates an example where the different segments of these genomes 00:12:42.09 particularly Rosebush appear to have had different evolutionary histories. 00:12:47.09 They've come from different places. 
00:12:49.14 This segment that's in the middle of Rosebush clearly did not come from the same place as Qyrzula 00:12:55.02 It appears to have come from a common ancestor which had more in common in this region with PG1. 00:13:02.15 So this is a good example of mosaicism, a key architectural feature of bacteriophage genomes. 00:13:08.26 When you look at the nucleotide sequence level you can see precisely where these types of events occur- 00:13:18.27 at the boundaries that must reflect where recombination occurred to give you this exchange of information. 00:13:26.08 And in this particular slide I am showing the detailed information of two genomes. 00:13:33.28 The one at the top here you can see the sequences, 00:13:36.27 and in blue the amino acid sequences of the predicted genes in that region. 00:13:41.18 In the bottom you can see a second genome that we are comparing. 00:13:46.00 And this red shading over on the right hand side is 00:13:50.28 simply reflecting a segment where these genomes are closely related. 00:13:54.22 The nucleotide sequences, the DNAs are extremely similar if not identical in this red part, 00:14:03.09 but over here, they are completely different. They are completely dissimilar. 00:14:08.07 And so the key point that you can see from this type of comparison that this module boundary, this junction 00:14:15.20 between the red and the white parts where recombination must have happened, 00:14:21.13 this module boundary, corresponds precisely to the boundaries of the genes. 00:14:27.16 It is this boundary which is where this gene starts up here, and its comparable gene begins down here. 00:14:38.04 These genes to the left are very different, and to the right they are identical. 00:14:42.23 So the module boundary, or the recombinant joint which must have brought these together 00:14:48.15 coincides with the gene boundaries themselves. 
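The observation that a module boundary lines up with a gene boundary can be checked mechanically once two sequences are aligned. The following is a minimal sketch, assuming the two regions are already aligned base for base with no gaps and that the similar segment runs to the right-hand end, as in the slide described above. The function names, the `tolerance` parameter, and the toy sequences are hypothetical.

```python
def right_identical_run_start(top, bottom):
    """Index at which the run of identical bases that reaches the
    right-hand end of the alignment begins."""
    if len(top) != len(bottom):
        raise ValueError("sequences must be pre-aligned to equal length")
    pos = len(top)
    # Walk leftwards while the two genomes agree.
    while pos > 0 and top[pos - 1] == bottom[pos - 1]:
        pos -= 1
    return pos


def coincides_with_gene_start(boundary, gene_start, tolerance=3):
    """True if the junction lies within `tolerance` bases of a gene start.
    A small tolerance allows for bases that match by chance just to the
    left of the true junction."""
    return abs(boundary - gene_start) <= tolerance


# Toy aligned regions: unrelated on the left, identical on the right.
top = "AAAAAA" + "ATGCCGTTA"
bottom = "CCCCCC" + "ATGCCGTTA"
gene_start = 6  # hypothetical annotated start of the shared gene

boundary = right_identical_run_start(top, bottom)
print(boundary, coincides_with_gene_start(boundary, gene_start))
```

With real data the similar segment is rarely perfectly identical, so a windowed identity score would be a more robust way to place the junction than an exact-match run.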
00:14:52.25 And this is a common and important observation and it helps us to think about how mosaicism can be generated. 00:15:01.14 And there are two fundamental models. 00:15:03.05 The first is that recombination happens at targeted, short, conserved boundary sequences. 00:15:13.06 The idea that there are some short conserved segments of sequences, 00:15:17.00 perhaps a dozen or a couple of dozen nucleotides in length 00:15:19.23 which corresponds to those boundary regions. 00:15:23.05 And that homologous recombination perhaps encoded by host enzymes 00:15:26.26 catalyzes exchange at that region in order to promote recombination 00:15:32.15 at places where genes themselves in their entirety get exchanged. 00:15:36.22 There are some examples of that that have been reported in the literature. 00:15:41.29 So this is certainly an event that can happen. 00:15:45.05 We think however that it is more likely that much of the mosaicism that you see 00:15:51.02 because it is this pervasive feature throughout phage genomes 00:15:55.07 can occur by an alternative mechanism which is by illegitimate recombination 00:15:59.19 at what are essentially randomly chosen sequences. 00:16:04.04 In other words, that even though we see a close correspondence 00:16:08.08 between the point of recombination and the gene boundaries, 00:16:12.03 this does not result by this model from targeted exchange at that point. 00:16:19.03 Rather that the exchange positions are random 00:16:22.28 and the reason why that correspondence occurs is because of 00:16:26.12 selection for gene function for those genes that can actually work. 00:16:32.03 And so this just illustrates the different types of examples of recombination. 00:16:39.08 In the top panel one could imagine the targeted recombination, targeted homologous recombination, 00:16:45.03 could occur at these short black segments. 
Short segments of DNA that are conserved at gene boundaries 00:16:53.28 in order to give you these exchange events in these recombinants. 00:16:57.08 This middle panel here shows an example of illegitimate recombination 00:17:02.06 where recombination has essentially happened anywhere. 00:17:05.20 It has happened between sequences that are not related to each other. 00:17:09.02 And you get whatever gobbledygook may arise from just a random exchange in the process. 00:17:15.15 And at the bottom here I want to emphasize that we do expect recombination to occur between shared sequences 00:17:25.04 such as whole genes that are shared. 00:17:27.01 Homologous recombination of this sort always happens, 00:17:31.19 and it gives you new combinations of flanking genes, 00:17:34.28 such as A now joined together with C. 00:17:38.03 Ok. So homologous recombination is always going to play a role in reassorting 00:17:43.07 the types of genes that can be present in the modules. 00:17:47.03 But homologous recombination of this general type does not generate new recombinant boundaries, 00:17:55.18 new module boundaries, unless it is in this targeted approach. 00:18:00.27 So as I mentioned we think that whereas there are a small number of examples 00:18:06.26 that would support the exchange of boundary sequences, 00:18:11.02 by far the majority of the boundaries that we see when we compare phage genomes 00:18:16.13 show no evidence of such boundary sequences, lending support to the idea that illegitimate recombination 00:18:23.27 is playing a key role. But there are some really important consequences 00:18:28.26 that we have to think about as a model for illegitimate recombination in this process. 00:18:33.16 First of all, illegitimate recombination, recombination between sequences 00:18:38.15 that don't share anything or very little in common, 00:18:41.11 is likely to happen at rather low frequencies.
00:18:45.02 It is going to occur at random positions, for the most part, 00:18:49.26 and that when you put together two pieces of DNA randomly, 00:18:55.00 for the most part it is just going to generate genomic garbage. 00:18:59.17 Material which may not have a genome of the appropriate length, 00:19:04.24 and will have lost some genes and is liable to be non-functional. 00:19:11.03 So in its essence we can think of it as a rather disruptive or destructive type of process. 00:19:19.21 And one can imagine that if this was going to play an important role, 00:19:24.24 that you would probably need multiple low frequency events in order to actually generate survivors, 00:19:34.07 the phoenix that can rise from the ashes with a full complement 00:19:37.04 of functional sequences that can function as a virus. 00:19:41.27 If sequences are going to recombine randomly with each other 00:19:49.07 then there is no necessity to think of these events as being predominantly involving two phage genomes. 00:19:58.12 The bacterial chromosome is about a hundred times the size of an average bacteriophage genome, 00:20:03.22 and therefore there is going to be a strong propensity or at least an opportunity 00:20:08.05 for the phage genome to recombine with the bacterial chromosome. 00:20:13.21 The process we can think of as being one that is infrequent and yet extremely creative. 00:20:22.08 This is the way in which you can take pieces of DNA 00:20:26.08 and put them together in a way in which has perhaps never been seen before in nature. 00:20:32.10 That's a way of making new genes, or perhaps putting domains together in novel combinations, 00:20:40.06 and generating new types of functions which perhaps have not been seen in nature before. 
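The argument that random joins mostly produce non-functional genomes, with selection keeping the rare survivors, can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions that are not from the lecture: genes are laid end to end with no spaces, a recombinant keeps the left part of parent A and the right part of parent B, and it counts as "viable" only if neither breakpoint cuts a gene and its length falls within an assumed packaging range. All names and numbers are hypothetical.

```python
import random
from itertools import accumulate


def gene_boundaries(gene_lengths):
    """Coordinates between genes, including both genome ends."""
    return {0, *accumulate(gene_lengths)}


def is_viable(x, y, genes_a, genes_b, min_len, max_len):
    """Toy viability test for the recombinant A[:x] + B[y:]."""
    intact = x in gene_boundaries(genes_a) and y in gene_boundaries(genes_b)
    length = x + (sum(genes_b) - y)
    return intact and min_len <= length <= max_len


def viable_fraction(genes_a, genes_b, min_len, max_len, trials=20_000, seed=0):
    """Fraction of randomly placed joins that pass the toy viability test."""
    rng = random.Random(seed)
    len_a, len_b = sum(genes_a), sum(genes_b)
    hits = 0
    for _ in range(trials):
        x = rng.randint(0, len_a)  # random breakpoint in parent A
        y = rng.randint(0, len_b)  # random breakpoint in parent B
        hits += is_viable(x, y, genes_a, genes_b, min_len, max_len)
    return hits / trials


# Two hypothetical parents, each with ten 900-base genes.
parent_a = [900] * 10
parent_b = [900] * 10
print(viable_fraction(parent_a, parent_b, min_len=8000, max_len=10000))
```

In this model the join positions are chosen without any regard to gene boundaries, yet every survivor has its joint at a gene boundary, simply because the viability filter removes everything else. Real viability depends on gene function rather than on intactness alone, so the sketch only illustrates the logic of selection acting on random exchange.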
00:20:47.08 And so this fits in very much with our model as described by Darwin for the process of the origin of species, 00:20:58.09 where we can think of the variation being generated by these illegitimate recombination events 00:21:05.26 and then natural selection working on what is essentially this garbage 00:21:11.14 in order to select from that those components that work. 00:21:16.15 Even though we would think of this as being a very low frequency event, 00:21:21.25 requiring infrequent recombination events and multiple numbers of them, it is nonetheless creative. 00:21:30.27 And as we saw previously, phages are likely to have been evolving for many, many years 00:21:40.19 in a very dynamic population very successfully. 00:21:44.26 So this will give us these recombinant joints. 00:21:49.09 These recombinant joints once they are formed are likely to be stably maintained. 00:21:53.07 There's no mechanism necessarily for undoing them and therefore 00:21:57.14 these survive as we see today as the fossilized relics of recombination events 00:22:02.29 that probably happened many hundreds of millions or even billions of years ago. 00:22:08.27 And thinking about the mechanisms by which this might happen, 00:22:12.00 it's been shown that many bacteriophages encode recombinase enzymes 00:22:19.25 which have the capability to recombine genomes at least at very short sequences 00:22:26.23 that don't have to be completely identical to each other, 00:22:32.00 raising the interesting possibility that bacteriophages actually encode their own machinery 00:22:36.20 that can facilitate this type of recombination, 00:22:40.27 and indeed the generation of the mosaic genomes as we see them.
00:22:44.10 Comparing bacteriophages that are very different in their sequences 00:22:53.03 is quite limited in what it can tell us, and this shows us that if we really want to learn more 00:23:00.14 about the details of how mosaicism is created and how it works, 00:23:04.29 we really have to think, and very carefully, about what types of genomes we want to compare with each other. 00:23:12.04 And we will see an example of that in part three. 00:23:15.09 So we can conclude then from this genomic comparison of phages 00:23:24.14 that phage genomes are architecturally mosaic. 00:23:28.15 That mosaicism is fueled by this process of illegitimate recombination. 00:23:32.27 And that genome segments can eventually be reassorted by homologous recombination 00:23:39.13 once new joints between new genes are generated to form that mosaicism. 00:23:45.09 In part three, we'll look at a rather particular case 00:23:49.14 of the detailed analysis of bacteriophages that infect one particular common host 00:23:54.27 where all those bacteriophages can be argued 00:23:58.00 to be potentially in genetic communication with each other, 00:24:01.21 and we can therefore explore what they look like 00:24:04.06 and the insights that they can give us into bacteriophage evolution.