Molecular Biology of Gene Regulation: Transcription Factors
Transcript of Part 1: Gene Regulation: An Introduction
00:00:00.22 My name is Bob Tjian, 00:00:01.18 I'm a professor of MCB at the University of California, Berkeley, 00:00:05.29 and I'm also serving as the President of the Howard Hughes Medical Institute. 00:00:09.29 I'm going to spend the next 25 or 30 minutes telling you about some fundamentals 00:00:15.09 of one of the most important molecular processes in living cells, 00:00:21.02 which is the expression of genes through a process called transcription. 00:00:26.19 Now, first, to understand what gene expression means, 00:00:32.08 you have to have a sense of what we tend to refer to in the field as the "central dogma" of molecular biology. 00:00:39.25 Another way to think about this is the flow of biological information from DNA 00:00:46.28 (in other words, our chromosomes, of which every cell has a full complement), 00:00:50.21 which is transcribed into a sister molecule called RNA. 00:00:56.25 So this process of converting DNA into RNA is called "transcription," 00:01:01.13 and that is the topic of this lecture. 00:01:05.12 This process is very complicated, as you will see by the end of my two lectures. 00:01:11.20 And it is very important for many, many fundamental processes in biology. 00:01:18.08 What I'm going to spend today's lecture on is the discovery of a large family of transcription proteins. 00:01:27.03 These are "factors," as we call them, key molecules 00:01:31.11 that regulate the use of genetic information that has been encoded in the genome. 00:01:38.22 Now, transcription factors, or proteins, are involved in many fundamental aspects of biology, 00:01:46.04 including embryonic development, cellular differentiation, and cell fate. 00:01:51.20 In other words, pretty much what your cells are doing, how a tissue works, 00:01:56.13 and how an organism survives and reproduces depend on the process of gene expression, 00:02:03.17 and the first step in this process is transcription.
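The DNA-to-RNA step described above can be illustrated with a toy string model. This is a minimal sketch under simplifying assumptions: DNA and RNA are plain strings of bases, the function names `transcribe` and `coding_strand_to_rna` are hypothetical, and everything the real RNA polymerase II machinery does (promoter recognition, initiation, RNA processing) is ignored. Only the base-pairing rule (A pairs with U, T with A, G with C, C with G) and the antiparallel direction of synthesis are modeled.

```python
# Toy model of transcription: DNA and RNA are plain strings of bases.
# Base pairing used when copying a DNA template base into RNA.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template strand given 5'->3'.

    The polymerase reads the template 3'->5' and builds RNA 5'->3',
    so we walk the template backwards and pair each base.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5_to_3.upper()))


def coding_strand_to_rna(coding_5_to_3: str) -> str:
    """The RNA has the same sequence as the coding (non-template) strand, with U for T."""
    return coding_5_to_3.upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCCTAA"      # hypothetical coding-strand fragment, 5'->3'
    template = "TTAGGCCAT"    # its complementary template strand, 5'->3'
    print(transcribe(template))           # AUGGCCUAA
    print(coding_strand_to_rna(coding))   # AUGGCCUAA
```

Both routes give the same RNA, which is why the RNA is often described as a copy of the coding strand even though the polymerase actually reads the template strand.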
00:02:08.07 Now, there are many other reasons why a large group of scientists are interested in transcription, 00:02:16.13 and one of them is that understanding the fundamental molecular mechanisms that control transcription, 00:02:23.27 in humans or in any other organism, can inform us 00:02:29.13 and teach us about what happens when something goes wrong, for example, in diseases. 00:02:34.28 And I list here just a few diseases that we could study as a result of understanding the structure and 00:02:42.19 function of these transcription factor proteins that I'm going to be telling you about. 00:02:48.08 And of course the hope is that in understanding the molecular underpinnings of complex diseases, 00:02:53.26 like cancer, diabetes, Parkinson's, and so forth, 00:02:58.00 we will be able to develop and use better, more specific therapeutic drugs 00:03:05.01 and also to develop more accurate and rapid diagnostic tools. 00:03:10.04 So those are a couple of the reasons why many of us have spent, 00:03:14.01 in my case, over 30 years studying this process of transcriptional regulation. 00:03:20.15 Now, to get the whole thing started, I have to give you a sense of the magnitude of the problem. 00:03:27.05 So imagine that one would really like to understand how this process of decoding the genome happens in humans. 00:03:34.24 As you may know, the human genome has some 3 billion base pairs 00:03:39.04 (or letters of genetic information), and that encodes roughly 22,000 genes. 00:03:45.01 These are stretches of DNA sequence that ultimately encode 00:03:50.08 a product, a protein, which is what actually makes the cells function.
00:03:55.21 So as I already explained to you, there's this flow of biological information 00:04:00.19 where you have to extract the information buried in DNA and convert it into RNA. 00:04:05.17 What I'm not going to tell you about today is the process of going from RNA to protein, 00:04:10.06 which is called translation. 00:04:13.27 I'm going to instead focus on just the first step, converting DNA into RNA, 00:04:18.22 which is the process of transcription. 00:04:22.02 Now, one of the most amazing results that we got over the last decade or so was that, 00:04:29.11 when the human genome was entirely sequenced, 00:04:34.13 we realized that the number of genes in humans is actually not vastly different from that of many other organisms, 00:04:42.15 even simple organisms like little worms or fruit flies. 00:04:47.14 That is, these organisms all have gene counts of the same order, roughly 14,000 to 25,000 genes. 00:04:55.19 And yet, anybody looking at us versus a little roundworm in the soil or a fruit fly 00:05:02.13 can tell that we're a much more complex organism with a much bigger brain, 00:05:06.24 much more complex behavior, and so forth. 00:05:09.27 So how does this happen? 00:05:11.21 Part of the answer to this very interesting mystery or paradox 00:05:17.10 lies in the way genes are organized and how they're regulated. 00:05:21.22 And one of the most striking results of the genome sequencing project was the realization that 00:05:27.01 the vast majority of the DNA in our chromosomes does not actually code for specific gene products, 00:05:35.02 and that only roughly 3% of the DNA is actually coding. 00:05:40.11 Those little arrows that I show you on this purple DNA are the gene-coding regions, 00:05:46.13 so you'll notice that there's a lot of "non-arrow" sequence, 00:05:50.04 which I'll show you in this next slide as green. 00:05:52.20 These are "non-coding" regions, so the vast majority (97% or greater) is non-coding.
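The proportions quoted above can be checked with simple arithmetic. This is a minimal sketch that reuses only the lecture's round numbers (about 3 billion base pairs, roughly 3% coding); the constant and function names are illustrative, and the result is an order-of-magnitude estimate rather than a measured value.

```python
# Back-of-the-envelope arithmetic using the round numbers quoted in the lecture.
GENOME_BP = 3_000_000_000   # "some 3 billion base pairs"
CODING_PERCENT = 3          # "only roughly 3% of the DNA is actually coding"


def coding_and_noncoding_bp(genome_bp: int, coding_percent: int) -> tuple[int, int]:
    """Split a genome size into approximate coding and non-coding base pairs."""
    coding = genome_bp * coding_percent // 100   # integer math avoids float rounding
    return coding, genome_bp - coding


if __name__ == "__main__":
    coding, noncoding = coding_and_noncoding_bp(GENOME_BP, CODING_PERCENT)
    print(f"coding:     ~{coding:,} bp")      # ~90,000,000 bp
    print(f"non-coding: ~{noncoding:,} bp")   # ~2,910,000,000 bp
    print(f"non-coding is ~{noncoding // coding}x the coding DNA")  # ~32x
```

On these figures, roughly 90 million coding base pairs sit within nearly 3 billion non-coding ones, which is the territory where the regulatory sequences discussed next are found.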
00:05:58.26 So what are these other sequences doing? 00:06:02.18 Well, it turns out that these sequences carry very important 00:06:07.15 little fragments of DNA which we call "regulatory sequences." 00:06:12.09 And these are the sequences that actually control whether a gene gets turned on or not. 00:06:19.02 I'll be spending much of the next 20 minutes telling you about how this process works 00:06:24.14 and how these little bits of DNA sequence actually function to control gene expression. 00:06:34.10 Now, the other thing I have to bring you up to date on 00:06:37.10 is this mysterious process we're calling "transcription," 00:06:41.01 which reads double-stranded DNA and then makes a related molecule, 00:06:45.13 a single-stranded RNA molecule, which is an informational molecule. 00:06:49.25 That reaction is catalyzed by a very complex, multi-subunit enzyme called RNA polymerase II. 00:06:59.02 Now, there's the Roman numeral II at the end of this because there are actually at least three such enzymes in mammals 00:07:07.13 that carry out different processes and produce different types of RNA. 00:07:11.23 But I'm only going to tell you about the one that makes the classical messenger RNA, 00:07:16.17 which is ultimately translated into proteins. 00:07:20.03 Now, one of the things we learned early on in the study of mammalian 00:07:25.28 (or other multicellular organism) transcription processes is that, 00:07:30.17 despite the fact that this enzyme is quite complex in its structure, 00:07:35.28 it nevertheless needs a lot of help to do its job. 00:07:42.16 On its own, RNA polymerase II cannot tell the difference between the non-coding regions of the genome 00:07:49.29 and the places where it's supposed to be reading to make the appropriate messenger RNAs.
00:07:57.00 So this sort of leads you to think that there must be a number of other factors 00:08:02.25 that somehow direct RNA polymerase to the right place at the right time in the genome of every cell in your body, 00:08:11.08 so that the right products get made and each cell in your body functions properly. 00:08:18.15 And this is where things get really interesting. 00:08:23.01 Some 25 or 30 years ago, a number of laboratories took on the job of hunting for these elusive and, 00:08:31.17 as it turned out, specialized protein factors that recognize the little stretches of DNA sequence I've been telling you about, 00:08:40.14 which are embedded in the vast non-coding part of the genome, 00:08:45.15 and of working out how these proteins recognize and, ultimately, 00:08:49.13 physically interact with these little bits of genetic information to turn genes on or off. 00:08:57.21 Now, in this lecture, 00:09:00.02 I can't go into all the details of the types or ranges of experiments 00:09:05.24 that many, many laboratories have done over the last two decades 00:09:09.11 to finally work out this molecular puzzle of how transcription works. 00:09:14.24 But I can tell you that there are fundamentally two major approaches 00:09:18.14 that have been taken over the last few decades to get a "parts list" 00:09:24.14 of the machinery that decodes the genome and carries out the process of transcription. 00:09:29.20 One is kind of the old style. 00:09:32.16 I'll call it "bucket biochemistry": 00:09:35.21 take a live cell, crush it up, spread out all of its parts, 00:09:40.17 and then try to figure out how to put it back together again. 00:09:43.04 That's what I call "in vitro" biochemistry.
00:09:45.23 The other one is "in vivo" genetics, 00:09:47.27 where you use genetic tools (mutagenesis) 00:09:51.23 to go in there and selectively remove, "knock down," 00:09:55.18 or "knock out" certain genes 00:09:58.03 and gene products, and then ask what the consequence is for that cell or that organism. 00:10:03.01 Both of these technologies are very powerful and highly complementary, 00:10:10.20 and they continue to be used. 00:10:13.13 Today, I will focus primarily on the in vitro biochemical techniques 00:10:18.16 that led us to the discovery of the first few classes of transcription factors, 00:10:24.15 and in subsequent lectures, we'll go to more recent technologies that 00:10:29.22 allow us to speed up this whole process of identifying key regulatory molecules and how they work. 00:10:38.26 So, let's go back to the basic unit of gene expression, which is a gene, 00:10:45.04 shown here as the orange arrow, and the non-coding sequences surrounding it. 00:10:51.29 You'll see that now I've added a few more elements to this purple DNA: 00:10:56.06 a blue square, a pink circle, and a yellow triangle. 00:11:02.22 Those are just a way for me to graphically represent the little bits of 00:11:07.28 DNA sequence that I told you about, the regulatory sequences. 00:11:11.08 The little round one happens to be very GC-rich, 00:11:14.22 the triangle is a classical element called a TATA box 00:11:19.06 (I'll tell you about it a little bit later), 00:11:20.20 and the blue one is yet another recognition element. 00:11:23.18 So, why are we so interested in these little stretches of nucleic acid sequence 00:11:28.26 in the genome when they're buried amongst billions of other sequences?
00:11:33.10 Well, these individual little sequences turn out to be very important because of 00:11:37.25 where they sit (you'll notice they are sitting near the top of the arrow), 00:11:42.07 and because they are recognized by very special proteins, which are the transcription factors. 00:11:48.11 So now I'm showing you some symbols with little cut-outs 00:11:51.24 which fit into either the square, the circle, or the triangle. 00:11:56.03 So, transcription factors (at least one major family of transcription factors) 00:12:03.07 are proteins whose three-dimensional structure is folded into a shape that 00:12:08.21 allows them to recognize these short stretches of double-stranded DNA, 00:12:13.19 largely through interactions with the major groove of DNA. 00:12:17.16 And I'll show you a structure of one in a little bit. 00:12:22.03 Now it turns out that there are probably well over a thousand of these transcription factors, 00:12:26.23 because the number of genes that we have to control, 00:12:29.25 as I showed you, is on the order of 20,000 or 25,000. 00:12:33.18 And so it turns out that you need a pretty large percentage of the genome devoted to 00:12:38.23 encoding these regulatory proteins in order for a complex organism like ourselves to survive. 00:12:45.09 The other component of this, let's call it the "transcriptional apparatus," 00:12:49.16 is of course the enzyme that catalyzes RNA synthesis, and I already told you that this enzyme 00:12:55.04 on its own can't tell the difference between random DNA sequence and a gene or a promoter. 00:13:01.16 These other sequence-specific DNA-binding proteins are the ones that 00:13:06.17 must recruit or otherwise direct RNA polymerase to land on the right place and 00:13:14.03 at the right time in the genome, to turn on the particular subset of genes that is specifically required 00:13:20.26 in a specialized cell type, whatever cell you happen to be looking at.
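The idea that factors recognize short sequence elements such as the GC box and the TATA box can be sketched as a simple pattern search. This is a minimal illustration, not how any factor actually finds its sites: the consensus strings in the hypothetical `MOTIFS` table are simplified assumptions (a GGGCGG core for the GC box and TATAWAWR for the TATA box, using the IUPAC codes W = A/T and R = A/G), real factors tolerate mismatches, and the example promoter is invented. The helper `expected_chance_hits` also hints at why a short site by itself cannot pick out a gene: assuming equal, independent base frequencies, a given 6-bp site is expected to occur by chance roughly 730,000 times per strand in a 3-billion-base-pair genome.

```python
import re

# Simplified consensus patterns (assumptions for illustration only).
# IUPAC codes: W = A or T, R = A or G.
MOTIFS = {
    "GC box": "GGGCGG",                 # core of the GC-rich element bound by Sp1
    "TATA box": "TATA[AT]A[AT][AG]",    # TATAWAWR
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite DNA strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan_promoter(seq: str) -> list[tuple[str, str, int]]:
    """Return (motif, strand, start) hits; start is 0-based on the + strand."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = []
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported separately.
        regex = re.compile(f"(?=({pattern}))")
        for m in regex.finditer(seq):
            hits.append((name, "+", m.start()))
        for m in regex.finditer(rc):
            # Map a hit on the reverse strand back to + strand coordinates.
            start = len(seq) - m.start() - len(m.group(1))
            hits.append((name, "-", start))
    return sorted(hits, key=lambda hit: hit[2])


def expected_chance_hits(genome_bp: int, site_length: int) -> float:
    """Expected chance occurrences of one exact site on one strand,
    assuming equal and independent base frequencies (a strong simplification)."""
    return genome_bp / 4 ** site_length


if __name__ == "__main__":
    promoter = "CCGGGCGGAATTGCTATAAAAGGCTT"   # hypothetical promoter fragment
    for hit in scan_promoter(promoter):
        print(hit)   # ('GC box', '+', 2) then ('TATA box', '+', 14)
    print(expected_chance_hits(3_000_000_000, 6))   # 732421.875
```

Both strands are scanned because a double-stranded site can be read in either orientation; a reverse-strand hit is reported in + strand coordinates.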
00:13:26.16 So, that is kind of the first level of complexity of informational interactions between 00:13:33.05 the transcription factors and the more ubiquitous, 00:13:38.20 I would call it promiscuous, RNA polymerase II enzyme. 00:13:43.19 Well, as it turns out, it took several decades to work out most, 00:13:49.18 if not all, of the components of this so-called transcriptional machinery. 00:13:56.14 And as this slide shows, things are already starting to get more complicated: 00:14:03.05 not only do you have RNA polymerase, but you have a bunch of other proteins that 00:14:07.04 go by names like TFIIA, B, D, E, H, F, and so forth. 00:14:13.12 So, it looks like there are going to be many, many proteins that 00:14:17.14 are necessary to form the transcriptional apparatus. 00:14:22.04 And then on top of that, you need sequence-specific DNA-binding proteins, 00:14:26.13 which I already described to you, to further inform or otherwise regulate when 00:14:33.04 a particular RNA polymerase molecule should be binding to a particular gene. 00:14:37.29 So that's the overview. 00:14:40.05 Now let me get into the specifics of how we actually discovered this family of proteins, 00:14:45.15 and it'll be interesting for you to see how science in this field evolved. 00:14:51.14 Now, as is often the case when you first try to tackle a very complex problem, 00:14:56.28 we didn't really know how complex it was when we began these studies, 00:15:00.24 but we assumed it might be complicated. 00:15:03.16 Certainly it would be more complicated than systems that we already had some idea about, 00:15:09.06 for example in bacteria or bacteriophages.
00:15:13.24 We took a lesson from our studies of bacteriophages and decided that, 00:15:18.11 to begin to dissect the molecular complexities of 00:15:22.09 the transcription process in animal cells, we should start with viruses, 00:15:26.28 because we knew that viruses enter host cells, the complex cells that 00:15:31.27 we ultimately want to study, and have to use the same 00:15:35.17 molecular machinery to transcribe their genes as the host mammalian cell does. 00:15:41.27 So, this was kind of a trick, a way to look through a 00:15:45.28 molecular window into a complex system and try to simplify it. 00:15:49.22 In our case, the early studies of the late 70s and early 80s 00:15:54.18 involved one of the simplest double-stranded DNA viruses, called "Simian virus 40." 00:16:00.28 Simian virus 40 is of course a monkey virus, which was nice because monkeys are very close to humans, 00:16:06.15 and many things that we could learn about the way this virus uses its host cells, which are monkey cells, 00:16:12.12 to replicate and to express its genes as RNA would be applicable to our studies of humans, as you'll see. 00:16:20.11 And this virus was one of the first whose DNA, 00:16:24.07 a double-stranded genome of about 5000 base pairs, was fully sequenced. 00:16:29.14 This was long before rapid, modern-day sequencing was available, 00:16:33.12 so this gave us a very powerful tool. 00:16:35.20 It basically allowed us to look at the entire genome of this virus, 00:16:39.03 which was tiny by comparison, only 5243 base pairs. 00:16:44.28 But that information alone was already very important, 00:16:47.27 because it very quickly allowed us, for example, to map where the genes are, 00:16:53.02 and one of the genes encoded a protein called the "tumor antigen" (T antigen), 00:16:57.19 which turns out to be a transcription factor.
00:17:00.08 This then allowed us to get our hands on (basically, to do biochemistry and genetics with) 00:17:06.04 the very first eukaryotic transcription factor, 00:17:09.14 which in this case happens to be a repressor; 00:17:12.09 that is, a protein that binds to DNA just the same way as I showed you for the model case, 00:17:19.24 through specific protein-DNA interactions, 00:17:23.28 but in this case actually shuts transcription down rather than turning it up. 00:17:29.19 In the process of studying the way that this little virus, when it infects a mammalian cell, 00:17:36.15 uses proteins like T antigen to regulate its gene expression, 00:17:42.17 it became clear that it had to use the host machinery to do so. 00:17:47.22 And that meant that there must be monkey proteins that 00:17:53.00 are also involved in activating or repressing genes of this virus, 00:17:57.19 and this then led us to the most important step, 00:18:00.19 which was to transfer what we had learned about viruses, 00:18:04.06 and about how to work with viral transcription factors like T antigen, 00:18:07.26 to the cellular ones. I'm going to give you just one example 00:18:11.05 of how this simple jump into the host cell allowed us to discover the first human transcription factor. 00:18:17.28 So, the question we then asked back in the early 1980s was: 00:18:23.17 What host molecule regulates transcription of this virus when the virus is in the host? 00:18:31.10 And we knew from the DNA sequence of the virus that 00:18:34.02 there were six very GC-rich snippets of DNA that were regulatory, 00:18:41.05 because if we deleted them, the virus would no longer express the gene of interest. 00:18:45.20 So we knew that something was probably responsible for recognizing these GC boxes, 00:18:51.10 and we knew that it wasn't a virally encoded protein because we had tested 00:18:54.29 all of the viral genes, of which there were only six to begin with.
00:18:59.05 So we knew it had to be a host gene, and that led us to a whole, 00:19:04.06 I would say "family," of experiments that 00:19:06.20 led to the discovery of sequence-specific mammalian transcription factors. 00:19:10.25 And, as I said, we could have taken multiple approaches 00:19:13.25 to try to address this complicated issue. 00:19:16.26 I'll just give you one example of using in vitro biochemistry 00:19:20.29 to finally get our hands on this key, sequence-specific human transcription factor, 00:19:27.12 which of course has a homolog in the monkey. 00:19:31.29 The way we did it was very interesting, and simple in retrospect, 00:19:37.02 and it started from recognizing that whatever this protein was, 00:19:41.24 it had to be able to recognize those GC boxes that were sitting next to the viral gene. 00:19:49.22 We assumed that it must be a sequence-specific DNA-binding protein, 00:19:53.09 so all we had to do was figure out a way to extract proteins from human cells or monkey cells, 00:20:01.05 and then try to fish the specific proteins out of the many thousands 00:20:05.26 of different proteins in this mixture of cellular extract, that is, the ones responsible for 00:20:11.12 discriminating between random DNA sequences and the specific GC box. 00:20:17.15 I'll quickly run through the logic behind this. 00:20:20.19 What I'm showing you here is a solid surface with DNA coupled to it that is 00:20:28.00 highly enriched for the recognition element, the GC box, 00:20:31.14 which should be the sequence recognized by the protein of interest. 00:20:35.14 Now, we had no idea what this protein was going to look like, 00:20:37.18 how many proteins there were going to be, and so forth, 00:20:39.20 but we knew it had to recognize the GC box. 00:20:42.05 So, we're going to try to fish it out of a pool of many thousands of other proteins.
00:20:47.18 Now, the key trick here was this: 00:20:49.28 every cell extract contains not just one DNA-binding protein but, as I told you, 00:20:55.25 thousands of different DNA-binding proteins, and most of them (in fact, in our case, 00:21:00.17 all of the several hundred to a thousand other proteins that could bind DNA) don't recognize the GC box. 00:21:08.27 They just bind other DNA sequences. 00:21:11.07 So to favor our protein binding to our GC box without 00:21:16.07 having to compete with all the other proteins, what we did was 00:21:20.22 to add nonspecific DNA in massive stoichiometric excess, so that all the other proteins that 00:21:28.21 don't recognize the GC box would still have some partner to hang onto. 00:21:33.11 And this trick worked very well. 00:21:35.00 So, with the specific DNA on the solid resin and the nonspecific DNA flowing all over the place, 00:21:43.23 we could selectively capture the pink molecules here, which are the GC box-recognizing ones, 00:21:50.07 while the blue-green molecules predominantly bind to the nonspecific DNA. 00:21:56.02 I show you one little blue one on the column because nothing works perfectly in real science, 00:22:01.10 and it tells you that we have to go through this process iteratively to finally obtain 00:22:06.21 a preparation that's purely pink molecules with no blue-green ones. 00:22:11.18 Well, that turned out to work very, very well, 00:22:14.13 and that whole process of biochemical fractionation followed by a direct-affinity, sequence-specific DNA resin 00:22:23.21 gave us the ability to perform a biochemical purification, followed by molecular cloning, 00:22:30.12 of the gene encoding the transcription factor Sp1. 00:22:35.03 And then we carried out a bunch of experiments, which I'll tell you about next, 00:22:38.20 to show that this protein actually does activate transcription.
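The competitor-DNA trick described above can be illustrated with a toy equilibrium model. This is a sketch under stated assumptions, not the original protocol: each protein is assumed to partition competitively between specific sites on the resin and soluble nonspecific competitor DNA; binding sites are assumed to be in large excess over protein, so free and total concentrations are treated as equal; and all concentrations, dissociation constants, and the starting abundance are hypothetical numbers chosen only to show the trend. The function names `fraction_on_resin` and `purity_after_rounds` are illustrative, and the second one models the iterative passes by assuming every pass retains the same fraction of each protein.

```python
# Toy competitive-binding model (an assumption, not the published protocol).
# Binding sites are assumed to be in excess over protein, so free and total
# site concentrations are treated as equal.


def fraction_on_resin(resin_sites_nM: float, competitor_nM: float,
                      kd_resin_nM: float, kd_competitor_nM: float) -> float:
    """Equilibrium fraction of one protein bound to the resin-coupled DNA."""
    resin_term = resin_sites_nM / kd_resin_nM
    competitor_term = competitor_nM / kd_competitor_nM
    return resin_term / (1 + resin_term + competitor_term)


def purity_after_rounds(start_fraction: float, f_target: float,
                        f_contaminant: float, rounds: int) -> float:
    """Fraction of retained protein that is the target after repeated passes,
    assuming each pass retains the same fraction of each protein."""
    target = start_fraction * f_target ** rounds
    contaminant = (1 - start_fraction) * f_contaminant ** rounds
    return target / (target + contaminant)


if __name__ == "__main__":
    # Hypothetical numbers: 10 nM specific sites on the resin.
    # Target (GC box binder): tight on the GC box, weak on random DNA.
    # Contaminant: binds any DNA equally weakly.
    for competitor in (0, 10_000):
        t = fraction_on_resin(10, competitor, kd_resin_nM=1, kd_competitor_nM=1_000)
        c = fraction_on_resin(10, competitor, kd_resin_nM=1_000, kd_competitor_nM=1_000)
        print(f"competitor={competitor} nM: target {t:.3f}, "
              f"contaminant {c:.5f}, selectivity {t / c:.0f}x")

    t = fraction_on_resin(10, 10_000, 1, 1_000)
    c = fraction_on_resin(10, 10_000, 1_000, 1_000)
    for n in (1, 2, 3):
        # 1e-4 is a hypothetical starting abundance of the target protein.
        print(f"after {n} pass(es): purity {purity_after_rounds(1e-4, t, c, n):.3f}")
```

In this toy model the competitor costs the target some binding but strips far more of the contaminant off the resin, so the selectivity of each pass rises; repeating the pass then compounds that advantage, in line with the iterative purification described in the lecture.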
00:22:43.16 And of course, we went back and proved that this protein, which turned out to be a rather large polypeptide, 00:22:49.08 can indeed recognize the GC box, whether it's a GC box from the SV40 genome 00:22:55.20 or any other GC box that we could find in the human genome. 00:22:59.16 It would find that sequence and bind to it, 00:23:02.12 and then it would generally activate transcription. 00:23:05.22 So this led to the discovery of the first of a 00:23:09.03 very large family of sequence-specific DNA-binding proteins. 00:23:13.23 Now, I told you that the way these proteins tend to recognize short DNA sequences 00:23:19.08 is to interact with DNA through the major groove, and here is a perfect example. 00:23:23.12 The blue stick model there shows the three actual structures, 00:23:28.17 which are called "zinc fingers," and the reason they're called zinc fingers is that 00:23:31.29 their amino acids are organized around a center that contains a zinc ion, 00:23:37.27 which holds the three-dimensional shape of the polypeptide in a position just right for 00:23:43.23 fitting into the major groove of DNA; the DNA here is shown in pink. 00:23:47.24 And you can see that the blue outline fits right into the major groove of the DNA, but not into the minor groove. 00:23:54.25 And one of the most important findings was not only the discovery of the first human transcription factor, 00:24:00.27 but the realization that most if not all sequence-specific DNA-binding transcription factors 00:24:06.18 have a similar structural motif. That is to say, 00:24:10.19 some structure is built to recognize sequences in the major groove of DNA, 00:24:15.24 and these three-dimensional motifs are recognizable from the amino acid sequences encoded in the genome.
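Because motifs like the zinc finger leave a recognizable signature in the amino acid sequence, candidate DNA-binding domains can be flagged with a simple pattern search. This is a minimal sketch assuming a simplified C2H2 zinc-finger pattern (Cys, 2 to 4 residues, Cys, 12 residues, His, 3 to 5 residues, His), a looser version of the signatures kept in motif databases such as PROSITE; it will produce false positives and will miss other DNA-binding folds entirely. The name `find_zinc_fingers` is illustrative, and the example protein is a hypothetical sequence with three repeated fingers, echoing the three zinc fingers of Sp1; it is not Sp1's actual sequence.

```python
import re

# Simplified C2H2 zinc-finger signature (an approximation, not a database entry):
# Cys, 2-4 residues, Cys, 12 residues, His, 3-5 residues, His.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_zinc_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each candidate finger."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical protein: a short leader followed by three identical fingers.
    finger_unit = "YKCPECGKSFSRSDELTRHIRIHTGEKP"
    protein = "MA" + finger_unit * 3
    for start, segment in find_zinc_fingers(protein):
        print(start, segment)   # starts at 4, 32 and 60
```

A pattern this loose is only a first filter; any hit would still need to be checked against curated profiles and, ultimately, by binding experiments of the kind described in this lecture.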
00:24:23.27 So we can now much more quickly scan the entire sequence of a genome and 00:24:28.24 identify genes that are likely to encode DNA-binding proteins, 00:24:31.28 as a result of understanding the structure-function relationships of these DNA-binding motifs, like zinc fingers. 00:24:40.03 Now, so far I've only introduced you to one class of transcription factors, 00:24:48.08 the sequence-specific DNA-binding proteins. 00:24:51.07 I think I gave you a little taste of the level of complexity that's probably going to be needed to 00:24:58.11 build the machine that ultimately allows you to transcribe every gene in every cell of the human body. 00:25:07.08 That turns out to be a much more elaborate machine than what I just showed you. 00:25:12.05 So I want to show you now what is sort of our state-of-the-art thinking about 00:25:17.18 what is actually needed to build the machinery at a gene to allow it to be transcribed and expressed. 00:25:25.16 And the term I want to introduce you to is the "pre-initiation complex," 00:25:30.27 and it's pretty much what it says. 00:25:33.10 It's the complex of multiple subunits that has to land on the promoter of a gene 00:25:40.28 that is designated for later expression. 00:25:45.14 And this is a process that is probably quite orderly; 00:25:50.01 that is, there's an order of events that happens, although we, 00:25:52.28 by the way, are not entirely sure what the order is or 00:25:56.18 even whether the order is the same from one gene to the next. 00:25:59.12 But we can kind of see where it starts and where it ends up, 00:26:02.07 and the pathway in between, I would say, is still a little bit murky. 00:26:06.18 And the story here, again, starts with a little snippet of DNA called the TATA box, 00:26:11.13 which I already introduced you to briefly.
00:26:13.21 It's an AT-rich sequence which sits at the 5' end, 00:26:18.22 or the beginning, of many genes, but not all genes... 00:26:21.01 maybe 20% of genes contain this AT-rich region. 00:26:26.15 And that AT sequence is the signal or landmark, if you like, 00:26:31.16 for a particular protein to bind to it. 00:26:34.07 And that protein is called, not surprisingly, 00:26:36.26 the "TATA-binding protein," because it binds the TATA sequence. 00:26:40.11 And so this represents a second class of transcription factors. 00:26:45.08 These are not the type that I just introduced you to, which are going to be different for every gene. 00:26:50.22 The TATA sequence is present in a very large number of genes, so it can't be gene-specific. 00:26:56.22 But it turns out to be very crucial for our understanding of how gene regulation works. 00:27:02.06 So, you start with a TATA-binding protein finding a TATA box. 00:27:08.01 We later found out that the TATA-binding protein rarely functions on its own 00:27:12.07 and has a bunch of friends that we call "TAFs," for "TBP-associated factors" (together, TBP and the TAFs make up TFIID). 00:27:17.03 And now you're talking about an assembly, a multi-subunit complex of almost a million daltons. 00:27:23.12 There are somewhere between 12 and 15 subunits in addition to the TATA-binding protein that make up this 00:27:29.05 little complex of proteins that kind of travels around together, and this is found in most cell types. 00:27:36.00 And later on, in a subsequent lecture, I'll show you that not every cell type may have 00:27:40.29 exactly the same complement of these subunits, 00:27:43.17 but many of them have this prototypic complex. 00:27:47.27 Is this enough for building the pre-initiation complex? 00:27:52.06 Unfortunately not.
It turns out that there are a host of other, 00:27:57.21 I'll call them "ancillary factors," in addition to the multi-subunit RNA polymerase itself, 00:28:03.12 that are necessary to build up an assembly that forms an active, 00:28:12.06 ready-to-activate transcriptional pre-initiation complex, or PIC. 00:28:18.27 This is kind of the picture we're getting to, and even this picture, 00:28:24.29 with many, many colors and many, many different polypeptides, 00:28:28.18 adding up to probably more than 85 individual proteins that all have to fit together like a jigsaw puzzle... 00:28:37.00 it's probably not even the whole story. 00:28:39.17 You'll notice I still have one big, red question mark there, 00:28:42.24 because I think, as we begin to study specific cell types and specific processes, 00:28:48.09 like embryonic development or germ layer formation, additional components that are not present here 00:28:55.19 in this prototypic pre-initiation complex will come into play. 00:29:00.14 And that's the subject of a subsequent lecture. 00:29:03.15 But already you can tell that the transcriptional machinery is anything but simple. 00:29:09.23 So, can we get a better idea of what transcription might actually look like? 00:29:15.19 What's happening when a transcription process takes place? 00:29:18.20 Let me say that I'm going to finish my lecture now with a little cartoon, 00:29:24.15 which is our attempt to imagine the events that take place when you form a pre-initiation complex, 00:29:33.01 bring regulatory proteins to the activated gene, and what happens next. 00:29:40.09 Now, keep in mind that this is, at this point, mostly a cartoon from our imagination, 00:29:47.25 and only parts of it, if any, are probably real. 00:29:52.29 But it gives you a sense of the complexity of the transactions that have to 00:29:57.26 take place just for one gene to be transcribed and expressed.
00:30:02.07 So let me show you the movie, and then we'll finish 00:30:05.28 by keeping in mind that there is much to be learned; in my next lecture, 00:30:11.09 we'll go into the selectivity of this process in specialized cell types. 00:30:16.20 So now let's see what this cartoon of transcription looks like. 00:30:22.08 We start off with DNA with some pre-assembled TFIID molecule, 00:30:26.27 and along comes this other green molecule, which is actually a co-factor, 00:30:30.24 which then forms this very large complex with RNA polymerase. 00:30:34.06 And then a distal activator protein comes in and activates the process, 00:30:39.10 and this molecule, this bluish molecule that's moving away from the complex, 00:30:46.18 is actually the RNA polymerase. 00:30:48.03 And that little yellow bead-on-a-string is actually the RNA product. 00:30:53.20 So that gives you a sense that 00:30:56.01 things have to happen quickly, and yet many, many molecules have to assemble 00:31:01.02 and then disassemble for this reaction to happen. 00:31:04.06 And in my next lecture, we'll go into more specific aspects of this reaction, 00:31:10.13 particularly during embryonic development and tissue-specific gene expression.