Molecular Biology of Gene Regulation: Transcription Factors
Transcript of Part 2: Gene Regulation: Why So Complex?
00:00:01.03 My name is Bob Tjian, 00:00:02.19 I'm a professor at the University of California at Berkeley, 00:00:06.19 where I've taught many years in Molecular Biology and Biochemistry, 00:00:11.02 and more recently, I've also taken on the job of being the President of the Howard Hughes Medical Institute. 00:00:16.26 And it's my pleasure today to continue with my second lecture in this series, 00:00:22.29 to describe to you some exciting ideas about how gene regulation works, 00:00:30.25 particularly in more complex organisms. 00:00:34.21 Now, in my last set of lectures, I left you with this view of the type of complexity that has to be evolving 00:00:48.26 to allow the type of gene expression patterns that we see in the many, many organisms that we know exist on this planet. 00:01:00.29 And so, there's some really intriguing questions that I'm going to address in this second lecture. 00:01:09.15 And one thing I left you with was an image of the interplay of many molecules that have to come together, 00:01:18.26 and to land on a particular site of the DNA molecule that's part of the chromosome of an organism 00:01:24.25 or within a cell of an organism, and how this process might work. 00:01:31.04 But, I think the question that's plagued us for decades, 00:01:35.20 now that we had a better idea of what this molecular machinery looks like that's involved in decoding DNA information into gene expression, 00:01:49.07 we wondered why is it so complex? 00:01:53.20 And to sort of begin to address this issue, let me just take you back to a simple concept. 00:02:01.10 And you remember that different organisms have different sizes of their genomes, 00:02:08.00 that is, the amount of DNA that is required to encode the particular organism. 00:02:14.23 And here are some examples of both bacteria, simple, single-celled prokaryotic organisms, 00:02:22.18 as well as single-cell eukaryotic organisms like the baker's yeast, and then there's the little, round soil worm C. 
elegans, 00:02:30.25 and then you can go up to mammals and vertebrates. 00:02:33.29 And you'll see first of all that the amount of DNA can vary a lot from a few million base pairs 00:02:40.17 all the way up to 3 billion base pairs or more. 00:02:45.00 To go along with this sort of expanding level of DNA and chromosome length, you also have different levels of genes. 00:02:54.13 Now, you'll notice that the range of genes is a lot less than the range of DNA length, 00:03:00.20 so this partly informs us about maybe why we need the complexity that we ultimately discovered is involved 00:03:10.26 in forming this molecular machinery that's responsible for reading the genetic information. 00:03:17.22 So this is just a little table to reemphasize that these more complex genomes, which also means more complex organisms, 00:03:28.15 which really means a lot of different cell types, many different behaviors, complex interactions with their environment and so forth, 00:03:38.24 how is all this information really decoded from our genomes? 00:03:43.03 And on one side here, you see the prokaryotic core gene regulatory machinery, 00:03:50.07 or the core transcription machinery, and in almost all bacteria, 00:03:54.20 it's only a few polypeptides... 5, 6, 7 polypeptides. 00:04:00.00 Then, on this side, you'll see that the so-called eukaryotic organisms, 00:04:05.00 and particularly when you talk about multicellular metazoan organisms, now you see huge diversity and number of proteins or, 00:04:15.17 as we call them, transcription factors, that are necessary to assemble into very large, multi-subunit ensembles 00:04:25.18 that are required to transcribe the 10,000 to 30,000 genes that define these more complex organisms. 00:04:33.21 So, right away you can see that there's this proliferation of the subunits and the machinery and the complexity. 
00:04:43.00 So, in this lecture, I'm going to give you a little sense of maybe why this is the case, 00:04:49.11 and what's special about the more complex, multicellular organism, 00:04:55.11 and why this machinery may have to have been more elaborated through evolution, compared to simpler organisms. 00:05:05.07 Now, one of the first things that you realize when you look into the cell, 00:05:10.03 or particularly the nucleus of a higher organism, let's say our own cells, versus a bacteria, 00:05:17.10 is that the DNA, the very molecule that makes up the genetic information, is kind of packaged away in a very different way. 00:05:25.11 So, in all eukaryotes, the double-stranded DNA doesn't sit there in the form that we would call 00:05:33.29 the "naked" DNA, which is shown up at the top here. But rather, 00:05:38.17 this DNA is wrapped up with a set of proteins, very basic proteins, called "nucleosomes," 00:05:46.00 and these are in turn further packaged all the way to highly condensed form 00:05:52.09 that ultimately forms the chromosomes that you'll be able to see under a microscope. 00:05:57.26 And the blue figures over here and green figures just give you a view of the 00:06:02.29 high-resolution structure of a nucleosome with DNA wrapped around it. 00:06:09.00 So, what is the consequence of having all of our DNA, 00:06:14.04 all our chromosomes, condensed and wrapped up in this way? 00:06:18.09 You can think of it as packaged away. 00:06:20.15 Well, one thing is that you can shove all this down into a small nucleus, 00:06:25.15 so if we strung out our DNA in every cell in our body, out from end to end 00:06:31.20 and stretched it out like a string, it's almost a meter long. 00:06:35.12 And yet, you have to cram all that into a tiny, little volume. 00:06:39.10 And part of the way that that happens is that you can compact the DNA by these structures. 
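The "almost a meter" figure mentioned above can be checked with a short back-of-the-envelope calculation. This is only a sketch: the rise of about 0.34 nm per base pair is an assumed textbook value for B-form DNA that the lecture does not state, the 4-million-base-pair bacterial genome is a hypothetical stand-in for "a few million base pairs," and the function name is made up for illustration.

```python
# Assumption (not from the lecture): B-form DNA rises roughly 0.34 nm per base pair.
RISE_PER_BP_NM = 0.34


def dna_length_m(base_pairs, rise_nm=RISE_PER_BP_NM):
    """Return the end-to-end length, in meters, of a stretched-out DNA molecule."""
    return base_pairs * rise_nm * 1e-9


# Genome size quoted in the lecture: about 3 billion base pairs.
human_bp = 3_000_000_000
human_length_m = dna_length_m(human_bp)

# Hypothetical bacterial genome standing in for "a few million base pairs".
bacterial_bp = 4_000_000
bacterial_length_mm = dna_length_m(bacterial_bp) * 1000

print(f"3 billion bp: {human_length_m:.2f} m")        # 1.02 m
print(f"4 million bp: {bacterial_length_mm:.2f} mm")  # 1.36 mm
```

Under these assumptions one copy of a 3-billion-base-pair genome comes out at about a meter, which matches the order of magnitude given in the lecture. A diploid cell carries two copies, so its total would be roughly double.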
00:06:46.24 Now, the consequence of that is, of course, you somehow have to negotiate 00:06:52.18 through this highly compacted form of DNA to get access to the DNA information and the genes. 00:07:00.12 So, to put it another way, you have to have a machinery, 00:07:04.24 a transcriptional apparatus whose job is to read DNA and, you remember from the first lecture, 00:07:10.27 convert that DNA information into RNA, an intermediate molecule 00:07:14.23 which ultimately then gets translated into a protein product. 00:07:18.19 Well, clearly one of the reasons we have this highly elaborated transcriptional machinery 00:07:25.02 is in part to deal with having to navigate through a chromatin template, as opposed to a naked DNA template. 00:07:33.24 And so there are various proteins and protein complexes that are called 00:07:38.29 "chromatin remodeling complexes," "chromatin modifying complexes," 00:07:44.12 and these have to coordinate with the transcriptional machinery down here in the yellow and the orange, 00:07:50.01 in order to navigate and basically express a series of interactions 00:07:55.08 that are transactions between the protein machinery and the DNA. 00:08:00.29 So this is a very challenging problem. 00:08:04.10 So that's part of the problem, or part of the reason why we think there's such complexity. 00:08:09.11 So, how did we come to this picture? 00:08:12.05 How did we finally get to figuring out that there were over 85 proteins that all have to assemble on a chromatin template, 00:08:21.16 to give you gene expression and transcription, in the right place, in the right time? 00:08:26.25 And I want to just give you one sort of quick look into a technology that one can use to address the issue of, 00:08:37.03 how do we break down this complex machinery into understandable units? 
00:08:42.29 And as I said in the first lecture, there are many tools that molecular biologists 00:08:47.18 and biochemists can use to try to tease out these complex molecular transactions. 00:08:53.18 One of them, of course, is to use genetics, which is to use genetic mutation 00:08:58.29 to either remove or alter one particular gene product and then ask what is the consequence. 00:09:05.06 The other way to do it is to actually take a cell with all of its complexity 00:09:10.02 and break it down literally into its component parts, and then try to put it back together 00:09:14.09 again in a functional form. And that's what I'm going to show you today. 00:09:18.00 And it's a technology I kind of call the "biochemical complementation assay." 00:09:23.03 And it's very simple: You ask, what are the minimal components, 00:09:27.24 for example, in the case of a human gene... what are the minimal protein components of the transcriptional apparatus 00:09:33.29 that you can extract from the nucleus of a cell that you need to put into a test tube 00:09:38.19 that will allow you to essentially reconstruct or, as we say, 00:09:42.19 reconstitute the activity that will allow you to read the gene in an accurate fashion? 00:09:48.19 And you can keep adding or taking away different proteins, 00:09:53.13 the yellow ones, the green ones, the orange ones, and so forth, 00:09:56.24 and ask, does it make any difference? 00:09:59.11 And by playing this adding and subtracting, or "biochemical complementation," assay, 00:10:04.29 you can very quickly discover what are the minimal components you need to activate a gene in a regulated fashion, 00:10:11.12 and what are other things that might be necessary to support this activity. 00:10:16.15 So, the first question that was asked was from the biochemical analysis of about 00:10:24.02 four dozen different proteins: What are really necessary and sufficient? 
00:10:30.07 In other words, what's the minimal component set that you need to give you regulated transcription? 00:10:36.29 So we're now asking a more complicated question. 00:10:39.09 Not only what is necessary to just simply give you transcription, in other words the conversion of DNA into RNA, 00:10:45.29 but to do it in a regulated fashion. 00:10:47.27 Because after all, that's what's really interesting... 00:10:50.23 is why one cell does it in one way and a different cell has a different program. 00:10:55.25 And this experiment here says that our sequence-specific classical transcription factor that 00:11:02.00 binds DNA at its regulatory promoter region, together with what we will call the "core" or 00:11:09.07 "basal" machinery of transcription, is necessary but not sufficient. 00:11:14.07 So, plus or minus the activator Sp1 doesn't make any difference, 00:11:19.16 even though we know that in a living cell, Sp1 is highly activating this gene that we're looking at. 00:11:26.16 So, that means there's something missing in this reconstitution experiment. 00:11:31.16 So, how do we go find what's missing? 00:11:35.03 And this biochemical complementation really relies on our ability to take the cells that contain the necessary components 00:11:43.25 and the sufficient components, and then start to extract it 00:11:47.28 and to find which molecules are missing that we're not adding to our reaction yet. 00:11:54.05 And to do that, we basically have to take the cells, in this case, human cells, 00:11:58.28 break the cells apart, extract the nucleus, remove all the proteins from the nucleus, 00:12:04.03 and begin to separate the thousands of different proteins 00:12:08.05 that are in the nucleus into different pools, if you like. 00:12:12.08 And we separate them based on their physical and chemical properties, 00:12:17.19 and some of you probably have had some experience in running column chromatographs. 
00:12:23.16 This is basically a way of separating proteins based on their positive charge, negative charge, molecular size, 00:12:32.16 hydrophobicity (in other words, how greasy they are... how well they interact with water), and so forth. 00:12:38.20 So if you do that iteratively, as is shown here in a series of different anion exchange 00:12:45.23 and cation exchange, as well as gel filtration, chromatographs, 00:12:50.12 you can eventually separate the thousands of different components of a nuclear extract into its individual parts. 00:12:59.00 And then you can test each one to see if they're the missing piece. 00:13:03.15 And when you do that, lo and behold, you find that there are a couple of missing pieces 00:13:07.22 that are necessary for you to add back, in other words, reconstitute, the reaction 00:13:13.12 so that now you have regulated transcription. 00:13:15.27 So unlike the previous data that I showed you, 00:13:19.14 now you can see that the machinery is more complex and, most importantly, 00:13:24.23 you can see also that the machinery is now responsive to the activator. 00:13:29.19 So, the signal with the activator, plus Sp1, is much darker than in the signal without Sp1. 00:13:36.13 That means that there is activated transcription that is Sp1-, a classical transcription factor, dependent. 00:13:43.11 So that allowed us to identify two very important, key components that we didn't know about before we did this experiment: 00:13:51.29 One is a multi-subunit complex called the "Transcription factor II D," 00:13:57.25 and the other one is called the Mediator complex. 00:14:01.01 And these turn out to actually define an entirely new class of transcription factors, which are the so-called co-factors. 00:14:09.27 So I'm going to tell you a little bit more about one of these co-factors, 00:14:13.15 because they both really perform similar functions, 00:14:16.13 but we happen to know quite a bit more about one of them than the other. 
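The add-and-subtract logic of the biochemical complementation assay described above can be sketched as a toy search. Everything here is illustrative: the readout values (0, 1, 10) are arbitrary units, the "fraction A" and "fraction B" names are hypothetical, and the rule inside `transcription_signal` simply hard-codes the outcome reported in the lecture (Sp1 only boosts transcription when both TFIID and Mediator are present). It is not a model of the real biochemistry.

```python
from itertools import combinations

CORE = "core machinery"
ACTIVATOR = "Sp1"
COFACTORS = frozenset({"TFIID", "Mediator"})  # the two co-factors named in the lecture


def transcription_signal(components):
    """Toy readout in arbitrary units: 0 = none, 1 = basal, 10 = activated."""
    present = set(components)
    if CORE not in present:
        return 0
    if ACTIVATOR in present and COFACTORS <= present:
        return 10
    return 1


def responds_to_activator(fractions):
    """Compare the reaction with and without Sp1 for a given set of added fractions."""
    with_sp1 = transcription_signal({CORE, ACTIVATOR, *fractions})
    without_sp1 = transcription_signal({CORE, *fractions})
    return with_sp1 > without_sp1


def minimal_fraction_sets(candidate_fractions):
    """Return the smallest sets of fractions that make the reaction Sp1-responsive."""
    for size in range(len(candidate_fractions) + 1):
        hits = [
            set(combo)
            for combo in combinations(sorted(candidate_fractions), size)
            if responds_to_activator(combo)
        ]
        if hits:
            return hits
    return []


# Hypothetical column fractions; only two of them carry the missing activities.
fractions = ["fraction A", "TFIID", "fraction B", "Mediator"]
print(minimal_fraction_sets(fractions))
```

The search tries progressively larger combinations of fractions and stops at the first size that restores activator-dependent signal, which mirrors the idea of asking for the minimal set that is both necessary and sufficient.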
00:14:20.17 So this so-called TFIID complex has roughly 15 subunits, in other words, 00:14:25.28 15 separate proteins that have to mesh together to form a complex. 00:14:31.22 And it's a very large macromolecule, so it's a million daltons... 00:14:36.23 that's a very, very large, floppy molecule, with many pieces to it. 00:14:41.12 One of its functions you already know about, 00:14:43.16 because it contains as one of its subunits the so-called "TATA-binding protein." 00:14:48.12 That's that saddle-shaped molecule that binds to double-stranded DNA, 00:14:55.02 at the AT-rich sequence called a TATA box, 00:14:58.06 which is associated with many genes in animal cells. 00:15:02.09 But what we've come to learn in the last decade or so is that this little complex 00:15:07.22 is doing much more than just simply binding to the TATA box; 00:15:11.18 it's doing a whole bunch of other things that we didn't have any idea about. 00:15:15.12 And now that we knew the existence of this activity and that it was critical not only for TATA binding, 00:15:22.08 but also for mediating or potentiating transcription activation, we then could break down 00:15:28.04 more of its functions of individual subunits, because you remember there's 15 different polypeptides here. 00:15:34.09 And this is just a little summary showing you that this complex of proteins 00:15:38.11 is doing a lot of different functions. 00:15:41.18 It's recognizing the nucleosomes, which have a basic protein called a "histone," 00:15:49.26 and so it recognizes histones only when it's got 00:15:53.09 a certain chemical modification called an acetylation event. 00:15:59.12 This big orange complex also itself has enzymatic activity, including kinase activity, 00:16:06.06 which can put phosphate groups on other proteins and enzymes. 00:16:10.04 It has acetylase activity, and of course, it has to interact directly 00:16:15.05 with activators in order to potentiate their function in turning on transcriptional activation. 
00:16:22.07 And I'm probably safe in speculating that 00:16:26.20 there are yet unknown functions of this large complex that we still have to discover, 00:16:32.23 because we've really only understood maybe half of the subunits, and even there, 00:16:37.25 only partially understood the functions of that half of the subunits that are part of this complex. 00:16:44.05 So, there's clearly much more work to be done, but I think what's clear from these experiments 00:16:48.25 is that these proteins are doing a lot more than just binding DNA. 00:16:53.08 They're what I would think of as integrators of information. 00:16:58.01 So, this integrator of information means that this structure and the function is very complex, and so, 00:17:04.27 one of the things that we've had to do... 00:17:08.00 it's been a very challenging problem that remains challenging, 00:17:11.21 because we haven't solved all the technical problems... 00:17:13.18 is that because it's a large, megadalton, floppy molecule, solving the three-dimensional 00:17:19.25 structure of such large assemblies has proven to be rather technically challenging. 00:17:26.08 And we have to use many different techniques to try to address this in: 00:17:31.28 X-ray crystallography, NMR... 00:17:34.07 but one of the techniques that's emerging, that's very, very powerful 00:17:38.09 for solving the structures of these large assemblies is something called "cryo-electron microscopy." 00:17:46.24 It's basically a way of freezing these large assemblies in place, 00:17:52.18 and then solving their structure by microscopy. 
00:17:55.29 And this is just about a 25 angstrom, so relatively low-resolution structural determination, 00:18:03.17 of the human TFIID complex and, most importantly, 00:18:08.24 its relationship to two other transcription factors that are part of the assembly 00:18:14.03 that has to align itself up on the promoter to start transcription, 00:18:18.08 and that's the other two transcription factors TFIIA and B, which are shown in green and purple here. 00:18:24.13 So you can slowly start building up the entire complex in pretty accurate three-dimensional space 00:18:33.00 to figure out what its shape will inform us about its function, 00:18:37.15 and that's something that's an ongoing project in many laboratories in molecular biology. 00:18:43.01 So, this cartoon... and again I want to emphasize that 00:18:46.27 all the figures there and the colored blobs are more a part of our imagination at this point, 00:18:53.16 although, as I just showed you, we actually have real structures of some components of this pre-initiation complex. 00:19:02.25 This slide just emphasizes the point that there's a lot of information integration going on, 00:19:09.22 and that there is protein-protein and protein-nucleic acid interactions 00:19:15.06 that are critical for the regulatory functions of these large, macromolecular assemblies. 00:19:21.02 And this also reminds you that there are at least three separate classes of transcription factors 00:19:27.17 that are playing a key role in the regulation of genes: 00:19:30.28 the classical activator and repressor that are sequence-specific DNA-binding proteins, 00:19:35.26 like the Sp1 protein I talked to you about earlier, just shown here in pink; 00:19:41.11 there are the components of the core machinery, which are shown in yellow; 00:19:45.27 and then you have these things we call co-factors or co-activators, 00:19:49.12 that are integrating information between the activators and the core machinery. 
00:19:56.06 So this kind of gives you a slightly better view of why there's this kind of complexity, 00:20:04.06 but it still doesn't really address all of the issues with respect to: 00:20:09.22 Why do you need 85 proteins to do this? 00:20:12.09 So, let me dig a little deeper into this. 00:20:15.00 So, first, let me just pose some of the questions that are really still largely unresolved in the field, 00:20:21.14 even though this is a pretty mature area of study; 00:20:24.10 we've been trying to address these issues for a couple of decades, 00:20:28.14 and it goes to show how difficult it is to really tease apart this complex molecular machinery. 00:20:34.23 And I should say that the complexity of this machinery is not unique 00:20:38.19 to the transcriptional apparatus. Many other biological processes are also dependent on 00:20:43.23 macromolecular machines that are very similar in complexity to this one. 00:20:49.04 So I think things that we learn about the transcriptional machinery could be applied in principle to many other machineries. 00:20:56.20 So, couple of interesting questions: 00:21:00.20 What are the transcriptional mechanisms that regulate complex cell types? 00:21:06.29 Because, after all, multicellular organisms evolved to having many, many different 00:21:13.12 cell types, so our bodies are made up of many different cell types, 00:21:18.23 which means that each cell's performing a different function. 00:21:22.02 Our hair follicle cells are producing hair, our red blood cells are 00:21:27.02 producing hemoglobin and doing something else, our skin cells are protecting us. 00:21:31.17 Each cell type is doing a different thing, so how does this happen, 00:21:36.01 how do we generate this diversity of cell types through the gene regulatory networks? 
00:21:42.17 And then, knowing what we now know about the first level of complexity of the machinery 00:21:49.04 that's responsible for decoding this information, what more can we learn about the process of regulation now? 00:21:57.03 Particularly, what is the division of labor between the core machinery 00:22:03.24 (which binds to the promoter), the activators, and the co-activators? 00:22:08.22 So, what is their relationship, and what's their respective roles in defining cell type-specific gene expression? 00:22:16.19 That's really the last topic that I want to cover in this lecture. 00:22:21.06 So, let's review a few basic facts about individual cell types. 00:22:26.13 So, let's take two well-recognized cell types: fat cells and muscle cells. 00:22:33.12 Very different cells that perform very different functions, 00:22:36.27 but every cell in a particular organism has the same genetic information. 00:22:43.04 It has the same DNA, it has the same set of chromosomes. 00:22:46.08 That means that these two cells have to be using different parts of the information 00:22:52.09 from the genome to give it their distinct identities. 00:22:56.20 So, each cell must only express some subset of the genes, 00:23:03.21 and that particular subset would define the function of a fat cell versus a muscle cell. 00:23:10.11 And, so then the question becomes: 00:23:12.26 Okay, that makes sense, but how do you get there? 00:23:15.02 How do you get cell type-dependent differential gene expression patterns? 00:23:20.16 How do you turn on the right genes to make fat 00:23:22.27 versus keeping the muscle cell gene functions turned off, and vice versa? 00:23:29.06 So that is a fundamental question of trying to understand the process of cellular differentiation, 00:23:36.19 cell-specific function, and really, developmental biology. 
00:23:41.09 Another set of interesting points to make is that, of the 20,000 to 30,000 genes 00:23:46.18 that a typical metazoan organism encodes, a pretty big chunk of it is devoted 00:23:54.03 to the very machinery that I'm talking about, in other words, the transcription factors. 00:23:59.04 So roughly somewhere between 5 and 10% of the entire coding capacity 00:24:04.02 of genes in a genome is devoted to encoding transcription factors. 00:24:10.11 So this is clearly a very important class of molecules. 00:24:13.02 So that means there are several thousand transcription factors. 00:24:16.28 But now if you start thinking about the many, many thousands of cell types and the behavior of different cells, 00:24:23.16 are a few thousand transcription factors, in and of themselves, enough to generate the diversity of function? 00:24:31.14 And this is where we have to start thinking about, 00:24:33.21 how do you create really large numbers of distinct transcriptional networks? 00:24:40.28 And they really are networks, as you'll see in a minute. 00:24:43.15 And one thing that became clear as we defined what genes look like and what a promoter as a transcriptional unit looks like, 00:24:51.22 we come to understand that the only way to create the kind of huge levels of diversity of distinct transcriptional 00:24:58.20 components and patterns, is to do it by combinatorial regulation. 00:25:03.12 And what do I mean by that? 00:25:04.27 So, one way to think about it is that you might only have ten cards, 00:25:09.25 but if you shuffle those ten cards and pick four at a time, 00:25:13.06 you can have many, many combinations. 00:25:15.11 So here's a perfect example of three different cell types, could be in the same organism, 00:25:20.11 and each of those symbols represents binding sites, 00:25:25.22 and then the little boxes and triangles above them represent the binding proteins. 
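The two pieces of arithmetic above, the share of genes that encode transcription factors and the ten-cards analogy, can be worked through in a few lines. This is a sketch under stated assumptions: it treats "pick four at a time" as unordered choices without repetition, and the variable names are made up for illustration.

```python
import math

# Ranges quoted in the lecture: 20,000 to 30,000 genes, 5 to 10% encoding transcription factors.
low_genes, high_genes = 20_000, 30_000
low_pct, high_pct = 5, 10

tf_low = low_genes * low_pct // 100     # 1000
tf_high = high_genes * high_pct // 100  # 3000
print(f"Transcription factors: roughly {tf_low} to {tf_high}")

# Card analogy: ten cards, pick four at a time (assumed unordered, no repeats).
cards, picked = 10, 4
combos = math.comb(cards, picked)
print(f"{cards} choose {picked} = {combos}")  # 210

# The same counting applied to the low-end transcription factor estimate.
tf_combos = math.comb(tf_low, picked)
print(f"{tf_low} choose {picked} has {len(str(tf_combos))} digits")
```

Even with the simplest counting rule, ten factors taken four at a time already give 210 combinations, and the number grows very quickly as the pool of factors gets larger. That is the sense in which a few thousand transcription factors could, in principle, specify far more than a few thousand expression patterns.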
00:25:32.18 And you can see that those three cell types might express these sets of genes in similar ways, 00:25:38.25 but they use different combinations of proteins to do it. 00:25:42.08 And this is really the notion of combinatorial mechanisms for gene regulation, 00:25:46.27 and we now know that that is indeed the way, at least in part, 00:25:51.05 that gives us the ability to create many different specific transcription patterns. 00:26:00.04 I have to now also tell you about another, I would say, defining, 00:26:04.23 unusual property of transcription in animal cells, 00:26:09.06 and this is a hard one sometimes to get your head around. 00:26:12.19 And that is that these different little units of DNA that specify the activity of a gene 00:26:18.20 don't have to be sitting, linearly and spatially, directly next to the gene that it's activating or repressing. 00:26:26.21 They can sit tens of thousands of base pairs away from the site. 00:26:32.04 So these we call long-distance enhancers or silencers, so they can both upregulate a gene... 00:26:39.01 in other words, make more of the gene or less of the gene. 00:26:41.27 And the thing that was so surprising was that the intervening DNA can be very, very long; 00:26:48.04 it can be thousands and maybe even millions of base pairs. 00:26:53.00 So how does this work? 00:26:53.28 How can something sit so far away actually influence transcription at a very remote site? 00:27:00.26 And this is one of the big conundrums that we still face in the field. 00:27:05.15 We have some models and we have some ideas that we can test, 00:27:08.09 and I'll end my lecture with a few speculations about that. 00:27:11.24 But clearly, we don't fully understand this so-called long-distance regulation, 00:27:16.27 which clearly is regulated by activators and repressors just like 00:27:21.09 the same players that we've been talking about, like the Sp1 molecule and other activators. 
00:27:26.12 But yet, how they can reach across long distances of the chromosome to grab on to the core machinery to actually impart information 00:27:36.06 and to create the kind of specific regulatory events is still somewhat obscure. 00:27:43.28 So, another thing that I should say is that, 00:27:47.07 because the combinatorial mechanism of generating diversity was so dependent 00:27:54.18 on the distinct sets of sequence-specific DNA-binding proteins, 00:28:00.12 over the last two decades we've come to kind of a traditional model that the core machinery stays relatively invariant. 00:28:10.04 In fact, we kind of think of it as universal, because if you break open a nucleus of a very 00:28:15.11 simple organism like yeast, or you break open the nucleus of a human cell, 00:28:21.22 that machinery looks remarkably similar in both. 00:28:24.26 And yet, their gene networks are very, very different, so we thought, 00:28:29.05 well, maybe it's all having to do with the sequence-specific DNA-binding proteins, 00:28:34.07 that will generate the diversity through combinatorial regulation. 00:28:38.25 And that's probably true; in fact, there's a lot of evidence to support that. 00:28:42.27 But it was only part of the story. 00:28:45.04 So, a kind of related question would be: 00:28:47.25 Are we really right in thinking that the core machinery is universal and invariant? 00:28:54.03 And that turns out to be an oversimplification. 00:28:57.06 So it turns out evolution didn't work that way. 
00:29:02.26 And when we looked very carefully in the last few years, particularly at individual, 00:29:07.02 different, distinct cell types, let's say muscle versus fat, or neuron, or liver cell, 00:29:13.01 we certainly see differences in the activators, as we would expect, and indeed they are working in combinatorial fashion, 00:29:20.02 but they're not only working combinatorially with each other, 00:29:23.06 but they are combining in different combinations with the core machinery, which is itself variable. 00:29:29.23 And that was kind of a revelation that's really become more clear just in the last few years. 00:29:35.29 So, in addition to the sequence-specific binding proteins and their diversity, 00:29:41.10 there turns out to be a much greater degree of diversity in the core machinery, 00:29:46.09 the parts that we thought were invariant, than we ever imagined. 00:29:50.11 Now, once you realize that that's the case, 00:29:54.07 that opens up a whole other level of generating diversity that we didn't anticipate, 00:29:59.13 and that of course really allows multicellular organisms to diversify in unbelievable ways. 00:30:06.18 So, let's drill down finally a little bit at how did we find this out, and where are we going? 00:30:13.08 So now, unlike a few decades ago when we first began to study the process of transcription 00:30:20.02 and discovered all of this initial complexity, in those days we mainly worked on just a few different cell types. 00:30:28.13 But today, we have the ability technically to work with just about any cell type, 00:30:33.18 from the most complex, such as embryonic stem cells, 00:30:37.13 to perhaps the simple cell, like the skeletal muscle, and everything in between... 00:30:41.18 liver cells, neuronal cells, and so forth. 
00:30:45.03 And this has really opened up our view of just how diverse, interesting, 00:30:51.17 and variable the transcriptional apparatus is, that is probably really necessary 00:30:56.18 from an evolutionary standpoint to drive the diversity of gene expression and cell types that we see. 00:31:04.10 The first hint that this core machinery that we thought was so invariant may not be so invariant, 00:31:10.14 came from studying the development of the skeletal muscle. 00:31:14.21 So when you go from a precursor cell called a myoblast, which looks like most every other mammalian cell, 00:31:21.13 with its standard, prototypic core machinery, and then when you look at it when that cell type differentiates, in other words, 00:31:29.16 specializes into a myotube (which will ultimately form skeletal muscle, which is the muscle around your large bones 00:31:37.03 that makes you be able to move), 00:31:39.16 it turns out that it not only shifts which transcriptional activators it uses, 00:31:45.05 but it also jettisons the prototypic core machinery and substitutes it with some modified versions of that core machinery, 00:31:53.21 which is shown down here in the purple and the bright blue. 00:31:57.29 So this was really a change in the paradigm of the way we're thinking about regulation, 00:32:03.08 and of course, this was just the first example. 00:32:07.08 One wanted to know if similar things were happening in other different cell types, and very quickly, 00:32:13.04 if you look at hepatocytes or liver cells, if you look at adipocytes or fat cells, 00:32:18.21 if you look at neuronal cells, and you compare what's going on in muscle, 00:32:23.07 in every case, one can find changes in the core machinery, either because a particular component 00:32:28.28 like one of the TBP-associated factors is highly upregulated (that means its concentration went way up, 00:32:35.15 when all the other ones went down), or some other permutation. 
00:32:39.09 In other words, clearly, components of the so-called core machinery were variable from cell type to cell type, 00:32:46.20 and that really changed the way we thought about how regulation of multicellular organisms works. 00:32:54.28 At the same time that we were looking at these, 00:32:57.17 what we would call mature, terminally differentiated cell types, 00:33:01.20 we were also looking at perhaps one of the most interesting cell types that we could study, 00:33:06.18 particularly if we're interested in understanding the process of mammalian development, 00:33:11.23 and those are of course the embryonic stem cells. 00:33:14.14 These are those amazing cells that, when tickled with just the right chemicals or physiological signals, 00:33:21.08 can turn themselves into every cell type of an organism, maybe 10,000 different cell types. 00:33:31.06 So, this so-called pluripotency made these human and mouse embryonic stem cells very special for all kinds of reasons, 00:33:41.05 partly because they are amazing models to study this process of development and differentiation, 00:33:46.18 but partly because of biomedical possibilities for cell regeneration and therapeutics. 00:33:57.02 So we've studied this, and these are very, very new studies, 00:34:01.27 and I'll just very quickly touch on it. We really were curious, 00:34:05.23 how can these cells be so pluripotent? 00:34:09.08 That is, their capacity to turn into every other cell type seems so amazing, what is the mechanism, 00:34:15.07 what's the machinery that's going to allow these cells to be able to differentiate into every cell type in the body? 00:34:22.16 And so, we began to probe this. 00:34:24.27 In some cases, we did it by the genetic technology, which is we made 00:34:29.08 mutations in certain candidate regulatory factors and transcription factors, 00:34:34.10 and then asked, does that have a consequence on the development of different cell types? 
00:34:41.10 In other cases, we used a standard biochemical complementation technology to figure out what's going on. 00:34:47.21 So, I'll finish with two quick stories. 00:34:51.06 So, using the genetic tools of knocking genes out and asking 00:34:55.02 what effect it has on differentiation and pluripotency, we discovered that 00:35:01.15 a component of the core machinery (or at least we used to think of it as being purely of the core machinery), 00:35:07.07 that is, one of the TBP-associated factors, particularly TAF3, 00:35:11.26 turns out to be extremely important for the regulation and 00:35:15.23 expression of genes that will ultimately define the so-called endoderm. 00:35:23.17 And that's true for both the so-called primitive endoderm and the definitive endoderm, 00:35:28.04 which ultimately will give rise to the placenta, the yolk sac, lungs, liver, 00:35:32.12 pancreas, intestines, and so forth. 00:35:34.17 At the same time, knocking out this TAF3 had the opposite effect on the other two major germ layers, 00:35:42.09 which are the mesoderm and the ectoderm. 00:35:44.13 So here was a really beautiful case of differential function of a transcription factor 00:35:50.28 that was not a standard sequence-specific binding protein. 00:35:55.09 This core machinery factor, which by the way, probably on its own doesn't even bind to DNA directly, 00:36:01.20 when you knock it out, you lose the ability to form endoderm, 00:36:05.22 but you elevate the probabilities of forming mesoderm and ectoderm. 00:36:09.18 In other words, the balance between these different cell types gets messed up, 00:36:13.20 and of course this will cause major difficulties for a developing embryo. 
00:36:21.16 Even more interesting and intriguing, and this really goes to show the level of information that we still lack, 00:36:28.15 although TAF3 was originally defined both genetically 00:36:32.04 and biochemically as part of the TFIID core promoter recognition complex, 00:36:37.02 and it is absolutely true that that is the case, 00:36:40.04 it had another life that it led that we didn't know about. 00:36:43.17 So TAF3, it turns out, doesn't have to function strictly as part of this large multi-subunit core promoter complex, 00:36:52.17 but it can also do other jobs, and in this case, it pairs up, or partners up, 00:36:57.06 with a different transcription factor called CTCF (doesn't really matter what the name is) 00:37:03.02 and now it does its job in a completely different way. 00:37:06.16 And in fact, the most recent experiments suggest that TAF3 and CTCF get together 00:37:12.03 to partly allow that amazing property of long-distance regulation. 00:37:17.25 So, regulators bound thousands of base pairs away from the site of activity 00:37:24.21 can be brought together in three-dimensional space by what's known as "DNA looping," 00:37:31.04 and it turns out that TAF3 is involved in this DNA looping, together with a whole bunch of other proteins, 00:37:38.00 whose relationship to TAF3 is still not entirely clear. 00:37:45.02 And we find it particularly intriguing and exciting that this type of long-distance function is being 00:37:50.06 carried out by a TAF, in the context of the embryonic stem cell's potential to differentiate into endoderm. 00:37:57.19 So this is a very, very new way of thinking about the core transcription factors. 00:38:06.11 Likewise, we looked at the embryonic stem cell transcriptional circuitry and asked, 00:38:13.20 what other transcriptional co-regulators, or regulators and co-factors, 00:38:17.18 are necessary to allow this so-called pluripotency program? 
00:38:23.01 This amazing ability of these cells to be able to differentiate into every other cell type, 00:38:27.03 how does that happen? What is allowing that to happen 00:38:30.23 in this particular cell type, and not in other cell types? 00:38:33.23 And again, using the biochemical complementation technology, 00:38:38.00 we recently were able to identify a new co-factor complex, again a multi-subunit complex, 00:38:45.15 called the SCC, or "stem cell co-factor." 00:38:49.25 And remarkably, this SCC-B turns out to be a well-known protein that again had a different lifestyle in other cell types. 00:38:59.18 It's a protein complex that had previously been described as XPC, 00:39:03.29 which stands for "Xeroderma pigmentosum, complementation group C," 00:39:08.22 which means that it's involved in DNA repair. 00:39:11.07 So up until now, we thought XPC was only functioning as a DNA repair complex, 00:39:16.14 and now we know that it's doing something quite different, 00:39:19.12 but only in the context of ES cells, which is to form a co-factor complex that will potentiate 00:39:25.27 the activity of two critical transcriptional activators, Oct4 and Sox2, 00:39:31.21 which define the pluripotent, self-renewing state of ES cells. 00:39:36.20 So these are just two examples of sort of what we're learning about, 00:39:41.21 the continuing saga of how transcriptional machinery evolved and works in animal cells. 00:39:50.26 And I'll finish with this last model slide, 00:39:54.00 which just simply reiterates what I just said: 00:39:57.00 We have to keep in mind that, in generating large sets of combinatorial, specific gene networks, 00:40:07.07 we have to use the diversity not only of sequence-specific DNA-binding proteins, 00:40:11.29 but we more and more see examples that components of the core machinery, previously thought to be invariant, 00:40:19.14 are an integral part of diversifying the combinatorial regulation of gene expression. 
00:40:26.18 And this of course opens up many new possibilities, 00:40:30.08 and I suspect that there are many question marks yet about what exactly each of these components 00:40:36.27 is doing to drive complex regulation that gives rise to complexity like human beings, 00:40:44.12 the human brain, all the physiology that goes on. 00:40:48.03 And of course, as we understand these mechanisms in greater detail, 00:40:51.23 I think we have a much better chance of tackling the problems of human disease and diseases of other organisms. 00:40:59.10 Because ultimately, we have to understand the molecular basis of disease, 00:41:03.21 and I think a big part of that is understanding the mechanisms of gene regulation.