Session 8: Human Evolution
Transcript of Part 3: African Genomics: Natural Selection
00:00:07.17 For the last part of my lecture series, 00:00:10.11 I wanna talk about examples of natural selection in humans, 00:00:14.29 and the two particular examples 00:00:17.01 that I'm going to be talking about 00:00:19.00 are the evolution of lactose tolerance in east Africa, 00:00:22.10 and of Pygmy short stature. 00:00:25.04 So if we're going to be talking about natural selection, 00:00:27.11 we have to first of course 00:00:28.29 acknowledge Charles Darwin, 00:00:31.12 who came up with the theory of natural selection. 00:00:36.10 In fact, to quote from Darwin, he said, 00:00:39.13 "This preservation of favourable variations 00:00:42.03 and the rejection of injurious variations, 00:00:44.21 I call Natural Selection. 00:00:47.06 Variations neither useful nor injurious 00:00:49.27 would not be affected by natural selection, 00:00:52.18 and would be left a fluctuating element, 00:00:55.01 as perhaps we see in the species called polymorphic." 00:00:58.10 And that was from his classic book 00:01:00.06 On The Origin of Species, 00:01:01.24 published in 1859, 00:01:03.25 and you might recognize from our first lecture, 00:01:07.03 that this is really talking about genetic drift, 00:01:09.27 random fluctuations. 00:01:13.13 However, part of the evolutionary change that we see 00:01:18.12 is not just going to be due to random genetic drift, 00:01:21.03 it's also going to be due to natural selection. 00:01:24.18 And so, according to that theory, 00:01:27.10 natural variation exists and is heritable, 00:01:30.02 more organisms are born than can survive, 00:01:32.11 and therefore organisms best suited to the environment 00:01:35.07 survive more often, 00:01:36.25 and slight differences can accumulate in a species over time. 00:01:40.24 So this is the idea of gradual evolution of a species 00:01:43.27 by natural selection.
00:01:45.18 And this is Huxley, 00:01:47.03 who was also known as Darwin's bulldog 00:01:50.11 because he was the big proponent of his theory, 00:01:52.17 and he said, 00:01:54.05 "How extremely stupid not to have thought of that!" 00:01:57.12 So when Darwin first came up with his theory of natural selection, 00:02:01.08 there was really no concept of genetics 00:02:04.12 as we know it today. 00:02:06.02 In fact, it wasn't until the late 1800s 00:02:08.12 that Mendel proposed his theory of genetics. 00:02:13.02 So in the 1930s and 1940s 00:02:15.22 there was sort of a synthesis of natural selection 00:02:19.09 and genetics and mathematics, 00:02:22.16 population genetics, 00:02:23.25 and at that time it was proposed that genetic variation in populations 00:02:27.05 arises by chance through mutation and recombination, 00:02:31.15 that evolution consists primarily of changes in the 00:02:34.12 frequencies of alleles between one generation and another, 00:02:37.25 largely as a result of genetic drift, 00:02:40.24 gene flow, 00:02:42.05 and natural selection. 00:02:43.27 And that speciation occurs gradually when populations 00:02:46.00 are reproductively isolated, for example, 00:02:48.20 by geographic barriers. 00:02:52.14 And so if we look at this timeline, 00:02:54.12 starting with the Origin of Species, 00:02:56.21 and then Mendelian inheritance 00:02:59.04 is actually rediscovered in 1900, 00:03:01.19 it was first proposed in the late 1880s, 00:03:04.01 but very few people knew about it at that time. 00:03:06.27 And then in the early 1900s 00:03:09.09 we have the theoretical foundations of population genetics 00:03:12.21 and then, as I mentioned, 00:03:14.09 the modern synthesis in the 30s. 00:03:16.19 And then in the 70s we have Kimura's theory of neutral evolution, 00:03:21.19 which was proposing that most changes and speciation events 00:03:25.07 are simply due to random genetic drift 00:03:27.22 and to new mutation events. 
00:03:29.20 And I think that today we would say 00:03:31.14 it's a combination of all of the above. 00:03:33.19 There's certainly a lot of genetic drift that occurs, 00:03:36.02 but we know that natural selection 00:03:37.29 is having a very important influence 00:03:40.26 on the variation that we see 00:03:43.05 in terms of phenotypic variation and even disease susceptibility. 00:03:48.04 So let's look what happens 00:03:49.08 when a neutral mutation occurs in a population, 00:03:52.07 as indicated by this individual in green. 00:03:55.25 Let's look what happens as we proceed forward in generations, 00:03:59.08 and you can see there's not too many changes 00:04:01.21 in allele frequency. 00:04:03.29 But what happens when we have a beneficial mutation, 00:04:09.05 which means that it increases the fitness of the individual, 00:04:12.20 meaning that they're more likely to produce children, 00:04:16.12 and their children are more likely to produce more children, 00:04:19.00 and so on and so forth. 00:04:21.15 And so we can see that each generation, 00:04:24.04 this beneficial mutation is going to spread, 00:04:27.22 until eventually it may be nearly fixed 00:04:32.06 in the population. 00:04:34.17 So I want to tell you about some of our studies 00:04:37.20 focused in African populations 00:04:39.14 in which we're trying to identify 00:04:41.02 genetic signatures of natural selection, 00:04:43.19 and regions of the genome that are targets of natural selection. 00:04:47.29 And this is important 00:04:49.22 because it's thought that mutations associated with diseases 00:04:52.16 in modern populations, 00:04:54.13 like hypertension, 00:04:56.03 diabetes, 00:04:57.08 obesity, 00:04:58.10 and asthma, 00:04:59.08 may have been selectively advantageous or adaptive 00:05:01.23 in past hunter-gatherer environments.
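The contrast drawn above between a neutral and a beneficial mutation can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses a haploid Wright-Fisher model, which the lecture does not name, and the function name `simulate_allele`, the population size, and the selection coefficient `s` are all hypothetical choices made for illustration.

```python
import random

def simulate_allele(pop_size, s, generations, seed=0):
    """Follow one new mutation through a haploid Wright-Fisher population.

    s is the assumed selective advantage of the mutation (0 means neutral).
    Returns the list of allele frequencies, one per generation, stopping
    early if the mutation is lost or fixed.
    """
    rng = random.Random(seed)
    count = 1  # the mutation starts as a single copy
    freqs = [count / pop_size]
    for _ in range(generations):
        p = count / pop_size
        # Selection shifts the expected frequency before random sampling.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: each member of the next generation is sampled independently.
        count = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        freqs.append(count / pop_size)
        if count == 0 or count == pop_size:
            break
    return freqs

# Hypothetical parameters: 50 individuals, neutral versus 50% advantage.
neutral = simulate_allele(50, 0.0, 500, seed=1)
beneficial = simulate_allele(50, 0.5, 500, seed=1)
print(len(neutral), neutral[-1])
print(len(beneficial), beneficial[-1])
```

Running it over many seeds would show the pattern described in the lecture: under these assumptions a neutral mutation is usually lost by drift, while a beneficial one reaches fixation far more often.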
00:05:04.04 So if we can identify these regions 00:05:06.25 that are targets of selection, or actual variable sites 00:05:09.18 that are targets of selection, 00:05:11.16 those may be functionally important 00:05:13.14 and may give us a clue about disease risk. 00:05:16.11 So here I'm showing you a few of the populations 00:05:18.12 that we've studied in Africa, 00:05:20.23 and we have people who are living at very different climates, 00:05:23.15 high altitude, low altitude, 00:05:26.03 savannah, and tropical environments, for example. 00:05:30.10 We have people who have very different diets, 00:05:32.22 so agriculturalists, 00:05:34.08 hunter-gatherers, 00:05:35.23 or pastoralists. 00:05:37.14 And they have very different infectious disease exposures, 00:05:40.05 so they've likely undergone local adaptation 00:05:42.13 to different environments. 00:05:45.25 And I'm going to, as I mentioned, 00:05:47.16 tell you about two examples today. 00:05:49.15 The first one is the evolution of lactose tolerance 00:05:51.29 in east African pastoralist populations. 00:05:57.07 So, the ability to digest the sugar lactose, 00:06:00.21 which is quite common in milk, 00:06:03.07 is due to an enzyme called lactase-phlorizin hydrolase, 00:06:07.15 or known as lactase for short. 00:06:09.28 And lactase is expressed specifically 00:06:13.01 in the brush border cells of the small intestine, 00:06:16.25 and in individuals who maintain high levels of this enzyme 00:06:20.29 as adults, 00:06:22.23 they're able to break down the complex sugar lactose 00:06:26.16 into glucose and galactose, 00:06:29.14 which is rapidly taken up into the bloodstream. 00:06:32.23 However, 00:06:35.19 most mammals, and most humans, 00:06:38.19 shut down lactase activity 00:06:40.23 shortly after weaning. 00:06:42.28 So, as adults, they do not have an active form of this enzyme. 00:06:46.24 And what's going to happen is 00:06:48.19 they're not going to be able to break down that complex sugar.
00:06:51.26 It's going to go down into the lower gut, 00:06:54.13 it's going to be attacked by bacteria, 00:06:56.25 and you're going to have severe intestinal distress. 00:07:01.00 Now, it has been noted for many years by anthropologists 00:07:04.27 that there is a very strong correlation 00:07:06.27 between the lactose tolerance trait, 00:07:09.19 or you could think of it also as the lactase persistence trait, 00:07:13.26 because there's persistence of the enzyme activity as adults. 00:07:18.15 And they've seen a strong correlation 00:07:20.20 between the prevalence of that trait 00:07:23.14 with populations who traditionally practice cattle domestication 00:07:28.04 and dairying. 00:07:30.05 So for example, this trait is most common in northern Europe, 00:07:33.19 it decreases in frequency as one moves 00:07:36.23 into southern Europe 00:07:38.29 and into the Middle East. 00:07:40.24 It's very uncommon in eastern Asia 00:07:43.26 and in the Americas, 00:07:46.13 and it's uncommon in western Africa, 00:07:48.26 which is one of the reasons that we see high levels 00:07:51.17 of lactose intolerance in African Americans, for example. 00:07:55.24 But in regions of Africa where there's a high prevalence 00:07:59.04 of cattle domestication, pastoralism, and dairying, 00:08:03.14 we see a high prevalence of this trait. 00:08:07.15 So, in 2002, 00:08:10.18 there was an elegant study done 00:08:12.20 by Leena Peltonen's group in Finland, 00:08:14.28 in which they identified a genetic mutation 00:08:17.08 that regulates lactose tolerance in Europeans. 00:08:20.20 And it was located near the... 00:08:23.25 upstream of the lactase gene. 00:08:26.01 When we sequenced that region in east African pastoralists, 00:08:29.04 they didn't have it, 00:08:31.10 so we knew they must have something else. 00:08:33.13 So in order to identify those mutations, 00:08:35.21 we did something that's called a lactose tolerance test. 
00:08:38.29 So, basically what we do is 00:08:42.11 we give people the sugar lactose in a powdered form, 00:08:46.20 we add water, and it basically tastes like orange Kool-Aid, 00:08:51.09 and then we have to line people up 00:08:54.17 and have them drink the lactose at the same time. 00:08:57.27 This is a group of Maasai women from Tanzania. 00:09:03.28 This is a group of pastoralists from southern Ethiopia. 00:09:11.23 And then we can use a standard diabetes monitoring kit, 00:09:16.03 and what we can do is to measure the blood glucose, 00:09:19.29 starting at baseline before they drink the lactose, 00:09:23.29 and then every 20 minutes we're gonna measure this, 00:09:27.06 over a period of about an hour. 00:09:30.02 And then we're gonna look at the maximum rise 00:09:32.25 in blood glucose. 00:09:35.15 If individuals have a rise 00:09:37.09 that is greater than 1.7 millimolar (mM) 00:09:39.20 we consider them to be lactose tolerant, 00:09:42.20 or to have the lactase persistent trait, 00:09:45.03 shown in light blue. 00:09:47.07 And if they have a rise that is less than 1.1 mM, 00:09:51.12 they're considered to be intolerant, 00:09:53.25 shown in dark blue. 00:09:55.18 So, we measured this trait 00:09:57.12 in nearly 500 individuals 00:09:59.17 from Tanzania, Kenya, and the Sudan, 00:10:02.00 and then we looked for association 00:10:04.10 with genetic variation that we identified 00:10:06.21 by resequencing the region 00:10:08.28 where the European variant had been identified. 00:10:13.13 And in doing so we identified 00:10:15.12 three novel genetic polymorphisms 00:10:18.21 that are associated with the lactose tolerance trait in east Africa, 00:10:22.17 and those are shown here by the boxes. 00:10:26.29 The most common was this one at position 14010, 00:10:31.06 but we also saw those others 00:10:32.24 at positions 13915 and 13907, 00:10:36.03 located roughly 14,000 basepairs 00:10:38.25 upstream of the lactase gene 00:10:41.18 which is located on chromosome 2. 
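The scoring rule for the lactose tolerance test described above can be written down in a few lines. This is a minimal sketch: the thresholds of 1.7 mM and 1.1 mM come from the lecture, but the function names, the example readings, and the label "intermediate" for rises between the two thresholds are assumptions added here, since the lecture does not say how that range was handled.

```python
def max_glucose_rise(baseline_mm, readings_mm):
    """Largest rise in blood glucose (mM) over the pre-lactose baseline."""
    return max(readings_mm) - baseline_mm

def classify(rise_mm):
    """Apply the thresholds quoted in the lecture to a maximum rise in mM."""
    if rise_mm > 1.7:
        return "tolerant"      # lactase persistent
    if rise_mm < 1.1:
        return "intolerant"
    return "intermediate"      # assumed label; not defined in the lecture

# Hypothetical readings taken every 20 minutes after drinking the lactose.
rise = max_glucose_rise(5.0, [5.5, 7.0, 6.8])
print(rise, classify(rise))
```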
00:10:44.07 Now, one of the really interesting things about this is that, 00:10:48.11 one, these regulatory mutations were pretty far away, 00:10:51.28 about 14,000 basepairs from the gene, 00:10:54.26 and they were located in an intron 00:10:57.25 in a non-coding region of a neighboring gene called MCM6. 00:11:03.00 So this is demonstrating that 00:11:04.25 functionally important variation 00:11:07.13 can actually be located in non-coding regions, 00:11:10.21 and we were able to show, 00:11:13.13 using in vitro cell line studies, 00:11:16.20 that these variants that are derived, 00:11:20.19 shown in the different colors here, 00:11:23.22 that they regulate expression 00:11:26.15 of the lactase gene using the lactase promoter. 00:11:31.10 Now, they're located very close to the mutation 00:11:35.01 associated with lactose tolerance in Europeans, 00:11:38.20 located at position 13910, 00:11:41.14 but they arose independently 00:11:43.17 due to a process called convergent evolution, 00:11:46.13 and probably due to a very strong 00:11:48.27 selective force to be able to drink milk that contains lactose, 00:11:56.15 in these different regions of the world. 00:12:00.27 What's also interesting 00:12:02.19 is that the variants that we identified 00:12:04.16 have a very distinct geographic distribution. 00:12:07.09 So the one that we found that was most common in our study 00:12:10.06 was at position 14010, 00:12:12.08 and we can see that it is pretty localized 00:12:15.01 to east Africa, to Tanzania and Kenya, 00:12:17.26 and that's the most likely site of origin of that mutation. 00:12:21.11 Interestingly, we also see it a bit in south Africa, 00:12:26.01 probably reflecting migration of pastoralists 00:12:28.29 from east Africa into that region. 00:12:32.16 The variant position at 13915 00:12:35.08 appears to have originated in the Middle East, 00:12:37.21 and we could see that it was introduced into northeast Africa, 00:12:40.17 probably by migration. 
00:12:43.10 And then the variant at position 13907 00:12:46.26 likely arose in northeast Africa. 00:12:49.21 But again, one of the important take-home points is that 00:12:53.02 we have a functionally important variant 00:12:55.07 that's occurring at high frequency, sometimes as high as 40%, 00:12:59.12 and it's very geographically restricted, 00:13:02.22 and there are likely to be other mutations like that, 00:13:05.06 some of which may have implications for disease susceptibility, 00:13:09.11 again emphasizing the importance 00:13:11.23 to look amongst ethnically diverse Africans. 00:13:16.29 So the next thing we wanted to do 00:13:19.21 was to look for a signature of positive selection, 00:13:23.17 and this is the method in which we can do that. 00:13:27.21 So imagine, here in red, 00:13:30.25 imagine that this is a new mutation that has occurred, say, 00:13:34.15 one of the mutations associated with lactose tolerance. 00:13:38.00 And it's adaptive, 00:13:39.10 meaning that it increases the fitness of individuals who have it, 00:13:43.25 meaning that they're more likely to have children, 00:13:45.27 and their children are more likely to have children, 00:13:47.22 and so on. 00:13:49.28 And so it's going to increase in frequency 00:13:52.24 in the population, 00:13:54.29 and it's going to drag with it 00:13:57.15 the neighboring variants nearby. 00:14:00.03 So, you can see that when it originated, it had... 00:14:02.29 it was on a chromosome with a green variant 00:14:05.18 and a black variant. 00:14:08.03 And now these got dragged along to high frequency, 00:14:11.04 through a process known as hitchhiking. 00:14:14.07 Now, if this had gone to fixation, 00:14:17.12 meaning that everybody has it, 00:14:19.00 we would have called it a full selective sweep. 00:14:21.20 In this case, it hasn't quite reached a full selective sweep, 00:14:26.15 so we call it a partial sweep. 
00:14:29.24 Now, that could just mean that 00:14:31.17 there hasn't been time for it to go to a full sweep, 00:14:33.06 or it could be that for some reason 00:14:35.17 there may be some negative aspects of having it, 00:14:38.02 and there's a reason that both variants are maintained in the population. 00:14:43.17 Now, after the sweep occurs, 00:14:45.19 you're going to have new mutation events 00:14:47.26 and new recombination events 00:14:49.21 shuffling up the variants 00:14:52.04 that are linked to the mutation that's adaptive. 00:14:56.12 And so that will decrease the association 00:15:01.00 observed between the mutation and the flanking variation. 00:15:05.00 And in fact, 00:15:06.15 if we have an estimate of the recombination rate, 00:15:08.27 we can use computational methods 00:15:10.24 to estimate how old this mutation is. 00:15:14.18 And that's exactly what we did here. 00:15:17.12 So shown on top 00:15:19.28 is an example from the most common mutation 00:15:22.27 that we found associated with lactose tolerance, 00:15:25.03 at position 14010. 00:15:27.18 Individuals who have the C variant 00:15:29.26 are able to digest milk, 00:15:31.22 and individuals who are homozygous are shown as red. 00:15:35.28 And what we did is we genotyped markers 00:15:38.28 going a distance of about 3 million nucleotides, 00:15:42.26 and what we would do is that if someone is homozygous, 00:15:46.20 starting at the lactose tolerance mutation, 00:15:49.02 and then we go to the next mutation. 00:15:51.02 If they're homozygous, 00:15:53.00 then we continue going. 00:15:55.13 If they underwent a recombination, 00:15:57.05 we stop the line. 00:15:59.13 And what we can basically see is that homozygosity 00:16:02.28 extends about 2 million basepairs 00:16:06.01 on chromosomes that have the lactose tolerance mutation. 00:16:09.15 But if we look at chromosomes that have the ancestral mutation, 00:16:14.00 they have almost no extended haplotype homozygosity. 
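The procedure just described, starting at the lactose tolerance mutation and extending outward while each marker stays homozygous, can be sketched as follows. This is an illustration only, not the software used in the study: the data layout, the function name `homozygous_tract`, and the example genotypes are hypothetical, and a heterozygous marker is used here as the stopping point where the lecture speaks of a recombination.

```python
def homozygous_tract(genotypes, positions, core_index):
    """Length in basepairs of the homozygous tract around a core marker.

    genotypes: one (allele_1, allele_2) pair per marker for one individual.
    positions: basepair coordinate of each marker, in ascending order.
    core_index: index of the core marker (for example, position 14010).
    Returns 0 if the individual is heterozygous at the core marker.
    """
    def is_hom(i):
        return genotypes[i][0] == genotypes[i][1]

    if not is_hom(core_index):
        return 0
    left = core_index
    while left > 0 and is_hom(left - 1):
        left -= 1   # keep extending toward lower coordinates
    right = core_index
    while right < len(genotypes) - 1 and is_hom(right + 1):
        right += 1  # keep extending toward higher coordinates
    return positions[right] - positions[left]

# Hypothetical individual: homozygous at the three central markers only.
genos = [("A", "G"), ("C", "C"), ("T", "T"), ("C", "C"), ("G", "A")]
print(homozygous_tract(genos, [100, 200, 300, 400, 500], core_index=2))
```

Comparing these tract lengths between chromosomes carrying the derived variant and those carrying the ancestral variant gives the kind of contrast shown in the lecture.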
00:16:19.07 And so this is a classic signature of a selective sweep. 00:16:22.25 It means that this variant 00:16:24.16 was under very strong positive selection 00:16:28.14 and it rapidly increased in frequency in the population, 00:16:32.04 dragging with it the neighboring variation. 00:16:38.16 Now, here I'm showing the European variant, 00:16:41.17 in this case the T variant 00:16:43.20 is associated with lactose tolerance, 00:16:45.27 and it shows a very similar pattern. 00:16:50.22 So using computational approaches, 00:16:52.27 we were able to estimate the age of the African mutation 00:16:57.22 to be somewhere between about 3,000-7,000 years of age. 00:17:01.28 These are the populations 00:17:03.25 that had the oldest age estimates, 00:17:06.08 and they include individuals 00:17:08.07 who speak Cushitic languages. 00:17:10.04 They came from Ethiopia, 00:17:12.10 and they practiced agro-pastoralism. 00:17:15.01 They came into Kenya and Tanzania 00:17:17.16 within the past 5,000 years. 00:17:20.23 And then we saw it at very high prevalence 00:17:23.26 and an old age estimate in Nilo-Saharan-speaking groups, 00:17:27.11 and these would include, for example, the Maasai. 00:17:30.09 Now, they came into the region more recently, 00:17:32.17 from southern Sudan, 00:17:34.08 within the past 3,000 years, so if I were to guess, 00:17:37.05 I would think perhaps this mutation 00:17:39.04 arose in the Cushitic speaking populations. 00:17:42.03 But irregardless, it quickly, rapidly spread 00:17:45.07 to all of the populations in the area 00:17:47.21 because it was so selectively advantageous 00:17:51.22 and adaptive to have this mutation. 00:17:55.03 Now, because we see the correlation 00:17:59.18 between the practice of cattle domestication and pastoralism 00:18:04.11 and the rise in this mutation, 00:18:06.16 this is a really excellent example 00:18:08.22 of gene-culture co-evolution.
00:18:12.03 And in fact, what's really interesting is 00:18:15.01 that the date estimates that we came up with correlate really well 00:18:18.25 with the archaeological data, 00:18:20.17 which shows that cattle domestication 00:18:22.14 arose in the Middle East or north Africa 00:18:27.04 somewhere between 8,000-10,000 years ago, 00:18:29.26 and that corresponds with the age estimate for the European mutation, 00:18:33.18 which we inferred to be about 9,000 years old. 00:18:37.25 But cattle domestication was not introduced 00:18:40.25 south of the Saharan desert 00:18:44.20 until roughly 5,000 or 5,500 years ago, 00:18:48.21 correlating very well with the age estimate 00:18:52.03 for the mutation we found in eastern Africa. 00:18:54.24 And then it was introduced 00:18:56.13 much more recently into southern Africa. 00:19:00.12 But one could argue that perhaps Mendelian traits like lactose tolerance, 00:19:05.04 which are regulated by a single locus or gene of major effect, 00:19:10.23 are in a sense the low hanging fruit; 00:19:12.20 they're the easiest to identify. 00:19:15.04 So one thing that my lab is interested in doing 00:19:17.10 is looking at more complex traits, 00:19:19.23 and perhaps one of the most classic complex traits is height. 00:19:23.28 So, height is highly heritable, 00:19:26.19 genome wide association studies in tens of thousands of Europeans 00:19:30.12 have identified hundreds of loci, 00:19:33.06 each of very small effect, 00:19:35.06 and explaining only a very small proportion of the variation in height. 00:19:39.20 Now, interestingly, most of these are not part of 00:19:42.22 the growth hormone/IGF1 pathway, 00:19:45.07 which we know plays a very important role in idiopathic short stature, 00:19:49.16 for example.
00:19:53.05 Now, in Africa, we see some of the broadest distributions, 00:19:56.21 or ranges in height, 00:19:59.02 ranging from the very short statured Pygmies in central Africa, 00:20:03.28 and then we see some of the tallest individuals 00:20:07.13 in the Sudan and in eastern Africa. 00:20:10.23 And it's thought that these differences 00:20:12.14 may be partly due to adaptation 00:20:14.29 to different environments. 00:20:16.27 So what I want to tell you today is about 00:20:18.19 our genetic studies of short stature 00:20:22.01 in Pygmy populations from central Africa. 00:20:25.14 And, for you to fully understand and appreciate the work we've done, 00:20:29.17 I think I should first tell you a little bit about 00:20:32.07 how we went about collecting these samples 00:20:34.07 and how challenging it could be. 00:20:35.25 So, this is... 00:20:37.18 to get to one of the groups that we studied in Cameroon, 00:20:39.27 you have to cross this river, 00:20:41.28 and you have a person who has a ferry, 00:20:44.05 he's actually using a hand crank here 00:20:47.13 to get us across. 00:20:50.12 And I guess I'm very fortunate 00:20:52.19 because as a woman, I was able to get shade, 00:20:54.15 but not everybody was that lucky. 00:20:56.28 And here are some other hazards that we run into, 00:20:59.15 but I'm smiling because the head is cut off of this snake. 00:21:03.01 But I actually have to give credit to Dr. Alain Froment, 00:21:06.24 who has been studying the Pygmy populations in Cameroon 00:21:09.16 for greater than 30 years, 00:21:11.20 and he did the majority of the sample collection 00:21:14.01 in this case. 00:21:16.21 So, the genetic basis of short stature in Pygmies 00:21:19.26 is a question that's been of tremendous interest 00:21:22.08 to endocrinologists and human geneticists alike 00:21:25.07 for more than 50 years.
00:21:27.08 The particular populations that we studied 00:21:29.25 are located in Cameroon, three different groups from Cameroon, 00:21:34.27 whose mean male height is 152 cm. 00:21:40.11 And they live in very close connection and interaction 00:21:44.29 with neighboring populations who speak Bantu languages 00:21:47.29 and practice agriculture, 00:21:50.06 and their mean male height is 170 cm, 00:21:54.04 so that's quite a difference between the two. 00:21:58.16 So, the Pygmy short statured phenotype in humans 00:22:01.26 has arisen independently in different global populations. 00:22:05.12 Typically, these are populations 00:22:07.03 that live in tropical environments, 00:22:09.14 so there have been a number of hypotheses 00:22:11.11 about why this trait might be adaptive. 00:22:14.28 And these include thermoregulation, 00:22:19.04 limited food resources, 00:22:21.19 locomotion - that it may be easier to move 00:22:23.28 in a dense tropical environment if you're short, 00:22:26.18 and more recently there's a theory 00:22:30.10 that this is due to a life-history tradeoff, 00:22:32.10 and I'm going to focus on that theory. 00:22:35.08 And that has to do with the fact that 00:22:37.14 Pygmies have a remarkably short lifespan. 00:22:40.11 Their chance of living to age 15 00:22:42.06 is only about 40%, 00:22:44.18 and if they make it to age 15, 00:22:46.23 the expected lifespan is only around 25 years of age. 00:22:50.01 Now, that is due largely to very high infectious disease burden 00:22:54.05 and a very challenging life in dense tropical forests. 00:22:59.24 Now, what the study showed is that 00:23:02.18 Pygmies appear to be reaching reproduction... 00:23:05.25 they appear to be reproducing and reaching puberty 00:23:08.09 at a significantly earlier age 00:23:11.01 than other Africans.
00:23:13.13 And the growth trajectory in Pygmies 00:23:14.28 appears to be similar to other populations until the point of puberty, 00:23:19.21 and then they lack the adolescent growth spurt. 00:23:22.15 So this may be some sort of a tradeoff: 00:23:24.18 there's selection to reproduce earlier 00:23:26.22 because they're dying very young, 00:23:28.22 but that may be a tradeoff, 00:23:30.24 in that they're not undergoing the adolescent growth spurt. 00:23:35.20 Now, there have been only a handful 00:23:37.28 of physiologic and metabolic studies in Pygmies, 00:23:42.00 but nearly all of these are pointing towards 00:23:44.18 disruptions of the growth hormone/IGF1 pathway, 00:23:47.19 so this is in contrast to what we're seeing in European populations. 00:23:52.05 However, there's been quite a bit of dispute of 00:23:54.28 where along this pathway these disruptions are occurring. 00:24:00.04 So, in order to try to address these questions, 00:24:03.06 we genotyped one million single nucleotide polymorphisms 00:24:08.06 in 67 pygmy individuals 00:24:10.23 and 58 of the neighboring Bantu individuals. 00:24:14.14 And here we can see a plot, 00:24:17.10 similar to what I've shown you before, 00:24:19.09 based on structure analysis. 00:24:21.09 And to remind you, 00:24:23.02 this is composed of a series of lines, 00:24:24.23 and each line represents a person, 00:24:26.16 and they can have ancestry 00:24:28.08 from different ancestral populations, 00:24:31.01 represented by the different colors. 00:24:33.04 So here in orange 00:24:34.29 are individuals who speak the Bantu language 00:24:38.00 and practice agriculture, 00:24:40.08 and in dark green are individuals who self-identify as Pygmies. 00:24:44.19 And what you can see is that there's been 00:24:46.22 a lot of admixture between the Pygmies 00:24:49.29 and the neighboring Bantu people. 
00:24:52.03 Now, interestingly, this tends to be unidirectional, 00:24:54.28 and it tends to be gene flow between males 00:24:57.27 from the Bantu population 00:25:00.03 with females of the Pygmy population. 00:25:02.22 This is largely due to socioeconomic factors. 00:25:06.23 Now, when we look at a correlation 00:25:08.26 between ancestry and height, 00:25:11.03 we observed a very strong and significant positive correlation. 00:25:15.04 So, we can see that Pygmies who have more of the Bantu ancestry 00:25:19.23 tend to be taller. 00:25:21.19 And, so this is showing 00:25:22.25 that there's a strong genetic component to this trait. 00:25:26.17 We've also worked with collaborators 00:25:28.11 to develop methods 00:25:30.19 to infer tracts of Pygmy and Bantu ancestry 00:25:35.11 across the chromosome. 00:25:36.29 So here, these are the different chromosomes, 00:25:38.18 starting with chromosome 1 00:25:40.05 and going up to chromosome 22, 00:25:42.25 and here I'm showing you an example from chromosome 3. 00:25:46.04 And in blue is showing tracts of the genome 00:25:49.03 that are Pygmy ancestry, 00:25:50.24 and in red are tracts of the genome that are Bantu ancestry, 00:25:54.23 and what we tend to see are very, very short tracts of Bantu ancestry. 00:25:58.28 And that's reflected in the fact that admixture 00:26:01.08 has been occurring over thousands of years. 00:26:06.11 Now, the next question that we wanted to address 00:26:08.17 is how do the genomes of the Pygmy hunter-gatherers 00:26:12.04 differ from the genomes of the Bantu agriculturalists 00:26:17.00 and from other groups, such as the Maasai pastoralists 00:26:20.28 from east Africa. 00:26:22.28 And to do that, 00:26:25.03 we use a number of scans of natural selection 00:26:27.28 across the genome. 
00:26:29.29 Without getting into detail about the methods, 00:26:32.26 I'll just point out that you can see by the different colors here 00:26:37.00 across the different chromosomes, 00:26:39.00 here's chromosome 22 and going down to chromosome 1, 00:26:42.04 that we found a number of regions of the genome 00:26:44.22 that are targets of selection. 00:26:47.05 But there was one region in particular, 00:26:49.26 on chromosome 3, 00:26:52.04 where we saw a cluster of targets of natural selection. 00:26:57.01 And this was over about a 15 million basepair region. 00:27:01.14 Now, given our small sample size, 00:27:03.20 we have very little power 00:27:05.15 to detect a genome-wide association. 00:27:09.04 And so what we did is, 00:27:10.26 under the hypothesis that this is an adaptive trait, 00:27:13.17 we just focused on the regions of the genome 00:27:16.07 that are targets of selection, shown here, 00:27:19.11 and then we looked for an association with height. 00:27:22.10 And one of the strongest, most significant associations 00:27:25.09 was exactly in that same 15 million basepair region 00:27:29.19 of chromosome 3. 00:27:31.23 And indeed, it encompassed several genes, 00:27:34.15 one of which is DOCK3, 00:27:36.18 which has been shown to be associated with height 00:27:39.09 in non-African populations, 00:27:41.08 so we replicated that finding. 00:27:43.20 But nearby was another gene called CISH, 00:27:47.09 which is a member of the cytokine signaling family, 00:27:50.10 plays a very important role in regulating 00:27:52.28 IL-2 cytokine signaling pathway, 00:27:56.18 and studies have shown that it's associated 00:27:58.26 with resistance to a number of infectious diseases 00:28:01.18 in Africa. 00:28:04.01 Now, interestingly, 00:28:05.29 CISH also directly inhibits 00:28:07.14 human growth hormone receptor action 00:28:10.06 by blocking the STAT5 phosphorylation pathway. 
00:28:13.15 And so we know that studies in mice 00:28:15.17 show that when this gene is overexpressed, 00:28:18.06 the mice are short statured. 00:28:20.23 Now, this led me to the hypothesis that, 00:28:24.14 could it be that there could actually be selection 00:28:26.19 for immune function 00:28:28.11 that is indirectly resulting 00:28:30.05 in short stature in Pygmies, 00:28:32.05 because that gene plays an important role in both. 00:28:35.29 And we need to do further functional studies, 00:28:38.20 and look at differences in gene expression 00:28:40.13 to test this hypothesis. 00:28:44.04 The last study I wanna tell you about is a study 00:28:46.20 in which we sequenced the entire genomes, 00:28:49.15 at high coverage, 00:28:51.07 of 15 African hunter-gatherers, 00:28:53.22 including 5 Pygmies, 00:28:55.28 5 Hadza, 00:28:57.10 and 5 Sandawe. 00:28:59.26 We identified over 13 million variants, 00:29:02.29 3 million of which are completely novel; 00:29:05.29 they have never previously been identified. 00:29:08.13 And that's just from 15 individuals, 00:29:10.14 so you can imagine how much variation is out there. 00:29:13.16 Many of these are novel variants... 00:29:15.27 many of these novel variants are in known regulatory sites. 00:29:21.04 So now, combining the two studies, 00:29:24.08 we wanted to ask the question, 00:29:26.03 which pathways are enriched for genes near targets of selection? 
00:29:29.16 And these enriched pathways 00:29:31.25 include genes involved in neuro-endocrine signaling, 00:29:35.01 reproduction, 00:29:36.06 metabolism, 00:29:37.11 and immune function, 00:29:38.22 and interestingly, based on the whole genome sequencing study, 00:29:42.08 we saw an enrichment for genes 00:29:44.06 that play a role in pituitary function in Pygmies, 00:29:47.13 including follicle-stimulating hormone receptor, 00:29:50.13 growth hormone receptor, 00:29:52.11 HESX1, which I'll tell you more about in a moment, 00:29:55.11 and thyrotropin-releasing hormone receptor. 00:29:58.15 In fact, TRHR was one of the biggest hits 00:30:02.13 that we saw in terms of these studies of selection. 00:30:05.17 And what's interesting is that this gene 00:30:08.22 plays an important role in the hypothalamic-pituitary-thyroid axis, 00:30:12.28 influencing a number of traits that could potentially 00:30:15.14 be of adaptive significance in Pygmies. 00:30:18.26 And also of interest was that anthropologists 00:30:21.18 have noted that there is a significant difference 00:30:24.23 in the prevalence of goiter 00:30:27.00 among Pygmies and neighboring Bantu groups. 00:30:29.24 So the Pygmies have a much lower frequency of goiter 00:30:33.16 compared to the neighboring Bantu populations, 00:30:36.16 and this could reflect a biological adaptation in Pygmies 00:30:41.20 to a low iodine environment. 00:30:43.24 It's very deleterious to get goiter 00:30:46.22 because it can also lead to a disease called cretinism, 00:30:49.27 which of course is going to be very deleterious. 00:30:52.18 So again, here's an example 00:30:54.10 where something like adaptation to diet 00:30:56.13 could indirectly influence growth 00:30:58.28 or other phenotypes in the Pygmy population.
00:31:04.01 The last thing we wanted to do 00:31:06.01 was to look for regions of the genome, 00:31:08.08 using the whole genome sequencing data, 00:31:10.13 that are specific to Pygmies, 00:31:12.20 and those are shown in green here. 00:31:16.02 Now, we identified 25 clusters in the genome, 00:31:19.23 and the largest cluster 00:31:22.27 was right in that same region of chromosome 3 00:31:25.14 that we had previously identified. 00:31:28.00 But we had missed it in the prior study, 00:31:30.11 and the reason why is because 00:31:32.17 it contains these Pygmy-specific variants, 00:31:35.08 that were not captured by the SNP array that we used, 00:31:39.17 and thus demonstrating the great importance 00:31:42.00 of doing resequencing for identifying novel 00:31:44.24 and potentially functionally important variation 00:31:47.15 in ethnically diverse populations. 00:31:50.28 Now, this cluster consisted of 00:31:55.10 44 SNPs in 100% association with each other 00:31:59.16 over 170,000 nucleotides, 00:32:03.06 shown here, 00:32:05.24 and it contained a very interesting candidate gene called HESX1. 00:32:10.10 HESX1 codes for a transcription factor 00:32:13.05 that plays a very important role 00:32:15.04 in regulating the development 00:32:17.15 of the anterior pituitary in the brain, 00:32:20.14 and that's the site of production of growth hormone, 00:32:22.23 as well as other reproductive hormones. 00:32:25.11 Now, interestingly, 00:32:27.06 we identified a non-synonymous, 00:32:29.28 so an amino acid change, basically, 00:32:33.23 in this gene 00:32:36.03 that had been previously associated 00:32:38.13 with idiopathic short stature in humans. 00:32:41.26 But it turns out that this variant 00:32:44.01 is present at about a 20% frequency in other Africans. 00:32:47.12 So what we hypothesize is that 00:32:49.13 there's something about this region 00:32:51.22 that may be altering gene expression of HESX1 00:32:55.07 or other genes in that region. 
00:32:58.01 Upstream, we found another cluster 00:33:01.18 near this gene POU1F1, also known as Pit-1 in mouse, 00:33:07.13 and again this codes for a transcription factor 00:33:09.18 that plays a critical role in regulating growth hormone expression. 00:33:14.23 So another excellent candidate gene. 00:33:17.28 Now, what is interesting is that 00:33:19.27 both of these clusters, or genes, 00:33:23.18 are amongst the most differentiated regions 00:33:26.27 of the Pygmy genomes, 00:33:28.27 compared to genomes from elsewhere in Africa. 00:33:31.29 So we then picked out some of the SNPs in these regions 00:33:37.13 and genotyped them in a larger set 00:33:39.19 of western and eastern Pygmies, 00:33:41.26 and we showed that they are statistically 00:33:44.02 associated with short stature in Pygmies. 00:33:47.29 So the next step is going to be 00:33:49.24 to try to make transgenic models 00:33:52.01 that express these variants using transgenic mouse models, 00:33:56.06 and see what the phenotype looks like. 00:34:00.19 So that leads us to a number of hypotheses. 00:34:03.19 One, is that alterations in the growth hormone/IGF1 pathway 00:34:07.15 play a role in the short stature trait in Pygmies. 00:34:13.01 Two, is that anterior pituitary hormones 00:34:15.10 may play a central role in the Pygmy phenotype, 00:34:18.09 influencing growth, reproduction, 00:34:20.15 metabolism, and immunity. 00:34:24.00 And thirdly, that short stature 00:34:26.16 could be a byproduct of selection 00:34:28.11 acting on pleiotropic loci. 00:34:31.04 So if we look here, 00:34:32.21 one of the candidate loci that we identified is HESX1. 00:34:36.13 That's going to influence expression and development 00:34:39.20 of the anterior pituitary, 00:34:42.02 site of production of growth hormone. 00:34:44.20 Growth hormone expression is also regulated 00:34:46.23 by this other gene we found, POU1F1. 00:34:50.04 And this CISH regulates growth hormone receptor. 
00:34:54.17 Now, if we look at the downstream effects 00:34:56.24 of growth hormone, 00:34:59.07 growth hormone, when it binds to growth hormone receptor, 00:35:02.18 will trigger off expression of IGF1, 00:35:06.12 predominantly from the liver, but from other tissues as well. 00:35:10.06 IGF1 will have an effect on muscle growth 00:35:13.14 and also on bone growth and height, 00:35:16.02 but the other impact, or the other role of growth hormone 00:35:20.12 is that it also influences insulin metabolism, 00:35:24.06 it influences fat metabolism. 00:35:28.01 And then we know that infectious disease 00:35:30.01 alters immune response and cytokine levels, 00:35:33.08 and that these can influence gene expression from CISH, 00:35:36.11 or other genes that are in this pathway. 00:35:40.09 So, when we go back to Africa to study the Pygmies, 00:35:42.28 what we would ultimately like to do next 00:35:45.16 is to measure all of the phenotypes, 00:35:48.01 because if you want to understand something 00:35:50.04 like the evolution of short stature in Pygmies, 00:35:52.19 I think you can't just be looking at stature 00:35:55.09 because the growth hormone pathway 00:35:58.25 plays a role in all of these different traits, 00:36:01.01 so we need to be looking at this as an integrative picture. 00:36:06.01 And in fact, our approach in the future 00:36:08.26 is to use an integrative genomics approach 00:36:11.24 combining whole genome data, 00:36:14.15 data on protein variation from blood, 00:36:17.25 epigenetic variation, 00:36:19.21 which can be influenced by diet and environment, 00:36:22.12 gene expression, 00:36:24.10 we're starting to look at the microbiome, 00:36:27.16 which is the spectrum of bacteria in the gut, 00:36:32.05 because that can not only be influenced by diet, 00:36:35.16 it can also have an influence on the metabolome, 00:36:38.12 or the set of all the metabolites, for example, 00:36:40.27 in blood. 
00:36:42.20 And we want to combine that information 00:36:44.25 together with information on diet 00:36:46.22 and other environmental factors, 00:36:48.29 to try to identify genetic and environmental factors 00:36:52.15 that play a role in short stature 00:36:55.05 and in other anthropometric, 00:36:56.25 cardiovascular, 00:36:58.01 and metabolic traits. 00:37:00.20 One of the other approaches we can take 00:37:02.20 to distinguish the role of genetics and environment is, for example, 00:37:06.00 to look at individuals of the same or similar ethnic background, 00:37:10.29 but living in an urban versus a rural environment. 00:37:16.21 We can also take a different... 00:37:18.14 the opposite approach. 00:37:20.00 We can look at individuals who have 00:37:22.06 very different genetic ancestries, 00:37:25.03 but live in similar environments. 00:37:27.13 So for example, 00:37:29.20 this is a girl who is from the Fulani population, 00:37:33.17 and here's a neighboring... 00:37:35.19 an individual from the Tupuri population. 00:37:38.26 So they are genetically very differentiated, 00:37:41.20 but live in a similar environment, 00:37:43.16 yet the Fulani seem to have some innate resistance 00:37:47.06 to malaria infection. 00:37:50.03 By contrast, the San, 00:37:53.09 from southern Africa, 00:37:54.29 are very differentiated from the Bantu, 00:37:57.15 but the San seem to have an innate susceptibility 00:38:01.09 to TB infection. 00:38:03.20 So again, by contrasting populations with different ancestry, 00:38:07.26 and living in different environments, 00:38:09.11 we may identify clues about the genetic basis 00:38:12.10 of differences in phenotypic variation 00:38:14.26 and disease susceptibility. 00:38:17.23 So in conclusion, 00:38:20.20 Africans have the highest levels of genetic diversity 00:38:23.04 within and among populations. 
00:38:26.28 The demographic history of Africans 00:38:29.00 and local adaptation to different environments 00:38:31.04 has resulted in population- 00:38:33.01 or region-specific genetic variation. 00:38:36.25 And we need to be including 00:38:38.21 ethnically diverse Africans in genomic studies 00:38:41.17 to better identify both unique, rare, and common variants 00:38:45.28 which may be of functional importance, 00:38:47.28 including those that play a role in disease risk 00:38:50.13 in these populations. 00:38:52.14 And I will just end by thanking 00:38:54.04 the many individuals 00:38:55.25 who contributed to these studies, 00:38:57.29 and my funding agencies, 00:39:00.16 and particular thanks to the Africans 00:39:02.20 who have contributed to these studies.