Session 2: Theory Behind Evolution II
Transcript of Part 2: African Genomics: Human Evolution
00:00:07.10 Hi, I'm Sarah Tishkoff. 00:00:08.23 I'm a professor at the University of Pennsylvania 00:00:11.02 in the Departments of Biology and Genetics, 00:00:13.24 and today I'm gonna tell you about my research 00:00:15.18 on African integrative genomics, 00:00:17.29 and implications for human origins and disease. 00:00:21.17 So in Part 1, I'm gonna tell you a bit about 00:00:23.24 human evolutionary history, 00:00:25.24 and what the implications are of that 00:00:27.20 on the patterns of genomic variation 00:00:29.18 that we see in populations today. 00:00:34.05 So I want to start by talking about some of the 00:00:35.26 key challenges in human genomics research. 00:00:38.19 And the first one is to characterize 00:00:40.27 the immense array of genomic and phenotypic diversity 00:00:44.29 across ethnically diverse human populations. 00:00:48.14 Secondly, to understand what the evolutionary processes are 00:00:51.16 that are generating and maintaining that variation. 00:00:54.14 And third, to better understand how 00:00:56.04 gene-gene, gene-protein, and gene-environment interactions 00:00:58.28 contribute to phenotypic variability. 00:01:01.27 So first let's start with the evolutionary history 00:01:05.00 of the hominin lineage 00:01:06.26 that's leading to modern humans, 00:01:10.13 which begins around the time that we 00:01:12.03 diverged from our closest genetic relative 00:01:14.04 the Chimpanzee, 00:01:15.18 sometime between 5-7 million years ago. 00:01:18.14 So shown here are some of the fossils 00:01:20.07 from the different species 00:01:22.17 preceding anatomically modern humans. 00:01:25.16 In blue are shown fossils from the oldest lineages, 00:01:30.06 and in fact one of the oldest is Sahelanthropus, 00:01:34.10 which has been dated to at least 7 million years ago, 00:01:37.29 and there's some debate about whether it even 00:01:39.14 belongs on the hominid lineage 00:01:41.09 or if it actually preceded the Chimpanzee and human divergence. 
00:01:45.26 After that, in green, 00:01:47.14 we see the Australopithecus genus. 00:01:50.14 In yellow, we see Paranthropus genus. 00:01:54.09 In orange, we have the genus Homo 00:01:56.24 and the species preceding anatomically modern humans 00:02:01.13 is Homo erectus, dated to about 2 million years ago. 00:02:06.14 And then we have the origins of 00:02:08.15 Homo neanderthalensis 00:02:11.02 and of anatomically modern humans. 00:02:13.24 Neanderthals are thought to have originated 00:02:16.00 somewhere between 300,000-400,000 years ago, 00:02:19.12 and modern humans originated 00:02:20.27 approximately 200,000 years ago. 00:02:24.03 Here's one of the best examples 00:02:26.11 of Australopithecus afarensis. 00:02:29.07 This was a set of fossils that was 00:02:31.24 discovered in the 1970s by Johanson and Gray, 00:02:36.02 named Lucy, 00:02:38.00 and Lucy was about... 00:02:41.04 she lived about 3.2 million years ago. 00:02:43.29 She was very small, only about 3 feet tall, 00:02:46.13 she had a very small brain, 00:02:48.07 and she was bipedal. 00:02:49.27 And being bipedal, in fact, 00:02:51.07 is one of the characteristics of the hominin lineage. 00:02:57.12 And, interestingly, 00:02:59.17 there have been some fossilized footprints 00:03:01.21 identified in Tanzania, 00:03:03.24 and we can see from these that there 00:03:06.08 appears to have been a mother, 00:03:08.27 from the species Australopithecus afarensis, 00:03:12.08 and she was holding the hands of her child. 00:03:14.29 And they must have been walking 00:03:16.15 in ash from recent volcanic activity, 00:03:20.06 and then that ash hardened and preserved these footprints 00:03:23.06 so that we can see them today, 00:03:24.21 and we can clearly see that they were bipedal. 00:03:29.08 So the species preceding modern humans 00:03:31.28 is called Homo erectus. 
00:03:33.24 Homo erectus evolved around 2 million years ago, 00:03:39.02 and then after the origin of Homo erectus in Africa, 00:03:42.24 Homo erectus spread across Eurasia 00:03:47.17 and, indeed, shown here are some of the 00:03:49.21 oldest fossils of Homo erectus, 00:03:52.18 dated to as early as 1.9 million years ago (MYA) in Indonesia. 00:04:00.15 And this species was very successful, 00:04:03.14 lasting to as recently as 25,000 years ago 00:04:06.17 in Southeast Asia. 00:04:09.08 A very interesting recent finding was 00:04:11.20 a set of fossils identified on the island of Flores, 00:04:14.26 which is within Indonesia, 00:04:17.25 and these fossils actually show some characteristics 00:04:21.22 that look very similar to Homo erectus, 00:04:24.19 and for that reason it was proposed that 00:04:27.09 this species may have directly evolved 00:04:30.23 from a Homo erectus ancestor 00:04:33.20 that arrived on that island 00:04:36.07 about 1 million years ago 00:04:37.28 and then evolved in isolation. 00:04:39.25 And two of the very unique features of this species 00:04:42.17 is that they were very short, so again, 00:04:46.01 about the same size as Lucy, around 3 feet tall, 00:04:50.15 and secondly, that they had tiny brains. 00:04:53.14 And there's been a lot of debate about 00:04:55.01 whether this is an adaptation or in fact a pathology, 00:04:58.09 and there's still a lot of research being done, 00:05:01.03 but what was clear is that there were multiple species 00:05:04.01 outside of Africa 00:05:05.29 within the past 2 million years. 00:05:08.20 So now let's move on to the origins of 00:05:10.15 Homo neanderthalensis and Homo sapiens. 00:05:13.12 There's some question about the species preceding 00:05:16.28 Neanderthal and Homo sapiens. 00:05:19.17 Some say that it was heidelbergensis, 00:05:22.04 but there's debate about that. 
00:05:24.15 However, what is clear is that the Neanderthal species 00:05:28.10 arose somewhere within the past 300,000-400,000 years, 00:05:32.15 and Homo sapiens arose within the past 200,000 years. 00:05:38.04 And this is a fossil from Neanderthals, 00:05:40.29 we can see a few features such as 00:05:44.02 the double arched and very wide brow ridges, 00:05:47.08 a broad nose, 00:05:48.28 a very large brain size, 00:05:50.27 and a retromolar space, 00:05:52.21 and in fact these species were very robust. 00:05:55.16 The males would have been over 6 feet tall, 00:05:57.15 they had very big bones, 00:05:59.19 and they had rather big brains. 00:06:02.20 In fact, here are some reconstructions of Neanderthal. 00:06:06.28 We have the old reconstruction 00:06:09.03 and then the more recent one as well. 00:06:12.11 So, anatomically modern humans, Homo sapiens sapiens, 00:06:16.06 arose approximately 200,000 years ago. 00:06:19.02 In fact, here these red dots 00:06:21.09 are representing locations where fossils have been found 00:06:24.11 of anatomically modern humans, 00:06:26.27 and the oldest fossil is 00:06:28.22 dated to around 150,000-195,000 years ago, 00:06:32.19 in Southern Ethiopia. 00:06:36.23 We also see evidence of early modern human behavior 00:06:40.10 dated to 70,000 years ago, 00:06:42.11 or even as old as 120,000 years ago, 00:06:45.16 in caves in South Africa 00:06:47.13 and also some from East Africa as well. 00:06:51.05 So after modern humans arose in Africa within the past 200,000 years, 00:06:55.08 one or a few small groups of individuals 00:06:57.25 migrated across the rest of the globe 00:07:00.11 within the past 50,000-100,000 years. 00:07:03.23 Indeed, we think that Europeans... 00:07:07.15 there were no people in Europe, actually, 00:07:09.06 until about 40,000 years ago, 00:07:11.13 and then modern humans crossed the Bering Straits 00:07:14.15 and went into the Americas 00:07:16.28 within the past 30,000 years. 
00:07:19.05 The earliest migration event was actually into Australo-Melanesia, 00:07:23.11 dated to about 40,000-60,000 years ago. 00:07:26.14 And then we have much more recent migration events, 00:07:29.03 such as into the Pacific Islands, 00:07:31.12 within the past few thousand years. 00:07:34.11 Now, interestingly, 00:07:36.16 when modern humans migrated out of Africa 00:07:39.08 within the past 50,000-100,000 years, 00:07:42.05 they would have run into Neanderthals, 00:07:44.10 in fact they overlapped in their distribution. 00:07:47.08 So shown here is the distribution of Neanderthals, 00:07:50.22 and the modern humans who lived at that time 00:07:52.25 were referred to as Cro-Magnon, 00:07:55.17 and in fact we did not see anatomically modern humans 00:07:59.09 in this region, in Europe, until about 40,000 years ago. 00:08:03.03 They would have been in the Middle East a little bit earlier, 00:08:05.23 but it appears they overlapped 00:08:08.18 for about at least 10,000 years with Neanderthals. 00:08:12.13 And as we'll discuss later, 00:08:13.27 there is some evidence that there could have been actual admixture 00:08:17.05 between Neanderthal and anatomically modern humans 00:08:20.18 during that time. 00:08:22.26 So now I want to discuss the evolutionary forces 00:08:25.27 that influence the patterns of genetic variation 00:08:28.08 that we see today. 00:08:30.04 And these include mutation, 00:08:32.14 genetic drift, 00:08:33.29 migration, 00:08:35.09 and natural selection. 00:08:37.16 So let's first introduce some terminology. 00:08:40.05 The gene pool refers to the set of all genomes 00:08:42.25 in a specified population, 00:08:44.10 and here we have an example from a population of warthogs. 
00:08:47.22 So where we have at a single genetic locus 00:08:51.03 two alleles, big B or little b, 00:08:54.17 and here's an example of an individual 00:08:56.11 who is homozygous for the big B allele, 00:08:59.07 and an individual homozygous for the little b allele, 00:09:02.12 and here's an individual who is heterozygous 00:09:05.08 for big B and little b. 00:09:07.12 And together, the set of alleles in that population 00:09:10.19 represents the gene pool. 00:09:13.28 So when we are doing population genetics analyses, 00:09:16.25 we can't actually go out and look at every genotype 00:09:21.00 for every individual in the population, 00:09:23.14 that would be unfeasible. 00:09:25.13 So what we typically do is to 00:09:26.23 infer frequencies by estimating them 00:09:30.10 from a random sample. 00:09:32.25 So in population genetics, 00:09:35.01 each generation, each new individual 00:09:37.16 is viewed as drawing from a set of gametes 00:09:39.20 with alternative alleles, 00:09:41.08 so let's use an example here 00:09:43.01 in which we have a set of marbles in a bowl. 00:09:46.05 And initially, we have a distribution of 00:09:51.26 60 of the white marbles 00:09:54.13 relative to 40 of the green marbles, 00:09:56.27 and these, the white and the green, 00:09:58.08 are representing different alleles. 00:10:00.14 So let's say that we're gonna pick... 00:10:02.04 we're gonna reach into this bag 00:10:04.04 and we're gonna randomly draw out 00:10:06.09 another hundred of these marbles. 00:10:09.01 And now in the next generation 00:10:10.26 we have 80 of the white and we have 20 of the green. 00:10:15.02 We're gonna reach back in, 00:10:16.01 we're gonna grab another set of a hundred, 00:10:18.09 and now in the next generation 00:10:20.15 we have 100 of the white alleles and 0 of the green. 00:10:26.08 And this is a demonstration of 00:10:27.15 how we get changes in allele frequency over time. 
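The marble-drawing picture above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each of the 100 marbles in the next generation is drawn independently (with replacement) according to the current white fraction, and the function name `next_generation` and the fixed seed are hypothetical choices, not something from the talk. Individual runs will generally not reproduce the 60, 80, 100 sequence used on the slide, which is chosen to make the point quickly.

```python
import random

def next_generation(white_fraction, n_marbles, rng):
    """Draw n_marbles independently; return the new fraction of white marbles."""
    whites = sum(1 for _ in range(n_marbles) if rng.random() < white_fraction)
    return whites / n_marbles

rng = random.Random(1)   # arbitrary fixed seed so a run can be repeated
freq = 0.60              # 60 white and 40 green marbles to start
history = [freq]
for _ in range(10):      # ten rounds of "reach in and draw a hundred"
    freq = next_generation(freq, 100, rng)
    history.append(freq)

print(history)           # white-allele frequency, generation by generation
```

Changing the seed gives a different trajectory each time, which is the point of the demonstration: the frequency wanders by chance alone.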
00:10:31.25 Allele frequencies will also change over time 00:10:34.23 due to genetic drift, 00:10:36.21 which is defined as random fluctuations 00:10:39.01 of allele frequencies from generation to generation, 00:10:42.03 simply due to chance. 00:10:44.19 So as we see, sometimes things could happen, 00:10:47.16 like these bugs are getting squashed, 00:10:50.00 and that's gonna change, perhaps, 00:10:52.07 the allele frequency in the next generation. 00:10:55.19 Here's another example from some ladybugs, 00:10:58.23 and we can see that, perhaps, 00:11:01.03 in the next generation, just by chance, 00:11:03.10 we're gonna see more of these ladybugs 00:11:04.29 with the dark colors, 00:11:06.12 or we might see more that are with the medium colors and dots. 00:11:10.16 And the fact is that drift is just an inevitable fact of life. 00:11:16.15 I also want to define what we mean by neutral evolution. 00:11:20.08 So we define a selectively neutral allele 00:11:22.10 as one that does not affect reproductive fitness of individuals 00:11:25.20 who carry that allele, 00:11:27.20 so its frequency in the population 00:11:29.25 changes by chance or genetic drift alone. 00:11:32.18 And here we have an example: 00:11:35.04 this is just a substitution 00:11:37.22 in the third position of the codon, 00:11:41.02 and when we have substitutions 00:11:44.09 of nucleotides in the third position, 00:11:46.20 very typically they result in a silent or synonymous change. 00:11:51.05 So here there's been a substitution, 00:11:53.00 but there's no change in the amino acid; 00:11:55.02 it remains as valine. 00:11:57.26 So the rate at which genetic drift occurs 00:12:00.01 is going to be inversely proportional to the population size, N, 00:12:03.23 and it's going to be very fast in small populations. 00:12:06.27 And here's an example that we can look at 00:12:08.23 based on computer simulation. 
00:12:11.20 So let's assume here that we're looking at a single locus 00:12:15.15 and it has two alleles 00:12:18.06 that are at 50% frequency each, 00:12:21.25 as we can see here. 00:12:23.22 We have a sample size of 25, 00:12:27.06 and we're going to do the simulation 00:12:29.03 over 80 generations. 00:12:31.14 Now, each of these lines here 00:12:34.03 represents a different simulation, 00:12:36.27 and what we can see is that 00:12:38.23 over time alleles are either going to 00:12:44.02 be lost from the population 00:12:46.08 or they're going to reach fixation, 00:12:48.17 which means that they go to 100% frequency. 00:12:52.10 And the rate at which this occurs 00:12:54.00 is going to depend on the sample size. 00:12:56.09 So in a small sample it's gonna be very rapid, 00:12:59.19 but in this example where we have a larger sample, now N=300, 00:13:03.26 you can see that it just takes more time. 00:13:05.23 There's not as much genetic drift occurring. 00:13:08.19 Now, the end result is gonna be the same, 00:13:10.15 it just takes more time. 00:13:14.09 The change in allele frequency also is going to depend 00:13:17.27 on the initial allele frequencies. 00:13:19.20 So in this particular case, 00:13:21.05 we've now changed the starting frequency: 00:13:23.20 it's not 50%, it's now 10%. 00:13:27.06 And you can see that there's much more 00:13:29.28 probability of loss of the allele in this case, 00:13:34.11 and here we have just one of the alleles reaching fixation. 00:13:42.08 So again, in this particular case, 00:13:44.05 about 1 out of 10 will eventually become fixed, 00:13:47.14 or reach 100% frequency. 00:13:51.09 Now here's an example from a large population. 00:13:54.01 It'll take longer for this to occur, 00:13:56.02 but the proportion of alleles are gonna be 00:13:58.12 roughly the same, 00:13:59.29 so again roughly 1 out of 10 will go to fixation, 00:14:03.06 it's just gonna take longer. 
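A simulation along the lines described above can be sketched as follows. This is a minimal Wright-Fisher style resampling, not the program used for the slides: the names `simulate_drift` and `fixation_fraction` are hypothetical, and treating the N=25 example as 50 gene copies (25 diploid individuals) is an assumption about how the slide counts its sample.

```python
import random

def simulate_drift(n_copies, start_freq, generations, rng):
    """Resample one two-allele locus each generation.

    Returns the allele frequency after `generations` rounds, stopping
    early once the allele is lost (0.0) or fixed (1.0).
    """
    count = round(n_copies * start_freq)
    for _ in range(generations):
        if count == 0 or count == n_copies:
            break
        p = count / n_copies
        # Every gene copy in the next generation is drawn at random
        # from the current pool.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
    return count / n_copies

def fixation_fraction(n_copies, start_freq, generations, replicates, seed=0):
    """Fraction of independent replicate runs in which the allele fixes."""
    rng = random.Random(seed)
    fixed = sum(
        1 for _ in range(replicates)
        if simulate_drift(n_copies, start_freq, generations, rng) == 1.0
    )
    return fixed / replicates

# Starting frequency of 10%: roughly 1 run in 10 should end in fixation.
frac = fixation_fraction(n_copies=50, start_freq=0.1,
                         generations=1000, replicates=1000)
print(frac)
```

Raising `n_copies` (for example to 600, matching N=300 under the same assumption) leaves the fraction near 0.1 but makes each run take many more generations to resolve, which mirrors the "same end result, it just takes more time" point in the talk.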
00:14:05.16 Other important terms in population genetics 00:14:07.26 are bottleneck and founder effects, 00:14:10.08 and this is because genetic drift 00:14:11.23 has a large effect on allele frequencies 00:14:14.10 when a population originates 00:14:16.05 via a small number of people from a larger population. 00:14:19.16 So here we have an example of a bottleneck, 00:14:22.10 and what a bottleneck means is that 00:14:24.01 there's been a decrease in population size 00:14:26.21 at some time in the past. 00:14:28.14 So you can think of it as a population crash. 00:14:31.10 And what happens when the population is very small, 00:14:34.28 you're going to have a higher rate of genetic drift, 00:14:37.12 and we can see here that these alleles, 00:14:39.20 which are represented by the different colors, 00:14:42.00 have shifted from what we're seeing 00:14:44.18 back at this earlier time. 00:14:46.25 Now we go through the bottleneck, 00:14:48.19 and now we're seeing predominantly 00:14:50.07 these white and black alleles. 00:14:53.09 Another example we can look at is a founder event, 00:14:57.20 which is sort of a special case of a bottleneck event. 00:15:00.11 And in this case it's where a population, a small population, 00:15:05.03 breaks off from the larger population, 00:15:07.25 and again there's going to be increased genetic drift 00:15:10.26 in this initially small population 00:15:13.12 and here, by chance, 00:15:15.05 we just happened to see more of these dark blue 00:15:18.12 and light blue alleles. 00:15:21.09 The pattern of variation that we see 00:15:22.23 in the human genome 00:15:24.09 is also dependent on the effective population size, 00:15:27.17 which we distinguish as capital N sub e. 00:15:32.10 And the definition of the effective population size 00:15:35.10 is the number of breeding individuals in a population. 
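One way to see how a temporary crash drags down the effective population size is a standard approximation that the talk does not state explicitly: when population size fluctuates across generations, the long-term Ne is roughly the harmonic mean of the per-generation sizes. The sketch below assumes that approximation; the function name `long_term_ne` and the population sizes are hypothetical.

```python
def long_term_ne(sizes):
    """Harmonic mean of per-generation population sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)

# Hypothetical history: nine generations at 10,000 and one crash to 100.
sizes = [10_000] * 9 + [100]

print(sum(sizes) / len(sizes))     # arithmetic mean: 9010.0
print(round(long_term_ne(sizes)))  # harmonic mean: 917
```

A single small generation pulls the harmonic mean far below the arithmetic mean, which is why bottlenecks leave such a strong mark on Ne estimates.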
00:15:38.19 So estimates of Ne 00:15:40.17 are most strongly influenced by population sizes 00:15:43.07 when they're at their smallest, 00:15:45.10 and it could take many generations 00:15:47.02 to recover from a bottleneck event. 00:15:49.11 So estimates of Ne in modern populations 00:15:51.21 reflect the size of the population 00:15:53.20 prior to population expansion. 00:15:56.22 Pretty consistently, studies of nuclear sequence diversity in humans 00:16:00.24 have estimated an effective population size 00:16:03.15 of about 10,000. 00:16:05.19 Now, by contrast, if we look at Chimpanzees, 00:16:08.29 the estimate is closer to 35,000. 00:16:12.14 And so what that means is that 00:16:14.01 humans have undergone a bottleneck 00:16:16.18 sometime during their evolutionary history. 00:16:19.22 So the pattern of genomic variation 00:16:21.25 that we see in modern populations today 00:16:24.00 is a reflection of our evolutionary and demographic history. 00:16:27.14 So how much do we differ? 00:16:29.17 Well, identical twins 00:16:31.27 have no differences at the nucleotide level. 00:16:35.06 If we compare unrelated humans, 00:16:36.29 we differ at about 1 out of 1,000 nucleotide sites. 00:16:41.12 And if we compare humans to our closest genetic relative, the Chimpanzee, 00:16:45.02 we differ at about 1 out of 100 sites. 00:16:47.29 So, as a whole, our species is very similar, 00:16:50.27 and that simply reflects our recent common ancestry 00:16:54.05 from Africa within the past 100,000 years. 00:16:57.06 But when you consider that there are 00:16:58.27 over 3 billion DNA bases in the genome, 00:17:02.02 that results in 3 million differences 00:17:04.16 between each pair of genomes, 00:17:06.05 more than enough to generate the diversity 00:17:08.29 that will make each of us unique. 
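The arithmetic behind the "3 million differences" figure is simple enough to check directly. The sketch below uses the round numbers from the talk (3 billion bases, 1 difference per 1,000 sites between unrelated humans, 1 per 100 between human and chimpanzee); the function name is hypothetical, and real genomes differ in ways a single per-site rate does not capture.

```python
GENOME_SIZE = 3_000_000_000  # "over 3 billion DNA bases", rounded down

def expected_differences(genome_size, sites_per_difference):
    """Expected number of differing sites between two genomes."""
    return genome_size // sites_per_difference

print(expected_differences(GENOME_SIZE, 1_000))  # human vs human: 3000000
print(expected_differences(GENOME_SIZE, 100))    # human vs chimpanzee: 30000000
```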
00:17:12.02 Now I want to introduce a statistic 00:17:14.13 that we typically use to look at how much variation 00:17:17.06 there is among populations, 00:17:20.01 and this is referred to as an Fst statistic. 00:17:24.00 And it's simply looking at the proportion of genetic variation 00:17:27.03 that is within populations, 00:17:29.06 relative to that which is between populations. 00:17:32.18 Fst can be measured based upon heterozygosity, 00:17:37.20 and heterozygosity is simply a measure of genetic variation, 00:17:41.26 which is very simply calculated as 00:17:44.15 1 minus the sum of the allele frequencies squared. 00:17:49.09 And so once we calculate 00:17:51.26 the heterozygosity for each locus, 00:17:53.29 we can look at the average, 00:17:55.23 and we can look at the average within a subpopulation, 00:17:58.03 or in the total combined population. 00:18:00.29 Now, just as an example, 00:18:03.15 if we were to see here that 00:18:06.22 in the case of Fst = 1, 00:18:09.12 it means that there is no overlap at all in the allele frequencies. 00:18:13.15 So we can see that in population 1 they have all A's, 00:18:16.13 and in population 2 they have all B's. 00:18:19.15 And in the case of Fst = 0, 00:18:22.18 there is complete similarity, 00:18:26.08 so here we see exactly the same number 00:18:28.13 of A alleles and exactly the same number of B alleles. 00:18:32.01 And then here's an intermediate case 00:18:33.29 where we have about 0.11, 11%, 00:18:39.07 showing that there's just a small amount of differentiation 00:18:43.04 between these two populations. 00:18:46.09 So what do we see in humans? 00:18:47.29 Well, the average Fst between human populations 00:18:51.04 is about 15%, 00:18:53.15 and what that means is that the majority of genetic variation 00:18:56.04 is found within a population, 00:18:59.07 and only about 15% of the genetic diversity 00:19:02.08 differs between populations. 
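The heterozygosity-based calculation described above can be sketched for a single locus. This assumes the common form Fst = (H_T - H_S) / H_T, where H_S is the average heterozygosity within subpopulations and H_T is the heterozygosity of the pooled population, with subpopulations weighted equally. The function names are hypothetical, and the allele frequencies in the intermediate case are chosen only to land near 0.11; the slide's actual frequencies are not given in the transcript.

```python
def heterozygosity(freqs):
    """1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def fst(pop_freqs):
    """Single-locus Fst from per-population allele-frequency lists.

    Each inner list holds the allele frequencies of one subpopulation;
    subpopulations are given equal weight.
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    h_s = sum(heterozygosity(f) for f in pop_freqs) / n_pops
    pooled = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_alleles)]
    h_t = heterozygosity(pooled)
    if h_t == 0:
        return 0.0  # no variation anywhere; reported as 0 in this sketch
    return (h_t - h_s) / h_t

print(fst([[1.0, 0.0], [0.0, 1.0]]))            # all A vs all B: 1.0
print(fst([[0.5, 0.5], [0.5, 0.5]]))            # identical frequencies: 0.0
print(round(fst([[2/3, 1/3], [1/3, 2/3]]), 2))  # intermediate case: 0.11
```

In practice the heterozygosities would be averaged over many loci before taking the ratio, as the talk notes.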
00:19:04.23 Again, this is reflecting our recent common ancestry in Africa, 00:19:09.00 within the past 50,000-100,000 years. 00:19:14.13 Now, interestingly, 00:19:16.09 if we were to do this calculation from Chimpanzee populations, 00:19:19.08 we see that the value is around 32%, 00:19:22.15 so there's actually a lot more differentiation 00:19:25.04 among Chimpanzee populations 00:19:27.07 than among human populations, 00:19:29.18 again reflecting our overall close genetic similarity to each other. 00:19:36.19 So I now want to talk about the 00:19:38.04 different sources of DNA that we use 00:19:40.04 to reconstruct human evolutionary history. 00:19:43.01 One source of DNA is 00:19:45.29 that which is present in the nuclear genome 00:19:48.06 that's located in the nucleus of the cell. 00:19:51.03 And there's another type of genome 00:19:53.20 which is present in the mitochondria of the cell, 00:19:56.15 and the mitochondrion is the energy-producing organelle. 00:20:02.13 So what is the difference between these different genomes? 00:20:06.03 Well, the nuclear genome 00:20:08.09 consists of 22 autosomal pairs of chromosomes 00:20:12.26 and then the sex chromosomes, 00:20:14.15 XX for females and XY for males. 00:20:17.27 The nuclear genome is about 3.4 billion bases in size, 00:20:22.02 and it consists of about 20,000 coding genes. 00:20:25.10 It's inherited from both parents, 00:20:27.21 but it also undergoes extensive recombination each generation. 00:20:32.07 But, one of the reasons it's useful is that there's 00:20:34.18 so many different locations where we can study variation, 00:20:38.08 given that there are 3 billion nucleotides, 00:20:41.02 it's just a little bit more difficult to trace them back 00:20:43.29 to a single common ancestor. 00:20:46.20 By contrast, the mitochondrial DNA genome 00:20:50.21 is very small, it's only about 16,000 nucleotides in size, 00:20:55.14 and it's circular, 00:20:57.17 and it's passed on only through the maternal lineage. 
00:21:00.19 There's also no recombination 00:21:02.17 and it has a very high mutation rate. 00:21:05.00 All of these features make it very useful 00:21:07.01 for tracing evolutionary history. 00:21:09.27 So let me give you another example of what I'm referring to. 00:21:13.12 The mitochondrial DNA is inherited through the maternal lineage, 00:21:17.05 whereas the nuclear DNA is inherited from both parents. 00:21:22.08 So if we were to trace back from a present day individual, 00:21:25.26 they will have inherited their nuclear genome 00:21:28.20 from their parents, 00:21:30.17 their parents would have inherited from their set of parents, 00:21:33.28 and then their set of parents, and so on. 00:21:36.15 So we can trace it back to a large number of ancestors. 00:21:39.16 But by contrast, if we're tracing back mitochondrial DNA lineages, 00:21:44.00 we can see that they're only passed on 00:21:46.25 through the maternal lineage, 00:21:49.10 so they're essentially inherited from a single lineage. 00:21:52.03 We can trace them back to a single common female ancestor, 00:21:56.01 and that's why they've been very useful 00:21:57.29 for human evolutionary genetics studies. 00:22:00.21 So for example, if we were to consider 00:22:02.26 these dots to be mitochondrial DNA lineages, 00:22:06.20 and let's start at generation 11 at the bottom, 00:22:10.12 shown by the red dots, 00:22:12.06 and imagine those are different mitochondrial DNA sequences 00:22:15.00 from different individuals. 00:22:17.10 At some time in the past, these two individuals, for example, 00:22:22.06 coalesced back to a common ancestor, 00:22:24.26 and then this group coalesces back to a common ancestor here, 00:22:29.29 and ultimately they all coalesce back 00:22:32.20 to a single common ancestor. 
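The coalescence of maternal lineages described above can be sketched by tracing sampled lineages backward in time. This is a toy model under strong assumptions that are not in the talk: a constant number of females every generation, non-overlapping generations, and each individual's mother chosen uniformly at random. The function name `generations_to_mrca` and the population sizes are hypothetical.

```python
import random

def generations_to_mrca(n_lineages, n_females, rng):
    """Trace sampled maternal lineages back until one ancestor remains."""
    lineages = set(range(n_lineages))  # the sampled present-day individuals
    generations = 0
    while len(lineages) > 1:
        # Each remaining lineage picks a mother at random from the previous
        # generation; lineages that pick the same mother coalesce.
        lineages = {rng.randrange(n_females) for _ in lineages}
        generations += 1
    return generations

rng = random.Random(42)  # arbitrary seed
times = [generations_to_mrca(10, 100, rng) for _ in range(5)]
print(times)  # generations back to the common ancestor in five toy runs
```

The other females in each generation exist in the model too; their lineages simply are not ancestral to the sample, which matches the point that the common ancestor lived within a population.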
00:22:35.03 Now, in the popular literature, 00:22:36.22 the single common ancestor for mitochondrial DNA 00:22:39.04 is often referred to as "mitochondrial Eve", 00:22:42.21 but one thing to remember is that 00:22:45.17 Eve was not alone, she lived within a population, 00:22:49.06 as we can see here by the other colors. 00:22:51.22 But those lineages just never made it 00:22:54.22 down to the present day. 00:22:57.25 So this is a phylogenetic tree 00:23:00.11 constructed by sequencing mitochondrial DNA 00:23:03.10 whole genome lineages 00:23:05.02 from ethnically diverse individuals. 00:23:07.19 So each individual actually represents 00:23:10.29 a branch on this tree, 00:23:13.02 and if two individuals are very closely related to each other, 00:23:16.05 they'll be very close to each other 00:23:19.01 in the tree. 00:23:21.03 So one of the first things you can see 00:23:22.19 using Chimpanzee as an outgroup 00:23:25.01 is that all modern human lineages 00:23:27.25 coalesce at about 170,000 years ago, 00:23:31.12 and so that corresponds very well with the 00:23:33.05 time of origin of anatomically modern humans. 00:23:36.23 So another thing that we can see is that 00:23:39.25 all of the oldest genetic lineages 00:23:42.26 are from African individuals. 00:23:45.22 We can also see that 00:23:48.12 the very oldest lineages 00:23:50.15 are from the San and the Mbuti pygmy hunter-gatherers, 00:23:54.28 and then the more recent lineages 00:23:57.13 are from non-African populations. 00:24:00.01 And that is a pattern that's very consistent 00:24:02.17 with the model of a recent African origin 00:24:05.12 of modern humans. 00:24:07.23 Now, another way that we can compare mitochondrial DNA sequences 00:24:11.21 is to simply count up the number of sites 00:24:14.04 at which they differ when we compare any pair of sequences. 
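Counting the sites at which two aligned sequences differ, and averaging that over all pairs, can be sketched as follows. The sequences here are short made-up strings for illustration only, not real mitochondrial data, and the function names are hypothetical.

```python
from itertools import combinations

def count_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

def mean_pairwise_differences(seqs):
    """Average number of differences over every pair of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(count_differences(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy alignment of three sequences.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(count_differences(toy[0], toy[1]))        # 1
print(round(mean_pairwise_differences(toy), 2)) # 1.33
```

Running the same function separately on two groups of sequences gives a direct way to compare how much variation has accumulated within each group.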
00:24:17.23 And when we do this, 00:24:19.09 we observe that 00:24:22.11 any two African lineages will differ from each other 00:24:25.03 at many more sites than any two non-African lineages. 00:24:29.06 And again, that means that there has been more time 00:24:32.02 for variation to accumulate in Africa, 00:24:34.16 and is consistent with an African origin 00:24:37.08 of modern humans. 00:24:39.20 When we sequence the mitochondrial DNA lineages, 00:24:42.21 we can classify them as haplotypes, 00:24:45.10 and those haplotypes belong to 00:24:47.16 larger subsets of haplogroups. 00:24:50.01 You can think of a haplotype as simply 00:24:52.14 the arrangement of genetic variants along a chromosome, 00:24:55.19 or in the case of the mitochondrial DNA 00:24:57.22 there's just a single genome, 00:24:59.14 so it's really just the different nucleotide differences 00:25:02.27 amongst different mitochondrial DNA lineages. 00:25:06.24 And one of the first things that you can note is that 00:25:09.26 there are different haplogroups 00:25:11.29 in different regions of the world. 00:25:13.19 So here are some that seem to be pretty specific to Africa, 00:25:16.20 but are also present in some regions 00:25:18.20 where there may have been some gene flow 00:25:20.20 from Africa. 00:25:22.21 Then we have others that may be more common in Europe, 00:25:25.12 or in east Asia, 00:25:28.18 or in the Americas. 00:25:30.19 And for that reason, 00:25:32.11 mitochondrial DNA can be very useful for 00:25:34.11 tracing recent human migration events. 00:25:38.13 Now, by contrast, 00:25:40.02 the Y chromosome is also inherited with no recombination, 00:25:45.14 and so it can also be very useful for tracing back 00:25:48.01 through the male lineages. 
00:25:50.16 And here is a phylogeny constructed from Y chromosome variation, 00:25:55.07 and as with the mitochondrial DNA, 00:25:58.08 what we see is that the oldest lineages 00:26:01.19 are specific to Africans, 00:26:04.02 and the more recent lineages 00:26:06.05 are found predominantly in Non-Africans, 00:26:08.13 although we do see some in Africans as well. 00:26:11.25 Again, this is consistent with the recent African origin of modern humans. 00:26:18.14 We can also look at Y chromosome haplogroups, 00:26:22.09 and one of the things that's a little bit different 00:26:24.04 is you can see that they're a bit more differentiated 00:26:26.16 between geographic regions. 00:26:29.03 So for example, 00:26:30.24 here we just see haplogroups that are in blue, 00:26:34.04 and we see very distinct haplogroups 00:26:36.20 in the Americas, shown in purple. 00:26:39.26 And one of the reasons for that may have to do with 00:26:43.08 sex-biased migration, 00:26:46.01 that you may have, for example, 00:26:47.16 one male traveling long distances. 00:26:50.06 And it may also have to do with patterns of mating structure. 00:26:54.20 So for example, in some populations or ethnic groups, 00:26:57.23 you may have one male who has many different wives, 00:27:01.05 and because of that the effective population size of the Y chromosome 00:27:07.01 is actually smaller than the mitochondrial DNA, 00:27:09.28 and we tend to get more genetic differentiation 00:27:12.27 around the world. 00:27:15.07 So now I want to talk about analyses of ancient DNA, 00:27:18.27 for example, in this case from Neanderthal, 00:27:22.12 and these are some images of scientists 00:27:25.20 working on a Neanderthal fossil. 00:27:29.10 And this type of analysis is very challenging 00:27:32.01 for a number of reasons. 
00:27:33.25 One is that DNA which is that old, 00:27:38.04 on the order of say 30,000 years old 00:27:40.10 to even 100,000 years old, 00:27:42.06 is going to be highly degraded, 00:27:44.24 and if there's any contamination 00:27:46.25 with modern human DNA, 00:27:49.02 that is much more likely to amplify 00:27:51.19 than the old degraded DNA 00:27:54.01 from the archaic species, 00:27:56.21 so one has to be extremely careful when analyzing this DNA. 00:28:01.03 Now, more recently, 00:28:02.24 there was a pinky finger bone 00:28:05.07 identified in a cave in Siberia 00:28:07.22 from a region called Denisova, 00:28:10.11 so it's referred to as the Denisova 00:28:13.21 or Denisovan genome. 00:28:16.11 Here I'm presenting a phylogenetic tree 00:28:18.29 based on mitochondrial DNA variation 00:28:21.24 comparing modern humans, shown in blue here, 00:28:26.09 to Neanderthals shown in red, 00:28:29.01 and the Denisova individual shown in yellow. 00:28:32.23 And what we can see is that the 00:28:34.17 time to most recent common ancestry in humans, 00:28:37.08 as we've already discussed, 00:28:39.00 is about 200,000 years ago. 00:28:41.13 The time to most recent common ancestry 00:28:43.14 between humans and Neanderthals 00:28:46.01 is about 500,000 years ago, 00:28:48.13 for the mitochondrial DNA lineages. 00:28:51.03 And the time to most recent common ancestry 00:28:53.20 with the Denisova mitochondrial lineages 00:28:57.08 is about 1 million years ago. 00:29:00.05 So this is demonstrating a couple of things. 00:29:02.20 From the mitochondrial DNA perspective, 00:29:05.07 there's no evidence of any admixture 00:29:07.13 with anatomically modern humans. 00:29:10.02 The Neanderthal sequences are clearly 00:29:12.18 very distinct from modern humans. 
00:29:14.28 It's also showing you that there was another species, Denisova, 00:29:18.15 that appears to be distinct from the Neanderthals, 00:29:21.07 and they diverged even earlier than Neanderthals 00:29:24.09 from modern humans. 00:29:26.21 So if we were to compare pairwise nucleotide diversity, 00:29:31.01 for example, 00:29:33.02 among anatomically modern humans shown in blue, 00:29:35.24 you can see that there's not a lot of diversity, 00:29:38.15 as expected considering that 00:29:40.13 we all have a very recent common ancestry. 00:29:43.04 If you compare the modern human mitochondrial genomes to Neanderthal, 00:29:48.03 you can see that they're more divergent, 00:29:50.07 as expected, given that the mitochondrial DNA lineage 00:29:54.04 diverged about 500,000 years ago. 00:29:57.02 If we compare to the 00:29:59.03 Denisovan mitochondrial DNA lineage, 00:30:01.10 they're even more divergent. 00:30:04.04 And then if we compare to Chimpanzee, 00:30:06.14 of course as expected, 00:30:08.11 given that they diverged at least 5 million years ago, 00:30:11.14 they are the most different in terms of sequence variation. 00:30:15.13 Now, several years ago 00:30:18.13 there was a draft sequence produced of 00:30:21.20 the Neanderthal genome using next-generation sequencing technology. 00:30:25.25 And this was an absolutely amazing feat, 00:30:28.17 but at the time they had very low coverage, 00:30:31.07 meaning that any particular region of the genome 00:30:33.19 was sequenced only about once or twice. 00:30:36.20 Now, more recently, 00:30:38.07 as the technology has improved, 00:30:40.05 they've gotten much better coverage of the Neanderthal sequence, 00:30:43.04 and quite recently they now have a 30-fold coverage, 00:30:46.22 meaning that on average most sites 00:30:49.03 will have been sequenced 30 times. 00:30:51.22 And so you'll have a much better accuracy 00:30:54.23 when determining nucleotide variation. 
00:31:01.07 So, when the Neanderthal genome 00:31:03.25 was compared to the human genome, 00:31:06.11 what you can do is first 00:31:08.10 look at how much divergence has occurred 00:31:11.02 since modern humans differentiated from Chimpanzees 00:31:15.10 within the past 6.5 million years. 00:31:18.12 And you can look at the divergence 00:31:20.24 that has occurred specifically in the human lineage 00:31:24.06 since they diverged from Neanderthal, 00:31:26.21 and they've only accumulated 00:31:29.07 about 8% of this total divergence. 00:31:34.08 And so the estimate of the time of population divergence 00:31:38.06 between humans and Neanderthals 00:31:40.15 is about 400,000 years ago. 00:31:43.09 Furthermore, it has been estimated that 00:31:45.24 there may have been a small amount of admixture 00:31:48.16 between Neanderthals and anatomically modern humans, 00:31:52.01 as shown by this red arrow here. 00:31:54.18 So the estimate is that about 1-2% 00:32:00.15 of the modern human genome 00:32:02.17 may be of Neanderthal ancestry. 00:32:05.03 But what is of interest is to note that 00:32:07.24 this is only present in non-Africans. 00:32:10.13 It is not present in African genomes. 00:32:13.05 And so what we can infer from that is 00:32:15.16 that this admixture event probably occurred 00:32:18.25 before modern humans spread across the globe. 00:32:22.01 It may have occurred, for example, in the Middle East, 00:32:24.28 and that's why we're seeing it present in all non-Africans, 00:32:29.18 and we don't see it at all in Africans. 00:32:32.15 Now, more recently, there has been 00:32:34.22 whole genome sequencing of the Denisovan individual, 00:32:39.20 and what that has shown is that 00:32:42.09 the Denisovan species, or this individual, 00:32:45.15 appears to have diverged from modern day humans 00:32:48.13 around 800,000 years ago, 00:32:51.09 consistent with what we saw from the mitochondrial DNA.
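The 8% figure quoted above for the human-Neanderthal comparison can be turned into a rough date with simple arithmetic. The sketch below assumes a constant substitution rate (a strict molecular clock) and uses the 6.5-million-year human-Chimpanzee split mentioned in the talk as the calibration point. It is a back-of-the-envelope illustration, not the method used to produce the published estimate, and the function name is hypothetical.

```python
def lineage_divergence_time(fraction_of_total, calibration_split_years):
    """Scale a calibration split time by the fraction of divergence accumulated.

    Assumes substitutions accumulate at a constant rate along the lineage.
    """
    return fraction_of_total * calibration_split_years


# About 8% of the human-Chimpanzee divergence, calibrated at 6.5 million years.
sequence_divergence_years = lineage_divergence_time(0.08, 6.5e6)
print(round(sequence_divergence_years))  # roughly 520,000 years
```

This simple scaling gives a sequence divergence time of roughly 520,000 years, which is older than the population divergence estimate of about 400,000 years. That ordering is expected, because two DNA lineages sampled from different populations trace back to a common ancestor that lived in the ancestral population some time before the populations themselves separated.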
00:32:55.21 They also observed low levels of heterozygosity in Denisova, 00:32:59.21 suggesting that they may have had 00:33:01.19 a small population size. 00:33:04.06 Additionally, when a phylogenetic tree 00:33:07.24 was constructed from the nuclear DNA variation, 00:33:11.13 they could see that the modern humans 00:33:15.11 tend to cluster together, 00:33:17.09 and as we expect they're divergent 00:33:19.01 from the Denisova and the Neanderthals. 00:33:21.29 The Neanderthals tend to cluster together, 00:33:24.06 so they're clearly divergent from Denisova. 00:33:27.03 But what's interesting is if you look at how much 00:33:31.01 variation there is amongst the modern humans, 00:33:34.11 as indicated by the length of these lineages, 00:33:38.06 and then you compare that to Neanderthals, 00:33:40.14 which have very short branches. 00:33:43.06 What that suggests is 00:33:44.28 that there was not a lot of genetic variation 00:33:47.09 amongst the Neanderthals, 00:33:49.23 and therefore they may have undergone a bottleneck, 00:33:52.11 so they might have undergone a population crash 00:33:54.20 at some point in the past. 00:33:57.07 So in summary, 00:33:59.04 what we can see is that 00:34:01.23 Homo erectus left Africa 00:34:04.05 within the past 2 million years, 00:34:06.28 and spread throughout Eurasia, 00:34:09.09 giving rise, possibly, 00:34:11.09 to species like Homo floresiensis, 00:34:14.17 and surviving until quite recently, 00:34:17.12 as recently as around 25,000 years ago. 00:34:20.28 Then we have other species like Neanderthals and Denisovans, 00:34:27.02 who may have originated from a different species, 00:34:30.07 such as Homo heidelbergensis, 00:34:33.10 and they differentiated sometime 00:34:36.12 around 600,000 or 700,000 years ago in the case of Denisova, 00:34:39.29 or in Neanderthals around 400,000 years ago.
00:34:43.05 And then we have the modern human lineage, 00:34:46.11 Homo sapiens, 00:34:49.00 which arose around 200,000 years ago 00:34:51.07 and spread out of Africa. 00:34:53.21 And when they did so, 00:34:55.02 they would have encountered these other species, 00:34:57.09 and there may have then been low levels of gene flow. 00:35:01.20 And in fact for the case of the Denisovan genome, 00:35:03.23 it appears that the gene flow 00:35:05.26 was predominantly with populations from Oceania, 00:35:10.01 implying that this admixture 00:35:12.17 may have occurred in a different location and at a different time. 00:35:16.00 Now, we still don't know exactly 00:35:18.05 how much admixture there may have been 00:35:20.12 between archaic species 00:35:22.23 and modern humans in Africa, 00:35:25.01 but there's some preliminary data suggesting that 00:35:27.10 this has occurred there as well. 00:35:29.14 The problem is that the fossils don't preserve as well in Africa, 00:35:32.19 so we don't have any DNA sequences 00:35:34.26 from archaic lineages in Africa at this point. 00:35:40.01 So in conclusion, 00:35:41.18 Africa has the most genetic diversity in the world. 00:35:44.15 Human dispersals out of Africa 00:35:46.11 populated the entire world, 00:35:48.15 and we are the last of a series of hominin dispersal events 00:35:51.14 out of Africa.