Overview
Sarah Tishkoff studies the evolution and migration of human populations. She describes work from her lab looking at genotypic and phenotypic variation among populations speaking different African languages. Her results provide genetic insights into the history of African ethnic groups and African Americans. Tishkoff’s last talk focuses on the importance of natural selection in human evolution. For instance, mutations that are associated with disease in modern African populations may have been advantageous to the same populations at some point in their past.
African Genomics: Human Evolution
Concepts: Gene pool, allele frequency, genetic drift, neutral evolution, genetic bottleneck, founder effect, effective population size
00:00:07.10 Hi, I'm Sarah Tishkoff.
00:00:08.23 I'm a professor at the University of Pennsylvania
00:00:11.02 in the Departments of Biology and Genetics,
00:00:13.24 and today I'm gonna tell you about my research
00:00:15.18 on African integrative genomics,
00:00:17.29 and implications for human origins and disease.
00:00:21.17 So in Part 1, I'm gonna tell you a bit about
00:00:23.24 human evolutionary history,
00:00:25.24 and what the implications are of that
00:00:27.20 on the patterns of genomic variation
00:00:29.18 that we see in populations today.
00:00:34.05 So I want to start by talking about some of the
00:00:35.26 key challenges in human genomics research.
00:00:38.19 And the first one is to characterize
00:00:40.27 the immense array of genomic and phenotypic diversity
00:00:44.29 across ethnically diverse human populations.
00:00:48.14 Secondly, to understand what the evolutionary processes are
00:00:51.16 that are generating and maintaining that variation.
00:00:54.14 And third, to better understand how
00:00:56.04 gene-gene, gene-protein, and gene-environment interactions
00:00:58.28 contribute to phenotypic variability.
00:01:01.27 So first let's start with the evolutionary history
00:01:05.00 of the hominin lineage
00:01:06.26 that's leading to modern humans,
00:01:10.13 which begins around the time that we
00:01:12.03 diverged from our closest genetic relative
00:01:14.04 the Chimpanzee,
00:01:15.18 sometime between 5-7 million years ago.
00:01:18.14 So shown here are some of the fossils
00:01:20.07 from the different species
00:01:22.17 preceding anatomically modern humans.
00:01:25.16 In blue are shown fossils from the oldest lineages,
00:01:30.06 and in fact one of the oldest is Sahelanthropus,
00:01:34.10 which has been dated to at least 7 million years ago,
00:01:37.29 and there's some debate about whether it even
00:01:39.14 belongs on the hominin lineage
00:01:41.09 or if it actually preceded the Chimpanzee and human divergence.
00:01:45.26 After that, in green,
00:01:47.14 we see the Australopithecus genus.
00:01:50.14 In yellow, we see Paranthropus genus.
00:01:54.09 In orange, we have the genus Homo
00:01:56.24 and the species preceding anatomically modern humans
00:02:01.13 is Homo erectus, dated to about 2 million years ago.
00:02:06.14 And then we have the origins of
00:02:08.15 Homo neanderthalensis
00:02:11.02 and of anatomically modern humans.
00:02:13.24 Neanderthals are thought to have originated
00:02:16.00 somewhere between 300,000-400,000 years ago,
00:02:19.12 and modern humans originated
00:02:20.27 approximately 200,000 years ago.
00:02:24.03 Here's one of the best examples
00:02:26.11 of Australopithecus afarensis.
00:02:29.07 This was a set of fossils that was
00:02:31.24 discovered in the 1970s by Johanson and Gray,
00:02:36.02 named Lucy,
00:02:38.00 and Lucy was about...
00:02:41.04 she lived about 3.2 million years ago.
00:02:43.29 She was very small, only about 3 feet tall,
00:02:46.13 she had a very small brain,
00:02:48.07 and she was bipedal.
00:02:49.27 And being bipedal, in fact,
00:02:51.07 is one of the characteristics of the hominin lineage.
00:02:57.12 And, interestingly,
00:02:59.17 there have been some fossilized footprints
00:03:01.21 identified in Tanzania,
00:03:03.24 and we can see from these that there
00:03:06.08 appears to have been a mother,
00:03:08.27 from the species Australopithecus afarensis,
00:03:12.08 and she was holding the hands of her child.
00:03:14.29 And they must have been walking
00:03:16.15 in ash from recent volcanic activity,
00:03:20.06 and then that ash hardened and preserved these footprints
00:03:23.06 so that we can see them today,
00:03:24.21 and we can clearly see that they were bipedal.
00:03:29.08 So the species preceding modern humans
00:03:31.28 is called Homo erectus.
00:03:33.24 Homo erectus evolved around 2 million years ago,
00:03:39.02 and then after the origin of Homo erectus in Africa,
00:03:42.24 Homo erectus spread across Eurasia
00:03:47.17 and, indeed, shown here are some of the
00:03:49.21 oldest fossils of Homo erectus,
00:03:52.18 dated to as early as 1.9 million years ago (MYA) in Indonesia.
00:04:00.15 And this species was very successful,
00:04:03.14 lasting to as recently as 25,000 years ago
00:04:06.17 in Southeast Asia.
00:04:09.08 A very interesting recent finding was
00:04:11.20 a set of fossils identified on the island of Flores,
00:04:14.26 which is within Indonesia,
00:04:17.25 and these fossils actually show some characteristics
00:04:21.22 that look very similar to Homo erectus,
00:04:24.19 and for that reason it was proposed that
00:04:27.09 this species may have directly evolved
00:04:30.23 from a Homo erectus ancestor
00:04:33.20 that arrived on that island
00:04:36.07 about 1 million years ago
00:04:37.28 and then evolved in isolation.
00:04:39.25 And two of the very unique features of this species
00:04:42.17 are that they were very short, so again,
00:04:46.01 about the same size as Lucy, around 3 feet tall,
00:04:50.15 and secondly, that they had tiny brains.
00:04:53.14 And there's been a lot of debate about
00:04:55.01 whether this is an adaptation or in fact a pathology,
00:04:58.09 and there's still a lot of research being done,
00:05:01.03 but what was clear is that there were multiple species
00:05:04.01 outside of Africa
00:05:05.29 within the past 2 million years.
00:05:08.20 So now let's move on to the origins of
00:05:10.15 Homo neanderthalensis and Homo sapiens.
00:05:13.12 There's some question about the species preceding
00:05:16.28 Neanderthal and Homo sapiens.
00:05:19.17 Some say that it was heidelbergensis,
00:05:22.04 but there's debate about that.
00:05:24.15 However, what is clear is that the Neanderthal species
00:05:28.10 arose somewhere within the past 300,000-400,000 years,
00:05:32.15 and Homo sapiens arose within the past 200,000 years.
00:05:38.04 And this is a fossil from Neanderthals,
00:05:40.29 we can see a few features such as
00:05:44.02 the double arched and very wide brow ridges,
00:05:47.08 a broad nose,
00:05:48.28 a very large brain size,
00:05:50.27 and a retromolar space,
00:05:52.21 and in fact this species was very robust.
00:05:55.16 The males would have been over 6 feet tall,
00:05:57.15 they had very big bones,
00:05:59.19 and they had rather big brains.
00:06:02.20 In fact, here are some reconstructions of Neanderthal.
00:06:06.28 We have the old reconstruction
00:06:09.03 and then the more recent one as well.
00:06:12.11 So, anatomically modern humans, Homo sapiens sapiens,
00:06:16.06 arose approximately 200,000 years ago.
00:06:19.02 In fact, here these red dots
00:06:21.09 are representing locations where fossils have been found
00:06:24.11 of anatomically modern humans,
00:06:26.27 and the oldest fossil is
00:06:28.22 dated to around 150,000-195,000 years ago,
00:06:32.19 in Southern Ethiopia.
00:06:36.23 We also see evidence of early modern human behavior
00:06:40.10 dated to 70,000 years ago,
00:06:42.11 or even as old as 120,000 years ago,
00:06:45.16 in caves in South Africa
00:06:47.13 and also some from east Africa as well.
00:06:51.05 So after modern humans arose in Africa within the past 200,000 years,
00:06:55.08 one or a few small groups of individuals
00:06:57.25 migrated across the rest of the globe
00:07:00.11 within the past 50,000-100,000 years.
00:07:03.23 Indeed, we think that Europeans...
00:07:07.15 there were no people in Europe, actually,
00:07:09.06 until about 40,000 years ago,
00:07:11.13 and then modern humans crossed the Bering Strait
00:07:14.15 and went into the Americas
00:07:16.28 within the past 30,000 years.
00:07:19.05 The earliest migration event was actually into Australo-Melanesia,
00:07:23.11 dated to about 40,000-60,000 years ago.
00:07:26.14 And then we have much more recent migration events,
00:07:29.03 such as into the Pacific Islands,
00:07:31.12 within the past few thousand years.
00:07:34.11 Now, interestingly,
00:07:36.16 when modern humans migrated out of Africa
00:07:39.08 within the past 50,000-100,000 years,
00:07:42.05 they would have run into Neanderthals,
00:07:44.10 in fact they overlapped in their distribution.
00:07:47.08 So shown here is the distribution of Neanderthals,
00:07:50.22 and the modern humans who lived at that time
00:07:52.25 were referred to as Cro-Magnon,
00:07:55.17 and in fact we did not see anatomically modern humans
00:07:59.09 in this region, in Europe, until about 40,000 years ago.
00:08:03.03 They would have been in the Middle East a little bit earlier,
00:08:05.23 but it appears they overlapped
00:08:08.18 for about at least 10,000 years with Neanderthals.
00:08:12.13 And as we'll discuss later,
00:08:13.27 there is some evidence that there could have been actual admixture
00:08:17.05 between Neanderthal and anatomically modern humans
00:08:20.18 during that time.
00:08:22.26 So now I want to discuss the evolutionary forces
00:08:25.27 that influence the patterns of genetic variation
00:08:28.08 that we see today.
00:08:30.04 And these include mutation,
00:08:32.14 genetic drift,
00:08:33.29 migration,
00:08:35.09 and natural selection.
00:08:37.16 So let's first introduce some terminology.
00:08:40.05 The gene pool refers to the set of all genomes
00:08:42.25 in a specified population,
00:08:44.10 and here we have an example from a population of warthogs.
00:08:47.22 So here we have, at a single genetic locus,
00:08:51.03 two alleles, big B or little b,
00:08:54.17 and here's an example of an individual
00:08:56.11 who is homozygous for the big B allele,
00:08:59.07 and an individual homozygous for the little b allele,
00:09:02.12 and here's an individual who is heterozygous
00:09:05.08 for big B and little b.
00:09:07.12 And together, the set of alleles in that population
00:09:10.19 represents the gene pool.
00:09:13.28 So when we are doing population genetics analyses,
00:09:16.25 we can't actually go out and look at every genotype
00:09:21.00 for every individual in the population,
00:09:23.14 that would be unfeasible.
00:09:25.13 So what we typically do is to
00:09:26.23 infer frequencies by estimating them
00:09:30.10 from a random sample.
00:09:32.25 So in population genetics, in each
00:09:35.01 generation, each new individual
00:09:37.16 is viewed as drawing from a set of gametes
00:09:39.20 with alternative alleles,
00:09:41.08 so let's use an example here
00:09:43.01 in which we have a set of marbles in a bowl.
00:09:46.05 And initially, we have a distribution of
00:09:51.26 60 of the white marbles
00:09:54.13 relative to 40 of the green marbles,
00:09:56.27 and these, the white and the green,
00:09:58.08 are representing different alleles.
00:10:00.14 So let's say that we're gonna pick...
00:10:02.04 we're gonna reach into this bag
00:10:04.04 and we're gonna randomly draw out
00:10:06.09 another hundred of these marbles.
00:10:09.01 And now in the next generation
00:10:10.26 we have 80 of the white and we have 20 of the green.
00:10:15.02 We're gonna reach back in,
00:10:16.01 we're gonna grab another set of a hundred,
00:10:18.09 and now in the next generation
00:10:20.15 we have 100 of the white alleles and 0 of the green.
00:10:26.08 And this is a demonstration of
00:10:27.15 how we get changes in allele frequency over time.
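A minimal sketch of the marble example, for readers who want to try it. It assumes each new bowl of 100 is filled by drawing with replacement at the current proportions (the usual Wright-Fisher idealization, which the talk does not name). The function name `next_generation` and the seed are illustrative, and the counts it prints will not match the 80/20 and 100/0 shown on the slide.

```python
import random

def next_generation(white, green, rng):
    """Fill a new bowl of the same size by drawing marbles at random.

    Assumption: draws are made with replacement, so each new marble is
    white with probability equal to the current white frequency.
    """
    total = white + green
    p_white = white / total
    new_white = sum(1 for _ in range(total) if rng.random() < p_white)
    return new_white, total - new_white

rng = random.Random(1)   # fixed seed so the run is repeatable
bowl = (60, 40)          # starting counts from the talk: 60 white, 40 green
history = [bowl]
for _ in range(5):
    bowl = next_generation(*bowl, rng)
    history.append(bowl)

for generation, (white, green) in enumerate(history):
    print(f"generation {generation}: {white} white, {green} green")
```

Changing the seed gives a different trajectory each time, which is the point: the frequencies wander purely because of sampling.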
00:10:31.25 Allele frequencies will also change over time
00:10:34.23 due to genetic drift,
00:10:36.21 which is defined as random fluctuations
00:10:39.01 of allele frequencies from generation to generation,
00:10:42.03 simply due to chance.
00:10:44.19 So as we see, sometimes things could happen,
00:10:47.16 like these bugs are getting squashed,
00:10:50.00 and that's gonna change, perhaps,
00:10:52.07 the allele frequency in the next generation.
00:10:55.19 Here's another example from some ladybugs,
00:10:58.23 and we can see that, perhaps,
00:11:01.03 in the next generation, just by chance,
00:11:03.10 we're gonna see more of these ladybugs
00:11:04.29 with the dark colors,
00:11:06.12 or we might see more that are with the medium colors and dots.
00:11:10.16 And the fact is that drift is just an inevitable fact of life.
00:11:16.15 I also want to define what we mean by neutral evolution.
00:11:20.08 So we define a selectively neutral allele
00:11:22.10 as one that does not affect reproductive fitness of individuals
00:11:25.20 who carry that allele,
00:11:27.20 so its frequency in the population
00:11:29.25 changes by chance or genetic drift alone.
00:11:32.18 And here we have an example:
00:11:35.04 this is just a substitution
00:11:37.22 in the third position of the codon,
00:11:41.02 and when we have substitutions
00:11:44.09 of nucleotides in the third position,
00:11:46.20 very typically they result in a silent or synonymous change.
00:11:51.05 So here there's been a substitution,
00:11:53.00 but there's no change in the amino acid;
00:11:55.02 it remains as valine.
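A small sketch of the silent-substitution idea. The talk does not say which valine codon is on the slide, so GTT is used here as an assumption. The table is a partial copy of the standard genetic code (only codons starting with G), and `is_synonymous` is an illustrative helper name.

```python
# Partial standard genetic code: only codons beginning with G are listed,
# which is enough for this illustration.
CODON_TABLE = {
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
}

def is_synonymous(codon_before, codon_after):
    """True when both codons encode the same amino acid."""
    return CODON_TABLE[codon_before] == CODON_TABLE[codon_after]

# Third-position change: GTT -> GTC, valine stays valine.
third_position = is_synonymous("GTT", "GTC")
# Second-position change: GTT -> GCT, valine becomes alanine.
second_position = is_synonymous("GTT", "GCT")

print("third-position change is silent:", third_position)
print("second-position change is silent:", second_position)
```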
00:11:57.26 So the rate at which genetic drift occurs
00:12:00.01 is going to be inversely proportional to the population size, N,
00:12:03.23 and it's going to be very fast in small populations.
00:12:06.27 And here's an example that we can look at
00:12:08.23 based on computer simulation.
00:12:11.20 So let's assume here that we're looking at a single locus
00:12:15.15 and it has two alleles
00:12:18.06 that are at 50% frequency each,
00:12:21.25 as we can see here.
00:12:23.22 We have a sample size of 25,
00:12:27.06 and we're going to do the simulation
00:12:29.03 over 80 generations.
00:12:31.14 Now, each of these lines here
00:12:34.03 represents a different simulation,
00:12:36.27 and what we can see is that
00:12:38.23 over time alleles are either going to
00:12:44.02 be lost from the population
00:12:46.08 or they're going to reach fixation,
00:12:48.17 which means that they go to 100% frequency.
00:12:52.10 And the rate at which this occurs
00:12:54.00 is going to depend on the sample size.
00:12:56.09 So in a small sample it's gonna be very rapid,
00:12:59.19 but in this example where we have a larger sample, now N=300,
00:13:03.26 you can see that it just takes more time.
00:13:05.23 There's not as much genetic drift occurring.
00:13:08.19 Now, the end result is gonna be the same,
00:13:10.15 it just takes more time.
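The simulation described above can be approximated with a short script. This is a sketch under stated assumptions, not the program used for the slides: it treats the talk's N as the number of gene copies resampled each generation, starts as close to 50% as the count allows, and runs each replicate until the allele is fixed or lost rather than stopping at 80 generations. All names and the seed are illustrative.

```python
import random

def generations_until_fixed_or_lost(n_copies, rng, max_generations=100_000):
    """Resample n_copies gene copies each generation until one allele is gone."""
    count = n_copies // 2          # start near 50% (12 of 25, 150 of 300)
    generations = 0
    while 0 < count < n_copies and generations < max_generations:
        p = count / n_copies
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        generations += 1
    return generations

def mean_time(n_copies, replicates, rng):
    """Average number of generations to fixation or loss over many runs."""
    times = [generations_until_fixed_or_lost(n_copies, rng)
             for _ in range(replicates)]
    return sum(times) / len(times)

rng = random.Random(42)            # fixed seed so the run is repeatable
mean_small = mean_time(25, 20, rng)
mean_large = mean_time(300, 20, rng)

print(f"N=25:  about {mean_small:.0f} generations on average")
print(f"N=300: about {mean_large:.0f} generations on average")
```

Every replicate ends with one allele fixed and the other lost; the larger population simply takes many more generations to get there.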
00:13:14.09 The change in allele frequency also is going to depend
00:13:17.27 on the initial allele frequencies.
00:13:19.20 So in this particular case,
00:13:21.05 we've now changed the starting frequency:
00:13:23.20 it's not 50%, it's now 10%.
00:13:27.06 And you can see that there's much more
00:13:29.28 probability of loss of the allele in this case,
00:13:34.11 and here we have just one of the alleles reaching fixation.
00:13:42.08 So again, in this particular case,
00:13:44.05 about 1 out of 10 will eventually become fixed,
00:13:47.14 or reach 100% frequency.
00:13:51.09 Now here's an example from a large population.
00:13:54.01 It'll take longer for this to occur,
00:13:56.02 but the proportion of alleles that reach fixation is gonna be
00:13:58.12 roughly the same,
00:13:59.29 so again roughly 1 out of 10 will go to fixation,
00:14:03.06 it's just gonna take longer.
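The "1 out of 10" figure can be checked with a rough simulation. The sketch below assumes a neutral allele, a population of 20 gene copies resampled with replacement each generation, and a starting count of 2 (10%). The population size, replicate count, and function name are illustrative choices, not values from the talk.

```python
import random

def allele_fixes(n_copies, start_count, rng):
    """Return True if the allele reaches 100%, False if it is lost."""
    count = start_count
    while 0 < count < n_copies:
        p = count / n_copies
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
    return count == n_copies

rng = random.Random(7)             # fixed seed so the run is repeatable
replicates = 2000
fixed = sum(allele_fixes(20, 2, rng) for _ in range(replicates))
fraction_fixed = fixed / replicates

print(f"started at 10%, fixed in {fraction_fixed:.1%} of runs")
```

Under neutrality the chance of eventual fixation equals the starting frequency, so the printed fraction should land near 10% whatever population size is chosen.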
00:14:05.16 Other important terms in population genetics
00:14:07.26 are bottleneck and founder effects,
00:14:10.08 and this is because genetic drift
00:14:11.23 has a large effect on allele frequencies
00:14:14.10 when a population originates
00:14:16.05 via a small number of people from a larger population.
00:14:19.16 So here we have an example of a bottleneck,
00:14:22.10 and what a bottleneck means is that
00:14:24.01 there's been a decrease in population size
00:14:26.21 at some time in the past.
00:14:28.14 So you can think of it as a population crash.
00:14:31.10 And what happens when the population is very small,
00:14:34.28 you're going to have a higher rate of genetic drift,
00:14:37.12 and we can see here that these alleles,
00:14:39.20 which are represented by the different colors,
00:14:42.00 have shifted from what we're seeing
00:14:44.18 back at this earlier time.
00:14:46.25 Now we go through the bottleneck,
00:14:48.19 and now we're seeing predominantly
00:14:50.07 these white and black alleles.
00:14:53.09 Another example we can look at is a founder event,
00:14:57.20 which is sort of a special case of a bottleneck event.
00:15:00.11 And in this case it's where a population, a small population,
00:15:05.03 breaks off from the larger population,
00:15:07.25 and again there's going to be increased genetic drift
00:15:10.26 in this initially small population
00:15:13.12 and here, by chance,
00:15:15.05 we just happened to see more of these dark blue
00:15:18.12 and light blue alleles.
00:15:21.09 The pattern of variation that we see
00:15:22.23 in the human genome
00:15:24.09 is also dependent on the effective population size,
00:15:27.17 which we distinguish as capital N sub e.
00:15:32.10 And the definition of the effective population size
00:15:35.10 is the number of breeding individuals in a population.
00:15:38.19 So estimates of Ne
00:15:40.17 are most strongly influenced by population sizes
00:15:43.07 when they're at their smallest,
00:15:45.10 and it could take many generations
00:15:47.02 to recover from a bottleneck event.
00:15:49.11 So estimates of Ne in modern populations
00:15:51.21 reflect the size of the population
00:15:53.20 prior to population expansion.
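One common way to see why small population sizes dominate (a standard textbook approximation, not something stated in the talk) is that long-term Ne under fluctuating size behaves roughly like the harmonic mean of the per-generation sizes. The population sizes below are invented for illustration.

```python
from statistics import harmonic_mean, mean

# Hypothetical history: nine generations at 10,000 and one bottleneck at 100.
sizes = [10_000] * 9 + [100]

long_term_ne = harmonic_mean(sizes)   # pulled strongly toward the bottleneck
arithmetic = mean(sizes)              # barely notices the bottleneck

print(f"harmonic mean:   {long_term_ne:.0f}")
print(f"arithmetic mean: {arithmetic:.0f}")
```

A single generation at 100 drags the harmonic mean below 1,000 even though the population was ten times larger for nine of the ten generations.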
00:15:56.22 Pretty consistently, studies of nuclear sequence diversity in humans
00:16:00.24 have estimated an effective population size
00:16:03.15 of about 10,000.
00:16:05.19 Now, by contrast, if we look at Chimpanzees,
00:16:08.29 the estimate is closer to 35,000.
00:16:12.14 And so what that means is that
00:16:14.01 humans have undergone a bottleneck
00:16:16.18 sometime during their evolutionary history.
00:16:19.22 So the pattern of genomic variation
00:16:21.25 that we see in modern populations today
00:16:24.00 is a reflection of our evolutionary and demographic history.
00:16:27.14 So how much do we differ?
00:16:29.17 Well, identical twins
00:16:31.27 have no differences at the nucleotide level.
00:16:35.06 If we compare unrelated humans,
00:16:36.29 we differ at about 1 out of 1,000 nucleotide sites.
00:16:41.12 And if we compare humans to our closest genetic relative, the Chimpanzee,
00:16:45.02 we differ at about 1 out of 100 sites.
00:16:47.29 So, as a whole, our species is very similar,
00:16:50.27 and that simply reflects our recent common ancestry
00:16:54.05 from Africa within the past 100,000 years.
00:16:57.06 But when you consider that there are
00:16:58.27 over 3 billion DNA bases in the genome,
00:17:02.02 that results in 3 million differences
00:17:04.16 between each pair of genomes,
00:17:06.05 more than enough to generate the diversity
00:17:08.29 that will make each of us unique.
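The arithmetic behind the 3 million figure, written out. The genome size is rounded down to 3 billion as in the talk. The human-chimpanzee count is not quoted in the talk; it simply follows from the 1-in-100 rate.

```python
GENOME_SIZE = 3_000_000_000   # "over 3 billion DNA bases", rounded down

human_pair_differences = GENOME_SIZE // 1_000   # 1 in 1,000 sites differ
human_chimp_differences = GENOME_SIZE // 100    # 1 in 100 sites differ

print(f"two unrelated humans: {human_pair_differences:,} differences")
print(f"human vs chimpanzee:  {human_chimp_differences:,} differences")
```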
00:17:12.02 Now I want to introduce a statistic
00:17:14.13 that we typically use to look at how much variation
00:17:17.06 there is among populations,
00:17:20.01 and this is referred to as an Fst statistic.
00:17:24.00 And it's simply looking at the proportion of genetic variation
00:17:27.03 that is within populations,
00:17:29.06 relative to that which is between populations.
00:17:32.18 Fst can be measured based upon heterozygosity,
00:17:37.20 and heterozygosity is simply a measure of genetic variation,
00:17:41.26 which is very simply calculated as
00:17:44.15 1 minus the sum of the allele frequencies squared.
00:17:49.09 And so once we calculate
00:17:51.26 the heterozygosity for each locus,
00:17:53.29 we can look at the average,
00:17:55.23 and we can look at the average within a subpopulation,
00:17:58.03 or in the total combined population.
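The heterozygosity calculation can be sketched in a few lines. The allele frequencies below are made-up values for three loci, and the function names are illustrative.

```python
def heterozygosity(allele_freqs):
    """1 minus the sum of squared allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in allele_freqs)

def mean_heterozygosity(loci):
    """Average heterozygosity across a list of loci."""
    return sum(heterozygosity(freqs) for freqs in loci) / len(loci)

# Hypothetical loci: two equally common alleles, one rare allele, no variation.
loci = [[0.5, 0.5], [0.9, 0.1], [1.0]]

per_locus = [heterozygosity(freqs) for freqs in loci]
average = mean_heterozygosity(loci)

print("per locus:", [round(h, 2) for h in per_locus])
print("average:  ", round(average, 3))
```

The same function can be applied to frequencies from one subpopulation or from the pooled population, which is what the Fst comparison needs.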
00:18:00.29 Now, just as an example,
00:18:03.15 if we were to see here that
00:18:06.22 in the case of Fst = 1,
00:18:09.12 it means that there is no overlap at all in the allele frequencies.
00:18:13.15 So we can see that in population 1 they have all A's,
00:18:16.13 and in population 2 they have all B's.
00:18:19.15 And in the case of Fst = 0,
00:18:22.18 there is complete similarity,
00:18:26.08 so here we see exactly the same number
00:18:28.13 of A alleles and exactly the same number of B alleles.
00:18:32.01 And then here's an intermediate case
00:18:33.29 where we have about 0.11, 11%,
00:18:39.07 showing that there's just a small amount of differentiation
00:18:43.04 between these two populations.
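The three cases above can be reproduced with a small sketch. It assumes the usual heterozygosity-based form Fst = (H_T - H_S) / H_T, one locus, and populations of equal size (so pooled frequencies are simple averages). The frequencies chosen for the intermediate case (2/3 and 1/3) are an assumption picked to give about 0.11; the slide's actual counts are not given in the transcript.

```python
def heterozygosity(allele_freqs):
    """1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in allele_freqs)

def fst(pop_freqs):
    """Fst from per-population allele frequencies at one locus.

    pop_freqs: one frequency list per population, alleles in the same order.
    Assumption: populations are weighted equally.
    """
    n_pops = len(pop_freqs)
    h_s = sum(heterozygosity(freqs) for freqs in pop_freqs) / n_pops
    pooled = [sum(column) / n_pops for column in zip(*pop_freqs)]
    h_t = heterozygosity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

fst_no_overlap = fst([[1.0, 0.0], [0.0, 1.0]])     # all A's versus all B's
fst_identical = fst([[0.5, 0.5], [0.5, 0.5]])      # same frequencies
fst_intermediate = fst([[2/3, 1/3], [1/3, 2/3]])   # modest difference

print(round(fst_no_overlap, 2), round(fst_identical, 2), round(fst_intermediate, 2))
```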
00:18:46.09 So what do we see in humans?
00:18:47.29 Well, the average Fst between human populations
00:18:51.04 is about 15%,
00:18:53.15 and what that means is that the majority of genetic variation
00:18:56.04 is found within a population,
00:18:59.07 and only about 15% of the genetic diversity
00:19:02.08 differs between populations.
00:19:04.23 Again, this is reflecting our recent common ancestry in Africa,
00:19:09.00 within the past 50,000-100,000 years.
00:19:14.13 Now, interestingly,
00:19:16.09 if we were to do this calculation from Chimpanzee populations,
00:19:19.08 we see that the value is around 32%,
00:19:22.15 so there's actually a lot more differentiation
00:19:25.04 among Chimpanzee populations
00:19:27.07 than among human populations,
00:19:29.18 again reflecting our overall close genetic similarity to each other.
00:19:36.19 So I now want to talk about the
00:19:38.04 different sources of DNA that we use
00:19:40.04 to reconstruct human evolutionary history.
00:19:43.01 One source of DNA is
00:19:45.29 that which is present in the nuclear genome
00:19:48.06 that's located in the nucleus of the cell.
00:19:51.03 And there's another type of genome
00:19:53.20 which is present in the mitochondria of the cell,
00:19:56.15 and the mitochondrion is the energy-producing organelle.
00:20:02.13 So what is the difference between these different genomes?
00:20:06.03 Well, the nuclear genome
00:20:08.09 consists of 22 autosomal pairs of chromosomes
00:20:12.26 and then the sex chromosomes,
00:20:14.15 XX for females and XY for males.
00:20:17.27 The nuclear genome is about 3.4 billion bases in size,
00:20:22.02 and it consists of about 20,000 coding genes.
00:20:25.10 It's inherited from both parents,
00:20:27.21 but it also undergoes extensive recombination each generation.
00:20:32.07 But, one of the reasons it's useful is that there's
00:20:34.18 so many different locations where we can study variation,
00:20:38.08 given that there are 3 billion nucleotides,
00:20:41.02 it's just a little bit more difficult to trace them back
00:20:43.29 to a single common ancestor.
00:20:46.20 By contrast, the mitochondrial DNA genome
00:20:50.21 is very small, it's only about 16,000 nucleotides in size,
00:20:55.14 and it's circular,
00:20:57.17 and it's passed on only through the maternal lineage.
00:21:00.19 There's also no recombination
00:21:02.17 and it has a very high mutation rate.
00:21:05.00 All of these features make it very useful
00:21:07.01 for tracing evolutionary history.
00:21:09.27 So let me give you another example of what I'm referring to.
00:21:13.12 The mitochondrial DNA is inherited through the maternal lineage,
00:21:17.05 whereas the nuclear DNA is inherited from both parents.
00:21:22.08 So if we were to trace back from a present day individual,
00:21:25.26 they will have inherited their nuclear genome
00:21:28.20 from their parents,
00:21:30.17 their parents would have inherited from their set of parents,
00:21:33.28 and then their set of parents, and so on.
00:21:36.15 So we can trace it back to a large number of ancestors.
00:21:39.16 But by contrast, if we're tracing back mitochondrial DNA lineages,
00:21:44.00 we can see that they're only passed on
00:21:46.25 through the maternal lineage,
00:21:49.10 so they're essentially inherited from a single lineage.
00:21:52.03 We can trace them back to a single common female ancestor,
00:21:56.01 and that's why they've been very useful
00:21:57.29 for human evolutionary genetics studies.
00:22:00.21 So for example, if we were to consider
00:22:02.26 these dots to be mitochondrial DNA lineages,
00:22:06.20 and let's start at generation 11 at the bottom,
00:22:10.12 shown by the red dots,
00:22:12.06 and imagine those are different mitochondrial DNA sequences
00:22:15.00 from different individuals.
00:22:17.10 At some time in the past, these two individuals, for example,
00:22:22.06 coalesced back to a common ancestor,
00:22:24.26 and then this group coalesces back to a common ancestor here,
00:22:29.29 and ultimately they all coalesce back
00:22:32.20 to a single common ancestor.
00:22:35.03 Now, in the popular literature,
00:22:36.22 the single common ancestor for mitochondrial DNA
00:22:39.04 is often referred to as "mitochondrial Eve",
00:22:42.21 but one thing to remember is that
00:22:45.17 Eve was not alone, she lived within a population,
00:22:49.06 as we can see here by the other colors.
00:22:51.22 But those lineages just never made it
00:22:54.22 down to the present day.
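The coalescence picture can be mimicked with a toy simulation that walks maternal lineages backward in time. It assumes a constant number of females per generation and that each individual's mother is chosen uniformly at random; the population size of 50, the sample of 10, and all names are illustrative.

```python
import random

def generations_to_common_ancestor(n_females, sample_size, rng):
    """Trace sampled maternal lineages back until they share one ancestor."""
    # Each lineage is identified by the index of the woman carrying it.
    lineages = set(rng.sample(range(n_females), sample_size))
    generations = 0
    while len(lineages) > 1:
        # Every lineage picks a mother in the previous generation;
        # lineages that pick the same mother merge (coalesce).
        lineages = {rng.randrange(n_females) for _ in lineages}
        generations += 1
    return generations

rng = random.Random(11)            # fixed seed so the run is repeatable
times = [generations_to_common_ancestor(50, 10, rng) for _ in range(200)]
mean_time = sum(times) / len(times)

print(f"average generations back to the common ancestor: {mean_time:.0f}")
```

In every run the common ancestor lives in a generation that still contains 50 women, which is the sense in which "Eve was not alone": the others simply left no unbroken maternal line to the sample.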
00:22:57.25 So this is a phylogenetic tree
00:23:00.11 constructed by sequencing mitochondrial DNA
00:23:03.10 whole genome lineages
00:23:05.02 from ethnically diverse individuals.
00:23:07.19 So each individual actually represents
00:23:10.29 a branch on this tree,
00:23:13.02 and if two individuals are very closely related to each other,
00:23:16.05 they'll be very close to each other
00:23:19.01 in the tree.
00:23:21.03 So one of the first things you can see
00:23:22.19 using Chimpanzee as an outgroup
00:23:25.01 is that all modern human lineages
00:23:27.25 coalesce at about 170,000 years ago,
00:23:31.12 and so that corresponds very well with the
00:23:33.05 time of origin of anatomically modern humans.
00:23:36.23 So another thing that we can see is that
00:23:39.25 all of the oldest genetic lineages
00:23:42.26 are from African individuals.
00:23:45.22 We can also see that
00:23:48.12 the very oldest lineages
00:23:50.15 are from the San and the Mbuti pygmy hunter-gatherers,
00:23:54.28 and then the more recent lineages
00:23:57.13 are from non-African populations.
00:24:00.01 And that is a pattern that's very consistent
00:24:02.17 with the model of a recent African origin
00:24:05.12 of modern humans.
00:24:07.23 Now, another way that we can compare mitochondrial DNA sequences
00:24:11.21 is to simply count up the number of sites
00:24:14.04 at which they differ when we compare any pair of sequences.
00:24:17.23 And when we do this,
00:24:19.09 we observe that
00:24:22.11 any two African lineages will differ from each other
00:24:25.03 at many more sites than any two non-African lineages.
00:24:29.06 And again, that means that there has been more time
00:24:32.02 for variation to accumulate in Africa,
00:24:34.16 and is consistent with an African origin
00:24:37.08 of modern humans.
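Counting pairwise differences is simple to sketch. The sequences below are invented toy data, not real mitochondrial sequences, and the group names are placeholders for a more diverse and a less diverse set of lineages.

```python
from itertools import combinations

def count_differences(seq_a, seq_b):
    """Number of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

def mean_pairwise_differences(seqs):
    """Average number of differences over every pair of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(count_differences(a, b) for a, b in pairs) / len(pairs)

# Invented toy alignments, 10 sites each.
more_diverse = ["ACGTACGTAC", "ACGTTCGTAA", "TCGTACGGAC"]
less_diverse = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC"]

diversity_high = mean_pairwise_differences(more_diverse)
diversity_low = mean_pairwise_differences(less_diverse)

print(f"more diverse group: {diversity_high:.2f} differences per pair")
print(f"less diverse group: {diversity_low:.2f} differences per pair")
```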
00:24:39.20 When we sequence the mitochondrial DNA lineages,
00:24:42.21 we can classify them as haplotypes,
00:24:45.10 and those haplotypes belong to
00:24:47.16 larger subsets of haplogroups.
00:24:50.01 You can think of a haplotype as simply
00:24:52.14 the arrangement of genetic variants along a chromosome,
00:24:55.19 or in the case of the mitochondrial DNA
00:24:57.22 there's just a single genome,
00:24:59.14 so it's really just the different nucleotide differences
00:25:02.27 amongst different mitochondrial DNA lineages.
00:25:06.24 And one of the first things that you can note is that
00:25:09.26 there are different haplogroups
00:25:11.29 in different regions of the world.
00:25:13.19 So here are some that seem to be pretty specific to Africa,
00:25:16.20 but are also present in some regions
00:25:18.20 where there may have been some gene flow
00:25:20.20 from Africa.
00:25:22.21 Then we have others that may be more common in Europe,
00:25:25.12 or in east Asia,
00:25:28.18 or in the Americas.
00:25:30.19 And for that reason,
00:25:32.11 mitochondrial DNA can be very useful for
00:25:34.11 tracing recent human migration events.
00:25:38.13 Now, by contrast,
00:25:40.02 the Y chromosome is also inherited with no recombination,
00:25:45.14 and so it can also be very useful for tracing back
00:25:48.01 through the male lineages.
00:25:50.16 And here is a phylogeny constructed from Y chromosome variation,
00:25:55.07 and as with the mitochondrial DNA,
00:25:58.08 what we see is that the oldest lineages
00:26:01.19 are specific to Africans,
00:26:04.02 and the more recent lineages
00:26:06.05 are found predominantly in non-Africans,
00:26:08.13 although we do see some in Africans as well.
00:26:11.25 Again, this is consistent with the recent African origin of modern humans.
00:26:18.14 We can also look at Y chromosome haplogroups,
00:26:22.09 and one of the things that's a little bit different
00:26:24.04 is you can see that they're a bit more differentiated
00:26:26.16 between geographic regions.
00:26:29.03 So for example,
00:26:30.24 here we just see haplogroups that are in blue,
00:26:34.04 and we see very distinct haplogroups
00:26:36.20 in the Americas, shown in purple.
00:26:39.26 And one of the reasons for that may have to do with
00:26:43.08 sex-biased migration,
00:26:46.01 that you may have, for example,
00:26:47.16 one male traveling long distances.
00:26:50.06 And it may also have to do with patterns of mating structure.
00:26:54.20 So for example, in some populations or ethnic groups,
00:26:57.23 you may have one male who has many different wives,
00:27:01.05 and because of that the effective population size of the Y chromosome
00:27:07.01 is actually smaller than that of the mitochondrial DNA,
00:27:09.28 and we tend to get more genetic differentiation
00:27:12.27 around the world.
00:27:15.07 So now I want to talk about analyses of ancient DNA,
00:27:18.27 for example, in this case from Neanderthal,
00:27:22.12 and these are some images of scientists
00:27:25.20 working on a Neanderthal fossil.
00:27:29.10 And this type of analysis is very challenging
00:27:32.01 for a number of reasons.
00:27:33.25 One is that DNA which is that old,
00:27:38.04 on the order of say 30,000 years old
00:27:40.10 to even 100,000 years old,
00:27:42.06 is going to be highly degraded,
00:27:44.24 and if there's any contamination
00:27:46.25 with modern human DNA,
00:27:49.02 that is much more likely to amplify
00:27:51.19 than the old degraded DNA
00:27:54.01 from the archaic species,
00:27:56.21 so one has to be extremely careful when analyzing this DNA.
00:28:01.03 Now, more recently,
00:28:02.24 there was a pinky finger bone
00:28:05.07 identified in a cave in Siberia
00:28:07.22 from a region called Denisova,
00:28:10.11 so it's referred to as the Denisova
00:28:13.21 or Denisovan genome.
00:28:16.11 Here I'm presenting a phylogenetic tree
00:28:18.29 based on mitochondrial DNA variation
00:28:21.24 comparing modern humans, shown in blue here,
00:28:26.09 to Neanderthals shown in red,
00:28:29.01 and the Denisova individual shown in yellow.
00:28:32.23 And what we can see is that the
00:28:34.17 time to most recent common ancestry in humans,
00:28:37.08 as we've already discussed,
00:28:39.00 is about 200,000 years ago.
00:28:41.13 The time to most recent common ancestry
00:28:43.14 between humans and Neanderthals
00:28:46.01 is about 500,000 years ago,
00:28:48.13 for the mitochondrial DNA lineages.
00:28:51.03 And the time to most recent common ancestry
00:28:53.20 with the Denisova mitochondrial lineages
00:28:57.08 is about 1 million years ago.
00:29:00.05 So this is demonstrating a couple of things.
00:29:02.20 From the mitochondrial DNA perspective,
00:29:05.07 there's no evidence of any admixture
00:29:07.13 with anatomically modern humans.
00:29:10.02 The Neanderthal sequences are clearly
00:29:12.18 very distinct from modern humans.
00:29:14.28 It's also showing you that there was another species, Denisova,
00:29:18.15 that appears to be distinct from the Neanderthals,
00:29:21.07 and they diverge even earlier than Neanderthals
00:29:24.09 from modern humans.
00:29:26.21 So if we were to compare pairwise nucleotide diversity,
00:29:31.01 for example,
00:29:33.02 among anatomically modern humans shown in blue,
00:29:35.24 you can see that there's not a lot of diversity,
00:29:38.15 as expected considering that
00:29:40.13 we all have a very recent common ancestry.
00:29:43.04 If you compare the modern human mitochondrial genomes to Neanderthal,
00:29:48.03 you can see that they're more divergent,
00:29:50.07 as expected, given that the mitochondrial DNA lineage
00:29:54.04 diverged about 500,000 years ago.
00:29:57.02 If we compare to the
00:29:59.03 Denisovan mitochondrial DNA lineage,
00:30:01.10 they're even more divergent.
00:30:04.04 And then if we compare to Chimpanzee,
00:30:06.14 of course as expected,
00:30:08.11 given that they diverged at least 5 million years ago,
00:30:11.14 they are the most different in terms of sequence variation.
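The comparison described here can be sketched as an average of pairwise differences between aligned sequences. This is only an illustration: the sequences below are made up and are not real mitochondrial data, and the function names are hypothetical.

```python
from itertools import combinations

def pairwise_difference(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)

def nucleotide_diversity(sequences):
    """Average pairwise difference over all pairs in a sample."""
    pairs = list(combinations(sequences, 2))
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)

# Toy, made-up aligned sequences (not real data).
humans = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]
archaic = "ACGAACCTAT"

within_humans = nucleotide_diversity(humans)
human_vs_archaic = sum(pairwise_difference(h, archaic) for h in humans) / len(humans)

print(f"within humans:    {within_humans:.3f}")
print(f"human vs archaic: {human_vs_archaic:.3f}")
```

With these toy sequences the within-human value is smaller than the human-versus-archaic value, which is the pattern described in the talk for lineages that split longer ago.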
00:30:15.13 Now, several years ago
00:30:18.13 there was a draft sequence produced of
00:30:21.20 the Neanderthal genome using next-generation sequencing technology.
00:30:25.25 And this was an absolutely amazing feat,
00:30:28.17 but at the time they had very low coverage,
00:30:31.07 meaning that any particular region of the genome
00:30:33.19 was sequenced only about once or twice.
00:30:36.20 Now, more recently,
00:30:38.07 as the technology has improved,
00:30:40.05 they've gotten much better coverage of the Neanderthal sequence,
00:30:43.04 and quite recently they now have a 30-fold coverage,
00:30:46.22 meaning that on average most sites
00:30:49.03 will have been sequenced 30 times.
00:30:51.22 And so you'll have a much better accuracy
00:30:54.23 when determining nucleotide variation.
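To make the idea of coverage concrete, here is a minimal sketch using the usual rule of thumb that average coverage is the total number of sequenced bases divided by the genome size. The genome size, read length, and read counts are round illustrative assumptions, not figures from the Neanderthal sequencing projects.

```python
def mean_coverage(num_reads, read_length, genome_size):
    """Average number of times each site is sequenced."""
    return num_reads * read_length / genome_size

genome_size = 3_000_000_000   # assumed round figure for a human-sized genome
read_length = 100             # assumed read length in bases

low = mean_coverage(45_000_000, read_length, genome_size)
high = mean_coverage(900_000_000, read_length, genome_size)

print(f"low-coverage draft:  about {low:.1f}x")
print(f"high-coverage genome: about {high:.1f}x")
```

Under these assumptions, reaching 30-fold coverage takes twenty times as many reads as a 1.5-fold draft, which is one reason the high-coverage genome gives more reliable variant calls.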
00:31:01.07 So, when the Neanderthal genome
00:31:03.25 was compared to the human genome,
00:31:06.11 what you can do is first
00:31:08.10 look at how much divergence has occurred
00:31:11.02 since modern humans differentiated from Chimpanzees
00:31:15.10 within the past 6.5 million years.
00:31:18.12 And you can look at the divergence
00:31:20.24 that has occurred specifically in the human lineage
00:31:24.06 since they diverged from Neanderthal,
00:31:26.21 and they've only accumulated
00:31:29.07 about 8% of this total divergence.
00:31:34.08 And so the estimate of the time of population divergence
00:31:38.06 between humans and Neanderthals
00:31:40.15 is about 400,000 years ago.
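As a rough back-of-the-envelope check, the 8% figure can be turned into a time by assuming that changes accumulate at a constant rate along the lineage. That constant-rate assumption is ours, not something stated in the talk, and the variable names are hypothetical.

```python
human_chimp_divergence_years = 6_500_000   # figure quoted in the talk
fraction_since_neanderthal_split = 0.08    # figure quoted in the talk

# Assumes a constant rate of change along the human lineage.
sequence_divergence_years = (
    fraction_since_neanderthal_split * human_chimp_divergence_years
)

print(f"approximate sequence divergence: {sequence_divergence_years:,.0f} years")
```

The result is on the order of half a million years. An average sequence divergence time is generally expected to be somewhat older than the population split itself, because lineages already differ within the ancestral population, so this is not in conflict with the estimate of about 400,000 years for the population divergence.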
00:31:43.09 Furthermore, it has been estimated that
00:31:45.24 there may have been a small amount of admixture
00:31:48.16 between Neanderthals and anatomically modern humans,
00:31:52.01 as shown by this red arrow here.
00:31:54.18 So the estimated amount of admixture is such that about 1-2%
00:32:00.15 of the modern human genome
00:32:02.17 may be of Neanderthal ancestry.
00:32:05.03 But what is of interest is to note that
00:32:07.24 this is only present in non-Africans.
00:32:10.13 It is not present in African genomes.
00:32:13.05 And so what we can infer from that is
00:32:15.16 that this admixture event probably occurred
00:32:18.25 before modern humans spread across the globe.
00:32:22.01 It may have occurred, for example, in the Middle East,
00:32:24.28 and that's why we're seeing it present in all non-Africans,
00:32:29.18 and we don't see it at all in Africans.
00:32:32.15 Now, more recently, there has been
00:32:34.22 whole genome sequencing of the Denisovan individual,
00:32:39.20 and what that has shown is that
00:32:42.09 the Denisovan species, or this individual,
00:32:45.15 appears to have diverged from modern day humans
00:32:48.13 around 800,000 years ago,
00:32:51.09 consistent with what we saw from the mitochondrial DNA.
00:32:55.21 They also observed low levels of heterozygosity in Denisova,
00:32:59.21 suggesting that they may have had
00:33:01.19 a small population size.
00:33:04.06 Additionally, when a phylogenetic tree
00:33:07.24 was constructed from the nuclear DNA variation,
00:33:11.13 they could see that the modern humans
00:33:15.11 tend to cluster together,
00:33:17.09 and as we expect they're divergent
00:33:19.01 from the Denisova and the Neanderthals.
00:33:21.29 The Neanderthals tend to cluster together,
00:33:24.06 so they're clearly divergent from Denisova.
00:33:27.03 But what's interesting is if you look at how much
00:33:31.01 variation there is amongst the modern humans,
00:33:34.11 as indicated by the length of these lineages,
00:33:38.06 and then you compare that to Neanderthals,
00:33:40.14 which have very short branches.
00:33:43.06 What that suggests is
00:33:44.28 that there was not a lot of genetic variation
00:33:47.09 amongst the Neanderthals,
00:33:49.23 and therefore they may have undergone a bottleneck,
00:33:52.11 so they might have undergone a population crash
00:33:54.20 at some point in the past.
00:33:57.07 So in summary,
00:33:59.04 what we can see is that
00:34:01.23 Homo erectus left Africa
00:34:04.05 within the past 2 million years,
00:34:06.28 and spread throughout Eurasia,
00:34:09.09 giving rise, possibly,
00:34:11.09 to species like Homo floresiensis,
00:34:14.17 and surviving until quite recently,
00:34:17.12 as recently as around 25,000 years ago.
00:34:20.28 Then we have other species like Neanderthal and Denisovans,
00:34:27.02 who may have originated from a different species,
00:34:30.07 such as heidelbergensis,
00:34:33.10 and they differentiated sometime
00:34:36.12 around 600,000 or 700,000 years ago in the case of Denisova,
00:34:39.29 or in Neanderthals around 400,000 years ago.
00:34:43.05 And then we have the modern human lineage,
00:34:46.11 Homo sapiens,
00:34:49.00 which arose around 200,000 years ago
00:34:51.07 and spread out of Africa.
00:34:53.21 And when they did so,
00:34:55.02 they would have encountered these other species,
00:34:57.09 and there may have then been low levels of gene flow.
00:35:01.20 And in fact for the case of the Denisovan genome,
00:35:03.23 it appears that the gene flow
00:35:05.26 was predominantly with populations from Oceania,
00:35:10.01 implying that this admixture
00:35:12.17 may have occurred in a different location and a different time.
00:35:16.00 Now, we still don't know exactly
00:35:18.05 how much admixture there may have been
00:35:20.12 between archaic species
00:35:22.23 and modern humans in Africa,
00:35:25.01 but there's some preliminary data suggesting that
00:35:27.10 this has occurred there as well.
00:35:29.14 The problem is that the fossils don't preserve as well in Africa,
00:35:32.19 so we don't have any DNA sequences
00:35:34.26 from archaic lineages in Africa at this point.
00:35:40.01 So in conclusion,
00:35:41.18 Africa has the most genetic diversity in the world.
00:35:44.15 Human dispersals out of Africa
00:35:46.11 populated the entire world,
00:35:48.15 and we are the last of a series of hominin dispersal events
00:35:51.14 out of Africa.
African Genomics: African Population History
Concepts: Genetic diversity in African and African American populations, evolutionary history of humans
00:00:07.20 So in the second part of this lecture series,
00:00:10.13 I'm going to be discussing
00:00:12.03 African population history
00:00:14.02 based on patterns of genetic diversity.
00:00:18.23 So why do I think it's important
00:00:20.08 that we study African genetic variation?
00:00:22.28 Well, for one,
00:00:24.17 if we want to learn more about modern human origins,
00:00:26.23 we need to be looking in Africa,
00:00:28.15 which is the site of modern human speciation.
00:00:32.05 Secondly, if we want to learn more about African-American ancestry,
00:00:36.14 this will be an important region to study.
00:00:40.21 Third is that Africa is a region
00:00:42.13 with a very high level of infectious disease,
00:00:44.27 with HIV, malaria, and TB being three of the biggest killers,
00:00:49.21 but there's also an increasing level of
00:00:51.23 non-communicable diseases like diabetes, for example,
00:00:55.08 and cardiovascular disease.
00:00:57.11 And African populations have been greatly underrepresented
00:01:00.22 in the biomedical research,
00:01:03.00 and so we really need to give more focus
00:01:05.10 to these populations so that we can come up with better diagnostics
00:01:08.27 and better treatments for these diseases.
00:01:13.11 And lastly, we know that people differ in regards to drug response,
00:01:17.10 and this is likely due to variation at drug metabolizing genes,
00:01:21.02 but again, we currently know very little
00:01:23.03 about the extent of variation among Africans at these loci.
00:01:30.17 So first I have to give you a little bit of information
00:01:32.22 about African population history.
00:01:35.06 There are over 2,000 ethnic groups in Africa
00:01:37.26 speaking distinct languages,
00:01:40.12 and these languages have been classified
00:01:42.15 into four different language families.
00:01:45.17 So in blue are languages
00:01:48.15 classified as Afro-Asiatic.
00:01:50.27 They're found predominantly in the north and northeast of Africa,
00:01:55.23 and these would include, for example,
00:01:57.13 Semitic languages which are also spoken in the Middle East,
00:02:01.02 and they would also include Cushitic languages
00:02:04.14 spoken in northeast Africa.
00:02:07.02 And then in red we have populations
00:02:10.14 that are speaking Nilo-Saharan languages,
00:02:13.10 these tend to be pastoralist groups,
00:02:15.20 like the Maasai for example, who live in Kenya and Tanzania.
00:02:19.07 And these populations are mainly found
00:02:21.12 in central and eastern Africa
00:02:24.28 although there are a few groups who have migrated
00:02:27.14 to the west of Africa.
00:02:30.00 The most broad-spread language family
00:02:34.27 consists of the Niger-Kordofanian languages,
00:02:37.20 shown in yellow or orange here.
00:02:40.21 And the most common subfamily
00:02:43.18 is the family of Bantu languages.
00:02:47.06 Now, those are thought to have originated in Cameroon or Nigeria
00:02:51.02 around 5,000 years ago,
00:02:53.18 together with the development of iron tool technology,
00:02:56.26 which led to much better methods for practicing agriculture.
00:03:02.27 And so these populations
00:03:04.25 had a technological advantage in a sense,
00:03:07.20 and they were able to rapidly
00:03:09.29 expand across Africa into east Africa
00:03:13.12 and then southern Africa,
00:03:15.14 or from west Africa
00:03:20.18 along the western coast into southern Africa.
00:03:24.14 The fourth language family, shown in green here,
00:03:28.05 is classified as Khoisan,
00:03:30.17 and it consists of languages that have click consonants.
00:03:34.03 So these are found predominantly
00:03:36.20 amongst the San hunter-gatherers in southern Africa,
00:03:40.24 and also amongst two groups called the Hadza and the Sandawe,
00:03:45.08 who live in Tanzania.
00:03:47.18 Now, despite the importance of studying Africa,
00:03:50.23 there have been relatively few genomics studies in that region,
00:03:54.05 and there's a number of reasons for that,
00:03:56.04 and one of which is just the challenges of
00:03:58.16 doing research in areas that sometimes
00:04:00.22 have little infrastructure.
00:04:02.25 And so I wanted to show you some examples of
00:04:05.21 the field work that we've done over the past 12 years.
00:04:08.25 We've mainly been studying
00:04:10.18 minority populations in Africa
00:04:12.13 who practice indigenous lifestyles,
00:04:14.22 and they live in very remote areas,
00:04:16.14 so we have to, for example, have a 4-wheel drive vehicle,
00:04:21.03 and this work has been done not only by myself,
00:04:23.21 but by my students and postdocs
00:04:25.27 and African collaborators over many years.
00:04:30.22 So here's an example, I like this,
00:04:32.13 it shows my postdocs Alessia Ranciaro and Simon Thompson,
00:04:37.06 and they were doing an expedition in Ethiopia in 2010.
00:04:41.04 We basically have to bring all of our lab equipment with us,
00:04:44.28 and I like this because it shows both the outside perspective of the car,
00:04:48.04 and also the inside perspective.
00:04:51.25 These are some of the other challenges that they faced.
00:04:54.20 They were there during the wet season,
00:04:56.03 making it extremely challenging to travel.
00:05:02.01 In each of these regions,
00:05:03.19 we typically start by doing what you could think of as
00:05:06.05 "Town Hall meetings", in which we explain the research
00:05:09.04 to the community,
00:05:11.01 and we explain both the risks and the benefits,
00:05:12.23 and make sure that they understand
00:05:14.11 why we're doing this research,
00:05:16.00 and how it might benefit or not benefit the community.
00:05:19.00 Ultimately though,
00:05:20.25 we have to obtain individual informed consent
00:05:23.12 to do this research.
00:05:27.09 We also measured a number of phenotypes,
00:05:29.11 like height and weight.
00:05:32.12 More recently, we've been looking at more detailed
00:05:34.29 anthropometric, cardiovascular, and metabolic traits.
00:05:41.13 From each of these samples,
00:05:42.24 we typically obtain blood intravenously,
00:05:45.29 and we've started to also obtain RNA.
00:05:50.07 But one of the challenges is processing these samples
00:05:53.12 in regions where there's no electricity.
00:05:55.25 So here's an example where we set up the so-called
00:05:58.19 "Bush Lab": we had to set up our centrifuge
00:06:00.28 and hook it up to the car battery.
00:06:05.02 But in other areas, we can find a local clinic,
00:06:07.08 they'll often have a generator,
00:06:09.06 and so then we're able to hook up a larger centrifuge.
00:06:12.06 One of the ways in which we obtain DNA...
00:06:15.20 and the DNA, I should note,
00:06:17.08 is only present in the white cells of blood,
00:06:19.24 so the first thing we're gonna do is we're gonna
00:06:21.13 break open all the red cells.
00:06:23.27 And we do that by adding a solution
00:06:27.00 that's going to cause them to burst open.
00:06:29.25 Then we're going to spin down the samples in this centrifuge,
00:06:34.04 and we have to repeat this several times,
00:06:36.09 and we're gonna end up with these little pellets at the bottom
00:06:39.01 of the white cells, and that's where we're gonna find the DNA.
00:06:45.10 Here are some other challenges of processing in the field.
00:06:48.13 After we've isolated the DNA,
00:06:50.03 we add another buffer, which is going to
00:06:53.01 preserve the samples at room temperature,
00:06:55.21 but here's a case where Simon Thompson
00:06:57.15 actually had to bring a generator with him
00:06:59.26 and set up the entire lab in the bush
00:07:02.18 when he was studying the Hadza hunter-gatherers of Tanzania.
00:07:08.15 Another very important thing
00:07:11.07 is to increase training and capacity building in Africa
00:07:15.13 so that they can do this research themselves,
00:07:17.21 and that's something that I've spent a lot of time doing,
00:07:20.09 and I think is very important.
00:07:23.23 Also equally important is actually
00:07:25.02 returning results to participants,
00:07:28.01 and it's really surprising how little this is done,
00:07:31.07 but I can assure you that people
00:07:32.26 really appreciate it when we return the results,
00:07:36.06 and I think it's also an ethical obligation
00:07:38.20 so that they can benefit from what we learn from these studies.
00:07:44.13 So I want to start by talking about some of the phenotypic variation
00:07:47.10 that we see in Africa.
00:07:49.03 This is an example of skin melanin levels,
00:07:52.07 or skin pigmentation.
00:07:54.09 So, the higher the value here,
00:07:56.15 the darker the skin color.
00:07:59.01 And I wanna just make the point that
00:08:00.16 we see a lot of variation in skin pigmentation levels
00:08:04.27 across diverse Africans.
00:08:07.18 And one of the things that we're interested in looking at is
00:08:10.10 correlations with vitamin D for example,
00:08:12.10 because we know that vitamin D is produced by UV light,
00:08:16.21 and that people with darker skin
00:08:18.12 may produce less vitamin D, for example.
00:08:21.02 And vitamin D can have important health implications,
00:08:23.15 so this is relevant to know.
00:08:25.29 It's also an interesting trait to look at how people
00:08:28.09 have adapted to different environments.
00:08:31.21 Here are the results of a principal component analysis
00:08:34.17 for a number of cardiovascular traits,
00:08:37.11 and these are different populations.
00:08:40.27 If the populations cluster close to each other,
00:08:43.19 it means that they're very similar for these traits,
00:08:45.28 and we've color-coded them based on shared language and ethnicity.
00:08:50.26 And what's interesting is that they tend to cluster
00:08:53.04 based on language and culture.
00:08:55.12 So here are the Nilo-Saharan speakers,
00:08:57.06 here are the Afro-Asiatic speakers,
00:08:59.18 and in yellow here are the Niger-Kordofanian speakers,
00:09:04.08 but we see two exceptions.
00:09:06.13 These are two groups that live on the coast of Kenya,
00:09:09.00 in geographic proximity to the Bantu-speaking groups,
00:09:13.10 suggesting that not only are genetic factors important,
00:09:16.02 but environmental factors are probably quite important as well.
00:09:22.18 And here we can see tremendous variation
00:09:25.16 for height, weight, and BMI in Africa.
00:09:29.00 And again, we're seeing that
00:09:31.07 populations tend to cluster based on shared ethnicity,
00:09:35.02 and at the extremes
00:09:36.23 we have the very short statured pygmies from central Africa,
00:09:40.13 and then we have the very tall and thin individuals
00:09:43.27 from Kenya and other places... and the Sudan.
00:09:49.02 And so, as we'll talk about in the last section of my lecture series,
00:09:52.18 this may be due to adaptation to different environments.
00:09:58.17 So now I want to tell you about the patterns of
00:10:00.21 genetic variation and genetic structure in Africa,
00:10:04.19 and this is based on a paper that we published several years ago,
00:10:08.14 in which we looked at genome-wide variable markers,
00:10:13.21 and these were genotyped in over 2,500 Africans
00:10:17.06 from 121 ethnic groups
00:10:19.04 that are shown by these dots here.
00:10:21.12 But note that even though this was
00:10:23.06 more than had ever been done before,
00:10:25.16 it still represents just a fraction of the
00:10:27.16 2,000 ethnic groups in Africa,
00:10:30.06 so we're still missing a lot of the variation.
00:10:33.06 We then looked at 98 African-Americans
00:10:36.20 from 4 regions in the US
00:10:38.22 and a comparative dataset of about 1,500 non-Africans.
00:10:44.13 So let me first tell you about the levels of genetic variation that we saw,
00:10:48.05 and that's indicated by the height of this bar.
00:10:51.03 And I've color-coded this by geographic region,
00:10:53.20 so shown in orange are people from Africa,
00:10:57.01 and as nearly every study has shown,
00:10:59.11 Africans have the highest level of genetic variation.
00:11:02.27 And then we see decreasing variation
00:11:05.00 as we move west to east
00:11:07.00 across Eurasia into
00:11:09.06 East Asia, Oceania, and the Americas.
00:11:13.10 So the patterns of genetic diversity that we're seeing
00:11:16.10 simply reflect our evolutionary and demographic history.
00:11:20.10 We see the highest levels of diversity in Africa,
00:11:22.20 which is the site of origin of modern humans,
00:11:25.11 and then when small groups of people
00:11:27.23 migrated out of Africa within the past 50,000-100,000 years,
00:11:32.03 there was a population bottleneck,
00:11:34.06 and so we see a decrease in genetic diversity.
00:11:37.26 And as humans migrated west to east across Eurasia
00:11:41.07 and into the Americas
00:11:43.00 and into Oceania and so on,
00:11:45.01 there were a series of founding events and again,
00:11:47.16 a concomitant decrease in genetic diversity.
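The series of founding events can be sketched with the standard expectation that heterozygosity shrinks by a factor of (1 - 1/(2N)) per generation in a population of N diploid individuals. This is a simplified illustration: the starting diversity, number of founders, and number of generations are made-up values, and the function names are hypothetical.

```python
def after_bottleneck(h, founders, generations):
    """Expected heterozygosity after `generations` at a small size `founders`."""
    return h * (1 - 1 / (2 * founders)) ** generations

def serial_founder(h0, n_events, founders, generations):
    """Heterozygosity after each of a series of founding events."""
    values = [h0]
    for _ in range(n_events):
        values.append(after_bottleneck(values[-1], founders, generations))
    return values

# Illustrative values only: five successive founding events.
trajectory = serial_founder(0.30, n_events=5, founders=100, generations=10)

for step, h in enumerate(trajectory):
    print(f"after founding event {step}: heterozygosity {h:.4f}")
```

Each founding event removes a little more diversity, so populations further along the migration route end up with lower diversity than the source population, matching the west-to-east decline described here.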
00:11:51.21 So this is a phylogenetic tree
00:11:54.01 constructed based on pair-wise genetic distances
00:11:56.14 between populations.
00:11:58.07 You can't see any details on this tree,
00:12:00.13 I just want to point out some overall trends.
00:12:02.27 And I've color-coded these such that
00:12:05.24 the populations shown in black,
00:12:08.21 the black branches, are non-Africans,
00:12:12.11 and then the Africans are shown here.
00:12:14.23 So the first thing that you can see from this tree
00:12:16.27 is that non-Africans are distinguished from Africans,
00:12:20.19 and that the non-African populations
00:12:22.19 are clustering by major geographic region.
00:12:25.25 So we have people from India, central Asia, Europe,
00:12:29.07 Middle East, east Asia, and the Americas,
00:12:34.01 and then Oceania.
00:12:36.10 And even within Africa,
00:12:38.13 populations are clustering by major geographic region,
00:12:41.22 so here are populations from the north of Africa,
00:12:44.02 from eastern Africa,
00:12:45.15 from west-central Africa,
00:12:47.17 and then from southern Africa,
00:12:49.10 with one exception:
00:12:51.25 down here, at the root of this tree,
00:12:54.08 we see the San hunter-gatherers from southern Africa,
00:12:58.22 but clustering near the San are the pygmies,
00:13:01.17 who today live in central Africa.
00:13:04.08 And that's really intriguing and maybe telling us something
00:13:06.16 about the history of these populations,
00:13:08.24 and I'll discuss that more in a moment.
00:13:13.14 Now, we can also compare genetic distances,
00:13:17.02 which are shown on the y-axis,
00:13:19.10 to geographic distances between pairs of populations,
00:13:22.20 shown on the x-axis.
00:13:24.28 And we see a significant positive correlation,
00:13:28.19 but we can also see a lot of scatter here.
00:13:32.01 And what that means is that there are some populations
00:13:34.26 that are geographically very close,
00:13:38.17 but they're genetically very different,
00:13:41.17 and those probably represent recent migration events
00:13:44.24 of genetically differentiated populations.
00:13:47.19 And then on the other end of the spectrum,
00:13:50.02 we have some populations that are genetically very similar to each other,
00:13:53.27 but geographically very far apart.
00:13:56.21 And those may reflect, for example,
00:13:58.24 the Bantu people, who migrated from western Africa
00:14:02.03 to eastern and southern Africa,
00:14:03.18 so they've gone quite a long geographic distance,
00:14:07.03 but genetically they're still very similar to each other.
00:14:11.07 So now I want to move away from looking at populations
00:14:13.28 and I want to talk about looking at variation amongst individuals.
00:14:18.20 And the first thing I want to show you is
00:14:20.28 a principal component analysis based on individual genotypes.
00:14:25.07 And so each of these circles
00:14:28.10 actually represents a person,
00:14:30.16 and if they cluster together
00:14:32.21 it means that they're genetically similar to each other.
00:14:35.22 So, as shown here, the first principal component
00:14:38.15 accounts for as much of the variability in the data as possible,
00:14:42.08 and each succeeding component
00:14:44.08 accounts for as much of the remaining variability as possible.
00:14:47.28 So the first principal component
00:14:50.09 essentially is differentiating
00:14:52.24 the African groups
00:14:55.04 from the non-African groups.
00:14:57.14 The second principal component
00:14:59.23 is differentiating the Native Americans,
00:15:03.05 Eastern Asians,
00:15:04.20 and Oceanian populations
00:15:06.12 from the rest of the world.
00:15:07.26 And the third principal component
00:15:09.27 is distinguishing the Hadza hunter-gatherers from Tanzania
00:15:13.11 from the rest of the world.
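A minimal sketch of principal component analysis on genotypes, assuming each individual is a row and each marker is a column coded as 0, 1, or 2 copies of an allele. It uses NumPy's singular value decomposition; the toy genotype matrix is made up, and real analyses typically also standardize each marker and use many thousands of markers.

```python
import numpy as np

def pca(genotypes, n_components=2):
    """Return (scores, explained variance ratio) for an individuals x SNPs matrix."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                      # centre each SNP column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)       # sorted from largest to smallest
    return scores, explained[:n_components]

# Toy genotypes (made up): the first three and last three individuals
# come from two hypothetical, genetically distinct groups.
genotypes = [
    [0, 0, 1, 2, 2, 1],
    [0, 1, 1, 2, 2, 1],
    [0, 0, 0, 2, 1, 1],
    [2, 2, 1, 0, 0, 1],
    [2, 1, 2, 0, 0, 1],
    [2, 2, 2, 0, 1, 1],
]

scores, explained = pca(genotypes)
for i, (pc1, pc2) in enumerate(scores):
    print(f"individual {i}: PC1 = {pc1:+.2f}, PC2 = {pc2:+.2f}")
```

In this toy example the first component captures most of the variation and separates the two groups, while the second component picks up the smaller differences that remain, in the same way that each succeeding component in the talk separates a further set of populations.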
00:15:15.12 This next result is based on a probabilistic analysis
00:15:20.24 that simultaneously infers ancestral population clusters,
00:15:26.11 which are represented by the different colors shown here,
00:15:29.27 and then we have...
00:15:31.28 this is actually composed of a series of lines,
00:15:34.21 and each line represents an individual.
00:15:37.18 And an individual can have mixed ancestry
00:15:42.04 from different ancestral population clusters.
00:15:45.18 So what we tend to see outside of Africa,
00:15:48.03 which is shown along the bottom here,
00:15:50.10 is that individuals are clustering
00:15:52.05 by major geographic region.
00:15:54.08 So, in blue we have individuals
00:15:56.26 who self-identify as European or Middle Eastern,
00:16:00.27 and then here we have individuals from southern India,
00:16:04.25 here we have individuals from Pakistan,
00:16:08.09 central Asia,
00:16:09.27 east Asia,
00:16:11.03 Oceania,
00:16:12.29 and the Americas.
00:16:14.27 But what I want you to note is all the colors
00:16:17.27 that we see here in Africa.
00:16:20.22 That's representing the very large amount of genetic diversity,
00:16:24.15 not only within,
00:16:26.11 but among African populations,
00:16:28.15 compared to the whole rest of the globe.
00:16:31.20 I'll just point out a couple of trends.
00:16:35.10 In orange colors are populations from central and west Africa
00:16:38.27 who speak Niger-Kordofanian and Bantu languages.
00:16:43.09 In purple are populations
00:16:45.17 that speak Afro-Asiatic languages
00:16:47.21 and originated from northern or northeast Africa.
00:16:51.26 In red are populations that speak Nilo-Saharan languages
00:16:55.23 and they most likely originated from southern Sudan.
00:17:01.11 We have populations that are speaking Chadic languages,
00:17:05.16 a group called the Fulani who are nomadic pastoralists.
00:17:08.28 Most of the north Africans
00:17:10.27 have a lot of European or Middle Eastern admixture.
00:17:14.23 And then we have the hunter-gatherer groups,
00:17:16.15 like the Hadza,
00:17:18.09 the Sandawe,
00:17:19.23 pygmies from central Africa,
00:17:21.22 and the San hunter-gatherers from southern Africa.
00:17:26.08 Now, we repeated this analysis within Africa,
00:17:30.01 and again we inferred 14 ancestral population clusters,
00:17:34.15 but for ease of viewing I'm just going to pool individuals together
00:17:37.20 and show them as pie charts.
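Pooling individuals into a pie chart amounts to averaging each individual's ancestry fractions within a population. Here is a minimal sketch of that step, assuming the per-individual fractions have already been inferred; the population labels, cluster names, and values are hypothetical.

```python
from collections import defaultdict

def pool_by_population(individuals):
    """Average per-individual ancestry fractions within each population."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for population, fractions in individuals:
        counts[population] += 1
        for cluster, value in fractions.items():
            sums[population][cluster] += value
    return {
        pop: {cluster: total / counts[pop] for cluster, total in clusters.items()}
        for pop, clusters in sums.items()
    }

# Hypothetical individuals: (population label, ancestry fractions summing to 1).
individuals = [
    ("PopA", {"cluster1": 0.9, "cluster2": 0.1}),
    ("PopA", {"cluster1": 0.7, "cluster2": 0.3}),
    ("PopB", {"cluster1": 0.2, "cluster2": 0.8}),
]

pies = pool_by_population(individuals)
for pop, slices in pies.items():
    print(pop, {cluster: round(share, 2) for cluster, share in slices.items()})
```

Each resulting dictionary gives the slice sizes for one pie chart. Averaging is convenient for display, but it hides how much individuals within a population differ from one another.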
00:17:39.25 Now, first I'm showing you the 3 populations
00:17:42.13 that had been studied as part of the
00:17:44.07 HapMap and Thousand Genomes Initiative.
00:17:47.06 These are NIH-funded programs
00:17:50.22 to characterize genetic variation
00:17:52.28 across ethnically diverse human populations
00:17:56.02 and to make that data publicly available
00:17:58.02 so that it could be used by a wide range of
00:18:00.16 biomedical research scientists.
00:18:04.00 Now, what I want to point out is that
00:18:05.15 when we look at the rest of Africa,
00:18:08.15 we see quite a bit more variation.
00:18:11.27 And so, for example, populations in east Africa
00:18:15.08 look distinct from populations in western Africa,
00:18:19.25 northern,
00:18:21.04 and southern Africa.
00:18:23.00 It's also interesting
00:18:24.16 because we can see remnants of historic migration events.
00:18:27.11 So for example, I mentioned to you the Bantu migration.
00:18:30.18 The people who speak Niger-Kordofanian or Bantu languages
00:18:33.23 are represented by shades of orange,
00:18:35.26 and you can actually see that they appear
00:18:38.13 to have originated, as I said,
00:18:40.11 in the Cameroon/Nigeria region,
00:18:43.03 and then they migrated
00:18:45.10 across Africa into eastern Africa,
00:18:48.03 where they admixed with the indigenous populations there,
00:18:51.19 and they also migrated into southern Africa,
00:18:54.05 where they admixed with the populations there.
00:18:57.05 We can also see remnants of migration of individuals
00:19:01.08 from northeast Africa who speak Afro-Asiatic languages
00:19:04.28 into Kenya and Tanzania.
00:19:07.22 We see migration of people who speak Nilo-Saharan languages,
00:19:11.20 originating from southern Sudan.
00:19:13.11 There was one group that went west,
00:19:16.07 and we think that some of these people who speak Chadic languages,
00:19:20.19 which are actually classified as Afro-Asiatic,
00:19:22.25 genetically they look very similar to the Nilo-Saharans.
00:19:26.02 So in fact there may have been a language substitution
00:19:28.15 at some point in the past.
00:19:30.23 And then we have migration of the Nilo-Saharan pastoralists
00:19:34.08 into Kenya and into Tanzania.
00:19:38.23 We can also see that some of the hunter-gatherer groups are very distinct.
00:19:42.27 Here are the Hadza hunter-gatherers, who speak with a click in Tanzania.
00:19:47.08 Here are the Sandawe, who speak with a click, also in Tanzania,
00:19:50.04 but their languages are very divergent from each other.
00:19:53.14 Here are the San hunter-gatherers shown in light green,
00:19:56.05 also speaking with a click, but again,
00:19:58.06 their languages are very differentiated
00:20:00.06 from the other two populations who speak with clicks in Tanzania.
00:20:05.08 And then we have the pygmy populations from central Africa.
00:20:10.13 Interestingly, the pygmy population called Mbuti,
00:20:14.09 who lives the furthest to the east,
00:20:16.26 appears to possibly share some common ancestry with the San.
00:20:22.01 And in fact several pieces of data that we've studied
00:20:26.15 suggest that there could have been a
00:20:28.16 proto Khoesan-Pygmy hunter-gatherer population in Africa
00:20:32.16 that probably existed greater than 50,000 years ago,
00:20:36.01 and then underwent population divergence and differentiation
00:20:40.06 and then migration within the past 50,000 years,
00:20:43.22 but there's still a lot of work to be done
00:20:45.14 to try to differentiate this population history.
00:20:48.23 So next I wanna talk about what we found
00:20:51.05 in terms of African American ancestry.
00:20:53.22 We looked at African Americans
00:20:55.21 originating from four regions in the US:
00:20:58.15 Chicago, Pittsburgh, Baltimore, and North Carolina.
00:21:02.05 Now, not surprisingly, you can see that the majority of ancestry
00:21:06.17 is this western Niger-Kordofanian ancestry,
00:21:10.05 shown in orange.
00:21:12.08 The other major component of their ancestry,
00:21:14.21 which is summarized here, is European ancestry,
00:21:17.19 which ranges from about 0% to greater than 50%.
00:21:22.18 And then we see small amounts of ancestry from other populations,
00:21:25.22 including some other African populations
00:21:29.18 who speak Chadic languages, for example,
00:21:33.09 from western Africa.
00:21:34.27 We see a small amount of ancestry from east Africa,
00:21:37.25 and also very small amounts of
00:21:40.07 east Asian and Native American ancestry,
00:21:43.02 at least in these particular populations.
00:21:45.23 If you look at populations from other regions,
00:21:48.17 you may see more ancestry from those regions.
00:21:54.02 And again, this is reflecting the history of the transatlantic slave trade,
00:22:00.15 originating from west Africa,
00:22:03.10 and actually a very large source of the slave trade
00:22:05.29 was from Angola,
00:22:07.24 and we currently know very little about genetic variation in that region.
00:22:11.12 And that's going to be important to know
00:22:13.28 for some studies in which knowing variation
00:22:17.29 in African ancestral populations will be important
00:22:20.15 for identifying disease risk alleles
00:22:23.28 in African American or Afro-Caribbean populations.
00:22:28.28 I want to tell you about another study that I did with collaborators,
00:22:32.18 in which we looked at
00:22:35.22 over 250,000 single nucleotide polymorphisms, or SNPs.
00:22:41.11 These are just regions of the genome
00:22:43.18 that differ at a single nucleotide,
00:22:46.24 and we looked at them predominantly
00:22:49.12 in western populations along the coast of Africa,
00:22:54.05 and one group from southern Africa.
00:22:57.10 And when we do this principal component analysis,
00:22:59.22 one of the interesting results
00:23:02.07 is that the distribution really reflects the geography of these populations,
00:23:08.02 and that's not a huge surprise.
00:23:09.29 It means that people who live near each other
00:23:12.06 tend to mate with each other,
00:23:13.28 and people who live further apart are not intermixing as often,
00:23:17.27 and so they tend to be more genetically differentiated.
00:23:23.23 We then did a principal component analysis
00:23:26.25 including the African American individuals,
00:23:30.08 shown here in sort of fuchsia color,
00:23:33.29 and shown in red are Europeans,
00:23:37.03 and then here we have the different west African populations.
00:23:42.01 And we could actually determine
00:23:44.00 the amount of European or African ancestry in any individual
00:23:49.06 -- African American individual --
00:23:51.19 by looking at their position along principal component 1.
00:23:56.05 So for example, this individual here,
00:23:58.24 this African American individual,
00:24:00.29 appears to have more European ancestry,
00:24:03.17 whereas this African American individual
00:24:06.04 seems to have more west African ancestry.
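The idea of reading ancestry from a position along principal component 1 can be sketched as a simple linear interpolation between two reference clusters. This is only an illustration: the function name, the use of cluster centroids, and the clamping to the 0-1 range are assumptions, not the procedure used in the study.

```python
def european_fraction_from_pc1(pc1, african_centroid, european_centroid):
    """Rough European ancestry fraction from a PC1 coordinate.

    Assumes (hypothetically) that PC1 separates the two reference groups
    and that an admixed individual falls between their centroids in
    proportion to ancestry.
    """
    span = european_centroid - african_centroid
    if span == 0:
        raise ValueError("reference centroids must differ")
    fraction = (pc1 - african_centroid) / span
    # Clamp: sampling noise can place a point slightly outside the references.
    return min(1.0, max(0.0, fraction))
```

An individual sitting a quarter of the way from the west African cluster to the European cluster would be assigned roughly 25% European ancestry under these assumptions.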
00:24:11.08 And then, using an approach that was developed by Carlos Bustamante's lab,
00:24:16.15 it was possible to actually scan along chromosomes,
00:24:19.16 so here we're showing
00:24:22.22 the different chromosomes starting at 22, 21, 20,
00:24:25.25 and so on down to chromosome 1.
00:24:28.12 And as you scan along the chromosome,
00:24:30.04 at any particular region,
00:24:32.11 you can infer if somebody has African ancestry,
00:24:36.25 which is shown in blue,
00:24:39.12 European ancestry, which is shown in red,
00:24:43.15 or mixed ancestry, which is shown in green.
00:24:47.27 And what we see is that most African Americans
00:24:50.23 have a mixture of ancestry.
00:24:53.01 So they tend to have a lot of,
00:24:54.16 not surprisingly, African ancestry shown in blue.
00:24:58.03 There are regions of mixed ancestry shown in green,
00:25:01.13 but also note that there are some regions of the genome
00:25:04.20 which are only of European ancestry,
00:25:07.26 and this differs quite a bit amongst different individuals.
00:25:10.11 Here's an example of someone who appears
00:25:12.13 to have undergone very recent admixture;
00:25:16.08 they have a lot of African ancestry.
00:25:19.27 Here's someone who has very recent European ancestry,
00:25:23.01 so we see a lot of regions of the genome
00:25:24.24 where they're of mixed ancestry.
00:25:27.20 Here's someone who has...
00:25:29.24 they self-identify as African American,
00:25:31.27 but they have almost no African ancestry,
00:25:34.24 so that goes to show you that there can be a lot of genetic variation
00:25:38.00 that may not always correlate with self-identified ethnicity.
00:25:42.29 The other important point here is that,
00:25:45.29 in the future,
00:25:48.25 the ideal that we have is to develop
00:25:51.02 more personalized medicine
00:25:53.27 that is tailored for the individual.
00:25:56.14 And here's someone that, for example,
00:25:58.16 if they went to the doctor and they self-identified
00:26:00.22 as African American,
00:26:02.20 the doctor might prescribe certain drugs that, say,
00:26:04.26 are more effective in African Americans.
00:26:07.11 But what if, at that particular position,
00:26:09.27 where they have only European ancestry,
00:26:12.12 what if there was a drug metabolizing enzyme gene
00:26:16.01 at that particular point,
00:26:18.27 and so that would be of pure European ancestry,
00:26:21.24 and so that might be important to know.
00:26:24.00 So this has important implications for
00:26:26.00 the design of future personalized medical approaches for treatment.
00:26:32.25 So in conclusion, people from the same geographic region
00:26:35.29 tend to be genetically more similar to each other,
00:26:38.08 so for example, Asian individuals
00:26:40.15 will be more similar to other Asian individuals,
00:26:43.02 Europeans more similar to other Europeans.
00:26:46.02 But in Africa,
00:26:47.25 there has been more time to accumulate genetic variation,
00:26:50.29 they've had larger effective population sizes
00:26:53.18 so they've maintained a lot of variation,
00:26:55.26 and they've lived in diverse environments,
00:26:58.05 so they tend to be highly differentiated from each other,
00:27:01.05 although we also can see that
00:27:03.16 there's been a history of admixture throughout much of Africa.
00:27:07.29 So therefore, Africans have the highest level of genetic variation,
00:27:12.16 both within and between populations,
00:27:15.01 and we saw that African Americans
00:27:17.05 have ancestry from west Africa and Europe,
00:27:19.21 and that the ancestry varies along chromosomes,
00:27:22.03 which has important implications for personalized medicine.
00:27:26.18 And that concludes this portion of my lecture,
00:27:28.26 and for this section I'd like to acknowledge
00:27:30.23 the many individuals who contributed,
00:27:34.28 together with our funding organizations.
African Genomics: Natural Selection
Concepts: Natural selection in humans, evolution of lactose tolerance
00:00:07.17 For the last part of my lecture series,
00:00:10.11 I wanna talk about examples of natural selection in humans,
00:00:14.29 and the two particular examples
00:00:17.01 that I'm going to be talking about
00:00:19.00 are the evolution of lactose tolerance in east Africa,
00:00:22.10 and of Pygmy short stature.
00:00:25.04 So if we're going to be talking about natural selection,
00:00:27.11 we have to first of course
00:00:28.29 acknowledge Charles Darwin,
00:00:31.12 who came up with the theory of natural selection.
00:00:36.10 In fact, to quote from Darwin, he said,
00:00:39.13 "This preservation of favourable variations
00:00:42.03 and the rejection of injurious variations,
00:00:44.21 I call Natural Selection.
00:00:47.06 Variations neither useful nor injurious
00:00:49.27 would not be affected by natural selection,
00:00:52.18 and would be left a fluctuating element,
00:00:55.01 as perhaps we see in the species called polymorphic."
00:00:58.10 And that was from his classic book
00:01:00.06 On The Origin of Species,
00:01:01.24 published in 1859,
00:01:03.25 and you might recognize from our first lecture,
00:01:07.03 that this is really talking about genetic drift,
00:01:09.27 random fluctuations.
00:01:13.13 However, part of the evolutionary change that we see
00:01:18.12 is not just going to be due to random genetic drift,
00:01:21.03 it's also going to be due to natural selection.
00:01:24.18 And so, according to that theory,
00:01:27.10 natural variation exists and is heritable,
00:01:30.02 more organisms are born than can survive,
00:01:32.11 and therefore organisms best suited to the environment
00:01:35.07 survive more often,
00:01:36.25 and slight differences can accumulate in a species over time.
00:01:40.24 So this is the idea of gradual evolution of a species
00:01:43.27 by natural selection.
00:01:45.18 And this is Huxley,
00:01:47.03 who was also known as Darwin's bulldog
00:01:50.11 because he was the big proponent of his theory,
00:01:52.17 and he said,
00:01:54.05 "How extremely stupid not to have thought of that!"
00:01:57.12 So when Darwin first came up with his theory of natural selection,
00:02:01.08 there was really no concept of genetics
00:02:04.12 as we know it today.
00:02:06.02 In fact, it wasn't until the late 1800s
00:02:08.12 that Mendel proposed his theory of genetics.
00:02:13.02 So in the 1930s and 1940s
00:02:15.22 there was sort of a synthesis of natural selection
00:02:19.09 and genetics and mathematics,
00:02:22.16 population genetics,
00:02:23.25 and at that time it was proposed that genetic variation in populations
00:02:27.05 arises by chance through mutation and recombination,
00:02:31.15 that evolution consists primarily of changes in the
00:02:34.12 frequencies of alleles between one generation and another,
00:02:37.25 largely as a result of genetic drift,
00:02:40.24 gene flow,
00:02:42.05 and natural selection.
00:02:43.27 And that speciation occurs gradually when populations
00:02:46.00 are reproductively isolated, for example,
00:02:48.20 by geographic barriers.
00:02:52.14 And so if we look at this timeline,
00:02:54.12 starting with the Origin of Species,
00:02:56.21 and then Mendelian inheritance
00:02:59.04 is actually rediscovered in 1900,
00:03:01.19 it was first proposed in the 1860s,
00:03:04.01 but very few people knew about it at that time.
00:03:06.27 And then in the early 1900s
00:03:09.09 we have the theoretical foundations of population genetics
00:03:12.21 and then, as I mentioned,
00:03:14.09 the modern synthesis in the 30s.
00:03:16.19 And then in the 70s we have Kimura's theory of neutral evolution,
00:03:21.19 which was proposing that most changes and speciation events
00:03:25.07 are simply due to random genetic drift
00:03:27.22 and to new mutation events.
00:03:29.20 And I think that today we would say
00:03:31.14 it's a combination of all of the above.
00:03:33.19 There's certainly a lot of genetic drift that occurs,
00:03:36.02 but we know that natural selection
00:03:37.29 is having a very important influence
00:03:40.26 on the variation that we see
00:03:43.05 in terms of phenotypic variation and even disease susceptibility.
00:03:48.04 So let's look at what happens
00:03:49.08 when a neutral mutation occurs in a population,
00:03:52.07 as indicated by this individual in green.
00:03:55.25 Let's look at what happens as we proceed forward in generations,
00:03:59.08 and you can see there's not too many changes
00:04:01.21 in allele frequency.
00:04:03.29 But what happens when we have a beneficial mutation,
00:04:09.05 which means that it increases the fitness of the individual,
00:04:12.20 meaning that they're more likely to produce children,
00:04:16.12 and their children are more likely to produce more children,
00:04:19.00 and so on and so forth.
00:04:21.15 And so we can see that each generation,
00:04:24.04 this beneficial mutation is going to spread,
00:04:27.22 until eventually it may be nearly fixed
00:04:32.06 in the population.
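The contrast between a neutral and a beneficial mutation can be explored with a small simulation. The sketch below assumes a simple haploid Wright-Fisher model with a constant population size and a selection coefficient `s`; these modelling choices and all names are illustrative assumptions, not something specified in the lecture.

```python
import random


def expected_next_freq(p, s):
    """Allele frequency after selection alone (haploid model, advantage s)."""
    return p * (1.0 + s) / (1.0 + p * s)


def simulate_allele(pop_size, s, generations, seed=None):
    """Follow one new mutation forward in time under drift plus selection.

    s = 0 gives a neutral mutation; s > 0 gives a beneficial one.
    Returns the list of allele frequencies, one per generation.
    """
    rng = random.Random(seed)
    freq = 1.0 / pop_size  # the mutation starts as a single copy
    trajectory = [freq]
    for _generation in range(generations):
        p_sel = expected_next_freq(freq, s)
        # Drift: every gene copy in the next generation is sampled at random.
        copies = sum(1 for _copy in range(pop_size) if rng.random() < p_sel)
        freq = copies / pop_size
        trajectory.append(freq)
    return trajectory
```

Running `simulate_allele(1000, 0.0, 200)` many times and comparing with `simulate_allele(1000, 0.05, 200)` shows the pattern described here: most neutral mutations stay rare or are lost, while a beneficial mutation that survives its first few generations tends to rise toward fixation.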
00:04:34.17 So I want to tell you about some of our studies
00:04:37.20 focused in African populations
00:04:39.14 in which we're trying to identify
00:04:41.02 genetic signatures of natural selection,
00:04:43.19 and regions of the genome that are targets of natural selection.
00:04:47.29 And this is important
00:04:49.22 because it's thought that mutations associated with diseases
00:04:52.16 in modern populations,
00:04:54.13 like hypertension,
00:04:56.03 diabetes,
00:04:57.08 obesity,
00:04:58.10 and asthma,
00:04:59.08 may have been selectively advantageous or adaptive
00:05:01.23 in past hunter-gatherer environments.
00:05:04.04 So if we can identify these regions
00:05:06.25 that are targets of selection, or actual variable sites
00:05:09.18 that are targets of selection,
00:05:11.16 those may be functionally important
00:05:13.14 and may give us a clue about disease risk.
00:05:16.11 So here I'm showing you a few of the populations
00:05:18.12 that we've studied in Africa,
00:05:20.23 and we have people who are living at very different climates,
00:05:23.15 high altitude, low altitude,
00:05:26.03 savannah, and tropical environments, for example.
00:05:30.10 We have people who have very different diets,
00:05:32.22 so agriculturalists,
00:05:34.08 hunter-gatherers,
00:05:35.23 or pastoralists.
00:05:37.14 And they have very different infectious disease exposures,
00:05:40.05 so they've likely undergone local adaptation
00:05:42.13 to different environments.
00:05:45.25 And I'm going to, as I mentioned,
00:05:47.16 tell you about two examples today.
00:05:49.15 The first one is the evolution of lactose tolerance
00:05:51.29 in east African pastoralist populations.
00:05:57.07 So, the ability to digest the sugar lactose,
00:06:00.21 which is quite common in milk,
00:06:03.07 is due to an enzyme called lactase-phlorizin hydrolase,
00:06:07.15 also known as lactase for short.
00:06:09.28 And lactase is expressed specifically
00:06:13.01 in the brush border cells of the small intestine,
00:06:16.25 and in individuals who maintain high levels of this enzyme
00:06:20.29 as adults,
00:06:22.23 they're able to break down the complex sugar lactose
00:06:26.16 into glucose and galactose,
00:06:29.14 which is rapidly taken up into the bloodstream.
00:06:32.23 However,
00:06:35.19 most mammals, and most humans,
00:06:38.19 shut down lactase activity
00:06:40.23 shortly after weaning.
00:06:42.28 So, as adults, they do not have an active form of this enzyme.
00:06:46.24 And what's going to happen is
00:06:48.19 they're not going to be able to break down that complex sugar.
00:06:51.26 It's going to go down into the lower gut,
00:06:54.13 it's going to be attacked by bacteria,
00:06:56.25 and you're going to have severe intestinal distress.
00:07:01.00 Now, it has been noted for many years by anthropologists
00:07:04.27 that there is a very strong correlation
00:07:06.27 between the lactose tolerance trait,
00:07:09.19 or you could think of it also as the lactase persistence trait,
00:07:13.26 because there's persistence of the enzyme activity as adults.
00:07:18.15 And they've seen a strong correlation
00:07:20.20 between the prevalence of that trait
00:07:23.14 and populations that traditionally practice cattle domestication
00:07:28.04 and dairying.
00:07:30.05 So for example, this trait is most common in northern Europe,
00:07:33.19 it decreases in frequency as one moves
00:07:36.23 into southern Europe
00:07:38.29 and into the Middle East.
00:07:40.24 It's very uncommon in eastern Asia
00:07:43.26 and in the Americas,
00:07:46.13 and it's uncommon in western Africa,
00:07:48.26 which is one of the reasons that we see high levels
00:07:51.17 of lactose intolerance in African Americans, for example.
00:07:55.24 But in regions of Africa where there's a high prevalence
00:07:59.04 of cattle domestication, pastoralism, and dairying,
00:08:03.14 we see a high prevalence of this trait.
00:08:07.15 So, in 2002,
00:08:10.18 there was an elegant study done
00:08:12.20 by Leena Peltonen's group in Finland,
00:08:14.28 in which they identified a genetic mutation
00:08:17.08 that regulates lactose tolerance in Europeans.
00:08:20.20 And it was located near the...
00:08:23.25 upstream of the lactase gene.
00:08:26.01 When we sequenced that region in east African pastoralists,
00:08:29.04 they didn't have it,
00:08:31.10 so we knew they must have something else.
00:08:33.13 So in order to identify those mutations,
00:08:35.21 we did something that's called a lactose tolerance test.
00:08:38.29 So, basically what we do is
00:08:42.11 we give people the sugar lactose in a powdered form,
00:08:46.20 we add water, and it basically tastes like orange Kool-Aid,
00:08:51.09 and then we have to line people up
00:08:54.17 and have them drink the lactose at the same time.
00:08:57.27 This is a group of Maasai women from Tanzania.
00:09:03.28 This is a group of pastoralists from southern Ethiopia.
00:09:11.23 And then we can use a standard diabetes monitoring kit,
00:09:16.03 and what we can do is to measure the blood glucose,
00:09:19.29 starting at baseline before they drink the lactose,
00:09:23.29 and then every 20 minutes we're gonna measure this,
00:09:27.06 over a period of about an hour.
00:09:30.02 And then we're gonna look at the maximum rise
00:09:32.25 in blood glucose.
00:09:35.15 If individuals have a rise
00:09:37.09 that is greater than 1.7 millimolar (mM)
00:09:39.20 we consider them to be lactose tolerant,
00:09:42.20 or to have the lactase persistence trait,
00:09:45.03 shown in light blue.
00:09:47.07 And if they have a rise that is less than 1.1 mM,
00:09:51.12 they're considered to be intolerant,
00:09:53.25 shown in dark blue.
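The classification rule just described can be written down in a few lines. The two thresholds (1.7 mM and 1.1 mM) come from the lecture; the function name, the input layout, and the "intermediate" label for values between the thresholds are assumptions added for illustration.

```python
def classify_lactose_tolerance(baseline_mm, readings_mm):
    """Classify one person from blood glucose readings (all in mM).

    baseline_mm: glucose before drinking the lactose.
    readings_mm: later readings, e.g. every 20 minutes for about an hour.
    """
    if not readings_mm:
        raise ValueError("at least one post-baseline reading is required")
    max_rise = max(reading - baseline_mm for reading in readings_mm)
    if max_rise > 1.7:
        return "tolerant"       # lactase persistence
    if max_rise < 1.1:
        return "intolerant"
    return "intermediate"       # assumed label; not named in the lecture
```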
00:09:55.18 So, we measured this trait
00:09:57.12 in nearly 500 individuals
00:09:59.17 from Tanzania, Kenya, and the Sudan,
00:10:02.00 and then we looked for association
00:10:04.10 with genetic variation that we identified
00:10:06.21 by resequencing the region
00:10:08.28 where the European variant had been identified.
00:10:13.13 And in doing so we identified
00:10:15.12 three novel genetic polymorphisms
00:10:18.21 that are associated with the lactose tolerance trait in east Africa,
00:10:22.17 and those are shown here by the boxes.
00:10:26.29 The most common was this one at position 14010,
00:10:31.06 but we also saw those others
00:10:32.24 at positions 13915 and 13907,
00:10:36.03 located roughly 14,000 basepairs
00:10:38.25 upstream of the lactase gene
00:10:41.18 which is located on chromosome 2.
00:10:44.07 Now, one of the really interesting things about this is that,
00:10:48.11 one, these regulatory mutations were pretty far away,
00:10:51.28 about 14,000 basepairs from the gene,
00:10:54.26 and they were located in an intron
00:10:57.25 in a non-coding region of a neighboring gene called MCM6.
00:11:03.00 So this is demonstrating that
00:11:04.25 functionally important variation
00:11:07.13 can actually be located in non-coding regions,
00:11:10.21 and we were able to show,
00:11:13.13 using in vitro cell line studies,
00:11:16.20 that these variants that are derived,
00:11:20.19 shown in the different colors here,
00:11:23.22 that they regulate expression
00:11:26.15 of the lactase gene using the lactase promoter.
00:11:31.10 Now, they're located very close to the mutation
00:11:35.01 associated with lactose tolerance in Europeans,
00:11:38.20 located at position 13910,
00:11:41.14 but they arose independently
00:11:43.17 due to a process called convergent evolution,
00:11:46.13 and probably due to a very strong
00:11:48.27 selective force to be able to drink milk that contains lactose,
00:11:56.15 in these different regions of the world.
00:12:00.27 What's also interesting
00:12:02.19 is that the variants that we identified
00:12:04.16 have a very distinct geographic distribution.
00:12:07.09 So the one that we found that was most common in our study
00:12:10.06 was at position 14010,
00:12:12.08 and we can see that it is pretty localized
00:12:15.01 to east Africa, to Tanzania and Kenya,
00:12:17.26 and that's the most likely site of origin of that mutation.
00:12:21.11 Interestingly, we also see it a bit in south Africa,
00:12:26.01 probably reflecting migration of pastoralists
00:12:28.29 from east Africa into that region.
00:12:32.16 The variant at position 13915
00:12:35.08 appears to have originated in the Middle East,
00:12:37.21 and we could see that it was introduced into northeast Africa,
00:12:40.17 probably by migration.
00:12:43.10 And then the variant at position 13907
00:12:46.26 likely arose in northeast Africa.
00:12:49.21 But again, one of the important take-home points is that
00:12:53.02 we have a functionally important variant
00:12:55.07 that's occurring at high frequency, sometimes as high as 40%,
00:12:59.12 and it's very geographically restricted,
00:13:02.22 and there are likely to be other mutations like that,
00:13:05.06 some of which may have implications for disease susceptibility,
00:13:09.11 again emphasizing the importance
00:13:11.23 of looking among ethnically diverse Africans.
00:13:16.29 So the next thing we wanted to do
00:13:19.21 was to look for a signature of positive selection,
00:13:23.17 and this is the method in which we can do that.
00:13:27.21 So imagine, here in red,
00:13:30.25 imagine that this is a new mutation that has occurred, say,
00:13:34.15 one of the mutations associated with lactose tolerance.
00:13:38.00 And it's adaptive,
00:13:39.10 meaning that it increases the fitness of individuals who have it,
00:13:43.25 meaning that they're more likely to have children,
00:13:45.27 and their children are more likely to have children,
00:13:47.22 and so on.
00:13:49.28 And so it's going to increase in frequency
00:13:52.24 in the population,
00:13:54.29 and it's going to drag with it
00:13:57.15 the neighboring variants nearby.
00:14:00.03 So, you can see that when it originated, it had...
00:14:02.29 it was on a chromosome with a green variant
00:14:05.18 and a black variant.
00:14:08.03 And now these got dragged along to high frequency,
00:14:11.04 through a process known as hitchhiking.
00:14:14.07 Now, if this had gone to fixation,
00:14:17.12 meaning that everybody has it,
00:14:19.00 we would have called it a full selective sweep.
00:14:21.20 In this case, it hasn't quite reached a full selective sweep,
00:14:26.15 so we call it a partial sweep.
00:14:29.24 Now, that could just mean that
00:14:31.17 there hasn't been time for it to go to a full sweep,
00:14:33.06 or it could be that for some reason
00:14:35.17 there may be some negative aspects of having it,
00:14:38.02 and there's a reason that both variants are maintained in the population.
00:14:43.17 Now, after the sweep occurs,
00:14:45.19 you're going to have new mutation events
00:14:47.26 and new recombination events
00:14:49.21 shuffling up the variants
00:14:52.04 that are linked to the mutation that's adaptive.
00:14:56.12 And so that will decrease the association
00:15:01.00 observed between the mutation and the flanking variation.
00:15:05.00 And in fact,
00:15:06.15 if we have an estimate of the recombination rate,
00:15:08.27 we can use computational methods
00:15:10.24 to estimate how old this mutation is.
00:15:14.18 And that's exactly what we did here.
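To make the logic concrete: if recombination breaks the link between the adaptive mutation and a flanking marker at a constant rate, the fraction of mutation-carrying chromosomes that still carry the original flanking allele decays roughly exponentially with time. The sketch below inverts that decay. It is a simplified textbook-style estimate, not the computational method used in the study, and the 25-year generation time is an assumed default.

```python
import math


def estimate_allele_age(fraction_unrecombined, recomb_distance_morgans,
                        years_per_generation=25.0):
    """Rough age of a mutation from decay of linkage with a nearby marker.

    fraction_unrecombined: share of carrier chromosomes that still have the
        original flanking allele (between 0 and 1, exclusive of 0).
    recomb_distance_morgans: recombination distance to that marker.
    Returns (age in generations, age in years).
    """
    if not 0.0 < fraction_unrecombined <= 1.0:
        raise ValueError("fraction_unrecombined must be in (0, 1]")
    if recomb_distance_morgans <= 0.0:
        raise ValueError("recombination distance must be positive")
    # P(no recombination after t generations) is about exp(-r * t).
    generations = -math.log(fraction_unrecombined) / recomb_distance_morgans
    return generations, generations * years_per_generation
```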
00:15:17.12 So shown on top
00:15:19.28 is an example from the most common mutation
00:15:22.27 that we found associated with lactose tolerance,
00:15:25.03 at position 14010.
00:15:27.18 Individuals who have the C variant
00:15:29.26 are able to digest milk,
00:15:31.22 and individuals who are homozygous are shown as red.
00:15:35.28 And what we did is we genotyped markers
00:15:38.28 going a distance of about 3 million nucleotides,
00:15:42.26 and what we do is, if someone is homozygous,
00:15:46.20 we start at the lactose tolerance mutation,
00:15:49.02 and then we go to the next marker.
00:15:51.02 If they're homozygous,
00:15:53.00 then we continue going.
00:15:55.13 If they underwent a recombination,
00:15:57.05 we stop the line.
00:15:59.13 And what we can basically see is that homozygosity
00:16:02.28 extends about 2 million basepairs
00:16:06.01 on chromosomes that have the lactose tolerance mutation.
00:16:09.15 But if we look at chromosomes that have the ancestral variant,
00:16:14.00 they have almost no extended haplotype homozygosity.
00:16:19.07 And so this is a classic signature of a selective sweep.
00:16:22.25 It means that this variant
00:16:24.16 was under very strong positive selection
00:16:28.14 and it rapidly increased in frequency in the population,
00:16:32.04 dragging with it the neighboring variation.
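The marker-by-marker walk described above can be sketched as follows. The genotype coding (0, 1, or 2 copies of one allele, with 1 meaning heterozygous) and the function name are assumptions for illustration; the study's actual analysis may differ in detail.

```python
def homozygous_tract(genotypes, positions, core_index):
    """Extent of unbroken homozygosity around a core marker for one person.

    genotypes: allele counts per marker (0 or 2 = homozygous, 1 = heterozygous).
    positions: base-pair positions of the same markers, in ascending order.
    core_index: index of the core marker (e.g. the lactose tolerance site).
    Returns (start_position, end_position) of the homozygous tract.
    """
    def is_homozygous(genotype):
        return genotype in (0, 2)

    if not is_homozygous(genotypes[core_index]):
        raise ValueError("individual is not homozygous at the core marker")

    left = core_index
    while left > 0 and is_homozygous(genotypes[left - 1]):
        left -= 1            # keep going while still homozygous
    right = core_index
    while right < len(genotypes) - 1 and is_homozygous(genotypes[right + 1]):
        right += 1           # stop the line at the first heterozygous marker
    return positions[left], positions[right]
```

Long tracts on chromosomes carrying the derived variant, and short tracts on chromosomes carrying the ancestral one, are the pattern described here as the signature of a sweep.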
00:16:38.16 Now, here I'm showing the European variant,
00:16:41.17 in this case the T variant
00:16:43.20 is associated with lactose tolerance,
00:16:45.27 and it shows a very similar pattern.
00:16:50.22 So using computational approaches,
00:16:52.27 we were able to estimate the age of the African mutation
00:16:57.22 to be somewhere between about 3,000 and 7,000 years old.
00:17:01.28 These are the populations
00:17:03.25 that had the oldest age estimates,
00:17:06.08 and they include individuals
00:17:08.07 who speak Cushitic languages.
00:17:10.04 They came from Ethiopia,
00:17:12.10 and they practiced agro-pastoralism.
00:17:15.01 They came into Kenya and Tanzania
00:17:17.16 within the past 5,000 years.
00:17:20.23 And then we saw it at very high prevalence
00:17:23.26 and an old age estimate in Nilo-Saharan-speaking groups,
00:17:27.11 and these would include, for example, the Maasai.
00:17:30.09 Now, they came into the region more recently,
00:17:32.17 from southern Sudan,
00:17:34.08 within the past 3,000 years, so if I were to guess,
00:17:37.05 I would think perhaps this mutation
00:17:39.04 arose in the Cushitic speaking populations.
00:17:42.03 But regardless, it rapidly spread
00:17:45.07 to all of the populations in the area
00:17:47.21 because it was so selectively advantageous
00:17:51.22 and adaptive to have this mutation.
00:17:55.03 Now, because we see the correlation
00:17:59.18 between the practice of cattle domestication and pastoralism
00:18:04.11 and the rise of this mutation,
00:18:06.16 this is a really excellent example
00:18:08.22 of gene-culture co-evolution.
00:18:12.03 And in fact, what's really interesting is
00:18:15.01 that the date estimates that we came up with correlate really well
00:18:18.25 with the archaeological data,
00:18:20.17 which shows that cattle domestication
00:18:22.14 arose in the Middle East or north Africa
00:18:27.04 somewhere between 8,000 and 10,000 years ago,
00:18:29.26 and that corresponds with the age estimate for the European mutation,
00:18:33.18 which we inferred to be about 9,000 years old.
00:18:37.25 But cattle domestication was not introduced
00:18:40.25 south of the Sahara desert
00:18:44.20 until roughly 5,000 or 5,500 years ago,
00:18:48.21 correlating very well with the age estimate
00:18:52.03 for the mutation we found in eastern Africa.
00:18:54.24 And then it was introduced
00:18:56.13 much more recently into southern Africa.
00:19:00.12 But one could argue that perhaps Mendelian traits like lactose tolerance,
00:19:05.04 which are regulated by a single locus or gene of major effect,
00:19:10.23 are in a sense the low hanging fruit;
00:19:12.20 they're the easiest to identify.
00:19:15.04 So one thing that my lab is interested in doing
00:19:17.10 is looking at more complex traits,
00:19:19.23 and perhaps one of the most classic complex traits is height.
00:19:23.28 So, height is highly heritable,
00:19:26.19 genome wide association studies in tens of thousands of Europeans
00:19:30.12 have identified hundreds of loci,
00:19:33.06 each of very small effect,
00:19:35.06 and explaining only a very small proportion of the variation in height.
00:19:39.20 Now, interestingly, most of these are not part of
00:19:42.22 the growth hormone/IGF1 pathway,
00:19:45.07 which we know plays a very important role in idiopathic short stature,
00:19:49.16 for example.
00:19:53.05 Now, in Africa, we see some of the broadest distributions,
00:19:56.21 or ranges in height,
00:19:59.02 ranging from the very short statured Pygmies in central Africa,
00:20:03.28 and then we see some of the tallest individuals
00:20:07.13 in the Sudan and in eastern Africa.
00:20:10.23 And it's thought that these differences
00:20:12.14 may be partly due to adaptation
00:20:14.29 to different environments.
00:20:16.27 So what I want to tell you today is about
00:20:18.19 our genetic studies of short stature
00:20:22.01 in Pygmy populations from central Africa.
00:20:25.14 And, for you to fully understand and appreciate the work we've done,
00:20:29.17 I think I should first tell you a little bit about
00:20:32.07 how we went about collecting these samples
00:20:34.07 and how challenging it could be.
00:20:35.25 So, this is...
00:20:37.18 to get to one of the groups that we studied in Cameroon,
00:20:39.27 you have to cross this river,
00:20:41.28 and you have a person who has a ferry,
00:20:44.05 he's actually using a hand crank here
00:20:47.13 to get us across.
00:20:50.12 And I guess I'm very fortunate
00:20:52.19 because as a woman, I was able to get shade,
00:20:54.15 but not everybody was that lucky.
00:20:56.28 And here are some other hazards that we run into,
00:20:59.15 but I'm smiling because the head is cut off of this snake.
00:21:03.01 But I actually have to give credit to Dr. Alain Froment,
00:21:06.24 who has been studying the Pygmy populations in Cameroon
00:21:09.16 for greater than 30 years,
00:21:11.20 and he did the majority of the sample collection
00:21:14.01 in this case.
00:21:16.21 So, the genetic basis of short stature in Pygmies
00:21:19.26 is a question that's been of tremendous interest
00:21:22.08 to endocrinologists and human geneticists alike
00:21:25.07 for more than 50 years.
00:21:27.08 The particular populations that we studied
00:21:29.25 are located in Cameroon, three different groups from Cameroon,
00:21:34.27 whose mean male height is 152 cm.
00:21:40.11 And they live in very close connection and interaction
00:21:44.29 with neighboring populations who speak Bantu languages
00:21:47.29 and practice agriculture,
00:21:50.06 and their mean male height is 170 cm,
00:21:54.04 so that's quite a difference between the two.
00:21:58.16 So, the Pygmy short statured phenotype in humans
00:22:01.26 has arisen independently in different global populations.
00:22:05.12 Typically, these are populations
00:22:07.03 that live in tropical environments,
00:22:09.14 so there have been a number of hypotheses
00:22:11.11 about why this trait might be adaptive.
00:22:14.28 And these include thermoregulation,
00:22:19.04 limited food resources,
00:22:21.19 locomotion - that it may be easier to move
00:22:23.28 in a dense tropical environment if you're short,
00:22:26.18 and more recently there's a theory
00:22:30.10 that this is due to a life-history tradeoff,
00:22:32.10 and I'm going to focus on that theory.
00:22:35.08 And that has to do with the fact that
00:22:37.14 Pygmies have a remarkably short lifespan.
00:22:40.11 Their chance of living to age 15
00:22:42.06 is only about 40%,
00:22:44.18 and if they make it to age 15,
00:22:46.23 the expected lifespan is only around 25 years of age.
00:22:50.01 Now, that is due largely to very high infectious disease burden
00:22:54.05 and a very challenging life in dense tropical forests.
00:22:59.24 Now, what the study showed is that
00:23:02.18 Pygmies appear to be reaching reproduction...
00:23:05.25 they appear to be reproducing and reaching puberty
00:23:08.09 at a significantly earlier age
00:23:11.01 than other Africans.
00:23:13.13 And the growth trajectory in Pygmies
00:23:14.28 appears to be similar to other populations until the point of puberty,
00:23:19.21 and then they lack the adolescent growth spurt.
00:23:22.15 So this may be some sort of a tradeoff:
00:23:24.18 there's selection to reproduce earlier
00:23:26.22 because they're dying very young,
00:23:28.22 but that may be a tradeoff,
00:23:30.24 in that they're not undergoing the adolescent growth spurt.
00:23:35.20 Now, there have been only a handful
00:23:37.28 of physiologic and metabolic studies in Pygmies,
00:23:42.00 but nearly all of these are pointing towards
00:23:44.18 disruptions of the growth hormone/IGF1 pathway,
00:23:47.19 so this is in contrast to what we're seeing in European populations.
00:23:52.05 However, there's been quite a bit of dispute about
00:23:54.28 where along this pathway these disruptions are occurring.
00:24:00.04 So, in order to try to address these questions,
00:24:03.06 we genotyped one million single nucleotide polymorphisms
00:24:08.06 in 67 Pygmy individuals
00:24:10.23 and 58 of the neighboring Bantu individuals.
00:24:14.14 And here we can see a plot,
00:24:17.10 similar to what I've shown you before,
00:24:19.09 based on structure analysis.
00:24:21.09 And to remind you,
00:24:23.02 this is composed of a series of lines,
00:24:24.23 and each line represents a person,
00:24:26.16 and they can have ancestry
00:24:28.08 from different ancestral populations,
00:24:31.01 represented by the different colors.
00:24:33.04 So here in orange
00:24:34.29 are individuals who speak the Bantu language
00:24:38.00 and practice agriculture,
00:24:40.08 and in dark green are individuals who self-identify as Pygmies.
00:24:44.19 And what you can see is that there's been
00:24:46.22 a lot of admixture between the Pygmies
00:24:49.29 and the neighboring Bantu people.
00:24:52.03 Now, interestingly, this tends to be unidirectional,
00:24:54.28 and it tends to be gene flow between males
00:24:57.27 from the Bantu population
00:25:00.03 and females of the Pygmy population.
00:25:02.22 This is largely due to socioeconomic factors.
00:25:06.23 Now, when we look at a correlation
00:25:08.26 between ancestry and height,
00:25:11.03 we observed a very strong and significant positive correlation.
00:25:15.04 So, we can see that Pygmies who have more of the Bantu ancestry
00:25:19.23 tend to be taller.
00:25:21.19 And, so this is showing
00:25:22.25 that there's a strong genetic component to this trait.
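The ancestry-height relationship described above is a correlation between two per-individual measurements. As a rough sketch of how such a correlation can be computed, the Python below implements the Pearson coefficient by hand. The ancestry fractions and heights are made-up illustration values, not data from the study, and the talk does not say which correlation statistic was actually used.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only (NOT from the study):
# estimated fraction of Bantu ancestry and height in cm for six people.
bantu_ancestry = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65]
height_cm = [142.0, 145.5, 147.0, 151.0, 155.5, 158.0]

r = pearson_r(bantu_ancestry, height_cm)
print(f"r = {r:.3f}")
```

A value of r close to +1 means that individuals with more Bantu ancestry tend to be taller; in a real analysis one would also test significance and account for covariates such as sex and age.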
00:25:26.17 We've also worked with collaborators
00:25:28.11 to develop methods
00:25:30.19 to infer tracts of Pygmy and Bantu ancestry
00:25:35.11 across the chromosome.
00:25:36.29 So here, these are the different chromosomes,
00:25:38.18 starting with chromosome 1
00:25:40.05 and going up to chromosome 22,
00:25:42.25 and here I'm showing you an example from chromosome 3.
00:25:46.04 And in blue are tracts of the genome
00:25:49.03 that are Pygmy ancestry,
00:25:50.24 and in red are tracts of the genome that are Bantu ancestry,
00:25:54.23 and what we tend to see are very, very short tracts of Bantu ancestry.
00:25:58.28 And that reflects the fact that admixture
00:26:01.08 has been occurring over thousands of years.
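Why short tracts point to old admixture: recombination breaks up introduced segments a little more every generation. A minimal sketch, assuming a simplified single-pulse admixture model (the talk describes admixture occurring over a long period, so this is only an approximation), is that minority-ancestry tracts have a mean length of roughly 1 / ((1 - m) * t) Morgans, where t is the number of generations since admixture and m is the minority ancestry fraction. The function name, the 25-year generation time, and the example values are assumptions for illustration.

```python
def mean_tract_length_cm(generations, minority_fraction):
    """Approximate mean length, in centimorgans, of minority-ancestry tracts
    under a single-pulse admixture model: 100 / ((1 - m) * t)."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    if not 0.0 <= minority_fraction < 1.0:
        raise ValueError("minority_fraction must be in [0, 1)")
    return 100.0 / ((1.0 - minority_fraction) * generations)

YEARS_PER_GENERATION = 25   # assumed generation time
MINORITY_FRACTION = 0.2     # hypothetical Bantu ancestry fraction

for years_ago in (250, 1000, 4000):
    t = years_ago / YEARS_PER_GENERATION
    length = mean_tract_length_cm(t, MINORITY_FRACTION)
    print(f"{years_ago:>5} years ago (~{t:.0f} generations): "
          f"mean tract ~{length:.2f} cM")
```

The older the admixture, the shorter the expected tracts, which matches the very short Bantu tracts described above.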
00:26:06.11 Now, the next question that we wanted to address
00:26:08.17 is how do the genomes of the Pygmy hunter-gatherers
00:26:12.04 differ from the genomes of the Bantu agriculturalists
00:26:17.00 and from other groups, such as the Maasai pastoralists
00:26:20.28 from east Africa.
00:26:22.28 And to do that,
00:26:25.03 we use a number of scans of natural selection
00:26:27.28 across the genome.
00:26:29.29 Without getting into detail about the methods,
00:26:32.26 I'll just point out that you can see by the different colors here
00:26:37.00 across the different chromosomes,
00:26:39.00 here's chromosome 22 and going down to chromosome 1,
00:26:42.04 that we found a number of regions of the genome
00:26:44.22 that are targets of selection.
00:26:47.05 But there was one region in particular,
00:26:49.26 on chromosome 3,
00:26:52.04 where we saw a cluster of targets of natural selection.
00:26:57.01 And this was over about a 15 million basepair region.
00:27:01.14 Now, given our small sample size,
00:27:03.20 we have very little power
00:27:05.15 to detect a genome-wide association.
00:27:09.04 And so what we did is,
00:27:10.26 under the hypothesis that this is an adaptive trait,
00:27:13.17 we just focused on the regions of the genome
00:27:16.07 that are targets of selection, shown here,
00:27:19.11 and then we looked for an association with height.
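One way to see why restricting the analysis helps with a small sample is the multiple-testing burden: the fewer markers tested, the less stringent the per-test significance threshold has to be. The sketch below uses a Bonferroni correction purely as an illustration; the talk does not state which correction was applied, and the number of markers inside the selection targets (5,000) is a hypothetical value.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate
    at alpha when n_tests independent tests are performed."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

ALPHA = 0.05
genome_wide_tests = 1_000_000   # one million SNPs genotyped (from the talk)
candidate_tests = 5_000         # hypothetical count of SNPs in selection targets

genome_wide = bonferroni_threshold(ALPHA, genome_wide_tests)
candidate = bonferroni_threshold(ALPHA, candidate_tests)

print(f"genome-wide threshold:      p < {genome_wide:.1e}")
print(f"candidate-region threshold: p < {candidate:.1e}")
```

Under these assumed numbers the candidate-region threshold is 200 times less stringent, so a true association of modest strength is more likely to pass it.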
00:27:22.10 And one of the strongest, most significant associations
00:27:25.09 was exactly in that same 15 million basepair region
00:27:29.19 of chromosome 3.
00:27:31.23 And indeed, it encompassed several genes,
00:27:34.15 one of which is DOCK3,
00:27:36.18 which has been shown to be associated with height
00:27:39.09 in non-African populations,
00:27:41.08 so we replicated that finding.
00:27:43.20 But nearby was another gene called CISH,
00:27:47.09 which is a member of the cytokine signaling family,
00:27:50.10 plays a very important role in regulating
00:27:52.28 the IL-2 cytokine signaling pathway,
00:27:56.18 and studies have shown that it's associated
00:27:58.26 with resistance to a number of infectious diseases
00:28:01.18 in Africa.
00:28:04.01 Now, interestingly,
00:28:05.29 CISH also directly inhibits
00:28:07.14 human growth hormone receptor action
00:28:10.06 by blocking the STAT5 phosphorylation pathway.
00:28:13.15 And so we know that studies in mice
00:28:15.17 show that when this gene is overexpressed,
00:28:18.06 the mice are short statured.
00:28:20.23 Now, this led me to the hypothesis that,
00:28:24.14 could it be that there could actually be selection
00:28:26.19 for immune function
00:28:28.11 that is indirectly resulting
00:28:30.05 in short stature in Pygmies,
00:28:32.05 because that gene plays an important role in both.
00:28:35.29 And we need to do further functional studies,
00:28:38.20 and look at differences in gene expression
00:28:40.13 to test this hypothesis.
00:28:44.04 The last study I wanna tell you about is a study
00:28:46.20 in which we sequenced the entire genomes,
00:28:49.15 at high coverage,
00:28:51.07 of 15 African hunter-gatherers,
00:28:53.22 including 5 Pygmies,
00:28:55.28 5 Hadza,
00:28:57.10 and 5 Sandawe.
00:28:59.26 We identified over 13 million variants,
00:29:02.29 3 million of which are completely novel;
00:29:05.29 they have never previously been identified.
00:29:08.13 And that's just from 15 individuals,
00:29:10.14 so you can imagine how much variation is out there.
00:29:13.16 Many of these are novel variants...
00:29:15.27 many of these novel variants are in known regulatory sites.
00:29:21.04 So now, combining the two studies,
00:29:24.08 we wanted to ask the question,
00:29:26.03 which pathways are enriched for genes near targets of selection?
00:29:29.16 And these enriched pathways
00:29:31.25 include genes involved in neuro-endocrine signaling,
00:29:35.01 reproduction,
00:29:36.06 metabolism,
00:29:37.11 and immune function,
00:29:38.22 and interestingly, based on the whole genome sequencing study,
00:29:42.08 we saw an enrichment for genes
00:29:44.06 that play a role in pituitary function in Pygmies,
00:29:47.13 including follicle-stimulating hormone receptor,
00:29:50.13 growth hormone receptor,
00:29:52.11 HESX1, which I'll tell you more about in a moment,
00:29:55.11 and thyrotropin-releasing hormone receptor.
00:29:58.15 In fact, TRHR was one of the biggest hits
00:30:02.13 that we saw in terms of these studies of selection.
00:30:05.17 And what's interesting is that this gene
00:30:08.22 plays an important role in the hypothalamic-pituitary-thyroid axis,
00:30:12.28 influencing a number of traits that could potentially
00:30:15.14 be of adaptive significance in Pygmies.
00:30:18.26 And also of interest was that anthropologists
00:30:21.18 have noted that there is a significant difference
00:30:24.23 in the prevalence of goiter
00:30:27.00 among Pygmies and neighboring Bantu groups.
00:30:29.24 So the Pygmies have a much lower frequency of goiter
00:30:33.16 compared to the neighboring Bantu populations,
00:30:36.16 and this could reflect a biological adaptation in Pygmies
00:30:41.20 to a low iodine environment.
00:30:43.24 It's very deleterious to get goiter
00:30:46.22 because it can also lead to a disease called cretinism,
00:30:49.27 which of course is going to be very deleterious.
00:30:52.18 So again, here's an example
00:30:54.10 where something like adaptation to diet
00:30:56.13 could indirectly influence growth
00:30:58.28 or other phenotypes in the Pygmy population.
00:31:04.01 The last thing we wanted to do
00:31:06.01 was to look for regions of the genome,
00:31:08.08 using the whole genome sequencing data,
00:31:10.13 that are specific to Pygmies,
00:31:12.20 and those are shown in green here.
00:31:16.02 Now, we identified 25 clusters in the genome,
00:31:19.23 and the largest cluster
00:31:22.27 was right in that same region of chromosome 3
00:31:25.14 that we had previously identified.
00:31:28.00 But we had missed it in the prior study,
00:31:30.11 and the reason why is because
00:31:32.17 it contains these Pygmy-specific variants,
00:31:35.08 that were not captured by the SNP array that we used,
00:31:39.17 and thus demonstrating the great importance
00:31:42.00 of doing resequencing for identifying novel
00:31:44.24 and potentially functionally important variation
00:31:47.15 in ethnically diverse populations.
00:31:50.28 Now, this cluster consisted of
00:31:55.10 44 SNPs in 100% association with each other
00:31:59.16 over 170,000 nucleotides,
00:32:03.06 shown here,
00:32:05.24 and it contained a very interesting candidate gene called HESX1.
00:32:10.10 HESX1 codes for a transcription factor
00:32:13.05 that plays a very important role
00:32:15.04 in regulating the development
00:32:17.15 of the anterior pituitary in the brain,
00:32:20.14 and that's the site of production of growth hormone,
00:32:22.23 as well as other reproductive hormones.
00:32:25.11 Now, interestingly,
00:32:27.06 we identified a non-synonymous,
00:32:29.28 so an amino acid change, basically,
00:32:33.23 in this gene
00:32:36.03 that had been previously associated
00:32:38.13 with idiopathic short stature in humans.
00:32:41.26 But it turns out that this variant
00:32:44.01 is present at about a 20% frequency in other Africans.
00:32:47.12 So what we hypothesize is that
00:32:49.13 there's something about this region
00:32:51.22 that may be altering gene expression of HESX1
00:32:55.07 or other genes in that region.
00:32:58.01 Upstream, we found another cluster
00:33:01.18 near this gene POU1F1, also known as Pit-1 in mouse,
00:33:07.13 and again this codes for a transcription factor
00:33:09.18 that plays a critical role in regulating growth hormone expression.
00:33:14.23 So another excellent candidate gene.
00:33:17.28 Now, what is interesting is that
00:33:19.27 both of these clusters, or genes,
00:33:23.18 are amongst the most differentiated regions
00:33:26.27 of the Pygmy genomes,
00:33:28.27 compared to genomes from elsewhere in Africa.
00:33:31.29 So we then picked out some of the SNPs in these regions
00:33:37.13 and genotyped them in a larger set
00:33:39.19 of western and eastern Pygmies,
00:33:41.26 and we showed that they are statistically
00:33:44.02 associated with short stature in Pygmies.
00:33:47.29 So the next step is going to be
00:33:49.24 to try to make transgenic mouse models
00:33:52.01 that express these variants,
00:33:56.06 and see what the phenotype looks like.
00:34:00.19 So that leads us to a number of hypotheses.
00:34:03.19 One, is that alterations in the growth hormone/IGF1 pathway
00:34:07.15 play a role in the short stature trait in Pygmies.
00:34:13.01 Two, is that anterior pituitary hormones
00:34:15.10 may play a central role in the Pygmy phenotype,
00:34:18.09 influencing growth, reproduction,
00:34:20.15 metabolism, and immunity.
00:34:24.00 And thirdly, that short stature
00:34:26.16 could be a byproduct of selection
00:34:28.11 acting on pleiotropic loci.
00:34:31.04 So if we look here,
00:34:32.21 one of the candidate loci that we identified is HESX1.
00:34:36.13 That's going to influence expression and development
00:34:39.20 of the anterior pituitary,
00:34:42.02 site of production of growth hormone.
00:34:44.20 Growth hormone expression is also regulated
00:34:46.23 by this other gene we found, POU1F1.
00:34:50.04 And this CISH regulates growth hormone receptor.
00:34:54.17 Now, if we look at the downstream effects
00:34:56.24 of growth hormone,
00:34:59.07 growth hormone, when it binds to growth hormone receptor,
00:35:02.18 will trigger off expression of IGF1,
00:35:06.12 predominantly from the liver, but from other tissues as well.
00:35:10.06 IGF1 will have an effect on muscle growth
00:35:13.14 and also on bone growth and height,
00:35:16.02 but the other impact, or the other role of growth hormone
00:35:20.12 is that it also influences insulin metabolism,
00:35:24.06 it influences fat metabolism.
00:35:28.01 And then we know that infectious disease
00:35:30.01 alters immune response and cytokine levels,
00:35:33.08 and that these can influence gene expression from CISH,
00:35:36.11 or other genes that are in this pathway.
00:35:40.09 So, when we go back to Africa to study the Pygmies,
00:35:42.28 what we would ultimately like to do next
00:35:45.16 is to measure all of the phenotypes,
00:35:48.01 because if you want to understand something
00:35:50.04 like the evolution of short stature in Pygmies,
00:35:52.19 I think you can't just be looking at stature
00:35:55.09 because the growth hormone pathway
00:35:58.25 plays a role in all of these different traits,
00:36:01.01 so we need to be looking at this as an integrative picture.
00:36:06.01 And in fact, our approach in the future
00:36:08.26 is to use an integrative genomics approach
00:36:11.24 combining whole genome data,
00:36:14.15 data on protein variation from blood,
00:36:17.25 epigenetic variation,
00:36:19.21 which can be influenced by diet and environment,
00:36:22.12 gene expression,
00:36:24.10 we're starting to look at the microbiome,
00:36:27.16 which is the spectrum of bacteria in the gut,
00:36:32.05 because that can not only be influenced by diet,
00:36:35.16 it can also have an influence on the metabolome,
00:36:38.12 or the set of all the metabolites, for example,
00:36:40.27 in blood.
00:36:42.20 And we want to combine that information
00:36:44.25 together with information on diet
00:36:46.22 and other environmental factors,
00:36:48.29 to try to identify genetic and environmental factors
00:36:52.15 that play a role in short stature
00:36:55.05 and in other anthropometric,
00:36:56.25 cardiovascular,
00:36:58.01 and metabolic traits.
00:37:00.20 One of the other approaches we can take
00:37:02.20 to distinguish the role of genetics and environment is, for example,
00:37:06.00 to look at individuals of the same or similar ethnic background,
00:37:10.29 but living in an urban versus a rural environment.
00:37:16.21 We can also take a different...
00:37:18.14 the opposite approach.
00:37:20.00 We can look at individuals who have
00:37:22.06 very different genetic ancestries,
00:37:25.03 but live in similar environments.
00:37:27.13 So for example,
00:37:29.20 this is a girl who is from the Fulani population,
00:37:33.17 and here's a neighboring...
00:37:35.19 an individual from the Tupuri population.
00:37:38.26 So they are genetically very differentiated,
00:37:41.20 but live in a similar environment,
00:37:43.16 yet the Fulani seem to have some innate resistance
00:37:47.06 to malaria infection.
00:37:50.03 By contrast, the San,
00:37:53.09 from southern Africa,
00:37:54.29 are very differentiated from the Bantu,
00:37:57.15 but the San seem to have an innate susceptibility
00:38:01.09 to TB infection.
00:38:03.20 So again, by contrasting populations with different ancestry,
00:38:07.26 and living in different environments,
00:38:09.11 we may identify clues about the genetic basis
00:38:12.10 of differences in phenotypic variation
00:38:14.26 and disease susceptibility.
00:38:17.23 So in conclusion,
00:38:20.20 Africans have the highest levels of genetic diversity
00:38:23.04 within and among populations.
00:38:26.28 The demographic history of Africans
00:38:29.00 and local adaptation to different environments
00:38:31.04 has resulted in population-
00:38:33.01 or region-specific genetic variation.
00:38:36.25 And we need to be including
00:38:38.21 ethnically diverse Africans in genomic studies
00:38:41.17 to better identify both unique rare variants and common variants
00:38:45.28 which may be of functional importance,
00:38:47.28 including those that play a role in disease risk
00:38:50.13 in these populations.
00:38:52.14 And I will just end by thanking
00:38:54.04 the many individuals
00:38:55.25 who contributed to these studies,
00:38:57.29 and my funding agencies,
00:39:00.16 and particular thanks to the Africans
00:39:02.20 who have contributed to these studies.
Related Resources
- Sarah Tishkoff iBioSeminar: African Genomics: Human Evolution and Migration
Speaker Bio
Sarah Tishkoff
Sarah Tishkoff studied anthropology and genetics as an undergraduate at the University of California, Berkeley. She received her PhD in genetics from Yale University and was a post-doctoral fellow at Pennsylvania State University. From 2000-2007, she was a faculty member in the Department of Biology at the University of Maryland. Currently, Dr. Tishkoff is the…