Timestamps
(0:00:00) Intro
(0:00:33) Genomic Prediction
(0:05:54) IVF
(0:12:34) Phenotypic data
(0:15:42) Predicting height
(0:28:27) Pleiotropy
(0:39:14) Optimism
(0:45:03) Gene editing
(0:48:27) Super intelligent humans
(1:01:27) Regulation
(1:06:36) Human values
(1:17:38) Should you do IVF?
(1:26:06) 23andMe
(1:29:03) Jeff Bezos
(1:34:29) Richard Feynman
(1:43:43) Where are the superstar physicists?
(1:45:37) Is physics a good field to get into?
Links
Transcript
[00:00:00] Dan: All right. Today I'm talking with Steve Hsu. By day, Steve is a professor of theoretical physics at Michigan State University but he also runs a blog called Information Processing, hosts his own podcast called Manifold, has founded several technology startups, including Genomic Prediction and SuperFocus. Beyond physics, he's an expert in machine learning and computational genomics. Steve, I've really been looking forward to this one. Welcome.
[00:00:23] Steve Hsu: My pleasure.
[00:00:25] Dan: Let's get right into your work of genetics. Maybe a good entry point here is with your startup. Do you mind talking a little bit about the background on Genomic Prediction?
[00:00:33] Steve: Sure. Genomic Prediction was founded, gosh, must be five, almost six years ago. Our motivation was that on the research side, I and my research group and some collaborators had been working on something called polygenic scores. I guess it's now called polygenic scores. From our perspective, it was more or less just AI/ML on big datasets, like hundreds of thousands of people where you have their genome and then you have some phenotype information or disease history, information about that individual. We were trying to do the most basic thing, which is, if you think about the DNA revolution, what would you like to be able to do with this DNA? What use is this DNA?
Well, you'd like to be able to take the DNA of an individual person and predict aspects of that person just based on the DNA, like whether they're high risk for prostate cancer or whether they're taller than average, or whether they're prone to obesity. These are all really just fundamental questions that you'd be interested in from a basic science perspective. We've done both theoretical and empirical work, actual work with large datasets, and were increasingly confident that it was going to be possible to predict to some practical accuracy lots of complex traits describing an individual person.
If you now think about this, from the business perspective, is there actually any remunerative use of this capability aside from just publishing papers in Nature? What would that be? A unique thing about IVF is that when a family is going through IVF, it is typical for them to have more embryos. Not every family. I don't want to-- Obviously, there are some families that really struggle with this and they're lucky to get one viable embryo, but it's common, or not uncommon, for families to have many more embryos than they want to use. At that stage where the embryo is just like 100 cells, there's very little to distinguish two embryos from each other. What is that family supposed to do?
The traditional method of picking which embryo to transfer was to consult the human embryologist who has lots of "experience" doing this. By the way, this is a theme if you read my blog about "expert judgment" and "fake expert judgment," in which if you study data science and machine learning, you immediately realize a lot of times humans think they have signal in their prediction and they don't have any signal in their prediction. Typical embryologists would say, "Oh, embryo three, that looks like a good one. Embryo two is a little rough around the edges. Look, it's asymmetrical. There's a little glob of extra cells on the left there. Let's not use that."
Actually, if you look into the literature, there's essentially zero support, rigorous statistical support for human embryologists in general or even more specifically, the embryologist at the clinic where you are having any better-than-chance capability to make that decision for you. This is something our critics often forget, is that the thing that we're trying to surpass, the benchmark we're trying to surpass is random chance, is like no information at all as to which embryo is better than the other.
What we knew would be the case is that once you could genotype these embryos, the genomic predictors or polygenic scores that we had built would give you quite a bit more information about, for example, the future health prospects, risks, their risks associated with a particular embryo versus one of its brothers or sisters. That was a logic, [unintelligible 00:04:45] logic that already there's a lot of IVF going on, already there's a lot of people facing what we call the embryo choice problem.
There's essentially zero information as far as we can tell being applied to a real information, being applied to solve the embryo choice problem, but polygenic scores and polygenic risk factors and all that stuff would enable us much, much better than [inaudible 00:05:11] technology. That's the basis of where the company is from.
[00:05:15] Dan: It seems like this work in genomics is at the center of a couple of different disciplines. We're at this, like, what seems to be a happy coincidence with multiple different accelerating technologies right now. I want to pick this apart one by one. Let's maybe actually start with IVF because this one to me, a priori, seems like it would be one of the hardest ones, to shoot up a human female with a bunch of hormones and get 20 plus eggs or something, but this has been around for a little while. Can you just describe a little bit more about how common IVF is and a little bit more about what's going on with that right now?
[00:05:54] Steve: Now, just to be completely clear, because I'm a scientist and a professor, I don't take any credit for IVF technology that was largely pre-existing before our company was founded. Although we are making actually important contributions now to IVF, overall, the basic wet lab technology of how to do that, how to produce embryos and transfer them is not due to us. I will say that on the board of scientific advisors to Genomic Prediction is one of the scientists who was on the first team that did the very first IVF, produced the very first IVF baby in the UK.
We have that person on our advisory board. Our patient advocate in the company, Elizabeth Carr, is the first IVF baby born in the United States. She's the first US IVF baby and she just celebrated her 40th birthday not that long ago. It's been around for a while. The basic observation is that by administering the hormone cycle to a woman, you can cause her to overproduce eggs in her active cycle, and those can be easily harvested. It's basically a nurse with a long needle that is able to harvest those eggs and they're fertilized outside the woman's body. In vitro, in the lab.
They're fertilized by a technician and then you generate potentially a large number of embryos, which then are allowed to grow until they're typically about 100 cells. By the time we had come onto the market, it was becoming more and more common, certainly at the best clinics, to freeze the embryos once they reach roughly 100 cells. The reason for that is to give the mother's body some time to recover from the hormonal cycle that it just went through before that transfer occurs. That was found to get better results. Already, the natural cycle is, there's going to be a lag period where the embryos are frozen in liquid nitrogen.
By the way, freezing and thawing of embryos seems to not damage them in any way. They seem to work just as well. They're very robust little molecular machines. Basically, you freeze them in liquid nitrogen, you thaw them, and generally, you might have some loss, but generally, it doesn't damage them. Now, it had already, before we entered the IVF scene, become commonplace for a small biopsy of a few cells to be taken from the part of the embryo which is going to become the placenta. It's not actually the child, but it has cells which have the same DNA as a child that's going to become the placenta later on.
The biopsy would be taken from that for genetic screening purposes. In our startup design, if you're a startup guy and you're introducing a new technology, you don't want to disrupt the pre-existing workflow of the industry. You want to fit your new innovation in such a way that it doesn't disrupt what had already been there. That same biopsy is sent to Genomic Prediction, and we would amplify the DNA to the point where we'd be able to get a whole genome genotype for the first time. Actually, whole genome genotype for each embryo, and then compute all the predictive polygenic scores and stuff like that.
That's a quick sketch for people who are not really familiar with IVF. In terms of utilization, there are some countries where about 10% of all babies born now are coming from IVF. This is a consequence of women having careers, delaying marriage, obtaining more education, just being older when they finally get to the point where they're ready to start a family. Women's fertility starts decaying. It's just the truth. It's something people are generally not aware of, women's fertility tends to decay already starting typically in the late 20s, early 30s. That might shock you because you're like, "Wait, we have this plan and my wife is going to be 35 when we have our kid." Well, you better look up the statistics.
If you're just a little bit unlucky, by the time you're 35, you could have real fertility problems. Almost everybody has fertility challenges before they turn 40. Where you are and when your fertility decline begins and accelerates, it obviously depends on the individual, but 30 to 40 is typically where that's happening.
Now it's increasingly common, especially for highly educated professionals, to require IVF. In countries like Denmark and Israel, the state health care system actually pays for IVF. In those countries, it's widely utilized. The percentage of all births that come via IVF is high. It's like 10%.
In the US and other developed countries, it might be like 5% or 3%. It's a significant number. If you're not familiar with IVF and I just take you to some kindergarten and I point to the kids, there will be IVF kids among that group of kids on the playground that we're looking at. It's a non-negligible component of how humans reproduce today.
[00:11:38] Dan: Okay. I think that's an important point to send home. This is happening today. People do IVF and you'll be given a set of eggs. You have a choice, which is do you want to pick which one you want to use? What you all are doing is giving them options to say, based on some data, we can give you information about what is likely to happen based on which selection you make.
I'd be interested to get a little bit into how you all make those predictions. There's kind of two components, right? There's actually gathering the data. We need a bunch of genotypes with labeled phenotypes for different traits. Then, we also need the machine learning models. Actually, do the AWS run or whatever, right? Let's maybe talk about the data first. I'm really curious.
I noticed that in your papers and others, the UK Biobank is referenced a lot. Can you talk a little bit about where data exists today and where most people are getting it to do research?
[00:12:35] Steve: Yes. Most of the data that's available for researchers like us to do our work comes from government-funded large biobanks. UK Biobank is probably the best known one. There's also more than one funded by the NIH. There's one called All of Us, which is approaching, I think-- Well, it's not quite at a million, but the target is to get to a million individuals.
There's something called the Million Veterans Project, which is run by the health care arm of the Veterans Administration, which is also around a million people. My lab in particular also collaborates with the Taiwan biobank. In Taiwan, we have available something like a million genotyped individuals to analyze. There's a biobank in Finland. There's one in Japan. It's all over the place, really.
However, it is true that most scientists in this field spend a lot of time trying to get access to data. A lot of what we're doing is data limited, not algorithm limited. I would say the innovations developed by my group in terms of ML algorithms to do this kind of work are already good enough, actually, to work quite well. The main thing we're waiting for is actually just to get more data.
You might say, "Oh, well, wait. Don't you have millions of people to analyze?" Yes. We need to break it down to, say you're studying a particular disease. What you really need to get signal is you need to compare the genotypes of cases and controls. Cases in medical research means individuals with the disease condition, and controls means people who don't have the disease condition.
If the incidence of the disease is 1% or 5%, suddenly, the number of cases is much smaller than the total size of your biobank, right? Most of your signal is really coming from the cases. In the case, for example, if you're studying cognitive ability, it's very hard to get cognitive scores, for example, for individual samples. It's a complicated field. It's now a big field.
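To make the case-count point concrete, here is a minimal sketch of the arithmetic. The cohort size of 500,000 is a hypothetical round number chosen for illustration, not a figure quoted in the interview, and `expected_cases` is an illustrative helper name.

```python
def expected_cases(biobank_size: int, incidence: float) -> int:
    """Expected number of cases for a disease with the given incidence."""
    return round(biobank_size * incidence)


# Hypothetical cohort size (an assumption, not a number from the interview).
biobank_size = 500_000

for incidence in (0.01, 0.05):
    cases = expected_cases(biobank_size, incidence)
    controls = biobank_size - cases
    print(f"incidence {incidence:.0%}: about {cases:,} cases and {controls:,} controls")
```

Even with half a million participants, a 1% incidence leaves only a few thousand cases, which is why the effective sample size for a given disease is far smaller than the headline size of the biobank.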
When we started, when we published our first paper on height in 2017, we successfully predicted height with pretty good accuracy, a few centimeter accuracy, which was a shock to the research community. At that point we were one of only a few groups really focused on polygenic scores. Now, I think each year there must be, I'm guessing, toward a thousand-ish papers published every year which are doing something with polygenic scores. Now it's become a big international area of research.
[00:15:17] Dan: Let's dig in actually to the height thing because this seemed to me actually a really big deal. Tell me if I'm getting this right. Was it that you all predicted how much data ahead of time you would need to accurately predict height? Then, you got that data and you were able to accurately predict it. Is that roughly how this went? That seems like a really big deal. Could have implications for other traits moving forward as well.
[00:15:42] Steve: Correct. I won't get the years exactly right. I'll get them roughly right. I think in 2014-ish, around 2014, maybe it was a little bit earlier than that, we published some theoretical papers. At that time, there were no big biobanks available for analysis, but there were smaller data sets.
We did some theoretical work. There's a bunch of fancy math here that goes under the name compressed sensing or L1 penalized regression. There are even papers, very well known papers by this-- in which one of the co-authors is this Fields medalist Terry Tao on this subject. There's a purely mathematical question of if I give you noisy data of a certain kind and you use a particular set of algorithms, how much data do you need to solve the problem to recover the signal fully? That's a signal processing problem or you could think of it as information theory problem, whatever you want, problem in analysis. It's pretty widely studied mostly in the math, applied math community, a little bit in electrical engineering, in computer science. Much less so in genomics, even computational genomics.
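As a rough illustration of what L1-penalized regression does, the sketch below recovers a sparse signal from fewer samples than unknowns. It uses iterative soft thresholding (ISTA), one standard solver for this problem; it is not claimed to be the algorithm Steve's group used. The problem sizes, noise level, penalty `lam`, and the 0.3 detection threshold are all assumptions chosen for a toy example on random Gaussian data rather than real genotypes.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink each entry of x toward zero by t (the proximal step for the L1 penalty)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by iterative soft thresholding."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


# Toy problem: 300 samples, 1000 candidate variables, only 10 of which matter.
rng = np.random.default_rng(0)
n, p, k = 300, 1000, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
true_beta[support] = rng.choice([-1.0, 1.0], size=k)
y = X @ true_beta + 0.1 * rng.standard_normal(n)

beta_hat = lasso_ista(X, y, lam=0.1)
recovered = {int(i) for i in np.flatnonzero(np.abs(beta_hat) > 0.3)}
print("recovered the true support:", recovered == {int(i) for i in support})
```

The point of the toy is the one made in the conversation: when the underlying signal is sparse, the amount of data needed scales with the number of variables that matter, not with the total number of candidates.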
Very few people in the field of computational genomics are aware of these compressed sensing theorems, which I was aware of. Actually in deciding to work in this field-- that was about when I decided to work in this field -- being a theoretical physicist, I said I'm not just going to randomly choose some area to start working in, I need to understand theoretically what is possible? If I'm going to invest some years of my effort in this direction, I want some theoretical guidance that, oh, it could turn out like you need hundreds of millions of genotypes to get anywhere. If that's true, when I do these back-of-the-envelope calculations, it's not going to happen for a long time, 20, 30 years. So not interesting to me.
If the answer had come out like that, I would have been, like, let some biologist slave over this, whatever. I did that preparatory theoretical work. That is common in physics. In physics, theory is very highly developed. Before you do anything, you do preparatory theoretical work to understand the problem and then you decide, is this worth doing? Is it worth building this accelerator?
Is it worth putting this satellite in space to look at the Big Bang? We need to understand a bit more about the problem before we invest resources in it. Biomedicine is completely the opposite. Biomedicine is just wild-ass speculations about stuff. Blowing huge sums of money. NIH has more money than all the other sciences and engineering research budgets combined.
Senators understand dying of cancer, but they don't understand anything else. Basically NIH gets all the money and then biomedicine, it's just throwing money in the air like this. They don't have any theoretical understanding of what they're doing. They're just trying stuff. They'll reject what I'm saying.
This is what any theoretical physicist or mathematician or computer scientist who gets involved in biology will tell you. They're actually in a way not respected. They wouldn't respect a mathematician because generally math is not that useful for them. Math doesn't help them with what they're doing. They have to do this very experiment-intensive, empirically driven stuff.
That's just how their field is because living systems are so complex. What I'm saying is I think not politically correct but it's easily verifiably true. For my own effort I said let me do this theoretical analysis first to figure out whether solving the genome, in other words, predicting phenotype from genotype, is that a solvable problem with realistic amounts of data and compute that I'm going to have available to me? Or should I just give it a skip and continue thinking about black holes and quantum fields?
I went through that. We published our papers, I think by 2012 or 2014, we had published the results. Which is using some very fancy math with something called the Donoho & Tanner phase transition in compressed sensing. We were able to use that to calculate and predict how much data it would take to "solve" a complex trait using realistic genomic data. We predicted that as soon as we had at least a few hundred thousand genotyped individuals and we had height measurements, we would be able to build a reasonably good height predictor. The very first moment when a data set of that kind became available was 2017 when the UK Biobank released its first instances and allowed researchers to apply for access.
Within a month of getting access to UK Biobank data, we had built predictors that had this few centimeter accuracy. It was shocking to the genomics community. If you look at the journal Genetics where we published the paper, Genetics is the preeminent genetics publication in the United States for those things. It's the journal of the American Society of Human Genetics. It's the leading journal in the US specifically about genetics.
Our article is an editor's starred article or something. They have some way of putting a gold star on an article that it means the editor thinks this is an important advancement in the field. That's an editor's selected article in the fall-- I forgot what it was. I think we posted the pre-print in the fall of 2017, and the paper was published in early 2018. It's an editor's selected article. I have the referee report, so you can see the referees saying like, "Wow, I can't believe this is possible."
People now, so younger researchers who maybe entered the field since then or maybe they were already just starting grad school back then, they will say, and maybe they're even being sincere, but they don't know the history of their own field, they will claim that none of this is surprising. Like, "Oh, we always knew we were going to be able to do this," and something like that. It's all bullshit because if you were in the field in 2017, people were saying like, "Oh, wow, there's all this missing heritability. I don't know how we're ever going to solve this missing heritability problem. I think all genes interact with all other genes, so therefore, modeling complex traits will never be possible."
If you're a serious historian of science, you can go back and analyze what were people saying in 2015, 2016, 2017. Up until the point our paper came out, you can see they thought what we were doing was impossible. It's easy for people to verify this. Now, they will deny it because now polygenic score is everything. It's a big deal. It's a big research area, and they'll claim, "Oh, we knew this all along," whatever. This is like insider baseball for how science works.
There's a joke that says the reaction to scientific discoveries go like this. First reaction, "It's wrong. You can't do that. What are you talking about, Steve? All genes interact with all other genes. You'll never be able to predict this. It's highly non-linear." Number one, it's wrong. Then the next reaction is, "Maybe you're right, but it's trivial. I knew it all along, right? I knew it all along. Of course, you did, but you did it, but it's not that big a step. We all knew it was possible." It's wrong. It's trivial. Then the final one is, "I did it first."
[00:23:32] Dan: [laughs]
[00:23:34] Steve: The person says, like, "You were wrong," and then, "Oh, your result is trivial." He's now saying, "I did it first. Please reference my paper." That's like a common cycle for acceptance of new scientific insights. I'm not the first person to formulate it that way. Even people like, I don't know, Heisenberg and Planck and other people said the same thing. Quantum mechanics, they said, "It's wrong." "Okay, you guys might be right, but this is trivial," and then like, "Oh, wait, I did it first. I knew it was going to work like that."
Anyway, so we predicted how much data we would need. When that amount of data became available, we solved the problem, and now, our result was replicated by several other prominent groups, and now, it's even been replicated more thoroughly by a study which used, I think, several million individuals for whom they had height. Now, it's basically incontrovertible that a trait as complex as human height, for which you'd need to know the state of about 10,000 individual regions of your genome, so about 10,000 different genetic variants that are used by the ultimate predictor in order to calculate the estimated height. There's 10,000 individual loci, different effect sizes.
Now, again, in 2017 or 2015, people would have said it is insane to think that crazy data science AI/ML guys will be able to build something that complicated that depends on-- think about biologists, right? How many biologists can count to 10,000, right? They would say, like, "It's insane. Genes all interact with each other. You guys will never be able to build a predictor that uses 10,000 different genetic variants scattered over all chromosomes that impact this really complicated trait," but now, it's been fully validated. Anybody who knows about computational genomics agrees. There are very good height predictors. They use about 10,000 SNPs.
Furthermore, even though I just stated, like, "It's impossibly complex from the viewpoint of a lab biologist," from the viewpoint of information theory, from a computer science viewpoint, it's incredibly simple because there are 3 billion different variants that could have affected height, 3 billion. 10,000 out of 3 billion is a very low fraction of all regions of DNA that actually matter.
Actually, as we predicted, height would turn out to be this sparse trait. All traits, actually, all complex traits of humans are sparse. They only depend on a small fraction of the 3 billion variants that exist in the genome. Furthermore, the effects are largely additive. It's a largely linear model that accounts for this. It's not even this weirdly non-linear thing [unintelligible 00:26:49].
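A sparse, additive predictor of the kind described here amounts to a weighted sum over a small subset of variants. The sketch below is a toy version under stated assumptions: genotypes are coded as 0, 1, or 2 copies of an allele, the sizes (100,000 variants, 100 with nonzero effect) and effect sizes are made up for illustration, and `polygenic_score` is a hypothetical helper name, not an API from any genomics package.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes (assumptions): 5 people, 100,000 variants, 100 of which have an effect.
n_people, n_snps, n_causal = 5, 100_000, 100

# Genotype dosage: 0, 1, or 2 copies of a given allele at each variant.
genotypes = rng.integers(0, 3, size=(n_people, n_snps))

# Sparse effect-size vector: zero almost everywhere.
effects = np.zeros(n_snps)
causal = rng.choice(n_snps, size=n_causal, replace=False)
effects[causal] = rng.normal(0.0, 0.1, size=n_causal)


def polygenic_score(genotypes, effects):
    """Additive (linear) score: sum over variants of dosage times effect size."""
    return genotypes @ effects


scores = polygenic_score(genotypes, effects)

# Because the model is sparse, only the variants with nonzero effect contribute.
sparse_scores = genotypes[:, causal] @ effects[causal]

print("fraction of variants used in the toy model:", n_causal / n_snps)
print("fraction quoted for height (10,000 of 3 billion):", 10_000 / 3_000_000_000)
```

In this toy, the full score and the score computed from only the 100 contributing variants are the same numbers, which is what "sparse" means in practice.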
Anyway, in a way height is the poster child for what can be done in complex trait genomics. It's pretty well understood, at least by the people who are experts in this field. I think the understanding of that is not propagated that far outside of computational biologists or, really, computational genomicists who really actually [unintelligible 00:27:14] computer scientists and statisticians, to old school genetics scientists who made their whole career about one gene or something, they don't use a lot of math, that group doesn't fully understand what I just said to you. Then, going out to other biologists who don't deal with DNA data on a regular basis, they have no idea what I just said. Anyway, so that's my history of this field [unintelligible 00:27:40].
[00:27:41] Dan: I can remember back in-- I was in grade school in the 2000s, but there was a big deal about the Human Genome Project, right? There was like Dolly, the cloned sheep, and it seems like there was all this optimism about the field, and then it went dark for the next 10 or 20 years. Everyone got excited about the iPhone or whatever. Now, this stuff is so recent, and what has been shocking me as a layman is exactly what you said. Intuitively, I would think that these polygenic traits, they're just impossibly complicated, and maybe if you reduce the risk of diabetes, you're jacking up cancer or heart disease or something, but you guys have also shown that typically, that is not true. You could even create a generalized health score. In some cases, there's even mild optimistic effects for other health measures. Is that right as well?
[00:28:27] Steve: Yes. It's a great point that you're raising. I just want to point out that someone who approaches this subject with an open mind, and I think your background is in engineering and software development. If you have decent math chops and a scientific technical background and you just approach this area with an open mind, you can read papers written by our group or other groups in the field and understand, like, "What is being said here now about genetic architecture?"
Because up till now, although we could read out genomes, we didn't really know what they did, right? That was the Human Genome Project that when you were a kid, around 2000, they sequenced the first human genome. Those guys then, as always, with all biotech entrepreneurs and tech entrepreneurs overhyping what's happening, right? That was just the ability to read out an individual DNA.
The analysis by which you figure out, "What can I predict based on that?" took this next 20 years, basically, and accumulation of large numbers of genomes, right? Another fairy tale that was invented by biologists when they literally knew nothing about-- the first genome had not been sequenced, they knew nothing about machine learning, they knew nothing about compressed sensing, et cetera, there was this weird fairy tale invented by them called pleiotropy. Pleiotropy says, a particular gene or a particular locus in your genome is bound to have multiple effects. Right? Exactly why that had to be true-- Okay. You could say it this way, a gene, by definition, is a region of DNA that codes for a protein and there are only about 20,000 different genes in the human genome. Of course, between each gene there are tons of other DNA switches that are doing other things which people, until recently, had no idea what was going on with these other things.
Of course, from an information theoretic standpoint, you realize it can't just be the information in the protein-coding regions, there had to be these other switches that are controlling things. Otherwise, how could I be that different from a worm? Even though we have pretty much the same-- we're using the same proteins, different, slightly different variations of these proteins, me and the worm, right? But we're very different, right? There must be other information that's involved in making me versus making the worm.
I think to be totally blunt, a high IQ kid who's 12 years old who reads a popular book about DNA would understand this, but it's still today confusing to most biologists. There was this feeling that because these genes are everything, so for a long time these guys thought genes are everything and this is junk DNA, okay? Genes are everything, but there are only 20,000 genes.
Therefore, if I modify one gene, I have to affect lots of different things because for sure that protein is playing a role in lots of different things going on in your body. Hence the idea of pleiotropy. Now, once you have predictors that-- we have predictors for 20, 30, 40 different complex traits ranging from diabetes risk, schizophrenia risk, to height to BMI. You can just look at the predictors, and they're sparse.
Remember, they use only a fraction of all the information that's on your genome. Each one of them is only using a fraction. Now, the fraction is scattered all over the genome on your chromosomes, but it's not using most of the SNPs, most of the genetic variants on your genome. You can just ask this trivial question. You can say like, "Well, how many of the SNPs used for height are common to the predictor built for schizophrenia? How many of those are common to the predictor built for heart disease? How many of those are common to the one that's predicting your diabetes risk? Or whether you have brown hair?"
Someone, again, who knows a little linear algebra, but most biologists don't know any linear algebra, would say, "Oh, why not compute a correlation matrix between these predictors? Look at the correlation between variance accounted for between these different predictors, and let's see how disjoint-- using a fancy mathematical term, let's see how disjoint are the input SNPs, the sets of input SNPs for each of these different phenotypes, different traits."
How disjoint are they? Are they very disjoint, in which case pleiotropy is bullshit, or are they hopelessly not disjoint and overlapping, in which case pleiotropy is correct. Turns out they're largely disjoint. Now, how many biologists actually understand this today? Very, very few. You have to understand something about like information theory or maybe at least linear algebra, correlation matrices, and polygenic risk predictors or trait predictors.
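The disjointness question can be sketched as a set-overlap computation. The version below uses the Jaccard index on the sets of SNPs each predictor uses, which is a simpler stand-in for the correlation-of-variance-accounted-for analysis mentioned above, not a reproduction of it. The trait names come from the conversation, but the SNP identifiers are invented placeholders and the tiny sets are for illustration only.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 means fully disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical SNP sets per predictor (placeholder IDs, not real results).
predictor_snps = {
    "height": {"snp_001", "snp_002", "snp_003", "snp_004"},
    "diabetes risk": {"snp_004", "snp_005", "snp_006"},
    "heart disease": {"snp_007", "snp_008"},
}

for (trait_a, snps_a), (trait_b, snps_b) in combinations(predictor_snps.items(), 2):
    shared = len(snps_a & snps_b)
    print(f"{trait_a} vs {trait_b}: {shared} shared SNPs, Jaccard = {jaccard(snps_a, snps_b):.2f}")
```

Values near zero across most pairs would correspond to the "largely disjoint" picture described here; values near one would correspond to heavy pleiotropy.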
Then combining that, it's trivial to then, "Oh, gee, that old term in the textbook I had in graduate school called pleiotropy, whatever happened to that?" Then if you understand these three or four different conceptual areas, you can just immediately answer the question, but again, hasn't propagated very far. Now, another way to say this is the following. It's just like basic math that when two physicists talk, we're always talking like this kind of math, but again, not everybody can do this.
If I'm trying to explain all these results to some guy who his day job is studying black holes in Anti-de Sitter Space or something, and he's like, "Steve, I hear you've been dabbling in this biology stuff. Why are you doing that?" Then like, "Well, what did you guys find out?" I'm explaining all this and then the guy will immediately ask me, he'll say, "Well, between any two humans, how many individual variations are there? My genome and your genome, how many differences are there?"
The answer is it's on the order of a few million out of 3 billion base pairs. It's about one per thousand. There are millions of ways in which my genome is different from yours, right? If the most complex trait that we found so far, which is height, uses 10,000 of those variations, if I divide a few million, let's suppose the average predictor, polygenic predictor is using a few thousand genetic variants. Let's suppose I divide a few million by a few thousand, the answer is 1,000.
Physicists are always doing these [unintelligible 00:35:21] math calculations, right? It turns out there's enough information between any pair of two humans to specify about 1,000 different complex traits if those complex traits were all independent of each other. It's a trivial information theory calculation that says the common variation found, differences between two randomly selected humans is about enough to account for 1,000 completely independent complex traits which are roughly like the complex traits which we have studied so far.
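The back-of-the-envelope estimate above can be written out in a few lines. The 3 billion base pairs and the one-in-a-thousand difference rate are the figures quoted in the conversation; taking "a few thousand" to mean exactly 3,000 variants per predictor is an assumption made so the division comes out to the round number quoted.

```python
# Figures quoted in the conversation.
genome_length = 3_000_000_000  # base pairs
differences_per_base = 1_000   # roughly one difference per thousand base pairs

# Assumption: "a few thousand" variants per predictor, taken here as 3,000.
variants_per_predictor = 3_000

variants_between_two_people = genome_length // differences_per_base
independent_traits = variants_between_two_people // variants_per_predictor

print(f"differences between two genomes: about {variants_between_two_people:,}")
print(f"room for about {independent_traits:,} non-overlapping trait predictors")
```

This is an order-of-magnitude estimate only: changing the assumed variants per predictor to 10,000, the figure quoted for height, would give a few hundred instead of a thousand.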
That means if you're playing Dungeons & Dragons, and normally in Dungeons & Dragons, your strength is independent from your dex, which is independent from your intelligence, which is independent from your wisdom, which is independent from your charisma. I forgot if there are more. Constitution, maybe it's six. Characters in D&D live in a six-dimensional independent-- six independent dimensions, right?
It looks like humans, given the amount of information in our genomes and the amount that we vary from each other, could live in a 1,000-dimensional space of complex traits. Even assuming exactly zero pleiotropy, exactly zero pleiotropy, where each set is entirely separated from the other. I don't know if I went through that too fast, but it's kind of trivial logic. Trivial for some people logic that says, wait a minute, there might be some pleiotropy but actually, there are a lot of independent ways that I can tune people's genomes and get what I want.
The size of my spleen is actually not directly related to my IQ and the length of this finger could be somewhat independent of the length of this finger. There are 1,000 different things, ballpark, that could be basically not interfering with each other or could be interfering only weakly with each other. That's the situation which if you look at papers we wrote like 2020 and 2021 or something, all this is laid out in those papers, but not understood by a lot of people.
By the way, part of the problem is that to get money from NIH, you have to focus in on one narrow thing, maybe one disease or something. Then people who know the diabetes polygenic risk index, they know that, but they know nothing about heart disease. They know nothing about height. They would never study height. They would not study IQ. They don't even know what IQ is.
Of course, if you're a jeweler with these magnifying glasses and you're looking only at one part of this watch, you don't know what the weather is outside. That is kind of how [unintelligible 00:38:08] specialized in biomedical sciences.
[00:38:14] Dan: This seems like-- I keep using the word, but happy coincidences. It's a miracle, right? It's like, oh, wow, there's actually room for potentially a lot of important traits that we can just maximize with potentially no downside. I don't know if you're like a sci-fi person and you get excited, maybe some people are repulsed by this, but if you get excited about it and think it could improve human outcomes in the future.
I'm curious, let's put the societal barriers and constraints aside. Things like data collection and general ethics of choosing between embryos and things like this, but from a purely technical point of view, what do you think are the biggest barriers to us in the future being able to have people that have substantially better health outcomes and they're more fit, they're better looking, they're smarter. Do you see any real potential technical gotchas or are we looking at mostly an issue of ethics and data?
[00:39:14] Steve: I think you have the right perspective on this that in a way things turned out in the most positive way. Because it could have been that the encoding of genomes is so complicated we would never decipher it or we would need 100 times more data than we have today in order to decipher even much simpler traits. That could have been the case.
That was the main thing I was worried about. I was mainly worried about nonlinear interactions making the code space much more complicated than it turned out to be. That was my main worry in 2015 or something like that. Then once we got the data and started working with it, we realized, nah, it ain't like that. I gave an interview to The Sunday Times a while back, and the title of it was something like yes, superhumans are possible, because that's what the science is telling us.
Now, again, most people are not aware that the science is strongly pointing toward the answer being yes, superhumans are possible, but I'm telling you, yes, superhumans are possible. If you look at one of the things we did in my lab is we created an overall health index. We took genomic predictors for 20 different disease risks, the major disease risks that kill people. We took about 20 of them and we created an index summing over each of the risks for those major disease conditions. Then weighted by the life expectancy reduction typically associated with those conditions. Most impactful has the biggest coefficient and the least impactful has the smallest coefficient. You just sum it up and you make a health index.
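A minimal sketch of the kind of weighted index described above. The condition names, risk values, and weights are invented for illustration and are not taken from the lab's actual index; the function name `health_index` is also hypothetical. In this sketch a lower score is better, since it adds up risk weighted by years of life lost.

```python
# Hypothetical inputs: every name and number here is made up for illustration.
def health_index(risks, years_lost):
    """Sum each condition's predicted risk, weighted by the life-expectancy
    reduction typically associated with that condition. Lower is better."""
    return sum(risks[condition] * years_lost[condition] for condition in risks)

# Weight = typical years of life lost if the condition occurs (assumed values).
years_lost = {"heart_disease": 10.0, "type_2_diabetes": 6.0, "hypertension": 3.0}

# Predicted lifetime risk per condition for two hypothetical genomes.
johnny = {"heart_disease": 0.10, "type_2_diabetes": 0.20, "hypertension": 0.60}
low_risk_outlier = {"heart_disease": 0.05, "type_2_diabetes": 0.05, "hypertension": 0.05}

print(health_index(johnny, years_lost))
print(health_index(low_risk_outlier, years_lost))
```

The most impactful condition gets the biggest coefficient, so a high risk for it moves the index more than the same risk for a milder condition.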
Similar things have been studied by other groups too, like the Finns also studied this. They have a nice study of their Finnish population gene bank, also using a similar longevity index. It looks like to us because, again, the individual SNPs that are controlling health risk one, like heart disease, versus diabetes, number two, lung cancer, all these things, most of those sets of SNPs are disjoint. It looks to us like we could, just using what we already know, the information that's in the predictors as it is, you could hypothesize a human who's a low-risk outlier for each of those 20.
When we genotype people or embryos, we can just compute this health score, right? We can see like, "Oh, Johnny is low risk for this, medium risk for this. Oh, poor Johnny. He's a high-risk outlier for high blood pressure." We can do that. Now, there's no reason as far as we can tell judging by the specific genetic variants, the individual SNPs that are involved, there's no reason there cannot be a person who's bottom-- one percentile risk, low risk for each of these 20 conditions. Who knows how long that person would live. That person might live 150 years.
The number of people who have been simultaneously low risk for all of these conditions that have ever lived might be zero. There haven't been enough humans around to realize the luck involved in being a one-percentile positive outlier in terms of low risk for all of these killer conditions. Maybe that person has never lived. Maybe it's 0.01 to the 20th chance of this happening or something, right?
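To put a number on "0.01 to the 20th", here is a short sketch. It assumes the 20 risks are statistically independent, and the figure of roughly 100 billion humans ever born is an outside order-of-magnitude assumption, not something stated in the conversation.

```python
# Chance of being a bottom-one-percentile risk outlier on all 20 conditions,
# assuming the 20 risks are independent of each other.
p_one_condition = 0.01
n_conditions = 20
p_all = p_one_condition ** n_conditions

# Assumption: on the order of 100 billion humans have ever lived.
humans_ever_lived = 1e11
expected_count = humans_ever_lived * p_all

print(f"chance per person: about {p_all:.0e}")
print(f"expected number of such people so far: about {expected_count:.0e}")
```

Under these assumptions the expected count is many orders of magnitude below one, which is the sense in which such a person has probably never lived.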
You could get there in principle by engineering or by having a huge number of embryos we're able to choose from. As far as we can tell, this particular phenotype, longevity, or disease risk, that's not what people are afraid of. People are not afraid for us to talk about this, you can even get funding from NIH to talk about a longevity predictor, whatever. The ones people don't like are the ones like intelligence. The question is like, "Oh, well, if I do this trick with intelligence, I look at the predictor--" Genetic predictors for intelligence are not that strong yet, but you can at least see some trends.
It does look like there's a lot of variation up for grabs there, and you could shift the mean. You could move an individual, many, many standard deviations if you wanted to. In principle, if you had perfectly accurate editing capability, how far could you shift an individual in their IQ score? The answer is you could shift them really far, probably beyond any human genius, the historical genius that has ever lived. That particular analysis, all of a sudden now, if you're a left-wing extremist, whatever, or woke person, you would say, "Oh, my god, you're talking about shifting intelligence. You must be a eugenicist." Like, "You must secretly salute the Nazi flag in your basement or something." That one's a little more fraught, but the math is basically the same math we were talking about when we were talking about longevity.
[00:44:34] Dan: Let's talk about intelligence. We've been building up to this, but I think this is where it gets really, really interesting. One of these technologies that we need other than the data and the training runs is we also need the ability to either pick from a bunch of embryos, which is going to be a generation-by-generation improvement, or we can edit an existing embryo. You mentioned it in that response, but we haven't talked much about editing. What does the current state of editing look like?
[00:45:03] Steve: Yes. Of course, editing a human embryo is a big no-no. A certain guy spent a few years in jail after doing this in China. Obviously, the first thing we should say is, well, this is not considered an ethically okay thing to do. We're just talking about science fiction. You and I just happen to be interested in the character Khan from Star Trek. You're too young to remember this, but anyway. Okay, we're having a science fiction discussion. It's about Dune. In Dune, they were trying to breed the Kwisatz Haderach, okay? That's the novel Dune. We're talking about science fiction now.
Currently, there's been continued improvement in CRISPR technology. The thing that you're contemplating though is a situation where you might make hundreds of edits to an embryo. There are a hundred different or hundreds of different places where you figure out, "Okay, I want to make a change here." Then you do it. Currently, the technology, as far as I know, does not exist to do that without also having a significant risk of off-target edits, so edits that were unintended by you. There's a wet lab gene editing, biological limitation right now to our technology that makes this hard.
The other limitation, which is a little bit subtle, is that when we build these predictors, we're using a particular SNP, the state of your genome at a particular place in order to help us make the prediction, but we don't know that that particular SNP that we're using in the predictor is itself causal. The state of nearby SNPs, because they tend to be inherited together, are correlated. We might be using a tag SNP in order to do the prediction. We only need correlations to do the prediction. We might be using a tag SNP to do the prediction, but the causal SNP is actually next to it. If we edit the tag SNP, but not the causal SNP, we get no effect.
The more difficult problem, which is more of an information-theoretic or computational problem, is to determine not just which SNPs are enough for me to predict the phenotype, but which ones are actually causal because I actually want to change the phenotype. That is also an unsolved problem. That problem will require a lot more data than we currently have because of this problem that it's hard to tell. In almost all the people in our gene bank, the state of the SNP is the same as the state of-- these two states are correlated like at 0.9. It's very hard for me to tell whether this is the causal one or this is the causal one. You have little clusters of these things in your genome. There's a technical problem at the computation level that is a roadblock to gene editing supermen.
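The tag-versus-causal problem can be illustrated with a toy simulation. This is a sketch under invented assumptions: two binary SNPs, a 95% chance that the tag copies the causal SNP (which gives a correlation near 0.9), an effect size of 1.0, and unit Gaussian noise. None of these values come from real data.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def mean(xs):
    return sum(xs) / len(xs)

def phenotype(causal_state, tag_state, noise_term):
    # tag_state is deliberately ignored: in this toy world only the causal SNP acts.
    return 1.0 * causal_state + noise_term

rng = random.Random(0)  # fixed seed so the run is repeatable
n_people = 20_000

causal = [rng.randint(0, 1) for _ in range(n_people)]
# The tag SNP is inherited together with the causal SNP 95% of the time.
tag = [c if rng.random() < 0.95 else 1 - c for c in causal]
noise = [rng.gauss(0, 1) for _ in range(n_people)]

observed = [phenotype(c, t, e) for c, t, e in zip(causal, tag, noise)]
r_snps = pearson(tag, causal)
r_tag_pheno = pearson(tag, observed)

# "Edit" one SNP to state 1 in everyone and recompute the phenotype.
after_tag_edit = [phenotype(c, 1, e) for c, e in zip(causal, noise)]
after_causal_edit = [phenotype(1, t, e) for t, e in zip(tag, noise)]

print(f"tag vs causal correlation: {r_snps:.2f}")
print(f"tag vs phenotype correlation: {r_tag_pheno:.2f}")
print(f"mean shift after editing tag: {mean(after_tag_edit) - mean(observed):.2f}")
print(f"mean shift after editing causal: {mean(after_causal_edit) - mean(observed):.2f}")
```

The tag SNP is a perfectly usable predictor in this toy setup, yet editing it moves nothing, while editing the causal SNP shifts the mean. Telling the two apart from observational data is hard precisely because they agree in almost everyone.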
[00:48:12] Dan: Okay, so gene editing, it sounds like there's still some work to be done. What about if we just iterated over a bunch of IVF generations? I think there was a paper done on this actually that tried to figure out how many standard deviations you could get out of intelligence.
[00:48:27] Steve: Well, it depends on a lot of parameters like how many embryos you have to choose from and then also how good is your predictor for making the selection. These calculations are pretty straightforward. Once you make the model assumptions, you could do the calculations. The problem is we don't know the parameters because we don't know how good our predictor for intelligence will be in 2030 or 2035, et cetera. There are a lot of unknown parameters that would have to feed into the calculation.
[00:48:57] Dan: How good do you think it'll be for intelligence if we had just straight IQ tests, the best IQ tests that we have today?
[00:49:03] Steve: If you had well-phenotyped individuals, so they had been given even just the SAT or some very good IQ test, cognitive test, and you had-- my estimate is if you had, let's say, a few million people with such phenotypes attached, you could then build an IQ predictor with accuracy maybe plus or minus 10 IQ points. From that, then you could start selecting. Well, it also depends on how many IVF resources you give to people. If it becomes a social convention for women to freeze a bunch of eggs, do extraction cycles when they're 20 and maybe they could get 100 eggs and freeze them and then use them later in life. You're selecting the best out of 100 and you have a pretty good predictor. You could be moving the mean in the population, like, one standard deviation every generation. That would be huge because in a few generations you would have an unrecognizable human population. Like you'd have-- The whole population of the planet is like the student body of MIT or something. Yes, of course this is all hypothetical science fiction and possibly morally wrong.
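A crude Monte Carlo sketch of the "best out of N embryos" calculation mentioned here. The model and every parameter are assumptions: predictor scores among sibling embryos are treated as standard normal, `predictor_r` is a hypothetical correlation between score and trait, and `sibling_sd` is a hypothetical within-family spread expressed in population standard deviations. The function names are invented for this sketch.

```python
import random

def expected_best_of(n_embryos, trials=2000, seed=0):
    """Monte Carlo estimate of the expected top score among n_embryos
    standard-normal predictor scores."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0, 1) for _ in range(n_embryos))
    return total / trials

def gain_in_population_sd(n_embryos, predictor_r, sibling_sd=0.7):
    """Rough expected trait gain, in population SDs, from picking the
    top-scoring embryo. Both predictor_r and sibling_sd are assumed inputs."""
    return predictor_r * sibling_sd * expected_best_of(n_embryos)

for n in (5, 10, 100):
    print(n, round(gain_in_population_sd(n, predictor_r=0.5), 2))
```

With these made-up parameters, choosing among 100 embryos gives a gain somewhat under one standard deviation, and the gain grows only slowly with the number of embryos, which is why the predictor quality matters as much as the embryo count.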
[00:50:25] Dan: Just based on the human species, how many standard deviations is it possible to go-- to push this intuition a little bit, if you think of like, if we just engineered dogs, presumably there would be a limit and we wouldn't have dogs that could-- Maybe they could get close to us today, but presumably they won't have the same limit that humans do. Is there a way to think about that?
[00:50:47] Steve: It's pretty hard to do it from a mathematical perspective. What we can conclude by the fact that the level of polygenicity of cognitive ability is probably at least as great as height, so it's probably at least 10,000 different variants that are controlling the common variation in cognitive ability. There's a little math involved in this next inference, but it turns out it's the square root of that number that determines how many standard deviations are up for grabs. If you take the square root of 10,000, let's say very conservatively, you could say there are at least 30, maybe 100 standard deviations up for grabs. You could shift the mean.
Now again, like a standard deviation is 15 IQ points. If you shift 30 standard deviations, the IQ went up by 30 times 15, which is 450 IQ points, and we have no idea what the hell that means, because we're used to thinking about variations of 10, 15, 20, 30 IQ points, not 400 IQ points. The inference that you can make is that there's an unimaginable amount of variation up for grabs.
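The square-root scaling can be sketched with an idealized additive model. The assumptions here are not from the conversation: every variant has the same effect size, variants are independent, each has allele frequency 0.5, and the standard deviation is that of the genetic component only. The function name is hypothetical.

```python
import math

def sds_up_for_grabs(n_variants, allele_freq=0.5):
    """Distance from the population mean to the best possible genotype,
    in standard deviations, for an idealized additive trait.

    With effect size a per favorable allele and allele counts 0, 1, or 2:
      mean    = 2 * n * p * a
      maximum = 2 * n * a
      sd      = a * sqrt(2 * p * (1 - p) * n)
    so (maximum - mean) / sd = sqrt(2 * n * (1 - p) / p), which grows like sqrt(n).
    """
    p = allele_freq
    return math.sqrt(2 * n_variants * (1 - p) / p)

print(round(sds_up_for_grabs(10_000), 1))

# The conservative figure from the conversation, converted to IQ points.
conservative_sds = 30
iq_points = conservative_sds * 15
print(iq_points)
```

Under this toy model 10,000 variants give on the order of 100 standard deviations, the same ballpark as the "at least 30, maybe 100" quoted above; unequal effect sizes and skewed allele frequencies would change the constant but not the square-root scaling.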
If you think like, that's crazy, Steve, some other limiting factors are going to intercede before you get plus 100 or plus 200 IQ points. Might be true, but our experience in agricultural breeding where the similar analysis applies, so if you look at a plant-- by the way, this whole field that I'm talking about, which again, I'm always like taking the piss from these stupid wokesters, is like, they don't like me talking about this stuff, but if they go down to the Monsanto lab or the Iowa State University Agricultural Breeding Center, people are doing essentially the same mathematics with plants and animals in an agricultural setting. Oops, I guess it's not BS.
In the agricultural setting, they have a similar situation where the milk production of a cow or the number of eggs laid per month by a chicken or the rate of growth of corn or the size of the ear of the corn or the drought resistance of the corn, they can tell by the same analysis I just gave you for IQ, they can tell that there are many, many standard deviations up for grabs.
If they aggressively start breeding these plants or using polygenic scores for selection, which by the way, now has become completely standard in breeding of dairy cattle and stuff like this. I think maybe even in some cases like breeding chickens and stuff like this. There are many, many cases of multi-multi-standard deviation shifts that have been accomplished, actually accomplished by animal breeders and plant breeders. The eggs that you ate for breakfast are laid by chickens that lay almost one egg per day. In the wild they might lay like one egg per month or maybe a couple eggs per month, but these chickens are laying an egg almost every day. The chickens that are populating all of our farms, they're ravenously hungry. They just want to eat and lay eggs. They are in the wild population less than one in a million, one in a billion wild chickens from the old population were anything like the modal chicken on a farm today.
That tells you right away, like stuff that's unimaginable, like when you look at the wild guy, can be produced through controlled genetics. This is just true. If you ask any animal science guy, it's people who do animal breeding, plant breeding, they'll just tell you yes, of course, that's how we have agricultural revolutions and that's why farms are so productive today and yada, yada, yada, yada. I think it's very unlikely that as a purely scientific statement, what I'm saying about what's possible with human intelligence, I think it's very unlikely that I'm wrong about this.
Now, it could be like, "Oh, the humans that are plus 100 IQ points, whatever, they start to have to have bigger brains and their skulls have to be bigger." Brain size is correlated at about 0.4 with IQ. Other things could happen that you don't like. Like, oh, maybe all women would have to have cesarean sections in the future because the babies' brains are getting so big because you did this weird genetic editing to them. I'm not endorsing any of this, but the fact that there are many standard deviations up for grabs, I think for people who actually understand the science is not arguable.
[00:55:51] Dan: Just so people understand, a 130 IQ, as you said, that's like two standard deviations, so that's like you're in the top 2.5% of the population. A 145 score, you're in the top 0.14%, you're getting into 160 to 200 is like Einstein, speculative on some of the smartest people of all time. What this is showing is using the same logic we do with the chickens that we eat today in my kitchen, you could theoretically push this to get somebody that's up in the 500s or higher.
[00:56:27] Steve: Yes. Now, I don't know what it means to have an IQ of 500 or this or that, but I know that person probably is a hell of a lot smarter than me.
[laughter]
Now the other funny aspect of this conversation, it's like, if you can tell like I'm sick of talking to biologists and stuff or biomedical people, like yes, it's true. I spend most of my time these days working on things more related to AGI and AI and large language models, stuff like that. There, if you say like, "Oh, you know what, we're going to 10X the training data and 10X the compute resources, and I have a slightly better algorithm, am I going to get an AI which is significantly better, smarter than GPT-4?" People are like, "Yes, wait, Steve, that's what we're doing. What are you talking about?" Oh, oh, you were in biologist mode or something, you couldn't understand things for a while. Like why would that be shocking to you?
Why would it-- like right now, GPT is not good at certain things, but the guys who are working on this are very confident that we're going to pass some threshold and suddenly the new models will be magically really good at some of these things, like theory of mind or doing mathematics or whatever it is. That's not shocking to them. For people who think very deeply about intelligence, the operation of an information processing neural network, which by the way, my brain is one, and your brain is one. People who think about that stuff don't think it's crazy to talk about something that is a quantum leap in capability beyond the previous one due to some improvements.
Well, could those improvements be encoded by genetic changes in the organism? Why not? Why the hell not? Like why is it that my dog will never understand doorknobs, but my kid will understand the doorknob by the time he's two or three years old or something? Wow, it must be God or something, the Bible or what could possibly explain that? Oh, I guess it has something to do with DNA, except we shouldn't talk about that for reasons or something. Yes, it's kind of ludicrous.
The idea, like there are people, there are biologists who, like, they'll read some popular article about genetically engineering super geniuses, and then they'll give you like five fallacious reasons why it's impossible that we'll ever genetically-engineer superhumans. They'll exhibit, they'll put their own stupidity on public display by giving you like five fallacious reasons that have nothing to do with the problem. Which if you map their reasoning onto what people are doing, training neural networks, it's just obvious what they're saying is stupid, but anyway.
[00:59:15] Dan: I mean you could stretch this analogy because it plays into the whole concerns about AGI, it's like, well who's going to get smarter faster? [laughs]
[00:59:25] Steve: Oh, this is the main question. I think for the most deep thinking minds in our society today, main question is at what point will machinic intelligences overtake human intelligences? One question. Second question, is there any hope for the wet intelligences to improve themselves, to affect that race? The race between-- we're not getting better, we're actually getting worse if you calculate carefully, and then they're getting better, but the fact that we could get better at accelerated rate through these biological technologies. The average people don't want to talk about this, but serious people are thinking about stuff like this. Yes, that's a question of our age actually.
[01:00:22] Dan: It's also almost like symbiotic too because some of the hardest problems of creating smart intelligence are human intelligence to build the thing to hit escape velocity. You get smarter humans, they could be the ones that are now pushing the machines a little bit further.
[01:00:39] Steve: A conversation that I regularly have with billionaires who are very focused on AI is whether improved humans, like if we suddenly were able to make much smarter humans, would have a better chance of solving the alignment problem so that the risk of the machinic intelligences doing bad things to us is reduced. Should we try to slow down AI developments so that improvements in human intelligence can have a better shot at solving the alignment problem? Not solving but improving engineering around the alignment problem. These are serious conversations people have. Again, it sounds like science fiction, but serious people are actually having these conversations.
[01:01:27] Dan: It's here and maybe this is a good segue into we laid the groundwork for what is possible, but we glossed over some of the societal challenges to actually getting here. One thing I wanted to ask you about is when I think of the really groundbreaking stuff, it seems like you have nuclear on one end and we've proven to be pretty good at regulating it. It's very physical. It's not easy to get your hands on plutonium if you don't know what you're doing or have special authorization. Then you have AI on the other, which is the jury's still out.
It's like once you do a training run, you can gate the thing behind an API, but unless you're confiscating like H100s or something, it's very difficult to actually understand how you're going to stop people from doing these big runs. This seems to fit somewhere in the middle. I think at one time, I did 23andMe quite a while ago when it was first coming out. You could just download your raw data. There are some sketchy sites that will tell you things that 23andMe either doesn't want to for reputational reasons. It might not even be real science, but they can try to tell you more about your genome than 23andMe will give you.
The question here is basically, once someone does one of these runs and let's say they get their hands on the intelligence data and they do it, what is to stop me with my 23andMe, or an embryo that could be my child from me just sending it to them and saying like, "Hey, can you go pick the best one?" It's just something that's going to be really challenging actually in the long term for anyone to stop from happening.
[01:03:10] Steve: I think the main difference when you compare these genetic technologies to either in silico AI or to nuclear weapons is the barrier to entry to do the genomic stuff is actually low other than the difficult part of assembling large data sets, that's hard. Once you have the large data sets, the computational analysis and the IVF stuff, it's kind of low CapEx stuff in comparison to the other two. Now, the part that's slow though about genetics is it proceeds only at a human generational timescale, so you have a lot of time. If you made some mistake and you screwed something up for this generation, you have a lot of time to study them and fix it.
I think the nightmare scenarios about runaway human genetic engineering or whatever are actually way overblown. There might be a particular family that suffers or even part of a generation of people that suffers, but the human species is not going to destroy itself by genetically engineering itself. I don't think that's a huge risk. More risky is the ability to genetically engineer viruses and things like that. That's a whole different thing.
I don't think there's going to be any stopping the genetic technologies that we're talking about because even if the United States is taken over by wokesters, and by the way, just as a historical footnote for your listeners, if you want to have some fun, go look up a guy called Lysenko in the Soviet Union. Under Stalin, the study of genetics was actually forbidden and lots of scientists were thrown into the gulag for studying genetics because it violated socialistic tenets of the equivalence of individual men and stuff like this. Anyway, long story, but the point is, in America, we're flirting with ending up in that place, but it's not going to stop in China or Taiwan or Japan.
The other people are going to eventually get millions of IQ scores or college entrance exam scores linked to genomes. Yes, eventually there will be predictors. These predictors can easily run on your phone or on very limited hardware. Eventually, this cat cannot be kept in the bag. Even if it is kept in the bag by some risk-averse woke Americans or Europeans, it's not going to be kept in the bag in Asia. No way. There's no holding that back, but it's going to play out over multi-generational timescales, and so my AI friends, it's like I have different groups of friends. I have the physics friends, I have the AI friends, I have the computational genomic friends.
It's like when the AI friends look over at the computational genomic friends, they're like, "Don't those guys realize everything they're doing is going to be replicated by some AI in 20 years in five seconds?" Who cares how smart humans are? Because by that time, machines are going to be like a hundred times smarter. In a way, from that perspective, the issue is like, well, the machines are going to surpass us, and then at some point, the future of the planet will be in the hands of the machinic intelligences, not the ape brains.
[01:06:37] Dan: I wonder too, if you just surveyed the United States, how many people would actually care about optimizing for intelligence? What if instead we just get everyone looks like Brad Pitt and Angelina Jolie or celebrities?
[01:06:50] Steve: Oh, totally right. Many more people care about that than care about brain power. The smart kid was never the most popular kid in school. It's not even very easy for people who are more in the middle of the distribution to even understand how the people in the tails drive progress. Because by definition, if you're in the middle of the distribution, you don't know how a large language model works. You don't know how a thermonuclear bomb works. You don't know how your microwave oven works.
You don't know how your internal combustion engine works, so you don't even know that, "Wow, if I did a really careful accounting, I'd notice predominantly, it's these weird smart kids that we didn't like in high school who are responsible for all of those things that I just listed," each of which has totally radically changed society, but these people don't know it or care about it. Much more immediate to them is like, "Is Johnny going to be popular in school?" I just watched the Quarterback documentary on Netflix. Did you see this?
[01:07:56] Dan: I didn't, no. I'll have to check it out.
[01:07:58] Steve: Anyway, the point is, to them, to people watching that, the quarterback, the NFL quarterback, is the acme of human development. It's like Kirk Cousins is Superman. You're an MSU guy, right?
[01:08:13] Dan: Yes, I was there during Kirk Cousins.
[01:08:16] Steve: Well, Kirk Cousins, I don't know how he was in college, but in the documentary, because he plays for the Vikings now, and he's not the top level for NFL, but he's a really good quarterback. If you look at his life, how disciplined the guy is, the guy's like totally fit and ripped, he's doing everything he can to be a better quarterback. Every weekend he engages in a heroic competition, which uses his brain a lot actually as well as his athletic gifts, but the average person can understand that. They can say like, "Wow, if you watch this documentary, you'll come out of it thinking things like Patrick Mahomes and Kirk Cousins, Mariota even. These guys are the ideal.
Their lives are heroic and exciting and everybody can understand why they're awesome." Whereas like, "Wow, I heard these two geeks sit around for an hour just talking about some math and DNA or something. What the hell is that all about? I wouldn't want my kid doing that." That's my view of it.
[01:09:18] Dan: Maybe it's like the great filter you get just smart enough to get gene editing, but then the preferences of everyone just shoots us towards just completely hedonistic lifestyle and making everyone star athletes and just living for pleasure and fun.
[01:09:31] Steve: Oh, yes, it could totally go that way, but the thing is that there are enough people who are smart and want smart kids that they are going to shoot off in some direction. Let's imagine a world where, I don't know how big of a Dune fan you are, but let's suppose we have the-- Do you know what the Butlerian Jihad is?
[01:09:52] Dan: Yes, they kill all the robots or they get rid of technology.
[01:09:57] Steve: Yes. In the Dune universe, the humans have a close brush with AI taking over, but they defeat the AIs. Then they passed this law on penalty of death, thou shall not create a machine in the image of the human mind, i.e., no LLMs. Anyway, they manage to cut off AI research and it's just illegal. Their technology base is very strange. They have some advanced stuff and some very primitive stuff, but that's the Dune world. Imagine that we have the Dune world so we don't have to talk about machinic intelligence, but we continue developing genetic engineering somehow. Then what happens?
Yes, you might have a lot of people that all they care about is that their kid looks like Kirk Cousins or something, but that sliver of people who are already in the 1% of intelligence, that's a big enough group of people that they can conduct their own effort. Now what will happen is behind the scenes, instead of AIs developing new tech that reshapes the world, it's these genetically engineered super brains that are developing the quantum computers, whatever it is that you're allowed to do in this hypothetical universe that are actually secretly shaping society behind the scenes, but this bulk of people have no idea.
Just as today, this bulk of people actually really have no idea what are the forces shaping society, what are these solar panels, and does it matter that the cheapest way to generate electricity now is actually from solar panels made in China? Nobody in this bulk of the bell curve understands anything I just said or what the consequences of it are. It'll just be more of that on steroids in this hypothetical scenario.
[01:11:50] Dan: Is there a lot of venture money going into this? Because it's weird to me that you don't hear really anything about these sci-fi potentials. Like I said earlier, you heard about the human genome project in the 2000s and then it's been dead. Honestly, even as someone who just reads the internet all the time, you have to go to your blog or Gwern's blog to actually read scientific papers to get this stuff. It's not really posted all over. There's a couple of PopSci articles, but it's not that popular.
Is there a lot of venture money going into it? So much could go right here and it seems like the probability is quite high.
[01:12:28] Steve: It takes a certain kind of reasoning to follow the path that you just outlined and that reasoning is most prevalent in the EA world, effective altruism world, rationalist world, where they're trying to follow the science, they have open minds, and they are interested in human flourishing and they like IQ points. In that community, there are people really interested in this and there are venture capitalists in that community who want to fund this stuff. Believe it or not, even among the pool of venture guys, there are normies. Like, "Oh let's build the next e-commerce." There are normies, and then there are guys who are like, "I like deep tech edgy investments."
Those are the guys who would invest in Genomic Prediction, typically, and stuff like this. Or they just see the appeal: they know IVF is going to blow up because human fertility is going to become more and more of a challenge as women pursue high-powered careers and people get older and older before they have kids. They just see that as a business opportunity, but they're not fundamentally interested in this, some people call it a transhumanist future or something like this. It's a whole mix. Not all venture guys are really open to these ideas.
[01:13:51] Dan: It surprises me a little bit because it's not really in your face, but it seems like it could be really big.
[01:13:57] Steve: Yes. Oh, I think it's one of these things that will be really big. I would predict before too long this 10% of babies produced through IVF, which is true only in a few countries now, will become more the norm in developed countries around the world. At that point, you're talking about 10% of all babies born, and then if there's, wow, there's one company that actually can do embryo genotyping in that space, wow, that's a pretty powerful company. It's got a lot of resources. Wow, they're actually assembling the data set for intelligence themselves on the down-low, and wow, they're the only ones now that can actually predict IQ, and by the way, they can demonstrate it.
There's an asymmetry related to this ML and polygenic traits, which is that I don't have to reveal my predictor to you in order to prove to you that my predictor works. In other words, let's suppose I make the public claim, "Hey, you know what? Genomic Prediction, despite the attacks of all you woke morons, we've actually completed our cognitive ability predictor and it correlates 0.6 with actual IQ. It's good enough, we're going to start making superhumans walk off. Oh, by the way, the government of Singapore has invited us to build a 10,000-square-meter research institute and genotyping lab there. If you want to bomb Singapore, go for it, but otherwise, leave us alone.
Oh, and by the way, people from all East and South Asian countries now are flying to Singapore to have their kids to do their IVFs and that kind of thing," suppose that's the case. Now, you're some normie computational biologist at Harvard funded by NIH or something and you're like, "Steve Hsu, I don't fucking believe you. You guys are lying, I don't think your predictor works."
I would say the following, I'd say, "Okay, pull a few hundred people from people that you've genotyped and you just happen to know their SAT score. You don't tell me their SAT score but we're going to give it to a third party, Dan Schultz, he's going to hold the true SAT scores of these few hundred people or a thousand people, but you're going to give me their genomes." Now, I just run my algorithm, "Oh, it took me five seconds on my iPhone." I just run it. "Here's the flat file with the list of a thousand predicted cognitive scores." Then Dan will run.
He'll calculate the correlation between my predicted IQ score and the one that was actually in the biographical data of these thousand individuals, and Dan says, "Yes, it correlates 0.62." I would say, "Hey, Harvard guy, go fuck off. By the way, I'm not telling you what the structure of my predictor is, but it works. Give me another thousand, I'll show you again." We can get to a point where I can prove to the world I can do X without revealing how I do X. At that point, just to repeat myself, I'll just be like, "Go fuck off, guys. See you later. You don't like it, fine.
Hey, you in particular, when you want to do your IVF cycles, your money's no good here. We're not servicing you. Get the fuck out of here." [laughs] Will we ever see that day? I don't know. Maybe the AIs will take over before any of that shit happens.
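The blind test Steve describes (the predictor's owner submits predictions, a third party holds the true scores and reports only the correlation) can be sketched roughly as follows. This is a minimal illustration, not Genomic Prediction's actual procedure: the data are synthetic, and the names `pearson` and `third_party_check` are made up for this example.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def third_party_check(predictions, sealed_scores):
    """The third party sees both lists but never sees the predictor itself."""
    return pearson(predictions, sealed_scores)

# Synthetic stand-in data (assumption): 1,000 people whose true scores
# correlate about 0.6 with the predictions, all in standardized units.
rng = random.Random(0)
target_r = 0.6
predictions, sealed_scores = [], []
for _ in range(1000):
    predicted = rng.gauss(0, 1)
    noise = rng.gauss(0, 1)
    predictions.append(predicted)
    sealed_scores.append(target_r * predicted + math.sqrt(1 - target_r**2) * noise)

r = third_party_check(predictions, sealed_scores)
print(f"correlation on the held-out sample: {r:.2f}")
```

The point of the design is that only the list of predictions crosses the boundary; the weights and structure of the predictor stay private, and the test can be repeated on a fresh sample as often as a skeptic likes.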
[01:17:38] Dan: [chuckles] This raises a question, though because if this gets that good, is there a way to run a calculation? For me, I just got married. Is there a way for me to run a cal--
[01:17:47] Steve: Congratulations.
[01:17:48] Dan: Thank you. My wife, she actually asked if I would ask you about this, but she has one of the variants for CHEK2, which is one of the rarer breast cancer variants; it puts you at elevated risk for breast cancer. The question was, is there a way to run a calculation on her genome and my genome to see what the risk would be and if we could benefit from IVF? Presumably, IVF is a lot to go through, but at some threshold, it could make sense if you had a really high chance of passing it on, so I pose the question to you.
[01:18:21] Steve: Let me make two observations here. Let's suppose in your family, there is a rare genetic mutation. Either you or your wife has this rare genetic mutation and let's suppose that even one copy of it is dangerous. You should then go through IVF because we can do the following. We do the following. We take saliva from mom, we take saliva from dad. The region where that CHEK2 variant is, we then know what her surrounding haplotype, the SNPs around that, what state she has there. What that chunk of DNA looks like for her and what that chunk of DNA looks like for you.
We can then look into the embryos and say, "These are the embryos that got the CHEK2 variant from your wife and these are the ones that didn't, and you might consider if you're worried about the effects of that, use these, not these." That's just what we call a Mendelian or single gene variant and we can pretty much 100% guarantee you your kid is not going to have it if you go through embryo genotyping. Does that help?
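The arithmetic behind the single-gene case can be sketched in a few lines. This is a simplified illustration under stated assumptions: only one parent carries the variant, each embryo inherits one of that parent's two copies with equal probability, and embryos are independent. The function names are hypothetical and nothing here reflects how a clinic actually reports results.

```python
from fractions import Fraction

def p_embryo_inherits(carrier_copies: int) -> Fraction:
    """Chance that one embryo inherits the variant from a parent
    who carries it on 0, 1, or 2 of their two copies of the gene."""
    if carrier_copies not in (0, 1, 2):
        raise ValueError("carrier_copies must be 0, 1, or 2")
    return Fraction(carrier_copies, 2)

def p_at_least_one_clear(n_embryos: int, carrier_copies: int = 1) -> Fraction:
    """Chance that at least one of n embryos did not inherit the variant."""
    p_inherits = p_embryo_inherits(carrier_copies)
    return 1 - p_inherits ** n_embryos

# One-copy (heterozygous) carrier parent, five embryos to choose from.
print(p_embryo_inherits(1))        # chance per embryo, as an exact fraction
print(p_at_least_one_clear(5, 1))  # chance that some embryo is variant-free
```

With a one-copy carrier, each natural conception is a coin flip, while genotyping several embryos makes it very likely that at least one variant-free embryo is available to choose.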
[01:19:42] Dan: It does. What would the chances be if we didn't do it, you just do it naturally?
[01:19:49] Steve: I don't know about CHEK2, whether it's recessive. Does your wife have two copies or one copy?
[01:19:54] Dan: I'd have to ask her. I don't know.
[01:19:55] Steve: There's basically simple math that any particular chunk of DNA that the embryo's going to get one chunk from mom and one chunk from dad, but she could get the chunk, if your wife only has one copy, she could get the chunk of DNA that doesn't have that variant from mom-
[01:20:17] Dan: Got it.
[01:20:17] Steve: 50/50 that she gets at least one copy of it from mom, but we can just like tell you which ones are which, and then you don't implant the ones that have the dangerous variant. These Mendelian issues can be solved. That's the easiest thing for us to do. I just want to make one, because you mentioned breast cancer, I think this is as a public service announcement. I always make this point because I think people are unaware of this. Many people are aware of the BRCA mutation. That's mutations on a particular gene or region that predispose women to breast cancer.
It's pretty rare; depending on your ethnic group, only roughly one in a thousand or a few per thousand women have this genetic variant. If you do have it, if you have the worst version of it, you could have like a 60%, 70% chance of breast cancer by the time you're 40 or whatever. There's a broad awareness about these BRCA variants, even though in aggregate, the number of women or the number of families that they affect is really quite small. If you're doing polygenic analysis of breast cancer, what you find is that there is a contribution to breast cancer risk from these rare variants, which is only affecting a very small number of people.
Then the rest of the population, most of the variation in whether you're high risk for breast cancer or low risk for breast cancer comes from about a thousand individual loci, which are each only adding or subtracting from the risk a little bit. It's a typical polygenic story, but most people who are high risk for breast cancer are high risk for the polygenic reason, because so few people actually carry the rare mutations. Even though a rare mutation is very dramatic to talk about and that one little change has a huge impact, that's a, in a sense, negligible part of the population.
What's really going on is just like there are some tall people and there are some short people, there are some high breast cancer risk people and some low breast cancer risk people. If you define the set of people who have as much risk as BRCA carriers for breast cancer, there are 10 times as many people who just happen to have the polygenic combinations that make them that at risk than there are actual BRCA carriers. Now, we have the technology to compute. From your 23andMe genotype, I can compute your polygenic breast cancer score.
There are 10 times as many women walking around not knowing that they're high risk for breast cancer as the entire aggregate population of CHEK2, BRCA, blah, blah, blah carriers, but they don't know and they're not told that they're high risk for breast cancer because A, people are just dumb. The medical profession has not updated on the new information coming from this science in polygenic risk. The fact that we can compute this and the fact that the number of affected women defining some threshold is 10 times larger than the number of women walking around with BRCA or CHEK2, that just hasn't penetrated our medical system yet.
10 years from now, I'm sure people will be getting polygenic scores for colon cancer and breast cancer and all this other stuff, but currently, it hasn't penetrated yet. It hasn't, at the moment, penetrated at all into the general consciousness of doctors. Now, in our embryo practice, we often will see a family, they come in and they're like, "You know what? We have a lot of breast cancer in our family, but wow, we're BRCA-negative. We're not carriers of any rare breast cancer thing, but my aunt had breast cancer and my grandma had breast cancer and I'm really worried, can you make sure that my baby girl has lower risk for breast cancer?" Now, we get the genomes and we're looking.
We get mom's genome, dad's genome, and the embryos. We look and we say, "Yes, that's right, this family has mom's part of the family, oh, yes, and maybe dad's part of the family has high polygenic risk for breast cancer." They are not carriers of the rare variant. They just happen to be unlucky. Just like this family, oh, a lot of tall people in this family. Oh, in this family, a lot of people who are going to get breast cancer.
Then we look in the embryos, but the way that mom and dad's genomes recombined means that even though the family in aggregate is way above average in their polygenic risk for breast cancer, we typically can find embryos that are at normal risk just because of the luck of the draw of the way that you're different. They didn't get the risk variants in this chunk, they got the low-risk chunk from dad and they got the low-risk chunk here. That embryo is fine, but without our technology, there's no way, they're never going to break this chain of inherited family history of breast cancer. Whereas we can break that chain. We just break it right there.
That's the most interesting stuff for me personally to look in to see like, "Oh, what's the distribution of breast cancer risks of these embryos?" and, "Oh, what were mom and dad like?" We can just see it. It's amazing new technology that didn't exist five years ago, but we can do stuff like that.
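The polygenic picture Steve describes, many loci each adding or subtracting a little risk, is commonly modeled as a weighted sum over variants. The sketch below is illustrative only: the four variant names, their effect sizes, and the embryo genotypes are all invented, and a real predictor would use on the order of a thousand loci with weights learned from large datasets. The transcript does not describe the company's actual model.

```python
from typing import Dict

# Hypothetical per-variant effect sizes (positive raises risk, negative lowers it).
EFFECTS: Dict[str, float] = {
    "snp_a": 0.12,
    "snp_b": -0.08,
    "snp_c": 0.05,
    "snp_d": -0.03,
}

def polygenic_score(genotype: Dict[str, int], effects: Dict[str, float] = EFFECTS) -> float:
    """Additive score: sum of effect size times allele count (0, 1, or 2)."""
    return sum(weight * genotype.get(snp, 0) for snp, weight in effects.items())

# Invented genotypes for three sibling embryos from the same parents.
embryos = {
    "embryo_1": {"snp_a": 2, "snp_b": 0, "snp_c": 1, "snp_d": 0},
    "embryo_2": {"snp_a": 0, "snp_b": 2, "snp_c": 0, "snp_d": 1},
    "embryo_3": {"snp_a": 1, "snp_b": 1, "snp_c": 1, "snp_d": 2},
}

scores = {name: polygenic_score(genotype) for name, genotype in embryos.items()}
ranked = sorted(scores, key=scores.get)  # lowest score (lowest modeled risk) first

for name in ranked:
    print(f"{name}: {scores[name]:+.2f}")
```

Because each embryo draws a different mix of chunks from mom and dad, siblings land at different points on the score even when the family average is high, which is what makes selecting a lower-risk embryo possible.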
[01:26:07] Dan: Presumably you all have some of the best models for this stuff. Have you considered a consumer website similar to 23andMe, where instead of screening for kids, you just let people off the street run their polygenic scores?
[01:26:20] Steve: We want to do that. Every now and then, some other entrepreneurs come to us and say like, "Hey, we have the insight that you just had," which is that yes, somebody should be pushing this as an adult polygenic score product out in the same way that 23andMe or Ancestry do their thing. We've really just been heads down on the embryo stuff. Our company has not tried to do this, but we have the back end for calculating everything. The embryo report that we generate could easily be an adult report.
We could start this company. In fact, actually, at this moment, it just happens we just had another iteration of another set of entrepreneurs coming to us saying, "Hey, we want to pursue this adult polygenic score market," and so we're kind of talking to them, but so far, nobody has succeeded in doing this just because it's complicated and people are generally concerned about their health, but how much do they really care? Whereas you're very focused when you're going through IVF, it's like you decided, "I really want to have kids. We're having a fertility problem. I'm going to invest tens of thousands of dollars," and, "Oh, wait, I can genotype my embryos. Wait, let me figure out what that means."
It really focuses the mind and it's qualitatively different from like you go see your GP at the hospital, and he's like, "Yes, Dan, we checked your blood lipids you're a little high. Maybe, do you eat a lot of red meat?"
[01:27:56] Dan: Lots, yes.
[01:27:57] Steve: Yes, and then you have that conversation with your doc. Now, imagine explaining to that dude polygenic scores, pleiotropy, machine learning, blah, blah, blah, absolute risk versus relative risk, odds ratios. Just imagine explaining all that to that guy who has to see like one patient every six minutes or whatever the hell the HMO is doing to this guy. It is at least somewhat challenging, I think, to contemplate getting. It'll happen eventually, but for any particular entrepreneur to try to want to do it, I would say it is a difficult lift.
[01:28:36] Dan: I've actually always been just surprised by the general population, how many people just don't seem to think 23andMe is that interesting. I find it super insightful to learn your high-risk and stuff. A lot of people just don't care. It doesn't seem to be a big deal.
[01:28:49] Steve: Yes, exactly, people don't care. Also, "Wow, you're an engineering graduate from Michigan State with a background in tech. Exactly how representative are you of the general population?"
[01:29:03] Dan: You have some really interesting personal stories and you know a lot of, for at least a couple of degrees, some famous people that I think are interesting stories you post on your blog. The first is Jeff Bezos. You work with a lot of physicists who used to be students with him at Princeton, is that right?
[01:29:20] Steve: Jeff Bezos went to Princeton to study physics. This is actually in many, many interviews he's given, he discusses this. When he left high school, his goal was to become a theoretical physicist. He went to Princeton, which has one of the top undergraduate programs in physics. He did well as a physics major until he got to quantum mechanics. At that point, he himself says he had a lot of trouble with the more abstract formulation of quantum mechanics and he realized that out of his class of roughly 30 physics majors or whatever the number was, physics major, it would have probably been 20 to 30 or something, taking quantum mechanics with him, there were a few, a couple that just got the material naturally.
He felt they were just much better than him at it and he just felt he didn't have a good future as a physicist, and so he switched to computer science. That's all on the record. My best friend from high school went to Princeton. Started out as a physics major, switched to math. I'll dox him a little bit. He graduated number one in his class, Bezos' class from Princeton. I think class of '86. My buddy was the valedictorian of the class of '86 and was pretty good friends with Bezos, they were both in the same eating club. I think they were both in Cloister.
I heard a lot of stories from him, and then there's a whole other group of physics guys who, for some reason, it was fashionable or whatever, for whatever reason, a lot of Princeton undergrads would go to Berkeley for their PhD in physics, and I also went to graduate school at Berkeley. I'm class of '86, so my high school friend was a year ahead of me, but I graduated in three years from Caltech, so we ended up graduating college the same year. In my cohort at Berkeley in graduate school, including my roommates that I lived with, were guys who had taken quantum mechanics with Jeff Bezos and knew Jeff Bezos. That's how I heard all these stories about Jeff Bezos.
Anyway, so yes, I know the stories firsthand. Sometimes people will say things like Bezos wasn't smart enough to be a theoretical physicist, but he was smart enough to be one of the greatest entrepreneurs of all time and business creators of all time. To some extent, to the extent that general intelligence is a thing, I don't think that statement is completely inaccurate because probably all these other physics dudes from Princeton who were contemporaries of his were in some broad intellectual sense as sharp as Bezos, but of course, in life, they were nowhere near as successful as Bezos, so whatever.
Yes, there are levels to this thing. My wife hates it when I say this, but there are levels to this thing. If I said it the following way, "This guy was a great high school running back, but when he got to Michigan State, he could not make the starting lineup and eventually dropped out and became a rugby player or whatever," people would be like, "I get it. I understand that story, Steve. I'm not personally threatened by that story. Let's tell the story again," but if you tell the story about Bezos, there are a lot of people that don't like this story. As far as I can tell, the story as I have related it to you is true.
[01:32:47] Dan: Basically, he was in college and he realized that he was just not up to chops with some of the other physicists, but then he goes to Amazon, and he's the smartest guy in the room in every conversation that he has.
[01:32:59] Steve: Right, so this part of it I can't personally verify. This is from other people's accounts. Other people will say, and I guess the part of it I can verify is the experiences I've had as a generalist in tech startups, which aren't really physics-focused startups. The statement is because there are levels to this thing, you can have a guy who is not an expert at a particular thing, but the engineers come in and start talking to him and say, "Hey, this is how we decided to solve this problem. We're going to do this, this is our process."
The other guy who could be Bezos or it could be me as CEO of one of my startups, I can be like, "Okay, I think you guys have done a good job, but did you think about this or did you think about this? The one part of your plan that's not going to work is if you issue on this, that's going to not work." All of these stories are that in many situations at Amazon, Bezos was the smartest guy in the room and was often very useful to have in technical discussions. People have said the same thing about Bill Gates. He started at Harvard as wanting to study math. I'll just say I've experienced it myself.
If you ask engineers at my companies, they'd say, "If we get Steve in a technical discussion, he might not be as up to speed on the details as us, but he's often very useful to have in a conversation and sometimes says stuff to us that we didn't think of." I think that's plausibly true.
[01:34:30] Dan: The one other famous person that you had some interaction with, Richard Feynman. I had the Feynman lectures when I was a freshman at Michigan State.
[01:34:39] Steve: Oh, wow.
[01:34:39] Dan: I was using those instead of whatever physics books they gave you. My dad passed them down to me. I guess my question here is, he's from an era with some of these legends, Bohr, Fermi, Einstein. You can go watch the Oppenheimer movie and it's filled with little Easter eggs of all these guys. Honestly, even people who don't know science know a lot of those names and my question is where is that today? Is it gone? Maybe it's still there.
Maybe I'm just ignorant of it, but is the famous scientist gone because those ideas were easier to find and we got the low-hanging fruit, or is there something different going on where we're just producing less of that massive talent and it's not all working together in the same room? What do you think is happening?
[01:35:30] Steve: This is a great question and something even professional physicists talk to each other about. Let me make a couple of observations. Pre-Feynman, the earlier generation would have been Bohr. Oppenheimer was a little bit younger than Bohr, Heisenberg, Dirac, people like that, and then after that would be like Feynman, Schwinger, and some other people. At that time, if you just calculate how many people had the benefit of a university education like what you got at Michigan State, what is the number? It's literally a 10th the number that have that opportunity in the modern Western world. That's not even counting developing countries like China, India, stuff like that.
We're speaking about a much smaller pool of people from which you could draw top scientific talent. Another way of convincing yourself of that is go look at the raw scores of the army IQ tests of people who enter the US Army. They started using IQ tests for the US Army around World War I, so they have records like 1918, 1925. There's a steady increase in the raw scores that's called the Flynn effect. It's not because people were genetically getting smarter, it's that, "Let's just match it up to what was the average number of years of schooling that a World War I recruit in the US Army had? Oh, six." What was the average? Six.
Today, that would be child abuse. It's literally the kid starts grade one, leaves school in grade six to start working on the farm or in the factory or something. There was a big change in just the chances that you got a decent education between the 1920s, when these giants were around or getting educated, and today. We're talking about a much bigger pool of people from which to draw talent. Now, similarly, in sports, if you're like, "How fast did Jesse Owens run in 1936 to win the Berlin Olympic Games?" I think he ran a 10.3 in the hundred meters or something, and that was no spikes. He was probably running on the dirt or something, who knows.
Anyway, in those days, you could have like here's the distribution of the other runners, and then here's Jesse Owens. You could have a wild outlier. Jesse Owens, by modern standards, yes, maybe he'd be a pretty good sprinter, maybe he'd be not quite world-class, maybe he wouldn't be world-class, but the point is he was an outlier in that pool of athletic talent. He looked like a giant, he looked like a superhuman compared to all these other guys because the other guys sucked. Similarly, take a guy who by modern standards would be like, "Oh, this guy is pretty competent. He's world-class, but not considered a super genius theoretical physicist."
Back in those days, he might have been an outlier. He might have been a guy who looked very different from the bulk of the rest of the distribution. When you're sampling from a smaller pool, it's easier to find outliers that look very different than everybody else, but when you have enough people, the distribution starts to fill in. Then yes, person X is really smart, but person Y is just as smart as person X. He's right next to the guy in the distribution. There's another guy, person Z, and actually, in this bin, there are 10 guys. Then after a while, you don't talk about person X as a genius outlier.
Person X is very smart, but I can identify a hundred other people who are similarly smart, and they're all working on quantum gravity, [laughs] so they don't stand out. There's two distinct issues here: who has the ability to make a contribution, pushing the science forward a bit, and who stands out so much that they start to mythologize the guy and talk about how brilliant he was and how off-scale he was and there was nobody else like him. Those are two different things. In the modern era, there are so many smart guys, and our systems are actually not bad at picking them out, that there isn't that much overlooked talent, or a lot less overlooked talent.
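The "distribution fills in" argument can be made concrete with a rough calculation: if ability is modeled as a normal distribution, the expected number of people beyond a fixed cutoff grows in proportion to the pool. The pool sizes and the four-standard-deviation cutoff below are arbitrary illustrations, not figures from the conversation (only the tenfold ratio echoes Steve's "a 10th the number"), and the function names are made up for this sketch.

```python
import math

def tail_fraction(z: float) -> float:
    """Fraction of a standard normal distribution lying above z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def expected_count_above(pool_size: int, z: float) -> float:
    """Expected number of people in the pool scoring above z standard deviations."""
    return pool_size * tail_fraction(z)

CUTOFF_SD = 4.0          # hypothetical bar for "looks like a giant"
small_pool = 100_000     # hypothetical educated pool in the earlier era
large_pool = 1_000_000   # ten times larger pool today

for pool in (small_pool, large_pool):
    count = expected_count_above(pool, CUTOFF_SD)
    print(f"pool of {pool:>9,}: about {count:.1f} people above {CUTOFF_SD} SD")
```

In the small pool only a handful of people clear the bar, so each one looks singular; in the larger pool there are ten times as many at the same level, and no single one stands apart.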
Anyway, the distribution filled in, so just psychologically, it just feels different. Now the second thing you might say is, but how come we're not making these huge leaps of progress? There happened to be huge leaps in special relativity and quantum mechanics in the early part of the 20th century. That has much more to do with factors outside our control. It just happens that there's a huge difference in energy scales between the weak scale, which is very well understood in the Standard Model of particle physics, and the quantum gravity scale. There are about 16 orders of magnitude between those scales, so we don't have any experimental equipment that can probe the Planck scale, the quantum gravity scale.
The theorists could talk all they want about what's going on here, and maybe string theory is the most beautiful correct theory ever developed, but we're not going to know it. Certainly not in my lifetime or your lifetime. The problem is actually there just happens to be a big gap in fundamental physics between what we've explored and what we need to explore to test the current set of theories that people are working on. In the old days, if you saw the movie, Oppenheimer, there's a character Lawrence. Lawrence was one of the great experimentalists of his day and he built the first cyclotrons at Berkeley.
You may ask like, "How much money did that cost? How long did that take? How hard was it to build those cyclotrons?" Because as soon as they built the cyclotrons, they could then test a bunch of interesting questions that the theorists were worried about. In those days, you could be like theory is tested within six months, and then theory is reformulated, and it was a very healthy good time for physics, or a golden era for physics. Whereas for fundamental physics, particle physics today, we have this problem that nature is not kind. For some reason, nature put the quantum gravity scale way beyond the scale that even the biggest accelerators on earth can explore. It could've been different.
There's some other parallel universe where the Planck scale is very close to the scale of what a one-kilometer-radius collider can do. In that universe, Steve Hsu won the Nobel Prize for some experimental prediction that was tested when he was still 27 years old, but what can you do? We don't control how nature is structured. It is what it is. Some generations get lucky and the experiment and the theory are close together. I could write the paper about, "Oh, height is going to be solved pretty soon as soon as we get to here." Then a few years later, "Yes, we solved height." I could do that only because I was lucky to be in it.
Why? I purposely went there because I knew this was going to happen. I was lucky to be in a place where the underlying technology to explore genomics was advancing very rapidly and the theory could actually play a role in understanding what the experimental results were, but that's different from like, are there still smart guys? Are there still geniuses? Terry Tao is probably in some ways like in terms of his mathematical power way beyond some of these guys. Certainly way beyond Oppenheimer, Bohr, and Heisenberg.
There's no question. Those guys are not even in the same league. Not even in the same level as like a Terry Tao or something like that. There's just no question about that. Anyway, I don't know if I helped explain it.
[01:43:43] Dan: No, sometimes I feel like the answer to that question is fluffy and people would just blame school is not as good anymore. We don't do it like we used to, but that actually makes perfect sense. It's a combo of the field that you're in. They happened to be in a field where you could test your theories on short notice, and then also, you're going to stand out if there's only a couple of thousand of you versus 10, 20, 30, more thousand.
[01:44:08] Steve: Yes, absolutely. The other factors that people quote like, oh, they used to have this aristocratic education, and it is true. Both Feynman and [crosstalk].
[01:44:18] Dan: Yes.
[01:44:19] Steve: They had tutors. They came from wealthy families. They hired people that were the equivalent of PhD students to come and tutor their kids when they were still like 15 years old, and so of course, you would learn a hell of a lot more if you didn't have to go [unintelligible 01:44:34] high school, but instead, dad brought home some grad students who started tutoring you in physics and computer science. I think that might work better, but we don't do that anymore.
[01:44:49] Dan: Yes, there's that Bloom's 2 sigma problem. You get two standard deviations or something of improvement in education if you're single-tutored or whatever. I told [unintelligible 01:44:59] that shouldn't really matter because at some point, you're still going to go out on your own and just do the physics.
[01:45:05] Steve: Exactly. Of course, as a modern who grew up with this stuff, we're a little bit biased because when we look back and we're like, "Wow, what passed for genius work in 1929 is pretty simple stuff." This is stuff undergrads can do now. Of course, I don't mean undergrads know it because it's already in the textbooks, but just in terms of the difficulty of the calculations that those guys had to do, it's not really that impressive by modern standards.
[01:45:37] Dan: Should more physicists be jumping fields? You guys do it a lot actually right now. I feel like physics majors are all over the place in other fields, but should more be hopping? Is it too tough going or what's your take on that?
[01:45:52] Steve: I think that it's all a question of personal taste. If I look back, I'm an old guy now, so if I look back and I say like, "Wow, what did I do with my life? How did that happen so fast? What did I actually do?" I take great satisfaction in those years of closing the gap between being a young kid learning physics and getting to the frontier to understand everything between the elementary stuff and the frontier. Pushing the frontier at least a little bit forward in my own way, but developing a full mastery of everything in between, and that does take a long time. I don't think anybody can really just do that in a few years.
Even when you go through the PhD program, you still don't really have a full-- Maybe in one area you've closed that gap, but to really do that across a broad spectrum of physics, that intellectual achievement takes decades. Now, you ask me, "Steve, if you had just abandoned that a lot earlier, you could have added another zero or two to your net worth." What's the trade-off? Different people are going to react differently to that. I don't know what to say. I feel okay. I feel like my satisfaction with having mastered these concepts in mathematics and physics and biology and computation is very valuable to me internally even if it didn't return more than like a professor's salary during that time interval.
It's all a question of personal judgment. If you know ahead of time that fundamental physics is probably not going to make a lot of progress in the next few decades, but AI is going to do this, I would say like, "Hey, unless you have a very, very strong affection for fundamental physics, it will probably be more exciting for you working on AI." I can't make that decision for somebody else. Somebody else has to, based on their own preferences, make that decision.
[01:47:56] Dan: Some great words of wisdom to end on, Steve. Listen, you've been super generous with your time. I've had just a total blast talking with you. Thank you so much for coming on.
[01:48:06] Steve: Yes, this has been a great conversation. Good luck with your podcast.
[01:48:11] Dan: Thank you.