Aging, Biological Clocks, Proteomics, Longevity & Healthspan | Austin Argentieri | #175

Mind & Matter · Nick Jikomes and Austin Argentieri

August 28, 2024 · 1h 33m

Show Notes

About the guest: Austin Argentieri, PhD is a researcher in the Analytic & Translational Genetics Unit at the Massachusetts General Hospital with academic appointments at Harvard & the Broad Institute. His research focuses on large-scale analyses to understand human aging.

Episode summary: Nick and Dr. Argentieri discuss: chronological vs. biological age; DNA methylation and aging clocks; proteomics and protein measurements in aging research; health, longevity, and human healthspan; and more.

Related episodes:

* Aging, mTOR, Sirtuins, Rapamycin, Metformin, the Truth of Resveratrol & Longevity Supplements, David Sinclair & Anti-Aging Myths | Matt Kaeberlein | #151

* Cellular Aging, Taurine, Nutrition, Senescence, Longevity, Mitochondria, Metabolism | Vijay Yadav | #122

* This content is never meant to serve as medical advice.

* Support M&M if you find value in this content.

* Full audio only version: [Apple Podcasts] [Spotify] [Elsewhere]

* Full video version: [YouTube] [Odysee]

* Episode transcript below.

Full AI-generated transcript below. Beware of typos & mistranslations!

Austin Argentieri 2:21

Thanks so much for having me.

Nick Jikomes 2:23

Can you just start off by telling everyone a little bit about who you are and what you do?

Austin Argentieri 2:28

Sure. I'm a population health researcher. I work at Mass General Hospital, also with appointments at Harvard Med School and the Broad Institute of MIT and Harvard. My work broadly looks at large, human population-scale data sets to try to understand aging, both from the perspective of what the biology of aging is and what the different environmental drivers of aging are, and also how that leads into many different age-related diseases.

Nick Jikomes 2:55

And an important distinction that people often make in the aging field, and that we're going to talk about today in different ways, is chronological versus biological age. Can you give people a basic sense for what those things mean, and how people in this field have thought about and measured those concepts recently?

Austin Argentieri 3:15

Of course. Chronological age is easy. It's just your birthday: when were you born? Let's count in years how old you are. It's a great predictor of many different age-related diseases and major common diseases, but it's imperfect, because two different people can have the exact same chronological age. They can both be 60 years old, but they can be very different from each other and have very different risk for disease. So some time ago now, this idea of biological age came out, which was meant to be a more precise way to estimate somebody's age on a biological level. So what's your level of biological function compared to someone of your given age? Is it faster, slower? Are there more comorbid conditions happening? Are you in great health? There have been a number of different ways that people have tried to measure that over the years, but the basic concept is that age is a great predictor and a great starting place for understanding your risk for tons of different diseases, but it's imperfect, and there's a lot of information lost. And the thinking is, if we can go into the biology instead, that will give us more nuance and more precision, even among two people of the same age. What state are you at now? What does your future health trajectory look like? Where have you been?

Nick Jikomes 4:31

And maybe you can give us a sense for this: when these two numbers start to deviate, it's really easy to think about if you're doing things that are incredibly unhealthy. It's very natural for people to think, and I'm sure this is borne out in the data, that if you start smoking cigarettes at an early age and you're smoking three packs a day for decades, your biological and chronological age are probably going to deviate. You're probably going to be biologically older than you would be expected to be from just knowing your calendar age, or chronological age. Can you give us a sense for how big some of those effects are? And when these two numbers deviate, is it always that doing unhealthy things makes you age faster, or can you also do things that move you in the other direction?

Austin Argentieri 5:22

Well, this brings up an interesting dilemma, because there's what's true and then there's what you see in epidemiological data, and those two things aren't always the same. I'll give you a classic example. For decades, the epidemiological data tended to show this kind of J shape in the association between alcohol intake and mortality risk. So starting from no intake whatsoever, as you take in a little bit, your mortality risk starts to go down, and then as you take in more and more, it starts to go up.

Nick Jikomes 5:50

I see. So this would be the classic, very widespread notion, at least in my experience, that if you drink a lot, that's bad, but if you have one or two drinks a night, that might actually be better than nothing, right?

Austin Argentieri 6:03

And it turns out, from what we know now, from the more advanced statistical modeling we've been able to do and much better ascertainment of that phenotype in these epidemiological cohorts, that that's not true. It's really easy to see it in statistical association tests, and there are a lot of tricky reasons why you tend to see it. For example, you compare against never drinkers, but some of those "never drinkers" are actually former drinkers, and most people stop drinking because they have some sort of health condition. Their doctors tell them to stop, right? So there are all kinds of these reasons. It turns out that if you're able to model it correctly, the safest amount of alcohol to drink is zero. So let's sidestep the problem that a lot of the epidemiological literature might tell you one thing about what's healthy, and what will push you in one direction or another, versus what's true. In theory, you can think of age-related diseases, and aging more broadly, as an accumulation of damage on a biological level, an accumulation of deficits, and this is why it leads to disease. So it should be the things that we know are bad for health: smoking is a great example, alcohol intake is a great example, lack of physical activity. They are associated with so many different diseases long term, and I think the reason they lead to so many is that they probably have some more central impact on the aging mechanisms that touch all of these different diseases simultaneously. The problem is that, where we are right now, convincing large-scale human data on what moves these aging clocks are lacking. There's a lot of research in mice and other animals and worms, and there are some small-scale human studies.
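The former-drinker problem described above can be made concrete with a toy calculation. Every count below is hypothetical and invented only to show the mechanism. None comes from a real cohort. In this made-up cohort, true risk rises steadily with intake, yet pooling former drinkers (many of whom quit because they were already ill) into the non-drinker reference group makes light drinking look protective.

```python
# Toy illustration of "sick quitter" misclassification.
# All counts are HYPOTHETICAL, chosen only to show the mechanism.
from fractions import Fraction

# group name -> (number of people, number of deaths during follow-up)
cohort = {
    "lifelong_abstainers": (900, 90),   # true never-drinkers
    "former_drinkers": (300, 90),       # many quit because they were already ill
    "light_drinkers": (1000, 110),
    "heavy_drinkers": (500, 100),
}


def risk(*groups: str) -> Fraction:
    """Pooled mortality risk (deaths / people) across the named groups."""
    people = sum(cohort[g][0] for g in groups)
    deaths = sum(cohort[g][1] for g in groups)
    return Fraction(deaths, people)


# Naive analysis: anyone not currently drinking counts as a "non-drinker".
naive_reference = risk("lifelong_abstainers", "former_drinkers")
# Cleaner analysis: compare against people who truly never drank.
clean_reference = risk("lifelong_abstainers")
light = risk("light_drinkers")
heavy = risk("heavy_drinkers")

print(f"naive non-drinkers:  {float(naive_reference):.2f}")  # 0.15
print(f"lifelong abstainers: {float(clean_reference):.2f}")  # 0.10
print(f"light drinkers:      {float(light):.2f}")            # 0.11
print(f"heavy drinkers:      {float(heavy):.2f}")            # 0.20

# With the naive reference, light drinking looks protective (a J shape) ...
print("J shape with naive reference:", light < naive_reference < heavy)
# ... but against true abstainers, risk simply rises with intake.
print("monotonic with clean reference:", clean_reference < light < heavy)
```

In this toy example, excluding former drinkers from the reference group, as mentioned later in the conversation, removes the artificial dip entirely.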
We're trying to do some work now going deeper into the different things in your environment, behavior, and lifestyle, even interventions, that are changing these at a large human population scale. But beyond a couple of really basic things, like I said, smoking, alcohol, physical activity, I wouldn't say we have the perfect catalog yet of what is surefire going to improve your biological aging or lead to some kind of detriment in biological aging.

Nick Jikomes 8:24

I know this isn't going to be the focus today, but you mentioned this interesting pattern in the epidemiological literature with respect to alcohol that turns out not to be true. And I think this is an important general point, because in the field of epidemiology, broadly speaking, associations that have to do with human health are intrinsically interesting areas of study, because everyone is interested in human health to some extent. They tend to get a lot of attention in the media. And yet, as I think you probably know, a lot of those studies can be very, very misleading, and you just gave us the alcohol example. When you see results in the epidemiology literature, how do you think about, or how can people think about, weighting the evidence from those studies? For example, with that alcohol result, this is sort of how I do it. One thing I like to do is ask myself, what's the result that people want to be true? In the case of alcohol, it would be very convenient for everyone if having a few drinks a week was healthy for you. So we have to discount that somewhat, because most of us, at least, are biased in that direction. I would love it if having a couple of drinks every day was the healthiest thing you could possibly do. And two, there's the question of biological plausibility, or first-principles thinking, that people often don't apply when they look at these associations. Based on how I understand alcohol metabolism, it's very difficult for me to see what is biologically healthy about ingesting ethanol or its metabolites, and how that could possibly help you live longer. How do you think about evaluating those types of association studies?

Austin Argentieri 10:18

We could think about it in terms of a couple of pillars that I, and I think our group more broadly, tend to use to decide what is a convincing amount of evidence: whether we think something is plausible, or interesting, or far-fetched, or we're just not sure. I will say that a lot of the research that I do, and that my colleagues in our group do, is focused on the power of large-scale human data. There are other people who use animal models and the other approaches I mentioned. But for me, the first pillar is: have we established this in a large human population, and have we replicated it in independent populations different from the one in which we started? For me, that's number one. You need to be able to demonstrate that what you see is generalizable and consistent across the different ways you slice up the population. So we want a large human population, and we want one that is sufficiently generalizable to you and me and everybody else. If we had a large population of half a million people, but every single person in that cohort had Alzheimer's disease, it wouldn't really tell you a lot about how something might work, and its dynamics, within the broader population. So number one: is it a large enough population that is representative, so the result can be generalizable, and can we reproduce that effect again and again, from this population to another one to another one? The second pillar is: how did you measure what you're trying to assess? The classic example everyone loves to pick on is whether it's a self-report questionnaire or whether you're getting your assessment from some kind of biological data. This is general, but... sorry.

Nick Jikomes 11:55

Pet peeve: self-reported data.

Austin Argentieri 11:59

Well, I mean, speaking of alcohol and diet, these are some of the most difficult things to assess by self-report. And it's interesting: some kinds of self-report measures are more reliable than others. Alcohol is a notoriously unreliable one, whereas with smoking status, people seem to lie less, for whatever reason. But this is a classic example where, if you take a questionnaire item versus looking at the biology, you can get very different stories, and one gives you a more reliable output. So let's stick with the same alcohol example. Like I said, if you look at questionnaire data, you'll see the incorrect association, until you start to realize there's some confounding in the way you've constructed that measure, and you can start to tease that out. So nowadays, people will exclude all of the previous drinkers, and they might exclude some of the never drinkers, and just compare among drinkers from the lowest intake upward. There's still a preponderance of studies showing a protective effect of some small amount of alcohol. But in the biggest studies coming out in The Lancet, these huge global meta-analyses and global efforts, the safest amount is zero. We can also look at a different set of evidence which, instead of self-report information, uses genetics. There's this concept of Mendelian randomization, where you take a genetic variant and use it as a proxy variable for your exposure of interest. In this case that's alcohol: there are known genetic variants that associate or correlate with alcohol intake. So we can use the genetic variant, which has the advantage of being randomly assigned at birth. That removes some of the confounding that comes from the patterned way the exposure is meted out across the population. We can use the variant as a proxy instead and measure its association with the outcome of interest.
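A minimal simulation sketch of the Mendelian randomization idea. It assumes a hypothetical setup: one variant G (allele frequency 0.3) that raises an exposure X, an unmeasured confounder U that raises both X and the outcome Y, and a true causal effect of X on Y of 0.3. All numbers are invented. The naive regression of Y on X absorbs the confounding. The Wald ratio (the association of G with Y divided by the association of G with X) recovers the causal effect, but only under the usual instrumental-variable assumptions: G is independent of U and affects Y only through X.

```python
# Mendelian randomization sketch on SIMULATED data (stdlib only).
# Hypothetical setup: one variant G (allele frequency 0.3), an unmeasured
# confounder U, and a true causal effect of exposure X on outcome Y of 0.3.
import random

TRUE_EFFECT = 0.3
N = 200_000
rng = random.Random(42)

g, x, y = [], [], []
for _ in range(N):
    gi = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count: 0, 1 or 2
    u = rng.gauss(0, 1)                          # confounder, e.g. underlying health
    xi = 0.5 * gi + u + rng.gauss(0, 1)          # exposure: genotype + confounder + noise
    yi = TRUE_EFFECT * xi + u + rng.gauss(0, 1)  # outcome: causal effect + confounder
    g.append(gi)
    x.append(xi)
    y.append(yi)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


# Observational slope of Y on X: biased because U drives both X and Y.
naive = cov(x, y) / cov(x, x)
# Wald ratio: (G -> Y association) / (G -> X association). Valid only if G is
# independent of U and affects Y solely through X (IV assumptions).
wald = cov(g, y) / cov(g, x)

print(f"naive regression slope: {naive:.2f}")  # well above the true 0.3
print(f"Wald ratio estimate:    {wald:.2f}")   # close to the true 0.3
```

Real MR analyses typically combine many variants from GWAS summary statistics and add sensitivity checks for pleiotropy. This sketch isolates only the core ratio.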
And when you use the genetic proxies, the genetic variants, in a Mendelian randomization study, it's much clearer: the safest amount of intake is zero, and as you go up in exposure, your risk of cardiovascular disease and other things just increases. So that's number two: how are we measuring the exposure of interest? You've got self-report, you've got biological proxies, and you've also got randomized trials and interventions, where you're at least able to control more of the confounding and bias and the way that things are patterned among people. The third pillar really comes down to how we evaluate these findings across populations, and this ties back into the first one as well. In my opinion, it's not enough that one study shows some effect, even if we validate it in external populations a couple of times. The question is whether the field is coming to a consensus. Is everybody, with different study designs and different ways of computing it in models, pointing at a similar direction and magnitude of effect? Or is it that only this paper, this group of researchers, in this one study with this one particular model type, found something, and then it turns out no one else can replicate it? We've tried it in other cohorts; it didn't work. We might have tried it in the same cohort with other model types; it didn't work. So I think some things are more flimsy. The other thing we've done, especially in the genetics community, to increase our confidence in what we find, is to go large scale. The easiest way to think about this, again in the context of genetics research, is that it used to be that you wanted to know the association between genetics and some outcome, let's say cardiovascular disease.
Well, then you might hypothesize that this one gene, for some biological reason, might be associated. You'd do a statistical test with a p-value threshold of 0.05, it might just skirt through, and then you'd say, "Okay, that looks great, I think we found it," and publish it in a journal. That's not really the way we do it in genetics anymore. Now we do genome-wide association studies, where we measure all of the genetic variants we can, or at least usually all of the common genetic variants. So now you have tons and tons of tests, and the p-value threshold for statistical significance becomes much smaller. You can think about it as: it becomes much, much more difficult to prove that something is associated with whatever you're trying to look at, above and beyond everything else. And that has completely changed the number of false positives in genetics research. There used to be a huge preponderance of false positives, and now that's been dramatically slashed by the scale at which we do this kind of research. We're trying to find the one signal that's bigger than all of the other signals that exist, instead of looking at just one and asking whether it skirts by at some borderline statistical significance. So again, I'm a little biased in this direction. Like I said at the outset, our approach to everything is large-scale human data across diverse global populations, because we think that's what tells you what's common in biology across groups in the world, and not just what's noise, right?
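The arithmetic behind the stricter genome-wide threshold can be sketched in a few lines. It assumes about one million independent tests, which is the usual rationale given for the conventional genome-wide significance threshold of 5 × 10^-8 (a Bonferroni correction of 0.05). Real GWAS test counts and the correlation between nearby variants differ from this idealization.

```python
# Why GWAS uses a much stricter p-value threshold than 0.05.
# Assumes ~1,000,000 independent tests (an idealization).
ALPHA = 0.05
N_TESTS = 1_000_000

# Under the global null, each test has probability ALPHA of a false positive.
expected_false_positives = ALPHA * N_TESTS
# Bonferroni: divide the error budget across all tests.
genome_wide_threshold = ALPHA / N_TESTS
# Chance of at least one false positive overall (assuming independent tests).
fwer_at_005 = 1 - (1 - ALPHA) ** N_TESTS
fwer_at_gw = 1 - (1 - genome_wide_threshold) ** N_TESTS

print(f"expected false positives at p < 0.05: {expected_false_positives:,.0f}")  # 50,000
print(f"Bonferroni threshold: {genome_wide_threshold:.0e}")                    # 5e-08
print(f"P(any false positive) at p < 0.05: {fwer_at_005:.3f}")                 # 1.000
print(f"P(any false positive) at 5e-8:     {fwer_at_gw:.3f}")                  # 0.049
```

At p < 0.05, a million null tests would be expected to yield tens of thousands of spurious "hits". The Bonferroni-style threshold keeps the overall chance of even one false positive near 5%.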

Nick Jikomes 17:11

And I'm not going to say that's everything we care about, but most of the things people care about are things we have in common. Pretty much everyone cares about longevity, for example. So there should be some common core factors that are true across the board, across all human populations.

Austin Argentieri 17:28

It should, and it's not just for aging. Think about it for anything. Think about trying to develop drugs, right? Even just to cover your own bases economically, if you're a pharma company, you want to develop something that works for everybody, not something that works in only a couple of people, where you bring it to market and find out, oh shoot, it only works in this one specific group and not for anybody else. So no matter what we're aiming for, we want what's common across people, and not what's just noise in one particular setting.

Nick Jikomes 17:59

So aging is, again, an inherently interesting topic. Everyone is limited by their own mortality, and we all want to live long, healthy lives, so healthspan, I guess, as opposed to just raw lifespan. What's becoming really popular are these aging clocks. There are lots of commercial examples now. You can buy these different aging clocks, and they tell you your chronological age compared to your biological age. So basically, are you younger than your calendar age would suggest, so to speak, or are you older? Do you maybe have some work to do to get healthier? Can you give us a basic sense for how most of these clocks work today? What sort of biological markers are they based on? Do they work, and how well?

Austin Argentieri 18:54

The idea of a clock has evolved over the years. It started with DNA methylation information; these came out about 10 years ago or so. The first one was Steve Horvath's clock, which came out around 2013. But the idea is basically: can we take some kind of biological information and use it to predict somebody's age? The first insight was that DNA methylation, as a kind of biological information, tends to correlate very closely with calendar age, at least at some CpGs, so we can develop a very predictive model of age. What usually happens is that people take some kind of biological information, in that first case DNA methylation, put it in a model, and make chronological age the output they're trying to predict. You can then make a nice plot of your methylation-predicted age versus your chronological age and calculate the correlation between the two, which gives you the, so to speak, accuracy of these kinds of clocks. And then the idea is that this divergence, like you mentioned, between your biologically predicted age and your actual calendar age will be informative for telling you about future risk of all kinds of different age-related outcomes and phenotypes. The problem is that we've had to go through many different iterations of biological clocks over the years, because they haven't always been as predictive of the actual aging phenotypes we care about as we thought they might be. So, thinking about methylation clocks (and then I'll get into other omics, because I think other omics are probably more important for this ultimately), the history was that you would develop these models, which were usually linear models, things like lasso models, and they would be really great at predicting chronological age, with accuracy in the high 90s. The problem is... sorry about that.

Nick Jikomes 21:03

No worries. I was in the same boat the last few days.

Austin Argentieri 21:13

I'll start from the beginning of the sentence.

Nick Jikomes 21:15

No worries. So most of the clocks out there so far, the ones people will see commercial examples of, are based on DNA methylation.

Austin Argentieri 21:26

Yeah, that's right. And the problem is that those initial DNA methylation clocks are great at telling you about your chronological age, at predicting what your age is, but they're kind of lousy at telling you about your future risk of morbidity, clinical outcomes, or mortality.

Nick Jikomes 21:46

Can you actually, before we go further: is there a good reason that DNA methylation tracks with age, or predicts age? Does our DNA tend to become hypermethylated or hypomethylated over time?

Austin Argentieri 22:01

There are different theories. Someone who's written a lot about this, in a very intriguing way, is David Sinclair at Harvard. He has what he calls the information theory of aging, which largely boils down to the idea that aging entails a loss of epigenetic information over time. DNA methylation marks, which are chemical marks on top of DNA, aren't static across life, and they're certainly not static or consistent across cells in your body; they're very dynamic. And he has some compelling research showing, at least in some of the studies he's worked on, that some of these epigenetic regulatory mechanisms have dual or shared roles, where they might be involved both in epigenetic regulation and in DNA damage repair, somatic maintenance, and things like that. Over time, as they have to move back and forth between these roles, things get lost and you tend to lose information. So the theory is that epigenetic regulation changes over time in some kind of stable way, and that tracks really well with age. So for methylation, the analogy of a clock is apt, because it just seems to tick along with time, and that's why it's so good at capturing chronological age information. The problem is that there are so many steps between DNA methylation and the end result that would actually influence a phenotype we care about, which is basically changing protein levels, that those methylation marks don't always correlate strongly with the level of the protein, either for the gene that the methylation mark is in or for something more downstream or distal. There tends to be this disconnect.
And this is, at least, our theory for why these methylation clocks are great at basically counting the ticking of time, but might not be so well associated with the actual aging phenotypes that we care about, whether that's frailty or age-related disease or, ultimately, mortality risk. And this is why there have been several generations of DNA methylation clocks. The field left these first-generation clocks, as they're now called, pretty early and moved on to so-called second-generation clocks. The first-generation clocks would be things like Horvath's clock or Hannum's clock; second-generation clocks would be things like PhenoAge or GrimAge, as they're called. These are no longer models that use DNA methylation to predict age. They use DNA methylation to predict some kind of aging phenotype, which would typically be constructed from blood biochemistry markers like albumin and creatinine, and from inflammation markers. In the case of GrimAge, they also use smoking. So these are things that already tell you about some kind of morbidity or physical functioning, and they're using DNA methylation to capture that instead. Those are much more predictive of mortality and morbidity than the first-generation clocks. And then you have what are called third-generation clocks. There's one called DunedinPACE, which is built on assessing change over time, so it's kind of like a longitudinal measure: what DNA methylation tells you about change over time. But there's always a question mark in my mind, at least, alongside a lot of these methylation clocks: are they capturing the real biology that's driving the aging phenotypes, or are they just capturing the passage of time?
And there was an interesting paper that came out, I think, this January, from Vadim Gladyshev's lab at Harvard, where they built a new kind of methylation clock. But instead of using any CpG site in the methylome, they only picked among CpG sites that they demonstrated have some causal relationship with aging phenotypes, using Mendelian randomization, the type of genetic association testing I was talking about before. And unsurprisingly, when they build a new clock based on these causal CpGs, they find that most of the major clocks established to date aren't really enriched for any of these causal CpGs. This lines up with what we know more generally about the biology, and about the correlations between different stages of biology, from genes all the way to proteins. The central dogma is that you have genes, then transcription, where you make mRNA, and then you make proteins. The reality is that across many genes, and especially in development and aging genes, there's a pretty strong discordance between gene expression and protein expression. And the real reason DNA methylation would matter for any type of physical functioning or disease phenotype is that it should ultimately change gene expression. Methylation, famously, is the mark that will turn a gene on or off, right? So it should be having some effect on gene expression. Well, it turns out that the gene expression signal for many genes throughout the genome isn't always very well correlated with the amount of that protein that ends up in your body.

Nick Jikomes 27:35

So is that just because... You can turn a gene on, and then that gene can be transcribed and translated into a protein. There could be a one-to-one correspondence between how much transcription you get and how much protein gets made. But there are other cases where, due to, say, post-transcriptional modifications, things that happen after the gene is turned on, you get a deviation between how much mRNA is produced and how much protein ultimately gets made?

Austin Argentieri 28:00

Yeah, the thinking is that a lot of what's going on here involves what are called post-transcriptional or post-translational modifications. You have a gene, a transcript is made, you make an mRNA, and that should then ultimately be translated into a protein. But we know that at different stages, from gene to transcript to protein, a lot of different things can happen. There are post-transcriptional modifications that might change how much of the protein is expressed. And even after the protein's already made, a lot of different modifications can happen to it that will change how much of that protein you see in the body. And this can maybe lead into where I think some of the most promising avenues are now, in terms of proteins and proteomics. What we've found in some of our research, and what other papers have shown previously as well, is that some of the most age-associated proteins tend to be proteins that stick around in the body for a long time, things like extracellular matrix proteins, for example, that have really long half-lives. These are very age-associated, but because they stick around so long, they're subject to all kinds of degradation, damage, and modification. Every time that happens, it pulls the amount of that protein further and further out of correlation with the initial gene expression signal.

Nick Jikomes 29:31

That makes intuitive sense to me. If a protein is very long-lived, especially if it's outside of a cell, in the extracellular space, it's exposed to the environment of that person's body. So I would think it would naturally capture what that person is exposing themselves to, more than other proteins would.

Austin Argentieri 29:53

Yeah, exactly. What we're learning is that it's kind of a tenuous exercise to start by measuring something very upstream, like methylation or gene expression, and just expect that it's automatically going to capture the real phenotypic variance you care about for disease or frailty or any age-related outcome or phenotype. And so this is why you've seen a boom in the last few years of people moving into proteomics to try to measure aging instead. Proteins are inherently what we care about. They're often the causal drivers of disease, and they are also usually the targets of therapeutic approaches and interventions to alleviate disease. So there are a lot of reasons why proteins are the layer of interest, so to speak. And now that the technology and the data are becoming available to look at that layer directly, people are starting to look directly at it instead.

Nick Jikomes 30:48

Yeah, and that brings us to some of your recent work. So basically, the idea here is: okay, we want to study aging, and we're interested in this idea of aging clocks. Well, instead of looking at mRNA, let's look at something that's closer to the aging phenotypes we actually care about. Let's look at the actual effectors of aging, basically, and in many cases that would be the proteins themselves.

Austin Argentieri 31:13

Yeah, exactly. With all of these advancements, the rate-limiting step is always what data are available. When the first epigenetic clock was made a little over 10 years ago, Steve Horvath spent, I think, years, if I'm getting the story right, going out and curating and collecting all the different methylation data he could pull together to make a model big enough to build his clock. These days, we have big human biobanks that are collecting all kinds of biological data, and they're starting to invest in proteomics, although it's very expensive. So now we have the opportunity, within some very well-phenotyped and well-characterized biobanks and cohorts, to look directly at the proteins. Proteomics isn't yet at the stage of, for example, methylation or transcriptomics data, where you'll theoretically have much broader coverage of the genome. The latest DNA methylation arrays, I think, look at close to a million CpG sites across many, many genes. The best proteomics assays we have right now measure somewhere between 5,000 and 11,000 proteins, but we know there are, of course, far more in the actual human proteome.

Nick Jikomes 32:25

How big is the human proteome? Do you know?

Austin Argentieri 32:28

I couldn't tell you a number. I think it's...

Nick Jikomes 32:31

Is it like hundreds of millions of proteins?

Austin Argentieri 32:35

Probably millions, I don't know. But I think there would be more proteins than there are genes, because you have different isoforms and different kinds of proteins. So if you want to think about how many proteins are in the body, you know, think about...

Nick Jikomes 32:51

20,000 genes. There's got to be quite a bit more than that.

Austin Argentieri 32:54

There's going to be many proteins. So we have okay coverage of those now. But different proteins are different sizes, and they're in the body at very different abundances from one another. So it becomes very tricky, from a technological standpoint, to measure all of them when they have such different abundances, and to have good sensitivity across those different ranges. It's an ongoing field, and it's evolving rapidly, but in the last four or five years there has been an explosion in the proteomics data that have become available to researchers. The biggest by far was last year, when the UK Biobank made proteomics data for about 50,000 people available to any researcher who wants to use them. Why this is so important is that this is a biobank where it's not just 50,000 people with no other information; it's 50,000 people where you already have whole genome sequencing, array genotyping, questionnaire data, and full blood biochemistry panels on everybody, and you have linked mortality and electronic health records for everybody. So if there's a disease with an ICD code, you can study it in relation to all the biological data that have become available. This was the background for some of the research we've been doing, where, again, our whole approach is to leverage as rich, large-scale human biological and phenotypic data as we can to develop tools. So we had this idea: what if we made a proteomic clock and built it from this huge cohort, and, because we have so much other information, tested it really systematically across tons of different diseases at the same time? This has been a hypothesis around for a long time. It's usually called the geroscience hypothesis, which is that there are known biological hallmarks of aging.
I think now we're up to 12 that are considered canonical, things like telomere length and loss of proteostasis. Epigenetic changes are one as well.
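The idea described above, building a proteomic clock and then testing it across many diseases at once, can be illustrated with a toy example. This is a minimal sketch under loudly stated assumptions, not the method used in the study: the synthetic cohort, the ridge-regression clock, and the helper names (`fit_ridge_clock`, `scan_diseases`) are all hypothetical. The clock predicts chronological age from protein levels, the "age gap" is predicted minus chronological age, and each disease is checked with a simple Welch t statistic; a real analysis would use time-to-event models with covariate adjustment.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge_clock(X, ages, lam=1.0):
    """Hypothetical clock: regress chronological age on protein levels with an L2 penalty."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(ages) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [a - y_mean for a in ages]
    xtx = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(b * mu for b, mu in zip(beta, x_mean))
    return intercept, beta


def predict_age(model, row):
    intercept, beta = model
    return intercept + sum(b * v for b, v in zip(beta, row))


def welch_t(cases, controls):
    """Welch's t statistic for the difference in mean age gap, cases minus controls."""
    m1, m2 = sum(cases) / len(cases), sum(controls) / len(controls)
    v1 = sum((x - m1) ** 2 for x in cases) / (len(cases) - 1)
    v2 = sum((x - m2) ** 2 for x in controls) / (len(controls) - 1)
    return (m1 - m2) / math.sqrt(v1 / len(cases) + v2 / len(controls))


def scan_diseases(age_gap, diagnoses):
    """diagnoses maps a disease name to 0/1 flags aligned with age_gap."""
    results = {}
    for disease, flags in diagnoses.items():
        cases = [g for g, f in zip(age_gap, flags) if f]
        controls = [g for g, f in zip(age_gap, flags) if not f]
        if len(cases) > 1 and len(controls) > 1:
            results[disease] = welch_t(cases, controls)
    return results


# Hypothetical synthetic cohort: 5 "proteins" that drift with biological age.
random.seed(0)
slopes = [0.05, -0.03, 0.02, 0.0, 0.04]  # change per year; the fourth protein carries no age signal
ages = [random.uniform(40, 70) for _ in range(2000)]
hidden = [random.gauss(0, 3) for _ in ages]  # unobserved extra biological aging, in years
X = [[s * (a + h) + random.gauss(0, 0.3) for s in slopes] for a, h in zip(ages, hidden)]

model = fit_ridge_clock(X, ages)
age_gap = [predict_age(model, row) - a for row, a in zip(X, ages)]

diagnoses = {
    # Risk rises with hidden biological aging, as the geroscience hypothesis predicts.
    "related_disease": [int(random.random() < 1 / (1 + math.exp(2.0 - 0.4 * h))) for h in hidden],
    # Risk unrelated to aging: a negative control.
    "unrelated_disease": [int(random.random() < 0.15) for _ in hidden],
}
results = scan_diseases(age_gap, diagnoses)
```

One caveat worth knowing: an age gap computed this way tends to be negatively correlated with chronological age (regression toward the mean), which is one reason published clocks often adjust the gap for age before testing associations.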

Nick Jikomes 34:58

What does loss of proteostasis mean?

Austin Argentieri 35:01

Protein homeostasis, basically: keeping proteins stable and properly folded. But now they also consider things like gut dysbiosis another hallmark of aging, and DNA damage. There was an initial hallmarks paper in Cell that had an initial set of these, and then they updated it last year to these 12 pillars, or biological hallmarks of aging.

Nick Jikomes 35:22

So these are things that are measurable, that reproducibly track with aging very well.

Austin Argentieri 35:26

That's right, that's right. And we see them conserved across species, and we see them again and again. The theory has been that these hallmarks of aging, or aging as a biological phenomenon, are probably common to all of the major age-related diseases. People have demonstrated that piecemeal: one study links this hallmark to this disease, another links another hallmark to another disease. So there's been a patchwork of evidence supporting it so far. Our thought was, well, if we have a large enough data set with electronic health records, we could probably test it all in one go among the same people. That was one of the motivating ideas for this work: if we can make a good enough biological signal of aging, and the geroscience hypothesis is correct, then this biological signature of aging should be associated with all the major age-related diseases we can measure. All we need to do is pull those diseases out of the EHR, or electronic health record, and test it.

Nick Jikomes 36:27

Can you tell us a little bit more about this data set you were working with? Where does it come from, what's it composed of, and how'd you get your hands on it?

Austin Argentieri 36:35

So the UK Biobank is a very well-known biobank, or epidemiological cohort study, at least within the large-scale epidemiology community. It tracks half a million people in the UK who were recruited between 2006 and 2010, and they did a whole battery of initial testing on these people when they first came into the recruitment centers. They took blood samples and stored them so they could do future assessments later, as technologies developed to do different kinds of things, and now they have...

Nick Jikomes 37:11

And who were these people? Did they volunteer for this? Were they selected for some reason?

Austin Argentieri 37:17

They're volunteers. The UK Biobank tried its hardest to make the population within this cohort study as representative of the UK population as possible. They didn't succeed 100%, but nobody really does.

Nick Jikomes 37:30

Yeah, but it's probably quite representative, as far as these things go?

Austin Argentieri 37:35

It's reasonably representative. In general, participants are healthier than the UK population, and there are definite differences that have been observed and written about extensively. So it's always a caveat when you work with these biobanks that they're not perfectly representative of the UK population. But it's a decent approximation, and because of the scale at which it was gathered, it's far better than a much smaller data set, where you're much more likely to have some kind of bias.

Nick Jikomes 38:06

So you've got a data set where a large number of people had blood drawn and had various biomarkers measured, and you have that information across time for a very large number of people.

Austin Argentieri 38:17

Well, at this stage, a lot of the rich information comes from baseline. Between 2006 and 2010, when people were enrolled into the study, they came to an assessment center. They had blood drawn. They did a whole suite of anthropometric measurements: BMI, impedance measures for body fat, height, weight, all these kinds of things. They had a clinical interview with a trained clinical interviewer. They did questionnaires, blood biochemistry, spirometry for lung function, and things like this. So they did a lot of testing at the beginning. In a subset, they followed people up with specific measurements: at baseline and also at a follow-up visit, they did MRI and imaging studies on several tens of thousands of people, at least; I don't remember the n right now. Also in several tens of thousands of people, they collected accelerometer data. They gave people wrist-worn activity trackers, basically like a Fitbit, for about a week, and they pulled in all those data as well. But the real longitudinal data over time come from the fact that they continue to pull in information from the linked electronic health records and mortality registers in the UK.
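In a cohort like this, follow-up is anchored at the baseline visit, and diagnoses arrive later from linked health records. A common convention, sketched here under stated assumptions, is to treat diagnoses dated on or before baseline as prevalent, those after baseline as incident, and everyone else as censored at the end of record linkage. The function name and all dates below are hypothetical illustrations, not UK Biobank field definitions or its actual censoring dates.

```python
from datetime import date


def classify_follow_up(baseline, diagnosis, censor):
    """Classify one participant for one disease relative to the baseline visit.

    Returns (status, years): 'prevalent' if diagnosed on or before baseline
    (years is None), 'incident' if diagnosed after baseline but by the censoring
    date, otherwise 'censored' with follow-up time up to the censoring date.
    """
    if diagnosis is not None and diagnosis <= baseline:
        return "prevalent", None
    if diagnosis is not None and diagnosis <= censor:
        return "incident", (diagnosis - baseline).days / 365.25
    return "censored", (censor - baseline).days / 365.25


# Hypothetical dates for illustration only.
baseline = date(2008, 3, 1)
censor = date(2022, 10, 31)  # assumed end of record linkage
print(classify_follow_up(baseline, date(2005, 6, 1), censor))  # diagnosed before recruitment
print(classify_follow_up(baseline, date(2015, 3, 1), censor))  # diagnosed during follow-up
print(classify_follow_up(baseline, None, censor))              # never diagnosed
```

Excluding prevalent cases keeps a baseline measurement from being "explained" by a disease the participant already had.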

Nick Jikomes 39:24

I see. So we've got health records over time, plus a baseline set of measurements.

Austin Argentieri 39:29

Right, exactly. They pull these in at different times, because the registers are a little bit different in Scotland versus Wales versus England. But in general, they've pulled them in across time, and so now they have about 10 to 15 years of follow-up information from baseline: what are all of the diseases and diagnoses that you've accumulated? We have the exact date of each one. They also go back retrospectively, so we know what diagnoses people had in the time preceding recruitment as well. That's where the really rich longitudinal information is. It's thin on longitudinal biological information. For the whole cohort, there's no real repeat assessment of anything. For genetics, of course, that doesn't matter, because the DNA sequence is stable across life. But for the other biological layers of information, which are much more dynamic, there's hope to have repeat assessments at some point. At least right now, they have genomics, in terms of array genotyping and whole genome sequencing, in the full cohort. They have metabolomics in, I believe, the full cohort now. And now they have these proteomics data in about 50,000 participants, which is far and away the largest proteomics data set that exists today. UK Biobank is, for lack of a better term, a public resource. That doesn't mean that it's free, but it means that any researcher can apply for access. There are different fees for access based on the tier of data you want. The most basic access tier is relatively inexpensive, but if you want the most comprehensive data tier, which has the proteomics data, whole genome sequencing, all these things, it'll cost you close to $10,000, so there's a pretty broad range.
So I don't want to say it's public in the sense that anybody could just download it. You have to be a vetted researcher, and you have to sign a data use agreement. They need to make sure they're secure in terms of who is accessing the data and for what purpose. But as far as data sets go, the process to apply and get your hands on some of this for your own research is relatively open and easy.

Nick Jikomes 41:50

Tell us a little bit more about the proteomics data. What kinds of proteins are in this data set? Are there any significant patterns, like types of proteins that are present or absent?

Austin Argentieri 42:01

The proteomic assessment done so far in the UK Biobank used a platform called Olink Explore 3072. It covers about 3,000 proteins, and that larger panel is made up of four major subpanels that Olink had previously developed. These are proteins largely related to oncology, neurodegeneration, inflammation, or cardiometabolic phenotypes. So it's definitely a hedging-your-bets type of approach: they've developed these panels over time with proteins where there's either known evidence that they're biologically relevant for some of the major phenotypes we care about, or that are high-probability targets for future discovery that we haven't looked at yet. So 3,000 proteins is okay. It's certainly not, by any stretch of the imagination, the largest panel that exists now. Even Olink has since released a new flagship panel that covers 5,000 proteins instead of 3,000, so it's continued to develop even since these data came out about a year ago. And I should say the data came out to the public a year ago, but they came out a year before that, in a pre-competitive data release, to the pharma partners who financed this project. This was an interesting phenomenon in terms of global community science: a consortium of about 12 pharma companies, in what was called the UK Biobank Pharma Proteomics Project, came together to finance running this proteomic panel in 50,000 participants. Most of them were selected from the general cohort, but the companies also got to cherry-pick some participants to enrich for different diseases they cared about for their own drug development pipelines.
In exchange, they got early access to the proteomic data for a year before anybody else. So by the time the general research public was given access to these data, the big flagship papers describing these proteomics data sets were already under review, but there was still lots of opportunity to explore them. To give you the range of how many proteins we can assay right now in these population data sets: there's the flagship Olink platform, which is about 5,000 proteins; that's called Olink Explore HT. SomaLogic is another company, and its SomaScan platform doesn't use antibodies to detect proteins like Olink does; it uses aptamers instead. Their new flagship panel measures about 11,000 proteins. There's been lots of really interesting proteogenomics work in the last year or so comparing these platforms: given they use different technologies for detecting proteins, does that mean you're measuring different things? The other area is mass spectrometry. I don't see that as much in these large cohorts. I think there's a lot more difficulty in terms of what you find there. With mass spectrometry, you put your samples through and you get lots of peaks, and then you have to annotate those peaks to figure out which protein is which. Some of them you can annotate, and some of them you can't. So a lot of mass spectrometry research right now is trying to figure out what we're seeing, but you potentially have all the signals there.
So those are the pros and cons of that approach. With these antibody- and aptamer-based panels, the trade-off is that you're getting just these few thousand, or up to 11,000, proteins, but we know exactly what each of them is.
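To make the peak-annotation problem concrete, here is a deliberately simplified sketch: each observed mass-to-charge ratio (m/z) is matched to the closest entry in a reference list within a parts-per-million (ppm) tolerance, and any peak without a match stays unannotated. The reference values, peptide names, and the 10 ppm tolerance are hypothetical. Real proteomics pipelines identify peptides from fragmentation (MS/MS) spectra via database search rather than from a single mass, and then map peptides to proteins.

```python
def annotate_peaks(observed_mz, reference, tol_ppm=10.0):
    """Map each observed m/z to the closest reference entry within tol_ppm, else None."""
    annotations = {}
    for mz in observed_mz:
        best, best_err = None, None
        for name, ref_mz in reference.items():
            err_ppm = abs(mz - ref_mz) / ref_mz * 1e6  # relative mass error in ppm
            if err_ppm <= tol_ppm and (best_err is None or err_ppm < best_err):
                best, best_err = name, err_ppm
        annotations[mz] = best  # None means the peak could not be annotated
    return annotations


# Hypothetical reference masses and observed peaks.
reference = {"peptide_A": 500.2500, "peptide_B": 742.3810, "peptide_C": 1021.5020}
observed = [500.2520, 742.3900, 1021.5050, 880.0]
annotations = annotate_peaks(observed, reference)
```

Here the second peak sits about 12 ppm from `peptide_B`, outside the tolerance, so it stays unannotated alongside the peak with no nearby reference at all.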

Nick Jikomes 45:53

So in this proteomics data set, you've got several thousand proteins. What exactly is the measurement you have to work with here? Proteins vary in their expression levels between individuals, and across cell types and tissues in the body. Do you just have the average expression level of each of these proteins, or what are the actual measurements?

Austin Argentieri 46:14

Yeah, these are all from plasma. So we have the plasma expression of each protein. The way they're given in the UK Biobank is as a relative normalization: the proteomics data come already normalized in a particular way, because Olink, as a company, normalizes the data within the batches in which they generate them. There was also a very extensive QC process that the pharma consortium applied to these proteomics data at the outset, so the data now available to the public are these normalized, post-QC proteomics data.
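What "relative normalization within batches" can mean is easiest to see with a toy example. This minimal sketch assumes a simple hypothetical scheme: log2-transform each raw signal, then centre every protein on its own batch median. It is not Olink's actual normalization procedure or the UK Biobank QC pipeline, and the batch and protein names are made up.

```python
import math
from collections import defaultdict
from statistics import median


def normalize_within_batches(samples):
    """Log2-transform raw signals, then subtract each batch's per-protein median.

    samples: list of (batch_id, {protein_name: raw_signal}) pairs.
    Returns one {protein_name: relative_log2_value} dict per sample, in order.
    """
    logged = [(batch, {p: math.log2(v) for p, v in vals.items()}) for batch, vals in samples]
    by_batch = defaultdict(lambda: defaultdict(list))
    for batch, vals in logged:
        for p, v in vals.items():
            by_batch[batch][p].append(v)
    medians = {b: {p: median(vs) for p, vs in prots.items()} for b, prots in by_batch.items()}
    return [{p: v - medians[batch][p] for p, v in vals.items()} for batch, vals in logged]


# Hypothetical example: plate2 reads every signal twice as high (a batch effect).
samples = [
    ("plate1", {"protein_a": 100.0}), ("plate1", {"protein_a": 200.0}), ("plate1", {"protein_a": 400.0}),
    ("plate2", {"protein_a": 200.0}), ("plate2", {"protein_a": 400.0}), ("plate2", {"protein_a": 800.0}),
]
normalized = normalize_within_batches(samples)
```

After centring, both plates give the same relative values, so the doubling from plate2 is removed. Values like these are comparable across people for the same protein, not across different proteins, and if samples are not randomized across batches, centring can also remove real biological differences between batches.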

Nick Jikomes 46:57

Okay, so basically, blood samples are taken, and then you've got measurements of proteins in everyone's blood plasma. What are some of the basic headline results here, in terms of how this protein data set tracked with biological age?

Austin Argentieri 47:13

There are a few headlines. The first is that, using blood proteins, we can make a proteomic clock just as accurately as a methylation clock, but that