The dual revolution of AI and animal-free science - Thomas Hartung, Johns Hopkins University
In this episode I speak with Professor Thomas Hartung.
Prof Hartung is Doerenkamp-Zbinden Chair of Evidence-based Toxicology at Johns Hopkins Bloomberg School of Public Health, and Professor of Pharmacology and Toxicology at the University of Konstanz. He is also Director of the Center for Alternatives to Animal Testing (CAAT) and Field Chief Editor of the academic journal Frontiers in Artificial Intelligence, and was the 2025 recipient of the prestigious Peter Singer Prize.
We have a fascinating discussion about the breathtaking developments in AI and animal-free toxicology, and what this means for areas like drug development, chemical regulation, public health, animal testing, research ethics, the exposome, and the future of our society.
Our conversation covers:
New Approach Methodologies (NAMs): the new automobile?
Technological advances in microphysiological systems (MPS) and artificial intelligence (AI)
The problems with animal models
Challenges with validation and regulatory acceptance of NAMs
The importance of education for uptake of NAMs
Recent developments on US animal testing policy
What is happening in the EU? Differences between regions and regulatory systems
How AI is transforming research, and using it in day-to-day work
Implications of AI for quality of scientific publications and risk of bias
The Human Exposome Project: what is it, and how can AI help deliver it?
Environmental persistence, the exposome, and the public discourse on chemicals
NAMazing: Déjà Vu at the lab bench - Why animal-free science is the new automobile - ScienceDirect
NIH stops funding new projects which focus only on animal testing | Cruelty Free International
Top 10 Emerging Technologies of 2024 | World Economic Forum
Is regulatory science ready for artificial intelligence? | npj Digital Medicine
How AI can deliver the Human Exposome Project | Nature Medicine
Stockholm Declaration on Chemistry for the Future
Prefer to read? Here is a transcript:
Chris: Hello, everyone, and welcome to the Chemical Journeys Podcast. Today I'm speaking with Professor Thomas Hartung. Thomas is a professor of evidence-based toxicology at Johns Hopkins Bloomberg School of Public Health and of Pharmacology and Toxicology at the University of Konstanz. He's also the Director of the Center for Alternatives to Animal Testing and the Field Chief Editor of the journal Frontiers in Artificial Intelligence. Thomas, thanks for joining me.
Thomas Hartung: Thanks for having me.
Chris: I'm really excited to speak with you today, Thomas. It's a little bit coincidental that I came across some of your work — I saw one of your papers on how AI can deliver the Human Exposome Project, published in Nature Medicine, and I have to say, it was a fascinating read, really visionary. I also get a strong sense of optimism from the work that you're doing, and excitement about everything that's happening. I myself am not so involved in the NAMs space — new approach methodologies — but given the work that you're doing and the perspectives you bring, I'm very happy to talk to you today. So perhaps we could start by having you provide us a bit of information about your background and your research interests in your own words.
Thomas Hartung: OK. You know, there are two types of researcher. Some choose a topic and dive deeper and deeper and become popes in their field. I'm the opposite — I'm broad. I'm doing lots of different things in many areas. There are many affiliations, because I like to connect people and bring different areas together, and my work spans mechanisms of medicine, drug development, and toxicity. My topic found me. I developed some alternative methods, including the first alternative to the rabbit pyrogen test, which, 30 years later, are now saving 150,000 rabbits per year in Europe alone. This made me take a job with the European Commission to head their validation body, the European Centre for the Validation of Alternative Methods. And this is where I had to broaden my spectrum, going into biologicals and vaccines, ecotoxicology, and computational methodologies, because I was suddenly responsible for all of these. But ultimately I started to understand that the animal tests we are using are terribly outdated, and that we have, at the same time, more and more new methodologies — transformative technologies — which promise to do better.
Chris: Thank you for that — it's a great introduction. You've recently described animal-free science as going to become what the automobile was to the horse in the early 20th century. Could you tell us what you mean by that?
Thomas Hartung: I mean, we see that there is a transition between technologies, an overdue transition. The animal models in toxicology, where I do most of my work, were introduced before I was born or when I was in kindergarten, and I'm turning 62 now. So it's really important to understand that it is also a normal thing to get something new. Our knowledge in the sciences is doubling every three years, and this alone means there's a lot more in the toolbox now than when we developed these animal tests. Very similarly, we had a technology shift away from horses roughly 110 years ago, when the car came. And what I really found interesting, digging a bit deeper, was how much of the argument was just the same: how reliable these horses are, how much experience we have with them, and how dangerous these new technologies are. It sounds laughable now when you hear how people were arguing that we would never replace the horse, but I see the same thing playing out in what we are doing, and people will no doubt laugh about us and about our hesitance in really doing something modern and timely.
Chris: I think, as humans, we are generally quite resistant to change in many aspects, right?
Thomas Hartung: Yeah, especially in the safety sciences, and for good reason. I don't argue for going with the latest fashionable thing. We need to establish safety, and we have a big responsibility in doing so. But at the same time, we need to adapt to scientific progress. And what we have been doing in toxicology in particular is create a quilt: every scandal gave us a safety patch, one patch on top of another, all of different making, sizes, and material. And altogether this gives a very rigid system where you occasionally replace one patch with an alternative method, but you don't really change the overall construct. This is the type of discussion I'm trying to have: how much can new, transformative technologies allow us to make an actual change?
Chris: Yes. And maybe you could just explain a bit about what these technologies are that we're talking about.
Thomas Hartung: In essence, there are quite a few. You could take technologies like omics and sensors as things which generate much more data. I mean, toxicology has been a data-poor field — most of the things you could do with a handheld calculator, counting animals or the number of tumours in an organ. But it is becoming a big data science now. This leads me to the first of the really transformative technologies, which is the artificial intelligence movement. Toxicology is no different to all of the other areas of our life. I believe that soon we will have to write “science” with “AI” because science is changing because of it. And the other side is that bioengineering has allowed us to move cell culture to something much more meaningful. So instead of using some tumour cell lines and keeping them in Petri dishes and measuring whether we can kill them, we are coming more and more to systems which recreate the architecture and functionality of an organ. These are necessarily co-cultures. Take a human lung — there are more than 42 cell types, and its function is not for the cells to survive, but to breathe. And you can make similar arguments for each and every organ. And it is mainly the advent of stem cell technologies which gives us access to high-quality human cells.
Thomas Hartung: You can get skin, perhaps, and blood quite well, but otherwise it is very difficult to get any human tissues in higher quantities and reproducibly. I'm working on the human brain: if I want to pick your brain and mean it, you will not be very forthcoming, I assume. And this has changed really dramatically. First with human embryonic stem cells, and then, ethically much cleaner, seven years later, with induced pluripotent stem cells (iPSCs), which also have the dramatic advantage that you know who the donor is. With in vitro fertilised embryos, you don't know what they would have become or what disease history they would have developed. The organoids and similar systems we can generate now from a patient donor reflect a disease history: whether this is a person prone to allergies, whether they developed Alzheimer's or whatever type of disease. And very often this is reflected in the cell models we can generate from their cells.
Chris: It does sound absolutely amazing, what we're doing now with this technology — it's really getting into the realms of science fiction.
Thomas Hartung: Oh, absolutely.
Chris: Maybe you could compare that to the animal models. What are the problems with animal models?
Thomas Hartung: You know, when I took the job to promote and validate alternatives to animal testing, my understanding was that this was motivated by ethics and by the fact that a broad part of society wants us to move out of animal testing. About 60% of Europeans and about 50% of Americans object to medical testing on animals. That's quite a number, and policy makers listen to it. But the longer I'm in the field, the more I understand that there are also economic and scientific reasons to change. I once said very simply: “We are not 70-kilogram rats,” and there's a lot of truth in this. Modern drug development is now largely developing biologicals, no longer small chemicals, for which you could still argue whether animals are representative. Animals are certainly not representative for human proteins or antibodies against human proteins, so for these products most animal testing is useless. 57% of new drugs are such biologicals, and so pharma needs alternatives. And the regulators at the FDA and EMA need them too, to really judge these systems, because even non-human primates don't do a proper job in most cases. The other side is that animal tests are costly and slow. This does not fit a modern development cycle for products.
Thomas Hartung: If you take an example: cancer is certainly the thing most people would like to know about when it comes to the safety of chemicals. We hardly test any substance for cancer, because the only test we have accepted is an animal test — the cancer bioassay. And the cancer bioassay takes four to five years to get a result, because you have to treat 400 rats for two years of their life, every day. Then it takes about two years to analyse all of the organs and samples, cut and slice everything, prepare it, and write the report. Four to five years, easily. And what do you get in return for an exercise that costs you almost a million? You get more than 50% positives, because these animals, at the end of their two-year lifetime, are full of cancer anyway. So there are a lot of false positives — many more than actual expert estimates would suggest. And it is very difficult for companies to accept this. But even regulators don't accept the results — analysis has shown that 58% of positive cancer bioassays did not lead to a classification by the US EPA, because they said it wasn't relevant. And we really do sometimes have absurd situations where we have to handle false positive results from these cancer bioassays. Another thing people often overlook: in order to treat 400 rats for two years of their lifetime, you need about 10 kilograms of the substance. Tell a chemist to produce 10 kilograms of a chemical that is not yet in production — it is essentially impossible. So there is a lot of value in getting some data that doesn't even need to outperform the cancer bioassay, but just to flag an alert: we should pause and look into this. You need to get it fast, on a small quantity. That's exactly the value proposition of these in vitro systems.
Chris: Yes. It sounds like there's huge scope for refining the way we've been doing this for many decades. Another question I have about these new methods: you alluded to some of the challenges to their adoption. Maybe we can talk about some of those in more detail. I think you mentioned validation as one aspect — presumably this is about demonstrating that the tests are robust and relevant.
Thomas Hartung: Exactly. I had this job for seven years in Europe to validate these methods, because you cannot have every company coming with a new test and have the regulators needing to argue for or against each one, and both sides need to find out: is it trustworthy, or not? The idea is simply that there should be an institution which follows certain guidance in order to assess and then come to a recommendation — endorse this assay or not. Then both sides can talk much more easily to each other, and can expect that the next regulator in the next country will have the same information basis, because that's now a very international exercise, standardised to the OECD. Some 50 to 60 tests have been internationally validated and made their way into the OECD test guideline programme or the pharmacopoeias of the world. So there's a substantial number. But we also have to say there are still a lot of animal tests for which we don't have an officially recognised alternative.
Thomas Hartung: The two — or actually three — things you really want, according to the OECD guidance: the first is to show it's reproducible, that you get the same result in whatever lab is doing it. The second is to show that it gives relevant results. And this has, for many years, been interpreted as: it gives the same results as the animal test did. There's a big inherent problem with this — if the animal test is not perfect, how do you reproduce something imperfect? How do you get out of the shortcomings of something which is not reflecting humans, for example? The third element — and this is something which is a principle but is not really executed very much — is that it should be scientifically sound. It should reflect our scientific understanding of a certain health effect or disease.
Thomas Hartung: And this is where I think the big potential actually lies. I call this mechanistic validation. I don't care whether we can reproduce a finding in an animal. If you take thalidomide: thalidomide doesn't show teratogenic effects in rats and mice. For this reason, I don't want to reproduce that thalidomide is negative. I want to show it positive, and I want it to be positive for the right reason: because it triggers a relevant mechanism for humans, not a completely different mechanism leading to completely different toxicity for a given chemical.
Chris: Maybe on that point — because I was thinking about this — you have all these different assays, in vitro assays and micro-physiological systems that are, in a sense, a constellation of tools designed to inform a particular question about an adverse effect of a chemical. Whereas before you may have had a single animal test. A big question that also comes up in the area I work in is: how do you know that what you're seeing in the test is going to be relevant either to an individual or at a population level? And with these assays — I think you've alluded to this — we have some mechanistic understanding that we use to develop a hypothesis that we then seek to observe from the tool. But how much of the validation is coming from the bottom up? In that we need to generate data with these tools in order to validate our mechanisms, our understandings.
Thomas Hartung: If you give it a closer look, we make very poor use of mechanism in our validation studies. I have often called for mechanistic validation. And there's a very interesting movement with adverse outcome pathways to increasingly agree on mechanisms. We are only starting to translate this into our validation procedures, but I think that's where a lot of progress will lie. The other thing is, as toxicologists, we like things in black and white: we want one test, and this one test classifies the chemical. And there is a tremendous flaw in this. First of all, no single test can deliver all the different answers. A test can only be either sensitive or specific. If I make it very sensitive, I pick up all of the toxicants, but then I also have quite a few false positives, like in the cancer bioassay. Or I make it highly specific and can trust the result, but then I miss a lot. So it is often much better to have a screening test and then a confirmatory test to follow, and this already requires a combination of assays, at which point black and white becomes a lot of grey.
Thomas Hartung: But our classification needs have always meant we need a definitive test — the gold standard. So I really believe that by moving away from that, and also accepting that we are dealing with probabilities — I like to call it probabilistic risk assessment — we do a favour to science and to what we can really deliver. Any test contributes to identifying the degree of grey, and helps us to reduce the uncertainty around it. But none is a game changer. And we can have much less fear about new methods when they're only modifying an assessment, not making something which was black suddenly white, or the other way around — which is always the big concern.
Thomas Hartung: And interestingly, current technologies, especially artificial intelligence, are doing exactly this, because it is a highly probabilistic approach.
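To make the point about screening and confirmatory tests concrete, here is a minimal sketch of how each result shifts a probability rather than flipping a black-and-white label. It applies Bayes' rule and assumes the two tests err independently of each other. The function name `posterior` and all numbers (a 10% prior, the sensitivities and specificities) are hypothetical values chosen for illustration, not figures from the conversation.

```python
def posterior(prior: float, sensitivity: float, specificity: float,
              positive: bool) -> float:
    """Probability that a chemical is toxic after one test result (Bayes' rule)."""
    if positive:
        p_result_if_toxic = sensitivity        # true positive rate
        p_result_if_safe = 1.0 - specificity   # false positive rate
    else:
        p_result_if_toxic = 1.0 - sensitivity  # false negative rate
        p_result_if_safe = specificity         # true negative rate
    weight_toxic = p_result_if_toxic * prior
    weight_safe = p_result_if_safe * (1.0 - prior)
    return weight_toxic / (weight_toxic + weight_safe)


# Hypothetical values: 10% of candidate chemicals assumed toxic before testing.
prior = 0.10
# A sensitive screening test: misses little, but raises many false alarms.
screen = {"sensitivity": 0.95, "specificity": 0.60}
# A specific confirmatory test: misses more, but a positive can be trusted.
confirm = {"sensitivity": 0.70, "specificity": 0.95}

after_screen_pos = posterior(prior, positive=True, **screen)
after_both_pos = posterior(after_screen_pos, positive=True, **confirm)
after_screen_neg = posterior(prior, positive=False, **screen)

print(f"prior:                       {prior:.3f}")
print(f"after positive screen:       {after_screen_pos:.3f}")
print(f"after positive confirmation: {after_both_pos:.3f}")
print(f"after negative screen:       {after_screen_neg:.3f}")
```

With these assumed numbers, a positive screening result only moves the estimate from grey to a slightly darker grey, while a negative one makes toxicity quite unlikely. The positive confirmatory result is what shifts the estimate substantially, and even then it yields a probability with remaining uncertainty rather than a definitive classification.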
Chris: Yes. I see a lot of similarities here, actually, in the area I'm working in, which is mainly focused on environmental fate and biodegradation assessment. It's the same problem, right? The tests are very expensive. You take one shot and then have to make regulatory decisions from that. And it's written into the regulation that you should employ a weight of evidence — you should try to work with all the evidence available to come to a single conclusion. But the reality in terms of how things play out often falls into the black and white: looking towards a worst-case result. On that point as well, you mentioned reproducibility — that's another big challenge with biodegradation testing. And I wonder, because you mentioned donor cells for a lot of these assays, how important is the genetic component for the reproducibility of results?
Thomas Hartung: Incredibly important, because we are just not the same. And by running our experiments on strains of mice or rats which are genetically very limited — whether inbred or outbred — we reduce what normally makes us different. And we see this is why we can get results with groups of 10, 20, 40 animals, while for any clinical trial for the same substances, you need thousands of people, because we are so different. All these attempts towards personalised medicine — and some people are also talking about personalised toxicology — require us to recognise what makes us different. And this is not only genetics: it's also disease history, age, gender, obesity, and other factors which we have to recognise. Highly standardised lab animals, three months old, reflect none of this.
Thomas Hartung: Also, you can note that in 40 years of the human genome project, we have learned that only 5% of diseases are purely genetic, and about 40% have a genetic component we could identify. But there is a lot that comes really from exposure, and exposure is even more varied and more dynamic than genetics. So I think there's a lot to be understood here. And this is why we are arguing, for example, for the Human Exposome Project.
Chris: Yes. So is the approach to use fresh donor stem cells each time you do these assays, or is there a cloning process going on?
Thomas Hartung: At the moment, it is already becoming something of a standard to ask for several donors in most experiments, and that's becoming more and more the norm for stem cell-derived tests. There's no guidance yet on how many you actually need in order to be comprehensive. But the National Center for Advancing Translational Sciences has already started to do clinical trials on a chip — a programme where they really want different patients reflected in stem cell models, to see whether their stem cells would have predicted the individual's outcome in the clinical study. So we are getting to something which is more personalised, and is helping us to see distribution over patient populations or healthy populations.
Chris: Yes. It's really fascinating, and it highlights how we need to rethink our methodologies and our processes now that we have to generate larger volumes of data and can approach things in a different way. I think you mentioned education as another important component. Do you want to say any words on that?
Thomas Hartung: Education is critical — of the general population. It is quite amazing that, 500 years after Paracelsus, people have still not understood that the dose makes the poison, and are permanently arguing about the mere presence of substances. But also the education of the next generation, and of today's practitioners. We see that a lot of the resistance to accepting new methodologies comes from people who have never been trained on them, especially in something which is moving as fast as artificial intelligence is now. This creates an enormous hurdle. I think everybody understands there's something significant here and that we will be left behind if we don't embrace it, but knowing how to learn it, how to decide what is hype and snake oil versus what is genuinely delivering value — that makes it very difficult. Some people are validating and assessing these things and building the trust for others who don't have the means and the time to evaluate all of these tools.
Chris: Yes. And trust is a big component there, isn't it? And I suppose that's a general problem in society — trust in institutions and science in general. Are you coming up against those kinds of challenges?
Thomas Hartung: I mean, we are first of all dealing with the trust of practitioners in our field, and these are scientists. Scientists in general trust in science, so we don't have the same problem as with the general public, where we see that social media are eroding what we can get across. But I have also often seen that if you approach people with risk communication tailored to their needs, you can achieve quite a bit of understanding and interest, especially if you start engaging them in some type of dialogue. All of these things are now augmented with AI tools. These chatbots essentially feel like talking to a human expert, but they can scale the interactions of experts with the general public and make them much more intense and tailored to specific needs. There are truly revolutionary changes in how we can produce educational materials and provide information sourced broadly — to get you much better information than you could possibly get in the past, packaged to your needs, in a way that you can ultimately grasp it.
Chris: Yes. I did want to just touch on a pretty big announcement recently from the National Institutes of Health in relation to these methods.
Thomas Hartung: Yeah, we see at the moment in the US a dramatic change in the landscape, which came first from two regulatory agencies. The EPA already in 2019 announced it wanted to move out of animal testing as a standard requirement by 2035 — a very surprising and strong statement. And this pledge has been renewed in April this year by the new head of the EPA, Lee Zeldin. So the EPA is still on track. And the FDA — arguably the most important regulatory agency in the world — announced a roadmap saying they want to reduce animal testing and will start with monoclonal antibodies, with everything else to follow, with strong timelines. I say they're the most important regulatory agency for a very simple reason: most of the changes and developments take place in pharma, and the pharma market in the US is second to none. We are only 4% of the world population, but we are 53% of the drug market, and 67% of the market for new drugs under patent. So every pharma company in the world is just watching what the FDA wants. Many actually don't even register their drugs in Europe and other parts of the world, because they simply fear the question: why can you sell it so much cheaper there than in the US? So the FDA's announcement was spectacularly important because it's a signal to their own people. This is a 19,000-person agency — they all need to understand what the new direction is.
Thomas Hartung: But it is also a signal to the regulated industry, as well as to biotech companies, that there is a market — something to gain — which will accelerate progress. Only some 14 days later, the NIH announced they want to refocus their funding on human-relevant systems, stressing these complex in vitro systems and micro-physiological systems. They are naming these specifically, and they even came out with a statement saying they will no longer publish any calls that only request animal studies. They want, as a regular thing, to also call for alternative new approach methods. And I think this is really an important signal for the scientific community.
Thomas Hartung: We have to work together. What I sometimes call “in vitrio” is in vivo, in vitro, and in silico combined. And these old-fashioned approaches of “I do my animal experiment and I don't care about human relevance” will no longer fly with the NIH. Of course they leave it, as usual, to the reviewers to implement this, and if the reviewers select pure animal studies, I don't think they will stop funding them. But the request, first of all in the context of what we are publishing, will be stressing very clearly: we would like to see human relevance embedded into any type of study.
Chris: Do you see this as a major lever that will bring to light more of the disparity between the animal models and these technologies? Or do you think that is already clear?
Thomas Hartung: It is absolutely obvious to me that this is going to happen. We see that in toxicology, the different animal species that we sometimes use (reproductive toxicity on rabbits and rats, or cancer bioassays on mice and rats) predict each other by about 60%. And there's no reason to assume they predict humans any better. On the contrary, I would expect this is going to fuel the discussion of transitioning to human-relevant systems for human research. I think this is helping us to finally have the discussion which was very much avoided by the animal experiment community. They have resisted the formal validation of assays: we don't have a lot of assays that have really undergone ring trials to show even their reproducibility, let alone their relevance. We have really learned the hard way that very big drug programmes have likely failed because of misleading animal results. The lack of relevance of Alzheimer's animal models, for example, means we have spent more than $150 billion in drug development for Alzheimer's without really remarkable progress.
Chris: Wow. That really brings it to light when you think about what we've lost as a society by not taking action on some of these things.
Thomas Hartung: Yeah. But we should also be fair. Cell culture has been around in a reasonably reliable and infrastructurally supported quality since the eighties and nineties, not much earlier. When I started cell culture in a lab in the eighties, a lot of reagents were still made at home. Computers have only very recently become powerful enough to really make large predictions, and that is really the advent of AI. To design a drug, for example, AI has been powerful enough since about 2019; that came with the power of computers, not more ingenious programmes. 2019 is the starting point for most of the projects which designed drugs by AI. We call them AI-first drugs. By 2023, already 18 AI-first drugs had entered human trials, a remarkably fast preclinical development which also shows the value proposition of these tools. In 2024, the first of these 18, the one by Insilico Medicine for lung fibrosis, already completed a successful Phase 2a in patients and went into a Phase 2b study. This is unheard of. They started in 2019. In just five years, into clinical Phase 2. Reaching the market one day earlier is worth about $1 million to a pharmaceutical company.
Chris: Wow. So you can really see the potential upside of embracing these technologies. You've already mentioned how AI accelerates the flow of communication, so we have the communication flows that should enable these things to be adopted rapidly. We don't want human biases, cognitive fallacies, or presuppositions to be another bottleneck. And I guess the other one is on the regulatory side — you've already mentioned the big movements in the US. What's happening on the EU side?
Thomas Hartung: The EU has been committed to change for much longer. It's quite remarkable that already in 1986, European laboratory animal welfare legislation said: whenever there is an alternative reasonably available, you must use it. This is very strong. And they also committed the Commission and member states to further these developments. This became even stronger with the 2010 revision of the legislation. So traditionally Europe has been at the forefront, but it has been very much driven by animal welfare considerations and public opinion, and progress has been slow. The discussion about a roadmap to phase out animal tests, something similar to what the US just announced, has been ongoing for six years now. And there's a plan to announce a draft of this roadmap at the beginning of 2026.
Thomas Hartung: And you can see it is moving, and it has a broad basis, but sometimes it is also ridiculously slow. And I really commend the US here for just saying: let's do it. That's an engineering exercise. We want technologies that do the trick, and we have been convinced that the new ones can do it, and we see the shortcomings of what we have been doing in the past, so let's give it a shot. And they're not saying they're blindly buying it — they're saying: we want to enable such a transition.
Chris: Yeah. And they can see there are advantages in many aspects — health improvements, market advantages as well.
Thomas Hartung: Absolutely. This is an industry. I was very surprised to see data from MarketsandMarkets, which is a market research company. You can argue about their numbers and approach, but they estimated the in vivo toxicology market at about $4 billion, while they estimated the in vitro market right now at $14 billion. So companies are already doing a lot of this. It's just not as prominent, because it's used for internal decision-making. They don't use it regularly for registrations because they just follow a tick-box approach, and they would be foolish to offer more data than they're asked for. But a lot of the decisions internally are based on other types of systems. And this is why animal use numbers in Europe and the US have gone down by two thirds compared to the seventies. Pharma was the dominant user then, and it has dramatically reduced animal testing through other search strategies.
Chris: Yes. When I've heard discussions about this on the regulatory side in the EU, it's always couched in those animal methods. In the REACH regulation, for example, we have a standard set of information requirements and you need to tick all the boxes. That has framed everything. And what I hear is that it's very difficult to change that aspect of the regulation. But can you see that moving in the near future?
Thomas Hartung: Yeah. And my personal career has been very closely linked to this. I joined the European Commission in 2002, and at the time they were handling the first draft of the REACH legislation. At the time it didn't even mention alternative methods. And I pride myself that, when it was finally decided in 2006, the promotion of alternative methods was one of the three goals of REACH. Most people in the regulatory area seem not to read the full Article One — they stop after the first two goals. But I was also responsible for leading 200 experts to write the guidance for industry on how to implement REACH. And we took a lot of effort to create some openings for alternative methods which were not yet available, but which at some point could make their way in.
Thomas Hartung: But I was also a big critic of this enormous use of animals. And just two years ago we did an actual count of how many animals have already been used for REACH. You have to see that the Commissioner responsible at the time, Günter Verheugen, at some point stated that REACH would not be acceptable if it used more than a million animals. We have now counted 4.2 million already used, with a lot of question marks still remaining that could add several millions more. So REACH is an enormous consumer of animals.
Thomas Hartung: And again, I would say: if we had mechanisms to more quickly learn from the animal experiments we are doing, we could probably have prioritised better, come up with methodologies and guidance that would have dramatically reduced these numbers. REACH is very much a tick-box approach: at this tonnage level, you do this and this and this — that's it.
Chris: Yes. And I see that also as the regulation is implemented and you see new guidance evolving. Recently Annex XI of REACH was updated around the rules for adaptations, and you see written into it a highly precautionary approach such that you need to meet a very high bar to do anything other than an animal test. Or another example is read-across — using data from one substance to support another substance — there's a very high bar for justifying that, and there have been some publications recently suggesting that a lot of read-across arguments are being rejected.
Thomas Hartung: Yeah. There are a couple of problems accumulating here. The first is the staffing and the amount of expertise within the agencies. If you compare: the US EPA, at least before the current round of post cuts, had 14,000 people. Compare that with perhaps 700 or 800 now at ECHA. That's a tremendous difference. And there's no intramural research and not really the possibility to do much more than just follow the dossiers and take a formulaic approach. The second thing: policy and legislation in the US and Europe are very different. In Europe, we have legislation which is extremely prescriptive. It says at a granular level: do this animal experiment. And, at least as ECHA interpreted it in the past, it was very difficult for them to change these requirements when there was any type of scientific progress. I read the legislation as something which invites exactly that, because that's what policy makers wanted: they put in a lot of requests for animal testing to be the last resort.
Thomas Hartung: But for example, when the extended one-generation study for reproductive toxicity testing was validated, it was extremely difficult for ECHA to accept it as a replacement for the two-generation study, even though it brings down the number of animals from about 3,000 to around 1,400 per chemical. This is a very, very big thing. But it took years of argument to broadly accept this. And one of the main reasons was: the extended one-generation study is not mentioned in the legislative text. “How can we then accept it?” And I think here the comfort zone of the regulators — who are doing this as the first in the world, because nobody has such legislation — is understandable. And, for public health, I think this is very important legislation. But we also have to learn on the road and say: it's not perfect, it can't be perfect.
Thomas Hartung: It was done by policy makers, on the basis of the science of 20 years ago. So let's occasionally revise it and simply give the agency — not the policy makers — the opportunity to adapt to scientific progress.
Chris: And does this work better in the US?
Thomas Hartung: Yes. The policy makers say: bring information on reproductive toxicity — but they don't specify which test. They leave it every time to the agency to advise on what is the current state of the art.
Chris: Yeah. And it's really striking, that 20-fold difference in staff between the EPA and ECHA. It really jumps out at me. Other guests I've had on the podcast have talked about how REACH is really a very big machine, designed to run almost autonomously — designed to be maintained and managed as it processes chemicals, with perhaps not that much reactive intervention.
Thomas Hartung: I can tell you that the main author of the REACH legislation was a mathematician by training. And in some aspects you smell it. I did not know of any trained toxicologist in the European Commission when I was working there. There had been some occasionally at the European Chemicals Bureau, but most of the people who came in were not toxicologists; they were biologists and others. And it also took a while to attract more seasoned toxicologists and practitioners to ECHA in Helsinki. I think it takes a long time until an agency matures and has the self-confidence to move forward. But I see very positive developments recently, both in ECHA and, even much more, in EFSA. I was just one of the keynote speakers at EMA’s 30th anniversary, and I have to say, I'm really impressed by how all of these agencies are now handling new approach methodologies more favourably and progressively, and have set up mechanisms which, while not reflecting the dramatically fast pace of what is taking place in the US right now, are going in the right direction.
Chris: Yes. That's really positive. It's great to hear your perspective as somebody who's very close to all these developments happening in various regions around the world. We've covered many different areas where AI is really revolutionising things. I think something that would be really interesting for listeners — scientists who are hearing lots about AI but perhaps haven't fully embraced it themselves — would be to hear how you are utilising AI in your day-to-day work, and what you recommend as ways for people to get over that initial barrier.
Thomas Hartung: I strongly believe that AI is changing all of science at the moment. I do a lot of talks entitled “AI will only replace toxicologists who don't use it,” and similar things. Because of my involvement with Frontiers in Artificial Intelligence as Field Chief Editor, where we have published 1,400 articles in the last seven years, I see a lot of things happening. I became part of the World Economic Forum's committee for the Top 10 Technologies, and for them I wrote, with some co-authors last year, the white paper on how large language models are changing science. It's really interesting to change your perspective on what this means. And I think we are at the moment in a process where the majority of colleagues are just understanding how transformative this is and that they have to use these tools. So it's a phase of integrating AI into science. But I actually believe that we will soon enter a phase where we do science for AI — not AI for science. That sounds crazy, but the integration and availability of knowledge that AI is enabling is such a transformative step.
Thomas Hartung: If you see that in the life sciences, 3.4 million papers are published every year, nobody can read this. And a large part is still behind paywalls: it is estimated that about 50% of scientific publications are now open access, but only 20% are true gold open access, in journals that are fully open access, while the others are in repositories that are difficult for AI to access and find. So essentially we are working with 20% of scientific knowledge when training large language models. But in the future, where we do science for AI, we need to understand that everything we publish that is not open access and not machine-readable will not be part of the common knowledge. AI has already, for two years now, been outperforming humans in annotating scientific papers, and since then has also solved the problem of being able to read figures, diagrams, and tables, which was a difficulty two years ago. So while my PhD students have read a paper a week, or hopefully two or three, AI can read millions of papers in a day. And it will never forget.
Thomas Hartung: And it can also see patterns, or just retrieve information. That's such a value proposition. Almost two years ago I was doing a peer review for the EPA, where they had a chemical for which two people had worked one and a half years to find all of the information and do a bit of read-across to similar chemicals. Two people, one and a half years — that's three to four hundred thousand dollars in investment for the EPA, I assume. I found all of the information in one hour using AI, and even found an entire risk assessment on this substance that had been missed by my colleagues. Imagine what this means as a value proposition for any type of risk assessment. And AI can find patterns, it can integrate this information. Already in 2018, we were able to outperform the reproducibility of the nine most used toxicological tests — which is on average 81% — predicting, for 190,000 chemicals, the classifications they had received, at 87% accuracy. And that was 2018. You have to imagine that AI, since deep learning was introduced in 2010, has been doubling its capacity every three months. So this year's AI is eight times more powerful than last year's. We are tens of thousands of times more powerful than in 2018 when we published that work.
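Editor's note: the compounding behind growth figures like these is easy to check. The following is a minimal sketch, assuming clean exponential doubling; the function names are illustrative and not from the conversation. Under that assumption, doubling every three months means four doublings a year, a factor of 16, while a factor of eight per year corresponds to doubling every four months.

```python
def yearly_growth_factor(doubling_period_months: float) -> float:
    """Growth over 12 months for a quantity that doubles every
    `doubling_period_months` months (idealised exponential growth)."""
    return 2 ** (12 / doubling_period_months)


def growth_over_years(doubling_period_months: float, years: float) -> float:
    """Cumulative growth factor after `years` years of the same doubling rate."""
    return yearly_growth_factor(doubling_period_months) ** years


if __name__ == "__main__":
    # Compare two doubling periods (in months) side by side.
    for period in (3, 4):
        print(f"doubling every {period} months -> "
              f"x{yearly_growth_factor(period):.0f} per year")
```

Because the growth compounds, small differences in the assumed doubling period lead to very large differences over several years, so multi-year figures of this kind are best read as orders of magnitude.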
Thomas Hartung: And this is what we are observing at the moment. I can tell you, we are working on a model at the moment — it's not peer-reviewed yet, but it works. We have created a database of 260 million data points where we have a chemical property and a result. For 4,000 of these properties, we have more than 1,000 entries, so we consider these “big properties” which we can also predict. We predict these 4,000 properties of any given chemical with about 90% accuracy. So imagine: even for a substance which has never seen the light of day — a chemical the chemist is only considering synthesising — you press a button and get 4,000 properties. Whatever you could possibly be interested in. And then we are at the moment working on integrating them, mapping them onto adverse outcome pathways and others, so that we make sense of it all. Because nobody can go through 4,000 properties. But we can. And this is getting us into extremely good prediction values. We are evaluating this in the ONTOX project of the EU at the moment, for liver, kidney, and the developing brain. But in principle it can tell you whatever you are interested in. I did a test run with mycotoxins recently, and carcinogenicity data all in the nineties in terms of predicting the test validation sets. We do safer plastics evaluation and endocrine disruptor assessment — it is really remarkable what you can do. And we hope to make our database available probably in the next two weeks. The paper is accepted but not yet out. We are sharing through GitHub the different import functions from public databases to create such a database. We are expecting to publish our transformer model, which predicts these 4,000 properties, at the beginning of next year, likely, when we have done enough evaluation of the quality of the predictions.
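Editor's note: the selection of well-populated properties described above can be pictured as a simple counting step. This is a hedged sketch only: the record layout `(chemical, property, value)`, the function name `big_properties`, and the toy data are assumptions for illustration, not the team's actual database schema or code.

```python
from collections import Counter


def big_properties(records, min_entries=1000):
    """Return the names of properties with more than `min_entries` results.

    `records` is an iterable of (chemical, property, value) tuples,
    a hypothetical layout chosen for this example.
    """
    counts = Counter(prop for _chemical, prop, _value in records)
    return {prop for prop, n in counts.items() if n > min_entries}


# Toy data: one property measured for five chemicals, one measured once.
records = [(f"chem{i}", "logP", 1.0) for i in range(5)]
records.append(("chemA", "rare_endpoint", 0.0))

# With a small threshold, only the well-populated property is kept.
selected = big_properties(records, min_entries=3)
```

Only properties that pass such a threshold would then be candidates for a predictive model, since sparsely populated properties offer too little data to learn from.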
Chris: Wow. That is amazing. And when that paper comes out, I'll be sure to put it in the show notes if that coincides with the podcast. And I have also seen this for myself, because recently I've been looking at the mobility of chemicals in the environment. The persistence and mobility topic is a really hot one, and for chemicals that are charged or ionisable, their adsorption in soil is extremely complex — there are so many different potential interactions that can determine whether a chemical will be mobile in soil or not. There's some really interesting research coming out from the University of York, where they've used machine learning methods to get greater predictive accuracy than anyone has managed with traditional linear regression models. So yes, it really is astounding. On the use of AI in your day-to-day work-
Thomas Hartung: I didn't really answer your question. It's true.
Chris: No, no, it's fine — it is such a big topic. But when ChatGPT came out, there was all this discussion about how it was unethical: people shouldn't use AI to help them be more productive and things like that. That was an interesting take, but I get the sense that those calls have sort of died down a little bit now. But using these tools, as you say, is a muscle that you have to cultivate — like a public speaking muscle, like a paper-writing muscle.
Thomas Hartung: Yeah, absolutely. I believe if we don't ride the wave, it'll wash us away. That is why I'm giving lectures at Hopkins to students about how to use these AIs, but also how to use them responsibly. Many people looked at ChatGPT when OpenAI first announced it, Model 3.5, and it was only, I think, 10 days until 500 million users had been registered: the fastest uptake of a technology ever. And I think many people will have noticed that it was something promising but also gimmicky. But with these enormous developments, the increase in capacity and capabilities per year, which is more or less the difference between ChatGPT 4.5 and 3.5, they have really become very strong value propositions. The most recent models, the reasoning models, are now outperforming an academic with a PhD in their field of expertise who has access to Google to answer questions. And they're so much faster. It's amazing what they can do, and also frightening, but I believe we simply have to use them.
Thomas Hartung: And there should be no curriculum for a PhD where you are not leaving with basic knowledge of these things. But you also have to know how to handle hallucinations, how to check plausibility, how to use AI to solve some of its own problems. So for example, I very often do the very same search twice, because hallucinations will never be the same, and I only use what shows up in both searches. This is a very simple way of dramatically reducing them. But I would also never cite anything I have not verified, and usually I download the papers. But we should also be clear: AI is not just large language models. Large language models have the most beautiful user interface and we can do a lot of things with them. But our predictive AI, for example, uses large language models only in very small parts.
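Editor's note: the "search twice, keep what shows up in both" habit described above amounts to taking the intersection of two result lists. This is a minimal sketch under stated assumptions: results are plain strings, matching is by case- and whitespace-insensitive comparison, and the function names are made up for illustration. Real citations would need more careful matching, and anything kept still has to be verified against the actual paper.

```python
def normalise(ref: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(ref.lower().split())


def stable_results(first_run, second_run):
    """Keep items from `first_run` that also appear in `second_run`.

    Order follows the first run; duplicates are dropped.
    """
    in_second = {normalise(r) for r in second_run}
    seen = set()
    kept = []
    for ref in first_run:
        key = normalise(ref)
        if key in in_second and key not in seen:
            seen.add(key)
            kept.append(ref)
    return kept


# Hypothetical outputs of the same query run twice.
run_a = ["Smith 2020  Toxicology", "Made-up 2031", "Lee 2019"]
run_b = ["lee 2019", "smith 2020 toxicology", "Other 2022"]

agreed = stable_results(run_a, run_b)
```

Items that appear in only one run are not necessarily wrong, but they are the ones most worth double-checking or discarding.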
Thomas Hartung: And there's also no way of not using AI. If you do a Google search, since 2017 it has been using their transformer architecture and helping to find things. So you've already been using it for many years. And it is just that we are now talking about it more. And I think it is really about building trust, but at the same time also educating critical thinking. People need to understand that it is not the first short answer they should take — it requires really solid work around these things. But in general, AI is not hallucinating more than me or my colleagues when I ask them something.
Chris: We are imperfect too.
Thomas Hartung: What do you expect from a model which was trained on the internet?
Chris: Yeah.
Thomas Hartung: It can't be more intelligent than the median of the crap which is posted.
Chris: Exactly. That's it. And the quality of said crap doesn't seem to be going up — it seems to be going the other way. You alluded to the increase in the rate of scientific publication. So undoubtedly AI is going to contribute and accelerate that. Do you have any concerns about the biases within AI, or how AI is going to interact with the reproducibility crisis and the publication bias we have in science?
Thomas Hartung: Yeah. The quality of scientific publications is, for me, a big concern — as an editor, but also just conceptually. I think you said earlier that I have a chair for evidence-based toxicology, which is trying to bring the principles of evidence-based medicine to toxicology — which is a very thorough quality control process. So I'm all in on this. And AI is helping us to solve these problems, because we can deploy it. To give you one example: I published yesterday, in ALTEX, a draft for in vitro reporting standards, because I'm concerned about how we report on our experiments — they're not reproducible because we are incomplete. Having checklists is already an enormously important start.
Thomas Hartung: And as a complement to this paper, which we hope to finalise within the next month, we have simply created AI-driven systems where you put in a paper or manuscript and it tells you how well you comply with the reporting standard. This takes seconds. And this makes it applicable: a journal can check whether it needs to require more information, without having to hope that reviewers identify these things. It comes in seconds. And these GPTs will also be made available. We have also produced some for other reporting standards: the ARRIVE guidelines for animal studies, as well as the ToxRTool, which my team developed almost 17 years ago. They do a very interesting job. So I'm really excited about not only producing guidance, which is dry and nobody wants to read, but a tool to implement it.
Thomas Hartung: And very similarly, we had an article last week in Archives of Toxicology about bias and risk of bias in toxicology. We discuss there, on the one hand, how AI tools can bring in more bias and accelerate it because the training data translates to bias, and on the other hand, how we can also deploy AI to do exactly the opposite. Because if humans can identify bias, AI only needs to be told to do so and can do it at scale. So it is both — it is angel-like and devilish at the same time.
Chris: It sounds like a new arms race.
Thomas Hartung: It is. But we need some literacy in order to use these tools — that's one goal. But we also need to systematically evaluate them and build trust. In April, I had a paper on regulatory science readiness for AI, together with Robert Califf — who at the time of writing was head of the FDA — and some colleagues. So we are really trying to describe the path forward. And this is really important. We also need new concepts for validation of tools which are changing on a monthly basis — you cannot expect to freeze them in time and say “for the next 10 years, use this tool.” It would be laughable, but you also can't repeat the validation exercise every month.
Thomas Hartung: So what can you do? Nicole Kleinstreuer and I — Nicole was at the time still heading the US Validation Body — wrote a paper in February about our vision that we could have an agent system which is retraining and accompanying the validated tool, to identify when it is necessary to retrain and create a new version. And then this could become a semi-automatic process where people work with versions and can say “I used version five.” And you could even go so far as to alert users if the re-evaluation made changes to earlier results — it would be no problem to alert previous users: there is now a change in category for your substance, or whatever. It becomes very different — interactive. But it is something much more future-proof than freezing a computational approach in time.
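Editor's note: the versioning-and-alert idea described above can be illustrated with a small data structure. This is a hedged sketch, not the system proposed in the paper: the class `VersionedTool`, its methods, and the category labels are hypothetical, and the retraining agent itself is left out. It only shows how a tool could remember who queried which substance and flag changed results when a new version is released.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTool:
    """Toy stand-in for a validated predictive tool with numbered versions."""
    version: int = 1
    predictions: dict = field(default_factory=dict)  # substance -> category
    users: dict = field(default_factory=dict)        # substance -> set of users

    def query(self, user, substance):
        """Record who asked, and return (version used, predicted category)."""
        self.users.setdefault(substance, set()).add(user)
        return self.version, self.predictions.get(substance)

    def release(self, new_predictions):
        """Publish a retrained version and list alerts for changed results.

        Each alert is (user, substance, old category, new category, new version).
        """
        new_version = self.version + 1
        alerts = []
        for substance, users in self.users.items():
            old = self.predictions.get(substance)
            new = new_predictions.get(substance)
            if old != new:
                for user in sorted(users):
                    alerts.append((user, substance, old, new, new_version))
        self.predictions = dict(new_predictions)
        self.version = new_version
        return alerts


tool = VersionedTool(predictions={"substance A": "category 2", "substance B": "none"})
first_answer = tool.query("alice", "substance A")
tool.query("bob", "substance B")

# A retrained version changes the category of substance A only.
alerts = tool.release({"substance A": "category 1", "substance B": "none"})
```

In this toy run only the user who queried the substance whose category changed would be alerted, and every answer carries the version number it came from, so a user can state which version a result was based on.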
Chris: Yes. Well, that's it — a lot of decisions made over the years would be very problematic to change on the regulatory side. But the world you're describing just feels much less structured and much less anchored when it comes to these things. We have to prepare ourselves to embrace much greater complexity and fluidity in how we approach things.
Thomas Hartung: Sure. But we also have the tools to help us with this. And when I was working as a code programmer, we needed databases with good annotations. Now the new systems often work with structuring on the fly — they're ordering the data while they import it, and can work with completely unstructured information. That's how you get a very meaningful excerpt from a YouTube video, or whatever, within minutes. It's remarkable how AI can bring structure to things which are not yet structured, and help us to digest them.
Chris: Yes. And connect things — like you said, connect disparate things to gain new insights that are totally beyond our capability. I think that brings us nicely onto the point that I originally reached out to you about, which was the article in Nature Medicine: how AI can deliver the Human Exposome Project. Would you mind just explaining what the Human Exposome Project is?
Thomas Hartung: In 2005, Chris Wild, who was at the time working at IARC, the cancer agency in Lyon, came up with the idea that we should study the human exposome: the totality of things which are impacting on humans. At the time, it sounded terribly naïve that anybody could attempt something like this. But I come more and more to the conviction, over the years, that we actually need a complement to the human genome. I said already that only 5% of diseases are genetic, and about 40% have a genetic component. But 80 to 90% of chronic diseases have an exposure component. These are the big ones, the expensive ones.
Thomas Hartung: So we would really benefit from understanding more here. And from the strategic point of view of changing toxicology and moving away from animal testing, I think it might offer us the opportunity to move toxicology from the art of poisoning rats — or, let's say, for SETAC, the art of poisoning zebrafish — to something which is more on the exposure side of disease. And that is the idea of the human exposome: using the rich information you can now get from omics studies in biofluids, monitoring data on the diverse chemicals we can identify in samples, and forming from this an exposure hypothesis. It would say: OK, I can explain certain patterns — both in the chemicals I find and the metabolites I find, and also the metabolomic and transcriptomic changes in these people — and I can come up with a hypothesis that a certain class of chemicals, or a specific chemical, could be associated with certain meaningful changes that should be followed up. You would then enter a phase of trying to verify or falsify these possible associations experimentally, using human-relevant systems. Here we come back to micro-physiological systems.
Thomas Hartung: And the hope would be that by building more and more of these associations, learning from this data, we build up something which I used to call the “human toxome,” but we are now thinking more and more of digital twins of systems which reflect biology and the perturbation of biology through chemicals. That's the big vision. And we were twice very lucky last year. The first time, we were very lucky that I convinced our major philanthropic donor to make $17 million available for the next seven years to prepare for such a Human Exposome Project. And then my Deputy Fenna Sillé succeeded in winning an ideas competition for Johns Hopkins' fantastic new meeting space in Washington — a beautiful place, up to 360 people, directly by the Capitol, so you can talk to policy makers quite easily.
Thomas Hartung: And then we said: why don't we change gears? Instead of working for seven years towards the exposome, why don't we invite, for this venue, the leading experts in exposome science, civil society, and the policy makers who need to be convinced that we need a Human Exposome Project? And despite having only a year, no major sponsors, no industry exhibition, and only very minor sponsorships, we brought together 300 people. Among them were four NIH institutes that endorsed us, two of them on the organising committee. We had the WHO represented by Sir Jeremy Farrar, at the time Chief Scientist of the WHO, who a day later was promoted to Deputy Director General. We had Rémi Quirion, the president of the International Network for Governmental Science Advice; Peter Gluckman from the International Science Council, the umbrella organisation for 250 scientific organisations in the world; and the CEO of the African Academy of Sciences. So really top people. Even the new head of NIH requested an invite. He declined at the last moment and didn't come, but just the fact that our event prompted him to ask to join us for a while was, I think, a big, big success.
Chris: Yes.
Thomas Hartung: It really showed momentum — people felt: yes, we can; yes, we should go. And we are now starting working groups — not just about the science and a mapping of projects, but also a lot about what would be the governance of such a project, what could be the funding streams, how can we quality-assure things, best practices, and so on. It's an exciting time, which really puts these transformative technologies on a completely different scale — but it's just the same technologies we've been discussing here.
Chris: Yes. It does sound totally game-changing, to be honest — the whole exposome concept. And as you said, AI has really made this a possibility because we're talking about all kinds of different data that feed into this framework: exposures, physical exposures, genetics, lifestyle. I mean, it's so diverse.
Thomas Hartung: Yeah. But AI shows us every day how it is capable of integrating more of this. There are problems with it because we see how much energy it takes to crunch so many numbers. But we are also developing the solutions — quantum computing is promising to be a thousand times more energy-efficient. So if we have a problem, we can often also solve it. I don't see that we should blindly follow the science; on the contrary, my learning from validation and all of this is: yes, but be a self-critical bystander. Just observe what is happening, ask whether it makes sense, and consider where we need guardrails to avoid going astray.
Chris: Yes. The sense I get is that there's a huge amount of optimism and excitement from yourself about all these technologies, but — like you say — you're also conscious of the power of these tools and what could go wrong.
Thomas Hartung: Yeah. And there's no way of not trying it. Failure is an option; not trying is not.
Chris: We just have to hope that we have responsible people in the right places doing responsible things as we go, which we’re all accountable for, you know, everyone who's involved, I suppose.
Thomas Hartung: This is why we also need this division of power. We need some people who develop these things, separate people who validate them, and then another group who take decisions based on this — because you need all three.
Chris: Yeah. I wanted to also touch on something that really interests me about the exposome — because in my field, environmental chemistry and the environmental persistence of chemicals, this issue has really grown hugely in recent years. Everybody's very interested in whether chemicals will degrade in the environment. I think this has largely been triggered by issues like plastic pollution, and also PFAS — both regarded as not degrading in the environment. And I think it's also linked to the sustainability movement and the drive towards a circular economy, and whether emissions of non-degradable chemicals into the environment can be considered a sustainable activity. But something that strikes me is that I really get the sense that the public doesn't fully comprehend the nature of the society we've built and the nature of their exposure to chemicals in their day-to-day life. They can get on board with the idea that if they work in an occupational setting they might be exposed to some chemicals, or if they use a product at home they might be exposed to chemicals. But especially when it comes to environmental exposure — how a chemical could make its way through the environment back to people — I think that's something that is totally alien to a lot of people. And as a result, we see alarming headlines all the time about certain chemicals being found in food, or in people's blood. My thought on the exposome is that it's also an opportunity to bring the public to a greater awareness about the nature of the society we've built, and that hopefully that will help us have better policy decisions as well.
Thomas Hartung: Yeah. I agree. The number of chemicals around us and being used is incredibly large. There are about 1,000 industrial chemicals making it to the market in the US per year. 1,000. There's no way of properly assessing them before they come. A study from 2019 showed that there were 350,000 chemicals registered for marketing in the 19 most developed countries. 350,000. Nobody will ever be able to address them comprehensively, except with purely computational tools to identify problems. And I think that's really key. I don't think chemicals are bad, but we have produced enormous chemophobia because of a lack of knowledge in some cases that have been very publicly discussed. And I think we need to change this narrative, because we have tripled life expectancy since these chemicals came into our lives. They can't be so terribly bad. And there are not many diseases which are really increasing, if you correct for age. So it is really important to understand that we probably need to find only a few culprits — certain things which we need to identify. And again, this is probably happening through screening, not through definitive studies testing every chemical on everything.
Thomas Hartung: And it's quite fascinating that the green chemistry movement, to make things more sustainable, has two of its nine principles that are toxicological principles: benign by design, and early identification of toxic liabilities. And they are the very same technologies which we need — computational, so that the chemist can design safer drugs or safer chemicals; and early testing, which is only possible with the tiny amounts that an in vitro system requires. So, interestingly, just the week after our Exposome Moonshot Forum in Washington, I was in Stockholm for a Nobel Foundation conference on the future of green chemistry. There — with Paul Anastas, one of the fathers of green chemistry — 30 people, myself included, had developed a declaration for the future of chemistry, trying to promote exactly the same idea: we need to change, and we need to build this change on transformative technologies.
Chris: Yes, yes. I'm glad you brought up that declaration, because it is something really important that's going on right now, and I would encourage everybody to read it — and, if they choose, to sign it. I've read it several times, and I think it does lay out some very good ideals and principles for how we can move towards a more sustainable society. But one thing I'm also starting to see is that we almost need a deeper philosophical discussion about the environment. Because I think underpinning a lot of the discourse around chemicals in the environment are two worldviews. There's a human-centred worldview that accepts and aims to manage damage to the environment through human progress. And there's another worldview of people who simply want to preserve the environment for its own sake and leave no mark on it. And it seems that unless we revisit that, we're going to continue to have fundamental disagreements about how we manage chemicals.
Thomas Hartung: Yeah. I agree. And I'm always attracted by these types of big discussions. The One Health discussion, the planetary health discussion — this is what you're alluding to. I think both are very important, and we need a good understanding and modelling of these connections to be able to do the effective things.
Thomas Hartung: It was quite fascinating — at the conference on green chemistry under the Nobel Foundation, people were showing that the ecological footprint of a paper bag is much bigger than that of a plastic bag. So we are sometimes not doing the intuitively correct things. Without being able to judge all of these arguments — in toxicology also, we do sometimes have some really crazy discussions. Take bisphenol A — that's a discussion which has led to 12,000 scientific papers, and we are still arguing about whether some extreme exposures of babies could have an endocrine-disrupting effect. It is certainly not the biggest problem in the world. We are swimming in endocrine-active substances in all of these phases, and if at all, it is a borderline exposure. But we have tremendous scientific effort without coming to a clear result. Which is crazy. And with respect to resources — you can also say: let's be precautionary and simply use something else. Go back to glass bottles, I don't know. There are disadvantages and advantages to all of this. But I think a more comprehensive view on things is warranted — also to avoid regrettable substitutions.
Thomas Hartung: There's always this joke that a physician dealing with a cancer patient is so worried about some little detail on the body that they specialise in, and this is exactly what is happening. We have major problems, and we need to find solutions. And I don't like that we have not found a way of making the Human Exposome Project encompass the ecological aspects as well, because I think the same tools could be applied. But in profiling it as a complement to the human genome, it made sense to make it human, even though we are then falling short in understanding and including some of these important aspects of environmental health and their impact.
Chris: No, I really appreciate those reflections, Thomas. This has been an absolutely fascinating discussion — I knew it would be. I just want to thank you for your time today. For those listening, Thomas is on holiday at the moment, so he's taken time out of his vacation to speak on the podcast. So thank you very much for that. And thank you to everyone who has been listening. Your time is really precious, and I'm very grateful that you've spent it with us. I hope you've enjoyed this discussion. If you have, please tell your friends and colleagues about it — spread the word about the Chemical Journeys Podcast. I hope to have many more conversations like this to follow. Thank you very much, Thomas.
Thomas Hartung: Thanks for having me.