Meet the researcher – Jason Hartford

Jason Hartford, University of Manchester

“My diverse academic background drives the way I think about machine learning and gives me a bit of a different perspective on approaching generative models.”

The University of Manchester’s Jason Hartford describes how studying economics first got him interested in machine learning and how his academic background has influenced his approach to AI research.  

Tell me about yourself and your work? 

My PhD was in developing machine learning methods for causal inference, and as a postdoc I focused on what I felt was really missing in causal inference – the question of how we deal with unstructured data. This took me into the world of causal representation learning, which asks how we can make causal statements about unstructured data like images, text and so forth.

One area that generates a huge amount of this kind of data – and in particular, experimental data – is biology, so after my postdoc I spent some time at a biotech called Recursion, which performs large-scale experimental screens that produce microscopy images.

Now I am a Kathleen Ollerenshaw fellow at the University of Manchester, and I hold a joint appointment with Valence Labs, which is the research wing of Recursion. I remain very interested in causal representation learning questions, and I am also interested in how we efficiently design experiments to explore the very large spaces of possible experiments in biology.  

Why is this work in the field of generative biology so important and what kind of difference could it potentially make? 

The most obvious societal implication is in discovering new drugs. In drug discovery applications, experiments are expensive, so you really want to have computational simulations of experiments. You still need real-world experiments to test whether your models are correct, but you would move from using labs primarily for data collection to using them primarily for verification, with simulation generating most of your hypotheses.

Generative models give us very powerful frameworks for learning simulators of cellular behaviour from data. If we can build accurate simulators of cells and their responses to perturbations, this would allow us to test out many possible drugs in computer simulation and only run experiments on the most promising candidates. This is a tricky problem, but overcoming it would speed up the drug discovery process by allowing researchers to quickly search over large libraries of potential drugs and only test the most promising ones in the lab.

Which of the hub’s working groups are you part of?  

I am part of the Generative Biology working group, led by Magnus Rattray. Our workstream is interested in finding and supporting collaborations that build generative models for biological applications, so we are interested in both modelling applications and supporting data generation efforts.

We want to support the data generation process because that is critical in biological applications. Unlike text or images, where you have mountains of data on the internet generated as a by-product of people going about their business, in biological applications you really have to go out and actively collect the right data.

What are your hopes for the working group and the hub more generally? 

We are in an interesting time in machine learning where there is a lot of value in larger-scale, more collaborative projects. I think the hub serves as a really nice linchpin for building up these larger collaborations that span universities. The thing I am most excited about is building bigger projects that require larger sets of resources and longer time horizons. This approach has been very successful in industry, and I hope that we are able to foster it through the hub.

What do you think the AI Hubs bring to the research ecosystem? 

A lot of today's science is very collaborative and interdisciplinary. Breaking down the walls between institutions and getting people working together across shared areas rather than shared institutions is how we can build out bigger, more ambitious projects. 

The hubs provide a critical mass that crosses universities and enables collaboration. I am interested in causal questions and generative models, so I have spoken a lot to people in the CHAI hub and I work with the Gen AI hub. Just having a forum for connecting researchers across universities and building critical mass around particular areas makes a lot of sense.

I did all my graduate studies in the Canadian system, where they have these large institutes like Mila, the Vector Institute and the Alberta Machine Intelligence Institute (AMII), which are about machine learning more generally but are very successful at building out collaborations. They have a shared building space, so everyone is working out of the same area, which makes it even easier to collaborate. Potentially you can imagine the Turing ending up playing that sort of role.

So how did you end up as an AI researcher?  

I did my undergraduate studies and first master's in statistics and economics in South Africa. While working for the South African government, I became interested in the fact that we had these huge data sets that no one was using. At the time, “big data” was the buzzword, and I started reading about statistical approaches to machine learning.

Hastie, Tibshirani and Friedman's book The Elements of Statistical Learning got me really interested in data-driven methods, so I applied for a PhD, originally planning to go into economics. I then ended up applying to a joint economics and computer science programme with Kevin Leyton-Brown at the University of British Columbia. After arriving in Canada, I got more interested in machine learning and ended up doing my PhD purely in machine learning.

In economics, you tend to only really care about causal questions, rather than pure prediction questions. That kind of background informed my belief in the importance of causal inference before I truly understood how it differed from standard machine learning. That foundation continues to influence how I think about machine learning because I care most about learning models that accurately reflect the processes that are going on in the world. You want to know the real relationship; you do not want some correlation that you have just picked up in the data. My diverse academic background drives the way I think about machine learning and gives me a bit of a different perspective on approaching generative models. 

What would it surprise people to know about you? 

I once rode a bicycle from Bangkok to Kunming in China through Laos. After my economics master's, I spent a year backpacking around the world and, as part of that, I did a lot of very long bike rides. I also rode up onto the Tibetan Plateau and, in Xinjiang province in the northwest of China, up the Karakoram Highway, which is the road that connects Kashgar in the far west of China to Pakistan. I travelled by bike to get off the tourist trail, and I ended up seeing parts of the world that I would never have seen otherwise.

Rosie Niven

Rosie joined the hub from the regional university consortium Science and Engineering South, where she was a Communications and Events Manager. Since 2020 she has held a number of communications roles at UCL. Previously a journalist, Rosie has worked in higher education organisations since 2014, including Jisc and Universities UK, where she edited the Efficiency Exchange website.
