Universities awarded £400,000 to build datasets shaping the future of AI

Four university-led projects have been awarded a total of £400,000 by the AI Hub in Generative Models to develop datasets that could help make AI systems safer, more reliable and better aligned with the real world.

The funding, awarded through the hub’s Dataset Creation & Challenge Projects programme, supports collaborative teams working at the forefront of science, technology and the creative industries. Each project will receive up to £100,000 to accelerate the creation of high-quality datasets - a critical foundation for advancing modern AI systems.

From improving how AI understands sound and human movement to reducing the risk of satellite collisions and helping systems update their knowledge more reliably, the projects address some of the most pressing challenges facing AI today. Together, they aim to unlock new applications across sectors, including space, the arts and professional services, while strengthening the UK’s position in responsible AI innovation.

Funded projects

Lead Institution | Project Name
Northumbria University | SSA-LaMB: Space Situational Awareness Language Model Benchmark for Evaluating Generative AI in Safety-Critical Space Operations
Queen Mary University of London | Real World Multimodal Data for Generative Audio Synthesis, Sound Source Separation & Auditory Machine Understanding
University College London | Beyond Motion: The World’s Most Advanced Multimodal Creative Performance Dataset
University of Sheffield | Multi-DocVerify: A Multimodal Evidence-Based Benchmark for Professional Contexts
 

David Barber, Director of the AI Hub in Generative Models, said:

"Data access and capability are central to unlocking the benefits of AI. The availability of rich, diverse and well-curated datasets is vital for advancing scientific and technological innovation. Conversely, a shortage of high-quality data or poorly structured data hinders generative AI models' ability to learn reliably from authoritative sources.

"To support this, we launched the Dataset Creation and Challenge Projects funding call to fund collaborative, open-source dataset projects. We received almost a hundred applications and were impressed by the quality and scope of the projects proposed, which made selecting just four for the award a considerable challenge. The successful proposals are all outstanding examples of what was submitted, covering diverse disciplines, from space science to the performing arts."

Dataset challenges play a vital role in benchmarking AI systems, identifying new solutions and enabling collaboration. This call was open to UK-based academic teams eligible for UK Research and Innovation (UKRI) funding. 

The hub was set up in October 2024, along with eight others set up by UKRI to deliver next-generation innovations and technologies. These awards are the first made through an open funding call by the hub, which brings together experts in Generative AI from industry and academia.

Professor Wai Lok Woo, Head of Data Science and Artificial Intelligence, Northumbria University, said:

“Trustworthy AI in space is not a future ambition, it is an urgent present need. With operators carrying out more than 144,000 emergency satellite manoeuvres every year to avoid collisions, AI systems that cannot honestly communicate uncertainty pose real operational risks.

“We are delighted to receive this award to enable us to build the evaluation infrastructure the community needs to move from promising capabilities to proven reliability. Developed with operational partners in both UK defence and commercial space sectors, SSA-LaMB gives every researcher access to rigorous AI evaluation tools regardless of whether they hold classified data access or work at a well-resourced institution.”

Dr Iran R. Roman, lecturer at Queen Mary University of London, said:

“I’m genuinely honoured - and honestly still a little surprised - to be selected alongside such strong competition.

“This project is about something I care deeply about: the gap between how AI perceives the world and everything that’s actually going on in it. Real kitchens are messy, rooms have corners that are hard to sense, sounds bounce around chaotically.

“We want to capture all of that and give it back to the community as open data. If we do this right, a developer somewhere builds something that helps people in ways we haven’t even imagined yet. That’s what gets me excited to come in every day.”

Professor Neill Campbell, Professor of Visual Computing and Machine Learning, University College London, said:

“We are very excited about this new partnership between UCL’s Creative Intelligence Centre, the Generative AI Hub and Studio Wayne McGregor as well as collaborators at Cardiff and Bath.

“This dataset will push the boundaries of what it is possible to capture technically in collaboration with world-class performers who are stretching the limits of human motion.

“We are very grateful for the opportunity to co-create the research questions with Sir Wayne, renowned for his creative vision and ambition; we hope that the resulting creative challenges we present will help the research community to move away from activities, benchmarks and metrics born out of convenience, or availability of data, into those that explicitly acknowledge and address creative tasks.”

Dr Xingyi Song, Lecturer in Computational Media Analysis, University of Sheffield, said:

“We are excited to receive this award, since it allows us to tackle a critical and timely challenge in Generative AI: the processing of large, multimodal documents.

“While Gen AI, such as Retrieval-Augmented Generation (RAG), is being widely adopted across many industries, its performance in real-world applications remains under-evaluated.

“This award gives us the opportunity to strengthen collaborations with our industrial partners, Full Fact and AMRC, by developing and benchmarking Gen AI models in their real-world use cases, directly contributing to solving practical problems and ensuring the safety and efficiency of Gen AI deployment.”

Rosie Niven

Rosie joined the hub from the regional university consortium Science and Engineering South, where she was a Communications and Events Manager. Since 2020 she has held a number of communications roles at UCL. Previously a journalist, Rosie has worked in higher education organisations since 2014, including Jisc and Universities UK, where she edited the Efficiency Exchange website.
