UK unveils Sovereign AI collaboration designed for its languages and public services

18 Sept

Selection of road signs for the Elan Valley Reservoirs, Powys, Wales, United Kingdom (credit: Andy Chisholm, iStock)

UK Prime Minister Sir Keir Starmer has announced a collaboration, led by UK-LLM, to develop a large language model (LLM) capable of AI reasoning in the UK’s languages and tailored to the country’s specific needs.

The UK-LLM project, led by University College London, is working with Bangor University and NVIDIA to build an AI model that can reason in both English and Welsh, a language spoken by about 850,000 people in Wales today.

Far less Welsh text is available for AI training than for languages such as English or Spanish. To create a sufficiently large Welsh training dataset, the team therefore used NVIDIA NIM microservices for gpt-oss-120b and DeepSeek-R1 to translate NVIDIA Nemotron open datasets, containing more than 30 million entries, from English into Welsh.

This new dataset supplements the Welsh data from UK-LLM’s earlier work, and the approach could be extended to other languages used across the UK.

The model is part of UK-LLM’s ambition to produce freely available large language models that reflect the UK’s languages and needs better than LLMs trained predominantly on US data.

Professor Pontus Stenetorp, of University College London’s AI Centre and a Gen AI Hub investigator, leads the UK-LLM project, which was established in 2023 as BritLLM. He said the collaboration is part of a wider, ongoing effort to produce training data, evaluation data, know-how, and open models aligned with UK interests.

“This collaboration with NVIDIA and Bangor University enabled us to create new training data and train a new model in record time, accelerating our goal to build the best-ever language model for Welsh,” he said. “Our aim is to take the insights gained from the Welsh model and apply them to minority languages, in the UK and across the globe.”

UK-LLM supports the government’s mission to transform public services with AI by enabling AI reasoning in the UK’s languages and for its specific needs.

Beyond reasoning in Welsh, the model created through the collaboration is trained on data that reflects local laws, so that it can be used in healthcare and education and to provide legal resources.

UK-based AI cloud provider Nscale will make the new model available to developers through its application programming interface.

UK-LLM was one of the projects piloting the Bristol-based supercomputer Isambard-AI when it went live earlier in 2025. Since summer 2024, the project has taken part in Isambard-AI Phase 1 Technical Preparatory Access, evaluating how well the UK’s most powerful supercomputer suits LLM training.

Led by the natural language processing group at University College London, the project has previously released two models for UK languages.

The first concrete scientific outcome of the project’s work before this collaboration was the paper "Multilingual Language Model Pretraining using Machine-translated Data", which broke new ground by using machine translation to improve multilingual LLM pretraining. It will appear at the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP) in November.

Other outcomes include demonstrating that LLMs can feasibly be trained on the graphics processing unit (GPU) infrastructure of the DiRAC supercomputer. The project has also produced the first LLM trained solely on British compute: Caernarfon 3B, which outperforms models more than twice its size in English, Irish, Welsh, and Scottish Gaelic.


Rosie Niven

Rosie joined the hub from the regional university consortium Science and Engineering South, where she was Communications and Events Manager. She has held a number of communications roles at UCL since 2020. A former journalist, Rosie has worked in higher education organisations since 2014, including Jisc and Universities UK, where she edited the Efficiency Exchange website.
