Posted on June 10, 2021 by Mohit Motwani
At the NAO, our central analysis team supports value for money studies by applying specialist analysis techniques to government data in order to generate new insights. For our Investigation into the Windrush Compensation Scheme, we used process mining to help the study team understand the Home Office’s operation of the scheme.
Process mining is an exciting technique that allows you to gain a detailed understanding of a process purely by analysing the data that users of the process generate, captured in automated event logs. It allows us to understand the flow of cases through a system, including the time taken for different activities, how resources are used and where bottlenecks occur, and as such can be vital in assessing the performance of the overall process.
To use process mining, the minimum data requirement is an event log containing Identification Codes (IDs) that are unique to each case, consistent labels relating to actions made on a case, and a timestamp for each action. For our work on the Windrush Compensation Scheme, we analysed logs generated by the Home Office’s case management system for dealing with compensation claims using the open source programming language R, specifically the bupaR collection of packages.
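To make this minimum data requirement concrete, here is a small language-agnostic sketch in Python (our actual analysis used R and bupaR; the case IDs, stage names and timestamps below are entirely illustrative, not Home Office data). It shows the three required fields and how a case's path through the process can be recovered from them:

```python
from datetime import datetime

# A minimal event log: one row per action, with the three required fields --
# a case ID unique to each case, a consistent activity label, and a timestamp.
event_log = [
    {"case_id": "C1", "activity": "Registration", "timestamp": datetime(2020, 4, 1, 9, 0)},
    {"case_id": "C1", "activity": "Eligibility",  "timestamp": datetime(2020, 4, 3, 14, 0)},
    {"case_id": "C2", "activity": "Registration", "timestamp": datetime(2020, 4, 2, 10, 0)},
    {"case_id": "C1", "activity": "Casework",     "timestamp": datetime(2020, 4, 10, 11, 0)},
    {"case_id": "C2", "activity": "Rejection",    "timestamp": datetime(2020, 4, 20, 16, 0)},
]

def traces(log):
    """Group events by case and order them by timestamp to recover each case's path."""
    by_case = {}
    for event in sorted(log, key=lambda e: e["timestamp"]):
        by_case.setdefault(event["case_id"], []).append(event["activity"])
    return by_case

print(traces(event_log))
# {'C1': ['Registration', 'Eligibility', 'Casework'], 'C2': ['Registration', 'Rejection']}
```

Everything else process mining produces, from process maps to bottleneck statistics, is derived from ordered traces like these.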
Our first step using process mining was to create a process map. This output provided a powerful visual representation of the different stages of the case management system and the way in which cases move between stages – beginning with registration and ending ultimately in either claim payment or rejection.
A simplified version of the process map was included in the report and is repeated below. The different nodes shown here represent the different stages in the process, while the numbers show how many times a compensation scheme case has entered or re-entered a stage (boxes), or how many times cases have moved between stages (connecting lines).
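The counts on such a map can be derived directly from ordered case traces: every visit to a stage increments that stage's node count, and every consecutive pair of stages increments an edge count. A rough Python sketch of this counting logic, with hypothetical stage names (in R, bupaR's `process_map()` performs this aggregation and renders the diagram):

```python
from collections import Counter

# Each trace is one case's ordered sequence of stages (illustrative data only).
traces = [
    ["Registration", "Eligibility", "Casework", "Payment"],
    ["Registration", "Eligibility", "Rejection"],
    ["Registration", "Eligibility", "Casework", "Casework", "Payment"],
]

node_counts = Counter()  # times any case entered/re-entered a stage (the boxes)
edge_counts = Counter()  # times cases moved between two stages (the connecting lines)

for trace in traces:
    node_counts.update(trace)
    edge_counts.update(zip(trace, trace[1:]))  # consecutive pairs = transitions

print(node_counts["Casework"])                       # 3 -- counts re-entries, not unique cases
print(edge_counts[("Registration", "Eligibility")])  # 3
```

Note that, as in the report's figure, these are counts of instances rather than unique cases: the third trace contributes two "Casework" entries.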
Next, we used process mining to create an animated version of the process map. This showed all individual compensation scheme cases dynamically moving through the system between stages. It proved very effective in illuminating the rate at which cases progress through the system and the stages where that progress is slower. A short illustrative GIF of this output is shown below (see also footnote 2).
Visualising the Windrush Compensation Scheme in these ways enabled the study team to better understand its operation and to draw out insights from the client’s case management systems. As a result, our auditors were able to define complex hypotheses and explore these with the client, using process mining analysis to support their findings.
For example, our analysis showed that, of cases that were subject to a quality assurance check, half needed to return to a caseworker, indicating a significant level of rework. We were able to use process mining to combine this with observed data and quantify the exact rate at which this occurred. In another example, our study team were able to use process mining to compare the Department’s initial estimates for the average number of hours to do everything required on a case, end to end, with the actual number of hours recorded in the Department’s data for cases up to 31 March 2021.
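A rework rate like the one described above can be read straight off the case traces: among cases that reach a quality assurance stage, count the share that subsequently return to casework. A minimal sketch in Python, using hypothetical stage names rather than the Home Office's actual labels:

```python
# Hypothetical traces for three cases (stage names illustrative only).
traces = {
    "C1": ["Casework", "QA check", "Casework", "QA check", "Payment"],
    "C2": ["Casework", "QA check", "Payment"],
    "C3": ["Casework", "Rejection"],
}

def rework_rate(traces, checked="QA check", rework="Casework"):
    """Of cases that reach the QA stage, the share that ever return to casework afterwards."""
    checked_cases = reworked = 0
    for path in traces.values():
        if checked in path:
            checked_cases += 1
            first_check = path.index(checked)
            if rework in path[first_check + 1:]:
                reworked += 1
    return reworked / checked_cases if checked_cases else 0.0

print(rework_rate(traces))  # 0.5 -- one of the two QA-checked cases returned to casework
```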
Using advanced analytics – in this case process mining – our study team enhanced their understanding of the Windrush Compensation Scheme and the report’s findings.
Authors: Mohit Motwani and Ben Coleman
Ben and Mohit work in the NAO’s analysis hub, helping support value for money studies by providing complex data analysis techniques to study teams. They both undertook process mining work in relation to the NAO’s Investigation into the Windrush Compensation Scheme.
- Data relate to 1033 cases for which a registration stage was created after 13 March 2020 and show the observed movement of these cases through the system until 31 March 2021.
- The numbers shown are a count of instances of cases reaching a stage or moving between stages, rather than a count of the number of unique cases. Some intermediate stages such as offer and payment approvals and payment preparation have been omitted for clarity.
- Some movements between stages have also been omitted for clarity, including:
- Cases moving back following successful applicant appeal;
- Cases moving back to registration or eligibility following casework;
- Cases moving back to casework following payment offer.
Source: National Audit Office’s analysis of Home Office applications data
The full version of this covered the period March 2020 to March 2021, which was the period for which we had access to full case management records in the event log.
Posted on June 1, 2021 by Daniel Lambauer
In our increasingly digital and automated world, certain buzzwords are taking centre stage in the public sector. One of them is “artificial intelligence”. While the concept and development of artificial intelligence are not new (it was first recognised as a formal discipline in the mid-1950s), it is a term that has been thrown around more freely in the public sector in recent years, and sometimes carelessly.
Traditional algorithms vs machine learning models
These days, data scientists normally associate artificial intelligence with systems that are based on machine learning models. Machine learning models deploy methods that develop rules from input data to achieve a given goal.1 This distinguishes them from what you might call traditional algorithms: traditional algorithms don’t need data to learn, they simply churn out results based on the rules inherent to them.
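The distinction can be made concrete with a toy sketch (entirely illustrative, not drawn from any real public sector system): a traditional algorithm applies a rule fixed in advance by its author, whereas a machine learning model derives its rule, here a simple eligibility threshold, from labelled examples.

```python
# Traditional algorithm: the rule is fixed by the programmer; no data is needed.
def traditional_eligible(income):
    return income < 20_000  # threshold hard-coded in advance

# Machine learning (toy supervised example): learn the threshold from data instead.
def learn_threshold(examples):
    """Pick the midpoint between the highest eligible and lowest ineligible income."""
    eligible = [inc for inc, label in examples if label]
    ineligible = [inc for inc, label in examples if not label]
    return (max(eligible) + min(ineligible)) / 2

training_data = [(12_000, True), (18_000, True), (25_000, False), (40_000, False)]
threshold = learn_threshold(training_data)

def learned_eligible(income):
    return income < threshold

print(threshold)                 # 21500.0 -- the rule was derived from the data
print(learned_eligible(19_000))  # True
```

The learned rule here is transparent because the model is trivially simple; as discussed below, much of the progress in machine learning has been in models whose learned rules are far harder to interrogate.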
Traditional algorithms have been used in the public sector for some time to make decisions. The latest example to make the headlines was the model that determined A level exam results last summer. From an auditing perspective, as the rules underlying such algorithms are usually transparent, auditing them is something we as a public audit institution are used to.2
But artificial intelligence that is based on machine learning is different – it has only been (cautiously) employed in the public sector in recent years.
It is different because, firstly, for a machine learning model to learn it needs good quality data – and often a lot of it. Our report on the challenges of using data across government has shown that this condition is not always met.
Secondly, machine learning models can be quite costly to develop and deploy. Moreover, the benefits are not always guaranteed or immediately realisable. In a public sector context with tight budgets, the willingness to put money behind them may not always be there.
This uncertainty is related to a third point: it is not always clear from the outside what the machine will learn, and therefore what decision-making rules it will generate. This makes it hard to state the immediate benefits. Much of the progress in machine learning has been in models that learn decision-making rules that are difficult to understand or interrogate.
Lastly, many decisions affecting people’s lives that artificial intelligence models would support pertain to personal circumstances and involve personal data, such as health, benefit or tax data. Whilst the personal data protection landscape has strengthened in recent years, the organisational and regulatory structures and relevant accountabilities for the use of personal data in machine learning models are not always in place.3 Public sector organisations are therefore at risk of inadvertently falling foul of developing data protection standards and expectations.
How to audit public sector machine learning models
Given all these challenges, it may not be surprising that in our public audit work we are not coming across many examples of the use of machine learning models in decision-making. But there are examples4 and we foresee that their use may grow in the future.
We have therefore teamed up with other public audit organisations in Norway, the Netherlands, Finland and Germany, and produced a white paper and audit catalogue on how to audit machine learning models. You can find it here: Auditing machine learning algorithms (auditingalgorithms.net).
As the paper outlines in more detail, we identified the following key problem areas and risk factors:
- Developers of machine learning models often focus on optimising specific numeric performance metrics. This can lead them to neglect other requirements, most importantly around compliance, transparency and fairness.
- The developers of the machine learning models are almost always not the same people who own the model within the decision-making process. But the ‘product owners’ may not communicate their requirements to the developers – which can lead to machine learning models that increase costs and make routine tasks more, rather than less, time-consuming.
- Often public sector organisations lack the resources and/or competence to develop machine learning applications internally and therefore rely on external commercial support. As a result they may take on a model without understanding how to maintain it and how to ensure it is compliant with relevant regulations.
We also highlighted what auditors need in order to meaningfully audit artificial intelligence applications:
- They need a good understanding of the high-level principles of machine learning models
- They need to understand common coding languages and model implementations, and be able to use appropriate software tools
- Due to the high demand for computing power, the IT infrastructure supporting machine learning usually includes cloud-based solutions. Auditors therefore also need a basic understanding of cloud services to perform their audit work properly.
Our audit catalogue sets out a series of questions that we suggest auditors should use when auditing machine learning models. We believe it will also be of interest to the public sector bodies we audit that employ machine learning models. It will help them understand what to focus on when developing or running machine learning models. As a minimum, it gives fair warning of what we as auditors will be looking for when we come to audit your models!
1 In fact there are two main classes of machine learning models. Supervised machine learning models attempt to learn from known data to make predictions; unsupervised machine learning models try to find patterns within datasets in order to group or cluster them.
2 See for example our Framework to review models – National Audit Office (NAO) Report to understand more about what we look out for when auditing traditional models and algorithms. We currently have some work in progress that aims to take stock of current practices and identify the systemic issues in government modelling which can lead to value for money risks.
3 In the UK the Information Commissioner’s Office has published guidance on the use of personal data in artificial intelligence: Guidance on AI and data protection | ICO
4 For some UK examples see: https://www.gov.uk/government/collections/a-guide-to-using-artificial-intelligence-in-the-public-sector
About the author:
Daniel Lambauer joined the NAO in 2009 as a performance measurement expert and helped to establish our local government value for money (performance audit) team. He is the Executive Director with responsibility for Strategy and Resources. As part of his portfolio, he oversees our international work at executive and Board level and has represented the NAO internationally at a range of international congresses. He is also the NAO’s Chief Information Officer and Senior Information Responsible Owner (SIRO). Before joining the NAO, Daniel worked in a range of sectors in several countries, including academia, management consultancy and the civil service.
Posted on July 20, 2020 by Ruth Kelly
Data analytics has become an integral part of the audit process. And where a few years ago it was an area where we were exploring and creating tactical solutions to problems, today the NAO, like many organisations, has developed a strong capability and is making significant progress towards the widespread adoption of data analytics across all elements of our audit work.