Tackling Rising Data Volumes in AML 

By Kieran Holland, Head of Technical Solutions, FinScan

High-quality data is the bedrock of an effective anti-money laundering (AML) and watchlist screening programme. Incomplete or inaccurate data can lead to false negatives, potentially allowing illicit activities to go unnoticed. Conversely, it can trigger false positives, leading to unnecessary investigations and resource wastage. With data volumes increasing exponentially, the risk of such data mismanagement is a growing concern for organisations. This article examines the issue of growing data volumes, the importance of data selection and comprehension, and best practices for managing data quality going forward.

The explosion in data volumes

Global data volumes have exploded over the last decade and are set to exceed 180 zettabytes in 2025[1]. So, too, has the number of customer data points available to organisations. Businesses now hold larger and more dynamic customer data sets, spanning customer onboarding, client management, and transaction data. This, in turn, has created a surge in metadata volumes: the data points that describe each data set, such as when and how the organisation collected the customer information.
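
To make the distinction concrete, a customer record and its accompanying metadata might be modelled along the following lines; the field names here are purely illustrative and not drawn from any particular system.

from dataclasses import dataclass
from datetime import date, datetime

@dataclass
class CustomerRecord:
    # The customer data itself, used directly in AML and screening checks
    full_name: str
    date_of_birth: date
    nationality: str

@dataclass
class RecordMetadata:
    # Data about the data: when and how the record was collected
    collected_at: datetime
    collection_channel: str      # e.g. "online onboarding", "branch", "acquired back-book"
    last_verified_at: datetime

record = CustomerRecord("Jane Doe", date(1980, 4, 12), "GB")
metadata = RecordMetadata(
    collected_at=datetime(2021, 6, 1, 9, 30),
    collection_channel="online onboarding",
    last_verified_at=datetime(2024, 1, 15, 14, 0),
)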

The surge in data presents both opportunities and challenges. From an AML perspective, it provides a wealth of information to enhance customer checks. Traditionally, AML checks have focused on the basic customer relationship, but they are now expected to incorporate a broader range of data points. For instance, regulators typically require organisations to gather additional information to identify the Ultimate Beneficial Owner (UBO), conduct adverse media checks, and use behavioural analytics for transaction monitoring.

The challenge for organisations is how to manage these ever-expanding data sets for AML purposes. Ultimately, the data is only of real value if it is complete and accurate. This is especially important given the rapid development and increased adoption of AI tools in AML, which rely on high-quality data inputs for successful deployment. Yet organisations often grapple with multiple, disjointed legacy systems that were not designed with today’s data requirements in mind, and sometimes inherit low-quality data sets through acquisitions or by taking on back-books of business, as is common in insurance markets. As a result, it is essential to put in place a robust data quality programme that addresses data selection, comprehension, and ongoing quality management.

Data selection: choosing the right data

The first step in managing data quality is to select the appropriate data for the use case. In doing so, it is vital to consider data protection regulations, such as the EU’s General Data Protection Regulation (GDPR). These require data to be proportionate and necessary for the intended use, meaning organisations do not have free rein to comb through unlimited data.

Organisations must also determine whether data is fit for compliance. For example, if the date of birth is used to screen against a sanctions list, does the institution reliably have that data? And if so, does it know how that data point is sourced and verified, where it is stored, and what governance procedures apply to it?

Frequently, data is missing or unreliable. It may be decades old, collected using legacy systems that lacked the relevant fields. Or it may be captured in a physical format, such as a bill of lading in shipping, which the organisation must convert into a digital data point. Understanding these potential obstacles and how to overcome them is a critical stage of the data selection process.
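
A minimal sketch of such a fitness check follows, assuming customer records are held as simple dictionaries and that date of birth is the field under scrutiny; the field names, formats, and sample data are illustrative assumptions rather than prescriptions.

from datetime import datetime

def assess_field_fitness(records, field, parser=None):
    # Report how often a field is present and, where a parser is given, parseable
    present = parseable = 0
    for record in records:
        value = record.get(field)
        if value in (None, ""):
            continue
        present += 1
        if parser is None:
            parseable += 1
        else:
            try:
                parser(value)
                parseable += 1
            except (ValueError, TypeError):
                pass
    total = len(records) or 1
    return {
        "field": field,
        "present_pct": round(100 * present / total, 1),
        "parseable_pct": round(100 * parseable / total, 1),
    }

customers = [
    {"name": "Jane Doe", "date_of_birth": "1980-04-12"},
    {"name": "John Smith", "date_of_birth": ""},            # missing
    {"name": "A N Other", "date_of_birth": "12/04/1980"},   # unexpected format
]

# Can date of birth reliably be used as a screening attribute?
print(assess_field_fitness(customers, "date_of_birth",
                           parser=lambda v: datetime.strptime(v, "%Y-%m-%d")))
# {'field': 'date_of_birth', 'present_pct': 66.7, 'parseable_pct': 33.3}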

Data comprehension: understanding your data 

Organisations should also have a strong handle on how accurate each data point is, the common formats in which it is held, and where data points are missing. Indeed, without understanding the quality of the data and its potential weak spots, it is impossible to make informed decisions about how best to use it.

It is important to remember that no one company’s data will ever be the same as another’s once you consider all the variables, such as business type, target markets and customers, internal systems, and policies and procedures. The data a company holds is unique and needs to be treated as such.

To establish accuracy, organisations should analyse the key data points intended for the various AML checks, such as name, address, date of birth, passport number, nationality, and source of funds. This helps to determine the reliability of each data point; where a data point is low quality or frequently missing, the organisation knows it cannot be used as the primary identifier for AML and screening purposes. It also makes it easier to determine what remedial work is needed to improve existing data sets and collection processes.
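
One way to put this into practice is to profile each key field for completeness and well-formedness before relying on it as a primary identifier. The sketch below makes several assumptions purely for illustration: records held as dictionaries, simple regular-expression format rules, and a 90% reliability threshold; real rules would be driven by the organisation’s own policies.

import re

# Illustrative format expectations per field (assumptions, not real policy rules)
FIELD_PATTERNS = {
    "name": re.compile(r"\S+(\s+\S+)+"),               # at least two name tokens
    "date_of_birth": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "passport_number": re.compile(r"[A-Z0-9]{6,9}"),
    "nationality": re.compile(r"[A-Z]{2}"),            # ISO 3166-1 alpha-2 style
}

def profile_fields(records, patterns=FIELD_PATTERNS, min_reliability=0.9):
    # For each field, measure how often it is populated and well-formed, and
    # flag whether it is reliable enough to serve as a primary identifier
    total = len(records) or 1
    profile = {}
    for field, pattern in patterns.items():
        populated = sum(1 for r in records if r.get(field))
        well_formed = sum(1 for r in records
                          if r.get(field) and pattern.fullmatch(str(r[field])))
        reliability = well_formed / total
        profile[field] = {
            "populated_pct": round(100 * populated / total, 1),
            "well_formed_pct": round(100 * reliability, 1),
            "usable_as_primary": reliability >= min_reliability,
        }
    return profile

Fields that fall below the threshold are not discarded; they simply cannot carry the screening decision on their own, and they point to where remedial work on collection processes is needed.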

Ongoing data management: keeping data quality up to date

Finally, organisations need to appreciate that data is a constantly moving target. New customers are acquired, existing customers add new products, services, or locations, old customers are offboarded, and customers create new relationships with third parties that need to be understood. Adding to the challenge, this data is normally stored in multiple disjointed databases. Organisations must, therefore, keep up with the constant evolution of their data sets. Failure to do so will quickly return them to square one after any one-off data improvement exercises. 
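
A lightweight way to keep sight of that movement is to track when each record was last reviewed and flag anything that has drifted past its review date. The sketch below assumes a last_verified_at field and a twelve-month review cycle, both of which are illustrative choices rather than recommendations.

from datetime import datetime, timedelta

REVIEW_CYCLE = timedelta(days=365)  # assumed twelve-month refresh cycle

def stale_records(records, as_of, cycle=REVIEW_CYCLE):
    # Return records whose data has not been verified within the review cycle
    return [
        r for r in records
        if r.get("last_verified_at") is None
        or as_of - r["last_verified_at"] > cycle
    ]

customers = [
    {"name": "Jane Doe", "last_verified_at": datetime(2025, 3, 1)},
    {"name": "John Smith", "last_verified_at": datetime(2022, 7, 20)},  # overdue
    {"name": "A N Other", "last_verified_at": None},                    # never verified
]
print([r["name"] for r in stale_records(customers, as_of=datetime(2025, 6, 1))])
# ['John Smith', 'A N Other']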

There are two approaches organisations can consider for ongoing data management. The first is to completely overhaul raw data inputs, but this is a high-cost, time-intensive option for which management support is hard to secure. Often, such projects do not even get off the ground, let alone finish. In many instances, compliance does not own the data sets and needs budget from other departments to conduct the exercise.

The second option is to use data quality tools that prepare and monitor the data as it is absorbed into the AML system rather than requiring that data be cleaned before it is used. This removes the need for compliance to push upstream for more resources and is normally more cost-effective and more specific to the needs of compliance teams. 
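
In practice, this usually means validating and normalising each record at the point it enters the screening pipeline, and logging any issues for compliance to monitor, rather than waiting for source systems to be cleaned first. A minimal sketch of that pattern follows; the field names, formats, and rules are assumptions for illustration, not a description of any particular tool.

from datetime import datetime

def prepare_for_screening(record, issues):
    # Normalise a raw record as it is absorbed into the AML system,
    # logging data quality issues instead of requiring upstream clean-up
    prepared = dict(record)

    # Normalise the name used for watchlist matching
    name = (prepared.get("name") or "").strip()
    prepared["name"] = " ".join(name.split()).upper()
    if not prepared["name"]:
        issues.append(("missing_name", record))

    # Accept a couple of common date formats rather than rejecting the record
    dob = prepared.get("date_of_birth")
    if dob:
        for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
            try:
                prepared["date_of_birth"] = datetime.strptime(dob, fmt).date().isoformat()
                break
            except ValueError:
                continue
        else:
            issues.append(("unparseable_dob", record))
    else:
        issues.append(("missing_dob", record))

    return prepared

issues = []
clean = prepare_for_screening({"name": "  jane   doe ", "date_of_birth": "12/04/1980"}, issues)
print(clean)   # {'name': 'JANE DOE', 'date_of_birth': '1980-04-12'}
print(issues)  # []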

Managing data quality checks at scale

As data volumes in the AML space continue their upward trajectory, organisations must consider how they manage the impact of data quality on compliance and other efforts. Data quality is the foundation of a successful AML and screening programme. Without it, the potential for false negatives, false positives, and inaccurate risk ratings increases. To avoid and overcome data quality issues, organisations must first consider the data points they select for AML purposes and how well they understand that data set. They must then factor in ongoing data management and how data can best be prepared and checked as part of AML and watchlist screening processes.


[1] https://www.statista.com/statistics/871513/worldwide-data-created/