Entity Resolution: An Essential Tool for Fighting Crime

David Balson
Director of Intelligence

What is Entity Resolution and why is it vital in the fight against financial crime and terrorism?

It’s not John, it’s James. In the US alone, it is estimated there are over 30,000 people who share the same name, James Smith. In Korea, almost 20% of the population – some 10 million people – share the same family name of Kim. The world is also home to over 150 million with the same given name – Mohamed. Cases of mistaken identity are common, particularly when searching over large volumes of data, but they needn’t be.

Centuries of tradition and culture have given us an eclectic mix of ways that we refer to one another. Names can reflect our familial ties, which generation we were born into, who our ancestors were, our clan, or may even indicate a union of two families. They are a part, not just of our heritage, but of our identity. This rich diversity, however, was not intended for the information age, where electronic records, transactions and communications often require a global, unique and unambiguous identity to be resolved. IP addresses may work well to uniquely identify devices on the global internet, but humans still require something more… human.

The stakes are high. Almost all investigatory work, whether in law enforcement, counter-terrorism or within the anti-money laundering (AML) and due diligence processes of a bank, requires accurate ways of searching for and discovering specific entities in large data sets. However, poor record keeping, missing or incomplete data and legacy matching logic hamper these efforts. False positive matches – selecting the wrong entity – and, worse, false negatives (where a critical search result is missed altogether) are abundant.

Entity resolution?

When searching large datasets for names or organisations, ‘entity resolution’ refers to data analytics that aim to uniquely resolve data – often across many different sources – to a real-world entity.

An example makes the benefits clear. Our collection of James Smiths could be resolved by utilising other details in the data. Email addresses, dates of birth and postcodes are common attributes that help systems disambiguate or join the dots between multiple records about the same person, such that results are returned for the specific individual and not their namesake. Companies suffer from the same ambiguity too – nearly three thousand companies are registered with UK Companies House starting with the word “Sigma”, but using amplifying information such as address, phone number, company registration date, or any other feature of the record helps entity resolution technology narrow down the data to ensure that decisions have only the desired effect and not an unintended one.
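As a rough illustration of how supporting attributes can separate namesakes, the sketch below scores two records that share a name by counting the other details they agree on. The field names, weights and threshold are hypothetical choices made for this example, and are not taken from any particular product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    name: str
    date_of_birth: Optional[str] = None
    email: Optional[str] = None
    postcode: Optional[str] = None


# Hypothetical integer weights: how strongly agreement on each attribute
# suggests that two records describe the same person.
WEIGHTS = {"date_of_birth": 4, "email": 5, "postcode": 2}


def match_score(a: Record, b: Record) -> int:
    """Sum the weights of attributes present on both records that agree."""
    if a.name.casefold() != b.name.casefold():
        return 0
    score = 0
    for field_name, weight in WEIGHTS.items():
        left, right = getattr(a, field_name), getattr(b, field_name)
        if left is not None and left == right:
            score += weight
    return score


def same_entity(a: Record, b: Record, threshold: int = 5) -> bool:
    """Treat two records as one entity only when enough evidence agrees."""
    return match_score(a, b) >= threshold
```

Under these assumed weights, two James Smith records that agree on date of birth and postcode are joined, while two that share nothing but the name are kept apart.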

Most importantly to regulators, the global programme of international sanctions enforced by the US, EU, UK and almost every other country relies on high quality entity resolution. When GRU officer Yuriy Sergeyevich Andrienko was charged in connection with worldwide crimes in cyberspace, his name was added to many international watchlists. However, this name may be rendered not only in the Latin script above, but in its native form in the Cyrillic alphabet – Юрий Сергеевич Андриенко. It may be abbreviated, or re-ordered, or simply misspelled. So to ensure that the sanction is effectively implemented, that records are not missed and that other people with similar names are not inadvertently punished, large amounts of analysts’ time are spent ensuring that poor quality alerts are fully assessed.

Key Challenges for Entity Resolution

Entity resolution can be a powerful enabling technology that can underpin anti-money laundering and counter-terrorism programmes. In its most rudimentary form it has existed for many years with deep limitations. However, new technology such as artificial intelligence means it is an area that is rapidly evolving. We see five key challenges for data scientists to overcome to create more efficient and effective systems for countering money laundering and terrorism:

Joining automatically between structured and unstructured data – The power of entity resolution is limited when it is only able to process data from structured records such as client records, watchlists, spreadsheets and other data formatted for machines. However, perhaps more than 90% of the world’s data is unstructured, meaning vital insights may be missed. When searching for “James Smith”, modern entity resolution technology needs to ensure that data sources such as news articles, websites and other notes are also included, linking names as they appear in adverse news (articles about corruption, bribery, fraud, terrorism or any other predicate offence), for instance, with names as they appear on watchlists. Natural Language Processing (NLP) is a field of computing that allows the automated analysis of large amounts of text content. It increasingly makes use of machine learning to allow computers to understand the intricate patterns and subtle semantics of human language by learning from the seemingly limitless quantities of text found on the internet. NLP can make sense of unstructured data and extract entities across multiple languages and dialects, which is essential in order to identify and link records wherever they may appear.

Matching names in new ways – Not only are names not globally unique, there is also no standard way of rendering them. Thus, James Smith can be Jim Smith, J Smith, J M Smith, as well as a huge array of possible typos, transpositions, aliases, or renderings in different dialects, alphabets and scripts. Matching against “exact hit” names works when data quality is very high, but it means there are no alerts at all if names have even the slightest variation, increasing the chances of criminals slipping through the net. Similarly, so-called “fuzzy matching”, which will alert if one or two characters are different, still cannot account for the sheer variety and array of cultural nuances in how names are rendered in different types of data. The solution is to use data to drive a new type of matching logic. Technology such as that developed by Ripjar uses observations from millions of names, deriving matching logic from how the name is used in real-world situations.
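The difference between the three approaches can be sketched in a few lines. The example below compares an exact match, a “fuzzy” match that tolerates up to two character edits, and a lookup of known name variants. The variant table here is a tiny hand-written stand-in, assumed purely for illustration; it is not Ripjar's matching logic, which the article describes as derived from observations of millions of names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def exact_match(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()


def fuzzy_match(a: str, b: str, max_edits: int = 2) -> bool:
    return edit_distance(a.casefold(), b.casefold()) <= max_edits


# Hypothetical variant table: canonical name part -> known renderings.
NAME_VARIANTS = {
    "james": {"jim", "jimmy"},
    "yuriy": {"yuri", "юрий"},
    "andrienko": {"андриенко"},
}
CANONICAL = {variant: canonical
             for canonical, variants in NAME_VARIANTS.items()
             for variant in variants}


def canonical_tokens(name: str) -> list:
    """Replace each name part with its canonical form when one is known."""
    return [CANONICAL.get(token, token) for token in name.casefold().split()]


def variant_match(a: str, b: str) -> bool:
    return canonical_tokens(a) == canonical_tokens(b)
```

In this sketch, a transposition such as “Jmaes Smith” is caught by the fuzzy match, but “Jim Smith” and the Cyrillic rendering of a name are only linked to their counterparts by the variant lookup.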

Relationship Linking – No person is an island, and the relationships that an entity has with others give important context to analysts and investigators. Entities may relate to one another in a familial sense (father, brother, mother), or in the context of a business (owner, shareholder, person with significant control), or through their location or address. Identifying these relationships vastly increases the likelihood that the person being searched for is correctly selected by the system, but many legacy systems do not extract relationships from the variety of data needed to give a complete and accurate picture – especially unstructured data. Extracting relationships at scale allows vast “Knowledge Graphs” to be built which can dramatically improve decision making and entity resolution, providing a way of quickly analysing many different questions from a single joined-up picture of entities and how they relate to one another.

Security and Privacy – The power of entity resolution means it must be governed appropriately. Processing personal data and connecting records effectively means safeguarding the privacy and security of those customers who place their trust in the institutions that administer financial systems or government agencies. Entity resolution systems therefore must also become tightly integrated with wider audit and data governance strategies – if entity records from two distinct datasets or systems become linked through smart logic, then the resultant resolved entity must inherit the security regime of each dataset that contributed. This means policies at the national or international level can be adhered to at all times, without compromising the effectiveness of the data analytics. 
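One possible reading of this inheritance rule is sketched below: a resolved entity requires the combined permissions of every dataset that contributed a record to it, so a user cleared for only one source cannot see the merged view. The class names and permission labels are hypothetical, and real governance models are likely to be richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    record_id: str
    dataset: str
    required_permissions: frozenset  # permissions needed to read this dataset


@dataclass
class ResolvedEntity:
    sources: list  # the SourceRecord objects linked into this entity

    @property
    def required_permissions(self) -> frozenset:
        """Union of the permissions demanded by every contributing dataset."""
        return frozenset().union(*(s.required_permissions for s in self.sources))


def can_view(user_permissions, entity: ResolvedEntity) -> bool:
    """A user may view the entity only if they hold every inherited permission."""
    return entity.required_permissions <= frozenset(user_permissions)
```

A stricter or looser policy, for example showing each user only the contributing records they are cleared for, could be built on the same data structure.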

Evolving Understanding of Identity – Real data is not just messy and incomplete, but it also evolves over time with new facts being added, or incorrect facts removed. Sometimes the addition or removal of a strong identifying fact, for example a Social Security Number or a Passport Number, can cause a new match to be made or, indeed, require a previous match to be undone. To do this, entity resolution processes must store the history of matches and merges such that they can be undone in the light of new evidence that shows a previous assumption to be incorrect. Reconsidering the best possible match on seeing a new or updated piece of data also allows the system to provide the same results regardless of the order in which data is played into it. It is crucial that an entity resolution system is able to evolve to accommodate a changing landscape and correctly handle the uncertainty in the decisions it makes.
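A minimal sketch of this idea, under simplifying assumptions: every change is appended to a history log, and the groupings are recomputed from the latest facts rather than merged permanently, so a match can be made or undone when a strong identifier arrives or is withdrawn. The `Resolver` class and its deliberately conservative grouping rule (passport number if present, otherwise name plus date of birth) are hypothetical.

```python
from collections import defaultdict


class Resolver:
    def __init__(self):
        self.history = []   # append-only log of (record_id, facts) updates
        self.records = {}   # latest known facts for each record id

    def upsert(self, record_id, **facts):
        """Add or update facts; a fact set to None is treated as removed."""
        self.history.append((record_id, dict(facts)))
        merged = {**self.records.get(record_id, {}), **facts}
        self.records[record_id] = {k: v for k, v in merged.items() if v is not None}

    def clusters(self):
        """Recompute the groupings from current facts, not from past merges."""
        groups = defaultdict(set)
        for record_id, facts in self.records.items():
            if "passport" in facts:
                # A strong identifier decides the grouping on its own.
                key = ("passport", facts["passport"])
            else:
                key = ("name_dob", facts.get("name"), facts.get("dob"))
            groups[key].add(record_id)
        return {frozenset(ids) for ids in groups.values()}
```

Because the clusters are derived from the current facts alone, loading the same records in a different order produces the same result, and the history log keeps a trail of how each record reached its present state.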

Conclusion

Entity Resolution is an essential capability in the fight against financial crime, fraud and terrorism. By improving the quality of the data that is used to make decisions such as enforcing international sanctions or alerting to possible corruption or fraud, it can dramatically improve the effectiveness and efficiency of human analysts and allow small teams to scale investigations to the demands of the modern information environment. 

Combining recent work in entity resolution and NLP means that analysts can now see the complete picture across structured and unstructured data, and data-driven approaches to name matching covering transliterations, scripts and other real-world name variants can give 90% more accuracy than legacy “fuzzy matching” technology. Robust data privacy controls mean interconnected graphs of knowledge, resolving entities from all available data sources, can now be built without compromising user privacy or data protection.

If you would like to know more about Ripjar’s approach and how we have helped global institutions roll out breakthrough innovations in entity resolution to support their counter-financial crime programmes, please download the whitepaper or get in touch with the team.

How AI is turning the tide in the battle against modern slavery

There is nothing modern about slavery. For as long as there has been a distinction between the powerful and the powerless, people have sought to take advantage of human labour, setting up the systematic exploitation of entire ethnic groups and vulnerable people. Despite hundreds of years of formalised abolition all over the world, slavery persists. It has adapted and evolved to survive – if not thrive – in the modern day. Today, organised criminal networks make between $50bn and $150bn a year in profit from the indentured labour of as many as 50 million victims worldwide.

This crime is closer than you think. In the UK, some estimates put the number of victims at 100,000 or more. These victims, trafficked into wealthier countries from overseas, often find themselves deep in our daily supply chain – in our factories and farms, or producing luxury items like flowers or fashion. As the current healthcare crisis evolves into an economic crisis, criminals are already seeking to take advantage and find new victims.

To profit from this flagrant abuse of basic human rights, criminal gangs use an array of psychological, financial, and physical techniques to maintain tight control over those in their employ. Because the gangs often prey on vulnerable groups, including homeless or substance-dependent individuals, victims are not always even aware of their own victimhood – they see their captors as simply helping them find work and shelter.

Trafficking individuals from overseas traps victims further, through financial debt to the gangs or through a language barrier that leaves them unable to communicate effectively with those around them at work or with the police. Finally, an ever-present threat of physical violence against victims and their families is used to ensure compliance.

Modern Slavery: a board-level issue

This all makes detecting and disrupting this type of crime extremely difficult. Cases reported to the Police (5,144 in 2019) and the National Referral Mechanism (6,985 in 2018) are rising, but perhaps more than 90% of this type of crime goes undetected.

This is not just a matter for the police. Businesses, financial institutions, and government bodies must all work together to spot the red flags that might hint at an underlying concern of exploitation. Legislation such as the Modern Slavery Act (2015) and the EU’s upcoming 6th Anti-Money Laundering Directive has made this crime a board-level issue, but questions remain about implementation, and about who is ultimately responsible for its detection. Complex supply chains must be understood better, and organisations that inadvertently enable exploitation must all do their part if we are to hope to eradicate this type of crime for good.

Bold leadership, social policies and control frameworks will all be required to catch criminals and stop victims falling into the trap of modern slavery. A key enabler is being able to see the entire picture of a supply chain from all available data sources, but legacy technology and institutional stovepipes persist. The evidential trail of modern slavery, the data that could allow an elaborate international network to be completely unravelled, often sits across many organisational boundaries. Enterprise analytics built to detect large scale money laundering and international sanctions evasion may not alert on the subtle, low-value payments made to a dozen migrant workers all sharing the same address. Fortunately, technology can now provide some vital support to companies, banks and governments in these areas.

The vital role of AI – 4 key areas

Artificial Intelligence is a breakthrough technology to help respond to the growing criminal threat. Advanced data analytics are now helping organisations automatically detect risk in their supply chain and customer base. They can scale an organisation’s understanding of available data, joining the dots automatically to detect and prevent human trafficking and modern slavery. We are now seeing a step change in how entity resolution and natural language processing (NLP) are helping partners across the entire supply chain ecosystem make significant leaps forward in the detection of this pernicious and abhorrent crime.

We are seeing four key areas where our technology is now being deployed on the front lines of the fight against modern slavery:

Enhancing Due Diligence – Modern slavery relies on significant deception in acquiring legitimate enabling assets such as bank accounts, national insurance numbers and tax details. These allow money to be deposited, extracted and laundered, and having these legitimate identifiers avoids scrutiny by law enforcement and employers. Crucially, it enables criminals to place victims in well-paying jobs in the supply chain. Victims, who may not even speak the local language, often lack the basic details that banks collect at onboarding, such as proof of address, phone number, email and other infrastructure – details which criminal gangs are happy to provide on their behalf, opening accounts that the gangs themselves control. Using analytics that can spot hidden connections between otherwise seemingly disconnected individuals means that next generation KYC checks can more reliably flag the signs of deception and exploitation and escalate to law enforcement if necessary.

Employment Vetting – Placing vulnerable workers within legitimate employment is a key step in the modern slavery crime; the perception that victims are mostly paid cash or off-the-books is largely false. With legitimate assets and tax codes, victims can unknowingly earn tens of thousands of pounds a year while only receiving a few pounds per week on top of their food and shelter. Employment agencies and supply chain partners such as factories and warehouses are now employing their own due diligence based on the details provided by workers. Entity resolution – AI that can uniquely identify individuals from ambiguous and sparse datasets – can detect the tell-tale red flags of exploitation such as unusual numbers of employees sharing the same address or bank details.
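The shared-details red flag can be illustrated with a short sketch. It groups worker records by a lightly normalised attribute, such as address or bank account, and reports any value shared by an unusual number of people. The record layout, the normalisation and the threshold are assumptions made for this example; a production system would resolve addresses and identities far more carefully.

```python
from collections import defaultdict


def normalise(value: str) -> str:
    """Lower-case, drop commas and collapse whitespace so near-identical
    renderings of the same value compare equal."""
    return " ".join(value.casefold().replace(",", " ").split())


def shared_attribute_flags(employees, attribute, min_people=3):
    """Return attribute values shared by at least min_people workers."""
    groups = defaultdict(set)
    for employee in employees:
        value = employee.get(attribute)
        if value:
            groups[normalise(value)].add(employee["name"])
    return {value: sorted(names)
            for value, names in groups.items()
            if len(names) >= min_people}
```

Calling `shared_attribute_flags(workers, "address")` on a list of worker dictionaries would return each address shared by three or more people, which an analyst could then review alongside other risk factors.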

Follow the money – The desire for wealth drives criminal behaviour. Money paid to victims needs to be intercepted by the gangs, extracted and then laundered so they can spend it on lavish lifestyles of cars, mansions and luxury goods. Transaction analysis within banks designed to catch money laundering often misses the small flows of money taken from bank accounts in the victim’s name, often just with repeat visits to a local ATM. Banks provide the infrastructure on which modern slavery thrives. Behavioural analytics are now able to look across the network of accounts and their activities, combining contextual risk factors with transaction data to more easily spot these crime typologies and flag suspicious activity to law enforcement.

Complex Investigations – Police forces and other law enforcement agencies face an uphill struggle in piecing together data from a multitude of sources to identify suspects, victims and the infrastructure used in the trafficking and exploitation of victims. Data fusion technology, driven by platforms like Ripjar, is now allowing resource-constrained teams of intelligence analysts to more easily exploit data from any source – whether structured or unstructured. Natural language processing (NLP) and entity resolution combined with flexible link-analysis software mean investigators are able to build up a single, centralised knowledge graph for a case or network of criminal gangs – connecting the dots automatically between victims, suspects, phone numbers, bank accounts, transactions, flight records or any other evidence collected during an investigation.

Conclusion

The application of AI is a key development in the fight against modern slavery. It can automatically identify risks at any point in the client or employee lifecycle and help the entire ecosystem of employers, agencies, financial institutions and police forces understand the tell-tale signs of human trafficking and exploitation. Entity resolution, automatic prioritisation, natural language processing and data fusion all play a role in ensuring that relevant data is not missed and the links that criminals go to great lengths to hide are much more rapidly uncovered by compliance, risk and law enforcement analysts. Using a single platform, such as Ripjar, means these breakthroughs can all be harnessed while retaining full audit and accountability – bringing about a step-change in the way that modern slavery is detected and prevented.