In the battle between global criminal gangs and the world of international espionage and intrigue, you are likely to imagine a hero, perhaps James Bond, at the bar of a five-star hotel ordering his signature martini, travelling to far-flung and exotic locations to gather critical information and conduct his vital intelligence work.
You are perhaps far less likely to imagine our hero, dinner jacket and all, seated behind a desk, poring over stacks of newspapers in all the languages of the world, extracting clues and intelligence about the threats the world faces.
Yet this is precisely how a significant portion of intelligence work was conducted throughout the 20th century, and it has continued to evolve into large-scale data analytics well into the 21st. In the run-up to World War II, the British government set up the Foreign Research and Press Service (FRPS) to review and understand open-source news published across occupied territories (received via neutral countries), revealing a remarkably coherent picture of intelligence vital to the war effort. In 1943 it was formalised into FORD – the Foreign Office Research Department – a full-blown intelligence agency that sifted the foreign press, concentrating on political and administrative subjects, for intelligence – past and current – from which to draw inferences for Churchill and his policymakers to help the war effort.
Meanwhile, over in the United States, the enigmatically named Office of Strategic Services (the prototype for what was to become the Central Intelligence Agency, or CIA) was also combing through foreign newspapers and radio broadcasts for clues that could aid the Allies’ understanding. Despite restrictions on the press, especially in occupied countries, there was still immense value to be found if one looked hard enough. Speaking on this, William Donovan, the director of the OSS and founding father of the CIA, wrote shortly after the war that “even a well regimented press will again and again betray the national interest to a painstaking observer”. He cited success stories such as a society column in a local German newspaper that inadvertently revealed the location of an army division the Allies had been seeking, and another report that confirmed the existence of German submarine oil tankers, complete with a photograph of a tanker refuelling a U-boat at sea.
Over 75 years later, the value of open-source intelligence (OSINT) is now well understood within the financial sector too. Breaking news can indicate the impending collapse of a stock, or presage the merger or acquisition of successful (and some not so successful) companies. Crucially, it may also provide early warning indicators: allegations of bribery and corruption, or arrests and trials connected to the predicate offences that generate volumes of illicit wealth – embezzlement, narcotics, human trafficking and fraud.
Thus, financial regulators and the anti-money laundering community share a strong consensus that exploiting open-source news data is a critical step in assessing risk. Unlike the spies of the 1940s, however, searching millions of news articles has become as simple as typing a client’s name into a search box for so-called ‘adverse media’ or ‘negative news’. This largely well-understood process allows banks to add an additional step to due diligence and reduce the risk of onboarding a potentially criminal entity.
The Signal in the Noise – Challenges of Adverse Media Screening
When providing financial services to a person or organisation, the question may initially sound straightforward: have any media articles been published which indicate that the client is involved in criminal activity or otherwise presents a risk to operations?
In practice, complexity emerges: from the volume of media reports available, the presence of poor-quality and irrelevant data, the difficulty of resolving unique entities, and the need to preserve privacy and accountability.
We see five key areas where compliance teams struggle to make sense of the growing deluge of data:
The growing scale of data – Ripjar’s access to news data from a variety of different providers shows the scale of the challenge. More than a billion news articles have been published since 1990, and in 2021 the total is growing at a rate of 3-4 million per day across thousands of publications in dozens of languages. Only a small fraction (less than 1%) of these articles will be relevant to the interests of anti-money laundering professionals: stories that relate to bribery, corruption, embezzlement, fraud and other risk topics.
Making it relevant – sifting through this mountain of data, even with modern search engines, is fraught with ambiguity. Client names are often non-unique, and search engines – optimised not for intelligence work but for commercial interests – typically favour popularity and recency over depth and historical completeness. Thus, the challenge of entity resolution (see our blog on that topic here) becomes critical to reduce the likelihood of false positives, where too much irrelevant data is returned, and false negatives – a more dangerous case where relevant data on a risk or threat is missed altogether.
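As a highly simplified illustration of why entity resolution is hard, a naive matcher might normalise names and compare them with a string-similarity score. This sketch is purely illustrative, not Ripjar’s actual matching logic; production systems combine many more signals, such as dates of birth, locations and known associates:

```python
from difflib import SequenceMatcher
import unicodedata


def normalise(name):
    """Lower-case, strip accents and collapse whitespace so that
    superficial variations do not defeat the comparison."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    return " ".join(name.lower().split())


def name_similarity(a, b):
    """Return a similarity score between 0 and 1 for two names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_candidates(client, mentions, threshold=0.85):
    """Keep only the mentions that plausibly refer to the client."""
    return [m for m in mentions if name_similarity(client, m) >= threshold]
```

Note how the threshold embodies the trade-off described above: lower it and false positives flood in; raise it and a risk-bearing article may be missed altogether.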
Keeping on top of the news – if only conducted at onboarding, negative news screening – even if done accurately – presents only a snapshot in time. With millions of new articles published daily, keeping up usually requires periodic review and remediation exercises, but with resources stretched, clients may go many years before another check is made to see whether any adverse articles have been published.
Post-Truth and Fake News – while the digital age has given society unparalleled access to information about the world in which we live, it has increasingly polarised that information to such an extent that news may present a doctored or biased version of the truth, or outright fabrications. In the last five years, trust in media has been eroded, with one 2020 study showing that over a third of UK adults trust the news less than in previous years.
Security and automation – concerns around data integrity and privacy are paramount. Client details may include some of the most sensitive data an institution holds. Yet online databases of news articles and search engines are hosted in cloud environments with opaque data-retention and security models. Typing or uploading a client’s name into an online search box therefore transmits data to a third party, often outside the country, with no audit log of the search ever having been conducted by the bank. Establishing a rigorous governance model around such searches can be exceptionally difficult and labour-intensive.
The Future of Adverse News Screening
While it was the secret intelligence deciphered at Bletchley Park by mathematicians like Alan Turing and early computer scientists Bill Tutte and Tommy Flowers that went on to inspire numerous books and movies, the open-source intelligence analysts contributed hugely to the overall picture. Bletchley Park codebreaker and official post-war historian of British intelligence Sir Harry Hinsley noted that ‘of the total number of reports (by the Enemy Branch of the Ministry of Economic Warfare) some three-fifths were based upon the Press, broadcasts and official statements’.
That being said, the cryptographic successes at Bletchley Park could not have been achieved without the technological breakthroughs that accompanied the mathematical ones. Automation of codebreaking using some of the world’s first computers was essential to break down the large number of possibilities into something a human could attempt to assimilate.
From the vantage point of the 2020s, the intelligence analysts of the UK’s FORD and the USA’s OSS would scarcely believe that such automation could now be applied not to the mathematical structures of the Enigma and Lorenz ciphers, but to human language itself: written text broadcast across a global internet that would not be conceived until decades later.
This type of automation – artificial intelligence (AI) – has been core to the approach we have taken at Ripjar to understand the millions of news articles that enter our database every day. We apply Natural Language Processing (NLP) to read the news as a human would, except tens of thousands of times a second. In doing so, we’ve created a brand-new approach to finding criminal and terrorist activity in news data and reporting, one that is more efficient and effective than legacy technologies.
NLP turns the torrent of data from global news outlets into a focused beam of articles that relate only to risks such as fraud, corruption or human trafficking, even when they are originally written in Chinese, Russian or Polish. This removes huge amounts of noise from the data: the sports reports, movie reviews, horoscopes and other miscellany that fill column inches but usually clutter searches with false positives and useless distractions, hiding valuable insights.
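At its very simplest, this risk-topic filtering can be sketched as a keyword lookup. The sketch below is purely illustrative: the topic lists are invented for this example, and a production system like Ripjar’s uses trained multilingual NLP models rather than keyword matching:

```python
import re

# Invented, illustrative risk taxonomy: topic -> indicative terms.
RISK_TOPICS = {
    "fraud": {"fraud", "ponzi", "embezzlement"},
    "corruption": {"bribery", "corruption", "kickback"},
    "human_trafficking": {"trafficking", "smuggling"},
}


def classify_risk(article):
    """Return the set of risk topics whose indicator terms appear in
    the article text."""
    tokens = set(re.findall(r"[a-z]+", article.lower()))
    return {topic for topic, terms in RISK_TOPICS.items() if tokens & terms}


def filter_risk_articles(articles):
    """Drop the sports reports, horoscopes and other noise: keep only
    articles that hit at least one risk topic."""
    return [a for a in articles if classify_risk(a)]
```

A trained classifier generalises far beyond a fixed word list, catching paraphrases and other languages, but the shape of the pipeline is the same: every article is scored against a risk taxonomy before any name matching takes place.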
Another type of AI – Named Entity Recognition (NER) – then extracts and matches the names of people, organisations and locations in order to detect and alert when a client name appears in the news. Rather than wait for an analyst to type a name into a search box, autonomous systems like Ripjar’s pre-emptively find information of value and bring it to the attention of a compliance or intelligence analyst.
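The idea can be sketched as follows, using a naive capitalisation heuristic purely as a stand-in for the trained statistical NER models that real systems use:

```python
import re


def extract_entity_names(text):
    """Toy entity extraction: treat runs of two or more capitalised
    words as candidate names. Real NER uses trained models and also
    distinguishes people from organisations and locations."""
    return set(re.findall(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b", text))


def screen_article(article, client_names):
    """Alert on any monitored client name appearing in the article."""
    return extract_entity_names(article) & set(client_names)
```

Running every incoming article through this kind of check against the full client book is what turns screening from an on-demand search into a continuous, pre-emptive monitoring process.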
Lastly, the future of screening requires strict adherence to policy and control frameworks: how data was used, in which locations, and how it was accessed and retained. Data governance, particularly when artificial intelligence is helping to sift through data more effectively and efficiently, helps regulators and decision-makers understand why decisions were made, at what time and on what evidence.
If you’d like to find out more about our vision for the future of adverse media screening you can download the whitepaper here, or get in touch with us for a discussion about our unique technology here.