Last week, the 58th Annual Meeting of the Association for Computational Linguistics took place online. Most of the discussion was dedicated to AI and linguistics, with round tables on the cognitive and computational building blocks for creating more natural, human-like language in machines. Events like this are a valuable source of insights for every data scientist who wants to keep pace with the latest AI and NLP advancements. One of the gems worth sharing is especially valuable for RegTech in the field of horizon scanning.
The whole industry is putting tremendous effort into building an automated expert system that can read, classify, link, and store every relevant regulation, news item, or update as it is published: a system that is ready to instantly provide and link related information for any query. Most of these developments are proprietary and come with marketing speculation about what they can and cannot do, so we are lucky to have one of them described in a public ACL paper. It highlights recent US military developments, sponsored by DARPA, aimed at building a knowledge extraction system for social and military events described in text and video data. Being involved in R&D for AI in financial services, I can see a great future for the application of these results in RegTech. Successful military developments have always been breakthroughs by nature: by putting significant resources towards innovative research and achieving consistent results, they give us an opportunity to drastically improve everyday life. For a list of the most famous DARPA achievements, have a look here.
In this article, we unbox the military Skynet-like system. The article does not require technical knowledge. It will be helpful for anyone who organizes and manages document processing in an organization and who is interested in seeing what has been proven possible with the current technology stack and what its limitations are. The source article is available here.
A groundbreaking new system
In RegTech, we have a dream of creating an ideal system for horizon scanning that automatically scans all new regulations, links them to older statements, and adds classifications and analytics for different use cases. Even a brief look at this new military system, which monitors the news and tracks social media, reveals that insurers and bankers can use it as a basis to create the same kind of AI-based Skynet for horizon scanning in compliance:
Put simply, the system extracts the relevant facts with their locations and times and creates connections between those entities. You can filter events by type, and when you click on a type, you will see a list of connections and detailed information on all the participants. For example, you can check who was financing an event. You can see who supported it and who opposed it. You can click on a person and trace who they may receive money from. One of the first interesting results of this development became available in 2018 with the ELISA visualization, where events were pinned to their locations on an interactive map.
Building a Skynet
So, we’re coming to the vital question: how do you create this kind of Skynet?
It’s not an easy task.
For military analysts, officials releasing news notes, and, in the long run, politicians, it is crucial to stay on top of world events without having to spend hours analysing them in order to extract the essence. These are people who are involved in making vital decisions on projects. They need to be able to scan the news field quickly and extract the relevant information easily.
The same needs and processes are relevant for compliance. Financial institutions need to stay on top of regulations as they change. They need to be able to anticipate and update their internal documents proactively or change decisions that have already been taken, constantly re-evaluating the risks in real-time. With remote work the ‘new normal’ and changes happening even faster, it’s become even more important for them to understand in which direction they should be advancing technologically. They also need to know exactly what tech architecture they need to apply to stay efficient in this new reality.
Military developments allow us to understand the next evolutionary steps for this type of system. This is in fact a working prototype of a similar kind of AI-based horizon scanning system, involving a number of ML models which together form ‘an AI-based system’:
Technical details are beyond the scope of this article, but please do leave a comment if you have any questions.
This ‘AI-based system’ (let’s call it AI Skynet) is built once all the engineering work has been completed in accordance with the scheme (see the Scheme above). It’s important to understand our output, our input and what the main steps are. The system as a whole is designed for absorbing, categorizing, indexing and creating base analytics for everything that’s happening in the data. This new military development provides excellent insights for compliance and what horizon scanning might look like in the future.
How does it all work?
The AI Skynet architecture can seem a bit complicated at first glance, but it is easy to see two main data flows: visual data processing and textual data processing. Let's discuss the textual one, since it's more closely related to RegTech and horizon scanning. The majority of the blocks represent separate deep-learning models that are already in place, using a generation of neural networks that some may call old. ELMo, LSTM and CRF models were among the top performers until BERT appeared in 2018 and shifted the direction of machine learning model development.
In terms of computational efficiency, this architecture looks well matched to the tasks assigned to it (see the Scheme). It’s analogous to what we’re doing with P2P links, where you upload a regulation and link all the key points in it by applying ML. ClauseMatch uses out-of-the-box relation extraction: it’s general by default, and we then fine-tune separate ML models for clients with specific use cases, if any.
The AI Skynet system will automatically show all of the knowledge elements connected with an event (a military conflict, to be precise), so there’s no need to dive deep into all of them to understand and scan what’s important. And it’s the same with regulations. That means a compliance officer doesn’t need to dig into each and every change that’s been happening. Instead, they take a separate slice and see the whole picture, with each of the key elements automatically highlighted for their attention. Now we can go through the main data-processing steps presented.
The first step is the extraction of the relevant entities. These can be certain regulations, rules, or the name of a regulator, a type of product, mitigation, permission, and so on. A separate ML model facilitates this process.
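To make the first step concrete, here is a minimal sketch of entity extraction. It is an assumption-laden stand-in: the system described above uses a trained ML model, while this example uses a simple dictionary lookup. The `GAZETTEER` contents, the label names and the `extract_entities` helper are all hypothetical and exist only for illustration.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    text: str   # the matched surface form
    label: str  # the entity type
    start: int  # character offset where the match begins
    end: int    # character offset where the match ends


# Hypothetical gazetteer: surface form -> entity type (illustrative only).
GAZETTEER = {
    "Financial Conduct Authority": "REGULATOR",
    "FCA": "REGULATOR",
    "MiFID": "REGULATION",
    "derivatives": "PRODUCT",
}


def extract_entities(text: str) -> list[Entity]:
    """Return every gazetteer match, ordered by position in the text."""
    found = []
    for surface, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            found.append(Entity(match.group(), label, match.start(), match.end()))
    return sorted(found, key=lambda e: e.start)


sample = "The FCA updated its MiFID guidance on derivatives."
entities = extract_entities(sample)
for entity in entities:
    print(entity.label, entity.text)
```

A lookup table only finds names it already knows. A learned model is used in practice precisely because it can recognize entities it has never seen listed.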
The second step is coreference resolution. Again, there’s a separate model involved in this process. It is designed to recognize that, for example, New York and NYC are exactly the same entity. Or, if we refer to the world of regulatory compliance, it could be, for example, the Securities and Exchange Commission and the SEC, or the Financial Conduct Authority and the FCA, or the Markets in Financial Instruments Directive and MiFID. With Textual Relation Extraction as the next step, you then need to understand the connections between different mentions in the text.
All of these connections are then stored as triples in a Knowledge Base, with a separate base for each field. This base will be used to extract certain events in connection with a timeline and location. For regulations, events will be organized by jurisdiction along a timeline.
It’s as simple as three standard steps: objects are extracted, then identified, then linked. Finally, all of this ends up in a Knowledge Base designed to be queried for recommendations.
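The idea behind coreference resolution can be sketched with a small alias table. This is only an illustration under stated assumptions: the real system uses a trained model, whereas the `ALIASES` table, the canonical identifiers and both helper functions below are hypothetical.

```python
# Hypothetical alias table: normalized surface form -> canonical identifier.
ALIASES = {
    "securities and exchange commission": "SEC",
    "sec": "SEC",
    "financial conduct authority": "FCA",
    "fca": "FCA",
    "markets in financial instruments directive": "MiFID",
    "mifid": "MiFID",
    "new york": "NYC",
    "nyc": "NYC",
}


def canonicalize(mention: str) -> str:
    """Map a mention to its canonical identifier, or return it unchanged."""
    key = " ".join(mention.lower().split())  # lowercase, collapse whitespace
    return ALIASES.get(key, mention)


def group_mentions(mentions: list[str]) -> dict[str, list[str]]:
    """Cluster mentions that refer to the same entity."""
    clusters: dict[str, list[str]] = {}
    for mention in mentions:
        clusters.setdefault(canonicalize(mention), []).append(mention)
    return clusters


clusters = group_mentions(
    ["SEC", "Securities and Exchange Commission", "New York", "NYC", "Unknown Body"]
)
print(clusters)
```

Mentions missing from the table stay in their own cluster, which is where a learned model would be expected to do better than a fixed list.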
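A Knowledge Base of triples that can be queried by jurisdiction and timeline might look like the sketch below. Everything in it is an assumption made for illustration: the `Triple` fields, the `KnowledgeBase` class, and the regulator names, rule names and dates, which are invented placeholders rather than real regulatory data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    jurisdiction: str
    effective: date


class KnowledgeBase:
    """In-memory triple store with simple filtering (illustrative only)."""

    def __init__(self) -> None:
        self._triples: list[Triple] = []

    def add(self, triple: Triple) -> None:
        self._triples.append(triple)

    def query(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        jurisdiction: Optional[str] = None,
        since: Optional[date] = None,
    ) -> list[Triple]:
        """Return matching triples, oldest first; None means 'any'."""
        hits = [
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (jurisdiction is None or t.jurisdiction == jurisdiction)
            and (since is None or t.effective >= since)
        ]
        return sorted(hits, key=lambda t: t.effective)


# Invented placeholder data.
kb = KnowledgeBase()
kb.add(Triple("Regulator X", "issued", "Rule B", "UK", date(2021, 3, 1)))
kb.add(Triple("Regulator X", "issued", "Rule A", "UK", date(2019, 6, 1)))
kb.add(Triple("Regulator Y", "issued", "Rule C", "US", date(2020, 9, 1)))

uk_timeline = [t.obj for t in kb.query(jurisdiction="UK")]
recent = [t.obj for t in kb.query(since=date(2020, 1, 1))]
print(uk_timeline, recent)
```

Keeping the jurisdiction and effective date on each triple is one simple way to support the timeline view. A production system would more likely delegate this to a graph database.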
These are the main blocks that "AI Skynet" consists of. I’m going to refer to their analogs in my practice at ClauseMatch in brackets, as we’re working with some of the leading top-tier financial institutions, as well as regulators.
Named Entity Recognition (status: done; new clients may need manual labelling)
Coreference resolution (status: done; the needs of a certain client may require a taxonomy)
Relation extraction (status: done; requires a taxonomy and may need manual linking examples from the client)
Some kind of ground truth or Knowledge Base (it is possible to extract it; a specific use case may need manual labelling)
Task-specific module (status: done; obligation extraction for regulators)
API for search across Skynet results (status: done; an open-source graph DB is used as a back-end)
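The last block, search across the results, can be pictured as a graph traversal: start from one object and collect everything linked to it within a few hops. The sketch below is a plain-Python illustration, not the API of any particular graph database. The `EDGES` data and the `neighbours` and `connected_within` helpers are hypothetical.

```python
from collections import deque

# Invented placeholder triples: (subject, predicate, object).
EDGES = [
    ("Regulator X", "issued", "Regulation A"),
    ("Regulation A", "imposes", "Obligation 1"),
    ("Obligation 1", "applies_to", "Product P"),
    ("Regulator Y", "issued", "Regulation B"),
]


def neighbours(node: str) -> list[str]:
    """Return nodes directly linked to `node`, ignoring edge direction."""
    linked = []
    for subject, _predicate, obj in EDGES:
        if subject == node:
            linked.append(obj)
        elif obj == node:
            linked.append(subject)
    return linked


def connected_within(start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: map each reachable node to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours(node):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[start]  # report only the other nodes
    return distance


one_hop = connected_within("Regulation A", 1)
two_hops = connected_within("Regulation A", 2)
print(one_hop, two_hops)
```

Limiting the number of hops keeps the result to a readable slice of the graph, in the spirit of clicking on one object and seeing only what is directly tied to it.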
So, how can this development for military analytics be relevant to financial services compliance, compliance officers and regulators?
Let’s imagine a financial services regulator building and applying this kind of system. How would they do this? It would actually be an analogous scheme built for different types of ‘objects’: financial institutions, products, activities.
For a financial services company, these will be regulatory obligations aligned with financial activities, products, and so on. New and upcoming obligations will be gathered and indexed in real time. As a result, financial firms will not be flooded with all of their obligations at once, and the system will allow them to step up to a significantly higher level. Using this approach, financial institutions can act proactively, making this Skynet a promising scheme for horizon scanning.
A regulator could also potentially have all of their data in this kind of Skynet. This is also important because each regulator works with other regulators. Using this system would help them to achieve consistency and form a united Knowledge Base which will serve as a ground truth.
We have actually tested this kind of system at ClauseMatch, as we allow regulators to store all of their documents in our ‘Skynet’. Our ML models' quality is above 90%. It’s vital to have a platform where you can place all of your data and have structured storage for all of the different objects. To make it work, your experts need to discuss their main goals and tag specific objects in the text. We then tag and link everything, and the outcome is a knowledge graph. When a new regulation arrives, the system automatically parses the obligations out of it and adds them to the knowledge graph. ClauseMatch is in fact an advanced option for a system like this (we’re already using a more advanced architecture than this system for the tasks it is able to take on).
To sum up, this development by the Information Sciences Institute, Columbia University and US Army Research Laboratory can be seen as a kind of blueprint, showing you all the necessary steps as you build this type of Skynet for your compliance team. Crucially, adopting this system will allow you to be proactive because it enables you to stay on top of both existing regulations and upcoming changes.
Hit the like button and let me know that the material is interesting and I will prepare some more.
This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.