In the new digital age, data is currency. From software processing and cloud computing to everything beyond, data is essential. However, with the boom of new technology and the emergence of artificial intelligence (AI), the sheer amount of data available has become overwhelming. In the banking sector, companies need to be able to organise, evaluate, and assess this data effectively and efficiently. To do so, the data needs to be accurate and usable – which is not always the case.
To better understand the processes of data usability and the development of generative AI (GenAI) products for the banking sector, Finextra spoke to Adam Kamor, co-founder and head of engineering at software firm Tonic.ai, Richard Harmon, global head of financial services at Red Hat, and Matthieu Hafemeister, co-founder of Concourse.
How does data usability factor into AI and software development?
Kamor explains that AI tools are used across the banking sector: in fraud detection systems, chatbots for customer assistance, loan underwriting processes, and compliance reporting. Data usability shapes the development and implementation of AI, as none of these tools can function without detailed and accurate data. To ensure that AI tools are equipped with secure data, privacy approaches such as data de-identification or data synthesis are essential.
Harmon calls usable data the “backbone” of AI and software development in banking: “Clean, accessible, and well-governed data ensures AI models are not only accurate but also explainable — a critical compliance requirement in financial services. Without
usable data, institutions face increased development times, suboptimal AI performance, and heightened risks of non-compliance.”
Harmon further details that usable data accelerates development, leading to faster deployment of products, and can supply real-time data when needed. Moreover, organised and clean data lets data scientists focus where they are most needed: applying AI models to datasets rather than cleaning them.
Hafemeister states that the biggest challenge firms face regarding data usability is fragmentation. He highlights that siloed data storage makes it difficult for AI models to access and analyse relevant data, which leads to issues of data quality, consistency, and availability.
“To improve data usability, financial institutions should prioritise deploying data integration platforms and data lakes that can consolidate data from various sources into a single, unified repository.”
Harmon continues: “To improve usability, financial institutions can deploy technologies such as data fabric architectures, which provide a unified view across silos, and modern ETL (Extract, Transform, Load) pipelines for real-time data processing. Tools
leveraging AI-driven data cataloguing and governance also play a critical role, enabling teams to discover, standardise, and secure data efficiently. Moreover, implementing synthetic data generation can also mitigate privacy risks while providing teams with
rich datasets for AI training.”
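The synthetic data generation Harmon mentions can be illustrated with a deliberately simple sketch. The snippet below is a toy stand-in for the model-based synthesis tools vendors offer, not any firm's actual method: it fits a Gaussian to real transaction amounts and samples new ones, so the dataset's statistical shape is preserved while no individual record is copied. All names and values are illustrative assumptions.

```python
import random
import statistics

def synthesize_transactions(real_amounts, n, seed=0):
    """Draw n synthetic transaction amounts from a Gaussian fitted to
    the real data -- same mean and spread, no real record reused.
    (A toy illustration of the privacy-preserving synthesis idea.)"""
    rng = random.Random(seed)
    mu = statistics.mean(real_amounts)
    sigma = statistics.stdev(real_amounts)
    # Clamp at a penny so amounts stay valid; round to cents.
    return [round(max(0.01, rng.gauss(mu, sigma)), 2) for _ in range(n)]

# Hypothetical "real" amounts a bank might hold.
real = [120.50, 89.99, 240.00, 15.75, 310.20, 99.10]
fake = synthesize_transactions(real, 1000)
```

Production systems model far richer structure (correlations across columns, categorical fields, referential integrity), but the privacy principle is the same: train and test on data that behaves like the original without being the original.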
He adds that the ‘Data Mesh’ principle decentralises data into domain-specific ownership, “where each team or department treats its data as a product.” This strategy allows for consistency while keeping datasets accessible to engineering teams.
Kamor explains the operating process of retrieval-augmented generation (RAG) systems to underline the essential nature of data quality: “The RAG system is essentially backed by two components: your vector database and your large language model. The vector database stores all of your company's information; the way that data is inserted into this database, and the way it's formatted and kept, really has a strong effect on the quality of answer that is given.”
Hafemeister says that LLMs require massive amounts of high-quality data to perform accurately, and this is where enterprise data can provide domain-specific knowledge. For LLMs to be effective in finance, data has to be specific, detailed, and accurate.
He continues: “99% accuracy of an LLM output is 0% accurate in finance, so making sure the data is cleanly structured is critical to making sure every output an LLM provides is 100% accurate. Enterprise data also enables LLMs to tailor their outputs to the
specific needs of a team, whether that’s analysing financial statements, creating financial reports, or answering detailed questions. Without access to clean, structured enterprise data, the impact of LLMs is severely limited.”
How are RAG and AI tools providing solutions?
Kamor details how RAG tools are used to sort through large amounts of information: “RAG is just a technique to leverage a large language model (LLM) with your enterprise data. A RAG system is sending the LLM just the relevant chunks needed to answer your
question. If you're an enterprise, you have millions of documents floating around. Every time I ask a question, I can't send every document I have to the large language model, right? You can only fit so much in the LLM. You have to send the bits of information
that are relevant to what's being asked, and the RAG system essentially does that for you.”
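The retrieval step Kamor describes can be sketched in a few lines. This is a minimal toy, assuming bag-of-words vectors and cosine similarity in place of the learned embeddings a real vector database would use; the documents and question are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real RAG systems use learned vectors.
    return Counter(text.lower().split())

def cosine(a, b):
    # Counter returns 0 for missing tokens, so this handles disjoint vocab.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, documents, k=2):
    """Return the k chunks most similar to the question -- the 'relevant
    bits' a RAG system sends to the LLM instead of the whole corpus."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Loan underwriting policy: minimum credit score 640.",
    "Cafeteria menu for Friday: soup and sandwiches.",
    "Fraud detection alerts are reviewed within 24 hours.",
]
context = retrieve("What credit score do we require for loans?", docs, k=1)
prompt = "Answer using only this context:\n" + "\n".join(context)
```

Only the retrieved policy line, not all three documents, would be packed into the LLM prompt, which is the whole point: the context window is small, so retrieval decides what the model sees.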
Harmon comments: “A firm’s enterprise data is extremely valuable to maximising the potential of LLMs since these very broad models will be fine-tuned to the specific customer, product and market conditions the firm operates in. It is this firm-specific
aspect that is critical to customise model performance for that institution. In this respect, organisations can also look at using smaller models with domain-specific data that are purpose-built for targeted tasks.”
Enterprises use RAG systems to manage how data reaches LLMs; on their own, LLMs cannot handle data at the scale, or extract relevant information with the precision, that RAG systems can. Kamor highlights that LLMs unlock new experiences for customers and efficiencies for employees, and streamline software processes.
Kamor notes that compliance is a top priority in heavily regulated industries such as financial services. When GenAI is integrated for auditing, external reporting, or automating the laborious process of compiling data and metrics into standardised reports, it must therefore be implemented in a way that satisfies regulatory requirements. While GenAI is a helpful tool for streamlining processes that aid compliance, it is also crucial to achieve compliance within the use of GenAI itself, particularly when it comes to handling sensitive data.
“Banks must ensure that they are maintaining compliance and respecting regulations. For example, banks must ensure that AI systems comply with data privacy laws like GDPR. AI systems must be architected to handle and protect sensitive customer data, ensuring
that no PII is exposed. On the AI model side, banks must ensure that models have been rigorously tested and tuned to balance bias and fairness, proper model governance measures are in place, and AI-driven decisions for things like credit scoring or customer
interaction are explainable and auditable.”
Kamor points out that the main challenges of building RAG systems are data quality issues and privacy concerns. Tonic.ai addresses these issues by detecting sensitive data within unstructured sources and effectively de-identifying or synthesising it for secure use in AI development and implementation.
Kamor adds that data redaction addresses privacy concerns by removing sensitive information from documents before they are fed into the vector database, enabling free-text data to be leveraged without revealing sensitive information. Harmon corroborates this point, highlighting redaction as an effective safeguard for sensitive data.
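The redaction step described above can be sketched as a pre-processing pass over documents before they are indexed. This is a minimal regex-based illustration with made-up patterns; production redaction (as the interviewees describe it) relies on trained entity-detection models rather than hand-written rules, but the principle is the same: scrub PII before the text ever reaches the vector database.

```python
import re

# Illustrative patterns only -- real systems detect many more entity
# types, and do so with NER models rather than regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    """Replace each sensitive span with a labelled placeholder so the
    free text stays useful for retrieval without exposing PII."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Customer jane.doe@example.com disputed a charge on card 4111 1111 1111 1111."
clean = redact(note)
```

Only `clean`, never `note`, would be chunked and inserted into the vector database, so downstream LLM answers can cite the document without leaking the customer's identifiers.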
What’s next for GenAI?
Kamor states: “The future of GenAI in banking is all about deeper personalisation, seamless AI integration, and maintaining a strong ethical foundation. We’re going to see AI systems that not only predict customer needs but also assist executives in making
data-driven strategic decisions. As AI continues to integrate into every channel, from apps to ATMs, the focus will be on creating consistent, personalised experiences while ensuring that AI-driven decisions are fair and unbiased. Of course, all of these use
cases are built upon a solid foundation of data that banks possess about their customers, business, industry, and broader economy.”
Harmon expresses that one key area where GenAI is evolving is agentic AI: autonomous agents that complete tasks without human intervention. He states that this form of GenAI is proactive rather than reactive like other GenAI tools, as it executes actions with minimal or no human supervision.
Hafemeister adds that GenAI and LLMs are revolutionising software to the point where it can accomplish more than ever before. He states that AI agents are producing more accurate, higher-value work, which can positively impact productivity in banks.
He concludes: “Banks that fully embrace this shift will be ahead of the curve, enabling them to scale operations efficiently. The value of software will be measured not just by how much it improves employee productivity, but by how it enhances overall business
performance and agility. As AI agents become integrated into everyday workflows, they will become indispensable partners in driving innovation and achieving competitive advantage.”