"Data is the new gold" is a quote often heard, especially in the financial services industry. This sector deals with enormous amounts of extremely valuable data about their customers and handles purely virtual products or services. Thus, data is an essential part of the financial services business.
While the value of data is undeniable, data, unlike gold, is not a scarce commodity. Enormous amounts of data are produced daily: the world created around 118 million petabytes of data in 2023 and is expected to create 463 million petabytes per day by 2025 (for reference, a single petabyte is one million gigabytes). Although storing and processing these vast amounts of data is complex, the true challenge lies in converting this raw (almost infinite) data into actionable insights. This is where "Big Data Analytics" comes into the picture.
Data analytics is no longer a nice-to-have for businesses. With financial services companies becoming primarily digital, data-driven decision-making is becoming the norm, helping executives make evidence-based decisions instead of relying on guesswork. However, this requires complex data pipelines and analytics tools to bridge the gap between raw data and valuable insights. Such pipelines form complex journeys for data:
Capturing and Collecting Raw Data: Data needs to be captured. In many cases, the data is already captured, but sometimes additional capturing and storage is needed: for example, storing not only the current version of the data but also all historical changes, and/or capturing meta-information such as who made which change, when, and how (via which channel, screen, IP address, OS, etc.). Usually, data is captured by different systems (silos) in various formats and technologies. Centralizing all data in a data lake can be extremely useful for the subsequent steps.
Inventorying the Data: If you do not know data exists, you cannot use it to generate insights. Proper inventorying of the data is therefore required. Creating data catalogs is a crucial aspect of data governance and management: think of a data catalog as a well-organized inventory of data assets (a small sketch of such a catalog entry follows after this list). However, manual cataloging is error-prone and time-consuming. GenAI helps overcome this shortcoming with AI-driven data curation and cataloging: it recognizes correlations and relationships between data sets and automatically categorizes and tags them. GenAI-powered data catalogs can offer self-service capabilities with chatbot-style interfaces, facilitating seamless data discovery. Automated cataloging also helps maintain data consistency and integrity, which are crucial for data management.
Cleaning and Structuring the Data:
Converting Unstructured Data into Structured Data: A lot of valuable data is unstructured (e.g. text documents, pictures, movies, voice recordings), containing a wealth of interesting information. Structuring this unstructured data into some kind of model is critical for data analytics. Techniques such as OCR, automatic tagging, pattern recognition, voice transcription… can help achieve this.
Filtering Out Irrelevant Data: Ensuring only pertinent data is processed. This can include techniques like selecting the right time frame or the right customers, filtering out duplicate data, removing noise…
Cleaning Data: Identifying errors and anomalies in the data and correcting them. This can be done via techniques like automatic correction algorithms, manual actions, comparing different sources and automatically taking the majority vote…
Data Modelling: Even structured data is not always consistent. For example, different payment messages at a bank may be structured differently (e.g. some in SWIFT MT format, others in ISO 20022 or proprietary formats). Mapping to a common model is therefore required to ensure consistent semantics and syntax for all instances of the same object type (e.g. the same date/time format, including the expression of the time zone, the same number format, a consistent set of enumerated values, the same terminology…). This gives a uniform view of similar data from different sources, addressing consistency and quality issues (a small sketch of such a mapping follows after this list).
Data Augmentation: Once data is structured and modeled, new data (or specific views) can be derived from existing data sets, allowing for easier and more efficient data use.
Analyzing the Data: Once we have a uniform, structured view of all the data, we need to analyze it by slicing and dicing it over multiple dimensions (like time, value, customer segment…) and visually presenting aggregated information in dashboards (see the aggregation sketch after this list). This process can be time-consuming, as generating these dashboards often requires setting up complex database queries. GenAI can extract meaningful insights from data using the right text-based prompts, identifying correlations and hidden patterns within the available information.
Interpreting Results: From the dashboards, correct conclusions need to be drawn. This requires solid business insight and a good knowledge of statistics, as some conclusions might seem obvious but may not be statistically significant. This is a common error within companies, as statistical measures such as the variance of results are rarely shown in management dashboards. Insights derived from the results finally need to lead to concrete actions.
Implementing Insights: The defined actions should be implemented and followed up. Ideally, analyses should be regularly re-executed to see the effects of the actions. In business, it is not possible to fully isolate a specific action from everything else happening (market evolutions, competitors' changes, other internal changes, employee transfers…). Using techniques like A/B testing (see the sketch at the end of this list), we should try to isolate the applied action's effects as much as possible, allowing us to identify the positive or negative impact and make quick adaptations if needed.
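To make the inventory step a bit more concrete, here is a minimal Python sketch of what a single data-catalog entry could hold. All names (data set, location, owner, tags) are invented for illustration and do not refer to any real catalog product or schema.

```python
# A minimal sketch, with invented names, of what a data-catalog entry from the
# inventory step could hold: where the data set lives, its schema, ownership,
# and the tags an AI-driven curation step might attach.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    location: str                 # e.g. a path or table in the data lake
    owner: str
    schema: dict                  # column name -> type
    tags: list = field(default_factory=list)

catalog = [
    CatalogEntry(
        name="payments_raw",
        location="s3://datalake/payments/raw/",     # illustrative location
        owner="payments-platform-team",
        schema={"reference": "string", "amount": "decimal", "value_date": "timestamp"},
        tags=["payments", "pii", "daily-load"],     # tags a GenAI curator might suggest
    ),
]

# Simple self-service discovery: find every data set tagged as containing PII.
print([e.name for e in catalog if "pii" in e.tags])
```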
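The cleaning, filtering and modelling steps can be illustrated with a small Python sketch. The two source layouts below are invented (the field names are illustrative, not actual SWIFT MT or ISO 20022 tags): each record is mapped onto one canonical payment model with a uniform date, amount and party representation, and duplicates are dropped on the business reference.

```python
# A minimal sketch of mapping payment records from two hypothetical source
# formats onto one canonical model, as described in the "Data Modelling" step.
# The field names below are illustrative, not real SWIFT MT / ISO 20022 tags.
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

@dataclass(frozen=True)
class CanonicalPayment:
    reference: str
    amount: Decimal          # always in transaction currency
    currency: str            # ISO 4217 code
    value_date: datetime     # timezone-aware, normalized to UTC
    payer: str
    payee: str

def from_legacy_mt(rec: dict) -> CanonicalPayment:
    """Map a record from a legacy, MT-style silo (illustrative layout)."""
    return CanonicalPayment(
        reference=rec["ref"],
        amount=Decimal(rec["amt"].replace(",", ".")),   # "1250,50" -> 1250.50
        currency=rec["ccy"],
        value_date=datetime.strptime(rec["val_dt"], "%y%m%d").replace(tzinfo=timezone.utc),
        payer=rec["ordering_customer"],
        payee=rec["beneficiary"],
    )

def from_iso_like(rec: dict) -> CanonicalPayment:
    """Map a record from an ISO-20022-style silo (illustrative layout)."""
    return CanonicalPayment(
        reference=rec["EndToEndId"],
        amount=Decimal(str(rec["InstdAmt"])),
        currency=rec["Ccy"],
        value_date=datetime.fromisoformat(rec["ReqdExctnDt"]).astimezone(timezone.utc),
        payer=rec["Dbtr"],
        payee=rec["Cdtr"],
    )

if __name__ == "__main__":
    silo_a = [{"ref": "P1", "amt": "1250,50", "ccy": "EUR", "val_dt": "240115",
               "ordering_customer": "ACME NV", "beneficiary": "Globex"}]
    silo_b = [{"EndToEndId": "P2", "InstdAmt": 99.90, "Ccy": "USD",
               "ReqdExctnDt": "2024-01-15T10:30:00+01:00",
               "Dbtr": "Initech", "Cdtr": "Umbrella"}]
    canonical = [from_legacy_mt(r) for r in silo_a] + [from_iso_like(r) for r in silo_b]
    # Deduplicate on the business reference (part of the "filtering" step above).
    unique = {p.reference: p for p in canonical}.values()
    for p in unique:
        print(p)
```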
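The "slicing and dicing" from the analysis step can be as simple as grouping the canonical records over a couple of dimensions. A small sketch with invented transactions, aggregating by month and customer segment:

```python
# A minimal sketch of the "slicing and dicing" described in the analysis step:
# aggregating transactions over two dimensions (month and customer segment).
# The data and segment names are invented for illustration.
from collections import defaultdict

transactions = [
    {"month": "2024-01", "segment": "retail",    "amount": 120.0},
    {"month": "2024-01", "segment": "corporate", "amount": 9800.0},
    {"month": "2024-02", "segment": "retail",    "amount": 95.0},
    {"month": "2024-02", "segment": "retail",    "amount": 310.0},
]

totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
for t in transactions:
    key = (t["month"], t["segment"])          # the two dimensions we slice on
    totals[key]["count"] += 1
    totals[key]["sum"] += t["amount"]

for (month, segment), agg in sorted(totals.items()):
    print(f"{month} | {segment:9} | n={agg['count']} | total={agg['sum']:.2f}")
```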
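Finally, a rough illustration of the statistical caution raised in the last two steps: a two-proportion z-test comparing a control group with a group exposed to the new action, using invented counts. This is only a sketch; a real A/B evaluation would also consider effect size, test power and how the groups were split.

```python
# A minimal sketch of the A/B check mentioned above: comparing conversion
# rates of a control group and a group exposed to the new action, and
# reporting whether the observed difference is statistically meaningful.
# The counts are invented for illustration.
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, z, p_value

p_a, p_b, z, p = two_proportion_z_test(conv_a=180, n_a=4000, conv_b=225, n_b=4100)
print(f"control rate={p_a:.3%}, variant rate={p_b:.3%}, z={z:.2f}, p={p:.4f}")
print("significant at 5%" if p < 0.05 else "not significant: could be noise")
```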
AI plays a crucial role in analyzing data by identifying patterns that are not straightforward for humans due to the size and complexity of the data sets. However, AI requires enormous amounts of high-quality data to train its models. Therefore, good AI use cases require that the above-defined data pipelines already exist.
Luckily, as indicated above, AI (and specifically GenAI) can also help set up those data pipelines. GenAI can help clean and structure data, understand natural language queries, and turn those questions into reports and answers.
This makes work that currently requires highly skilled data analysts, data engineers, and data scientists accessible to anyone, completing the "Big Data revolution". However, caution is needed, because interpreting data can be tricky for both humans and AI. The human brain (and, since AI models are trained on data we produce, often AI models as well) tends to draw statistically incorrect conclusions. Additionally, good knowledge of the business is still essential to ensure the data is correctly structured and interpreted.
For example, in the financial services sector, a simple payment comes with different amounts (e.g. transaction currency or base currency, with and without costs), different dates (e.g. initiation date, processing date, settlement date), and different involved parties (e.g. payer, payee, payer institution, payee institution, agent of payer, intermediaries…). Using the right field in the right context is crucial to making the right decisions. Handing this over to any employee in combination with AI is likely to result in incorrect insights.
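A small, purely illustrative sketch of that point (the record layout is invented, not a real banking schema): the same payment carries several amounts and dates, and different questions need different fields.

```python
# A minimal sketch of the point above: one payment carries several amounts,
# dates and parties, and picking the wrong field skews the analysis.
# The record layout and values are illustrative only.
payment = {
    "amount_transaction_ccy": 1000.00,   # what the payer instructed, in the transaction currency
    "amount_base_ccy": 1085.30,          # converted to the bank's reporting currency
    "amount_with_costs": 1010.00,        # including fees charged to the payer
    "initiation_date": "2024-03-01",     # when the customer submitted the order
    "settlement_date": "2024-03-04",     # when the funds actually moved
    "payer": "ACME NV",
    "payer_institution": "Bank A",
    "payee": "Globex",
    "payee_institution": "Bank B",
}

# Financial reporting typically works with the reporting currency and the
# settlement date, while customer-behaviour analysis typically cares about the
# initiation date and the instructed amount. Mixing them up produces
# misleading dashboards.
reporting_view = (payment["settlement_date"], payment["amount_base_ccy"])
behaviour_view = (payment["initiation_date"], payment["amount_transaction_ccy"])
print("reporting view:", reporting_view)
print("behaviour view:", behaviour_view)
```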
Therefore, specialized tooling that combines advanced analytics with deep business insights is likely the future.
For more insights, visit my blog at https://bankloch.blogspot.com