The Data Structures That Drive Quant and AI

Our BizDev team at work asked me to explain why Data Management developed different cultures and languages from those of Data Science. I found this hard, in part because some of my background comes from engineering maths, which for many years was tightly coupled with the arguably adjacent Financial Services discipline of Quant, or Financial Engineering, terms with which my work colleagues rarely engage.

This matters because Quant and Financial Engineering, like Data Management, preceded modern data science by decades. Indeed, Data Science became a thing only in the mid 2010s. I've argued before that Quants are the original data scientists. Given the arrival of DeepSeek, essentially a firm populated and supported by Quants, Quants have pedigree as modern AI engineers too.

So this Opinion ties together three distinct but interrelated worlds:

  • Technical Computing which inspired compute and matrix-intensive Quant Finance and the neural networks enhanced by the AI godfathers, e.g. Geoff Hinton et al 
  • Data Management, dominated by tables and tabular operations
  • Enterprise statistics and data analytics, dominated by tables and tabular operations

I will also make a case, one which doesn't involve Moore's Law or cheap data storage, important though both were, for why:

  • data science suddenly became a thing, with 2008 being a critical year.
  • unifying tables and matrix-based disciplines heralded the new golden age of AI
  • Quants continue to innovate

To help our BD team, I sketched this pre-data science timeline blending the three disciplines, and added pictures of Ne-Yo, Pink and Taylor Swift, which I will explain later.

The Pre-Data Science Years

 

For folks like me without tabular backgrounds, here's a brief summary of the history of tables, and how they align to Quant, Data Management and analytics teams.

📜 A Brief History of Tables in Computing:

✅ Pre-20th Century: Mathematicians and statisticians used tables to categorize and organize data, as early scientific records, accounting ledgers, and statistical tables.
✅ 1950s–1960s: Early computing structured data in punched-card systems (IBM) and hierarchical databases like IMS (1966), but they lacked the flexibility of relational tables.
✅ 1970: Edgar F. Codd brought the relational model to data storage, introducing tables (“relations”) to modern databases and data management.
✅ 1974–1979: Relational databases (IBM System R, Oracle) used structured tables for enterprise computing.
✅ 1976–1993: Programming languages embraced tabular data:
The SAS Programming Language introduced structured data step tables.
The R Programming Language (1993) used data frames — essentially tables.

S-PLUS, a commercially supported implementation of the S language from which R also derives, was popular in Quantitative Finance in the late 1990s, while SAS prevailed in enterprise risk analytics, credit risk and risk-based decisioning. All were popular in university statistics departments, and in decision sciences teams in biotech, pharmaceutical and chemical organizations.

Meanwhile, my matrix-based language, MATLAB, prevailed in Financial Engineering and Quantitative Research, particularly for option and derivative pricing, and for prototyping, and in production too, on then emerging proprietary trading desks in capital markets.

Why? Well, these teams employed matrix algebra-literate engineers and applied physicists, while risk and analytics functions tended to hire table-familiar statisticians and mathematicians. Some departments featured both, e.g. buy-side portfolio research teams, or econometricians. This meant good-natured battles between statisticians highlighting table convenience and engineers highlighting matrix computing power. I use the word power because matrices performed well for compute-intense operations, e.g. Principal Components Analysis, regressions, simulation, neural networks/AI, optimization, time-series operations, and much more.

Therefore, matrix algebra quant applications included:

  • Stochastic Monte Carlo simulation, including for option and derivative pricing
  • Portfolio theory, in particular mean-variance optimization which drove the buy-side, highlighted by Nobel prizewinner William Sharpe but leveraging the work of Harry Markowitz
  • Macroeconomic modelling which borrowed from control and systems engineering to develop state-space, equilibrium and DSGE models
  • Stochastic Asset-Liability simulation and associated financial products, for balance sheet cashflow modelling in pensions, long-term investing, and insurance
  • Backtesting and trading strategy development for systematic hedge funds and Prop desks
  • Value at Risk (VaR) simulation, and other risk types that simulated large portfolios or capital-at-risk, often over longer time horizons, e.g. market (e.g. CVaR), credit (LGD & PD calcs), counterparty (simulation or Adjoint Algorithmic Differentiation (AAD)), operational risk (e.g. Change of Measure).
  • Economic and Risk Scenario Generation, i.e. simulated, synthetic data.
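To give a flavour of why these workloads suit matrix-style computing, here is a minimal sketch of the first item in the list: Monte Carlo pricing of a European call option. It assumes geometric Brownian motion and uses NumPy (which appears later in this story) rather than MATLAB. The function names and parameter values are illustrative assumptions of mine, not a production pricer. The whole simulation is a single vector operation over all paths, with the Black-Scholes closed form included only as a sanity check.

```python
import math
import numpy as np


def mc_european_call(s0, strike, rate, vol, maturity, n_paths=200_000, seed=0):
    """Monte Carlo price of a European call under geometric Brownian motion."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)  # one standard normal draw per path
    # Terminal prices for every path at once, as one vectorised expression.
    drift = (rate - 0.5 * vol**2) * maturity
    s_t = s0 * np.exp(drift + vol * math.sqrt(maturity) * z)
    payoff = np.maximum(s_t - strike, 0.0)
    return math.exp(-rate * maturity) * payoff.mean()


def bs_european_call(s0, strike, rate, vol, maturity):
    """Black-Scholes closed form, used here only to check the simulation."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def cdf(x):
        # Standard normal cumulative distribution via the error function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


# Illustrative inputs: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
mc_price = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0)
bs_price = bs_european_call(100.0, 100.0, 0.05, 0.2, 1.0)
```

With enough paths the simulated price should sit close to the closed-form value; the gap shrinks roughly with the square root of the number of paths.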

What Happened in and after 2008?

Ne-Yo’s up-tempo melodic song Closer, with follow-up Miss Independent, alongside Pink, at her musical peak, and Taylor Swift, still singing Country, dominated the pop charts. The credit crunch hurt. Its regulatory impact will make a brief appearance at the end of this opinion.

However, Wes McKinney, a hedge fund data engineer-cum-quant at AQR Capital Management, introduced the open source, table-based pandas (Python Data Analysis) library to the Python programming language.

Python long preceded McKinney's pandas. A general-purpose language, it originated in the early 1990s, becoming popular for unit testing scripts. Only when Travis Oliphant, who appreciated Python’s simple, understandable programming language, delivered SciPy in 2001 and NumPy in 2005 did it enter mathematics and engineering, leveraging matrix algebra libraries as MATLAB had before it.

In 2008, however, Wes McKinney brought pandas to Python, and thus tabular convenience to the matrix libraries of Travis Oliphant's NumPy and SciPy. 

Now data science could take full effect, with tables and matrices in one unified open source programming language, Python, servicing statisticians, data engineers, quants and financial engineers. New tools drove community growth further, e.g. reproducible Jupyter notebooks, scikit-learn for machine learning, and PyTorch, Keras, TensorFlow and other deep learning libraries driving the new transformer technologies that underpin modern AI and LLMs.
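A small sketch may help show what tables and matrices in one language means in practice. The tickers and return figures below are made up for illustration. The same numbers are handled first as a labelled pandas table, then as a plain NumPy matrix for the covariance and portfolio variance calculations a quant would recognise.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for three made-up tickers, held as a labelled table.
returns = pd.DataFrame(
    {
        "AAA": [0.010, -0.020, 0.015, 0.005],
        "BBB": [0.000, 0.010, -0.005, 0.020],
        "CCC": [-0.010, 0.030, 0.000, -0.015],
    },
    index=pd.date_range("2024-01-02", periods=4, freq="B"),
)

# Table view: column labels and summary statistics come for free.
mean_by_ticker = returns.mean()

# Matrix view: the same numbers as a plain 2-D NumPy array.
x = returns.to_numpy()
centred = x - x.mean(axis=0)
cov_matrix = centred.T @ centred / (len(x) - 1)  # sample covariance via matrix algebra

# Portfolio variance w' C w for an equally weighted portfolio.
weights = np.full(3, 1 / 3)
portfolio_var = weights @ cov_matrix @ weights
```

The hand-rolled matrix covariance should agree with the table-side `returns.cov()`, which is the point: one dataset, two views, no export step between a statistics package and a matrix package.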

Data Science Unifies in the 2010s with Pandas

Fast forward to 2025.

With vector databases, graph structures, and AI-driven data processing, will tables remain so influential?

Well, matrices and vectors will continue to power the engine of AI. However, as someone working with graph technologies, I see the contextual benefits of graph relationships, built on matrix algebra (as sparse matrices) and evolving the convenience of tables. Quoting Tony Seale, the so-called Knowledge Graph Guy, "a customer isn’t just a database row; they’re linked to past purchases, support tickets, email exchanges, written notes, social sentiment, and pricing preferences. An insurance claim isn’t just an entry - it’s tied to policy details, vehicle history, repair records, and similar cases. This isn’t about storage - it’s about making sense of complexity at a scale that rigid databases and APIs simply can’t match." I agree.
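To illustrate the link between graphs and matrix algebra, here is a minimal sketch that stores a tiny graph as an adjacency matrix. The node names are hypothetical, loosely echoing the customer example in the quote. The matrix is dense for brevity; a real graph of any size would use sparse storage, which is not shown here.

```python
import numpy as np

# Hypothetical entities and links, loosely echoing the customer example above.
nodes = ["customer", "purchase", "support_ticket", "email", "pricing_preference"]
edges = [
    ("customer", "purchase"),
    ("customer", "support_ticket"),
    ("support_ticket", "email"),
    ("purchase", "pricing_preference"),
]

index = {name: i for i, name in enumerate(nodes)}
adjacency = np.zeros((len(nodes), len(nodes)), dtype=int)
for src, dst in edges:
    adjacency[index[src], index[dst]] = 1
    adjacency[index[dst], index[src]] = 1  # treat links as undirected

# Direct neighbours of "customer": the non-zero entries of its row.
one_hop = {nodes[j] for j in np.flatnonzero(adjacency[index["customer"]])}

# Entry (i, j) of A @ A counts the walks of length two from node i to node j.
two_step = adjacency @ adjacency
two_hop = {
    nodes[j]
    for j in np.flatnonzero(two_step[index["customer"]])
    if nodes[j] != "customer"
}
```

Here `one_hop` holds the purchase and the support ticket, while `two_hop` reaches the email and the pricing preference: context gathered by a matrix product rather than by a chain of table joins.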

Yet, revitalized by the Parquet, Arrow and Iceberg formats underpinning the so-called lakehouse and new streaming analytics ecosystems, tables are here to stay too.

We in FinTech have much to celebrate in driving and governing AI.

External

This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.
