Accuracy measures the degree to which data represents the real-world entity it describes, in this case a person, an organization, or the source from which the data is known to have originated.
Measuring data quality is possible when the physical and contextual characteristics of the data are well understood. These characteristics can be translated into data quality rules, and data that falls outside the range a rule allows can be termed an exception.
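As an illustration of how such characteristics become rules, here is a minimal Python sketch. The field names (`income`, `gender`), the allowed range, and the allowed codes are assumptions made up for this example; they are not taken from any real rule set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str                       # label reported when the rule is violated
    field: str                      # record field the rule applies to
    check: Callable[[object], bool]  # returns True when the value is acceptable


# Hypothetical rules: the range and the allowed codes are illustrative only.
RULES = [
    Rule("income_in_range", "income",
         lambda v: isinstance(v, (int, float)) and 0 <= v <= 1_000_000),
    Rule("gender_in_domain", "gender", lambda v: v in {"F", "M", "X"}),
]


def find_exceptions(record, rules=RULES):
    """Return the names of all rules that the record violates."""
    return [r.name for r in rules if not r.check(record.get(r.field))]


print(find_exceptions({"income": 52_000, "gender": "F"}))
print(find_exceptions({"income": -5, "gender": "F"}))
```

A record with a missing field is also reported as an exception here, because `record.get` returns `None` and neither check accepts it.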
In most cases, the chief data office will be able to apply changes to correct the data. While it is possible to measure dimensions such as Validity, Integrity, Consistency, Completeness, Coverage, Timeliness, and many sub-dimensions, measuring Accuracy is not that simple.
When a person fills in an application form for a product with a financial institution, the information provided is deemed accurate because it comes directly from that person. The degree of accuracy can be measured by checking with the person who filled in the form, or by auditing the form against information held by a third party, which is often not cost-effective. Machine learning models can identify the scenarios in which data should be checked against the real-world entity.
Traditional data quality profiling, rules management, and statistical control methods depend on analysts' skill sets and on previously established business rules. This often limits performance and makes the process very time-consuming.
With vast amounts of data being wrangled to deliver commercial outcomes through models such as cross-sell and fraud detection, the accuracy of data is a dimension that needs to be stressed.
Outlier detection for the accuracy of data is based on a statistical quality control model. A deep learning model outputs a predicted value when it reads input data, and a reported value that differs markedly from the prediction can be treated as an outlier. There are many algorithms for predicting data in a specific context such as Income. In a comparison of the Backpropagation network, Convolutional Neural Network (CNN), Support Vector Machine (SVM), and Regression models, testing results show that the Backpropagation model is a good choice for measuring accuracy.
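One way to picture outlier detection in the statistical quality control sense is to place control limits on the gap between a reported value and the model's predicted value. The sketch below assumes the limits are set at the mean plus or minus three standard deviations of residuals from a trusted baseline; the function names, the factor `k`, and the sample numbers are illustrative assumptions, not the method used in the work described here.

```python
from statistics import mean, stdev


def control_limits(baseline_residuals, k=3.0):
    """Lower and upper limits from residuals of records believed to be accurate."""
    center = mean(baseline_residuals)
    sigma = stdev(baseline_residuals)
    return center - k * sigma, center + k * sigma


def flag_outliers(reported, predicted, limits):
    """Return indexes of records whose residual falls outside the limits."""
    lower, upper = limits
    return [
        i
        for i, (r, p) in enumerate(zip(reported, predicted))
        if not lower <= r - p <= upper
    ]


# Illustrative numbers only (income in thousands).
limits = control_limits([-2.0, -1.0, 0.0, 1.0, 2.0])
print(flag_outliers([50, 61, 48], [51, 50, 49], limits))
```

Deriving the limits from a separate baseline keeps a single extreme record from widening the limits enough to hide itself.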
One approach to assessing the accuracy of Income is to analyze the associated business terms in context, starting with 'Organization', 'Role', 'Role Type', 'Position Title', 'Service Start Date', 'Gender', 'Address', 'Travel Percentage', 'Role Start Date', and 'Organization Type'. The same model can also be leveraged for fraud identification in lending.
A six-layer Backpropagation network was modeled on the 10 features stated above. The six layers include four hidden layers and an output layer with a single node. The model gives a promising outcome, with higher correlation and minimal mean absolute error and root mean squared error.
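For readers less familiar with the evaluation measures named above, the following Python sketch computes correlation, mean absolute error, and root mean squared error from paired actual and predicted values. It uses the standard textbook definitions (Pearson correlation is assumed for "correlation"), and the sample numbers are made up for illustration, not results from the model.

```python
from math import sqrt
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, ignoring its sign."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error; penalizes large misses more."""
    return sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


def pearson_correlation(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    mean_a, mean_p = mean(actual), mean(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)


# Illustrative values only.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 41]
print(mean_absolute_error(actual, predicted))
print(root_mean_squared_error(actual, predicted))
print(pearson_correlation(actual, predicted))
```

A model that tracks reported income well shows a correlation close to 1 together with small values for both error measures.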
These forms filled in by applicants can be handed over to operations, which can involve third-party verification or audit to arrive at more accurate information. The role of data stewards in building accuracy models cannot be stressed enough, because they help establish the context of these business terms.
This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.