
Understanding Big Data Quality for Business Decisions

Organizations now make extensive use of massive data sets for analysis. Big data has driven real innovation, yet the relationship between data quality and data governance still raises questions for many organizations. Distinctive aspects of big data quality include its complexity, the conventional definition of data quality, the meaning of and motivation for big data quality, the strategic plan and tactical steps for adopting it, and the characteristics of solutions that are not only adapted to big data platforms but also ease the transition to them. Three core points frame data quality:

1. Structured and unstructured data sets can serve different user requirements, and the same data sets can be reused for different outcomes.

2. When data migrates from one source application to a different target application, it may be flagged as erroneous or inconsistent with the target.

3. The useful lifetime of historical data can be extended through validation and data governance.

In all three scenarios the quality of the data matters, and the strategic question is how to balance the conventional approach with the current one.

Data quality and data governance should never be treated as a one-time project; they are an ongoing, continuous process (Master Data Management, Tony Fisher, p. 4). The critical factor is to balance the management of data quality and big data governance against the need to deliver usable analytical results. Data quality management practices such as data profiling, data standardization, data cleansing, and data governance are all dimensions of big data quality, and they can be simplified with the right mix of tools, processes, and validation. Today, large volumes of data arrive in both structured and unstructured formats. Whenever data originates outside the organization, extra care should be taken with its quality and governance. Data may also originate within the organization or be acquired from a data provider; in such cases, the ability to access the data, validate its quality, and adopt it tactically is important. Compared with the traditional approach, this process can now be simplified and managed with multiple tools and processes.
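As a concrete illustration, here is a minimal sketch in Python (pandas) of what profiling, standardization, and cleansing can look like on a small customer table. The column names, sample values, and reference mapping are all made up for the example:

```python
import pandas as pd

# Illustrative customer records; every column name and value is hypothetical.
df = pd.DataFrame({
    "customer_id": [101, 102, 102, None],
    "country":     ["US", "us", "U.S.", "DE"],
    "email":       ["a@x.com", None, "b@y.com", "b@y.com"],
})

# Profiling: measure completeness and cardinality before touching the data.
profile = pd.DataFrame({
    "non_null_pct": df.notna().mean() * 100,  # completeness per column
    "distinct":     df.nunique(),             # distinct values per column
})
print(profile)

# Standardization: map variant country codes onto one reference domain.
country_map = {"us": "US", "u.s.": "US", "de": "DE"}
df["country"] = df["country"].str.lower().map(lambda c: country_map.get(c, "UNKNOWN"))

# Cleansing: drop records missing the key, then duplicates on that key.
df = df.dropna(subset=["customer_id"]).drop_duplicates(subset=["customer_id"])
```

Profiling first, before standardizing or cleansing, matters: it tells you how bad the data actually is, so the later rules are grounded in evidence rather than assumption.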


Data sourced from outside the organization will differ from internal data. To control such data and make it consistent with internal data sets, data cleansing is necessary. It is not prudent, however, to cleanse an externally sourced data value in a way that leaves it inconsistent with its original source. Instead, you must apply your own quality and business rules to align the external data with your view of the customer. An organization needs quality at every stage: when data enters the database, when it interacts with other data sources, and when it is extracted for business use. Without data quality, the project will fail and confusion will prevail.
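For instance, aligning an external feed with an internal view while preserving the original source values might look like the following sketch. The feed, field names, and the ten-digit phone rule are all hypothetical:

```python
import pandas as pd

# Hypothetical external feed: names and phone formats differ from the
# internal standard.
external = pd.DataFrame({
    "customer": ["Acme Corp", "acme corp."],
    "phone":    ["(212) 555-0100", "212-555-0100"],
})

def normalize_phone(raw: str) -> str:
    """Illustrative business rule: internal standard keeps digits only,
    exactly ten of them."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits if len(digits) == 10 else "INVALID"

# Keep the original source values untouched and add standardized columns
# alongside them; overwriting the raw fields would break traceability
# back to the original source.
external["phone_std"] = external["phone"].map(normalize_phone)
external["name_std"] = (external["customer"]
                        .str.lower()
                        .str.replace(r"[.,]", "", regex=True)
                        .str.strip())
print(external)
```

Keeping the raw and standardized values side by side is one way to resolve the tension the paragraph describes: the external data is aligned to your view of the customer without becoming inconsistent with its source.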


We cannot avoid importing unstructured data from external sources such as Twitter and Facebook. While integrating data from external sources, corrections should be applied to the data sets for consistency and usability. This is the core conflict: if an organization does not retain the original source format, it risks introducing inconsistencies through its own cleansing and standardization. Moreover, traditional data cleansing techniques do little to increase data usability, especially when the same data sets are to be used for different outcomes. The practical approach to big data quality does not match the conventional one: conventional data quality emphasizes quality and control on the input side, whereas big data quality focuses on positively affecting the outcome. Data quality should therefore span the entire end-to-end production flow, not just the point where data flows are introduced. Analytically, organizations look for trustworthy results that inform business processes and support profitable decisions. To that end, proactive data quality methods such as profiling, standardization, cleansing, and record matching can contribute directly to the ultimate business objectives.
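Record matching is one of those proactive methods. Here is a minimal sketch using only Python's standard library; the company names and the similarity threshold are arbitrary choices for the example:

```python
import difflib

# Hypothetical internal master names vs. names arriving from an external feed.
internal = ["Acme Corporation", "Globex Ltd", "Initech Inc"]
external = ["acme corp", "Globex Limited", "Umbrella PLC"]

def best_match(name: str, candidates: list[str], threshold: float = 0.6):
    """Return the closest internal record for an external name,
    or None when similarity falls below the threshold."""
    scored = [(c, difflib.SequenceMatcher(None, name.lower(), c.lower()).ratio())
              for c in candidates]
    candidate, score = max(scored, key=lambda pair: pair[1])
    return candidate if score >= threshold else None

for name in external:
    print(name, "->", best_match(name, internal))
```

In practice the threshold, and what happens to unmatched records, would themselves be business rules agreed with the data's consumers rather than fixed in code.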


Big data quality, in essence, means matching internal and external data, both structured and unstructured, and ensuring consistency against a common reference domain. Identifying the collection of business rules that govern data quality is useful, and knowing when (and when not) to introduce data standardization, data cleansing, validation, and monitoring is the key strategy for big data quality. On the tactical side, business managers should be aware of the potential negative impact that poor data quality can have on analytics, a point that aligns with the book's author: "Data quality is perceived as an IT problem when, in reality, it is a business problem." My point is that an organization cannot place the tactical solution for big data quality entirely on the IT department. All departments within the organization must work as a cohesive entity, especially the business users and the IT team.
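One way to make the "when and when not" decision explicit, and to keep it jointly owned by business and IT rather than left to IT alone, is to encode it as a per-source policy. The source names and step labels below are purely illustrative:

```python
# Hypothetical quality policy: which steps run for each data source.
# Deciding when NOT to cleanse (e.g., raw social feeds kept verbatim)
# is as much a part of the strategy as deciding when to cleanse.
QUALITY_POLICY = {
    "internal_crm":  ["profile", "standardize", "cleanse", "monitor"],
    "vendor_feed":   ["profile", "standardize", "validate", "monitor"],
    "social_stream": ["profile", "monitor"],  # original text left untouched
}

def pipeline_for(source: str) -> list[str]:
    """Return the agreed quality steps for a source; the policy itself
    is maintained jointly by business users and the IT team."""
    return QUALITY_POLICY.get(source, ["profile"])

print(pipeline_for("social_stream"))
```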


