Cognitive Data Engineering: AI-Governed Data Quality, Lineage, and Pipeline Optimization at Scale

Main Article Content

Velangani Divya Vardhan Kumar Bandi

Abstract

Artificial intelligence (AI) is establishing itself as the next generation of technology. However, the data required to train such scalable, self-learning, powerful systems are still collected and pre-processed in traditional ways. Moreover, because many such AI initiatives lack governance, these processes remain uncontrolled and often produce low-quality results. Both issues urgently need solutions.

Three core principles of cognitive data engineering, enabled by the recent expansion of AI technology, are developed. First, quality metrics and evaluation methods, together with anomaly detection and correction mechanisms, are formalized into a comprehensive AI-governed data-quality framework. Second, a set of metadata standards is proposed that defines, describes, and shapes Internet-scale data ecosystems; their implementation provides a systematic method for capturing data lineage and provenance information and for using that information for compliance and for efficient query acceleration. Third, a data pipeline architecture is designed that supports the definition, execution, and orchestration of complex data pipelines in a cost-aware manner. Together, these contributions enable an AI governance framework for complete data pipelines: one that defines roles, policies, and decision rights to identify risks in the use of data pipelines, assess those risks quantitatively, and provide mitigation guidelines.
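As a concrete illustration of the first principle, the kind of anomaly detection and correction such a data-quality framework might apply to a numeric column can be sketched as follows. This is a minimal, hypothetical example (function names, the z-score test, the threshold, and the median-imputation policy are illustrative assumptions, not taken from the article):

```python
# Hypothetical sketch of an AI-governed data-quality check: flag anomalous
# values in a numeric column via a z-score test and apply a simple
# correction policy (median imputation). All names and thresholds here
# are illustrative assumptions, not the article's actual framework.
from statistics import mean, median, pstdev

def detect_and_correct(values, z_threshold=2.0):
    """Return (corrected, flagged_indices): values whose z-score exceeds
    the threshold are flagged and replaced with the column median."""
    mu = mean(values)
    sigma = pstdev(values)
    med = median(values)
    corrected, flagged = [], []
    for i, v in enumerate(values):
        if sigma > 0 and abs(v - mu) / sigma > z_threshold:
            flagged.append(i)
            corrected.append(med)  # correction policy: median imputation
        else:
            corrected.append(v)
    return corrected, flagged

# A single gross outlier (250.0) among otherwise stable sensor readings.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 250.0, 10.2]
cleaned, outliers = detect_and_correct(readings)
```

In a full framework, the detection model and correction policy would themselves be learned and governed (e.g. logged for lineage and subject to decision-rights policies) rather than hard-coded as above; the low threshold reflects that z-scores are bounded in small samples.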

Article Details

Section

Articles