iCEDQ CTO Sandesh Gawande Explains the Need for Automated ETL Testing and Data Monitoring in a Modern Data Integration Strategy.
Solutions Review recently invited iCEDQ CTO Sandesh Gawande for an Expert Discussion on the need for automated testing of data integration processes. The conversation also covered the need for quality assurance and quality control with regard to data monitoring for compliance across data warehouses, data migration, and big data.
STAMFORD, Conn., Oct. 1, 2018 /PRNewswire-PRWeb/ -- Big companies do not really focus on testing and monitoring ETL processes, even though their projects today are data-centric, i.e., related to big data, data lakes, migration, integration, master data management, and so on. Moreover, there is almost no focus on automating the testing of these systems, which poses a huge risk for companies. So even though it's not "fashionable", somebody has to focus on quality assurance and quality control of ETL processes. Some industries are also under federal mandate to put data auditing processes in place. Yet huge billion-dollar companies still have nothing in this regard and simply test manually. In a world where data is as valuable as money, there has to be automated data auditing, regulation, monitoring, and testing.
DevOps involves automating development, testing, and release management, so companies adopting it should consider automating their testing. If companies are not automating their testing, especially in data-centric projects, they are effectively not doing DevOps/DataOps at all. In the current scenario, we are talking about billions of records, while QA professionals manually check 100 to 1,000 records per day to test their ETL processes.
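To make the contrast with manual spot checks concrete, here is a minimal sketch of an automated source-to-target reconciliation rule that covers every record rather than a hand-picked sample. It is an illustration only, not a description of how iCEDQ works: the table names (`src_orders`, `dw_orders`), the column names, and the `reconcile` helper are all hypothetical, and an in-memory SQLite database stands in for the real source and warehouse.

```python
import sqlite3


def reconcile(conn, source, target, key, amount):
    """Compare row counts and summed amounts, and list keys missing from the target.

    Hypothetical helper for illustration; table and column names are
    interpolated directly, so only pass trusted identifiers.
    """
    cur = conn.cursor()
    src_count, src_sum = cur.execute(
        f"SELECT COUNT(*), COALESCE(SUM({amount}), 0) FROM {source}"
    ).fetchone()
    tgt_count, tgt_sum = cur.execute(
        f"SELECT COUNT(*), COALESCE(SUM({amount}), 0) FROM {target}"
    ).fetchone()
    # Keys present in the source but absent from the target after the load.
    missing = [
        row[0]
        for row in cur.execute(
            f"SELECT s.{key} FROM {source} s "
            f"LEFT JOIN {target} t ON s.{key} = t.{key} "
            f"WHERE t.{key} IS NULL ORDER BY s.{key}"
        )
    ]
    return {
        "count_match": src_count == tgt_count,
        "sum_match": src_sum == tgt_sum,
        "missing_keys": missing,
    }


# Assumed sample data: the warehouse load dropped order 3.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src_orders (order_id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE dw_orders  (order_id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO src_orders VALUES (1, 100), (2, 250), (3, 75);
    INSERT INTO dw_orders  VALUES (1, 100), (2, 250);
    """
)
result = reconcile(conn, "src_orders", "dw_orders", "order_id", "amount")
print(result)
```

Because the comparison runs as set-based queries inside the database, the same rule can be scheduled after every load instead of depending on how many rows a person can inspect in a day.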
Performance and scalability are not the same thing, although people often treat them as if they were. The crucial difference is that performance is about how fast a process can finish, while scalability is about the amount of data it can handle. A process that runs fast on a small data set may still fail quickly when applied to trillions of records.
Companies do face challenges in switching to a platform like iCEDQ. Since ETL testing is generally outsourced, companies often do not care enough to understand what is happening at the other end. QA teams also frequently cite budget as the problem. Surprisingly, complacency is another challenge, because many people want to continue with their old ways. The core challenge is that testing ETL processes is simply not a priority for these companies. The people who approached us for iCEDQ were mostly those who wanted to change things at their companies and make them better, or those who realized during implementation that their project was failing and that they should have tested it in the first place.
Automated testing reduces the development timeline by 33% because one need not follow the sequential Waterfall model. With automated testing, companies can split their project into two tracks: one develops the data processes, whether for a data lake or Hadoop, while the other builds rules to test those processes in parallel and begins testing. Using a tool like iCEDQ is relatively easy; what is really difficult is understanding the difference between auditing and transactions, or between testing and development. Setting a culture of quality assurance and quality control in a data-centric world is important. Apart from tangible results, automating ETL testing has intangible benefits for a company, including credibility, delivery of good data, compliance, and increased test coverage.
If there is quality assurance and quality control for machinery, why not for data processes? When manufacturing a mechanical product, quality assurance means checking that individual machine parts work as desired and that all inputs are correct. Quality control means taking a sample of the product and checking what is happening. This is an engineering concept, and it should be applied to data factories as well. We do exactly this in iCEDQ. Rules are tested in production before deployment, which is quality control for production processes. Quality control should continue as ongoing production monitoring, because we do not have control over the kind of data we are getting.
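As a rough illustration of ongoing quality control on incoming data, the sketch below applies a small set of named rules to every row of a batch and reports which rows fail which rule. The rule names, the field names (`customer_id`, `amount`), and the `check_batch` helper are assumptions made for this example; they are not taken from iCEDQ.

```python
def check_batch(rows, rules):
    """Apply each named rule to every row; return failures as (row index, rule name)."""
    failures = []
    for index, row in enumerate(rows):
        for name, rule in rules.items():
            if not rule(row):
                failures.append((index, name))
    return failures


# Hypothetical rules for an incoming feed we do not control.
rules = {
    "customer_id_present": lambda r: r.get("customer_id") is not None,
    "amount_non_negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
}

# Assumed sample batch: row 1 lacks a customer, row 2 has a negative amount.
incoming = [
    {"customer_id": 1, "amount": 20},
    {"customer_id": None, "amount": 5},
    {"customer_id": 3, "amount": -4},
]

failures = check_batch(incoming, rules)
for index, name in failures:
    print(f"row {index} failed rule {name}")
```

Keeping the rules as data, separate from the code that applies them, lets the same checks be written during development and then reused unchanged for production monitoring.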
About Sandesh Gawande:
Sandesh Gawande is the CTO for iCEDQ Software at Torana Inc. Since 1996, Sandesh has been designing software and implementing data architecture, ETL, and reporting. He has developed and trademarked a framework for data integration, ETL Interfaces Architecture. iCEDQ software has its genesis in these experiences, as he realized that few or no tools were available for ETL testing and data monitoring.
Learn More:
- Test drive iCEDQ today.
- Subscribe to the iCEDQ blog for the latest on DevOps/DataOps.
- Follow on Twitter @icedqtalk.
- Join the iCEDQ User group on LinkedIn.
- Subscribe to the iCEDQ YouTube channel.
About Torana Inc. (iCEDQ):
Torana Inc. has been in business for 12 years and has consistently served enterprise customers. Data is the foundation of every business, and eliminating data risks is our mission. Our products are used in all kinds of verticals, such as banking, insurance, healthcare, manufacturing, and e-commerce.
iCEDQ is a DataOps and Data Certification platform uniquely designed for ETL Test Automation, Data Migration Test Automation to the Cloud, Big Data Testing, and Production Data Monitoring. It is used throughout the data lifecycle. During development, iCEDQ is used to test data processes/ETL and data migrations to big data or the cloud. In production, it is used to certify both incoming raw data from source systems and processed data generated by internal systems.
SOURCE Torana Inc.