Software runs the world, and data is everywhere. But not all data is equally valuable. Data comes in many different forms, shapes, sizes, and structures. It can be structured and ordered, or it can be unstructured, badly duplicated, and altogether raw in its base form. When it’s the latter, we really can’t make much use of it until it undergoes some form of treatment.

A Cure for Dirty Data

Sometimes that treatment is a simple and straightforward classification and categorization process where we can parse it and determine its place within a higher structure, such as inside a database or an application.

Other times, the treatment might need to involve deduplication or understanding a data set’s time stamp values (so we know when each component piece of data was created). Or perhaps it involves deeper log file analysis so that we can use it to reverse engineer its value.
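To make the deduplication and time stamp idea concrete, here is a minimal sketch in Python. It is not an SAP tool or API. The field names (`name`, `created`) and the matching rule (trimmed, case-insensitive name) are assumptions chosen purely for illustration. The sketch keeps only the most recently created copy of each duplicated record.

```python
from datetime import datetime


def deduplicate(records):
    """Keep the most recently created record for each normalized name.

    Assumes each record is a dict with a 'name' string and an ISO 8601
    'created' time stamp (hypothetical field names for this sketch).
    """
    latest = {}
    for rec in records:
        # Normalize the key so "Acme Corp " and "acme corp" count as duplicates.
        key = rec["name"].strip().lower()
        created = datetime.fromisoformat(rec["created"])
        if key not in latest or created > latest[key][0]:
            latest[key] = (created, rec)
    # Return the survivors ordered from oldest to newest.
    return [rec for _, rec in sorted(latest.values(), key=lambda pair: pair[0])]


records = [
    {"name": "Acme Corp ", "created": "2019-03-01T09:00:00"},
    {"name": "acme corp", "created": "2020-01-15T12:30:00"},
    {"name": "Globex", "created": "2018-07-04T08:00:00"},
]
clean = deduplicate(records)
```

In practice, deciding what counts as a duplicate is usually the hard part. Real data sets often need fuzzier matching than a simple trimmed, lowercased name.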

With all these different ways to work with data, it would be beneficial to know the tools (free or otherwise) SAP provides to customers to start the process of cleaning their data estate.

Tools of the Trade

ASUG Members know all too well how complex a process it can be to migrate to SAP S/4HANA. A variety of SAP and third-party tools exist to help customers clean up their data sources and get them ready for the deeper level of analytics and data management power that SAP HANA offers.

Small businesses using SAP Business One have traditionally used a tool called Master Data Cleanup Wizard. ASUG Members may have used this tool in the past or may still have instances of it in place. Its core focus appears to have been directed toward removing or deactivating data, rather than total data cleansing and management.

Judging by SAP’s user blogs and community question pages, many customers have used Master Data Cleanup Wizard to perform actions like removing obsolete stock items from database inventory sections. This use case illustrates a perfect real-world reason and need for data cleanup processes.

SAP Data Services

A more direct place to start is SAP Data Services. These are resources dedicated to turning your data into a trusted, ever-ready resource by using SAP solutions and tools that offer functionality for integration, quality, and cleansing.

ASUG has previously discussed aspects of this topic with SAP’s Frank Densborn and Kim Mathäß, both of whom emphasized that preparation involving data cleansing is key for successful project delivery, whether it’s for migration projects or upgrades.

Pointing to SAP tools that sit under SAP Data Services and the SAP Support Portal, Densborn and Mathäß highlight SAP Transformation Navigator, a tool designed to assess an organization’s landscape and provide a framework for digital transformation.

Is Your Data Ready?

“Once you have a clear vision of which scenario your organization will adopt, then the first technical tool you should use is the SAP Readiness Check. This tool scans your current system and tells you exactly what changes you need to make to go from an SAP ERP system to an SAP S/4HANA system,” Mathäß said.

They also mentioned SAP Software Update Manager with its Database Migration Option (SUM/DMO). This tool has traditionally been used to move ERP systems to ERP on SAP HANA, but it has been enhanced so that it can now also perform a system conversion to SAP S/4HANA.

Some of these tools may be available to you at no additional charge if you are an SAP customer, depending on what service plans and packages you have.

External Opinions on Data Cleanliness

For an external opinion, ASUG Members can consider reaching out to companies in the partner ecosystem, as there are many tools and services available for data cleansing.

IBM is particularly vocal about data cleansing-related integration alternatives and discusses the relative benefits of manual and automated integration in its white paper, “Best Practices for High Data Quality in an SAP Environment.” The paper covers data preparation, data integration, data management, and other aspects of data care.

As the company notes, “Automated systems enable an iterative approach to migration—if the transformed data is not quite correct, it is easy to change the transformation rules and reload the data. These systems also provide a clear audit trail that shows how transformations were conducted, so if problems are encountered when loading the data, the rules can be changed and the process repeated. This iterative methodology is a very effective way to fine-tune the data cleansing process.”

IBM adds that once the rules are finalized, they can be applied in real time to new data as it enters the SAP system, making use of the initial integration investment over future years.
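The iterative, rule-based approach IBM describes can be sketched in a few lines of Python. This is not IBM's or SAP's tooling. The function `apply_rules`, the two sample rules, and the field names are all hypothetical, chosen only to show how named transformation rules and an audit trail fit together.

```python
def apply_rules(records, rules):
    """Apply each (name, function) rule in order and log every change made."""
    cleaned, audit = [], []
    for index, original in enumerate(records):
        row = dict(original)  # work on a copy so the source data is untouched
        for rule_name, rule in rules:
            before = dict(row)
            row = rule(row)
            if row != before:
                # The audit trail records which rule changed which row, and how.
                audit.append(
                    {"row": index, "rule": rule_name, "before": before, "after": dict(row)}
                )
        cleaned.append(row)
    return cleaned, audit


# Two hypothetical cleansing rules; each takes a row and returns a new row.
def trim_name(row):
    return {**row, "name": row["name"].strip()}


def upper_country(row):
    return {**row, "country": row["country"].upper()}


rules = [("trim_name", trim_name), ("upper_country", upper_country)]
source = [
    {"name": " Acme ", "country": "us"},
    {"name": "Globex", "country": "DE"},
]
cleaned, audit = apply_rules(source, rules)
```

Because the source records are never modified, a rule that turns out to be wrong can be edited and the whole load repeated. Once the rules are settled, the same list can be applied to each new record as it arrives.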

How Long Does a Data Cleanup Take?

With these tools, facts, platforms, and data realities in front of us, we know that ASUG Members will be asking how long their data cleanup process should take.

Is it a one-time process, or is it like dentistry, where we need continual checkups? Which type of data do we start with? Do we get to a point where all data is clean, and we get an award or safety rating? The answer to all of these questions is: it depends.

It depends on the size of your data estate. It depends on the types of apps you’re running. It depends on the compliance rules and regulations that apply to your industry. It depends on whether you’re migrating or simply upgrading. It depends on whether your project is a greenfield or a brownfield deployment that requires re-engineering of your customizations. Or it could be some combination of both green and brown.

How much time the cleanup will take ultimately comes down to the volume of data your organization has to process and how unstructured it is. Some data you may be happy to leave in the unstructured waters of the so-called data lake, and that’s fine if you understand your organization’s workflows well enough to separate the wheat from the chaff.

Your Current (and Future) Data Estate

We may be able to execute effective cleaning measures on the current data estate inside our organization as it stands this year. But it’s highly likely that we’ll start to ingest, interconnect with, and integrate with new data streams next year. And these may well need to go through the data wash before we can start working with them productively, safely, and confidently. The fact is, we’ll likely generate more data this year than the year before, and this will probably grow exponentially in the future.

Which type of unstructured data do we start the cleaning process with? That very much depends on the particularities of your business model. Most organizations will know where their mission-critical data resides (the stuff they need to clean first) versus their extraneous information streams.

When Your Data is Certifiably Clean

And finally, do we get to a point where all data is clean and we get a safety rating? The answer is that it’s a process, which means it is ongoing and never ends.

ASUG Members can clean their core data before, during, and after an SAP deployment, upgrade, or migration project. Yet data is always changing, which makes this a Sisyphean task. Acceptance of the issue is the first step, so let’s go boldly, and cleanly, forward with our data.

ASUG Members can learn more about data preparation before an SAP S/4HANA implementation by registering for the Start Your SAP S/4HANA Project Fearlessly webcast.