Data Cleansing

Get your business partner master data clean not once, but smart!

Data entry errors, system hiccups, duplicates, and inconsistencies all lead to inaccuracies and incompleteness, which have serious implications for your business. With a robust master data cleansing process in place, you can ensure that your data is as clean and accurate as possible. Not just once, but continuously.
A call to master data cleansing

Why your customer and vendor master data need to get clean:

59% of business partner records show critical violations
21% of business partner records become outdated within 1 year*
16% of business partner records are unintended duplicates
*Based on our data quality benchmarks
Towards clean master data

Data fit for purpose

Correct customer and vendor data is a prerequisite for proper invoicing, tax reporting to authorities, and reliable compliance checks. At the same time, manual data maintenance is time-consuming and error-prone due to the various country-specific reference data sources, as well as a volatile business landscape.

With our capabilities, master data cleansing becomes easier. By checking against 2,600 data quality rules and more than 70 reference data sources, CDQ gives you a comprehensive tool for handling any missing information in your business partner data, incorrect attributes, or duplicated records, making your data fit for purpose.

Use CDQ out-of-the-box services in a structured process for your master data cleansing: not just once, but continuously.

CDQ Get Clean App

Getting master data clean is just the beginning

Master data cleansing with the right approach

Get-clean projects that stick

Master data cleansing is a complex process that goes far beyond simply "cleaning up" data. It requires navigating disparate systems, aligning cross-functional priorities, and establishing clear data standards and ownership.

And even once your data is clean, it doesn’t stay that way for long. Business partner records are constantly changing: our 'Business Partner Update Monitoring' counts 43K changes every day in address information alone, including minor and major changes. That’s one change every two seconds, making it clear: a one-time clean-up isn’t enough.

CDQ offers an all-time-right approach designed to simplify this complexity, with the flexibility and expertise needed to ensure your data cleansing project delivers lasting value.

Getting it clean with CDQ: master data cleansing

CDQ Client Spotlight: Kuehne+Nagel
Effective data cleansing of 4 million global customer records
Speed
2 years to cleanse 80% of data
The first ever global process for clean-up of duplicate customer records covered the 29 largest countries
Trust
660,000 duplicates eliminated
equaling 17% of customer records in scope, which used to complicate daily operations and analytics
Efficiency
Lean review process
for 400 reviewers in various functions of 4 business units of Kuehne+Nagel

In the project, we not only eliminated thousands of duplicates, but also developed a sustainable cleansing approach, including global duplicate definitions, a cross-company review process, and an innovative review tool.

Dr. Andreas Nohn
Senior Data Scientist
CDQ Client Spotlight: Bayer AG
Effective data improvement for one of Germany's biggest mergers
Speed
10 weeks
1.3 million business partner records from Bayer and Monsanto merged and improved within 10 weeks
Trust
97%
of duplicates identified correctly
Efficiency
80,000 fewer duplicates
Duplicates can slow down decisions and processes

A merger of this size needed the best data management approach we know. With CDQ, we could deliver faster and better!

Gerhard Gripp
Global Data Lead M&A
At a glance

How CDQ can help you

  • Get clean and stay clean with our best-of-breed software solution and 18 years of hands-on experience in corporate data quality. Integrate CDQ All Time Right capabilities into your systems via APIs to enable continuous maintenance and improvement of your data stack.
  • We give you a comprehensive solution for managing the quality of business partner data, including data cleansing, data enrichment, data standardization, and data governance, covering not only syntax but also semantic checks. With 2,600+ data quality rules already in place, no configuration is required on your side.
  • A CDQ-supported get-clean project requires minimal effort on your side and is fully facilitated by our Customer Success Manager and Solution Architect. All steps in your master data cleansing project can be customized and adapted to your specific needs.

Clean master data? Absolutely. Master data cleansing? Not exactly a walk in the park.

Mastering data cleansing usually means wading through endless tables and systems and getting buy-in from multiple business processes, all to establish standards, priorities, and who's responsible for what. And if you're thinking about applying specific data cleansing services, they often lack the flexibility to handle your unique business needs and the fuzzy logic that comes with them.

It's time to enjoy a hassle-free clean-up


 

4 steps of master data cleansing

1

Data Preparation

Is the data extracted correctly? Are there any missing fields? Is the content in the right format and field? How do your data attributes map to the CDQ data model?

2

Data Quality Check

Configure master data validation and receive a data quality status report with identified violations per record, based on over 2,600 out-of-the-box data quality rules.

3

Duplicate Identification & Consolidation

Identification of potential duplicates based on the proven CDQ matching algorithm, which proposes a consolidated Golden Record.

4

Data Cleansing and Enrichment

Enrichment of missing identifiers and alignment of name/address data based on external reference data sources.
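To make the first two steps more concrete, here is a minimal sketch of field mapping followed by a rule-based quality check. The source field names, target attribute names, and the three rules are illustrative assumptions, not the CDQ data model or the CDQ rule set.

```python
# Hypothetical mapping from source-system columns to target attributes.
FIELD_MAPPING = {"NAME1": "name", "ORT01": "city", "LAND1": "country"}


def map_record(source):
    """Step 1: rename source fields to the target model and trim whitespace."""
    return {target: (source.get(src) or "").strip()
            for src, target in FIELD_MAPPING.items()}


# Step 2: each rule is (rule_id, predicate); the predicate returns True
# when the record passes. These three rules are examples only.
RULES = [
    ("NAME_PRESENT", lambda r: bool(r["name"])),
    ("CITY_PRESENT", lambda r: bool(r["city"])),
    ("COUNTRY_IS_ISO2", lambda r: len(r["country"]) == 2
        and r["country"].isalpha() and r["country"].isupper()),
]


def check_record(record):
    """Return the ids of all rules the record violates."""
    return [rule_id for rule_id, passes in RULES if not passes(record)]


record = map_record({"NAME1": " CDQ AG ", "ORT01": "", "LAND1": "ch"})
violations = check_record(record)
```

In this sketch, the record with an empty city and a lower-case country code ends up with two violations, which is the kind of per-record result a data quality status report would aggregate.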

Get clean at a glance

Clean data? Yes, please.

Master data cleansing is often a tedious search through countless tables and systems, involving at least three different business processes that need to agree on standards, priorities, and responsibilities. And if data cleansing services are considered, most of the time they are not flexible enough to take business-specific requirements and fuzzy logic into consideration.

 

Get-clean is one click away: get your free copy!

CDQ Get Clean

What's in it for you?

Cut costs of redundant records and inefficient processes by using advanced algorithms and tailored data quality rules.
Identify and flag records associated with natural persons, ensuring GDPR compliance and mitigating legal risks.
Prepare for SAP S/4HANA migration with clean data, lower costs and improve system performance after go-live.
CDQ duplicate check

Sustainable approach to duplicates

Tackle deduplication with CDQ All Time Right: designed for keeping your data clean now and in the future.

  • Flexible access: use our intuitive web apps or robust APIs to process over a million records, seamlessly integrated with your system, including SAP MDG and S/4HANA.
  • Domain knowledge: built with deep expertise in business partner master data and adaptable across various data types.
  • Country-specific configurations: Ready-to-use matching rules tailored to specific countries, ensuring accurate results based on local standards.
  • Customization: from data cleaning to advanced comparators like Levenshtein and phonetic matching, get full flexibility to tailor matching configurations to your specific needs.
  • Continuous improvement: machine learning improves matching accuracy over time, adapting to changing data patterns.
  • Rule-based Golden Record consolidation: merge matched records into one complete, enriched version to ensure data quality and consistency.
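
As an illustration of rule-based consolidation, the sketch below merges a group of matched records into one record. The survivorship rule used here (per attribute, take the non-empty value from the most recently updated record) and the field names are assumptions chosen for the example; they are not CDQ's actual consolidation rules.

```python
def consolidate(records):
    """Merge matched records into one record.

    Illustrative survivorship rule: for each attribute, take the value
    from the most recently updated record that has a non-empty value.
    """
    ordered = sorted(records, key=lambda r: r["updated"], reverse=True)
    attributes = {key for r in records for key in r if key != "updated"}
    golden = {}
    for attr in sorted(attributes):
        golden[attr] = next((r[attr] for r in ordered if r.get(attr)), None)
    return golden


older = {"name": "CDQ AG", "city": "", "vat_id": "VAT-123", "updated": "2023-01-01"}
newer = {"name": "CDQ", "city": "St. Gallen", "updated": "2024-01-01"}
golden = consolidate([older, newer])
```

Here the newer record contributes the name and city, while the VAT identifier (a dummy value) survives from the older record because the newer one does not carry it.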
Ready to automate and improve your master data cleansing approach?
See key product features in action and challenge us with your questions.
Deduplication powered by CDQ

Full flexibility in configuration for your exact needs

A matching configuration defines:

  • which attributes are compared, and which comparator is used for each attribute
  • what impact identical or different values of these attributes have on the matching score (confidence scores)
  • how values are temporarily transformed before comparison (which cleaners are used)
  • which threshold the matching score of a potential match must exceed for it to be considered a duplicate

CDQ offers a fully flexible configuration:

  • Any set of attributes can be used for identifying duplicates
  • Cleaners and comparators can be individually configured
  • Thresholds can be individually configured

You can use standard configurations and optimize them iteratively for your specific use case.
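
The sketch below shows how such a configuration could fit together: a cleaner, a comparator, and a weight per attribute, plus a threshold on the combined score. The configuration layout, the weighted-sum scoring, and all names are assumptions made for illustration; the actual CDQ configuration format and scoring model may differ.

```python
from difflib import SequenceMatcher


def lower_trim(value):
    """Cleaner: lowercase and collapse whitespace."""
    return " ".join(value.lower().split())


def exact(a, b):
    """Comparator: 1.0 only for identical values."""
    return 1.0 if a == b else 0.0


def fuzzy(a, b):
    """Comparator: similarity ratio from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical configuration: attribute -> cleaner, comparator, weight.
CONFIG = {
    "threshold": 0.8,
    "attributes": {
        "name": {"cleaner": lower_trim, "comparator": fuzzy, "weight": 0.6},
        "city": {"cleaner": lower_trim, "comparator": fuzzy, "weight": 0.2},
        "country": {"cleaner": lower_trim, "comparator": exact, "weight": 0.2},
    },
}


def matching_score(rec_a, rec_b, config=CONFIG):
    """Weighted sum of per-attribute similarities on cleaned values."""
    total = 0.0
    for attr, spec in config["attributes"].items():
        a = spec["cleaner"](rec_a.get(attr, ""))
        b = spec["cleaner"](rec_b.get(attr, ""))
        total += spec["weight"] * spec["comparator"](a, b)
    return total


def is_duplicate(rec_a, rec_b, config=CONFIG):
    """A pair counts as a duplicate when its score exceeds the threshold."""
    return matching_score(rec_a, rec_b, config) > config["threshold"]
```

Tuning then means adjusting weights, swapping comparators or cleaners per attribute, and moving the threshold until the results fit the use case.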

Cleaners

For each input data field, a cleaner can be used. A cleaner transforms or normalizes data of the field before it is compared. Thus, cleaners help to improve match scores by removing characters or accents which are not meaningful in the matching context. Standardize and harmonize for better matching results.

These cleaners are used to normalize values, i.e. remove accents, whitespace, and case differences:

  • Lower case normalize: Most widely used cleaner. It lowercases all letters, removes whitespace characters at beginning and end, and normalizes whitespace characters in between tokens. It also removes accents, e.g. turning é into e, and so on.
  • Non-character cleaner: Removes any characters that are not Latin characters including numbers.
  • Replace cleaner: Replaces strings by other strings. Patterns may also comprise regular expressions, and character case can be ignored.
  • Strip nontext characters cleaner: Removes non-text characters. Specifically, it strips control characters and special symbols.
  • Trim cleaner: Trims whitespace characters at the beginning and end of the input string.
  • Regular expression cleaner
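
As a rough illustration of the lower case normalize cleaner described above, the following sketch uses only the Python standard library. The function name is hypothetical and the code is an approximation of the described behavior, not CDQ's implementation.

```python
import unicodedata


def lower_case_normalize(value):
    """Lowercase, strip accents, trim, and collapse inner whitespace."""
    # NFD splits accented letters into base letter + combining mark,
    # so the marks can be dropped.
    decomposed = unicodedata.normalize("NFD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # split() without arguments trims and collapses runs of whitespace.
    return " ".join(without_accents.lower().split())


cleaned = lower_case_normalize("  Société   Générale ")
```

Note that letters without a canonical decomposition, such as ß or ø, pass through unchanged in this sketch and would need an explicit replacement table.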

Specialized cleaners are also available, for example a city cleaner or a legal form cleaner:

Legal Form Cleaner 

  • Special cleaner for business partner names. The cleaner identifies a legal form in the input string and cuts it.

  • Example: In CDQ AG Factory St. Gallen, AG is identified as legal form and only CDQ remains for matching.

  • More than 1,000 legal forms worldwide with more than 2,500 variations of abbreviations (Limited, Societe Anonyme, Aktiengesellschaft, Incorporation, etc.)
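
A minimal sketch of the idea follows. It assumes, based on the example above, that the name is cut at the first legal form found, and it uses a tiny hand-picked list of legal forms rather than the full catalogue. Function and variable names are hypothetical.

```python
import re

# Tiny illustrative sample; a real catalogue would be far larger.
LEGAL_FORMS = [
    "Aktiengesellschaft", "Societe Anonyme", "Limited",
    "GmbH", "Ltd", "Inc", "AG", "SA",
]

# Longest forms first so that longer names win over their abbreviations.
_LEGAL_FORM = re.compile(
    r"\b(?:"
    + "|".join(re.escape(f) for f in sorted(LEGAL_FORMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def strip_legal_form(name):
    """Cut the name at the first legal form (assumed behavior).

    Names that start with a legal form yield an empty string here;
    a production cleaner would need a rule for that case.
    """
    match = _LEGAL_FORM.search(name)
    if match is None:
        return name.strip()
    return name[:match.start()].strip()


core_name = strip_legal_form("CDQ AG Factory St. Gallen")
```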

International phone numbers cleaner 

  • Standardizes phone numbers to one common international format, so that e.g. the following numbers are represented identically:

  • 0049 55301400, 

  • +49 55301400, 

  • 49 55 301400, 

  • +49 (0) 55301400 
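
The normalization of the four variants above can be sketched as follows. This is a simplified illustration with a hypothetical function name and a hypothetical `default_country_code` parameter; it assumes that a number without any prefix already starts with its country code, and it ignores country-specific numbering rules that a production cleaner would have to respect.

```python
import re


def clean_phone(raw, default_country_code="49"):
    """Normalize a phone number to '+<country code><number>' (simplified)."""
    # Drop the optional trunk prefix written as "(0)".
    value = raw.replace("(0)", "")
    # Keep digits only.
    digits = re.sub(r"\D", "", value)
    if digits.startswith("00"):
        # International call prefix "00" is equivalent to "+".
        digits = digits[2:]
    elif digits.startswith("0"):
        # National format: replace the trunk "0" with the default country code.
        digits = default_country_code + digits[1:]
    return "+" + digits


variants = ["0049 55301400", "+49 55301400", "49 55 301400", "+49 (0) 55301400"]
normalized = {clean_phone(v) for v in variants}
```

With this sketch, all four variants collapse to the single value `+4955301400`.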

Country cleaner 

  • Special cleaner for removing country names from an input value. In some cases it is required to match values such as CDQ AG and CDQ Deutschland AG.

  • Takes the input value and searches for any term that represents a country. This information is then removed from the input, so that CDQ Deutschland AG is reduced to CDQ AG before it is compared with CDQ AG.
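
A minimal sketch of this behavior is shown below. The short list of country terms and the function name are illustrative assumptions; a real cleaner would rely on a much larger multilingual list.

```python
import re

# Illustrative sample of country terms in different languages.
COUNTRY_TERMS = ["Deutschland", "Germany", "Schweiz", "Switzerland", "France"]

_COUNTRY = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in COUNTRY_TERMS) + r")\b",
    re.IGNORECASE,
)


def strip_country(value):
    """Remove country terms and collapse the remaining whitespace."""
    return " ".join(_COUNTRY.sub(" ", value).split())


cleaned = strip_country("CDQ Deutschland AG")
```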

Comparators

Comparators compare two string values and produce a similarity measure. To compare different kinds of values differently, the CDQ duplicate identification engine can use the following comparators among others: 

Apply individual comparisons for any attribute for highly accurate matching 

The Exact Comparator, categorized under "String Comparison," delivers precise data comparisons. It treats attributes as identical only when they match exactly. For example, when comparing "ABC Corporation" to "ABC Corp," it treats the strings as different and thus lowers the overall score.

"Geospatial Comparison" is ideal for location-based data. It calculates the distance between geographic coordinates to determine location proximity accurately. For instance, when comparing the coordinates of New York (Latitude 40.7128, Longitude -74.0060) and Los Angeles (Latitude 34.0522, Longitude -118.2437), it provides the distance between these locations on Earth's surface. 

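The distance calculation behind a geospatial comparison can be sketched with the haversine formula, which gives the great-circle distance between two coordinates. Whether CDQ uses this exact formula is an assumption; the function name and the mean Earth radius constant are chosen for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two coordinates."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# New York and Los Angeles, using the coordinates from the text above.
distance = haversine_km(40.7128, -74.0060, 34.0522, -118.2437)
```

For the New York and Los Angeles coordinates this yields a distance of somewhat more than 3,900 km. A comparator would then map such a distance to a similarity, for example by treating everything beyond a maximum distance as a non-match.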
  • Categorized as "Phonetic Comparison," the Soundex Comparator utilizes a phonetic algorithm to compare textual data based on their pronunciation. It excels at recognizing similar-sounding names or terms, enhancing data accuracy. For example, it would identify a similarity between "Smith & Sons Co." and "Smyth and Sons" due to their phonetic resemblance. 

  • The Metaphone Comparator employs the Metaphone phonetic algorithm to compare textual data. It transforms text into phonetic representations, making it effective for identifying similar-sounding names and terms. For example, it would compare "Phoenix Electric" and "Fenix Elektrik" based on their phonetic representations, enhancing data accuracy during comparisons. 

  • Jaro-Winkler Comparator specializes in comparing short strings like names. It calculates the Jaro-Winkler similarity, a metric designed for record linkage and deduplication. When comparing "Johnson & Johnson" and "Johnston & Jonson," it produces a Jaro-Winkler score that indicates their similarity.

  • Levenshtein Comparator quantifies string similarity by measuring the number of edit operations needed to transform one string into another. This makes it highly reliable for fuzzy string comparisons. For example, when comparing "Google Inc." to "Goggle Incorporated," it counts the edit operations required. 

  • Longest Common Substring Comparator offers in-depth analysis by identifying the longest common substrings. It repeatedly drills down to the smallest shared elements. When comparing "Tech Solutions Group" and "Solutions Group Inc.," it identifies the longest common substring, such as "Solutions Group." 
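
Two of the comparators listed above, Soundex and Levenshtein, are compact enough to sketch in plain Python. The code below implements the standard American Soundex code for a single word and the classic dynamic-programming edit distance, plus a normalized similarity. The function names and the normalization by the longer string's length are assumptions for illustration, not CDQ's implementation.

```python
_SOUNDEX_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                   "4": "L", "5": "MN", "6": "R"}
_SOUNDEX_CODES = {c: d for d, letters in _SOUNDEX_GROUPS.items() for c in letters}


def soundex(word):
    """American Soundex code of a single word, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset the previous code
    return (result + "000")[:4]


def levenshtein(a, b):
    """Number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Edit distance scaled to the range 0..1 (1.0 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

In this sketch "Smith" and "Smyth" receive the same Soundex code, and "Google" and "Goggle" are one edit apart. For multi-word names, a phonetic comparator would typically encode each token separately.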

The Q Gram Comparator uses q-grams (substrings of length q) of field values to calculate similarity. It works well in scenarios where token order doesn't matter. For instance, when comparing "Microsoft Corporation" and "Corp Microsoft" using 2-grams, it calculates their similarity from the 2-grams the two values share.
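
A q-gram comparison can be sketched as follows. The similarity formula used here (the Dice coefficient over sets of q-grams) and the function names are assumptions for illustration; other overlap measures are equally possible.

```python
def qgrams(value, q=2):
    """Set of all substrings of length q in the lowercased value."""
    text = value.lower()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def qgram_similarity(a, b, q=2):
    """Dice coefficient over the q-gram sets of a and b (0..1)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a and not grams_b:
        return 1.0  # both values too short to produce any q-gram
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


score = qgram_similarity("Microsoft Corporation", "Corp Microsoft")
```

Because the q-grams inside each token are the same regardless of where the token appears, reordered values still share most of their q-grams and score well.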

Discover the smart data journey with CDQ

Clean your data once and ensure new records are entered correctly to stay clean over time.

CDQ All Time Right is the first SaaS solution built to give you complete command over your master data. It enables cross-functional value throughout your entire organization by keeping your business partner master data consistently accurate, verified, and reliable. Always. Continuously. All time right.  


Leading corporations already rely on our solutions

BASF
Bayer AG
BOSCH
Dovista
Dräger
Evonik
Ferring
Fresenius Kabi
Johnson & Johnson
Kuehne+Nagel
LanXess
Nestlé
Novartis
Rheinmetall
Sartorius
Schaeffler
Schwarz
SEW
Siemens
Swiss Krono
Syngenta
Takeda
Tetra Pak
TUV SUD
Unilever
Vaillant
ZF