Identity - Referential Matching

Why Referential Matching Is Superior To Standard Entity Analytics Techniques

The Challenge: Master Data Management

Master Data Management (MDM) is often seen as a massive challenge. Today’s blog article may make you think that MDM is actually not so complex. Referential Matching is the best-in-class solution for uniquely determining the identity of a business entity.

Cleaning Your Systems Of Record

In my first week at Dun & Bradstreet (D&B) some years ago, I had the opportunity to spend some time with a D&B employee who was a month away from retirement. He was the company’s long-time expert in Entity Matching.

My new colleague – at least for one month until he retired – welcomed me and demonstrated his work to me. He was working with a client that held records of roughly 150,000 of its own clients (for the sake of the example, these could be clients or suppliers). Within this list of 150,000 entity records there was some level of redundancy, as well as some old clients that no longer existed (they had gone out of business), but there was no clarity on the size of the problem: how much redundancy? How many out-of-business clients? D&B had been contracted to uniquely identify each entity in this list, and to detect redundancy, out-of-business entities, and linked entities. This would give the D&B client insight into who its clients really are.

Did I Ask A Naïve Question?

I looked at my new (well, older) colleague, intrigued. Coming from the IT industry, where I had seen that MDM projects at large organizations have long implementation periods and require large budgets, I knew that uniquely identifying 150,000 entities and detecting redundancies in datasets is not a trivial task. A dataset with 150,000 entities is a fair-sized dataset, yet not a massive one. By my frame of reference, this would be a project of several months (depending on the team size). And so I looked at my colleague and asked: “How long will this project take?” Now it was his turn to look at me with surprise. His response was: “What do you mean? I just need to press the button.” He kept looking at me as if I had asked a very strange question. Had I asked a naïve question? Obviously he did not understand the source of my doubts. And so I responded: “Of course you can identify some percentage of the entities automatically, but the majority would require manual review, because when entity details differ somewhat, entity analytics software fails to detect the redundancy.” I spoke based on my own experience so far. But then he showed me that the vast majority – I think it was about 90% of the records – had been identified with high confidence through the automated matching process.

My next question was: How is this possible?

In order to understand how this is possible, one needs to understand the difference between standard Entity Matching (or Entity Analytics) techniques and the Dun & Bradstreet Referential Matching technique. The two are inherently different.

Standard Entity Matching Methods

Entity Matching – The Regular Approach

Many MDM / Entity Analytics software tools on the market provide Entity Matching capabilities. These would typically look at entities in your database, and try to detect similarities. If the software detects two similar entities, it labels them as the same single entity.

How does that work? Imagine that an entity has a number of attributes, for example (if we consider business entities): company name, company trade name, physical address, mailing address, telephone number, CEO name, CFO name, local chamber of commerce registration number, VAT number etc. If entity analytics software finds two entities with similar attributes, it flags them as the same entity. For example, you may have the following two entities in your database (all company details used here are fictional, and used for demonstration purposes only):

| Entity Attribute | Entity #1 | Entity #2 |
|---|---|---|
| Name | Belgian Chocolate Factory | Belgian Chocolate Factory |
| Address | Rue de la Loi 1000, Brussels, Belgium | Brussels, Belgium |
| Telephone Number | +32-2-1234567 | +32-2-1234567 |

Regular entity analytics will detect that these two entries are the same. But the same software will fail to detect the redundancy in many other cases. Let’s look at some examples. The examples are simplified for the sake of understandability; in reality the software compares a larger number of attributes than presented in the examples.
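To make the attribute-comparison idea concrete, here is a minimal sketch in Python. It is not how any particular product works. The token-overlap (Jaccard) scoring, the averaging over shared attributes, and the `MATCH_THRESHOLD` value are all assumptions chosen for illustration, and the second record is assumed to hold only the city and country as its address.

```python
import re

MATCH_THRESHOLD = 0.7  # hypothetical cut-off, chosen only for this illustration


def tokens(value):
    """Lowercase a value and split it into a set of word tokens."""
    return set(re.findall(r"\w+", value.lower()))


def attribute_similarity(a, b):
    """Jaccard similarity of two values: shared tokens divided by all tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def record_similarity(rec1, rec2):
    """Average the attribute similarity over attributes present in both records."""
    shared = [key for key in rec1 if key in rec2]
    if not shared:
        return 0.0
    return sum(attribute_similarity(rec1[k], rec2[k]) for k in shared) / len(shared)


def same_entity(rec1, rec2):
    """Flag two records as the same entity when they are similar enough."""
    return record_similarity(rec1, rec2) >= MATCH_THRESHOLD


entity_1 = {
    "name": "Belgian Chocolate Factory",
    "address": "Rue de la Loi 1000, Brussels, Belgium",
    "telephone": "+32-2-1234567",
}
entity_2 = {
    "name": "Belgian Chocolate Factory",
    "address": "Brussels, Belgium",
    "telephone": "+32-2-1234567",
}
# A record that uses the trade name instead of the formal name.
trade_name_record = {
    "name": "Godida",
    "address": "Rue de la Loi 1000, Brussels, Belgium",
}

print(same_entity(entity_1, entity_2))           # identical name and phone carry the match
print(same_entity(entity_1, trade_name_record))  # the names share no tokens
```

With these assumptions the first pair scores about 0.76 and is flagged as one entity, while the trade-name pair scores 0.5 and is not. Lowering the threshold would catch the second pair, but it would also merge genuinely different companies that happen to share an address.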

Where Regular Entity Matching Fails – Different names

If your database includes one record with the company’s formal name (in this case: Belgian Chocolate Factory) and one record with the company’s trade name (in this case: Godida), entity analytics software will fail to detect that the two records refer to the same entity. Often the software flags such cases as two companies registered at the same address, resulting in redundancy in your database.

| Entity Attribute | Entity #1 | Entity #2 |
|---|---|---|
| Name | Belgian Chocolate Factory | Godida |
| Address | Rue de la Loi 1000, Brussels, Belgium | Rue de la Loi 1000, Brussels, Belgium |

Where Regular Entity Matching Fails – Linguistic Diversity

Still using the Belgian example, consider the situation in a country like Belgium where addresses are used in two languages: French and Flemish. You may then have two records with the same address, but in two different languages. Consequently, entity analytics software may consider the addresses to be different. The example below uses the Flemish and the French versions of the same address.

| Entity Attribute | Entity #1 | Entity #2 |
|---|---|---|
| Name | Belgian Chocolate Factory | Godida |
| Address | Rue de la Loi 1000, Brussels, Belgium | Wetstraat 1000, Brussel, België |

Where Regular Entity Matching Fails – Old Details

Have you ever had outdated company name and address details in your database? (Most organizations will answer yes.) Company names, addresses and other details change regularly, and often your clients/suppliers do not even inform you about such changes. You may then have two records of the same company, but with different addresses or names: an old address / name, and a new address / name. This again is a problem for entity analytics software, because the two addresses really are different. Consequently, entity analytics software would flag these two records as different entities.

| Entity Attribute | Entity #1 | Entity #2 |
|---|---|---|
| Name | Belgian Chocolate Factory | Godida |
| Address | Rue de la Loi 1000, Brussels, Belgium | Rue du Trône 2000, Brussels, Belgium |

Where Regular Entity Matching Fails – Records With Different Attributes

In this example you have two records in your database with different attributes of the same entity. Entity analytics software compares the available attributes, but because different records hold different attributes, there is no good match.

Entity Attribute Entity #1 Entity #2
Name Belgian Chocolate Factory BCF
Trade Name
Physical Address Rue de la Loi 1000, Brussels, Belgium
Mailing Address Belgiëlei 700, Antwerpen, België
Fax Number +32-2-7654321
Telephone Number +32-2-1234567
CEO Name Jean Le Chocolatier

Referential Matching: Comparing Entities To A Reference Database

How Referential Matching Works

Referential matching (RM) inherently differs from standard entity matching methods. Instead of comparing entities in your database to each other, referential matching compares all of them to a reference database that is considered complete. For each entity in your database, referential matching finds the one matching entity in the reference database and assigns your record that entity’s unique ID from the reference database. When the referential matching engine assigns the same unique entity ID to two entities in your database, these two entities are in fact the same entity.

How Referential Matching Deals With The Challenges of Standard Entity Matching

The reference database is considered to be complete. It includes for example the current official name of a company, a previous official name of a company (some companies regularly change their names), the company’s trade name (many companies have trade names; for example, IBM is a trade name while the official name of the company is International Business Machines Corporation), the physical address (current and previous), mailing address (current and previous), CxO names, VAT number, local Chamber of Commerce number etc.

Using the above examples, the reference database would contain all this information:

| Entity Attribute | Value |
|---|---|
| Unique Entity ID | 123-456-789 |
| Name | Belgian Chocolate Factory |
| Trade Name | BCF, Godida |
| Physical Address (current, French) | Rue de la Loi 1000, Brussels, Belgium |
| Physical Address (current, Flemish) | Wetstraat 1000, Brussel, België |
| Physical Address (previous, French) | Rue du Trône 2000, Brussels, Belgium |
| Physical Address (previous, Flemish) | Troonstraat 1000, Brussel, België |
| Mailing Address (current) | Belgiëlei 700, Antwerpen, België |
| Mailing Address (previous) | Rue du Trône 2000, Brussels, Belgium |
| Telephone Number | +32-2-1234567 |
| Fax Number | +32-2-7654321 |
| CEO Name | Jean Le Chocolatier |
| VAT Number | BE1234567890 |
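The mechanism can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the D&B matching engine. The exact-match lookup on normalized values, the `min_hits` rule (at least two attributes must point to the same reference entity) and the field names are hypothetical. The reference record uses a subset of the fictional values from the table above, and the "Ghent Bicycle Works" record is invented to show a non-match.

```python
import re
from collections import defaultdict


def normalize(value):
    """Lowercase and strip punctuation so that formatting variants compare equal."""
    return " ".join(re.findall(r"\w+", value.lower()))


# One reference entity holding every known variant of its attributes.
REFERENCE = [
    {
        "entity_id": "123-456-789",
        "names": ["Belgian Chocolate Factory", "BCF", "Godida"],
        "addresses": [
            "Rue de la Loi 1000, Brussels, Belgium",   # current, French
            "Wetstraat 1000, Brussel, België",         # current, Flemish
            "Rue du Trône 2000, Brussels, Belgium",    # previous
            "Belgiëlei 700, Antwerpen, België",        # mailing
        ],
        "phones": ["+32-2-1234567", "+32-2-7654321"],
    },
]

# Which reference attribute each field of "your" records is checked against.
FIELD_TO_KIND = {"name": "names", "address": "addresses",
                 "telephone": "phones", "fax": "phones"}


def build_index(reference):
    """Map every (kind, normalized value) to the entity IDs that own it."""
    index = defaultdict(set)
    for entity in reference:
        for kind in ("names", "addresses", "phones"):
            for value in entity[kind]:
                index[(kind, normalize(value))].add(entity["entity_id"])
    return index


def resolve(record, index, min_hits=2):
    """Return the entity ID supported by at least `min_hits` attributes, else None."""
    votes = defaultdict(int)
    for field, value in record.items():
        kind = FIELD_TO_KIND.get(field)
        if kind is None:
            continue
        for entity_id in index.get((kind, normalize(value)), ()):
            votes[entity_id] += 1
    if not votes:
        return None
    best_id, hits = max(votes.items(), key=lambda item: item[1])
    return best_id if hits >= min_hits else None


index = build_index(REFERENCE)
my_records = [
    {"name": "Belgian Chocolate Factory", "address": "Rue de la Loi 1000, Brussels, Belgium"},
    {"name": "Godida", "address": "Wetstraat 1000, Brussel, België"},
    {"name": "Godida", "address": "Rue du Trône 2000, Brussels, Belgium"},
    {"name": "BCF", "fax": "+32-2-7654321"},
    {"name": "Ghent Bicycle Works", "address": "Korenmarkt 1, Gent, België"},
]

# Records that receive the same ID are duplicates of one another.
groups = defaultdict(list)
for position, record in enumerate(my_records):
    groups[resolve(record, index)].append(position)
print(dict(groups))
```

The first four records never resemble each other closely, yet each one matches the reference entity on two attributes, so all four receive the ID 123-456-789. The fifth record is not in the reference data and stays unresolved, which is why completeness of the reference database matters so much.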

Referential Matching: Critical Success Factors

We have established that regular entity matching (entity analytics) fails where the records in your database do not have very similar details of a company (entity). Referential matching does not suffer from such data quality challenges. Instead, it compares each entity in your databases to a complete reference database, where the most complete dataset is available.

Several critical success factors must be present for referential matching to be successful.

Completeness Of The Reference Database

First, the reference database must be as complete as possible, both in its breadth and in its depth. Breadth refers to the number of entities covered in the database. If an entity in your own database does not appear in the reference database, the matching will fail. It is therefore crucial to have a complete – or near-complete – reference database. Depth refers to the attributes of entities in the reference database. Here too, if the reference database does not cover attributes that you use in your own database, or if the reference database is not up to date, the matching will fail.

A Unique Entity ID That Is Really Unique

Second, the reference database needs to have a unique ID that is really unique. It’s easy to assign a so-called unique ID to every entity without being sure that there is no redundancy. But this defeats the purpose. The reference database should have a unique entity ID while having zero – or close to zero – redundancy. I talk about “close to zero” redundancy and about “near complete” database because in the realm of data one can hardly ever promise 100% completeness or correctness. It’s practically impossible.

Performance

Third, performance is a critical non-functional requirement. Imagine that you need to search 150,000 entities (or even larger numbers) in a reference database that contains hundreds of millions of entities. Such a search could take a very long time. A naive approach that compares every input record with every reference entity quickly becomes unworkable: for 150,000 records and a few hundred million reference entities, that means tens of trillions of comparisons. Therefore, smart search and indexing algorithms must ensure a response time that makes users smile rather than lose their patience.
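One common way to avoid comparing everything with everything is an inverted index: each reference entity is indexed by the tokens in its name, and a query is compared only with the entities that share at least one token with it. The sketch below shows that general idea on toy data. It is an assumption for illustration and says nothing about the patented indexing D&B actually uses. All entity IDs and names other than Belgian Chocolate Factory are invented, and the figure of 300 million reference entities is used only for the arithmetic.

```python
import re
from collections import defaultdict


def tokens(value):
    """Lowercase a value and split it into a set of word tokens."""
    return set(re.findall(r"\w+", value.lower()))


# Toy reference database: entity ID -> registered name (all fictional).
reference = {
    "123-456-789": "Belgian Chocolate Factory",
    "222-222-222": "Antwerp Diamond Traders",
    "333-333-333": "Ghent Bicycle Works",
    "444-444-444": "Brussels Chocolate Imports",
}


def build_token_index(reference):
    """Inverted index: token -> set of entity IDs whose name contains it."""
    index = defaultdict(set)
    for entity_id, name in reference.items():
        for token in tokens(name):
            index[token].add(entity_id)
    return index


def candidates(query_name, index):
    """Entity IDs sharing at least one name token with the query."""
    found = set()
    for token in tokens(query_name):
        found |= index.get(token, set())
    return found


token_index = build_token_index(reference)
shortlist = candidates("Belgian Chocolate Factory NV", token_index)
print(sorted(shortlist))  # only these entities need a detailed comparison

# Cost of the naive approach: every input record against every reference entity.
brute_force_comparisons = 150_000 * 300_000_000
print(f"{brute_force_comparisons:.1e} pairwise comparisons without an index")
```

Here the query is compared in detail with two of the four reference entities instead of all of them. On a real reference database the saving is far larger, because a dictionary lookup per token replaces a scan of hundreds of millions of records.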

Best Results Achieved Fast

Referential Matching: The ROI (Return On Investment)

When all of the above is available, referential matching will solve the master data challenge in your B2B databases (i.e. databases with registrations of companies) fast, with high precision and at lower cost than regular entity analytics software. The lower cost stems from a shorter project duration with substantially less manual work. Typical project duration is weeks or months, not months or years.

The Greatest Ideas Are The Simplest. But…

William Golding wrote in the book Lord of the Flies “The greatest ideas are the simplest”. I feel that this sentence fits the challenge discussed here very well. The problem domain can be of massive magnitude, and the solution is so simple. If it’s so simple, why didn’t many providers think about it? Why doesn’t each entity analytics software package offer this capability? The answer lies in the Critical Success Factors listed above. There is only one provider worldwide that achieved absolute superiority in all the critical success factors listed above. That is Dun & Bradstreet, which is known under different names in some regions, for example Altares in France, Altares Dun & Bradstreet in the BeNeLux region and Bisnode in Germany.

The Unique Position of The Dun & Bradstreet WorldWide Network

The D&B Worldwide Network operates globally and offers the most complete B2B database worldwide (with more than 300 million business entities at the time of writing this article, and growing), with the richest history (D&B has existed since 1841). Each entity in the D&B global database has a unique ID, referred to as the D‑U‑N‑S Number. This ID is a de facto standard with global coverage. The D‑U‑N‑S Number is used by approximately 90% of the Fortune 500 companies. Since D&B is the only provider that offers such matching quality, one can understand why the D‑U‑N‑S Number is really unique, and why there is close to zero redundancy in the D&B WorldWide Network database. Last but not least, D&B owns dozens of patents, including patents for matching and indexing, which are the foundation of entity matching and allow the fast response time of the matching engine. Those are all major entry barriers for other providers, leaving the Dun & Bradstreet WorldWide Network as the de facto sole provider of high-quality referential matching.

Am I Biased?

As I believe in openness, I will mention here, for readers who do not know me, that at the time of writing this article I am employed by one of the Dun & Bradstreet WorldWide Network (WWN) companies. Does that mean that I am biased? Or that the previous paragraph is a marketing pitch? In all honesty, the answer is no. I believe strongly in this unique capability of Dun & Bradstreet, and I am convinced that the D&B WWN possesses all the critical success factors listed above. I am glad to work for a company in whose offerings I strongly believe. But beyond my belief in the D&B Referential Matching capabilities: having been involved in MDM projects without this capability (prior to joining D&B), I see very concretely the immense potential of such a capability.

Referential Matching Combined with Entity Analytics and MDM Software

Can referential matching be combined with entity analytics and MDM software solutions? The answer is yes. Referential matching will solve many of the complexities around business entities, and will speed up the broader software implementation project. In fact, successful implementations always embed referential matching into MDM solutions, because the matching is not a one-time-only effort. After all, your data keeps evolving every day.

Next Steps

Do you have redundancy within your IT systems? Does your company use multiple systems of record without a helicopter view of a single entity? Is there inconsistency in how you capture entities in your databases? Have you not cleaned your CRM or other B2B / G2B systems for a while? If you have answered yes to any of these questions, Referential Matching can help you. For details, contact your local Dun & Bradstreet WorldWide Network partner. If you’re interested in my advice, contact me directly.

 


About the author

Ziv Baida
