For the past few weeks, I’ve been looking at writing VBA code to validate your Excel data. I covered implementing a validation harness that you can use on any Excel file, and then looked at using it to validate phone numbers. I also used that same harness to automatically replace text in a cell as the user types.
Today I’m going to use that same harness to check for nearly identical values entered into a column. I already talked about using Excel validation to catch exact duplicates, but now I’m going to cover the scenario where someone types in something that is nearly identical. For example, if they type in “ABC Company”, and then “ABC Co.”, those won’t be caught by the validation I implemented using the Data Validation feature. I’m going to use VBA code to search for values that are nearly identical, including a few different variations.
Catching Close Matches
Here's a sample dataset that I’ll be working with:
When a new vendor name is typed in on row 8, I’ll run back through the previous entries on the list, and check to see if the new name is a close match for any name that’s already on the list.
First, let me define what I mean by “close match”. I’ve got “ABC Company” on row 2. What variation on that name might a user type on row 8? Off the top of my head, I can think of:
A B C Company
I’m sure there are countless other variations that could be entered, but my goal is to figure out the most commonly used alternatives, and trap those. I’ll never catch every possible variation, but I can capture a lot of them.
Using the examples I listed above, I’ll make some general rules about near-matches. Any two names are considered duplicates if:
They are exactly the same when punctuation is removed. “A.B.C. Company” matches “ABC Company”.
They are exactly the same when spaces are removed “A B C Company” matches “ABC Company”.
They are exactly the same when I replace “Company” with “Co”. “ABC Company” matches “ABC Co”.
They are exactly the same when I replace “Incorporated” with “Inc”. “ABC Incorporated” matches “ABC Inc”.
They are identical but with case differences. “abc company” matches “ABC COMPANY”.
For today’s purposes, I’ll stick with those rules, but of course, you could extend that list quite a bit.
Looking at those rules, here are the steps I’m going to take in my VBA code:
Remove periods, commas and apostrophes
Replace “Company” with “Co”
Replace “Incorporated” with “Inc”
Replace all upper-case letters with lower-case letters
I’ll include this logic in a loop that runs through all previous entries on the list, and build a list of potential duplicates. After I’ve checked over the entire list, I’ll present the user will any potential duplicates I’ve found.
Step 1 is to add in the validation harness that I’ve covered previously, and save my file as a marco-enabled workbook. My code looks like this:
For an explanation of this code, refer back to my post about the Validation Harness.
Step 2 is to apply all of my rules to the contents of the name that the user has typed in. The “LastCellChanged” variable refers to whichever cell the user just finished typing in, so I’ll work on that variable:
Step 3 is to add a loop that will cycle through all previously-entered vendor names. I’ll check the row number on that cell, and then loop through all vendors names above that. Here’s my loop:
I start on row 2, since the headers are on row 1. I proceed down until I reach the row above the one that the user is working on. I’ll apply each of my rules to that value, and then check it against the NewVendorName that I saved:
Now my last step is to build a message that I will display to the user after the loop has finished:
Does It Work?
Let’s see how this works.
Oops. Did you see what went wrong? It worked great for “ABC Co.”, picking up the duplicate “ABC Company”. It also worked for “ABC Inc.” and “ABC Incorporated”.
But when I typed in “A B C Company”, I didn’t get any messages.
Because I forgot one of my rules. I said that I need to remove all spaces before I compare the two values, but I forgot to add the code for that.
This is why we test, people!
Here’s the corrected code:
And here it is in action:
We’ve now got some decent validation that’s going to catch a lot of potential duplicates, and of course, you could extend this code to do a more extensive check of your data. But this is a good start.
In my next VBA post, I’m going to talk about what happens in your Excel code when there is text in a field that should be numeric. We’ll look at the errors that may come up in your code, and how to circumvent them. To be notified when that post goes live, subscribe using the form below.
If you’ve tried this code, or something similar, I’d love to hear your experiences, so drop a comment below.
Want help with your VBA code in Excel or Access? Let’s set up a call to discuss your project. I can pick up where you left off, or start from scratch, if that’s what you need. Email me at email@example.com, or call 978-582-9550.
About the Author
Kimberlee Martin is the owner of North Port Solutions and has 30 years of IT experience. She's worked with several programming languages and database tools over the years, with her favorites being Microsoft Excel VBA, SQL Server, and Visual C#. Her passion is helping small businesses gain insights into their business with effective reporting and data management.
Contact Kim at firstname.lastname@example.org.
© North Port Solutions, LLC, 2022. All rights reserved.