Using Access to Clean Excel Data, Part 2


In a previous post, I talked about using Access tools to clean up your Excel data. Here, I’ll continue that thought.


Let’s assume that you’ve got some Excel data, and it’s got some problems:




As I mentioned before, it’s a small dataset, and you wouldn’t need Access to help clean up this data. But for today’s discussion, let’s assume that you have a much larger dataset that includes many of these errors. In part 1, I imported this data into Access, and built an Update query to replace the spelled-out state names with state abbreviations. In this post, we’re going to tackle the phone number, correcting the formatting so that they are all displayed the same.


Expected Format

There are many valid phone number formats, and you'll need to settle on one of them for your data. Let’s assume that we’ve decided that we’ll use this format:


999-555-1212

No spaces. No parenthesis. No extensions. No country code or “1-“. Just a 10-digit phone number, with dashes.


If you’re familiar with Microsoft Access, you might be aware that its possible to store only the digits, adding the dashes for display purposes only. But remember that we’re working on fixing Excel data here, where its far less likely that phone numbers will be handled that way. For this reason, I’m going to keep the dashes.


Remove Parenthesis

I’ll start by removing any parenthesis. I’ll use an update query to do that, along with the “REPLACE” function.



Take a look at the “Criteria” row first. I’m using the “like” operator with the asterisk wildcard to find any phone number that contains the open parenthesis. I’ll have to make a second pass later to replace the closing parenthesis.


Now look at the “Update To” row. Confusing, right? There’s a lot going on there. Let’s break it down.


I’m using the “Replace” function, with three arguments. If you’re not familiar with arguments, let me give you a primer. The syntax I’m using the Replace is:


Replace(Arg1, Arg2, Arg3)

The three variables inside the parenthesis are called arguments (also sometimes referred to as parameters). We'll need to fill in a value for each of these arguments.


When I fill in the arguments, they are full of their own punctuation, which is why it becomes difficult to read. For the first argument, I’m going to tell Access to get the value from the Phone field in my table. I do that by putting the field name in square brackets: [Phone]. So that’s Arg1.


For the second argument, I’m telling Access to locate the open parenthesis: (


I have to embed that inside double-quotes, because it’s a text string. So I get “(“ for the second argument.


Finally, the last argument tell Access that the replacement value is an empty text string, represented by two sets of double-quotes: ""


Confusing, I know. Looking at these strings with so many mixed types of punctuation is always tricky.


After I run the update query, my data looks like this:



You can see that on row 3, the opening parenthesis has been removed. I’ll repeat the same steps to get rid of the closing parenthesis, leaving my data looking like this:





Now, I’ll do one more replacement with a similar query, replacing all spaces with dashes. My data will then look like this:




Remove the Leading 1

Some of you young’uns out there might not be familiar with the “1-“ in some of those numbers. That’s a country code. Back in the day, you needed to use it when calling any land-line that was in a different state than yours. Those days are gone, but you’ll still sometimes see people enter phone numbers with the 1, especially if you’re got old fogies like me entering your data.


Let’s get rid of the “1-“ using another update query. This time, I’m going to use the “Right” function, combined with the “Len” function. If you’re not familiar with them, there’s a wealth of info on each on the Internet or in the Access help.




Starting with the Criteria row, notice that I only have one wildcard (the asterisk), and it appears at the end. This means that I want to find any phone numbers that begin with “1-“, followed by any other characters.


On the “Update To” row, you’ll once again see a jumble of text and punctuation. In this case, I’ve got two nested functions: Trim and Len. Looking at the “Right” function, the syntax I’m using is:


Right(Arg1, Arg2)

Arg 1 is the text string, and I’m telling Access to use the Phone field – once again embedded in square brackets, so that Access understands that it’s a field.


Arg 2 is the number of characters from the Phone field that I want to keep. I’m keeping all characters except two, so I’m using yet another function, along with a little math. I use the “Len([Phone])” function to tell me how many characters are in the phone number. Then I subtract two from that, to exclude the “1-“. Here’s my thinking:


If the phone number is: 1-603-111-2222
The LEN is: 14
Number of characters I want to keep: 12 (14-2)
Right-most 12 characters: 603-111-2222

After I execute this query, my data looks like this:





Adding Dashes

Next I’ll tackle the fourth row, which is the one without the dashes. I’ll use a query to determine which rows contain no dashes, as follows:



This, of course, returns just the one row that has no dashes.


Now I need to parse that number out, and add dashes to it. This is going to be a complicated “Update To” row in an update query, so I’ll first start by creating separate columns for each section of the number. I will then run them all together for the final result:





As you can see, I used the “Mid” function for all three components, though I could just as easily have used Left and Right for the first and third pieces. My result is this:






Now I need to string all three of those values together, and add the dashes as well. I’ll first do that in a Select query, so that I can see the result. Then, when I’m happy with it, I’ll convert it to an Update query. Here’s the new column in the select query:




It looks like a lot, but it wasn’t that bad. I used copy-and-paste to copy the first mid function from a previous column, and concatenate that with a dash. I then used copy-and-paste again to concatenate the second mid and another dash. And then I used copy-and-paste one more time to get the final mid function.


My result set looks like this:






The final step is to take that entire formula in the last column, and use it to update the table. Once again, I’m just going to use copy & paste. The final query looks like this:




And after I run it, I end up with this:




Wrong Length


As you can see, I’ve only got one phone number left that’s problematic. The last one is missing the 3-digit area code. This can’t be fixed programmatically, because I don’t know what area code to fill in. But I can use a query to tell me which records are still troublesome. Of course, in this case, it’s easy to spot the one bad record. But if you’ve got hundreds of rows, this query will help.


I know that valid phone numbers should be exactly 12 characters long, so I’ll use that in the criteria of a new query:




As you can see, I’m using the “Len” function on the Criteria row. This finds any rows that have more or less than 12 characters. These will be rows that require manual intervention. My result is:





There’s a lot more I might want to do to correct my phone numbers, like parsing out the extension, removing other punctuation, or validating country codes. But as you can see, our phone numbers are much cleaner now.


Next week I’ll continue this series with part 3, which will deal with text in a numeric field.



If you’d like to be notified when that post goes live, join my mailing list by filling out the form at the bottom of this page.


Did I miss phone number fixes that you’re dealing with? If so, share your experiences in the comments below.

As always, I’m here to answer your questions. Call (978-582-9550) or email (kmartin@northportsoftware.com) to discuss your Excel or Access issues. I’m always happy to talk over your issue with you. If I can’t steer you in the right direction with just a short answer, we’ll schedule a one-on-one session to work together on it, or I’ll prepare an estimate for a longer project.

Kim












About the Author


Kimberlee Martin is the owner of North Port Solutions and has 30 years of IT experience. She's worked with several programming languages and database tools over the years, with her favorites being Microsoft Excel VBA, SQL Server, and Visual C#. Her passion is helping small businesses gain insights into their business with effective reporting and data management.


Contact Kim at kmartin@northportsoftware.com.



#northportsolutions #learningneverstops #microsoftaccess #msaccess #database #dataanalysis #smallbusiness #excel #microsoftexcel #msexcel

12 views0 comments

Related Posts

See All

Join our mailing list

Subscribers receive email notification of new posts.

Thanks for subscribing!