top of page
Writer's pictureKimberlee Martin

Using Access to Clean Excel Data, Part 3



If you’ve been following along with my recent posts, you’ll know that I’ve been talking about using Microsoft Access tools to clean Excel data. We all know that Excel is a great tool for tracking your data, but it allows you to get sloppy. If you have a license for Microsoft Access, you can import your Excel data and use Access tools to clean it up. So far, we’ve looked at cleaning U.S. states and phone numbers. Today we’ll look at ways to clean up your numeric fields when they contain text.



The Problem


To get the ball rolling, take a look at the Credit Limit column in my Microsoft Access table:


This is a table that I imported from an Excel file. As you can see, the Excel file doesn’t have any validation in place, so users have entered some invalid values in the Credit Limit column. I’m sure you noticed the “TBD” and the “75K”, but the “50 000” is also troublesome, because it contains a space.


To demonstrate the problem with these text values, let’s say that we’ve decided to increase all credit limits by 10%. In both Excel and Access, this will cause errors, as you see here:



I’m getting error messages when I try to do math with a text value.


As I mentioned previously, you don’t need the Access tools here, because its such a small set of data. But imagine that you’ve got hundreds or thousands of records. In that case, you’ll need some help figuring out which rows are problems, and how to fix them.


Let’s see how to clean up this data using Access tools.



Finding Records with Issues


First, let’s create a query that identifies all rows that are problematic. I’ll create a query with the “IsNumeric” function:



Note the second column in my grid. I’ve used the “IsNumeric” function to determine weather or not the Credit Limit field contains a numeric value. The result looks like this:



The second column is a Boolean value. If you’ve used Microsoft Access tables, you might be familiar with the Yes/No data type. This is the same thing, except that a “Yes” is represented by -1, and a “No” is represented by 0. If you’re familiar with Boolean values in SQL Server or some other database tool, you can think of -1 as True and 0 as False.


As you can see, the IsNumeric function is returning a value of -1 when the Credit Limit is a valid number, and 0 when its not.


My next step is to add a filter on that column, so that only the rows with non-numeric credit limits are shown. I do that by filtering for 0 on the Criteria row:



When I re-run the query, I am shown only the rows that contain an invalid value in Credit Limit:




Correcting Issues


Now that I can clearly see which rows are problematic, I can identify solutions.


In my example, I’ve got three problems:

  • One record has “TBD” in Credit Limit, which, in this case, stands for “To Be Determined”. This might be a value that users initially type into the column until our credit department can determine the actual credit limit.

  • One record has spaces in it. Your data might have other invalid characters, like periods or dashes. For our sample dataset, all we need to do is remove the space, and we’ll be left with a valid credit limit.

  • One record uses a “K” to stand for Thousands. That means that “75K” is the same is “75000”. I will assume that whenever I see a “K” in the Credit Limit field, it means thousands. Your data might contain an “M” for million, or some other indicator.


Of course, since there are only three records, I could easily correct them by hand. But let’s assume that you’ve got many problematic records in your dataset. We’ll see how to use Update queries to correct these three issues.


A Series of Queries

Now that I know what my problems are, I can easily fix them with a series of “Update” queries. In my case I’ll need three of them, and each one will replace one of the invalid values:

  • The first query will replace “TBD” with “0”. Since the credit limit hasn’t been decided yet for these clients, a credit limit of $0 will suffice for now.

  • The second query will remove spaces. I’ll use the Replace function to replace spaces with an empty string. This can be a little confusing for newcomers, but think of it this way: I’ll replace a space with nothing, thus eliminating the space.

  • Finally, the last query will replace “K” with “000”, which will convert the number into thousands.


You might think you can make this process more efficient by combining all three into a single query, and you’d be right. It can be done. However, it makes for a confusing query. Let me show you the three separate update queries first, and then we’ll take a look at the combined query that would do the same job. Note the “Update To” and “Criteria” rows in the following three queries:


TBD:




Replacing space with empty string:


Note the criteria. I’m using a wildcard search to find any Credit Limit that contains a space.

Also note the “Replace” function in the Update To row, replacing all spaces found in Credit Limit with a blank text string.



And finally, replacing the “K” with “000”:



Once again, I’m using a wildcard search in Criteria, and I’m using the Replace function in the Update To row.


After I run all three queries, here’s my table:



As you can see, our data now looks good. Once I export this data back out to Excel, I’ll be able to do math on the Credit Limit column, since all values are numeric.


A Single Step


If you’re new to Access queries, stick with the method above, using multiple queries to clean up your data. If you’re ready for a bigger challenge, you could have corrected all of these rows with a single query, but it would’ve been a doozy.


After resetting my data, I’ll build a query that includes all three of the fixes we’ve identified. I’ll need to nest the three fixes, since they are all acting on a single field. My three “Replace” statements are:


Replace([Credit Limit]," ","")

Replace([Credit Limit],"K","000")

Replace([Credit Limit],"TBD","0")


Here’s how it looks when I nest them:



Told you it was going to be a doozy!


The key is to build just one Replace statement first, and test to make sure you’ve got the right syntax. Then add a second Replace, using the first Replace as the first argument of the new Replace, and test again. Repeat for the last Replace.


Nesting your functions can be tough, and in this case, probably not worth it. The three individual queries each ran quickly, and were safer in their simplicity. I’d only take this complicated route if I had to run this update on thousands of records on a regular basis, and I needed the performance improvement. And, of course, I’d perform ample testing before running this on a live data set.


Always – always – back up your data before testing your Update queries.


Next week, in the final installment of this series, we’ll take a look at detecting and removing duplicates from our imported Excel data. To be notified when that post goes live, subscribe using the form below.



Have you tried this? Share your experience in the comments below.



Got Access questions? Call me at (978) 582-9550 or email kmartin@northportsoftware.com. If I can give you a quick answer to your question, I’m happy to do so. Otherwise, we can discuss arrangements for a one-on-one consulting session or a larger project.


Kim

About the Author




Kimberlee Martin is the owner of North Port Solutions and has 30 years of IT experience. She's worked with several programming languages and database tools over the years, with her favorites being Microsoft Excel VBA, SQL Server, and Visual C#. Her passion is helping small businesses gain insights into their business with effective reporting and data management.



© North Port Solutions, LLC, 2022. All rights reserved.

13 views0 comments

Related Posts

See All

Opmerkingen


Join our mailing list

Subscribers receive email notification of new posts.

Thanks for subscribing!

bottom of page