Using Access to Clean Excel Data, Part 1


Both Excel and Access are commonly used by small businesses to store critical data. Both are great tools, but Excel lacks some of the Access features that allow you to keep your data set clean. The same is sometimes true of Access, when your database hasn’t been planned and configured properly. But I see it much more commonly in Excel.


Even if you plan to keep your data in Excel, you can use Access tools to clean up the data. To do so, you’ll import your data into Access, use Access features to clean it up, and then export it back out to Excel.


Let’s take a look at some of the messes Excel lets you make. Here’s a sample dataset that I’ll be working with:




It’s a small dataset, and you wouldn’t need Access to help clean up this data. It would be faster to simply make the corrections directly in the Excel sheet. But for today’s discussion, let’s assume that you have a much larger dataset that includes many of these errors. Here are the errors we’ll be looking at:


  • Inconsistencies in the states. Some are abbreviated, others are spelled out.

  • Phone number format issues. Dashes, spaces and parenthesis make these inconsistent.

  • Text and spaces in a numeric field (credit limit).

  • Duplicate records (rows 2 and 8).


I’ll tackle each of these issues in separate posts. For today, let’s look at the first: the inconsistencies in the State column.


Consistency

Consistency is important in your database, especially when you’re working with hundreds or thousands of rows. If your data isn’t clean, you’ll have a hard time getting meaningful reports and charts, and you may also have a hard time locating the correct records.


Let’s take a look at the state column. Notice that in some cases, I’ve got the abbreviations for the state, and in others, I have the entire state name. Once again, remember that we are looking at a very small set of data here, so it would be quick and easy to manually scan the table and fix it. But if you have hundreds or thousands of records, that’s more difficult.


We’re going to take a look at a couple of ways to locate and correct this data, using Access queries.


First, let’s get a unique list of all values in the State column. This will tell me how many fixes I need to make. I’ll use a query with the “Group By” function to get this list:




As you can see above, when I click the “Totals” button (which is on the Query Design ribbon), I get a “Total” row added to my grid. I use the “Group By” function on the State column. This feature groups together all records with a matching state, so that I end up with a unique list of values in this column:




In this small dataset, I have four different values appearing in the State column. I can now see that I need to change all instances of “MASSACHUSETTS” to “MA”, and all instances of “NEW HAMPSHIRE” to “NH”. Since there are only two, I can use two Update queries for this.


To change my query to an “Update” query, I click the “Update” button on the Query Design ribbon, in the “Query Type” group:




When I do that, I get a new row in my grid, labeled “Update To”, as shown above.


You can see that I've set the Criteria to “MASSACHUSETTS”, which means that the query will only act on records that have “MASSACHUSETTS” in the State column. I set the “Update To” value to “MA”. When I execute this query, it will replace all instances of “MASSACHUSETTS” in the State column with “MA”. Only the "State" column will be impacted. If "MASSACHUSETTS" appears in any other column, it'll be unchanged.


Here’s what the table looks like when I’m done:




I’ll repeat the process to replace “NEW HAMPSHIRE” with “NH”.


This method works great if you’ve only got a couple of replacements to make. But what if you’ve got 20 replacements? Or 30? Or 50? This method becomes impractical.


In that case, one option is to create a table that lists all states, both spelled out and abbreviated, and use that to make all replacements in a single query. My table might look like this:




Of course, your list will probably be longer. You need to include every variation of “State” in that first column. Keep misspellings in mind too. If someone typed in “MASS” instead of “MASSACHUSETTS”, you’ll need to add a row for that too.


Now I can create a single query that will replace “MASSACHUSETTS”, “NEW HAMPSHIRE”, “RHODE ISLAND” AND “CONNECTICUT”, using this new table. Let’s build that query.


I’ll start by creating a query that joins my “State” column in the Excel data set to the “StateName” column in this new table. It looks like this:




That line between the two tables is a join, and specifically, it’s an inner join. An inner join tells Access to only show the records that have a match between the two tables. In other words, only show me records where my “State” value is found in the “StateName” column of tblStates. Here’s the result:




As you can see, this query is only showing me two of the records in my Excel data set. That’s because those are the only two rows whose “State” value appears in the “StateName” column of tblStates. These are the records that need to be updated.


I’ll convert this query to an Update query, and then remove all of the columns that aren’t relevant to this query:




There’s only one column in the grid, because that’s the only column I want to update.


Notice the “Update To” line in the grid. I’ve got it set to “[Abbreviation]”. Those square brackets tell Access that it should get the value from a column called “Abbreviation”. This column must appear in one of the tables included in the query – here, you can see that “Abbreviation” is a column in tblStates. Because of the join, Access knows which abbreviation is appropriate for each state.


When I execute this query, my table looks like this:




As you can see, one quick query has corrected all states.


If that was the only fix you needed to make in your data set, you could now export that table back out to Excel. In my case, though, I’ve got several other fixes to make. In my next post, I’ll look at fixing the formatting issues in the phone number column. If you’d like to be notified when that post goes live, join my mailing list by filling out the form at the bottom of this page.


Have you tried this? If so, share your experiences in the comments below.


As always, I’m here to answer your questions. Call (978-582-9550) or email (kmartin@northportsoftware.com) to discuss your Excel or Access issues. I’m always happy to talk over your issue with you. If I can’t steer you in the right direction with just a short answer, we’ll schedule a one-on-one session to work together on it, or I’ll prepare an estimate for a longer project.


Kim









About the Author


Kimberlee Martin is the owner of North Port Solutions and has 30 years of IT experience. She's worked with several programming languages and database tools over the years, with her favorites being Microsoft Excel VBA, SQL Server, and Visual C#. Her passion is helping small businesses gain insights into their business with effective reporting and data management.


Contact Kim at kmartin@northportsoftware.com.


#northportsolutions #learningneverstops #microsoftaccess #msaccess #database #dataanalysis #smallbusiness #excel #microsoftexcel #msexcel

10 views0 comments

Related Posts

See All

Join our mailing list

Subscribers receive email notification of new posts.

Thanks for subscribing!