OpenRefine is a free, open source application for cleaning and transforming data. It is written in Java and runs on your own computer under Windows, macOS, or Linux, with an interface that opens in your web browser.
Refine is great for quickly getting an overview of the contents of a data set, resolving inconsistencies, and enhancing it with other data—all in a visual, interactive, and efficient manner.
Refine started as Freebase Gridworks, a project of Metaweb. Google acquired Metaweb in 2010 and relaunched the tool as Google Refine. Official Google support ended in 2012, prompting a transition to the open source project OpenRefine. Google Refine and OpenRefine are essentially the same application, so many tutorials and much of the documentation use the names interchangeably.
Google created a series of slick trailers that act as a good introduction to Refine:
Introduction: https://youtu.be/B70J_H_zAWM
Data Transformation: https://youtu.be/cO8NVCs_Ba0
Data Augmentation: https://youtu.be/5tsyz3ibYzk
Refine is very flexible: if your data can be laid out in some tabular form (spreadsheets, databases, XML data, RDF, arrays, data stored in JSON), Refine can help you with it. It is also designed to be extensible, and the community has created numerous specialized plugins and extensions.
If you have messy data, such as dates in different formats, numeric data stored inconsistently as text strings, inconsistent categorical data, typos, extra white space, or multivalued cells:
| 2015-10-14 | $1,000 | ID |
| 10/14/2015 | 1000 | I.D. |
| 10/14/15 | 1,000 | US-ID |
| Oct 14, 2015 | 1000 dollars | idaho |
| Wed, Oct 14th | US$1000 | Idaho, |
| 42291 | $1k | Ihaho |
Source: "Using OpenRefine" by Ruben Verborgh and Max De Wilde, September 2013
OpenRefine can help!
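To make the categorical-data problem concrete, here is a minimal Python sketch of key-based grouping, loosely modeled on the "fingerprint" idea behind OpenRefine's clustering feature. It is a simplified illustration, not OpenRefine's actual implementation. The function names `fingerprint` and `cluster` are made up for this example, and the input is the state column from the table above.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified key: trim, lowercase, drop ASCII punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group raw values that share the same fingerprint key."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return dict(groups)


# State column from the messy-data table above.
states = ["ID", "I.D.", "US-ID", "idaho", "Idaho,", "Ihaho"]
clusters = cluster(states)

for key, members in clusters.items():
    print(f"{key!r}: {members}")
```

This groups "ID" with "I.D." and "idaho" with "Idaho,", because those differ only in case and punctuation. The typo "Ihaho" and the prefixed "US-ID" stay in their own groups: catching those takes a fuzzier comparison (OpenRefine also offers nearest-neighbor clustering) or a manual edit.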
Refine use cases include:
David Huynh, a Google developer who originally worked on the project, says OpenRefine is:
A power tool for working with messy data.
More powerful than a spreadsheet,
More interactive and visual than scripting,
More provisional / exploratory / experimental / playful than a database.
You can get a copy of his introduction and tutorial here.