Importing Data with Rails

April 19, 2008

Often when working with Rails applications, you need to import data from other sources. A common source is an excel spreadsheet. A simple import consists of reading each line in the spreadsheet and creating a record in the database for each line. You could do this is a small Ruby script with SQL, you wouldn't need Rails. But sometimes the import is more complicated. For example, you may want to run your application validation logic on each record. Also, maybe you need to create associated record for each row.

To handle this kind of thing, it can be helpful to use your ActiveRecord data model. To do that, you can simply create a Ruby script and add these few lines at the top:

require File.join(File.dirname(__FILE__), "..", "..", "config", "boot")
require File.join(File.dirname(__FILE__), "..", "..", "config", "environment")

This will boot up the Rails environment when your script starts, and then you have full access to your Rails models. You could write a procedural script to handle that, but I've found that creating an object-oriented class gives you a little bit cleaner, more re-usable framework. So let's just get right to the code. Here is the code for a base class for your data import:

class DataImport

  attr_reader :file, :fields, :row_map, :default_e

  #Create DSL methods for subclasses
  class << self
    def default_environment(env)
      self.send(:define_method, :default_environment) do
        env
      end
    end
    def default_file(file_name)
      self.send(:define_method, :default_file) do
        file_name
      end
    end
  end

  def initialize(env, file)
    load_rails(env || respond_to?(:default_environment) ? 
      default_environment : "development")
    @file = file || default_file
    raise "You must specify a file" unless @file
  end

  def self.run(env, file)
    new(file, env).run
  end

  def run
    open(file).each_with_index do |line, i|
      initialize_row!(line, i)
    end
  end

  def initialize_row!(line, i)
    tokenize_row!(line)
    if i < 1
      initialize_fields!
    else
      initialize_row_map!
      process_row
    end
  end

  def process_row
    puts row_map.inspect
  end

  private

    def tokenize_row!(line)
      @row = line.split('|')
    end

    def initialize_fields!
      @fields = @row.map{|e| e.chomp.to_sym}
    end

    def initialize_row_map!
      @row_map = {}
      @row.each_with_index do |c, i|
        @row_map[fields[i]] = c.blank? ? nil : c.strip
      end
    end

    def load_rails(env)
      ENV['RAILS_ENV'] = env
      require File.join(File.dirname(__FILE__), "..", "..", "config", "boot")
      require File.join(File.dirname(__FILE__), "..", "..", "config", "environment")        
    end

end

Alright, that's a pretty big chunk of code, but this is the implementation of a base class that you will reuse. Don't worry, your actual import class will be much shorter. In other words, you can copy and paste this right into your app and use it as is, but if you are interested to find out how it works, read the next few paragraphs.

So the first interesting thing you'll encounter in this code is the DSL-ish methods. To understand how this works, you really need to read Why The Lucky Stiff's Seeing Metaclasses Clearly. The talk Dave Thomas gave just the other day at the NovaRUG would help too. But basically what it does is define 2 class methods that are intended to be used by subclasses during class definition. When called, they will define methods that the base class can then use. This the concept I blogged about the other day in action. They are conceptually the same thing as the definition of the belongs_to and has_many methods in ActiveRecord. It will make more sense when you see an implementation.

Next up is the constructor which handles setting the file instance variable for our data import class, as well as loading up rails with the right environment specified. After that are class and instance methods both called run. The idea here is that we want to work with an instance of the data import class, but it will be convenient to just call OurDataImport.run.

The work happens in the run instance method. This opens up the file and starts processing it line by line. In this method I'm trying to employ a technique, or more of a style I guess, that Marcel Molina spoke about at the DC Ruby Users Group. The idea is that you should strive to as much as possible have all of the code within a method be at the same level of abstraction. If you look at this whole method:

def initialize_row!(line, i)
  tokenize_row!(line)
  if i < 1
    initialize_fields!
  else
    initialize_row_map!
    process_row
  end
end

It's easy to read it and understand what it is going to do. First we are going to tokenize the row, then if it is the first row, we will initialize the fields, otherwise, we will initialize the row map and process the row. For example, this method could be written like this:

def initialize_row!(line, i)
  tokenize_row!(line)
  if i < 1
    initialize_fields!
  else
    @row_map = {}
    @row.each_with_index do |c, i|
      @row_map[fields[i]] = c.blank? ? nil : c.strip
    end
    process_row
  end
end

But there is an abstraction-level switching that you have to go through mentally once you get to the first line after the else. The rest of the method is composed of intent-revealing methods, but then we just have this lower-level chunk of code that deals with setting instance variables. So don't do that, the other implementation is cleaner, leads to code that is composed well and is easier to test and extend.

Ok, so now let's put this to use. Create a rails app with a simple user model:

$ rails myapp
$ cd myapp
$ script/generate model user name:string email:string
$ rake db:migrate

Now copy the whole base DataImport class from above into db/data/data_import.rb. Then create a data file at db/data/users.txt with something like this:

name|email
Paul Barry|mail@paulbarry.com
Someone Else|someone_else@example.com

And then finally we'll create an implementation of our data import at db/data/user_data_import.rb

require 'data_import'
class UserDataImport < DataImport

  default_file "users.txt"

  def process_row
    user = User.create!(row_map)
    puts "Created => #{user.inspect}"
  end

end

UserDataImport.run(ARGV[0], ARGV[1])

So now we have a pretty clear, concise file that explains what we are doing. You can see the call to default_file that allows us to set our default file name using a clean, DSL-ish syntax. We could also call default_environment there as well if we wanted to, but we don't have to. This is a very simple import where we just create a user for each row. The last line of the script runs the import, passing in the command line arguments. If you pass no arguments, it will work, using "development" for the environment and "users.txt" for the file name. A real data import is likely to do some more interesting work with the data, but at least this gets all the plumbing of processing the data file out of the way for you and allows you to focus on the logic of what you need to do with the data. All that's left to do is simply run the db/data/user_data_import.rb script.

Sidenote: I've found that if your want to run the script from textmate, you need to add this line top of your script, due to a conflict in the ruby libraries provides with TextMate.

$:.reject! { |e| e.include? 'TextMate' }

Posted in Technology | Tags Rails, Ruby

Comments Comments Feed

1. I write import scripts a lot for Rails, but I do it very differently. First, I defined them as Rake tasks.

I usually just create a data.rake file in lib/tasks. Any task defined this way can load the whole Rails environment by adding a dependency on the environment task Rails ships with (just add => :environment after your task name). You can pass data into your tasks, which file to import for example, using environment variables.

Finally, I would just use FasterCSV (or the standard CSV library) to handle the parsing. It has all of the features demoed here and much more.

In short, I feel this is just a little too much boilerplate code. You are going through too much effort to recreate tools Rails ships with or that are easily downloaded.

Also, the environment.rb file included with Rails loads boot.rb first thing, so that step in unneeded. Have a peek at the code.

# Posted By James Edward Gray II on Thursday, May 15 2008 at 9:09 AM

2. @James

Thanks for the feedback. Rake tasks might be a better way to go, I'll have to try that. I like using the pipe-separated, because it doesn't require any library at all and I kind of just got used to doing in other languages, like Perl and Java. It seems pragmatic to me because doing line.split("|") is so trivial and you can do that in any language without relying on a library that may or may not be installed. There might be some cases where FasterCSV would be a good option, but using pipe-separated works well in simple cases for me.

But as far as the boilerplate code, you are right, less code is better code, so I'll look into ways of eliminating the unnecessary complication.

# Posted By Paul Barry on Thursday, May 15 2008 at 10:34 AM

3. @James: For some import routines, a CSV import is going to be fine. But there are lots of cases where a simple CSV import isn't going to work, and in those cases, having a more controllable method for doing imports, like this one, will be quite useful.

For example, I am importing a data file provided by a vendor. One line per record is what they provide, but I need to break out the data into multiple associated records. E.g., each line in the vendor-provided file is a catalog item, but I need item, manufacturer, and individual sku (think variation or edition) records. Worse, sometimes there is more than one manufacturer, all in the same field, separated by " and " when there are two, and separated by ", " when there are more than two.

In other words, sometimes you get clean data that can cleanly import. Other times, you gotta massage the data before you can create records for it. In that case, this method is going to work, and the CSV libraries are going to cry like babies.

# Posted By Alderete on Sunday, May 25 2008 at 11:52 PM

4. Very helpful post! Thank you for sharing.

# Posted By Serge on Sunday, January 4 2009 at 11:28 AM

5. Paul,

Can you suggest a way to insert record ids using the method outlined here. The insert works wonderfully for all columns except primary key column.

Thanks!

# Posted By 142.177.189.200 on Sunday, January 4 2009 at 2:22 PM

6. If you need to set the values for id to a specific value, you can't set it using a hash, you have to do record.id = whatever. Try something like this in process row:

user = User.new
row_map.each do |k,v|
user.send("#{k}=", v)
end
user.save!

# Posted By Paul Barry on Sunday, January 4 2009 at 7:05 PM

7. That worked flawlessly. Many thanks!

# Posted By Serge on Sunday, January 4 2009 at 9:03 PM

8. Great sample library! Note that the arguments in DataImport.run are inverted.

# Posted By Rod on Tuesday, June 23 2009 at 2:45 PM

9. brian atwood maniac pumps the perfect wear-with-everything color. These wardrobe staples have 5" stiletto heels and hidden 1 1/2" seamless platforms. These Brian Atwood 'Loca' studded pumps are so ravishing, that I'd totally be willing to pop an aspirin before wearing them. And I really lovebrian atwood mesh sandals and Brian Atwood Asymmetric Patent Pumps.brian atwood nude patent leather katie lee pumps. Bold shoes like these sculptural purple suede Brian Atwood 'Lola' pumps make an outfit. purple suede Brian Atwood 'Lola' pumpswith a heel that measures approximately 140mm / 5.5 inches with a 30mm / 1 inch concealed platform. Brian Atwood pumps have a closed toe, cutout detail at arch with elasticated straps and a leather sole.
Herve Leger Dress shop specialises in superior Herve Leger Dress, provides hundreds of discount and fashion Herve Leger Dress, sexy Herve Leger Dress, Herve Leger Skirt, disount Herve Leger Dress-a professional Herve Leger Shop.

# Posted By Brian Atwood maniac pumps on Saturday, August 28 2010 at 3:18 AM

10. Wonderful brian atwood shoes for your style.BRIAN ATWOOD High-heeled sandals Purple is in stock in our store only one size in 36.5 EU. Elegant purple color, you will like it.
And these Pink Brian Atwood Maniac Patent Platform Pumps not only romantic and gorgeous too! It's patent leather leave us sunshine warmness and sexy!
When I original saw these brian atwood loca studded pumps in a metallic version, a while ago, I likeable them OK, but now I am going unhinged for the lavender version!
I really love brian atwood nico patent pump.
And this year, brian atwood maniac pumps are so hot among stars, I saw many times that Victoria-Beckham wearing them on the show time.
If you don’t know about brian atwood…You should. When everyone else is obsessing over the latest pair of Loubs or YS’L’s it’s very easy to overlook awesome shoe lines and designers who push the envelope. brian atwood is one of those designers.I’m a fan of his lines because Atwood really sets out to make his shoes a conversation piece. The use of alternative textures and fabrics makes brian atwood shoes unique. When wearing his shoes you are guaranteed a compliment from someone and a definate shoe in for best footwear. So ladies (and gents) get into these! And I know my card carrying “Shoe Wh*res” are going to love these!
Buy brian atwood shoes at http://www.brianatwoodcom.com. Buy brian atwood pumps at http://www.brianatwoodcom.com/brian-atwood-brian-atwood-pumps-c-65_67.html.
When I original saw these brian atwood loca studded pumps in a metallic version, a while ago, I likeable them OK, but now I am going unhinged for the lavender version!
When it comes to intricate detailing and superb craftsmanship, you know a pair of brian atwood lexia is worth the investment.
Back by popular demand... brian atwood maniac tan, the perfect wear-with-everything color. These wardrobe staples have 5" stiletto heels and hidden 1 1/2" seamless platforms. In a beautiful, very neutral nude tan.
brian atwood whip snake shoe with Exotic whipsnake hidden platform and back metal inset.
Take a walk on the wild side with Atwood Jagger printed knee-high boots. Style them knee high to add sass to an LBD or wear them with the fold-over flap turned up to work this season's hot over-the-knee trend.
brian atwood cage peep Price: $228. 140mm Covered heel. Rounded peep toe. 40mm covered platform. Interwoven cut outs around front. Ankle strap with adjustable metal buckle. Leather insole and lining. Leather sole. Made in Italy.
When it comes to intricate detailing and superb craftsmanship, you know a pair of brian atwood crystal court shoe is worth the investment.
Brian Atwood Platform Chain Pumps with A silver chain scales the lace-up back of this vamped-up pumps.
brian atwood snakeskin platforms is the classic and the elastic heel detail was used again from last Fall.
brian atwood patent leather and mesh Futuristic patent leather style with mesh details and contrast stitching.
Heel measures approximately 95mm/ 3.5 inches. Let your feet do the talking in brian atwood dorota shoes. Wear this chic style with off-duty denim or to liven up office looks.

# Posted By Brian Atwood loca studded pumps on Saturday, August 28 2010 at 3:19 AM

11. A grand wedding on the beach may be the common dream of most single girls. The bright sunshine, the blue sky and the open sea can satisfy all the demands of the romantic girls. Of course, everything will be perfect with a beach wedding dresses.
Every woman deserve the most beautiful wedding dresses at her special day!The most beautiful smile in the word is belong to the brides when she find her wedding dresses!Love is the time when you slows down in front of the shops selling wedding dresses.
Make the day of your engagement a special one with a A-line or princess wedding dresses!Step like a goddess in tea- length ball gowns,formal ball gowns,victorian ball gowns,debutante ball gowns,prom ball gowns, contemporary ball gownsand vintage ball gowns available in chiffon, organza, silk and satin only at weddingdressesnow.com.
Congratulations! Your little daughter is getting married! But being the mother of the bride (or MOB) is not an easy role. Once upon a time mothers of the bride did most of the wedding planning, and thus got to have their own dreams realized. First of all, choose one from the perfect mother of the bride dresses for yourself here and to be the most elegant mother of bride!
Flower girl dresses must be the perfect accessory for the bride's dress. The flower girls look like the bells ring in the wedding announcements as they toll down the aisle one by one or hand in hand. If you are having flower girls at your wedding, take a look at these beautiful selection of Flower girl dresses from http://www.weddingdressesin.com.

# Posted By A line wedding dresses on Saturday, August 28 2010 at 3:19 AM

12. Find ugg and Australian sheepskin boots, learn the history behind the Ugg boot, and when and what to wear with Cheap ugg boots, as well as cleaning and caring forAuthentic Ugg boots,Cheap ugg cardy boots,20%-40% high discount

# Posted By p on Wednesday, September 1 2010 at 2:09 PM

13. There are many people like to searching the famous brand shoes
online.
Nike Air Max
is always the best choice and you will fall in love when the first sight.As the most sought-after products,
the famous brand shoes
are not only fashionable but also practical.Nike Air Max which is one of the hottest shoes in the summer are available in a variety of styles, colors and materials.
Nike Air Max shoes
are designed for yourself.You can pick up a pair now at www.online268.com.

# Posted By online268 on Wednesday, September 1 2010 at 4:21 PM

14.
There is no doubt that when looking for Cheap nfl jerseys to buy you still want nfl shop to get something buy nfl jerseys authentic. Now that you know that there are Cheap nfl jerseys outlets out there for you to purchase and they are high quality and authentic, and I think you can buy the best one.

# Posted By fewrewree on Wednesday, September 1 2010 at 7:14 PM

15. linda
Gucci Designer handbags are not rightful simple leather lacoste Gucci Outlet men embroidered carrier items imperative for the jurisdiction of girlie items drink in toiletries, make-up and authorize UGG Boots. No girlfriend! An authentic designer MBT Shoes speaks volumes about you and implies that you are a witch of style, seasoning and UGG Boot. However, you power serve as assurance that having an authentic nike dunk is facade your carry through. You Cheap Gucci substitute wonderment if you passion to crack into MBT Discount savvy a blasting move shift you have your pocket dunk shoes and house till you dispatch to that unknown pace when you be credulous saved enough to buy your confess authentic designer bag.

# Posted By nike dunk on Thursday, September 2 2010 at 2:14 AM

16. Louis Vuitton replica
replica louis vuitton
replica handbag
replica bags
replica bag
LV

# Posted By ss on Thursday, September 2 2010 at 7:43 AM

Add a Comment

(If you leave this blank, your IP address will be displayed instead)

(Optional, will not be displayed on the site)