2009-10-17 @ 16:29
Full Text Search on Heroku
YA!! IT’S SATURDAY NIGHT! YOU ALL KNOW WHAT THAT MEANS! Time to get krunk and do some full text searching. OW! I’d like to share with my tens of loyal readers how I’m doing Full Text Search on Heroku.
Heroku’s documentation lists two ways to get full text indexing working with your Heroku application. They talk about using Ferret and Solr for full text indexes. The Ferret option looks OK, but it requires you to rebuild your indexes every time you push. Solr would work, but it requires an EC2 instance or some third party server. Since my budget is precisely $0, using Solr is out of the picture.
But there is a third option. A very secret option. A devious but fun option. You see, Heroku runs PostgreSQL for each rails application database. They’re running a version new enough (Version 8.3) to have full text index support built in. If we’re willing to throw out database agnosticism, we can take advantage of the database’s indexing capability. For this article, I’d like to hop on the Postgres train and show you how to get full text indexes working with Postgres in your rails application. I’ll also show you how to get those indexes on Heroku so we can use them “in the cloud” (Heroku is in the cloud, right?).
For the rest of this article, I’m going to assume you have PostgreSQL version 8.3 or higher installed already and can get your rails application working with Postgres. Installing postgres is outside the scope of this article, but I found these instructions to be very helpful.
Step 1: Go get some coffee
I love it when instructions tell me to go get some coffee because I always do. I have to follow the instructions, right?
Step 2: Install Texticle
Texticle is a gem I wrote to help you define your text indexes on a per model basis. To install texticle, we just do the normal gem install:
$ sudo gem install texticle
The gem is pure ruby and isn’t very long, so I encourage you to peek through the source.
While we’re at it, we should configure rails to load the texticle gem. We need to add it to our environment.rb file. Here’s what mine looks like:
[ruby]
RAILS_GEM_VERSION = '2.3.4' unless defined? RAILS_GEM_VERSION

require File.join(File.dirname(__FILE__), 'boot')

Rails::Initializer.run do |config|
  config.time_zone = 'UTC'

  config.gem 'texticle'
end
[/ruby]
Texticle also comes with some handy rake tasks (which we’ll talk about later). In order to get those, we’ll need to update the rails Rakefile:
[ruby]
require(File.join(File.dirname(__FILE__), 'config', 'boot'))

require 'rake'
require 'rake/testtask'
require 'rake/rdoctask'

require 'tasks/rails'

require 'rubygems'

# Our texticle rake tasks
require 'texticle/tasks'
[/ruby]
Step 3: Configuring your index
Let’s pretend we have an Article model. The Article model has a “title” field and a “body” field:
[ruby]
class CreateArticles < ActiveRecord::Migration
  def self.up
    create_table :articles do |t|
      t.string :title
      t.text :body

      t.timestamps
    end
  end

  def self.down
    drop_table :articles
  end
end
[/ruby]
To index those two fields, we just create an index block in the model and list the fields we want to index:
[ruby]
class Article < ActiveRecord::Base
  index do
    title
    body
  end
end
[/ruby]
Declaring this index automatically defines a “search” method on the model that we can use to search our articles:
[ruby]
>> Article.search('coffee instruction')
=> [#<Article id: 4, title: "coffee", body: "I like getting coffee to be in instructions", created_at: "2009-10-17 21:42:13", updated_at: "2009-10-17 21:42:13">]
>> Article.create(:title => 'kittens', :body => 'kitten poop smells bad, but I still like kittens.')
=> #<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">
>> Article.search('kittens')
=> [#<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">]
[/ruby]
Great! We can search our records. There’s just one catch: we haven’t indexed our data. Doing these types of searches will be slow against large sets of data unless we add an index. Writing these indexes is a PITA, so texticle comes with a handy rake task for generating a migration to create your indexes:
$ rake texticle:migration
$ rake db:migrate
After running this, Postgres can use the prebuilt indexes when searching your data.
Just remember: every time you modify the columns in an index block, or add new index blocks, you should create a new migration to update the indexes. If you don’t update the indexes, searches will still work as expected; they just might be kind of slow.
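To make the index requirement a bit more concrete, here is a rough sketch of the kind of SQL involved. This is not Texticle’s source, and it is not necessarily what the generated migration contains; the helper names (tsvector_expression, search_condition, create_index_sql), the index name, and the 'english' dictionary are all assumptions made for illustration. The Postgres pieces it leans on (to_tsvector, plainto_tsquery, the @@ match operator, and GIN expression indexes) are standard in 8.3.

```ruby
# Hypothetical helpers for illustration; NOT Texticle's real implementation.

# Build one tsvector expression covering several columns.
# coalesce keeps a NULL column from turning the whole expression NULL.
def tsvector_expression(columns, dictionary = 'english')
  columns.map { |column|
    "to_tsvector('#{dictionary}', coalesce(#{column}, ''))"
  }.join(' || ')
end

# WHERE-clause fragment matching the expression against plain search text.
# The ? placeholder is meant to be bound by the database adapter.
def search_condition(columns, dictionary = 'english')
  "(#{tsvector_expression(columns, dictionary)}) " \
    "@@ plainto_tsquery('#{dictionary}', ?)"
end

# Expression index over the very same tsvector expression.
# The index name is an assumption, not a Texticle convention.
def create_index_sql(table, columns, dictionary = 'english')
  "CREATE INDEX #{table}_fts_idx ON #{table} " \
    "USING gin((#{tsvector_expression(columns, dictionary)}))"
end

puts search_condition(%w[title body])
puts create_index_sql('articles', %w[title body])
```

The point of sharing tsvector_expression between the two helpers is that Postgres only considers an expression index when the query uses the same expression. That is why changing the columns in an index block calls for a fresh migration.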
Step 4: Integrating With Heroku
This part is pretty easy. First we update our heroku gem manifest:
$ echo "texticle" >> .gems
$ git add .gems
$ git commit -m 'updating gem manifest'
$ git push origin master
Once your code is up on heroku, just tell heroku to migrate the database:
$ heroku rake db:migrate
It’s just that easy! Your indexes should be available on the Heroku database server and your application can use them.
Advanced Texticle Usage
Texticle has a few more features I’d like to briefly mention. The first one is search ranking. We can tell Postgres which field has a higher priority. For example, we can tell Postgres to weigh matches in the article’s title higher than matches in the body:
[ruby]
class Article < ActiveRecord::Base
  index do
    title 'A'
    body 'B'
  end
end
[/ruby]
The ranks are ‘A’ through ‘D’, and multiple fields can have the same rank.
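In Postgres terms, this kind of weighting is normally expressed with setweight, which tags each column’s tsvector with a label from 'A' to 'D' before the vectors are concatenated. The sketch below is my guess at the shape of the SQL, not Texticle’s generated output; weighted_tsvector and VALID_WEIGHTS are hypothetical names, and the 'english' dictionary is an assumption.

```ruby
# Hypothetical helper for illustration; NOT Texticle's real implementation.
VALID_WEIGHTS = %w[A B C D].freeze

# columns_with_weights is an ordered list of [column, weight] pairs,
# e.g. [['title', 'A'], ['body', 'B']].
def weighted_tsvector(columns_with_weights, dictionary = 'english')
  columns_with_weights.map { |column, weight|
    unless VALID_WEIGHTS.include?(weight)
      raise ArgumentError, "weight must be one of A-D, got #{weight.inspect}"
    end
    "setweight(to_tsvector('#{dictionary}', coalesce(#{column}, '')), '#{weight}')"
  }.join(' || ')
end

puts weighted_tsvector([['title', 'A'], ['body', 'B']])
```

Nothing stops two columns from carrying the same label, which matches the note above that multiple fields can share a rank.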
We can also group indexes. The index we’ve seen so far will search all columns listed. We can add another index so that we only search the “title” field:
[ruby]
class Article < ActiveRecord::Base
  index do
    title 'A'
    body 'B'
  end

  index('title') { title }
end
[/ruby]
This gives us a “search_title” method in addition to the “search” method:
[ruby]
>> Article.search_title('kittens')
=> [#<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">]
[/ruby]
The last thing I want to mention is “rank”. When you perform a search, texticle adds an extra field to your model called “rank”. The rank indicates how well your record matched the search criteria:
[ruby]
>> Article.search('like').map { |x| x.rank }
=> ["0.4", "0.4"]
>> Article.search('coffee').map { |x| x.rank }
=> ["1.4"]
[/ruby]
Search results are already returned sorted by rank in descending order, so no need to worry about sorting.
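One thing worth noticing in the console output above: the ranks come back as strings ("0.4"), so convert them before doing any numeric comparison. Here is a small sketch that runs without a database. The Result struct is a hypothetical stand-in for a searched Article, and strong_matches and the 1.0 threshold are made up for the example.

```ruby
# Stand-in for a record returned by Article.search; hypothetical, illustration only.
Result = Struct.new(:title, :rank)

results = [
  Result.new('coffee', '1.4'),
  Result.new('kittens', '0.4')
]

# Keep only the results whose rank is at least the threshold.
# to_f turns the string rank into a Float so the comparison is numeric.
def strong_matches(results, threshold)
  results.select { |result| result.rank.to_f >= threshold }
end

strong_matches(results, 1.0).each do |result|
  puts "#{result.title}: #{result.rank}"
end
```

Since select preserves order, results that arrive sorted by rank stay sorted after filtering.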
Conclusion
I hope you enjoy tickling text with texticle as much as I do. So far, I’ve been pretty happy with this solution.
Things I like:
- It's the right price for use with Heroku (namely $0)
- Easy to configure and deploy
- No need to rebuild indexes on pushes
- Postgres can be configured to use different dictionaries, so you aren't stuck with English
The only drawbacks I’ve found so far are:
- INSERTs and UPDATEs are slower
- It's database specific
Inserts and updates will be slower, but that comes with the territory of adding database indexes. My application mostly does reads, so it doesn’t bother me. Texticle is database specific, but other databases are starting to have full text search support. I think texticle could be extended to support other databases, but I’m quite happy with postgres.
Anyway, thanks for reading. The final step is that you should go get another cup of coffee.