Full Text Search on Heroku

Posted by – October 17, 2009

YA!! IT’S SATURDAY NIGHT! YOU ALL KNOW WHAT THAT MEANS! Time to get krunk and do some full text searching. OW! I’d like to share with my tens of loyal readers how I’m doing Full Text Search on Heroku.

Heroku’s documentation lists two ways to get full text indexing working with your Heroku application: Ferret and Solr. The Ferret option looks OK, but it requires you to rebuild your indexes every time you push. Solr would work, but it requires an EC2 instance or some third party server. Since my budget is precisely $0, using Solr is out of the picture.

But there is a third option. A very secret option. A devious but fun option. You see, Heroku uses PostgreSQL for every Rails application’s database, and the version it runs (8.3) is new enough to have full text indexing built in. If we’re willing to give up database agnosticism, we can take advantage of the database’s own indexing capability. For this article, I’d like to hop on the Postgres train and show you how to get full text indexes working with Postgres in your Rails application. I’ll also show you how to get those indexes onto Heroku so we can use them “in the cloud” (Heroku is in the cloud, right?).

For the rest of this article, I’m going to assume you already have PostgreSQL 8.3 or higher installed and can get your Rails application working with it. Installing Postgres is outside the scope of this article, but I found these instructions to be very helpful.

Step 1: Go get some coffee

I love it when instructions tell me to go get some coffee, because I always do. I have to follow the instructions, right?

Step 2: Install Texticle

Texticle is a gem I wrote to help you define your text indexes on a per-model basis. To install texticle, we just do the normal gem install:

  $ sudo gem install texticle

The gem is pure ruby and isn’t very long, so I encourage you to peek through the source.

While we’re at it, we should configure Rails to load the texticle gem by adding it to our environment.rb file. Here’s what mine looks like:

RAILS_GEM_VERSION = '2.3.4' unless defined? RAILS_GEM_VERSION

require File.join(File.dirname(__FILE__), 'boot')

Rails::Initializer.run do |config|
  config.time_zone = 'UTC'

  config.gem 'texticle'
end

Texticle also comes with some handy rake tasks (which we’ll talk about later). To get those, we’ll need to update the Rails Rakefile:

require(File.join(File.dirname(__FILE__), 'config', 'boot'))

require 'rake'
require 'rake/testtask'
require 'rake/rdoctask'

require 'tasks/rails'

require 'rubygems'

## Our texticle rake tasks
require 'texticle/tasks'

Step 3: Configuring your index

Let’s pretend we have an Article model. The Article model has a “title” field and a “body” field:

class CreateArticles < ActiveRecord::Migration
  def self.up
    create_table :articles do |t|
      t.string :title
      t.text   :body

      t.timestamps
    end
  end

  def self.down
    drop_table :articles
  end
end

To index those two fields, we just create an index block in the model and list the fields we want to index:

class Article < ActiveRecord::Base
  index do
    title
    body
  end
end
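Under the hood, a block like this only has to turn bare names such as “title” and “body” into a list of columns. Here’s a minimal plain-Ruby sketch of one way that can work, using instance_eval and method_missing. The IndexDefinition class is hypothetical and is not Texticle’s actual source. It’s only meant to show why you can write column names without quoting them.

```ruby
# Hypothetical sketch, NOT Texticle's source: collects the column names
# (and optional weights) written inside an index block.
class IndexDefinition
  attr_reader :columns

  def initialize(&block)
    @columns = []          # ordered [column_name, weight_or_nil] pairs
    instance_eval(&block)  # run the block with self set to this object
  end

  # A bare name like `title` is not a real method here, so Ruby calls
  # method_missing; we record the name plus any weight argument ('A'..'D').
  def method_missing(column, *args)
    @columns << [column.to_s, args.first]
  end
end

definition = IndexDefinition.new do
  title
  body
end
p definition.columns # prints [["title", nil], ["body", nil]]
```

One caveat with this approach: a column whose name is already a method on the object (for example “columns” here, or Kernel’s “test”) never reaches method_missing. A sturdier version would need to account for that, for instance by starting from BasicObject.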

Declaring this index automatically defines a “search” method on the model that we can use to search our articles:

>> Article.search('coffee instruction')
=> [#<Article id: 4, title: "coffee", body: "I like getting coffee to be in instructions", created_at: "2009-10-17 21:42:13", updated_at: "2009-10-17 21:42:13">]
>> Article.create(:title => 'kittens', :body => 'kitten poop smells bad, but I still like kittens.')
=> #<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">
>> Article.search('kittens')
=> [#<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">]
>>
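For the curious, here’s roughly what a search boils down to in SQL. This is a hedged sketch in plain Ruby. The helper names tsvector_sql and search_conditions are made up for illustration. The SQL shape is modeled on the queries that appear in the PGError messages quoted in the comments below, and is not copied from Texticle.

```ruby
# Hypothetical helpers, not Texticle's source. The SQL shape mirrors the
# queries quoted in the PGError messages in the comments below.

# Builds one tsvector expression from several columns.
def tsvector_sql(table, columns, config = 'english')
  # coalesce() keeps a NULL column from turning the whole document NULL.
  document = columns.map { |c| "coalesce(#{table}.#{c}, '')" }.join(" || ' ' || ")
  "to_tsvector('#{config}', #{document})"
end

# Returns an ActiveRecord-style array condition. The search term stays a
# bind parameter (?), so the database adapter quotes it for us.
def search_conditions(table, columns, term, config = 'english')
  ["#{tsvector_sql(table, columns, config)} @@ plainto_tsquery('#{config}', ?)", term]
end

puts tsvector_sql('articles', %w[title body])
# prints to_tsvector('english', coalesce(articles.title, '') || ' ' || coalesce(articles.body, ''))
```

In a Rails 2.3 app you could pass the result to Article.find(:all, :conditions => ...). plainto_tsquery ANDs together the stemmed words of the search string. That would explain why “coffee instruction” above matches an article containing “coffee” and “instructions”.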

Great! We can search our records. There’s just one catch: we haven’t indexed our data. Doing these types of searches will be slow against large sets of data unless we add an index. Writing these indexes is a PITA, so texticle comes with a handy rake task for generating a migration to create your indexes:

  $ rake texticle:migration
  $ rake db:migrate

After running this, Postgres can use the prebuilt indexes when searching your data.
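If you’re wondering what kind of migration the rake task writes, one of the comments below quotes a statement it generated: a GIN index on a to_tsvector(...) expression, named <table>_fts_idx. Here’s a hypothetical Ruby helper that builds the same shape of SQL for our articles table. It’s a sketch, not the migration Texticle actually emits.

```ruby
# Hypothetical sketch, not the migration Texticle writes: builds a GIN
# expression index shaped like the one quoted in comment 18.
def fts_index_sql(table, columns, config = 'english')
  document = columns.map { |c| "coalesce(#{table}.#{c}, '')" }.join(" || ' ' || ")
  # The extra parentheses inside gin(...) wrap the indexed expression.
  "CREATE INDEX #{table}_fts_idx ON #{table} " \
    "USING gin((to_tsvector('#{config}', #{document})))"
end

# Companion statement for a migration's self.down (IF EXISTS needs 8.2+).
def drop_fts_index_sql(table)
  "DROP INDEX IF EXISTS #{table}_fts_idx"
end

puts fts_index_sql('articles', %w[title body])
```

Postgres only uses an expression index when the query contains the same expression. That means the to_tsvector(...) in the index and in the search must match exactly: same configuration, same columns, same order. Inside a migration you might call execute with these strings in self.up and self.down.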

Just remember: every time you modify the columns in an index block, or add a new index block, you should generate a new migration to update the indexes. If you don’t, searches will still return the right results; they just might be kind of slow.

Step 4: Integrating With Heroku

This part is pretty easy. First, we update our Heroku gem manifest:

  $ echo "texticle" >> .gems
  $ git add .gems
  $ git commit -m'updating gem manifest'
  $ git push origin master

Once your code is up on heroku, just tell heroku to migrate the database:

  $ heroku rake db:migrate

It’s just that easy! Your indexes should be available on the Heroku database server and your application can use them.

Advanced Texticle Usage

Texticle has a few more features I’d like to briefly mention. The first one is search ranking. We can tell Postgres which field has a higher priority. For example, we can tell Postgres to weigh matches in the article’s title higher than matches in the body:

class Article < ActiveRecord::Base
  index do
    title 'A'
    body  'B'
  end
end

The ranks are ‘A’ through ‘D’, and multiple fields can have the same rank.
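In Postgres, these letters are the labels that the setweight() function attaches to lexemes. The ranking functions then count A matches more heavily than B, and so on down to D. The sketch below assumes Texticle translates weights into something like setweight(...) || setweight(...). The helper weighted_tsvector_sql is hypothetical. It rejects anything outside A through D because those are the only labels Postgres knows.

```ruby
# Hypothetical sketch: assumes weights map onto PostgreSQL's setweight().
# Not Texticle's source.
VALID_WEIGHTS = %w[A B C D].freeze

def weighted_tsvector_sql(table, weighted_columns, config = 'english')
  weighted_columns.map do |column, weight|
    # setweight() only understands the labels A, B, C and D.
    unless VALID_WEIGHTS.include?(weight)
      raise ArgumentError, "weight must be A, B, C or D, got #{weight.inspect}"
    end
    # Label every lexeme from this column with its weight...
    "setweight(to_tsvector('#{config}', coalesce(#{table}.#{column}, '')), '#{weight}')"
  end.join(' || ') # ...then concatenate the per-column tsvectors into one.
end

puts weighted_tsvector_sql('articles', { 'title' => 'A', 'body' => 'B' })
```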

We can also group indexes. The index we’ve seen so far will search all columns listed. We can add another index so that we only search the “title” field:

class Article < ActiveRecord::Base
  index do
    title 'A'
    body  'B'
  end

  index('title') { title }
end

This gives us a “search_title” method in addition to the “search” method:

>> Article.search_title('kittens')
=> [#<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">]
>>
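The naming rule is simple enough to sketch. No name gives you “search”, and a name gives you “search_<name>”. Below is a hypothetical module (NamedSearchSketch, not part of Texticle) that uses define_singleton_method to add the finder to the class. The real gem reads the columns from the index block; here they’re passed explicitly to keep things short. To stay runnable without a database, each finder just returns the SQL condition it would use.

```ruby
# Hypothetical sketch, not Texticle's code: naming an index decides which
# finder gets defined on the class.
module NamedSearchSketch
  def define_search(index_name, columns, table)
    finder = index_name ? "search_#{index_name}" : "search"
    document = columns.map { |c| "coalesce(#{table}.#{c}, '')" }.join(" || ' ' || ")
    condition = "to_tsvector('english', #{document}) @@ plainto_tsquery('english', ?)"

    # Adds a class method (Article.search, Article.search_title, ...). To run
    # without a database it returns the condition array instead of records.
    define_singleton_method(finder) { |term| [condition, term] }
  end
end

class Article
  extend NamedSearchSketch
  define_search(nil, %w[title body], "articles")
  define_search("title", %w[title], "articles")
end

Article.search_title("kittens")
# => ["to_tsvector('english', coalesce(articles.title, '')) @@ plainto_tsquery('english', ?)", "kittens"]
```

Defining “search” this way replaces any existing class method of the same name. That is the collision Alderete describes in the comments.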

The last thing I want to mention is “rank”. When you perform a search, texticle adds an extra field to your model called “rank”. The rank indicates how well your record matched the search criteria:

>> Article.search('like').map { |x| x.rank }
=> ["0.4", "0.4"]
>> Article.search('coffee').map { |x| x.rank }
=> ["1.4"]
>>

Search results are already sorted by rank in descending order, so there’s no need to sort them yourself.
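The rank comes from Postgres’s ts_rank_cd() function; comment 13 below shows a “ts_rank_cd(...) as rank” fragment in the generated SQL. Here’s a hedged sketch of a SELECT that returns that extra column and sorts on it. ranked_search_sql is a hypothetical helper, not Texticle’s API. ts_rank_cd’s default weights are 1.0 for A and 0.4 for B. That appears consistent with the 0.4 and 1.4 values above, if “coffee” matched once in the A-weighted title and once in the B-weighted body.

```ruby
# Hypothetical sketch, not Texticle's API: a ranked full text query that
# exposes a `rank` column and returns the best matches first.
def ranked_search_sql(table, document_sql, config = 'english')
  query = "plainto_tsquery('#{config}', ?)"
  # Parentheses keep a weighted document (a || b) together as one tsvector.
  "SELECT #{table}.*, ts_rank_cd((#{document_sql}), #{query}) AS rank " \
    "FROM #{table} WHERE (#{document_sql}) @@ #{query} ORDER BY rank DESC"
end

puts ranked_search_sql('articles', "to_tsvector('english', coalesce(articles.title, ''))")
```

The query has two ? placeholders, so with ActiveRecord you’d bind the term twice, e.g. Article.find_by_sql([sql, term, term]). Also, rank comes back as a String (“0.4”), so call rank.to_f before comparing values numerically.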

Conclusion

I hope you enjoy tickling text with texticle as much as I do. So far, I’ve been pretty happy with this solution.

Things I like:

  • It’s the right price for use with Heroku (namely $0)
  • Easy to configure and deploy
  • No need to rebuild indexes on pushes
  • Postgres can be configured to use different dictionaries, so you aren’t stuck with English
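Picking a different dictionary mostly comes down to passing a different text search configuration name to to_tsvector. The generated SQL quoted in the comments uses 'english'. This is a hedged sketch: tsvector_for and its allow-list are hypothetical. The listed configurations ship with PostgreSQL 8.3 and later, and running \dF in psql shows what your server actually has. The allow-list exists because the name ends up inside the SQL string.

```ruby
# Hypothetical helper, not part of Texticle: chooses the PostgreSQL text
# search configuration (dictionary + stemmer) for a column's tsvector.
# The name is interpolated into SQL, so only known configurations pass.
TEXT_SEARCH_CONFIGS = %w[simple english french german spanish].freeze

def tsvector_for(column_sql, config)
  unless TEXT_SEARCH_CONFIGS.include?(config)
    raise ArgumentError, "unknown text search configuration: #{config.inspect}"
  end
  "to_tsvector('#{config}', coalesce(#{column_sql}, ''))"
end

puts tsvector_for("articles.body", "spanish")
# prints to_tsvector('spanish', coalesce(articles.body, ''))
```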

The only drawbacks I’ve found so far are:

  • INSERTs and UPDATEs are slower
  • It’s database specific

Inserts and updates will be slower, but that comes with the territory of adding database indexes. My application mostly reads data, so that doesn’t bother me. Texticle is database specific, but other databases are starting to add full text search support. I think texticle could be extended to support them, but I’m quite happy with Postgres.

Anyway, thanks for reading. The final step is that you should go get another cup of coffee.

23 Comments on Full Text Search on Heroku

  1. Mark Holton says:

    Awesome, thanks for sharing the great gems and info — always learn something from these posts even if I don’t use it directly.
    -loyal reader (and former student!)

  2. Ha! Of course! It’s always good to add new tools to the old programming toolbox.

  3. Great post! I’m still running MySQL locally but this might be the tipping point.

  4. [...] Full Text Search on Heroku – Using postgres native FT indexing. [...]

  5. Jason says:

    Just a suggestion, instead of adding

    ## Our texticle rake tasks
    require 'texticle/tasks'

    to the Rakefile, you’re better off copying the tasks.rb file to the lib/tasks directory, since that is the default location where rake looks for custom tasks. Putting a require statement in the Rakefile forces rake to run the task from the gem repository, which might not be what users expect, especially if they have frozen the gem.

  6. Jason says:

    I’m having trouble with the schema.rb that is generated as a result of running the migration generated by rake texticle:migration.

    Once you have the migration, it creates indexes which schema.rb does not really recognise.

    This causes all sorts of issues with rake test, as this runs a db:test:load which tries to create/update the test database using schema.rb which is now invalid.

    Is there a solution to this?

  7. Jason says:

    The AR Schema loader doesn’t like the indexes that Texticle generates. This is why tests fall apart.

    The solution requires redefining the db:schema:dump command to ensure that
    1. add_index … [nil] statements are removed
    2. migrations are appended to the end of the schema.rb file

  8. Alderete says:

    This might be a minor consideration for folks starting a greenfield project, but a long time ago Ryan Bates did a Railscast on adding a basic search form to a Rails app:

    http://railscasts.com/episodes/37-simple-search-form

    This involves adding a search class method to the model to be searched. Named “search”. And this same example is used in other Railscasts, and even other people’s tutorials that I’ve seen. So for people who used this Railscast as a starting point for their search forms, there’s a name collision with Texticle’s full text search method.

    Might I suggest that Texticle’s search method be renamed to “tsearch”, or something similar, to be less likely to collide with existing search methods? Or at least provide a way to not generate a “search” method, for projects where there is already a search method on a model.

    (Perhaps using the index('name') method of declaring indexes does this, and if you only do that, no “search” method is created. It would be good to explicitly document that.)

  9. Daniel says:

    Nice plug-in! Say my Article model has a tag list from a tagging system like “is_taggable”, where the tags are stored in a different table. I want to extend this to search the tags as well, without searching multiple indexes. Any pointers on where to start? Does PostgreSQL full text search support something like this?

  10. @ Daniel

    I am using acts_as_taggable_on_steroids and you can have a cached_tag_list column, so I indexed on that, and it works pretty good.

  11. Bloody great article – just spent the last few hours looking for exactly this. You’re a champion.

    Love the mullet gravatar on github – lol.

  12. dud3 says:

    I get an error when i try to run rake texticle:migration

    ~/rails/business$ rake texticle:migration --trace
    (in /home/dude/rails/business)
    ** Invoke texticle:migration (first_time)
    ** Invoke environment (first_time)
    ** Execute environment
    ** Execute texticle:migration
    rake aborted!
    uninitialized constant Busines
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/dependencies.rb:443:in `load_missing_constant’
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/dependencies.rb:80:in `const_missing’
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/dependencies.rb:92:in `const_missing’
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/inflector.rb:361:in `constantize’
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/inflector.rb:360:in `each’
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/inflector.rb:360:in `constantize’
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/core_ext/string/inflections.rb:162:in `constantize’
    /usr/local/lib/ruby/gems/1.8/gems/texticle-1.0.2/lib/texticle/tasks.rb:13
    /usr/local/lib/ruby/gems/1.8/gems/texticle-1.0.2/lib/texticle/tasks.rb:12:in `each’
    /usr/local/lib/ruby/gems/1.8/gems/texticle-1.0.2/lib/texticle/tasks.rb:12
    /usr/local/lib/ruby/gems/1.8/gems/texticle-1.0.2/lib/texticle/tasks.rb:9:in `open’
    /usr/local/lib/ruby/gems/1.8/gems/texticle-1.0.2/lib/texticle/tasks.rb:9
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:636:in `call’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:636:in `execute’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:631:in `each’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:631:in `execute’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:597:in `invoke_with_call_chain’
    /usr/local/lib/ruby/1.8/monitor.rb:242:in `synchronize’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:590:in `invoke_with_call_chain’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:583:in `invoke’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2051:in `invoke_task’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2029:in `top_level’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2029:in `each’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2029:in `top_level’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2068:in `standard_exception_handling’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2023:in `top_level’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2001:in `run’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2068:in `standard_exception_handling’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:1998:in `run’
    /usr/local/lib/ruby/gems/1.8/gems/rake-0.8.7/bin/rake:31
    /usr/local/bin/rake:19:in `load’
    /usr/local/bin/rake:19

  13. Yang says:

    Hi, great tutorial! I almost got it to work except one part.

    I can search for a query on console prompt just fine, but in rails app, it gives an error.

    def search
      @query = params[:query]
      @posts = Post.search(@query)
      @total_hits = @posts.size
    end

    When search is performed, I get the following error.

    SQL (0.0ms) PGError: ERROR: syntax error at or near "as"
    LINE 2: plainto_tsquery(E'google')) as rank) AS count_po...
    ^
    : SELECT count(posts.*, ts_rank_cd((to_tsvector('english', coalesce(posts.title, '') || ' ' || coalesce(posts.content, ''))),
    plainto_tsquery(E'google')) as rank) AS count_posts_all_ts_rank_cd_to_tsvector_english_coalesce_posts_t FROM "posts" WHERE (to_tsvector('english', coalesce(posts.title, '') || ' ' || coalesce(posts.content, '')) @@ plainto_tsquery(E'google'))

    ActiveRecord::StatementInvalid (PGError: ERROR: syntax error at or near "as"
    LINE 2: plainto_tsquery(E'google')) as rank) AS count_po...
    ^
    : SELECT count(posts.*, ts_rank_cd((to_tsvector('english', coalesce(posts.title, '') || ' ' || coalesce(posts.content, ''))),
    plainto_tsquery(E'google')) as rank) AS count_posts_all_ts_rank_cd_to_tsvector_english_coalesce_posts_t FROM "posts" WHERE (to_tsvector('english', coalesce(posts.title, '') || ' ' || coalesce(posts.content, '')) @@ plainto_tsquery(E'google')) ):
    (__DELEGATION__):2:in `__send__'
    (__DELEGATION__):2:in `with_scope'
    app/controllers/posts_controller.rb:146:in `search'

    Any clues? Also, how would you incorporate paginate into search results?

    Thank you in advance.

  14. Yang says:

    I am pretty new to Postgres, but this is driving me nuts.

    Everything works on console, but it’s not working at all in rails app. I tried to use some bogus query string I know will return empty list. On console, it works fine, but in rails app, it returns results from previous query…. What could be different from console and rails app????

  15. [...] full-text search and since Heroku uses Postgres, I could use other plug-ins like acts_as_tsearch or texticle for free. Free is important to me, since it’s not making any [...]

  16. dud3 says:

    How do you search nested objects?

    example, products has many deals.

    I want to return products if deals matches the search.

  17. mtraven says:

    dud3 and aaron:

    The fix for your problem is to change the tasks.rb file in the gem so that pluralize is called before classify, i.e.:
    klass = File.basename(f, '.rb').pluralize.classify.constantize

  18. mtraven says:

    I’m having my own problem: the migration generates but doesn’t run successfully:

    mt@laptop /misc/sourceforge/oscurrency$ heroku rake db:migrate
    rake aborted!
    PGError: ERROR: invalid input syntax for integer: "":
    CREATE index communications_fts_idx
    ON communications
    USING gin((to_tsvector('english', coalesce(communications.subject, '') || ' ' || coalesce(communications.content, '') || ' ' || coalesce(communications.recipient_id, ''))))

    Hm, this is for a class that is not indexed. It looks like another problem in the rake task. Deleting the code for this class fixed the problem.

  19. William says:

    I get the same error as “dud3” above, but with the model name “Address” instead of what I assume is “Business”. It seems there’s something that doesn’t like class names that end in two “s”es.

    I get:
    rake aborted!
    uninitialized constant Addres

    Any ideas?

  20. sausheong says:

    Nice! I was wondering if this works with something other than ActiveRecord, for example does it work with DataMapper?

  21. sam says:

    What about multi model searching? I need to be able to search for contact information on a Customer model that has_many Contacts. Is that possible?

  22. Levi Cook says:

    Nice idea.

    Just an FYI. I ran into a bug that occurs when multiple indexes are defined on the same model. I haven’t quite figured out why it’s failing. I have some “integration” tests that demonstrate the problem. If anyone’s interested, it lives on a fork here: http://github.com/levicook/texticle/tree/

    Thanks!

  23. Ken Mayer says:

    Has anyone figured out a way to fix the broken schema.rb issues? (Other than manually commenting out the offending lines?)
