<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tender Lovemaking &#187; texticle</title>
	<atom:link href="http://tenderlovemaking.com/category/computadora/texticle/feed/" rel="self" type="application/rss+xml" />
	<link>http://tenderlovemaking.com</link>
	<description>The act of making love, tenderly.</description>
	<lastBuildDate>Sun, 15 Jan 2012 04:36:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Full Text Search on Heroku</title>
		<link>http://tenderlovemaking.com/2009/10/17/full-text-search-on-heroku/</link>
		<comments>http://tenderlovemaking.com/2009/10/17/full-text-search-on-heroku/#comments</comments>
		<pubDate>Sat, 17 Oct 2009 23:29:50 +0000</pubDate>
		<dc:creator>Aaron Patterson</dc:creator>
				<category><![CDATA[computadora]]></category>
		<category><![CDATA[texticle]]></category>

		<guid isPermaLink="false">http://tenderlovemaking.com/?p=369</guid>
		<description><![CDATA[YA!! IT&#8217;S SATURDAY NIGHT! YOU ALL KNOW WHAT THAT MEANS! Time to get krunk and do some full text searching. OW! I&#8217;d like to share with my tens of loyal readers how I&#8217;m doing Full Text Search on Heroku. Heroku&#8217;s documentation lists two ways to get full text indexing working with your Heroku application. They [...]]]></description>
			<content:encoded><![CDATA[<p>YA!!  IT&#8217;S <strong>SATURDAY NIGHT</strong>!  YOU ALL KNOW WHAT THAT MEANS!  Time to get krunk and do some <strong>full text searching</strong>.  OW!  I&#8217;d like to share with my tens of loyal readers how I&#8217;m doing Full Text Search on Heroku.</p>
<p>Heroku&#8217;s documentation <a href="http://docs.heroku.com/full-text-indexing">lists two ways</a> to get full text indexing working with your Heroku application.  They talk about using <a href="http://www.davebalmain.com/">Ferret</a> and <a href="http://lucene.apache.org/solr/">Solr</a> for full text indexes.  The Ferret option looks OK, but it requires you to rebuild your indexes every time you push.  Solr would work, but it requires an EC2 instance or some third party server.  Since my budget is precisely $0, using Solr is out of the picture.</p>
<p>But there is a third option.  A very <em>secret</em> option.  A devious but fun option.  You see, <a href="http://docs.heroku.com/database">Heroku runs PostgreSQL</a> for each rails application database.  They&#8217;re running a version new enough (Version 8.3) to have full text index support built in.  If we&#8217;re willing to throw out database agnosticism, we can take advantage of the database&#8217;s indexing capability.  For this article, I&#8217;d like to hop on the Postgres train and show you how to get full text indexes working with Postgres in your rails application.  I&#8217;ll also show you how to get those indexes on Heroku so we can use them &#8220;in the cloud&#8221; (Heroku is in the cloud, right?).</p>
<p>For the rest of this article, I&#8217;m going to assume you have PostgreSQL version 8.3 or higher installed already and can get your rails application working with Postgres.  Installing postgres is outside the scope of this article, but I found <a href="http://www.gregbenedict.com/2009/08/31/installing-postgresql-on-snow-leopard-10-6/">these instructions</a> to be very helpful.</p>
<h3>Step 1: Go get some coffee</h3>
<p>I love it when instructions tell me to go get some coffee because I always do.  I have to follow the instructions, right?</p>
<h3>Step 2: Install Texticle</h3>
<p><a href="http://texticle.rubyforge.org/">Texticle</a> is a gem I wrote to help you define your text indexes on a per model basis.  To install texticle, we just do the normal gem install:</p>
<pre>
  $ sudo gem install texticle
</pre>
<p>The gem is pure ruby and isn&#8217;t very long, so I encourage you to <a href="http://github.com/tenderlove/texticle">peek through the source</a>.</p>
<p>While we&#8217;re at it, we should configure rails to load the texticle gem.  We need to add it to our config/environment.rb file.  Here&#8217;s what mine looks like:</p>
<pre class="brush: ruby; title: ; notranslate">
RAILS_GEM_VERSION = '2.3.4' unless defined? RAILS_GEM_VERSION

require File.join(File.dirname(__FILE__), 'boot')

Rails::Initializer.run do |config|
  config.time_zone = 'UTC'

  config.gem 'texticle'
end
</pre>
<p>Texticle also comes with some handy rake tasks (which we&#8217;ll talk about later).  In order to get those, we&#8217;ll need to update the rails Rakefile:</p>
<pre class="brush: ruby; title: ; notranslate">
require(File.join(File.dirname(__FILE__), 'config', 'boot'))

require 'rake'
require 'rake/testtask'
require 'rake/rdoctask'

require 'tasks/rails'

require 'rubygems'

## Our texticle rake tasks
require 'texticle/tasks'
</pre>
<h3>Step 3: Configuring your index</h3>
<p>Let&#8217;s pretend we have an Article model.  The Article model has a &#8220;title&#8221; field and a &#8220;body&#8221; field:</p>
<pre class="brush: ruby; title: ; notranslate">
class CreateArticles &lt; ActiveRecord::Migration
  def self.up
    create_table :articles do |t|
      t.string :title
      t.text   :body

      t.timestamps
    end
  end

  def self.down
    drop_table :articles
  end
end
</pre>
<p>To index those two fields, we just create an index block in the model and list the fields we want to index:</p>
<pre class="brush: ruby; title: ; notranslate">
class Article &lt; ActiveRecord::Base
  index do
    title
    body
  end
end
</pre>
<p>Declaring this index automatically defines a &#8220;search&#8221; method on the model that we can use to search our articles:</p>
<pre class="brush: ruby; title: ; notranslate">
&gt;&gt; Article.search('coffee instruction')
=&gt; [#&lt;Article id: 4, title: &quot;coffee&quot;, body: &quot;I like getting coffee to be in instructions&quot;, created_at: &quot;2009-10-17 21:42:13&quot;, updated_at: &quot;2009-10-17 21:42:13&quot;&gt;]
&gt;&gt; Article.create(:title =&gt; 'kittens', :body =&gt; 'kitten poop smells bad, but I still like kittens.')
=&gt; #&lt;Article id: 5, title: &quot;kittens&quot;, body: &quot;kitten poop smells bad, but I still like kittens.&quot;, created_at: &quot;2009-10-17 21:42:33&quot;, updated_at: &quot;2009-10-17 21:42:33&quot;&gt;
&gt;&gt; Article.search('kittens')
=&gt; [#&lt;Article id: 5, title: &quot;kittens&quot;, body: &quot;kitten poop smells bad, but I still like kittens.&quot;, created_at: &quot;2009-10-17 21:42:33&quot;, updated_at: &quot;2009-10-17 21:42:33&quot;&gt;]
&gt;&gt;
</pre>
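<p>If you&#8217;re curious what a search like that boils down to on the Postgres side, here&#8217;s a minimal sketch.  It is <em>not</em> texticle&#8217;s actual query, just an illustration of the general shape.  The &#8220;full_text_condition&#8221; helper is hypothetical, and it assumes the built-in &#8216;english&#8217; text search configuration and Postgres&#8217;s plainto_tsquery function for parsing the search string:</p>
<pre class="brush: ruby; title: ; notranslate">
# Hypothetical helper, not part of texticle: builds the kind of WHERE
# clause a Postgres full text search over several columns could use.
def full_text_condition(columns, config: 'english')
  # Glue the columns into one document; coalesce keeps a NULL column
  # from turning the whole document into NULL.
  document = columns.map { |c| "coalesce(#{c}, '')" }.join(" || ' ' || ")
  # The ? is a bind placeholder for the user's search string.
  "to_tsvector('#{config}', #{document}) @@ plainto_tsquery('#{config}', ?)"
end

puts full_text_condition(%w[title body])
</pre>
<p>Passing a different configuration name (say, &#8216;spanish&#8217;) is how you would point the same condition at a different dictionary.</p>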
<p>Great!  We can search our records.  There&#8217;s just one catch: we haven&#8217;t indexed our data.  Doing these types of searches will be slow against large sets of data unless we add an index.  Writing these indexes is a PITA, so texticle comes with a handy rake task for generating a migration to create your indexes:</p>
<pre>
  $ rake texticle:migration
  $ rake db:migrate
</pre>
<p>After running this, Postgres can use the prebuilt indexes when searching your data.</p>
<p>Just remember: every time you modify columns in your index block, or add new index blocks, you should create a new migration to update the indexes.  If you don&#8217;t update the indexes, searches will still work as expected; they just might be kind of slow.</p>
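<p>To give a rough idea of what an index migration ends up asking Postgres to do, here&#8217;s a sketch that builds the statement for a single column.  The &#8220;full_text_index_sql&#8221; helper, the index name, and the &#8216;english&#8217; configuration are all assumptions made for illustration; the migration texticle generates may well look different:</p>
<pre class="brush: ruby; title: ; notranslate">
# Hypothetical helper: the DDL a full text index migration might run.
# The index name and the 'english' configuration are made up here.
def full_text_index_sql(table, column, config: 'english')
  "CREATE INDEX #{table}_#{column}_fts_idx ON #{table} " \
    "USING gin(to_tsvector('#{config}', #{column}))"
end

puts full_text_index_sql(:articles, :title)
</pre>
<p>Postgres can only use an expression index like this when the query uses the same expression, which is one reason the indexes need regenerating whenever the indexed columns change.</p>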
<h3>Step 4: Integrating With Heroku</h3>
<p>This part is pretty easy.  First we update our <a href="http://blog.heroku.com/archives/2009/3/10/gem_manifests/">heroku gem manifest</a>:</p>
<pre>
  $ echo "texticle" >> .gems
  $ git add .gems
  $ git commit -m'updating gem manifest'
  $ git push origin master
</pre>
<p>Once your code is up on heroku, just tell heroku to migrate the database:</p>
<pre>
  $ heroku rake db:migrate
</pre>
<p>It&#8217;s just that easy!  Your indexes should be available on the Heroku database server and your application can use them.</p>
<h3>Advanced Texticle Usage</h3>
<p>Texticle has a few more features I&#8217;d like to briefly mention.  The first one is search ranking.  We can tell Postgres which field has a higher priority.  For example, we can tell Postgres to weigh matches in the article&#8217;s title higher than matches in the body:</p>
<pre class="brush: ruby; title: ; notranslate">
class Article &lt; ActiveRecord::Base
  index do
    title 'A'
    body  'B'
  end
end
</pre>
<p>The ranks run from &#8216;A&#8217; (highest) through &#8216;D&#8217; (lowest), and multiple fields can have the same rank.</p>
<p>We can also group indexes.  The index we&#8217;ve seen so far will search all columns listed.  We can add another index so that we only search the &#8220;title&#8221; field:</p>
<pre class="brush: ruby; title: ; notranslate">
class Article &lt; ActiveRecord::Base
  index do
    title 'A'
    body  'B'
  end

  index('title') { title }
end
</pre>
<p>This gives us a &#8220;search_title&#8221; method in addition to the &#8220;search&#8221; method:</p>
<pre class="brush: ruby; title: ; notranslate">
&gt;&gt; Article.search_title('kittens')
=&gt; [#&lt;Article id: 5, title: &quot;kittens&quot;, body: &quot;kitten poop smells bad, but I still like kittens.&quot;, created_at: &quot;2009-10-17 21:42:33&quot;, updated_at: &quot;2009-10-17 21:42:33&quot;&gt;]
&gt;&gt;
</pre>
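<p>For the curious, the naming trick can be sketched in plain ruby.  This is a toy and not texticle&#8217;s source: the &#8220;ToySearchable&#8221; module is hypothetical, the block receives the column list as an argument (unlike the real index block), and a case-insensitive substring match over in-memory records stands in for Postgres:</p>
<pre class="brush: ruby; title: ; notranslate">
# Hypothetical sketch, not texticle's source: a class macro that
# defines one search method per index, over in-memory records.
module ToySearchable
  def index(name = nil)
    columns = []
    yield columns # the block lists the columns to search
    method_name = name ? "search_#{name}" : 'search'
    define_singleton_method(method_name) do |term|
      all.select do |record|
        columns.any? { |col| record[col].downcase.include?(term.downcase) }
      end
    end
  end
end

Article = Struct.new(:title, :body) do
  extend ToySearchable

  # Stand-in for the articles table.
  def self.all
    [new('coffee', 'I like getting coffee to be in instructions'),
     new('kittens', 'kitten poop smells bad, but I still like kittens.')]
  end

  index { |cols| cols.push(:title, :body) }
  index('title') { |cols| cols.push(:title) }
end

puts Article.search('like').map { |a| a.title }       # coffee, kittens
puts Article.search_title('like').map { |a| a.title } # prints nothing
</pre>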
<p>The last thing I want to mention is &#8220;rank&#8221;.  When you perform a search, texticle adds an extra field to your model called &#8220;rank&#8221;.  The rank indicates how well your record matched the search criteria:</p>
<pre class="brush: ruby; title: ; notranslate">
&gt;&gt; Article.search('like').map { |x| x.rank }
=&gt; [&quot;0.4&quot;, &quot;0.4&quot;]
&gt;&gt; Article.search('coffee').map { |x| x.rank }
=&gt; [&quot;1.4&quot;]
&gt;&gt;
</pre>
<p>Search results are already returned sorted by rank in descending order, so no need to worry about sorting.</p>
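<p>One rough way to read the numbers above: Postgres&#8217;s default ranking weights are 1.0, 0.4, 0.2 and 0.1 for the labels &#8216;A&#8217; through &#8216;D&#8217;.  The toy model below simply adds up the weight of every field that mentions the term.  It assumes the title is ranked &#8216;A&#8217; and the body &#8216;B&#8217;, and the &#8220;toy_rank&#8221; helper is hypothetical; Postgres&#8217;s real ranking functions are more involved than this:</p>
<pre class="brush: ruby; title: ; notranslate">
# Postgres's default ranking weights for the labels A through D.
WEIGHTS = { A: 1.0, B: 0.4, C: 0.2, D: 0.1 }.freeze

# Toy model only: add up the weight of every field that mentions the
# term. This is an illustration, not Postgres's ranking algorithm.
def toy_rank(record, term, field_labels)
  total = field_labels.sum do |field, label|
    record[field].downcase.include?(term.downcase) ? WEIGHTS[label] : 0.0
  end
  total.round(1)
end

fields  = { title: :A, body: :B }
coffee  = { title: 'coffee', body: 'I like getting coffee to be in instructions' }
kittens = { title: 'kittens', body: 'kitten poop smells bad, but I still like kittens.' }

puts toy_rank(coffee, 'coffee', fields) # 1.4 (title and body both match)
puts toy_rank(kittens, 'like', fields)  # 0.4 (only the body matches)
</pre>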
<h3>Conclusion</h3>
<p>I hope you enjoy tickling text with texticle as much as I do.  So far, I&#8217;ve been pretty happy with this solution.</p>
<p>Things I like:</p>
<ul>
<li>It&#8217;s the right price for use with Heroku (namely $0)</li>
<li>Easy to configure and deploy</li>
<li>No need to rebuild indexes on pushes</li>
<li>Postgres can be configured to use different dictionaries, so you aren&#8217;t stuck with English</li>
</ul>
<p>The only drawbacks I&#8217;ve found so far are:</p>
<ul>
<li>INSERTs and UPDATEs are slower</li>
<li>It&#8217;s database specific</li>
</ul>
<p>Inserts and updates will be slower, but that comes with the territory of adding database indexes.  My application mostly does reads, so it doesn&#8217;t bother me.  Texticle <strong>is</strong> database specific, but other databases are starting to have full text search support.  I think texticle could be extended to support other databases, but I&#8217;m quite happy with postgres.</p>
<p>Anyway, thanks for reading.  The final step is that you should go get another cup of coffee.</p>
]]></content:encoded>
			<wfw:commentRss>http://tenderlovemaking.com/2009/10/17/full-text-search-on-heroku/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
	</channel>
</rss>

