title: How to avoid hanging yourself with Rails code-theme: all-hallows-eve gradient: top-bottom black maroon h1. How to avoid hanging yourself with Rails Using ActiveRecord right the first time Rowan Hick @ Toronto Rails Project Nite, February 2008 (Adapted S9 Version from "Original PDF Slide Deck":http://work.rowanhick.com/2008/02/12/how-to-avoid-hanging-yourself-with-rails) h1. Discussion tonight * Intended for new Rails Developers * People that think Rails is slow * Focus on simple steps to improve common @:has_many@ performance problems * Short - 15 mins * All links/references up on "@work.rowanhick.com@":http://work.rowanhick.com tomorrow h1. About me * New Zealander (not Australian) * Product Development Mgr for a startup in Toronto * Full time with Rails for 2 years * Previously PHP/MySQL for 4 years * 6 years Prior QA/BA/PM for Enterprise CAD/CAM software dev company h1. Disclaimer * For sake of brevity and understanding, the SQL shown here is cut down to "pseudo sql" * This is not an exhaustive in-depth analysis, just meant as a heads up * Times were done using ApacheBench through mongrel in production mode * @ab -n 1000 http://127.0.0.1/orders/test_xxxx@ h1. ActiveRecord lets you get in trouble far to quick Super easy syntax comes at a cost. {{{ #!ruby @orders = Order.find(:all) @orders.each do |order| puts order.customer.name puts order.customer.country.name end }}} Congratulations, you just overloaded your DB with (total number of Orders x 2) unnecessary SQL calls h1. What happened there? One query to get the orders: {{{ @orders = Order.find(:all) "SELECT * FROM orders" }}} For *every item* in the orders collection: {{{ customer.name "SELECT * FROM customers WHERE id = x" }}} {{{ customer.country.name "SELECT * FROM customers WHERE id = y" }}} h1. Systemic Problem in Web development I've seen: * 15 second page reloads * 10000 queries per page "_<insert name here>_ language performs really poorly, we're going to get it redeveloped in _<insert new language here>_" h1. Atypical root cause * Failure to build application with *real* data * ie "It worked fine on my machine" but the developer never loaded up 100'000 records to see what would happen * Using Rake tasks to build realistic data sets * Test, test, test * @tail -f log/development.log@ h1. Faker to the rescue in @lib/xchain.rake@: {{{ #!ruby namespace :xchain do desc "Load fake customers" task :load_customers => :environment do require 'Faker' Customer.find(:all, :conditions => "email LIKE('%XCHAIN_%')").each { |c| c.destroy } 1..300.times do c = Customer.new c.status_id = rand(3) + 1 c.country_id = rand(243) + 1 c.name = Faker::Company.name c.alternate_name = Faker::Company.name c.phone = Faker::PhoneNumber.phone_number c.email = "XCHAIN_"+Faker::Internet.email c.save end end end }}} {{{ $ rake xchain:load_customers }}} h1. Eager loading * By using @:include@ in finds you create sql joins * Pull all required records in one query {{{ find(:all, :include => [ :customer, :order_lines ]) v/ order.customer, order.order_lines }}} {{{ find(:all, :include => [ { :customer => :country }, :order_lines ]) v/ order.customer order.customer.country order.order_lines }}} h1. Improvement Let’s start optimising ... {{{ @orders = Order.find( :all, :include => {:customers => :country} ) }}} Resulting SQL... {{{ SELECT orders.*, countries.* FROM orders LEFT JOIN customers ON ( customers.id = orders.customers_id ) LEFT JOIN countries ON ( countries.id = customers.country_id ) }}} v/ 7.70 req/s 1.4x faster h1. Select only what you need * Using the @:select@ parameter in the find options, you can limit the columns you are requesting back from the database * No point grabbing all columns, if you only want @:id@ and @:name@ {{{ Orders.find(:all, :select => 'orders.id, orders.name') }}} h1. The last slide was very important * Not using selects is *okay* provided you have very small columns, and never any binary, or large text data * You can suddenly saturate your DB connection. * Imagine our Orders table had an Invoice column on it storing a pdf of the invoice... h1. Oops * Can't show a benchmark * @:select@ and @:include@ don't work together!, reverts back to selecting all columns * Core team for a long time have not included patches to make it work * One little sentence in ActiveRecord rdoc: "Because eager loading generates the @SELECT@ statement too, the @:select@ option is ignored." h1. 'mrj' to the rescue * "dev.rubyonrails.org/attachment/ticket/7147/init.5.rb":http://dev.rubyonrails.org/attachment/ticket/7147/init.5.rb * Monkey patch to fix select/include problem * Produces much more efficient SQL h1. Updated finder Now @:select@ and @:include@ playing nice: {{{ @orders = Order.find(:all, :select => 'orders.id, orders.created_at, customers.name, countries.name, order_statuses.name' :include => [{:customer[:name] => :country[:name]}, :order_status[:name]], :conditions => conditions, :order => 'order_statuses.sort_order ASC,order_statuses.id ASC, orders.id DESC' ) }}} v/ 15.15 req/s 2.88x faster h1. r8672 change * "blog.codefront.net/2008/01/30/living-on-the-edge-of-rails-5-better-eager-loading-and-more/":http://blog.codefront.net/2008/01/30/living-on-the-edge-of-rails-5-better-eager-loading-and-more/ * The following uses new improved association load (12 req/s) {{{ @orders = Order.find( :all, :include => [{:customer => :country}, :order_status] ) }}} * The following does not {{{ @orders = Order.find( :all, :include => [{:customer => :country}, :order_status], :order => 'order_statuses.sort_order' ) }}} h1. r8672 output... Here's the SQL {{{ Order Load (0.000837) SELECT * FROM 'orders' WHERE (order_status_id < 100) LIMIT 10 Customer Load (0.000439) SELECT * FROM 'customers' WHERE (customers.id IN (2106,2018,1920,2025,2394,2075,2334,2159,1983,2017)) Country Load (0.000324) SELECT * FROM 'countries' WHERE (countries.id IN (33,17,56,150,194,90,91,113,80,54)) OrderStatus Load (0.000291) SELECT * FROM 'order_statuses' WHERE (order_statuses.id IN (10)) }}} h1. But I want more * Okay, this still isn’t blazing fast. I'm building the next killr web2.0 app * Forgetabout associations, just load it via SQL, depending on application, makes a huge difference * Concentrate on commonly used pages h1. Catch 22 * Hard coding SQL is the fastest solution * No construction of SQL, no generation of ActiveRecord associated classes * If your DB changes, you have to update SQL => Keep SQL with models where possible h1. It ain't pretty... but it's fast Find by SQL {{{ class Order def self.find_current_orders find_by_sql( "SELECT orders.id, orders.created_at, customers.name as customer_name, countries.name as country_name, order_statuses.name as status_name FROM orders LEFT OUTER JOIN customers ON customers.id = orders.customer_id LEFT OUTER JOIN countries ON countries.id = customers.country_id LEFT OUTER JOIN order_statuses ON order_statuses.id = orders.order_status_id WHERE order_status_id < 100 ORDER BY order_statuses.sort_order ASC, order_statuses.id ASC, orders.id DESC" ) end end }}} v/ 28.90 req/s ( 5.49x faster ) h1. And the results | @find(:all)@ | 5.26 req/s | | | @find(:all, :include)@ | 7.70 req/s | 1.4x | | @find(:all, :select, :include)@ | 15.15 req/s | 2.88x | | @find_by_sql()@ | 28.90 req/s | 5.49x | h1. Don't forget indexes * 64'000 orders {{{ OrderStatus.find(:all).each { |os| puts os.orders.count } }}} * Avg 0.61 req/s no indexes * EXPLAIN your SQL {{{ ALTER TABLE xchain_test.orders ADD INDEX order_status_idx(order_status_id); }}} * Avg 23 req/s after index (37x improvement) h1. Avoid @.count@ It's damned slow {{{ OrderStatus.find(:all).each { |os| puts os.orders.count } }}} Add column @orders_count@ plus update code {{{ OrderStatus.find(:all).each { |os| puts os.orders_count } }}} v/ 34 req/s vs 108 req/s (3x faster) h1. For the speed freaks * Merb - "merbivore.com":http://merbivore.com * 38.56 req/s - 7x performance improvement * Nearly identical code * Blazingly fast