Online. Offline. Bottom Line.™ (skip to the content)

Home | About | Jobs | Privacy Policy | Contact | Login or Register


To Join Or Not To Join - Addressing Simple Yet Expensive SQL JOIN Errors

by

An SQL JOIN clause combines records from two tables in a relational database, resulting in a new, temporary table, sometimes called a “joined table”.  While this is a very powerful tool in relational databases, it can also be expensive if used incorrectly.  Here are some simple tips to get the most out of your JOIN clauses.

  1. Don’t be afraid of JOINs. The She-BA offers up some nice advice here. “They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.”
  2. Don’t Index small tables. Yeah, I know this appears to be contrary to the previous top, but as Alexey Kovyrin shows us, when the data set can fit in memory, traversing about same amount of rows full table scan performs about 25 times better than accessing rows via index.  As RAM becomes cheaper and cheaper, the amount of data that can sit in memory becomes larger and larger.  As this continues to increase, significant performances differences will begin to become apparent.
  3. Don’t use OR clauses in the predicate. This is actually a pretty common one, which also causes a huge performance hit.  It is much faster to union two join clauses than it is to write one with an OR in the predicate.
    select T1.col1, sum(col2) AS col2, sum(col3) AS col3
    from T1 left outer join T2
    on T1.col1 = T2.col4 and (T2.col5=T1.col5 or T2.col6=T1.col6)
    group by col1 order by col1

    At first glance this doesn’t look too bad.  But the OR clause in the predicate actually makes this JOIN much more expensive than it needs to be.  A much faster solution would be to use two JOIN clauses and UNION them afterward.
    select col1, sum(col2) AS col2, sum(col3) AS col3 from
    (select T1.col1, sum(col2) AS col2, sum(col3) AS col3
    from T1 left outer join T2
    on T1.col1 = T2.col4 and T2.col5=T1.col5
    group by col1
    union all
    select T1.col1, sum(col2) AS col2, sum(col3) AS col3
    from T1 left outer join T2
    on T1.col1 = T2.col4 and T2.col6=T1.col6
    group by col1)
    group by col1 order by col1

    Similar to the second tip, this is about 25 times faster.
  4. Running a query in a loop is usually a bad idea.If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.  This also came from The She-BA’s blog.

Know any more tips?  Add them as a comment!

Found in • (0) CommentsPermalink http://www.sundog.net/index.php/sunblog/entry/to-join-or-not-to-join-addressing-simple-yet-expensive-sql-join-errors/
Like this post? Subscribe to our RSS feed. RSS icon

© 2008 Sundog, All Rights Reserved xhtml | css | 508 | What's This?