SQL Join syntax: Theta vs ANSI
August 10th, 2006
At the heart of every dynamic website is some form of data source, whether it's a simple flat text file, XML schema, or full blown database. In this article, I'll be addressing database table joins with SQL (specifically MySQL).
Good database design should adhere to the third normal form, or "3NF". In a nutshell, relating bits of data to other bits of data with minimal memory/storage overheads. In order to utilise a 3NF database effectively, table joins (linking and extracting data from two different internal sources) are an absolute must. In standard Structured Query Language syntax, there are two main join-types: theta and ANSI. Here's an example of each, using the common book/author database example:
# This is a theta join example
SELECT
b.`title`,
CONCAT_WS(' ',a.`firstname`,a.`surname`) AS `author`
FROM
`books` b, `authors` a
WHERE
b.`author_id` = a.`author_id`
Here you can see that the table join takes place in the WHERE clause, so we only pull out records where the books table's author_id foreign key column matches the authors table's primary key of the same name.
The equivalent as an ANSI join is as follows:
SELECT
b.`title`,
CONCAT_WS(' ',a.`firstname`,a.`surname`) AS `author`
FROM
`books` b
INNER JOIN # "INNER" is not required, however it is considered best practice
`authors` a ON b.`author_id` = a.`author_id`
Like the theta example, this will pull out everything from books, where there's an author to go with it. Now suppose we haven't yet stored the author data for a given book, but we still want to extract a list of all of the books in our example - with an ANSI join is nice and easy, just swap the word INNER for LEFT, meaning that the left-hand table (books) takes precidence and does not depend upon there being a corresponding record (or set of) from the joined table.
So how would you replicate that with a theta join? Let's take a look:
SELECT
b.`title`,
CONCAT_WS(' ',a.`firstname`,a.`surname`) AS `author`
FROM
`books` b,
`authors` a
WHERE
b.`author_id` = a.`author_id`
OR a.`author_id` IS NULL
Not as elegant, is it? It's also far easier to create accidental cartesian joins using the theta method, which is not a good situation to be in, especially when your row count goes into the hundreds on either table - the result is very CPU/Memory hungry and can even bring a user's browser, and even a server, to a grinding halt. Theta style joins in general (and in my experience) are notably resource intenstive compared to ANSI equivalents.
It's also worth noting that in versions of MySQL prior to 5.x, it was acceptible to mix join types, i.e. use theta and ANSI joins in one query, like so:
SELECT
b.`title`,
CONCAT_WS(' ',a.`firstname`,a.`surname`) AS `author`
FROM
`reading_list` rl,
`books` b
# Here's the ANSI join...
INNER JOIN
`authors` a ON b.`author_id` = a.`author_id`
# ...and here's the theta join
WHERE
rl.`book_id` = b.`book_id`
I hope that it goes without saying that this is very bad programming practice; it's illogical use two different tools (simultaneously) to do the same job! So much so that the chaps at MySQL have opted to flat-out deny mixed join types in the same query.
For further reading, I can thoroughly recommend Jon Stephens/Chad Russels article on the subject, taken from the book "Beginning MySQL Database Design and Optimization: From Novice to Professional" (ISBN 1-59059-332-4).
Add to del.icio.us
Digg this
Technorati
Related Posts:
Entry Filed under: MySQL
4 Comments Add your own
1. FuzzyBlog » Blog Ar&hellip | August 15th, 2006 at 10:45 am
[…] JellyAndCustard has a good article on SQL Joins: At the heart of every dynamic website is some form of data source, whether it's a simple flat text file, XML schema, or full blown database. In this article, I'll be addressing database table joins with SQL (specifically MySQL). Good database design should adhere to the third normal form, or "3NF". In a nutshell, relating bits of data to other bits of data with minimal memory/storage overheads. In order to utilise a 3NF database effectively, table joins (linking and extracting data from two different internal sources) are an absolute must. In standard Structured Query Language syntax, there are two main join-types: theta and ANSI. More… […]
2. Ross Notes » Blog A&hellip | August 16th, 2006 at 12:25 pm
[…] SQL Join syntax: Theta vs ANSI ยท Jelly and Custard (tags: sql) […]
3. John Hirbour | August 17th, 2006 at 4:15 am
Finally an article explaining this…. a lot of Oracle DBAs have told me that the Theta syntax is the "right one". What BS!!! I'd agree that the ANSI snytax is easier to read…. easier to debug… and nicer looking!
Plus it allows you to seperate what joins your table from your selection criteria.
4. Nested queries, SQL_CALC_&hellip | October 18th, 2007 at 2:44 am
[…] Looking at this now, this query could be further optimized. I'm doing a cross-join on sends and haikus to get all the haiku sent to the userID, but cross-joins are heavy. This nested query is a "Theta" style join, which is now discouraged in favor of the more standard "ANSI" style joins (the kinds that use JOIN … ON/USING). From Jelly and Custard: "Theta style joins in general (and in my experience) are notably resource intenstive (sic) compared to ANSI equivalents." […]
Leave a Comment
Some HTML allowed:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>
Trackback this post | Subscribe to the comments via RSS Feed