Eliminate articles when ordering in MySQL

Posted on 08 March 2009 (10:06 PM)

When generating alphabetically ordered resultsets from your tables, it may be useful to eliminate articles (the and a in the English language).

The following is a simple method of doing this.

Consider the following table of movie titles:

mysql> select * from movies order by name;
+----+----------------------+
| id | name                 |
+----+----------------------+
| 4  | A Day After Tomorrow |
| 7  | Aliens               |
| 6  | Juno                 |
| 5  | Reservoir Dogs       |
| 2  | The Big Lebowski     |
| 3  | The Dark Knight      |
| 1  | The Godfather        |
+----+----------------------+
7 rows in set (0.01 sec)

As you can see, articles mess with our order. Currently only 7 records reside in this table, so it's not hard to keep a clear view of this data, but when your database grows, it can be extremely annoying when half your data gets organized under "T".

Root out articles

Luckily, it's easy to insert some logic in our query which'll make it possible to sort on the substring from the article on:

mysql> select id, name, if(name regexp '^(the|a|an) ', substring(name, instr(name, ' ') + 1), name) as orderName from movies order by orderName asc;
+----+----------------------+--------------------+
| id | name                 | orderName          |
+----+----------------------+--------------------+
| 7  | Aliens               | Aliens             |
| 2  | The Big Lebowski     | Big Lebowski       |
| 3  | The Dark Knight      | Dark Knight        |
| 4  | A Day After Tomorrow | Day After Tomorrow |
| 1  | The Godfather        | Godfather          |
| 6  | Juno                 | Juno               |
| 5  | Reservoir Dogs       | Reservoir Dogs     |
+----+----------------------+--------------------+
7 rows in set (0.00 sec)

See? By creating the field orderName, we get an alphabetically sorted list that actually makes sense.

Let me break it down, from the inside out:

  1. substring(name, instr(name, ' ') + 1)
  2. Download this code: /code/eliminate-articles-when-ordering-in-mysql1.txt

The SUBSTRING function takes the substring of a string. In this case, we want the substring from the first space (which would come after the article, if any) to the end of the string. The INSTR function provides the index we need to extract this substring. In this case, it returns the position of the first space it encounters in the string. I append 1 to the result because otherwise all orderName fields would start with a space.

  1. if(name regexp '^(the|a|an) ', substring(...), name)
  2. Download this code: /code/eliminate-articles-when-ordering-in-mysql2.txt

The IF function will return the substring as extracted in the above example, if the string matches the regular expression, or otherwise just the name as it were.

It's that simple! Here is the entire query for you to copy and paste:

  1. SELECT
  2. id,
  3. name,
  4. IF(name REGEXP '^(the|a|an) ',
  5. SUBSTRING(name, INSTR(name, ' ') + 1),
  6. name
  7. ) AS orderName
  8. FROM
  9. movies
  10. ORDER BY
  11. orderName ASC
  12. Download this code: /code/eliminate-articles-when-ordering-in-mysql3.txt

Adding articles in different languages

If you want to add more articles, for different languages for instance, just edit the regular expression. For instance, a regular expression containing both French and English articles could look like this:

  1. regexp '^(the|a|an|le|la) '
  2. Download this code: /code/eliminate-articles-when-ordering-in-mysql4.txt

That's it! Let me know if you have improvements.

Back to top

Filed under Database design

Comments:

  1. 09 March 2009 (03:01 PM) by David

    Pretty kewl!

    This ensures that your status as MySql fetishist and expert will live on forever :)

  2. 18 February 2011 (03:09 AM) by yiwu wholesale

    Hey,this UI of your blog is nice and clean!Good work now,hope you providing more useful information!

Sorry, due to spam, comments are temporarily out of order.

Back to top

Preferences

These settings will be saved for your convenience.