Reorganizing data with list & dict comprehensions

While writing scripts, I frequently need to rearrange sets of data into a more "process-friendly" format. A common case is turning a list (array) into a dictionary (associative array) or vice versa. More often than not, I need to access list elements by a key, but since they aren't set up in a dictionary, I have to pull out a looping technique to reorganize the data first.

Take the following set of data for example:

[[1, 'John Smith', 'admin'],
 [2, 'Jane Doe', 'superuser'],
 [3, 'Sam Jones', 'user']]

What we have here is a few rows of user data. In this example, the data is in a Python list (which in PHP would be an array).

In PHP, this would look something like (using print_r):

Array (
    [0] => Array (
        [0] => 1
        [1] => John Smith
        [2] => admin
    )
    [1] => Array (
        [0] => 2
        [1] => Jane Doe
# (...etc...)

So, what if I found myself writing some code that needed to be able to access each record by its first value, which in this case would be the user_id?

Well, in PHP the easiest way to make this happen would be a good-old-fashioned foreach loop:

$out = array();
foreach ($row as $val) {
    $out[$val[0]] = $val;  // key each record by its user_id
}

Not very "sexy" for something I find myself having to do more often than I would like, but it works nonetheless.

Now, here's where the beauty of Python kicks in. Python has a built-in feature called "comprehensions", which let you transform and/or convert sets of data quickly and easily.

Remember what our original data set looks like? Here's a refresher:

>>> print row
[[1, 'John Smith', 'admin'],
 [2, 'Jane Doe', 'superuser'],
 [3, 'Sam Jones', 'user']]

Now, in Python all we need to do is use the dictionary comprehension:

>>> out = {val[0]: val for val in row} # Python 2.7+ and 3.x
# (or, for older versions of Python 2)
>>> out = dict((val[0], val) for val in row) # Python 2.4+

>>> print out
{1: [1, 'John Smith', 'admin'],
 2: [2, 'Jane Doe', 'superuser'],
 3: [3, 'Sam Jones', 'user']}

See how easy that was?
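
Here is a slightly fuller sketch of the same idea, written in Python 3 syntax (so print is a function). The names rows, by_id, and privileged are just ones I picked for illustration. It also shows that a comprehension can filter and reshape the data in the same pass:

```python
rows = [
    [1, 'John Smith', 'admin'],
    [2, 'Jane Doe', 'superuser'],
    [3, 'Sam Jones', 'user'],
]

# Key each record by its first element (the user_id).
by_id = {record[0]: record for record in rows}

# Unpack each row and keep only the non-'user' roles,
# mapping user_id -> name.
privileged = {user_id: name for user_id, name, role in rows if role != 'user'}

print(by_id[2])
print(privileged)
```

Looking up by_id[2] gives back Jane Doe's full record, and privileged only contains the admin and the superuser.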

And what if I had a dictionary in the above format, and needed to turn it back into the type of list I had previously? Well, I could simply use a list comprehension to regenerate this set of data as a list:

>>> print [val for val in out.values()]
[[1, 'John Smith', 'admin'],
 [2, 'Jane Doe', 'superuser'],
 [3, 'Sam Jones', 'user']]
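
One caveat worth sketching: older Python versions don't promise any particular order for dictionary values, so if the order of the rows matters, it is safer to sort by key while rebuilding the list. This is a minimal Python 3 example; the dictionary is deliberately written out of order, and the names rebuilt and names are my own:

```python
out = {
    3: [3, 'Sam Jones', 'user'],
    1: [1, 'John Smith', 'admin'],
    2: [2, 'Jane Doe', 'superuser'],
}

# Rebuild the list of rows in user_id order, whatever order the dict is in.
rebuilt = [out[user_id] for user_id in sorted(out)]

# A list comprehension can also pull out a single column, here the names.
names = [record[1] for record in rebuilt]

print(rebuilt)
print(names)
```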

This is an extremely simple comprehension example. For a deeper look, check out Wikipedia's Python syntax page or the official Python documentation.


The great web technology shootout - Round 3: Better, Faster, and Shinier

A lot of the information below is out of date. Please see the new framework shootout page for the latest benchmarks.

This post is the continuation of a series. Please read Round 1 and Round 2 first if you are just now joining.

As I mentioned briefly in Round 1, this whole thing came about as an experiment to satisfy my own curiosity. Unfortunately, I wasn't expecting these posts to draw the amount of attention they have been getting, and several people informed me of a few "issues" with the first round. Since my initial approach to this topic was somewhat casual, I didn't really take the time to perform each test in a "proper scientific fashion." Although this was clearly stated in the introduction to round one, it unfortunately resulted in performance estimations that were somewhat less than accurate.

After input from various people much smarter than myself, I quickly went to work tweaking my test environment and building "proper" test apps. In the midst of this, a conversation about PHP accelerators prompted me to put PHP under the spotlight, which brought about Round 2 as an interim round. This gave me a chance to demonstrate the necessity of PHP acceleration, and only continued to solidify my opinion of PHP as an inferior web development language (remember, I just said my opinion).

Which brings us to Round 3. A lot of work has gone into "doing it right" this time, so I am fairly confident that these ...


The great web technology shootout - Round 2: PHP deserves a helping hand

A lot of the information below is out of date. Please see the new framework shootout page for the latest benchmarks.

This post is the continuation of a series. Please read Round 1 first if you are just now joining.

In Round 1, PHP was looking like quite the tortoise of the group. However, if you're familiar with some of the core differences between Python & PHP, you'll know that Python has been "cheating" slightly.

Let me explain: By default, Python compiles each module into bytecode the first time it is imported and caches the result, allowing this bottleneck to be skipped on subsequent runs. PHP, however, does not perform this type of optimization by default (in the 5.x line at least), so the PHP interpreter must re-compile each file every time it is run. As you can imagine, this can give PHP (without an accelerator) a huge disadvantage when compared to languages such as Python.
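
To make the bytecode caching idea a bit more concrete, here is a rough Python 3 sketch using the standard library's py_compile module. The file name hello.py and the temporary directory are made up for the example; normally Python writes these cache files on its own when a module is imported, so you would rarely do this by hand:

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny throwaway source file (hypothetical name).
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write("print('hello')\n")

    # Compile it to bytecode once; the interpreter can load the
    # cached file later instead of re-parsing the source.
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"), doraise=True)
    cached = os.path.exists(pyc)

print(cached)
```

A PHP accelerator does roughly the equivalent job for PHP, except that it keeps the compiled form in shared memory rather than in a file next to the source.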

With this in mind, I have decided to take Round 2 to focus solely on PHP. This will hopefully provide a clear picture of the benefits of PHP bytecode caching (at least when it comes to page views; the memory benefits are a whole other story), and give you an idea of PHP's performance with the help of an accelerator.

There are many PHP accelerators available, but I have chosen APC for use here (mostly due to its inclusion in the upcoming PHP 6 core).

What you should know about Round 2:

  1. The hardware/software platform is the same ...

The great web technology shootout - Round 1: A quick glance at the landscape

A lot of the information below is out of date. Please see the new framework shootout page for the latest benchmarks.

Recently I went on a benchmarking spree and decided to throw ApacheBench at a bunch of the different web development technology platforms I interact with on a day-to-day basis. The results were interesting enough to me that I decided I'd take a post to share them here.

Disclaimer: The following test results should be taken with a *massive* grain of salt. If you know anything about benchmarking, you will know that the slightest adjustments have the potential to change things drastically. While I have tried to perform each test as fairly and accurately as possible, it would be foolish to consider these results as scientific in any way. It should also be noted that my goal here was not to see how fast each technology performs at its most optimized configuration, but rather what a minimal out-of-the-box experience looks like.

Test platform info:

  • The hardware was an Intel Core2Quad Q9300, 2.5 GHz, 6 MB cache, 1333 MHz FSB, 2 GB DDR RAM.
  • The OS was CentOS v5.3 32-bit with a standard Apache Webserver setup.
  • ApacheBench was used with only the -n and -c flags (1000 requests for the PHP frameworks, 5000 requests for everything else).
  • Each ApacheBench test was run 5-10 times, with the "optimum average" chosen as the numbers represented here.
  • The PHP tests were done using the standard Apache PHP module.
  • The mod_wsgi tests were done in daemon mode ...

Why I've fallen in love with Python

Now that I'm using Python for a large percentage of my development, I thought it would be fun to highlight a few reasons why Python has become my new language of choice.

In an effort to help you understand where I'm coming from, let me briefly rehash some of my programming history: I spent much of the '90s doing dynamic web development using Perl (weren't those the days). I eventually migrated to PHP, which usually made things much easier on the web, and subsequently replaced most of my console scripting with BASH [shell scripting]. However, I'm kind of a hack and love languages, so I have occasionally been known to write something in C. And although I'm not a complete stranger to Java and Ruby, I never really felt like I "clicked" with either of those languages.

Ok, now that I've hopefully convinced you that I'm not just a fly-by-night programmer, let me show you some Python code. Brace yourself, as this article is bound to get lengthy...

Reason #1: "Whitespace done right" is actually a good thing

The first thing that people either absolutely love or adamantly hate about Python is the fact that its syntax is heavily tied to proper usage of whitespace. At first glance, this causes many curious onlookers from other languages to shy away from Python and continue in their brace-encapsulated bondage. I'll admit, at first I wasn't too wild about these new restrictions either ...