JP's Laboratory


List comprehension

Written: 20130202

List Comprehensions in Python


List Comprehensions

A list comprehension is a syntactic construct that makes creating a list from another list easier and nicer. It handles the overhead of creating the new list for you, resulting in code that is simpler, and often just a one-liner. The code is more readable and easier to understand.


Let us start with an example. We have a list of numbers, and we want to make a new list, with each number of the old list squared. A solution could look like this:


old_list = range(4)
new_list = []
for n in old_list:
    new_list.append(n**2)


With a list comprehension this can be rewritten as:


old_list = range(4)
new_list = [n**2 for n in old_list]


This code does the same thing, but this version is shorter. It is also easier to see what new_list is supposed to be: the brackets show that it is a list, and its contents are given by the expression inside the brackets. In the first version we had to read three lines of code before we got the whole picture.

The first part of the expression inside the brackets, n**2, gives the expression for how each element in the new list is formed. The n is the name given to each element in the old list, and is defined just after the "for" in the expression. The last part of the list comprehension gives the old list, old_list, that the new list should be made from. More generally it could be written as:


[expression for variable in iterable]

Until now we have said that a list comprehension makes a list from a list, but actually it can make a list from any iterable. The only requirement is that the iterable has to be finite. Otherwise Python will keep building the list until no more memory is available.
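As a small illustration of this point, here is a sketch that builds lists from a string, from a tuple, and from an endless iterator that is first cut down to size. The variable names are only for illustration, and the code assumes Python 3.

```python
import itertools

# Any finite iterable works as the source, not only a list.
letters = [c.upper() for c in "abc"]       # a string yields its characters
halves = [n / 2.0 for n in (2, 4, 6)]      # a tuple yields its items

# itertools.count() never ends, so it has to be bounded before use;
# islice stops after the first four values.
squares = [n**2 for n in itertools.islice(itertools.count(), 4)]

print(letters)  # ['A', 'B', 'C']
print(halves)   # [1.0, 2.0, 3.0]
print(squares)  # [0, 1, 4, 9]
```

Without the islice call the last comprehension would never finish, which is exactly the memory problem described above.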


The variable part of the list comprehension does not have to be a single variable. Thanks to Python’s tuple unpacking, it can be a tuple of several variables. Here is a simple example iterating over the key and value pairs of a dictionary:


>>> d = {'kyle': 3, 'stan': 5, 'eric': 0, 'kenny': 7}
>>> d.items()
[('stan', 5), ('kenny', 7), ('kyle', 3), ('eric', 0)]
>>> ["%s = %d" % (k, v) for k, v in d.items()]
['stan = 5', 'kenny = 7', 'kyle = 3', 'eric = 0']


The items method returns (key, value) tuples, and each tuple is unpacked into k and v.
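Tuple unpacking is not limited to dictionaries. The sketch below, assuming Python 3 and using made-up lists called names and scores, unpacks pairs produced by zip, and also shows that the unpacking target can be nested:

```python
names = ['kyle', 'stan', 'eric']
scores = [3, 5, 0]

# zip yields (name, score) pairs, which are unpacked into two variables.
lines = ["%s = %d" % (name, score) for name, score in zip(names, scores)]

# Unpacking can be nested: enumerate yields (index, (name, score)).
ranked = ["%d. %s" % (i, name)
          for i, (name, score) in enumerate(zip(names, scores), 1)]

print(lines)   # ['kyle = 3', 'stan = 5', 'eric = 0']
print(ranked)  # ['1. kyle', '2. stan', '3. eric']
```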


An optional filter can also be added to the list comprehension:


[expression for variable in iterable if condition]


Written as a traditional loop, the equivalent code would look something like this:


new_list = []
for variable in iterable:
    if condition:
        new_list.append(expression)


As an example we can reuse the previous example, but now we only want values above zero:


>>> d = {'kyle': 3, 'stan': 5, 'eric': 0, 'kenny': 7}
>>> d.items()
[('stan', 5), ('kenny', 7), ('kyle', 3), ('eric', 0)]
>>> ["%s = %d" % (k, v) for k, v in d.items() if v>0]
['stan = 5', 'kenny = 7', 'kyle = 3']


The "if" used in the filter can be somewhat confusing. It is only used for selecting which elements to include in the new list. A ternary conditional expression, "a if test else b", can be used in the expression part of the list comprehension, where it chooses the value of each element instead. When learning about list comprehensions, it is easy to confuse these two. As an example, let us modify the previous example so that, instead of removing the user with score 0, we give him 5 points:

>>> d = {'kyle': 3, 'stan': 5, 'eric': 0, 'kenny': 7}
>>> d.items()
[('stan', 5), ('kenny', 7), ('kyle', 3), ('eric', 0)]
>>> ["%s = 5" % k if (v==0) else "%s = %d" % (k, v) for k, v in d.items()]
['stan = 5', 'kenny = 7', 'kyle = 3', 'eric = 5']


This list comprehension is quite long, so if it became any longer, it would probably be a good idea to split it up. Moving the whole expression part of the list comprehension into a function could also make it more readable.
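One way this could look is sketched below. The helper format_score and its minimum parameter are hypothetical names invented for this example, and sorted() is only there to give a predictable order:

```python
def format_score(name, score, minimum=5):
    """Return 'name = score', lifting a score of zero to `minimum`.

    format_score and minimum are hypothetical names for this sketch.
    """
    if score == 0:
        score = minimum
    return "%s = %d" % (name, score)

d = {'kyle': 3, 'stan': 5, 'eric': 0, 'kenny': 7}

# The comprehension now only says what is iterated and which helper is applied.
result = [format_score(k, v) for k, v in sorted(d.items())]

print(result)  # ['eric = 5', 'kenny = 7', 'kyle = 3', 'stan = 5']
```

The conditional logic now has a name and can be tested on its own, while the list comprehension stays short.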


A word of warning: in Python 2.x the loop variable of a list comprehension leaks into the enclosing scope. This can introduce hard-to-find errors in the code. Try this code in Python 2, and you’ll see that it leaks the variable i:


>>> [i for i in range(3)]
[0, 1, 2]
>>> i
2


This has been fixed in Python 3:


>>> [i for i in range(3)]
[0, 1, 2]
>>> i
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'i' is not defined
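A related consequence, sketched here under the assumption that the code runs on Python 3: a variable that already exists outside the comprehension keeps its value, even if the comprehension uses the same name.

```python
i = 'outer value'

# On Python 3 the loop variable i lives in the comprehension's own scope.
squares = [i**2 for i in range(3)]

print(squares)  # [0, 1, 4]
print(i)        # outer value (Python 2 would print 2 here)
```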



At this point we have been through the most important aspects of list comprehensions. The list comprehension syntax allows for slightly more complex structures, which will be described later, but first we will look at generator expressions, which are closely related to list comprehensions.


Generator Expressions

When working with a list, it is in many cases not actually necessary to have the whole list in memory. Often we just iterate over the list once, using each element once. A generator is therefore sometimes preferred over a list.

A simple way of making a generator is to use a generator expression. Generator expressions are written in almost the same way as list comprehensions; just replace the enclosing square brackets with parentheses.


>>> [x**2 for x in xrange(4)]
[0, 1, 4, 9]
>>> g = (x**2 for x in xrange(4))
>>> g
<generator object <genexpr> at 0x2ca7d8>
>>> g.next()  # In Python 3.x use next(g)
0
>>> g.next()
1


The generator expression returns a generator object, which behaves just like an iterator. Generators create values on demand, and are therefore less memory demanding. In Python 2 the method next is used to get the next element from an iterator; in Python 3 it has been renamed to __next__. The built-in function next(g), available since Python 2.6, works in both versions. When no more elements are available, a StopIteration exception is raised.
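The behaviour described above can be tried out with the short sketch below. It assumes Python 3 and uses the built-in next() rather than calling the method directly:

```python
g = (x**2 for x in range(3))

# Values are produced one at a time, only when asked for.
first = next(g)

# list() consumes whatever is left; a generator cannot be rewound.
rest = list(g)

# Asking an exhausted generator for more raises StopIteration.
try:
    next(g)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, rest, exhausted)  # 0 [1, 4] True
```

A for loop catches the StopIteration for us, which is why we rarely see the exception in everyday code.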


There is a special case where a generator expression does not need its own enclosing parentheses: when it is the only argument in a function call. Here is an example with the sum function, which can consume an iterable:


>>> sum(x**2 for x in range(4))
14
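As soon as the call has more than one argument, the generator expression needs its own parentheses again. A small sketch, with variable names chosen only for illustration:

```python
# Sole argument: the parentheses of the call are enough.
total = sum(x**2 for x in range(4))

# With a second argument the generator expression needs its own parentheses;
# sum(x**2 for x in range(4), 10) is a syntax error.
total_from_ten = sum((x**2 for x in range(4)), 10)

# Other built-ins that consume an iterable work the same way.
largest = max(x**2 for x in range(4))
has_odd = any(x % 2 == 1 for x in range(4))

print(total, total_from_ten, largest, has_odd)  # 14 24 9 True
```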


The generator expression is very compact, but it is not very flexible. If a generator cannot be described with a generator expression, we have to implement it in another way: as an iterable class, or as a function that yields the values.
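As an illustration of the last option, here is a minimal sketch of a generator function. The name running_total is made up for this example. It needs to carry state from one element to the next, which is awkward to express in a generator expression:

```python
def running_total(numbers):
    """Yield the cumulative sum after each element of `numbers`."""
    total = 0
    for n in numbers:
        total += n   # state carried over between iterations
        yield total  # hand back one value, then pause until the next request

print(list(running_total([1, 2, 3, 4])))  # [1, 3, 6, 10]
```

Like a generator expression, the function produces its values on demand, so it can also be fed to sum, max, or a for loop.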


Dictionary and Set Comprehensions

Dictionary and set comprehensions were introduced in Python 3.0, and are also available in Python 2.7. The syntax is similar to that of list comprehensions. Here is an example of a set and a dictionary comprehension:


>>> {x for x in range(4)}
set([0, 1, 2, 3])
>>> {x:"A"*x for x in range(4)}
{0: '', 1: 'A', 2: 'AA', 3: 'AAA'}
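Filters and tuple unpacking work in these comprehensions too. The sketch below reuses the dictionary from the earlier examples; the other names are only illustrative:

```python
d = {'kyle': 3, 'stan': 5, 'eric': 0, 'kenny': 7}

# Dictionary comprehension with a filter: keep only the positive scores.
positive = {name: score for name, score in d.items() if score > 0}

# Set comprehension: duplicates collapse, so only distinct lengths remain.
name_lengths = {len(name) for name in d}

# Swapping key and value inverts the mapping.
# This is only safe here because all the scores are different.
by_score = {score: name for name, score in d.items()}

print(positive == {'kyle': 3, 'stan': 5, 'kenny': 7})  # True
print(name_lengths == {4, 5})                          # True
print(by_score[7])                                     # kenny
```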


List Comprehensions Continued

Now we will go back to list comprehensions, and look at the rest of their syntax.


Nested Loops

The grammar below gives the full list comprehension syntax, and is taken directly from the Python 2.7 reference:


comprehension ::=  expression comp_for
comp_for      ::=  "for" target_list "in" or_test [comp_iter]
comp_iter     ::=  comp_for | comp_if
comp_if       ::=  "if" expression_nocond [comp_iter]


This way of describing a grammar is called Backus-Naur form. It says that there has to be at least one "for .. in" clause in a list comprehension, and that it can be followed by any number of "for .. in" or "if .." clauses. The for and if clauses are nested in the same order as they appear in the list comprehension.


A list comprehension of this form:


res = [expression for i1 in L1 if cond1 for i2 in L2 if cond2 ...]


is equivalent to:


res = []
for i1 in L1:
    if cond1:
        for i2 in L2:
            if cond2:
                ...
                res.append(expression)


The inner loops can also iterate over variables from the outer loops. Here is an example:


>>> L = [[1,2,3],[2,3],[1]]
>>> res = []
>>> for vec in L:
...     if len(vec)>2:
...         for e in vec:
...             res.append(e*2)
...
>>> res
[2, 4, 6]


which is equivalent to:


>>> L = [[1,2,3],[2,3],[1]]
>>> [e*2 for vec in L if len(vec)>2 for e in vec]
[2, 4, 6]


Here the program iterates over all the nested lists, keeps only those with more than two elements, and doubles each of their values.
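Two other common uses of several for clauses are sketched below: flattening a list of lists, and combining two independent loops with a filter that refers to both loop variables. The names flat and pairs are only illustrative.

```python
L = [[1, 2, 3], [2, 3], [1]]

# Flatten: the outer loop is written first, exactly as in nested for statements.
flat = [e for vec in L for e in vec]

# Two independent loops with a filter on both variables:
# all pairs (x, y) with x < y.
pairs = [(x, y) for x in range(3) for y in range(3) if x < y]

print(flat)   # [1, 2, 3, 2, 3, 1]
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
```

Writing the clauses in the opposite order in the first comprehension would fail, because vec would be used before the loop that defines it.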


Nested list comprehensions

It is possible to nest list comprehensions. The expression part of a list comprehension, which describes how the new elements should be formed, can itself be a list comprehension. It will have access to all the variables in the outer list comprehension, as it is enclosed in the innermost loop of the outer list comprehension.


A simple example could be:

>>> L = [[1,2,3],[2,3],[1]]
>>> [[i*2 for i in vec] for vec in L if len(vec)>2]
[[2, 4, 6]]
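Another example where the inner list comprehension depends on a variable from the outer one is transposing a small matrix. This is only a sketch, and it assumes that all rows have the same length:

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

# The inner comprehension builds one new row per column index.
# It can read `col` because it is evaluated inside the outer loop.
transposed = [[row[col] for row in matrix] for col in range(len(matrix[0]))]

print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```

Note the difference from the nested loops in the previous section: here the result is a list of lists, not one flat list.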




Summary

List comprehensions and generator expressions let us write compact and easily readable expressions. On the other hand, they also allow for very complex expressions, which can be so long that they have to be broken up over several lines. These complex list comprehensions do not inherit the easy readability of their shorter and simpler cousins; they might actually be harder to understand than conventional loops. Conventional loops follow Python’s strict indentation rules, and are therefore more structured than a long list comprehension, whose parts are separated only by whitespace.

