Tuesday, May 17, 2011

Getting picky with arrays

Today I learned how to choose multiple items from a list. For example, imagine you have a C# program with an array of addresses. If you want a list of only the addresses in a specific zip code, you could iterate over all the addresses and add the matching ones to a list one by one. It would look something like this:

List<Address> one_zip = new List<Address>();
foreach (Address addr in addresses)
{
    if (addr.zip == "12345")
    {
        one_zip.Add(addr);
    }
}

You could also replace that with a single line of more readable code:

IEnumerable<Address> one_zip = addresses.Where(addr => addr.zip == "12345");

If the Where syntax above looks familiar, you've probably learned SQL or some other database query language. C# 3.0 introduced a framework called LINQ, which lets you write the same kinds of queries in C# as you would in a database.

Where isn't truly a method of arrays (it is an extension method that LINQ provides for anything enumerable), but it behaves like one for our purposes. It takes one parameter: a function that maps a single item from the array to a boolean value (true or false). This function (think of it as a condition) tells Where which items to keep and which to ignore.

The only daunting part of this syntax is the function, in this case a lambda expression. A lambda expression uses => to map its parameter (here, addr) to some value (here, true if the zip code is 12345 and false otherwise). Where calls this function for each item in the array and keeps only the items for which it returns true.
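To make the mechanics concrete, here is a rough sketch of the same idea in Python (chosen only because it is compact). The `where` helper and the sample zip codes are made up for illustration; this is not how LINQ is actually implemented, just the keep-it-if-the-condition-is-true logic described above.

```python
def where(items, predicate):
    """Yield only the items for which predicate(item) is true."""
    for item in items:
        if predicate(item):
            yield item


# Hypothetical sample data: plain zip-code strings keep the example short.
zips = ["12345", "54321", "12345", "99999"]

# The lambda plays the same role as `addr => addr.zip == "12345"` in C#.
matches = list(where(zips, lambda z: z == "12345"))
print(matches)
```

The helper is a generator, so nothing is filtered until something (here, `list`) asks for the results.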

In order to use the Where method, you need to include the LINQ namespace by declaring using System.Linq; at the top of your file, and your project needs a reference to the System.Core assembly. That assembly is part of .NET Framework 3.5 and later, and it's included with Visual Studio.

In Python, on the other hand, you don’t need to import any modules to select items from a list. It’s part of the basic functionality. (A structure similar to Address above isn’t part of the basic functionality, but you can define one with a namedtuple from the collections module.) The code looks like this:

[addr for addr in addresses if addr.zip == "12345"]
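If you want to try that line yourself, here is a minimal, self-contained sketch. The `Address` fields (`street`, `zip`) and the sample addresses are my own assumptions for illustration, built with the namedtuple mentioned above.

```python
from collections import namedtuple

# Assumed structure: just a street and a zip code.
Address = namedtuple("Address", ["street", "zip"])

# Made-up sample data.
addresses = [
    Address("1 Main St", "12345"),
    Address("2 Oak Ave", "54321"),
    Address("3 Elm Rd", "12345"),
]

# Keep only the addresses whose zip code matches.
one_zip = [addr for addr in addresses if addr.zip == "12345"]
print(one_zip)
```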

The square brackets define a list, and the first addr says what goes into that list. This is even more flexible than Where in C# in several ways. For example, if you only want a list of the street names:

[addr.street for addr in addresses if addr.zip == "12345"]

This is why addr appears twice at the start of the expression: the first occurrence is the value that gets saved, and the second defines the variable you are iterating with. You don't have to save the items themselves, though; you can save any value you want. For example, a truly odd way of counting the addresses in the zip code would be:

sum([1 for addr in addresses if addr.zip == "12345"])
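For comparison, here is a small sketch that puts the odd counting trick next to two more conventional ways of getting the same number. The `Address` fields and the sample data are assumptions made up for this example.

```python
from collections import namedtuple

# Assumed structure and made-up sample data.
Address = namedtuple("Address", ["street", "zip"])
addresses = [
    Address("1 Main St", "12345"),
    Address("2 Oak Ave", "54321"),
    Address("3 Elm Rd", "12345"),
]

# The odd way: build a list of 1s and add them up.
count_sum = sum([1 for addr in addresses if addr.zip == "12345"])

# Same idea without the square brackets: a generator expression,
# which avoids building the intermediate list.
count_gen = sum(1 for addr in addresses if addr.zip == "12345")

# The more obvious way: filter, then take the length.
count_len = len([addr for addr in addresses if addr.zip == "12345"])

print(count_sum, count_gen, count_len)
```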

With just a little extra work you can also make your selection based on the indices. If you want every other address in your list, starting with the second one:

[addresses[i] for i in range(len(addresses)) if i % 2 == 1]
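As far as I can tell, there are at least two other ways to get the same result in Python: `enumerate`, which hands you the index along with each item, and slice notation with a step. The sketch below uses made-up strings in place of real addresses and checks that all three agree.

```python
# Made-up stand-ins for addresses; the index is in the name for clarity.
addresses = ["addr0", "addr1", "addr2", "addr3", "addr4"]

# The version from the post: iterate over indices.
by_index = [addresses[i] for i in range(len(addresses)) if i % 2 == 1]

# enumerate yields (index, item) pairs, so no manual indexing is needed.
by_enumerate = [addr for i, addr in enumerate(addresses) if i % 2 == 1]

# Slice notation: start at index 1, go to the end, step by 2.
by_slice = addresses[1::2]

print(by_index, by_enumerate, by_slice)
```

The slice is the shortest, but it only works for regular patterns; the comprehension versions accept any condition on the index.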

I don’t know how you could do this in C# without a loop.

Thanks to Imri for the lessons in Python.