“I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful […] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important.”
– Linus Torvalds
Learning about data structures and algorithms makes you a stonking good programmer.
Data structures and algorithms are patterns for solving problems. The more of them you have in your utility belt, the greater variety of problems you’ll be able to solve. You’ll also be able to come up with more elegant solutions to new problems than you would otherwise be able to.
You’ll understand, in depth, how your computer gets things done. This informs any technical decisions you make, regardless of whether or not you’re using a given algorithm directly. Everything from memory allocation in the depths of your operating system, to the inner workings of your RDBMS to how your networking stack manages to send data from one corner of Earth to another. All computers rely on fundamental data structures and algorithms, so understanding them better makes you understand the computer better.
Cultivate a broad and deep knowledge of algorithms and you’ll have stock solutions to large classes of problems. Problem spaces that you had difficulty modelling before often slot neatly into well-worn data structures that elegantly handle the known use-cases. Dive deep into the implementation of even the most basic data structures and you’ll start seeing applications for them in your day-to-day programming tasks.
You’ll also be able to come up with novel solutions to the somewhat fruitier problems you’re faced with. Data structures and algorithms have the habit of proving themselves useful in situations that they weren’t originally intended for, and the only way you’ll discover these on your own is by having a deep and intuitive knowledge of at least the basics.
But enough with the theory, have a look at some examples
Figuring out the fastest way to get somewhere
Let’s say we’re creating software to figure out the shortest distance from one international airport to another. Assume we’re constrained to following routes:
Based on that graph of destinations and the distances between them, how can we find the shortest distance say, from Helsinki to London? Dijkstra’s algorithm is the algorithm that will definitely get us the right answer in the shortest time.
In all likelihood, if you ever came across this problem and knew that Dijkstra’s algorithm was the solution, you’d probably never have to implement it from scratch. Just knowing about it would point you to a library implementation that solves the problem for you.
If you did dive deep into the implementation, you’d be working through one of the most important graph algorithms we know of. You’d know that in practice it’s a little resource intensive so an extension called A* is often used in it’s place. It gets used everywhere from robot guidance to routing TCP packets to GPS pathfinding.
Squeezing data with Huffman coding
Huffman coding is an algorithm used for lossless data compression. It works by analyzing the data you want to compress and creating a binary code for each character. More frequently occurring characters get smaller codes, so
e might be encoded as
x might be
10010. The codes are created so that they can be concatenated without a delimeter and still be decoded accurately.
Huffman coding is used along with LZ77 in the DEFLATE algorithm which is used by gzip to compress things. gzip is used all over the place, in particular for compressing files (typically anything with a
.gz extension) and for http requests/responses in transit.
Knowing how to implement and use Huffman coding has a number of benefits:
- You’ll know why a larger compression context results in better compression overall (e.g. the more you compress, the better the compression ratio). This is one of the proposed benefits of SPDY: that you get better compression on multiple HTTP requests/responses.
- If you ever find yourself trying to forcibly decipher encrypted information , you may realize that since repeating data compresses better, the compression ratio of a given bit of ciphertext will help you determine it’s block cipher mode of operation.
Picking what to learn next is hard
As long as our dominant model of computation stays the same, data structures and algorithms that we use today will be used in some form or another in the future. You can safely spend time on gaining a deep and thorough knowledge of them and know that they will pay dividends for your entire career as a programmer.