Algorithms, Data Structures & Protocols

published 2013-01-28 [ home ]

Algorithms & Data Structures

For a long time I have thought that there were only two fundamental parts to Computer Science: Algorithms and Data Structures. This is how it used to be taught, and how I think it should still be taught to beginners.

There are two main categories of parts in a computer: processing units (CPUs, GPUs…) and data storage units (caches, RAM, disk…). Your typical information processing program takes data from a durable storage unit, moves it to a temporary storage unit, modifies it thanks to a processing unit and stores the result in a permanent storage unit. Of course there are variations around this theme, and some programs do a bit of extra work such as handling user input, but that is the idea.

We call “algorithms” the different ways we instruct the processing units to act on the data and “data structures” the different layouts we use to store data in the storage units.

A very simple view of fundamental CS could be summed up to these two disciplines. You may argue that other things are indispensable, for instance theoretical aspects like lambda calculus or Turing machines, or maybe paradigms. You would be right, these things are important, but if you really want the baseline, it all comes down to algorithms and data structures.

Or so I thought.

Protocols

It turns out looking at the aforementioned parts of the computer is not enough. We need to look at the negative space too: how does the data move around? Inside the computer, data mostly circulates in buses. Between computers, data may circulate in networks, on detachable storage units…

With computers becoming increasingly parallel, the Internet reaching ubiquity and programs turning into complex distributed systems, understanding the interactions between computer parts, software parts or entirely different systems has become a necessity. This reveals a third fundamental side of CS, on equal footing with algorithms and data structures: protocols.

Protocols are more present in CS than you may think. Even the original vision of OOP by Alan Kay was mostly about protocols. Joe Armstrong, a father of Erlang, recently insisted on the necessity to teach protocols (and algorithms - if he had added data structures this post would look like plagiarism ;p).

CS classes at engineering school did not teach me much or influence the way I write programs, but majoring in network engineering did. It made me start to think in terms of independent blocks and the protocols they use to communicate. I cultivated that vision during my MSc in distributed systems (it used to be called “grid computing”) and I still have it today.

So when I see people arguing endlessly about static versus dynamic typing or such matters, I think: OK, that may be important, but do not forget CS is, at heart, about algorithms, data structures and protocols.