Why we are choosing Clojure as our main programming language
Posted by Gísli Kristjánsson at 4:20 pm
[EDIT: It's been fun to follow the lively discussions on Hacker News. Head there for some good points regarding choosing a programming language to base your startup on]
[EDIT: Just noticed the traffic from Reddit. There's also a lively discussion there]
When we first set out to build the prototype for appvise.me, I started hacking away in Python, a language I’d become familiar with over the last few years. I especially enjoy doing research programming with its REPL.
I had written a database loader to import Apple’s Enterprise Partner Feed (EPF) and a web crawler in Python, and next up was the web interface. Google App Engine (GAE), with its support for Java and Python, seemed like a good match for a start-up, as the cost of doing business was close to zero for a low-traffic website. It was perfect for the time being, and by not confining yourself to the App Engine web framework, webapp, and by abstracting the App Engine Datastore away, it’s not all that hard to port the code off GAE as traffic picks up. A forked version of Django, the popular web framework, even runs on it. What’s not to like?
It all seemed like smooth sailing, but in the back of my head I was beginning to have doubts about my decisions. At the time I kept seeing more and more horror stories about GAE, such as high latency and hard-to-debug problems, and I quickly realized we needed a more solid infrastructure. Abandoning GAE led me to revisit my earlier decision to use Python as well, so I went into research mode again.
Here I plan to go briefly through the options I evaluated:
- Python
While Python has always made sense to me, I’ve always been annoyed by its build and packaging tools. There are solutions like virtualenv, easy_install and pip, yet I keep reliving the scenario where I need to move my code to another machine and, for some reason, getting the right libraries in the right versions seems impossible. Library dependencies in Python are seriously hard stuff.
Then there is the GIL. For those of you unfamiliar with the Global Interpreter Lock (GIL), it is a lock in the CPython interpreter that allows only one thread, the one holding the lock, to execute Python bytecode at a time. On a single-core CPU this costs little, as only one thread is really running at any given moment anyway. The problem surfaces when you run your code on a multi-core CPU: several cores could be working simultaneously, but the GIL keeps all but one thread from running Python code at any moment. Since single-core performance is no longer improving the way it used to, we have to start scaling horizontally, i.e. adding more cores and machines instead of just waiting for a faster machine, and developing concurrent solutions. With that in mind, choosing Python as a language for concurrent programming seems questionable. Fortunately, not every problem needs threads running Python code in parallel: IO-bound code releases the GIL while it waits, and CPU-bound work can be scaled by forking multiple processes.
Python 3, the next major version of Python, breaks backwards compatibility with Python 2.x. This makes for a period where you want to move to Python 3, but one of the many libraries you use breaks under it, so you postpone your migration. That in turn reduces the pressure on library authors to port their code to the new version and creates a catch-22. We see the same thing with the migration to IPv6.
- Perl
Perl is an old friend. Having used it to program the busiest website in Iceland, I know its strengths and weaknesses. Since then, mbl.is is being rewritten in Python (Django). Perl was not really one of the contestants, but it has its place.
- PHP
While there are plenty of programmers who know PHP (which is clearly a plus), there’s just not enough sex appeal here. As with Perl, PHP wasn’t really in the running.
- Ruby
Ruby, Ruby, Ruby. Like Python, Ruby (at least the standard MRI interpreter) is hampered by a GIL. I like Ruby, but I still prefer Python.
- Java
I’ve always had massive respect for the JVM, but every time I pick up some Java technology I get swamped in XML configuration files that make my eyes bleed, and in layers upon layers of classes just to do something very simple. Frustration has kept me away from Java.
- JavaScript
We have seen some crazy benchmarks for Node.js, the event-driven server-side JavaScript platform. It’s pretty interesting, but I’m not sure I’d base my system on it, because the library support is pretty poor and Node.js relies on cooperative multitasking in a single-threaded event loop. That means that while one request is running a CPU-intensive computation, every other incoming request is blocked. So you really have to be careful how you write your code and split CPU-intensive regions into smaller ones, or you risk having your next visitor leave because your website is not responding. It’s neat to be able to write code for the backend and the browser in the same programming language, but the ecosystem just doesn’t feel mature enough to base anything serious on it.
- Erlang
Erlang is cool. I deeply enjoy programming simple solutions in the language, and the ability to hot-load code is awesome. The process-oriented actor model, along with Erlang’s supervision trees, enables highly distributed, concurrent and fault-tolerant computing. Lately people have been using these features to create some pretty unique web frameworks and libraries in Erlang. String manipulation is one of Erlang’s weak spots: it’s slow, and working with binary streams is sometimes just too much for someone coming from Perl.
- Haskell
I’m a functional language pervert, and Haskell is the ultimate pure, lazily evaluated functional programming language. It has served as my functional drug since 2007, and I deeply enjoy the mathematical aspects of Haskell. I’ve used it to create a blogging system for my personal website and a photo gallery (upload, image manipulation, categories, tag support, etc.) for my brother, the photographer. Still, I just don’t have the guts to base anything on Haskell, because there’s no middle ground: it’s either Haskell’s way or the highway, and sometimes you just have to compromise.
- Lisp/Scheme
I’ve written a proxy server in Chicken Scheme, which compiles to C. The library support is OK for some purposes, and as with Lisp there are some cool continuation-based web frameworks, but general library support is not great.
- Clojure
And then there was Clojure. I’ll go through my rationale for choosing it below.
Overview
Taken from Clojure’s website:
Clojure is a dynamic programming language that targets the Java Virtual Machine (and the CLR). It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multi-threaded programming. Clojure is a compiled language – it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection.
Clojure is a dialect of Lisp, and shares with Lisp the code-as-data philosophy and a powerful macro system. Clojure is predominantly a functional programming language, and features a rich set of immutable, persistent data structures. When mutable state is needed, Clojure offers a software transactional memory system and reactive Agent system that ensure clean, correct, multi-threaded designs.
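To make the STM and agent description a little more concrete, here is a minimal sketch using only clojure.core. It is not code from appvise.me; `checking`, `savings`, `transfer!` and `audit-log` are hypothetical names. Two refs are updated together inside `dosync` from 100 concurrent futures, and an agent records an audit entry asynchronously once the transfers are done.

```clojure
;; Hypothetical accounts held in refs: coordinated, synchronous shared state.
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer!
  "Moves amount between two refs inside one STM transaction. If transactions
  conflict, the STM retries them, so the update functions must be pure."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

;; Run 100 one-unit transfers concurrently on the future thread pool,
;; then wait for all of them to finish.
(let [workers (doall (repeatedly 100 #(future (transfer! checking savings 1))))]
  (run! deref workers))

;; An agent applies the actions sent to it one at a time, asynchronously.
(def audit-log (agent []))
(send audit-log conj {:event :transfers-done
                      :total (+ @checking @savings)})
(await audit-log) ; block until the queued action has run

(println @checking @savings) ; prints 0 100

;; Release the thread pools behind future/send so a standalone script exits
;; promptly; leave this out when working at the REPL.
(shutdown-agents)
```

Inside a transaction the two refs are always seen in a consistent state, and no explicit locks appear anywhere in the code; the agent, by contrast, is for independent, asynchronous updates that do not need to be coordinated with anything else.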
As a Lisp dialect, Clojure is functional, and many of the familiar functional idioms are readily available. It has great library support, and the Clojure ecosystem is advancing quickly, but where it’s lacking you can simply find what you need in Javaland. Running on the JVM also makes enterprise-quality debuggers and profilers available to you. Clojure runs on a battle-tested virtual machine (the JVM) and offers a concurrency model, built on immutable data, software transactional memory and agents, that spares you from managing locks by hand and is ready for multi-core machines.
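As an illustration of reaching into Javaland, the sketch below calls the JDK’s standard `java.security.MessageDigest` class directly from Clojure. The helper `sha-256-hex` is a hypothetical name chosen for this example. The `^String` type hint is one of the optional type hints mentioned in the overview: it lets the compiler resolve the Java method calls statically, and `*warn-on-reflection*` reports any call that would still need reflection.

```clojure
;; Warn at compile time whenever a Java call has to fall back to reflection.
(set! *warn-on-reflection* true)

(import '(java.security MessageDigest))

(defn sha-256-hex
  "Hashes a string with the JDK's MessageDigest and returns lowercase hex."
  [^String s] ; type hint: .getBytes is resolved without reflection
  (let [md     (MessageDigest/getInstance "SHA-256")
        digest (.digest md (.getBytes s "UTF-8"))]
    ;; Each byte is formatted as unsigned two-digit hex, e.g. -1 becomes "ff".
    (apply str (map #(format "%02x" %) digest))))

(println (sha-256-hex "abc"))
;; prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

No wrapper library is needed: the Java class is imported and used like any other function, which is what makes the large body of existing Java libraries directly usable.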
Web programming
I am not a fan of all-in-one web frameworks like Rails or Django; instead, I prefer composable libraries. The HTTP request/response protocol has been abstracted in different programming languages (Rack for Ruby, WSGI for Python, Hack for Haskell), and Clojure is no exception: Ring is to Clojure what WSGI is to Python. Building on Clojure’s functional foundation, Ring enforces a composable pattern that is easily extended with middleware. Compojure is a library built on top of Ring that adds routing. Here’s a demo application using both libraries:
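(ns hello-world
  (:require [compojure.core :refer [defroutes GET]]
            [compojure.route :as route]
            [ring.adapter.jetty :refer [run-jetty]]))
;; Routes are tried top to bottom; anything unmatched falls through to not-found.
(defroutes main-routes
  (GET "/" [] "<h1>Hello World</h1>")
  (route/not-found "<h1>Page not found</h1>"))
(defn -main []
  ;; Start an embedded Jetty server on port 8080.
  (run-jetty main-routes {:port 8080}))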
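Because Ring treats a web application as plain functions, handlers and middleware can be exercised without starting a server. The sketch below assumes nothing beyond clojure.core; `hello-handler` and `wrap-server-header` are hypothetical names, and the request map contains only two of the keys Ring defines (`:request-method` and `:uri`).

```clojure
;; A Ring handler is an ordinary function: request map in, response map out.
(defn hello-handler [request]
  {:status  200
   :headers {"Content-Type" "text/html"}
   :body    "<h1>Hello World</h1>"})

;; Middleware takes a handler and returns a new handler that wraps it.
;; wrap-server-header is a hypothetical example, not a function shipped with Ring.
(defn wrap-server-header [handler server-name]
  (fn [request]
    (assoc-in (handler request) [:headers "Server"] server-name)))

;; Compose the stack with ->; each wrap-* call adds one layer around the handler.
(def app
  (-> hello-handler
      (wrap-server-header "appvise")))

;; Call the app directly with a minimal request map, no HTTP server involved.
(def response (app {:request-method :get :uri "/"}))

(println (get-in response [:headers "Server"])) ; prints appvise
```

In a real application the composed `app` would be handed to `run-jetty` in the same way as `main-routes` in the demo, and further middleware would simply be added as extra lines in the `->` form.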
To work with a relational database, there’s the great ClojureQL, which lets you construct SQL statements functionally, like this:
user=> (-> (table :users) (take 5) (drop 2))
SELECT users.* FROM users LIMIT 3 OFFSET 2
IDE
Clojure can be integrated with NetBeans, Eclipse, IntelliJ and other IDEs and editors. But since I live in Emacs, I was happy to find out that Clojure has a Swank backend for Emacs SLIME. This lets me start the JVM once and connect my editor to it whenever I like. I can even hot-load code, replace running functions and so forth. For those who have not seen the magic of SLIME, I recommend this video by the author of ClojureQL.
Big Data
Our recommendation engine is going to base its recommendations on a massive amount of data. Fortunately, others have been working on scaling solutions. A lot of that work is being done in Javaland, and being able to write small Clojure wrappers around libraries like Hadoop and Cascading is a major benefit.
Clojure’s weaknesses
Is my decision the right one? What are the downsides of using Clojure, you might ask? There are definitely a few. For me, the usual Lisp parenthesis madness is not much of a problem, because great Emacs modes like paredit all but prevent unbalanced parentheses.
Clojure is, however, a very young programming language, and you never know what its support will look like in five years’ time. Before a language gains critical mass, it’s relatively easy for it to be abandoned if its author no longer wants to support further development. On top of that, we have seen how Oracle, the new owner of Java, has been treating the Java community.
I have yet to hire a Clojure developer, and this is probably an area where I’m restricting us to a smaller pool of programmers.
Conclusion
I hope I’ve given a rationale for my technical decisions that makes sense to the reader. I’m quite happy with my decision, and I hope others in a similar situation give Clojure a try; it’s well worth it.
There may currently be a relatively small pool of people with Clojure skills, but those who get it are probably experienced developers who’ve seen the pros and cons of other languages … a variation on Paul Graham’s Python Paradox http://www.paulgraham.com/pypar.html
I read yesterday that BankSimple are using the JVM as a convergence platform that allows them to bring together the best bits of JRuby, Clojure and Scala. http://www.quora.com/Which-startups-are-using-Clojure
Good luck!