
Simple (as Possible) Drake Installation

Factual has created a data-workflow tool called Drake. Drake lets the analyst outline her command-line instructions -- including data collection, pre-processing, analysis, validation, and visualization -- and run them together as a single workflow. If the analyst modifies code or data in the workflow, Drake automatically re-runs every instruction that depends on the modified piece. This makes for a cleaner, more efficient, more reproducible workflow.
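To make the "re-run what depends on the modified piece" idea concrete, here is a minimal shell sketch of it for a single step. This only illustrates the concept with a timestamp comparison. It is not Drake's implementation, and the file names and the build_if_stale function are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: rebuild one output when its input is newer.
# Drake applies the same idea across a whole workflow.

workdir="$(mktemp -d)"
input="$workdir/raw.csv"      # hypothetical input file
output="$workdir/clean.csv"   # hypothetical output file

# Run the step only if the output is missing or older than the input.
build_if_stale() {
  if [ ! -e "$output" ] || [ "$input" -nt "$output" ]; then
    grep -v '^#' "$input" > "$output"   # drop comment lines
    echo "rebuilt"
  else
    echo "up to date"
  fi
}

printf 'a,1\n# note\nb,2\n' > "$input"
build_if_stale   # output does not exist yet, so the step runs
build_if_stale   # nothing changed, so the step is skipped
```

Doing this by hand for every step gets tedious quickly, which is the bookkeeping a workflow tool takes over.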

Installing Drake requires the Java JDK, Leiningen, the Drake uberjar, and a small wrapper shell script. Here I provide a series of steps that install all four on an Ubuntu system. Note that I chose to put the Drake files in the /usr/local/bin/ directory, which is in my PATH.
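Since everything here depends on /usr/local/bin/ being in PATH, it may be worth checking that first. A small sketch, where in_path is a hypothetical helper name of my own:

```shell
# Return success if the given directory is an entry in PATH.
in_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if in_path /usr/local/bin; then
  echo "/usr/local/bin is in PATH"
else
  echo "/usr/local/bin is not in PATH; add it in your shell profile"
fi
```

Wrapping PATH in colons before matching avoids false positives from longer directory names that merely start with /usr/local/bin.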


sudo apt-get update && sudo apt-get upgrade
sudo apt-get install default-jre
sudo apt-get install default-jdk
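Before moving on, a quick check that the Java tools actually landed in PATH can save some confusion later. This is only a sketch, and check_tools is a hypothetical helper name:

```shell
# Report whether each named command can be found in PATH.
# Returns non-zero if any of them is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools java javac || echo "Java is incomplete; re-run the install commands."
```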


sudo wget -O /usr/local/bin/lein
sudo chmod a+x /usr/local/bin/lein
bash lein

Update: on some systems I got a curl certificate error when I ran bash lein. Often this points to a missing ca-certificates package, but it was installed on my systems. I had to do this.


sudo wget -O /usr/local/bin/drake.jar
sudo chmod a+x /usr/local/bin/lein

Then create /usr/local/bin/drake:

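#!/usr/bin/env bash
# Run Drake from the uberjar sitting next to this script.
# Quoting keeps paths and arguments with spaces intact.
java -cp "$(dirname "$0")/drake.jar" drake.core "$@"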

And finally, make the script executable:

sudo chmod 755 /usr/local/bin/drake

I can now run Drake from any directory that has a Drakefile (the file that contains the Drake instructions).
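To try the installation out, something like the following sets up a scratch directory with a tiny two-step Drakefile. Treat it as a sketch: the file names are hypothetical, and the syntax (";" comments, "output <- input" step headers, indented commands, and the $INPUT and $OUTPUT variables) follows my understanding of the conventions in the Drake README.

```shell
# Create a scratch directory with some sample data (hypothetical file names).
demo_dir="$(mktemp -d)"
printf 'id,status\n1,ok\n2,bad\n3,ok\n' > "$demo_dir/raw.csv"

# The quoted heredoc delimiter keeps $INPUT and $OUTPUT literal,
# so they end up in the Drakefile for Drake to fill in.
cat > "$demo_dir/Drakefile" <<'EOF'
; Keep only the rows marked ok
ok.csv <- raw.csv
  grep ok $INPUT > $OUTPUT

; Count the surviving rows
count.txt <- ok.csv
  wc -l < $INPUT > $OUTPUT
EOF

echo "Drakefile written to $demo_dir; run 'drake' from that directory."
```

If the install worked, running drake there should list the steps it plans to run. Editing raw.csv and running drake again should then bring both steps back into play.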
