Making beautiful plots in Python (plus a shameless book plug!)

written in python, ggplot2

When I transitioned over to working primarily in Python from R, one of the things that I missed was ggplot2. For me, the plots in ggplot2 look so much nicer and the syntax is more intuitive compared to matplotlib. Happily, last year I discovered that Hassan Kibirige has made a comprehensive port of ggplot2 to Python called plotnine. Although Hassan is still developing the package, I’ve found that the port is fairly complete and there is not much I’m unable to do in plotnine that I could do in the original ggplot2.

In fact, the port is so good that Mauricio and I have written a technical book about how to get the most out of plotnine. This book is an adaptation of our popular The Hitchhiker’s Guide to Ggplot2, but written entirely in Python using the plotnine package, and it’s (very originally) called The Hitchhiker’s Guide to Plotnine. Like in the original ggplot2 book, we’ve written chapters on a wide range of graphs, from line and bar graphs, scatterplots and boxplots, through to histograms, linear regression graphs and LOWESS plots. Using plotnine, we have been able to recreate graphs that are very true to the original ggplot2 style and degree of customisation.

For example, you can create charts like the density plot below, with customisation on the fonts, fill, background layout and legend.

You can also use plotnine to recreate the styles of other publications. For example, this weighted scatterplot was created in the style of fivethirtyeight.

We’ve been able to recreate the XKCD style plots we made for our previous book, entirely in plotnine.

plotnine also allows you to go beyond simpler plots like bar and scatterplots, and create more advanced statistical visualisations. In the plot below, you can see we’ve combined the capabilities of the scipy package with plotnine to chart some probability density functions.

The package also includes the useful faceting functions that are used in ggplot2, so you can create subplots like the one below.

Hassan has made the syntax of plotnine as close to that of ggplot2 as possible, so the overall experience is quite intuitive for someone who is familiar with using ggplot2. For example, in order to create a histogram with a customised title and axis labels, as in the code below, you can see that you use the same ggplot() and geom_histogram() functions to create the plot that you would use in R. You can also see that, like in R, you can specify the metric used by the histogram, its binwidth, and the spacing of the x-axis, although you can see we’ve used numpys arange function to create the breaks rather than the seq function you might use in R. Finally, plotnine uses the same ggtitle, xlab and ylab functions to allow you to label your title and your axes.

p7 = (
  ggplot(diamonds, aes("price")) 
  + geom_histogram(aes(y="..count.."), binwidth=500) 
  + scale_x_continuous(breaks=np.arange(0, 22500, 2500)) 
  + ggtitle("Price of diamonds by carat") 
  + xlab("Price of diamond (US$)") 
  + ylab("Frequency of price")
     )
p7

Overall, I really recommend plotnine as an alternative to matplotlib in Python, especially for when you need to create presentation-ready charts. Our book is available on Leanpub, should you need some extra help learning how to use the package or want to push your charts to the next level. Finally, I want to give a huge thanks to Hassan for all of his hard work in building and maintaining this wonderful implementation of ggplot2 in Python!