Liam Brannigan

Published on: 5th September 2022

Don’t loop over columns in Polars

This post was created while writing my Data Analysis with Polars course. Check it out on Udemy.

If you’re writing Polars code like this

for col in df.columns:
    ...  # do stuff with df[col]

then STOP!!!!

Instead, use expressions and Polars will parallelise the work across the columns for you. When you loop explicitly in Python, the columns are processed one after another, so you lose that parallelism.

For example, if we want to count the number of unique values in every column, we do

df.select(pl.all().n_unique())

or, if we want to count the number of unique values only in string (Utf8) columns, we do

df.select(pl.col(pl.Utf8).n_unique())

Doing it this way with expressions will give you the 🚀 performance you expect!

Learn more

Want to know more about Polars for high performance data science? Then you can:

follow me on Twitter
connect with me on LinkedIn
check out my YouTube videos

or let me know if you would like a Polars workshop for your organisation.
