Published on: 5th September 2022
This post was created while writing my Data Analysis with Polars course. Check it out on Udemy
If you’re writing Polars code like this
for col in df.columns:
    ...  # do stuff with df[col]
then STOP!!!!
Instead, use expressions, and Polars will parallelise the work across the columns for you. Looping explicitly in Python kills that parallelisation, because each iteration runs as its own operation, one after another.
For example, if we want to count the number of unique values in every column we do
df.select(pl.all().n_unique())
If we only want to count the number of unique values in string (Utf8) columns we do
df.select(pl.col(pl.Utf8)).select(pl.all().n_unique())
Doing it this way with expressions will give you the 🚀 performance you expect!
Want to know more about Polars for high performance data science? Let me know if you would like a Polars workshop for your organisation.