Published on: 13th September 2022
This post was created while writing my Data Analysis with Polars course. Check it out on Udemy.
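Polars has optimizations for when you’re working with sorted data. To access them, you tell Polars that the data is sorted by calling set_sorted, which sets a sorted flag on the series. Polars does not check the claim, so only call it on data that really is sorted.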
In this simple example we find the median 1500x faster when we tell Polars the series is sorted.
import polars as pl

# Create a series with 10 million entries
s = pl.Series("a", range(0, int(1e7)))
# Call .median without set_sorted
s.median()
# Time: 0.3 s
# Call .median with set_sorted
s.set_sorted().median()
# Time: 0.0002 s
You may already be taking advantage of set_sorted without realising it. Polars sets the sorted flag automatically after any operation that involves an implicit or explicit sort.
set_sorted also works with other operations: in some of my workflows a groupby on a large dataset is 40% faster on a column that Polars knows is sorted.
Want to know more about Polars for high performance data science and ML? Let me know if you would like a Polars workshop for your organisation.