Today I was digging through some code written by somebody else and trying to establish benchmarks against some new data we had just received. In doing so, I noticed the application was spending a lot of time at one particular stage. Upon inquiry, my coworkers said something to the effect of “oh yeah, that part always takes a long time to run.”
In looking through the code in question, some ideas began to form in my head, but I was missing data. So I turned to an old Python friend: cProfile
!
cProfile
is a built-in package—as in, part of the default Python distribution—that performs code profiling and is dead-simple to use. Given that I was interested only in a small slice of the overall program, I did have to dig a bit to understand how to essentially “wrap” a certain part of the program in profiling, so that was new for me.
It basically amounted to:
import cProfile, pstats
= cProfile.Profile()
profiler
profiler.enable()###
# All the code you want to profile goes here!
###
profiler.disable()= pstats.Stats(profiler)
stats "outputdir/outputfile.stats") stats.dump_stats(
That dumps out the profile information (in a binary format, sorry) to a file. From there, you can spin up something like IPython to read the binary and print out the stats:
= pstats.Stats("outputdir/outputfile.stats")
s "tottime").print_stats() s.sort_stats(
And then you have all the runtime stats of the code enclosed by your profiler! It even helped me identify a strange \(O(n^2)\) search loop; easy to miss, because it was a 1-liner, but was responsible for over 50% of the observed runtime! I probably would never have found it on my own.
For more information, check out the docs on the built-in Python profilers!
Citation
@online{quinn2024,
author = {Quinn, Shannon},
title = {\#WeblogPoMo2024, {Day} 16: {Python} {Profiling} {Rocks}},
date = {2024-05-16},
url = {https://magsol.github.io/2024-05-16-weblogpomo2024-day-16-python-profiling-rocks},
langid = {en}
}