Yesterday I spent a little bit of time improving the speed of adding a new printer using system-config-printer. The main problem was that several bugs had conspired to make it search for all printer drivers three times instead of just once (oops). After fixing that I tried profiling it to see what was taking the most time.
I used the cProfile/pstats
modules to do this. It’s quite easy to do. I replace the part where my program says:
gtk.main()
with:
import cProfile cProfile.run("gtk.main()", "profile.out")
After running the program I used an interactive Python shell to display the profiling statistics, ordered by cumulative time taken in each function:
$ python >>> import pstats >>> p=pstats.Stats("profile.out") >>> p.strip_dirs().sort_stats("cumulativ").print_stats() Wed Jul 15 17:57:41 2009 profile.out 1962150 function calls (1959323 primitive calls) in 21.377 CPU seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 21.377 21.377 <string>:1(<module>) 1 3.968 3.968 21.377 21.377 {gtk._gtk.main} 1 0.000 0.000 12.303 12.303 system-config-printer.py:4109(on_btnNPForward_clicked) 1 0.000 0.000 12.303 12.303 system-config-printer.py:4112(nextNPTab) 1 0.000 0.000 6.549 6.549 system-config-printer.py:4052(loadPPDs) 1 0.001 0.001 6.549 6.549 system-config-printer.py:4018(fetchPPDs) 60 6.515 0.109 6.515 0.109 {time.sleep} 1 0.000 0.000 4.631 4.631 system-config-printer.py:3360(on_new_printer_activate) 1 0.000 0.000 4.539 4.539 system-config-printer.py:3720(init) 1 0.000 0.000 4.342 4.342 ppds.py:475(getPPDNameFromDeviceID) 1 0.024 0.024 4.279 4.279 system-config-printer.py:4850(fillDeviceTab) 1 0.000 0.000 4.208 4.208 system-config-printer.py:4538(fetchDevices) 1 0.000 0.000 4.096 4.096 cupshelpers.py:511(getDevices) 1 0.000 0.000 4.096 4.096 authconn.py:184() 1 2.069 2.069 4.096 4.096 authconn.py:186(_authloop) 4 0.083 0.021 3.985 0.996 ppds.py:816(_init_makes) 11133 0.753 0.000 3.933 0.000 ppds.py:34(ppdMakeModelSplit) 202792 0.463 0.000 2.056 0.000 re.py:144(sub) ...
As system-config-printer is an interactive graphical application, a lot of time was spent waiting for me to react to the interface. The functions at the top of the list are there for that reason. However, getPPDNameFromDeviceID
is not interactive yet is taking an enormous amount of time: over four seconds! This time does not include the time taken to retrieve the list of available PPDs from CUPS (see loadPPDs
), but is entirely spent on processing that data and trying to find a matching driver for the printer I was using.
In a lot of cases it would not take as long as this. In this instance, however, there is no exact match for my printer so it has to perform a lot more processing in order to find a “close” match. Part of that processing involves splitting the (often messy) ppd-make-and-model attributes for the PPDs into manufacturer names and model names.
The reason this was taking so long is apparent from the profiling, in conjunction with reading the code. It is the fact that regular expressions are used extensively to process the ppd-make-and-model attributes. This is confirmed by running p.print_callers(20)
:
... Function was called by... ncalls tottime cumtime ... re.py:144(sub) <- 202791 0.463 2.055 ppds.py:34(ppdMakeModelSplit) ...
One way of making this more efficient would be to "compile" the regular expressions once, then execute the compiled expressions on each call. However, by using simple string operations instead of regular expressions in that function, the cumulative time used by ppdMakeModelSplit
was reduced from 3.93s to 0.98s.
The lesson: regular expressions are expensive!