{"id":751,"date":"2009-07-16T11:03:07","date_gmt":"2009-07-16T10:03:07","guid":{"rendered":"http:\/\/cyberelk.net\/tim\/?p=751"},"modified":"2014-05-20T09:07:28","modified_gmt":"2014-05-20T08:07:28","slug":"profiling-python","status":"publish","type":"post","link":"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/","title":{"rendered":"Profiling Python"},"content":{"rendered":"<p>Yesterday I spent a little bit of time improving the speed of adding a new printer using <a href=\"https:\/\/cyberelk.net\/tim\/software\/system-config-printer\/\">system-config-printer<\/a>.  The main problem was that several bugs had conspired to make it search for all printer drivers three times instead of just once (oops).  After fixing that I tried profiling it to see what was taking the most time.<\/p>\n<p><!--more--><\/p>\n<p>I used the <code>cProfile\/pstats<\/code> modules to do this.  It&#8217;s quite easy to do.  I replace the part where my program says:<\/p>\n<pre>gtk.main()<\/pre>\n<p>with:<\/p>\n<pre>import cProfile\r\ncProfile.run(\"gtk.main()\", \"profile.out\")<\/pre>\n<p>After running the program I used an interactive Python shell to display the profiling statistics, ordered by cumulative time taken in each function:<\/p>\n<pre>$ python\r\n>>> import pstats\r\n>>> p=pstats.Stats(\"profile.out\")\r\n>>> p.strip_dirs().sort_stats(\"cumulativ\").print_stats()\r\nWed Jul 15 17:57:41 2009    profile.out\r\n\r\n         1962150 function calls (1959323 primitive calls) in 21.377 CPU seconds\r\n\r\n   Ordered by: cumulative time\r\n\r\n   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\r\n        1    0.000    0.000   21.377   21.377 &lt;string&gt;:1(&lt;module&gt;)\r\n        1    3.968    3.968   21.377   21.377 {gtk._gtk.main}\r\n        1    0.000    0.000   12.303   12.303 system-config-printer.py:4109(on_btnNPForward_clicked)\r\n        1    0.000    0.000   12.303   12.303 system-config-printer.py:4112(nextNPTab)\r\n        1    0.000    0.000    6.549    6.549 system-config-printer.py:4052(loadPPDs)\r\n        1    0.001    0.001    6.549    6.549 system-config-printer.py:4018(fetchPPDs)\r\n       60    6.515    0.109    6.515    0.109 {time.sleep}\r\n        1    0.000    0.000    4.631    4.631 system-config-printer.py:3360(on_new_printer_activate)\r\n        1    0.000    0.000    4.539    4.539 system-config-printer.py:3720(init)\r\n        1    0.000    0.000    4.342    4.342 ppds.py:475(getPPDNameFromDeviceID)\r\n        1    0.024    0.024    4.279    4.279 system-config-printer.py:4850(fillDeviceTab)\r\n        1    0.000    0.000    4.208    4.208 system-config-printer.py:4538(fetchDevices)\r\n        1    0.000    0.000    4.096    4.096 cupshelpers.py:511(getDevices)\r\n        1    0.000    0.000    4.096    4.096 authconn.py:184(<lambda>)\r\n        1    2.069    2.069    4.096    4.096 authconn.py:186(_authloop)\r\n        4    0.083    0.021    3.985    0.996 ppds.py:816(_init_makes)\r\n    11133    0.753    0.000    3.933    0.000 ppds.py:34(ppdMakeModelSplit)\r\n   202792    0.463    0.000    2.056    0.000 re.py:144(sub)\r\n...<\/pre>\n<p>As system-config-printer is an interactive graphical application, a lot of time was spent waiting for me to react to the interface.  The functions at the top of the list are there for that reason.  However, <code>getPPDNameFromDeviceID<\/code> is not interactive yet is taking an enormous amount of time: over four seconds!  This time does not include the time taken to retrieve the list of available PPDs from CUPS (see <code>loadPPDs<\/code>), but is entirely spent on processing that data and trying to find a matching driver for the printer I was using.<\/p>\n<p>In a lot of cases it would not take as long as this.  In this instance, however, there is no exact match for my printer so it has to perform a lot more processing in order to find a &#8220;close&#8221; match.  Part of that processing involves splitting the (often messy) ppd-make-and-model attributes for the PPDs into manufacturer names and model names.<\/p>\n<p>The reason this was taking so long is apparent from the profiling, in conjunction with reading the code.  It is the fact that regular expressions are used extensively to process the ppd-make-and-model attributes.  This is confirmed by running <code>p.print_callers(20)<\/code>:<\/p>\n<pre>\r\n...\r\nFunction                  was called by...\r\n                              ncalls  tottime  cumtime\r\n...\r\nre.py:144(sub)            <-  202791    0.463    2.055  ppds.py:34(ppdMakeModelSplit)\r\n...\r\n<\/pre>\n<p>One way of making this more efficient would be to \"compile\" the regular expressions once, then execute the compiled expressions on each call.  However, by using simple string operations instead of regular expressions in that function, the cumulative time used by <code>ppdMakeModelSplit<\/code> was reduced from 3.93s to 0.98s.<\/p>\n<p>The lesson: regular expressions are expensive!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Yesterday I spent a little bit of time improving the speed of adding a new printer using system-config-printer. The main problem was that several bugs had conspired to make it search for all printer drivers three times instead of just once (oops). After fixing that I tried profiling it to see what was taking the [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3],"tags":[35,96,50],"class_list":["post-751","post","type-post","status-publish","format-standard","hentry","category-software","tag-python","tag-software","tag-system-config-printer"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Profiling Python - PRINT HEAD<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Profiling Python - PRINT HEAD\" \/>\n<meta property=\"og:description\" content=\"Yesterday I spent a little bit of time improving the speed of adding a new printer using system-config-printer. The main problem was that several bugs had conspired to make it search for all printer drivers three times instead of just once (oops). After fixing that I tried profiling it to see what was taking the [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/\" \/>\n<meta property=\"og:site_name\" content=\"PRINT HEAD\" \/>\n<meta property=\"article:published_time\" content=\"2009-07-16T10:03:07+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2014-05-20T08:07:28+00:00\" \/>\n<meta name=\"author\" content=\"Tim Waugh\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tim Waugh\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/2009\\\/07\\\/16\\\/profiling-python\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/2009\\\/07\\\/16\\\/profiling-python\\\/\"},\"author\":{\"name\":\"Tim Waugh\",\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/#\\\/schema\\\/person\\\/23b749f30a67f1b1c6af17024fc94bf6\"},\"headline\":\"Profiling Python\",\"datePublished\":\"2009-07-16T10:03:07+00:00\",\"dateModified\":\"2014-05-20T08:07:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/2009\\\/07\\\/16\\\/profiling-python\\\/\"},\"wordCount\":343,\"publisher\":{\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/#\\\/schema\\\/person\\\/23b749f30a67f1b1c6af17024fc94bf6\"},\"keywords\":[\"python\",\"Software\",\"system-config-printer\"],\"articleSection\":[\"Software\"],\"inLanguage\":\"en-GB\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/2009\\\/07\\\/16\\\/profiling-python\\\/\",\"url\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/2009\\\/07\\\/16\\\/profiling-python\\\/\",\"name\":\"Profiling Python - PRINT HEAD\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/#website\"},\"datePublished\":\"2009-07-16T10:03:07+00:00\",\"dateModified\":\"2014-05-20T08:07:28+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/2009\\\/07\\\/16\\\/profiling-python\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/cyberelk.net\\\/tim\\\/2009\\\/07\\\/16\\\/profiling-python\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/2009\\\/07\\\/16\\\/profiling-python\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Profiling Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/#website\",\"url\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/\",\"name\":\"PRINT HEAD\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/#\\\/schema\\\/person\\\/23b749f30a67f1b1c6af17024fc94bf6\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/#\\\/schema\\\/person\\\/23b749f30a67f1b1c6af17024fc94bf6\",\"name\":\"Tim Waugh\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/wp-content\\\/uploads\\\/2023\\\/01\\\/printhead.png\",\"url\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/wp-content\\\/uploads\\\/2023\\\/01\\\/printhead.png\",\"contentUrl\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/wp-content\\\/uploads\\\/2023\\\/01\\\/printhead.png\",\"width\":731,\"height\":140,\"caption\":\"Tim Waugh\"},\"logo\":{\"@id\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/wp-content\\\/uploads\\\/2023\\\/01\\\/printhead.png\"},\"sameAs\":[\"http:\\\/\\\/cyberelk.net\\\/tim\"],\"url\":\"https:\\\/\\\/cyberelk.net\\\/tim\\\/author\\\/twaugh\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Profiling Python - PRINT HEAD","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/","og_locale":"en_GB","og_type":"article","og_title":"Profiling Python - PRINT HEAD","og_description":"Yesterday I spent a little bit of time improving the speed of adding a new printer using system-config-printer. The main problem was that several bugs had conspired to make it search for all printer drivers three times instead of just once (oops). After fixing that I tried profiling it to see what was taking the [&hellip;]","og_url":"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/","og_site_name":"PRINT HEAD","article_published_time":"2009-07-16T10:03:07+00:00","article_modified_time":"2014-05-20T08:07:28+00:00","author":"Tim Waugh","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Tim Waugh","Estimated reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/#article","isPartOf":{"@id":"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/"},"author":{"name":"Tim Waugh","@id":"https:\/\/cyberelk.net\/tim\/#\/schema\/person\/23b749f30a67f1b1c6af17024fc94bf6"},"headline":"Profiling Python","datePublished":"2009-07-16T10:03:07+00:00","dateModified":"2014-05-20T08:07:28+00:00","mainEntityOfPage":{"@id":"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/"},"wordCount":343,"publisher":{"@id":"https:\/\/cyberelk.net\/tim\/#\/schema\/person\/23b749f30a67f1b1c6af17024fc94bf6"},"keywords":["python","Software","system-config-printer"],"articleSection":["Software"],"inLanguage":"en-GB"},{"@type":"WebPage","@id":"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/","url":"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/","name":"Profiling Python - PRINT HEAD","isPartOf":{"@id":"https:\/\/cyberelk.net\/tim\/#website"},"datePublished":"2009-07-16T10:03:07+00:00","dateModified":"2014-05-20T08:07:28+00:00","breadcrumb":{"@id":"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/cyberelk.net\/tim\/2009\/07\/16\/profiling-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/cyberelk.net\/tim\/"},{"@type":"ListItem","position":2,"name":"Profiling Python"}]},{"@type":"WebSite","@id":"https:\/\/cyberelk.net\/tim\/#website","url":"https:\/\/cyberelk.net\/tim\/","name":"PRINT HEAD","description":"","publisher":{"@id":"https:\/\/cyberelk.net\/tim\/#\/schema\/person\/23b749f30a67f1b1c6af17024fc94bf6"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/cyberelk.net\/tim\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":["Person","Organization"],"@id":"https:\/\/cyberelk.net\/tim\/#\/schema\/person\/23b749f30a67f1b1c6af17024fc94bf6","name":"Tim Waugh","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/cyberelk.net\/tim\/wp-content\/uploads\/2023\/01\/printhead.png","url":"https:\/\/cyberelk.net\/tim\/wp-content\/uploads\/2023\/01\/printhead.png","contentUrl":"https:\/\/cyberelk.net\/tim\/wp-content\/uploads\/2023\/01\/printhead.png","width":731,"height":140,"caption":"Tim Waugh"},"logo":{"@id":"https:\/\/cyberelk.net\/tim\/wp-content\/uploads\/2023\/01\/printhead.png"},"sameAs":["http:\/\/cyberelk.net\/tim"],"url":"https:\/\/cyberelk.net\/tim\/author\/twaugh\/"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pnnS2-c7","_links":{"self":[{"href":"https:\/\/cyberelk.net\/tim\/wp-json\/wp\/v2\/posts\/751","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cyberelk.net\/tim\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cyberelk.net\/tim\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cyberelk.net\/tim\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/cyberelk.net\/tim\/wp-json\/wp\/v2\/comments?post=751"}],"version-history":[{"count":17,"href":"https:\/\/cyberelk.net\/tim\/wp-json\/wp\/v2\/posts\/751\/revisions"}],"predecessor-version":[{"id":768,"href":"https:\/\/cyberelk.net\/tim\/wp-json\/wp\/v2\/posts\/751\/revisions\/768"}],"wp:attachment":[{"href":"https:\/\/cyberelk.net\/tim\/wp-json\/wp\/v2\/media?parent=751"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cyberelk.net\/tim\/wp-json\/wp\/v2\/categories?post=751"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cyberelk.net\/tim\/wp-json\/wp\/v2\/tags?post=751"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}