From version v0.25 onwards Graph::Easy no longer uses the
Graph module. This
page explains why and also documents time and memory benchmarks
across different software versions and graph sizes. It also compares
the performance of Graph::Easy vs. dot
.
Unless otherwise noted, the benchmarks were done on the following system:
CPU |
processor : 0 vendor_id : AuthenticAMD cpu family : 6 model : 8 model name : AMD Athlon(tm) XP 2400+ stepping : 1 cpu MHz : 2010.963 cache size : 256 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow bogomips : 3964.92 |
---|---|
OS | SuSE 9.1 |
Kernel | 2.6.5-7.201-default |
gcc | 3.3.3 |
Perl | This is perl, v5.8.6 built for i686-linux (32 bit) |
Memory |
MemTotal: 1035680 kB SwapTotal: 0 kB SwapFree: 0 kB |
Simple graph
In the first example we try to measure the overhead of loading
the program. We do render a very small graph, consisting of only
two nodes, connected by one edge. Shown below is the input to
examples/as_ascii
and dot
.
graph { autolink: name; } [ Bonn ] -> [ Berlin ]
digraph GRAPH_0 { // Generated by Graph::Easy 0.29 at Fri Sep 9 23:14:05 2005 edge [ arrowhead=open ]; graph [ rankdir=LR ]; node [ fontsize=11, fillcolor=white, style=filled, shape=box ]; Berlin [ URL="/wiki/index.php/Berlin" ] Bonn [ URL="/wiki/index.php/Bonn" ] Bonn -> Berlin }
Note that the graph contains two links for the nodes. By default,
dot
would only produce a PNG image, so we also need
to call it so that it constructs an image map which can be used
to link the individual nodes to their targets.
Since the extension responsible for converting the graph input
to the appropriate image+map does not know whether there
are links or not, it will typically try to create the
image map all the time. There are two ways this can be done,
either by calling dot twice with the same input, and different
output formats, or by asking dot for both at the same time,
storing the image in a file and receiving the image map
on STDOUT.
The current
graphviz extension
does run dot
always twice, which is the slower way as
we will see below.
Here are the two command lines used to benchmark all variants.
The variant where dot
produces only the image is
included just for comparisation and labeled "dot, png". The variant
that calls dot twice is labeled "dot, map + dot, png", and the
variant that chains both calls together is labeled "dot, map + png".
# dot producing image alone (dot, png) time dot -Tpng -otest.png test.dot # dot producing image and map (dot, map + dot, png) time dot -Tpng -otest.png test.dot time dot -Tcmapx -otest.map test.dot # dot producing image and map in one go (dot, map + png) time dot -Tpng -otest.png -Tcmapx test.dot # Graph::Easy time perl -Ilib examples/as_html test.txt
In all cases, the lowest observed real
time
was noted, based on the idea that any higher time represents background
noise from the system. Typically, the times do not vary more than a few
ms between runs, anyway. Although with small times like seen from
dot
, the relative variance might be very big.
Program | Version | Time in s |
---|---|---|
Graph::Easy | v0.24 | 0.159 |
Graph::Easy | v0.29 | 0.086 |
Graph::Easy | v0.34 | 0.114 |
Graph::Easy | v0.38 | 0.160 |
Graph::Easy | v0.40 | 0.154 |
dot, png | v1.1 | 0.015 |
dot, map + png | v1.1 | 0.017 |
dot, map + dot png | v1.1 | 0.026 |
dot, png | v2.6 | 0.090 |
dot, map + png | v2.6 | 0.091 |
dot, map + dot png | v2.6 | 0.171 |
As can be seen from the table above, the startup time for Graph::Easy
is much higher than the one from the old (v1.1) dot
- but it got
much lower from v0.25 onwards, and the sole reason for that is that Graph::Easy
no longer loads the really huge Graph module. Below we will examine this in
detail.
Also worth noting is that dot
alone is the fastest, however it
will not produce the desired result. Generating the image map takes some additional
time, albeit only a little bit. Running dot
twice is
slower than producing both results in one go (no wonder :)
V0.34 takes a bit longer, due to the increased code base. It is still faster
than v0.24 using Graph, though.
We also see that modern versions of dot
take much longer, and
come very close to the time used by Graph::Easy
. Who says
C is faster than Perl? :)
The timing above shows that the startup cost for Graph::Easy is high, and since for each graph rendered the startup occurs again, a way to reduce this cost must be found. One such way would be to write a deamon using Net::Server. This deamon would listen on a network port, take the the graph text as input and return the graph output. Thus the modules would be sitting in memory and not be needed to be loaded and compiled each time you want to render a graph. This might bring the per-graph time for small graphs even below the time from graphviz/dot!
Memory consumption
Also interesting is the memory consumption of the Perl process itself. This was measured by running:
perl -Ilib -MGraph::Easy -le 'sleep(100)'
and then looking with top
at the process size. While not 100%
accurate, it allows us to see trends. Here are the results on my system:
Version: | Virtual | Resident | Shared |
---|---|---|---|
v0.24 | 8244 | 5740 | 1456 |
v0.25 | 5980 | 3452 | 1440 |
v0.26 | 6240 | 3704 | 1440 |
v0.27 | 6104 | 3592 | 1432 |
v0.29 | 6108 | 3628 | 1436 |
v0.30 | 6104 | 3644 | 1436 |
v0.34 | 6368 | 3860 | 1436 |
v0.40 | 6760 | 4196 | 1372 |
The table shows us a few things:
- The large drop from 0.24 to 0.25 is due to the elimination of loading Graph. Despite the fact that Graph::Easy needs now a bit of code to emulate a few basic features of Graph it self, the memory base line dropped quit a bit. Graph is a module with many bells and whistles, spread over dozends source code files. Graph::Easy, on the other hand, only needs very few features and these are implemented in about 20 lines of Perl. So removing Graph not only reduces the initial loading time, but also the memory consumption. So far, so good :)
- The increase from 0.25 to 0.26 is due to more features and code in Graph::Easy.
- The decrease from 0.26 to 0.27 is due to not loading Heap, and using Heap::Binary instead of Heap::Fibonacci for A* pathfinding. Heap::Binary is faster, and smaller. In addition, the node cluster management was simplified, removing about 3% of code from Graph::Easy.
Creation and Dumping of big Graphs
In this test we create a series of graphs, with an increasing number of nodes and edges. The graph looks like this, continued to the right until N is met (the dotted nodes and edge indicate the continuation of the series):
+----+ +----+ +----+ +----+ +....+ | 2B | | 3B | | 4B | | 5B | : 6B : +----+ +----+ +----+ +----+ +....+ ^ ^ ^ ^ ^ | | | | : | | | | : +----+ +----+ +----+ +----+ +----+ | 1 | --> | 2 | --> | 3 | --> | 4 | --> | 5 | ..> +----+ +----+ +----+ +----+ +----+ | | | | : | | | | : v v v v v +----+ +----+ +----+ +----+ +....+ | 2A | | 3A | | 4A | | 5A | : 6A : +----+ +----+ +----+ +----+ +....+
So the resulting graph has N * 3 nodes and N * 3 edges, resulting in N * 6 objects in total.
The script to run the benchmark can be found inside the Graph::Easy package, or locally.
Graph::Easy v0.24 | 5 | 10 | 50 | 100 | 200 | 500 | 1000 |
---|---|---|---|---|---|---|---|
Creation | 0.0090 | 0.0510 | 0.1100 | 0.1816 | 0.3225 | 0.7452 | 1.4581 |
as_txt | 0.0655 | 0.0726 | 0.3905 | 1.6635 | 7.6464 | 45.7824 | n/a |
as_ascii | 0.0073 | 0.1160 | 0.7574 | 3.0330 | 14.0460 | 73.6218 | n/a |
Memory | 44106 | 81622 | 412087 | 934900 | 2334018 | 8898275 | 29546659 |
Graph::Easy v0.25 | 5 | 10 | 50 | 100 | 200 | 500 | 1000 |
Creation | 0.0016 | 0.0031 | 0.0092 | 0.0517 | 0.0698 | 0.1268 | 0.2347 |
as_txt | 0.0048 | 0.0060 | 0.0616 | 0.1004 | 0.1692 | 0.4278 | 0.8863 |
as_ascii | 0.0065 | 0.0582 | 0.1897 | 0.4048 | 1.0100 | 4.1961 | 14.0087 |
Memory | 30294 | 63595 | 411382 | 1723768 | 7058119 | 36626130 | 163530013 |
Graph::Easy v0.27 | 5 | 10 | 50 | 100 | 200 | 500 | 1000 |
Creation | 0.0017 | 0.0029 | 0.0395 | 0.0579 | 0.0904 | 0.1783 | 0.3836 |
as_txt | 0.0053 | 0.0059 | 0.0584 | 0.0937 | 0.1639 | 0.3838 | 0.7867 |
as_ascii | 0.0051 | 0.0547 | 0.1562 | 0.3571 | 0.9905 | 4.4629 | 16.2674 |
Memory | 27721 | 53295 | 308183 | 974750 | 3398114 | 16663158 | 69591398 |
Graph::Easy v0.30 | 5 | 10 | 50 | 100 | 200 | 500 | 1000 |
Creation | 0.0017 | 0.0029 | 0.0101 | 0.0591 | 0.0881 | 0.1755 | 0.3778 |
as_txt | 0.0055 | 0.0062 | 0.0611 | 0.0962 | 0.1685 | 0.3889 | 0.8188 |
as_ascii | 0.0053 | 0.0596 | 0.1872 | 0.4094 | 1.0915 | 4.7288 | 16.8745 |
Memory | 25868 | 49642 | 290130 | 938697 | 3326061 | 16483105 | 69231345 |
Graph::Easy v0.34 | 5 | 10 | 50 | 100 | 200 | 500 | 1000 |
Creation | 0.0026 | 0.0050 | 0.0184 | 0.1082 | 0.1730 | 0.3477 | 0.7107 |
as_txt | 0.0057 | 0.0066 | 0.0587 | 0.0983 | 0.1795 | 0.4267 | 0.8927 |
as_ascii | 0.0076 | 0.0682 | 0.2395 | 0.4778 | 0.9776 | 2.7422 | 7.2366 |
Memory | 24527 | 48259 | 293486 | 1036800 | 3883069 | 19678914 | 84634059 |
Graph::Easy v0.39 | 5 | 10 | 50 | 100 | 200 | 500 | 1000 |
Creation | 0.0018 | 0.0032 | 0.0384 | 0.0573 | 0.0920 | 0.1839 | 0.3894 |
as_txt | 0.0074 | 0.0066 | 0.0631 | 0.0977 | 0.1804 | 0.4227 | 0.8897 |
as_ascii | 0.0077 | 0.0763 | 0.2600 | 0.5194 | 1.0608 | 2.9838 | 7.6551 |
Memory | 24630 | 48362 | 293589 | 1036903 | 3883172 | 19679017 | 84634162 |
Graph::Easy v0.40 | 5 | 10 | 50 | 100 | 200 | 500 | 1000 |
Creation | 0.0019 | 0.0032 | 0.0421 | 0.0610 | 0.0901 | 0.1455 | 0.3655 |
as_txt | 0.0061 | 0.0067 | 0.0647 | 0.1010 | 0.1738 | 0.3943 | 0.9027 |
as_ascii | 0.0081 | 0.0812 | 0.2725 | 0.5396 | 1.1676 | 3.1486 | 8.5641 |
Memory | 24093 | 46985 | 285492 | 1020406 | 3849875 | 19595320 | 84466465 |
Times are in seconds, memory in bytes. The numbers at top are N, the resulting graph has N*3 nodes and N*3 edges.
There are a few interesting things to note here:
-
The first is that
as_txt
was a quadratic operation when using Graph (0.24), and is now a linear operation (0.25 onwards). Sinceas_txt()
is just a dump and processes each node/edge exactly once, this means that a layout (likeas_ascii()/as_html()
etc) has exactly the same properties, and will be even slower since it needs to do some more work, processing nodes/edges more than once.
Note that the time for a 6000-object graph is missing for 0.24, it would several minutes to completeas_txt()
and even longer foras_ascii()
. If you double the number of nodes/edges, it takes approximately 5 times as long. Uh. This scaled very badly. -
There are basically three times involved to create a graph. The first is
to parse the graph description. This time is absent in this test because
we constructed the graph via Perl code, not via Graph::Easy::Parser.
The second part is to create the node and edge objects (Graph::Easy::Node
and Graph::Easy::Edge) itself. The third time was the time taken by
Graph
to create an internal representation of the graph, and to connect it via edge and node attributes to the objects from Graph::Easy.
My naive idea was that the object creation took most of the time (afterall, it is heavy OO Perl code :), followed by parsing and Graph comes in with the smallest overhead.
In praxis, without parsing,Graph
took about 3/4 of all the time, and object creation only about 1/4. The effect is that after 0.24, creating a graph via Perl code is between 3 and 4 times faster, because storing nodes/edges simply as keys in two hashes if much faster than whatever Graph does. As shown previously, retrieving them is also much faster and scales much, much better. -
Removing Graph means we no longer need to store nodes/edges inside a
Graph object. This removes a bit of memory overhead. While I thought
this would be only a very small amount, it turned out that in praxis
about 30% of all the memory was wasted inside the structures Graph
created and only 60% was used by Graph::Easy itself.
In 0.27, the memory was reduced again by about 10% due to not initializing unneeded fields inside nodes and edges, like the error field, etc. That also makes graph creation about 10% faster, too. -
v0.34 is much faster in creating the
as_ascii()
version. This is mainly due to the rewritten chain tracking code and some streamlining. Note that v0.33 and earlier took about 3.5 times as long foras_ascii()
if you doubled the number of objects - v0.34 only takes about twice as long. So it dropped from about O(N*N) to O(N). - The memory shown is bogus (I used Devel::Size 0.64 instead of Devel::Size 0.58), and way too big. For instance, for 0.40, top says "virtual: 27348 resident: 23m shared: 1504" (in KByte), while Devel::Size claims that 0.40 uses around 84 Megabytes.
Parsing and Rendering
Not done yet.
Contact and Bugreports
If you have questions, feel free to send me an email (Gnupg key). Bugreports should go to rt.cpan.org.