As of Python 3.3, the “ipaddress” module is part of the stdlib. Personally, I find that a bit premature, as the library code does not look very PEP 8 compliant. Still, it fills a huge gap in the stdlib.
Over the last few days, I needed a way to collapse consecutive IP networks into supernets wherever possible. It turns out there’s a function for that: ipaddress.collapse_addresses. Unfortunately, I could not use it as-is, because I don’t have a plain collection of networks. I have object instances that carry a “network” attribute. If I extracted the networks and collapsed them, I would have no direct way to correlate the results back to the original instances.
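For reference, here is a minimal sketch of what the stdlib function does and where it falls short for this use case. The `Route` class and its `name` field are hypothetical stand-ins for the real objects, and the addresses are documentation ranges.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    """Hypothetical stand-in for an object that carries a network."""
    name: str
    network: ipaddress.IPv4Network


routes = [
    Route("a", ipaddress.ip_network("192.0.2.0/25")),
    Route("b", ipaddress.ip_network("192.0.2.128/25")),
    Route("c", ipaddress.ip_network("198.51.100.0/24")),
]

# collapse_addresses only sees the networks, not the objects owning them.
collapsed = list(ipaddress.collapse_addresses(r.network for r in routes))
print(collapsed)
# [IPv4Network('192.0.2.0/24'), IPv4Network('198.51.100.0/24')]
```

The two /25 networks come back as a single /24, but nothing in the result says that it was built from routes “a” and “b”.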
So I decided to dive into the stdlib source code and get some “inspiration” for this task. Personally, I found the code fairly difficult to follow. It is about 60 lines spread over two functions, one of which calls the other recursively.
I thought I could do better, and preliminary tests are promising. My version is no longer recursive (it’s shift-reduce-ish, if you will) and about 30 lines shorter. The original code does some type checking which I might add later on. That would increase the number of lines a bit and might even hurt performance. I’m still confident.
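To give an idea of what a shift-reduce style collapse can look like, here is a minimal sketch. It is not the code measured in this post, only one possible way to do it. The function name `collapse_with_members` and the `key` parameter are made up for illustration, it assumes all networks share one IP version, and it relies on `subnet_of`, which needs Python 3.7 or newer.

```python
import ipaddress
from types import SimpleNamespace


def collapse_with_members(items, key=lambda item: item.network):
    """Collapse the networks of *items*, remembering which items went where.

    Returns a list of (network, [items]) pairs in ascending order.
    """
    # Sort by start address, shorter prefixes first, so that a containing
    # network is always seen before the networks it contains.
    ordered = sorted(
        items, key=lambda i: (key(i).network_address, key(i).prefixlen)
    )
    stack = []  # entries are [network, members]; disjoint and ascending
    for item in ordered:
        stack.append([key(item), [item]])  # shift
        while len(stack) >= 2:  # reduce as long as the top two entries merge
            lower, lower_members = stack[-2]
            upper, upper_members = stack[-1]
            if upper.subnet_of(lower):
                merged = lower  # upper is already covered by lower
            elif (lower.prefixlen == upper.prefixlen
                  and lower.supernet() == upper.supernet()):
                merged = lower.supernet()  # two halves of one supernet
            else:
                break
            stack.pop()
            lower_members.extend(upper_members)
            stack[-1][0] = merged
    return [(network, members) for network, members in stack]


hosts = [
    SimpleNamespace(name="a", network=ipaddress.ip_network("2001:db8::/65")),
    SimpleNamespace(name="b", network=ipaddress.ip_network("2001:db8:0:0:8000::/65")),
    SimpleNamespace(name="c", network=ipaddress.ip_network("2001:db8:1::/48")),
]
for network, members in collapse_with_members(hosts):
    print(network, [m.name for m in members])
```

Because every stack entry carries its member list along, each collapsed network still knows which original objects it came from.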
A run with 300k IPv6 networks took 93 seconds with the new algorithm and peaked at about 490MB of memory. The old stdlib code took 230 seconds to finish, with a peak memory usage of about 550MB. All in all, good results.
Note that in both cases, the 300k addresses had to be loaded into memory first, so they account for a considerable share of those numbers. That share is the same in both runs.
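A measurement of this kind can be sketched roughly as follows. This is not the actual mantest.py. The helper names `random_networks` and `benchmark`, the seed and the prefix range are all assumptions. The `resource` module only exists on Unix, and `ru_maxrss` is reported in kilobytes on Linux.

```python
import ipaddress
import random
import resource
import time


def random_networks(count, seed=0):
    """Generate *count* pseudo-random IPv6 networks (hypothetical test data)."""
    rng = random.Random(seed)
    return [
        # strict=False masks out the host bits of the random address.
        ipaddress.IPv6Network(
            (rng.getrandbits(128), rng.randint(48, 64)), strict=False
        )
        for _ in range(count)
    ]


def benchmark(label, func, networks):
    """Time one collapse run and report the peak RSS of the process so far."""
    start = time.perf_counter()
    result = list(func(networks))
    elapsed = time.perf_counter() - start
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"{label}: {elapsed:.3f} s, peak RSS {peak_kb} kB, "
          f"{len(result)} networks left")
    return result


networks = random_networks(1000)
collapsed = benchmark("stdlib", ipaddress.collapse_addresses, networks)
```

The peak RSS is a high-water mark for the whole process, so each algorithm should get its own process if the memory numbers are meant to be comparable.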
I still have an idea in mind to improve the memory usage. I’ll give that a try.
Here are a few stats:
With the new algorithm:
generating 300000 addresses...
... done
new: 92.98410562699428
Command being timed: "./env/bin/python mantest.py 300000"
User time (seconds): 92.79
System time (seconds): 0.28
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:33.07
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 491496
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 123911
Voluntary context switches: 1
Involuntary context switches: 154
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
and with the old algorithm:
generating 300000 addresses...
... done
old: 229.66894743399462
Command being timed: "./env/bin/python mantest.py 300000"
User time (seconds): 229.35
System time (seconds): 0.38
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:49.76
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 549592
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 144970
Voluntary context switches: 1
Involuntary context switches: 1218
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I’ll add more details as I go… I’m too “into it” and keep losing track of time and forgetting to post fun stuff online… stay tuned.