I was updating the nmap package for ubuntu on a MIPS machine and I noticed the performance was unaceptable. I changed to libstdcpp and it improved in the order of 1000 times(1s vs 1000s). I narrowed the problem to two calls on a std:map variable: class std::map<port_spec, service_node> service_table. line 241: i = service_table.find(ps); takes 50% of the time. line 274: service_table[ps] = sn; takes 45% of the time. Can I help in any way to find the problem/is this not a real issue? Regards
(In reply to comment #0) > I was updating the nmap package for ubuntu on a MIPS machine and I noticed the I mean openwrt, and not ubuntu.
(In reply to comment #1) > (In reply to comment #0) > > I was updating the nmap package for ubuntu on a MIPS machine and I noticed the > > I mean openwrt, and not ubuntu. You could provide a small, self-contained testcase that illustrates the problem. How big is that map? How does it look like?
Sorry, I didn't provide that much information in fact. The elements of class std::map<port_spec, service_node> service_table are: struct port_spec { int portno; /* Network byte order */ std::string proto; /* Sort in the usual nmap-services order. */ bool operator<(const port_spec& other) const { if (this->portno < other.portno) return true; else if (this->portno > other.portno) return false; else return this->proto < other.proto; } }; /* This is a servent augmented by a frequency ratio. */ struct service_node : public servent { public: double ratio; }; It will take up to 65536 elements. I checked on http://cxx.uclibc.org/faq.html: A deque needs only 3 pointers and a size parameter (essentially 4 integers) to manage itself. Compare this with at least 2 pointers per item for a tree. After two items, the memory overhead of the map already outweighs that of the deque. Also, deque lookups may be faster as moving to the next level in the search does not require indirection. On the down side, a deque does not resize cleanly like a map does. It will use more memory than needed in buffering, and will occasionally need to copy all of the elements to a new container for more free space. Inserts also execute in O(n) time instead of O(log(n)) time on a deque unless they are to be added to the front or back of the deque, at which point they are inserted in O(1)+ time. Lookups and updates are O(log(n)) on both systems. In the end, I thought that the memory usage of the system outweighed the performance issues involved. Does this mean this is a intrinsic performance problem to save memory? I must always use the standard libstdc++?
(In reply to comment #3) > Sorry, I didn't provide that much information in fact. > > The elements of class std::map<port_spec, service_node> service_table are: > > struct port_spec { > int portno; /* Network byte order */ > std::string proto; > > /* Sort in the usual nmap-services order. */ > bool operator<(const port_spec& other) const { > if (this->portno < other.portno) > return true; > else if (this->portno > other.portno) > return false; > else > return this->proto < other.proto; This sounds rather erm elaborate, but ok. > } > }; > > /* This is a servent augmented by a frequency ratio. */ > struct service_node : public servent { > public: > double ratio; > }; > > > It will take up to 65536 elements. > Does this mean this is a intrinsic performance problem to save memory? I must > always use the standard libstdc++? Well, *currently*. As said, please provide a small, self contained testcase with real world data and we'll see what we can do.
(In reply to comment #4) > As said, please provide a small, self contained testcase with real world data > and we'll see what we can do. Attached is a small testcase for the function find. I did not try if add/insert is as bad too. The times were meassured on a 360 MHz mipsel embedded system. The binaries were compiled static without optimization flags. root@fritz:/var/mod/root# time ./map_test.uClibcxx < data You typed 'spoon' 5 time(s) real 0m 0.73s user 0m 0.69s sys 0m 0.05s root@fritz:/var/mod/root# time ./map_test.libstdcxx < data You typed 'spoon' 5 time(s) real 0m 0.02s user 0m 0.02s sys 0m 0.01s
Created attachment 3349 [details] testcase for std::map function find
Created attachment 3355 [details] Data (text) taken from Wikipedia
With a 180 KB text file the results are more clear: root@fritz:/var/mod/root# time ./map_test.libstdcxx < data You typed 'spoon' 5 time(s) real 0m0.924s user 0m0.880s sys 0m0.040s root@fritz:/var/mod/root# time ./map_test.uClibcxx < data You typed 'spoon' 5 time(s) real 14m19.181s user 9m28.790s sys 4m43.530s
Any update on this? Oliver Metz provided the test case.
(In reply to comment #4) > (In reply to comment #3) > > Sorry, I didn't provide that much information in fact. > > > > The elements of class std::map<port_spec, service_node> service_table are: > > > > struct port_spec { > > int portno; /* Network byte order */ > > std::string proto; > > > > /* Sort in the usual nmap-services order. */ > > bool operator<(const port_spec& other) const { > > if (this->portno < other.portno) > > return true; > > else if (this->portno > other.portno) > > return false; > > else > > return this->proto < other.proto; > > This sounds rather erm elaborate, but ok. > > > } > > }; > > > > /* This is a servent augmented by a frequency ratio. */ > > struct service_node : public servent { > > public: > > double ratio; > > }; > > > > > > It will take up to 65536 elements. > > > Does this mean this is a intrinsic performance problem to save memory? I must > > always use the standard libstdc++? > > Well, *currently*. > As said, please provide a small, self contained testcase with real world data > and we'll see what we can do. Bernhard Reutner-Fischer: Ping
Wow, seems I might have pulled this one back up from the grave. I didn't notice the last post was over 2 years ago.
Created attachment 6001 [details] авк Жесть
The content of attachment 6001 [details] has been deleted by Bernhard Reutner-Fischer <aldot@uclibc.org> who provided the following reason: pdf The token used to delete this attachment was generated at 2015-04-16 08:13:27 UTC.