Tim's Erlang Exercise - Summary
>>> Updated Nov 1:
Tim tested my last attempt, tbray5.erl, which was described in Learning Coding Binary (Was Tim's Erlang Exercise - Round VI). On his log file of 971,538,252 bytes in 4,625,236 lines, he got:
real 0m20.74s user 3m51.33s sys 0m8.00s
It's not the fastest, since I did not apply Boyer-Moore searching. But it's what I want: a balance between simplicity, readability and speed.
========
>>> Updated Oct 24:
The Erlang code can be faster than un-parallelized Ruby: a new version runs in 2.97 sec on the 4-CPU box. See Learning Coding Binary (Was Tim's Erlang Exercise - Round VI).
========
>>> Updated Oct 22:
Following Bjorn's suggestion, I added the "+h 4096" option to 'erl', which "sets the default heap size of processes to the size 4096". The elapsed time dropped from 7.7s to 5.5s immediately:
time erl +h 4096 -smp -noshell -run tbray4 start o1000k.ap 10 -s erlang halt
The +h option seems to affect the binary version a lot, but the list version only a little. Maybe this is because a list is always copied, while a binary may be left in the process heap and passed by pointer?
The default heap size for each process is 233 words. This number may suit systems with a lot of concurrent processes, where it keeps memory from being exhausted. But for some parallelization tasks, with fewer processes or with enough memory, the heap size can be set somewhat larger.
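As a rough sketch of a per-process alternative (not what tbray4.erl does): instead of raising the default for every process with +h, a single worker can be given a larger initial heap through spawn_opt/2 and its min_heap_size option, which is also measured in words. The module name heap_sketch and the function worker_heap/1 are hypothetical names, used only for illustration.

```erlang
-module(heap_sketch).
-export([worker_heap/1]).

%% Hypothetical helper: spawn one worker whose minimum heap size is
%% requested as Words (in words, like +h), and return the minimum
%% heap size the VM actually assigned to that worker.
worker_heap(Words) ->
    Parent = self(),
    Pid = spawn_opt(
            fun() ->
                %% Ask the VM what minimum heap size this process got.
                {min_heap_size, Actual} =
                    process_info(self(), min_heap_size),
                Parent ! {self(), Actual}
            end,
            [{min_heap_size, Words}]),
    %% Wait for the reply from exactly that worker.
    receive
        {Pid, Got} -> Got
    end.
```

Calling heap_sketch:worker_heap(4096) from the shell should return a value of at least 4096; the VM may round the requested size up to the next size it supports. Whether this helps as much as +h did here would have to be measured.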
Anyway, I think Erlang/OTP is already very good at concurrency, but there may still be room to optimize for parallelization.
BTW, with the +h option and some tips for efficient binaries, the most concise binary version, tbray5.erl, can now run within 3 sec.
========
This is a performance summary of Tim's Erlang exercise on large dataset processing. I only compare the results on a 4-CPU Intel Xeon 2.80GHz Linux box:
| Log File | Time | Erlang (1 Proc) | Erlang (Many Proc) | Erlang (Many Proc) +h 4096 | Ruby |
| 1 million lines | real | 22.088s | 7.700s | 5.475s | 4.161s |
| | user | 21.161s | 25.750s | 18.785s | 3.592s |
| | sys | 0.924s | 3.552s | 1.352s | 0.568s |
| 5 million lines | real | 195.570s | 37.669s | 27.911s | 20.768s |
| | user | 192.496s | 126.296s | 98.162s | 19.009s |
| | sys | 3.480s | 17.789s | 7.344s | 3.116s |
Notes:
- The Erlang code is tbray4.erl, which can be found in the previous blog post; the Ruby code is from Tim's blog.
- The Erlang code is parallelized; the Ruby code is not.
- The Erlang version takes tons of code, but parallelization is no free lunch.
- On an 8-CPU box, the Erlang version should match or exceed the non-parallelized Ruby version*.
- Although we are talking about the multi-core era, I'm not sure whether disk I/O is ready for it too.
* Per Steve's testing.
Posted at 01:54AM Oct 21, 2007 by dcaoyuan in Erlang | Comments[4]
i want to know some article from here .
Posted by CZ on October 21, 2007 at 07:32 AM PDT #
Hi Deng,
I made some new code with interesting behaviour and potential. Can you test it on your MacBook and some other multi core machine?
http://pichis-blog.blogspot.com/2007/10/wide-finder-project-fold.html
Posted by Hynek (Pichi) Vychodil on October 21, 2007 at 11:55 AM PDT #
Hi Pichi,
Running your code with procNum = 10 on the 4-CPU Linux box, I got:
real 0m4.890s
user 0m13.065s
sys 0m5.128s
For small pieces, lists are faster than binaries. But I'm waiting for OTP R12B to run new tests.
Posted by Caoyuan Deng on October 21, 2007 at 09:25 PM PDT #
Deng, thanks very much for testing.
Posted by Hynek (Pichi) Vychodil on October 21, 2007 at 11:43 PM PDT #