Tim's Erlang Exercise - Summary

>>> Updated Nov 1:
Tim tested my latest attempt, tbray5.erl, which is described in Learning Coding Binary (Was Tim's Erlang Exercise - Round VI). On his log file of 971,538,252 bytes in 4,625,236 lines, it got:

real    0m20.74s
user    3m51.33s
sys     0m8.00s

It's not the fastest, since I did not apply Boyer-Moore searching. But it's what I want: a balance of simplicity, readability and speed.
========

>>> Updated Oct 24:
The Erlang code can be faster than the un-parallelized Ruby: a new version runs in 2.97 sec on the 4-CPU box. See Learning Coding Binary (Was Tim's Erlang Exercise - Round VI).
========

>>> Updated Oct 22:
Following Bjorn's suggestion, I added the "+h 4096" option to 'erl', which "sets the default heap size of processes to the size 4096". The elapsed time immediately dropped from 7.7s to 5.5s:

time erl +h 4096 -smp -noshell -run tbray4 start o1000k.ap 10 -s erlang halt

The +h option seems to affect the binary version a lot, but the list version only a little. Perhaps this is because a list is always copied, while a binary may be left in the process's heap and passed by pointer?

The default heap size for each process is 233 words. That number suits systems with many concurrent processes, where it keeps total memory use down. But for parallelization tasks with fewer processes, or on a machine with enough memory, the heap size can be set somewhat larger.
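The +h flag changes the default for every process in the node. As a minimal sketch of a narrower alternative, assuming only the worker processes need a larger initial heap, the standard spawn_opt/2 accepts a {min_heap_size, Words} option. The module name heap_demo and the function start/0 are hypothetical and are not taken from tbray4.erl. The runtime may round the requested size up to the next size it supports, so the reported value can be larger than 4096.

```erlang
-module(heap_demo).
-export([start/0]).

%% Spawn one worker with a minimum heap of at least 4096 words and
%% return the minimum heap size the runtime actually assigned to it.
start() ->
    Parent = self(),
    Pid = spawn_opt(
            fun() ->
                %% Ask the runtime what minimum heap size this process got.
                {min_heap_size, Words} = process_info(self(), min_heap_size),
                Parent ! {self(), Words}
            end,
            [{min_heap_size, 4096}]),
    receive
        {Pid, Size} -> Size
    end.
```

Calling heap_demo:start() from the shell returns the size in words. Whether a per-process setting performs the same as the global +h 4096 for this workload is untested here.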

Anyway, I think Erlang/OTP is already very good at concurrency, but there may still be room to optimize for parallelization.

BTW, with the +h option and some tips for efficient binary handling, the most concise binary version, tbray5.erl, now runs within 3 sec.
========

This is a performance summary of Tim's Erlang exercise on large dataset processing. I only compare results from a 4-CPU Intel Xeon 2.80GHz Linux box:

Log File         Time  Erlang     Erlang       Erlang (Many Proc)  Ruby
                       (1 Proc)   (Many Proc)  +h 4096
1 million lines  real  22.088s    7.700s       5.475s              4.161s
                 user  21.161s    25.750s      18.785s             3.592s
                 sys   0.924s     3.552s       1.352s              0.568s
5 million lines  real  195.570s   37.669s      27.911s             20.768s
                 user  192.496s   126.296s     98.162s             19.009s
                 sys   3.480s     17.789s      7.344s              3.116s
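
To make the table easier to compare, here is a small sketch that derives ratios from the "real" times listed above: the speedup of each multi-process run over the single-process Erlang run, and how many times slower the "+h 4096" run is than Ruby. The module name speedup and the function ratios/0 are hypothetical helpers written for illustration only.

```erlang
-module(speedup).
-export([ratios/0]).

%% "real" times in seconds, copied from the results table:
%% {Label, Erlang 1 proc, Erlang many proc, Erlang many proc +h 4096, Ruby}
rows() ->
    [{"1 million lines", 22.088, 7.700, 5.475, 4.161},
     {"5 million lines", 195.570, 37.669, 27.911, 20.768}].

%% For each row, return
%% {Label, SpeedupManyProc, SpeedupManyProcH, SlowdownVsRuby}.
ratios() ->
    [{Label,
      OneProc / ManyProc,     % speedup of many-proc over 1-proc
      OneProc / ManyProcH,    % speedup of many-proc +h 4096 over 1-proc
      ManyProcH / Ruby}       % >1 means Erlang +h 4096 is slower than Ruby
     || {Label, OneProc, ManyProc, ManyProcH, Ruby} <- rows()].
```

These ratios only restate the measured numbers. They say nothing about why the runs differ.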

Notice:

  • The Erlang code is tbray4.erl, which can be found in my previous blog post; the Ruby code is from Tim's blog.
  • The Erlang code is parallelized; the Ruby code is not.
  • The Erlang version takes a lot more code, but parallelization is no free lunch.
  • On an 8-CPU box, the Erlang version should match or beat the non-parallelized Ruby version*.
  • Although we are talking about the multi-core era, I'm not sure whether disk I/O is ready for it.

* Per Steve's testing.

Comments

1. CZ -- 2007-10-20 09:00

i want to know some article from here .

2. Hynek (Pichi) Vychodil -- 2007-10-21 09:00

Hi Deng, I wrote some new code with interesting behaviour and potential. Can you test it on your MacBook and on some other multi-core machine?

 http://pichis-blog.blogspot.com/2007/10/wide-finder-project-fold.html

3. Caoyuan Deng -- 2007-10-21 09:00

Hi Pichi,

With procNum = 10, your code got real 0m4.890s user 0m13.065s sys 0m5.128s on the 4-CPU Linux box.

For small pieces, lists are faster than binaries. But I'm waiting for OTP R12B before running a new test.

4. Hynek (Pichi) Vychodil -- 2007-10-21 09:00

Deng, thank you very much for testing.