Posts for the month of December 2008

A Case Study of Scalability Related "Out of memory" Crash in Erlang

We are building a message-switching platform in Erlang. Stability and features both look fine: it has run for more than half a year with zero downtime. We had tested its performance on our 2-core machine and got about 140 transactions/second, which was good enough.

Then, several weeks ago, we got an 8-core machine and ran the same performance test on it to check scalability. Since Erlang scales almost perfectly, you can imagine the result: about 700 transactions/second, nearly linear scaling. That is, until it crashed with "out of memory" after about a million hits had been processed.

It left a very big "erl_crash.dump" file behind, so I had to dig into the issue. My first guess was that some remote requests (database access, remote web service calls, etc.) were timing out while the calling processes themselves had not yet timed out, so that more and more processes piled up in the VM.

A quick grep "=proc:" erl_crash.dump showed that the total number of processes was about 980, which was reasonable for our case.

So, which process ate so much memory? A quick grep "Stack+heap" erl_crash.dump showed that there was indeed one process with a Stack+heap size of 285082125.

Following this clue, I found the process:

=proc:<0.4.0>
State: Garbing
Name: error_logger
Spawned as: proc_lib:init_p/5
Last scheduled in for: io_lib_format:pad_char/2
Spawned by: <0.1.0>
Started: Sun Apr  1 01:21:50 2012
Message queue length: 2086029
Number of heap fragments: 1234053
Heap fragment data: 281266956
Link list: [<0.27.0>, <0.0.0>, {from,<0.42.0>,#Ref<0.0.0.88>}]
Reductions: 72745575
Stack+heap: 285082125
OldHeap: 47828850
Heap unused: 121777661
OldHeap unused: 47828850
Program counter: 0x0764c66c (io_lib_format:pad_char/2 + 4)
CP: 0x0764c1b4 (io_lib_format:collect_cseq/2 + 124)

This process was error_logger, from the standard OTP library; it writes the messages it receives to a log file or the tty. The typical usage is:

error_logger:info_msg("~p:~p " ++ Format, [?MODULE, ?LINE] ++ Data)

This formats Data into a string according to the Format string and writes it to the tty or a log file.

The dump above shows that the message queue of the error_logger process had reached 2086029 messages (spread over 1234053 heap fragments), and its Stack+heap was 285082125, about 272M.
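The same symptom can be looked for on a live node before it crashes. The following is only a minimal sketch, not what we ran at the time: the module name queue_top and the function top/1 are made up for illustration, while erlang:processes/0 and erlang:process_info/2 are standard BIFs.

```erlang
-module(queue_top).
-export([top/1]).

%% Return up to N tuples {QueueLen, MemoryBytes, Pid, Name},
%% longest message queue first. Name is [] for unregistered processes.
top(N) ->
    Items = [message_queue_len, memory, registered_name],
    Rows = [{Len, Mem, Pid, Name}
            || Pid <- erlang:processes(),
               %% process_info/2 returns 'undefined' for a process that has
               %% just died; the pattern below silently skips that case.
               [{message_queue_len, Len},
                {memory, Mem},
                {registered_name, Name}] <- [erlang:process_info(Pid, Items)]],
    lists:sublist(lists:reverse(lists:sort(Rows)), N).
```

Calling queue_top:top(5) from the shell during a load test would list the five processes with the longest queues, so a runaway error_logger should show up at the top.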

So the likely cause is that the message queue could not be processed in time: messages piled up in the error_logger process and finally caused the "out of memory" crash. The bottleneck was formatting each message into a string, which seems to need a lot of CPU cycles in the Erlang VM. In a previous post I discussed how Erlang is bad at massive text processing. Erlang represents strings/text as lists, which is an obvious bottleneck now that Erlang is getting more popular and more and more Erlang applications are being written.
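To make the "strings are lists" point concrete, here is a small sketch. The module name string_cost is hypothetical, and the format string only mimics the logging call shown earlier. A string literal is a list of character codes, with one list cell (two machine words) per character, whereas a binary stores one byte per character.

```erlang
-module(string_cost).
-export([demo/0]).

demo() ->
    %% A string literal is just a list of integers.
    Str = "abc",
    [97, 98, 99] = Str,
    %% The same text as a binary is a compact block of 3 bytes.
    Bin = <<"abc">>,
    3 = byte_size(Bin),
    %% io_lib:format/2 builds a (possibly nested) list of characters,
    %% which is the kind of work error_logger does for every message.
    Formatted = lists:flatten(io_lib:format("~p:~p ~s", [mymod, 42, "hello"])),
    "mymod:42 hello" = Formatted,
    ok.
```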

But why did this not happen on our 2-core machine? It's an interesting scalability-related problem:

The error_logger module registers one and only one process to receive and handle all log messages, and the Erlang VM's scheduler cannot spread ONE process across multiple CPUs. On our 2-core machine, total capacity was about 140 transactions/second, and the single error_logger process just happened to have enough power to handle the corresponding log messages in time. On the 8-core machine our platform scales to 700 transactions/second, but there is still only one error_logger process, which cannot use the power of 8 cores at all, and it finally fails.

Erlang treats every process fairly (although you can change priorities manually), so we can do a quick estimate:

  1. 2-core machine, keeping hits at 140 trans/second:

The number of simultaneous processes will be about 200, so each process gets this share of the CPU cycles: 1/200 * 2 cores = 1%

  2. 8-core machine, keeping hits at 700 trans/second:

The number of simultaneous processes will be about 980, so each process gets this share of the CPU cycles: 1/980 * 8 cores = 0.82%
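The estimate above can be written down as a tiny calculation. This is only a sketch of the rough model used here (every process gets an equal slice of all cores); the module name cpu_share is hypothetical.

```erlang
-module(cpu_share).
-export([share/2, demo/0]).

%% Fraction of CPU time one process gets if all processes
%% are scheduled equally across all cores.
share(Cores, Procs) ->
    Cores / Procs.

demo() ->
    S2 = share(2, 200),      % 2-core machine, about 200 processes
    S8 = share(8, 980),      % 8-core machine, about 980 processes
    LoadGrowth = 700 / 140,  % growth in transactions/second
    {S2, S8, LoadGrowth}.
```

cpu_share:demo() gives a share of 0.01 on the 2-core machine and roughly 0.0082 on the 8-core machine, while the transaction rate is 5 times higher.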

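So the CPU share of the error_logger process does not actually increase (it even drops slightly), while the number of log messages it has to handle grows about fivefold. BTW, I think error_logger should trim its message queue when it cannot process messages in time (disk IO may also be slower than the rate of incoming messages).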
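Until error_logger does something like that itself, the caller can shed load. The following is a minimal sketch and not part of OTP: the module guarded_log, the function names and the threshold of 10000 are all assumptions for illustration. It also assumes error_logger is a registered process, as it is in the dump above; on newer OTP releases, where logging goes through the logger application, that process may not be registered, and this wrapper would then drop everything.

```erlang
-module(guarded_log).
-export([info/2, queue_ok/2]).

%% Hypothetical threshold: drop log calls once this many
%% messages are waiting in the logger's queue.
-define(MAX_QUEUE, 10000).

%% Log only while the error_logger queue is short; otherwise drop.
info(Format, Data) ->
    case queue_ok(error_logger, ?MAX_QUEUE) of
        true ->
            error_logger:info_msg(Format, Data),
            logged;
        false ->
            dropped
    end.

%% true if the registered process Name exists and has fewer
%% than Max messages waiting in its queue.
queue_ok(Name, Max) ->
    case whereis(Name) of
        undefined ->
            false;
        Pid ->
            case erlang:process_info(Pid, message_queue_len) of
                {message_queue_len, Len} -> Len < Max;
                undefined -> false
            end
    end.
```

Dropping log messages under overload loses information, of course, but it trades that for keeping the node alive, which for our case is the better deal.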

The Year That Will Be

It's 2009 now, in Beijing.

1==0.999999999......

I met Erlang 2 years ago, which finally brought me to Scala. I learned a lot from Erlang, and I entered the Scala world with the Erlang atmosphere still surrounding me. FP, pattern matching, actors/processes: I found these familiar friends everywhere in Scala.

Scala has an extra bonus for me: static types and OO/FP. The domains I face usually involve a lot of business logic; the worlds I try to describe are not only messages but also models, which I don't think are suitable to describe with functions only.

The world itself is mixed OO/FP, as in Martin's quote: two sides of a coin. It's something like the particle/wave duality in quantum physics. The world is an infinite whole, but human reason is always finite; we use our finite reason to measure the infinite world, which is an unsolvable contradiction: infinite vs finite. We have to read our world in OO and in FP, in snapshot and in continuation.

There won't be a "Super Hero" among computer languages. The world is moving toward self-organization and harmony, and so are the languages. Each language lives in an ecosystem: it is born, grows by interacting with its environment, and disappears ...

The Economy

It was bad in 2008. I tried to do some computing on the stock market with my neural network. What I can say is that it will swing over the next half-year, with no big drop and no big rise. The Shanghai Stock Index will swing between 1200 and 3000. At least it won't get much worse.

My Self

I need to make some big decisions this year.

CN Erlounge III

I attended CN Erlounge III last weekend; it was a 2-day conference. I gave a presentation on Scala vs Erlang.

I met Jackyz, one of the translators of the Chinese version of "Programming Erlang", and Aimin, who is writing a Delphi module to support Erlang c-node and c-driver in Delphi.

There is a commercial network monitoring product built with Erlang by a major telecom company in China. And our Mobile-Banking platform (in Erlang) is scheduled to launch in mid-January too.

I talked with Yeka and Diuera from Broadview, a leading IT publisher in China; they are really interested in bringing "Programming in Scala" to mainland China.

And many thanks to Shiwei Xu, who is working hard for the Erlang community in China and took on the job of organizing this conference.

I gave some encouragement to younger developers to learn Erlang and read "Programming Erlang", since I was the oldest attendee :-). Erlang is one of the best pragmatic and clear languages for learning concurrent/parallel and functional programming, and the book is a very thoughtful and philosophical one on these ideas.

And I'd like to see "Programming in Scala" appear in China soon too. Scala is another pragmatic language for solving real-world problems, and the book is also a thoughtful and philosophical one on types, OO and FP in our real world.

Of course, choosing Scala or Erlang for your real world project should depend on the requirements.

I may be back in Vancouver next month for a while. Oh, it will be the beginning of the new year.

CN Erlounge III photos by krzycube

Thinking in Scala vs Erlang

With Erlang in mind, I've been coding in Scala for two months, and I've been thinking about something I'd call "Scala vs Erlang". I wrote some benchmark code to test my ideas (the code and results may be available someday), and I'd like to summarize them gradually from a practical point of view. These opinions may or may not be correct at this point, since I lack deep experience and understanding, but I need to record them now and correct myself as I gain more experience with and understanding of both Scala and Erlang.

Part I. Syntax

List comprehension

Erlang:

Lst = [1,2,3,4],
[X + 1 || X <- Lst],
lists:map(fun(X) -> X + 1 end, Lst)

Scala:

val lst = List(1,2,3,4) 
for (x <- lst) yield x + 1
lst.map{x => x + 1}
lst.map{_ + 1} // or with placeholder syntax

Pattern match

Erlang:

case X of
   {A, B} when is_integer(A), A > 1 -> ok;
   _ -> error
end,

{ok, [{A, B} = H|T]} = my_function(X)

Scala:

x match {
   case (a: Int, b) if a > 1 => OK // can match on type
   case _ => ERROR
}

val ("ok", (h@(a, b)) :: t) = my_function(x)

List, Tuple, Array, Map, Binary, Bit

Erlang:

Lst = [1, 2, 3] %% List
[0 | Lst]  %% List prepend (cons)
{1, 2, 3}  %% Tuple
<<1, 2, "abc">>  %% Binary
%% no Array, Map syntax

Scala:

val lst = List(1, 2, 3)  // List
0 :: lst  // List prepend (cons)
(1, 2, 3) // Tuple
Array(1, 2, 3) // Array
Map("a" -> 1, "b" -> 2) // Map
// no Binary, Bit syntax

Process, Actor

Erlang:

the_actor(X) ->
   receive
      ok -> io:format("~p~n", [X]);
      I -> the_actor(X + I) %% needs to explicitly continue loop
   end.

%% the_actor/1 must be exported from mymodule for spawn/3 to find it
P = spawn(mymodule, the_actor, [0]),
P ! 1,
P ! ok.

Scala I:

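import scala.actors.Actor
import scala.actors.Actor._

class TheActor(private var x: Int) extends Actor { // var: the state is updated in place
   def act = loop {
      react {
         case "ok" => println(x); exit() // needs to explicitly exit loop
         case i: Int => x += i
      }
   }
}
val a = new TheActor(0)
a.start() // an Actor subclass has to be started explicitly
a ! 1
a ! "ok"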

Scala II:

import scala.actors.Actor._

val a = actor {
   def loop(x: Int): Unit = { // a recursive method needs an explicit result type
      react {
         case "ok" => println(x)
         case i: Int => loop(x + i)
      }
   }
   loop(0)
}
a ! 1
a ! "ok"

Part II. Processes vs Actors

Something I

Erlang:

  • Lightweight processes
  • You can (almost) always create a new process for each newcomer
  • Scheduler treats all processes fairly
  • Share nothing between processes
  • Lightweight context switch between processes
  • IO has been carefully delegated to independent processes

Scala:

  • An active actor is delegated to a JVM thread; actor /= thread
  • You can create a new actor for each newcomer
  • But the number of real workers (threads) is adjusted dynamically according to the processing time
  • Later comers may sit in a wait list until a spare thread is available
  • Share nothing or share something, at your discretion
  • Heavy context switch between working threads
  • Blocking IO is still a pain unless there is a good NIO framework (Grizzly?)

Something II

Erlang:

  • Tries to serve everyone simultaneously
  • But may lose service quality when the work is heavy; requests may time out (out of service)
  • Ideal when processing cost is comparable to context switching cost
  • Ideal for small message processing in soft real-time
  • Bad for massive data processing, and cpu-heavy work

Scala:

  • Tries to serve a limited number of customers as well as possible first
  • If it cannot serve everyone, later comers are put in a waiting list and may time out (out of service)
  • Soft real-time for all concurrent incoming customers is difficult
  • Ideal when processing cost is far higher than context switching cost (context switch time is in μs on a modern JVM)
  • When will there be a perfect NIO + Actor library?