Tip de rendimiento: reutilizar la JVM entre tareas Map

En un e-mail de la lista core-user de hadoop, alguien preguntó lo siguiente:

Subject: Can I share datas for several map tasks?
Hi,
I want to share some data structures for the map tasks on a same node(not through files), I mean, if one map task has already initialized some data structures (e.g. an array or a list), can other map tasks share these memorys and directly access them, for I don’t want to reinitialize these datas and I want to save some memory. Can hadoop help me do this?

Eason.Lee sugirió:

I think you can just define the data structures in your map classinit it in
setup(Context context) and use it in your map method
hope it is helpful!

Pero si lo que se quiere es que los mappers que se levanten en el mismo nodo re-utilicen la estructura de datos creada por el primer Map task levantado en ese nodo, entonces la solución—planteada por Sharad Agarwal de Yahoo!—es re-utilizar la JVM:

You can enable jvm reuse across tasks. See mapred.job.reuse.jvm.num.tasks in mapred-default.xml for usage. Then you can cache the data in a static variable in your mapper.

Tags: , ,

Leave a Reply