{"id":286,"date":"2009-11-16T08:33:14","date_gmt":"2009-11-16T13:33:14","guid":{"rendered":"http:\/\/blog.espol.edu.ec\/hadoop\/?p=286"},"modified":"2009-11-18T09:29:01","modified_gmt":"2009-11-18T14:29:01","slug":"reduce-empezando-antes-que-termine-map","status":"publish","type":"post","link":"https:\/\/blog.espol.edu.ec\/hadoop\/2009\/11\/16\/reduce-empezando-antes-que-termine-map\/","title":{"rendered":"Reduce starting before Map finishes"},"content":{"rendered":"<p>Diagrams of MapReduce implementations usually show a \"barrier\" between the Map phase and the Reduce phase. A \"barrier\" is a process synchronization mechanism that waits for every process on one side of the barrier to finish before any process on the other side starts. Here, that means the Map phase must finish before the Reduce phase begins. Hadoop does not seem to follow this behavior: its job progress output shows the Reduce phase starting before the Maps have finished. 
For example:<\/p>\n<pre>09\/11\/14 10:58:50 INFO mapred.JobClient: map 79% reduce 18%\r\n09\/11\/14 10:58:54 INFO mapred.JobClient: map 79% reduce 19%\r\n09\/11\/14 10:58:55 INFO mapred.JobClient: map 80% reduce 19%\r\n09\/11\/14 10:58:58 INFO mapred.JobClient: map 80% reduce 20%\r\n09\/11\/14 10:59:00 INFO mapred.JobClient: map 81% reduce 20%\r\n09\/11\/14 10:59:04 INFO mapred.JobClient: map 82% reduce 20%\r\n09\/11\/14 10:59:05 INFO mapred.JobClient: map 82% reduce 21%\r\n09\/11\/14 10:59:08 INFO mapred.JobClient: map 82% reduce 22%<\/pre>\n<p>A recent <a href=\"http:\/\/mail-archives.apache.org\/mod_mbox\/hadoop-common-user\/200911.mbox\/%3c80049.37097.qm@web54206.mail.re2.yahoo.com%3e\">thread on the Hadoop common-user mailing list<\/a> explained this behavior well.<\/p>\n<blockquote><p><a href=\"http:\/\/twitter.com\/dehowell\">David Howell<\/a> said: \"The first 2\/3 of the reduce phase (as reported by the progress meters) are all about getting the map results from the map tasktracker to the reduce tasktracker and sorting them. The real reduce happens in the last third, and that part won't start until all of the maps are done.\"<\/p>\n<p><a href=\"http:\/\/twitter.com\/kevinWeil\">Kevin Weil<\/a> (lead of Twitter's Analytics team) said: \"The first third of the reduce phase is really the shuffle, where map outputs get sent to and collected at their respective reducers. You'll see this transfer happening, and the 'reduce' creeping up towards 33%, towards the end of your map phase. The 33% mark is where the real barrier is.\"<\/p><\/blockquote>\n<p>In other words, the actual reduce work does not start until the Map phase has finished. 
The \"Reduce\" percentage that advances alongside the Maps is actually the so-called \"Shuffle and Sort\" phase: while the Maps are still running, the Reducers copy (shuffle) the intermediate outputs that the Mappers have already produced, and those outputs are then merged and sorted before the Reducers can process them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Diagrams of MapReduce implementations usually show a \"barrier\" between the Map phase and the Reduce phase. A \"barrier\" is a process synchronization mechanism that waits for every process on one side of the barrier to finish before any process on the other side starts. Here, that means [&hellip;]<\/p>\n","protected":false},"author":1510,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[945],"tags":[2860],"class_list":["post-286","post","type-post","status-publish","format-standard","hentry","category-desarrollo","tag-hadoop"],"_links":{"self":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts\/286","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/users\/1510"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/comments?post=286"}],"version-history":[{"count":5,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts\/286\/revisions"}],"predecessor-version":[{"id":288,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts\/286\/revisions\/288"}],"wp:attachment":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/media?parent=286"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/categories?post=286"},{"taxonomy":"
post_tag","embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/tags?post=286"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}