{"id":88,"date":"2009-05-12T20:04:33","date_gmt":"2009-05-13T01:04:33","guid":{"rendered":"http:\/\/blog.espol.edu.ec\/hadoop\/?p=88"},"modified":"2009-05-12T20:06:44","modified_gmt":"2009-05-13T01:06:44","slug":"usando-los-scripts-de-cloudera-para-procesar-la-wikipedia","status":"publish","type":"post","link":"https:\/\/blog.espol.edu.ec\/hadoop\/2009\/05\/12\/usando-los-scripts-de-cloudera-para-procesar-la-wikipedia\/","title":{"rendered":"Using Cloudera's scripts to process Wikipedia"},"content":{"rendered":"<p>There is a <a href=\"http:\/\/www.cloudera.com\/blog\/2009\/05\/11\/using-clouderas-hadoop-amis-to-process-ebs-datasets-on-ec2\/\">very detailed post on the Cloudera blog<\/a>\u00a0that shows, step by step, how to use Cloudera's scripts to process Wikipedia, using Hadoop running on EC2 and a tab-separated (TSV) version of Wikipedia that is freely available on S3. The post is written as a tutorial and will be very useful in our graduation course, especially because 3 groups will work on processing Wikipedia (hints: what is known about Ecuador in Wikipedia?, what is known about Guayaquil in Wikipedia?, WikiGrep... more details on the first day of class).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There is a very detailed post on the Cloudera blog\u00a0that shows, step by step, how to use Cloudera's scripts to process Wikipedia, using Hadoop running on EC2 and a tab-separated (TSV) version of Wikipedia that is freely available on S3. 
The post is written as a tutorial, [&hellip;]<\/p>\n","protected":false},"author":1510,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[945,852,6],"tags":[6047,6120,6163,2860,6232,110],"class_list":["post-88","post","type-post","status-publish","format-standard","hentry","category-desarrollo","category-educacion","category-espol","tag-aws","tag-cloudera","tag-ec2","tag-hadoop","tag-s3","tag-wikipedia"],"_links":{"self":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts\/88","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/users\/1510"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/comments?post=88"}],"version-history":[{"count":4,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts\/88\/revisions"}],"predecessor-version":[{"id":92,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts\/88\/revisions\/92"}],"wp:attachment":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/media?parent=88"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/categories?post=88"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/tags?post=88"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}