{"id":171,"date":"2009-05-25T10:42:22","date_gmt":"2009-05-25T15:42:22","guid":{"rendered":"http:\/\/blog.espol.edu.ec\/hadoop\/?p=171"},"modified":"2009-05-25T12:03:18","modified_gmt":"2009-05-25T17:03:18","slug":"datasets-disponibles-en-la-web","status":"publish","type":"post","link":"https:\/\/blog.espol.edu.ec\/hadoop\/2009\/05\/25\/datasets-disponibles-en-la-web\/","title":{"rendered":"Datasets available on the Web"},"content":{"rendered":"<p>In class, the question came up of which datasets are freely available for processing. There is a very extensive list at <a href=\"http:\/\/www.datawrangling.com\/some-datasets-available-on-the-web\">datawrangling.com<\/a>, which can serve as a starting point for ideas for large-scale data processing projects. The list is long, but it is worth going through because it includes interesting datasets, such as the <a href=\"http:\/\/www-etud.iro.umontreal.ca\/~bergstrj\/audioscrobbler_data.html\">Audioscrobbler<\/a> dataset, which can be used for music recommendation systems.<\/p>\n<p>In another post on the same blog, Peter Skomoroch writes:<\/p>\n<blockquote><p><span style=\"color: #808080\">So what can you do with Elastic MapReduce? 
Here are a few initial ideas:<\/span><\/p>\n<ul>\n<li><span style=\"color: #808080\">Offload background processing from your Rails or Django app to Hadoop by sending the ElasticMapReduce API job requests pointing to data stored on S3: convert PDFs, classify spam, deduplicate records, batch geocoding, etc.<\/span><\/li>\n<li><span style=\"color: #808080\">Process large amounts of retail sales and inventory transaction data for sales forecasting and optimization<\/span><\/li>\n<li><span style=\"color: #808080\">Use the AddJobFlowSteps method in the API to run iterative machine learning algorithms using MapReduce on a remote Hadoop cluster and shut it down when your results converge to an answer<\/span><\/li>\n<\/ul>\n<p><span style=\"color: #808080\">I\u2019ll post more on this later today - including a detailed explanation of using Netflix Prize data in the code example and some next steps for using Elastic MapReduce.<\/span><\/p><\/blockquote>\n<p><span style=\"color: #000000\">I think the material published on that blog is relevant to the course, so I have added it to the sidebar of this page to make it easy to follow.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In class, the question came up of which datasets are freely available for processing. There is a very extensive list at datawrangling.com, which can serve as a starting point for ideas for large-scale data processing projects. 
The list is long, but it is worth going through because it includes interesting datasets, [&hellip;]<\/p>\n","protected":false},"author":1510,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[945,852,6],"tags":[6553],"class_list":["post-171","post","type-post","status-publish","format-standard","hentry","category-desarrollo","category-educacion","category-espol","tag-datasets"],"_links":{"self":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts\/171","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/users\/1510"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/comments?post=171"}],"version-history":[{"count":7,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts\/171\/revisions"}],"predecessor-version":[{"id":173,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/posts\/171\/revisions\/173"}],"wp:attachment":[{"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/media?parent=171"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/categories?post=171"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/hadoop\/wp-json\/wp\/v2\/tags?post=171"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}