Hadoop – Pig (Intro)




Well, after many years of producing a lot of information everywhere, it has become necessary to process massive amounts of data for customers. That is not an easy task: it depends on having feasible resources available in a short time (time is precious, and time is money).

The most common storage and processing tools are RDBMSs. To be honest, you need some special skills to translate user requirements into SQL statements, but that is another story and another post.

Over time, though, data sets have grown exponentially, and only a few RDBMSs can keep up with the processing requirements. Even those have problems, among them simplicity and storage (again, space = speed * time, time is money, so space is costly!).

Fortunately, there are some inexpensive alternatives. One of them, provided by the Apache Software Foundation, is Apache Hadoop.
Hadoop is an open-source project started by Doug Cutting, based on papers published by Google that described the methodology they used to store and process their huge amounts of data.

Once Hadoop had become established in the open-source community, it became necessary to create higher-level tools on top of it, such as Apache Pig, which gives data users a higher level of abstraction so they do not have to write long data-processing applications in a low-level language (Java).
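
To give a rough idea of the kind of low-level work a tool like Pig hides, here is a minimal sketch in plain Python (not Hadoop or Pig code) of a word count written as explicit map, shuffle and reduce steps, the processing model Hadoop is known for. The function names and the sample data are made up for illustration; a real Hadoop job would typically be written in Java and run across a cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases on an in-memory list of lines (illustrative only)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, not taken from any real data set.
    sample = ["pig runs on hadoop", "hadoop stores data", "pig processes data"]
    print(word_count(sample))
```

Even for a task this small, every step has to be spelled out by hand. That is the gap a higher-level tool is meant to close: the user describes the grouping and counting, and the tool takes care of the individual phases.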




