Hadoop is an open source project from Apache that has become into a major technology movement. It has emerged as the best option to handle the Big data, including not only structured data but also complex, unstructured data as well. It has become popular because of its ability to store and process large amounts of data, quickly and cost effectively across clusters of commodity hardware
With Hadoop, organizations are discovering and putting into practice new data analysis and mining techniques that were previously impractical for performance, cost, and technological reasons.
Advanatage of Hadoop is cost-effective scalability to control hardware. It provides support for the processing of all data types – whether structured, semi-structured or unstructured – and the open extensibility of Hadoop
Apache Hadoop is actually a collection of several components including the following:
- Hadoop Distributed File System (HDFS)