What is Big Data? It is a collection of complex data that is difficult to process with existing tools. A single data set can range from a few dozen terabytes to many petabytes. The data can be posts to social media sites, digital pictures and videos, or any other information.
Apache Hadoop is a very popular solution for Big Data. To store Big Data we use different kinds of storage, such as S3 and the Hadoop Distributed File System (HDFS).
- Amazon S3 filesystem
What is S3?
Amazon S3 (Simple Storage Service) is an online file storage web service, that is, an Internet hosting service specifically designed to host user files, offered by Amazon Web Services. Apache Hadoop file systems can be hosted on S3, and images for Tumblr, Formspring, Pinterest, and Posterous are hosted on S3 servers as well.
S3 stores arbitrary objects (computer files) up to 5 terabytes in size. These objects are organized into buckets. S3 can hold anything from web application data to media files, and the data can be retrieved from anywhere on the web.
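To make the bucket and object idea concrete, here is a small in-memory sketch in Python. It is not the AWS API: the `ToyBucket` class, its method names, and the example keys are all hypothetical, and the 5 TB limit is treated as 5 * 1024^4 bytes as an assumption.

```python
# Toy model of the bucket/object idea described above.
# NOT the real S3 API: ToyBucket and its methods are hypothetical.

# 5 TB object size limit quoted in the text (assumed to mean 5 * 1024**4 bytes).
MAX_OBJECT_BYTES = 5 * 1024**4


class ToyBucket:
    """In-memory stand-in for a bucket: maps string keys to raw bytes."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data):
        """Store an object under a key, rejecting anything over the size limit."""
        if len(data) > MAX_OBJECT_BYTES:
            raise ValueError("object exceeds the 5 TB limit")
        self._objects[key] = data

    def get(self, key):
        """Return the bytes stored under a key (raises KeyError if absent)."""
        return self._objects[key]

    def keys(self):
        """List all keys in the bucket in sorted order."""
        return sorted(self._objects)


if __name__ == "__main__":
    bucket = ToyBucket("media-files")
    bucket.put("photos/cat.jpg", b"\xff\xd8\xff")
    bucket.put("logs/app.txt", b"started")
    print(bucket.keys())
```

The point of the sketch is only that a bucket behaves like a flat map from keys to objects; a real client would talk to the S3 service over the network instead of a dictionary.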
- Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is a portable file system built for the Hadoop framework.
HDFS is designed to store very large amounts of data by spreading storage and computation across many servers. It works best with large files, and the ideal file size is a multiple of 64 MB, the default block size in early Hadoop releases.
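The arithmetic behind the 64 MB sizing advice can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the block size is assumed to be fixed at the 64 MB quoted above (a real cluster may be configured differently).

```python
import math

# Block size quoted in the text; assumed fixed for this sketch.
BLOCK_SIZE_MB = 64


def blocks_needed(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks a file of the given size is split into."""
    if file_size_mb <= 0:
        return 0
    return math.ceil(file_size_mb / block_size_mb)


def last_block_fill_mb(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Amount of data (in MB) that ends up in the final block."""
    if file_size_mb <= 0:
        return 0
    remainder = file_size_mb % block_size_mb
    # A zero remainder means the file ends exactly on a block boundary.
    return remainder if remainder else block_size_mb


if __name__ == "__main__":
    for size in (64, 100, 640):
        print(size, blocks_needed(size), last_block_fill_mb(size))
```

A 640 MB file splits into exactly ten full blocks, while a 100 MB file needs two blocks with the second only partly filled, which is why sizes that are a multiple of the block size divide most evenly.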
Below are some of the organizations that are using Hadoop.
- Google and many more companies…