Data that cannot be represented in the rows and columns of a conventional relational database is known as unstructured or semi-structured data. A collection of such data so large that conventional software cannot capture, manage and process it within a reasonable amount of time is known as “BIG DATA”. It is not a precise term; it is characterized by the accumulation of exponentially growing unstructured data. It describes large, raw data sets that conventional relational databases are unable to analyze.
As for ‘how much is BIG’: it is a moving target that keeps growing as time passes. As of 2012, it ranges from a few dozen terabytes to many petabytes of data in a single data set. We also think it depends on the context in which the term is used. For example, the size of data sets differs greatly between astronomical observations and responses collected through an online feedback form.
Beyond the sheer volume of the data, the effort and complexity of extracting information from it and making sense of it are “big” too. Researchers around the world are working on these problems; one notable example is the AMPLab at UC Berkeley (http://amplab.cs.berkeley.edu/).
Asked In: Many Interviews