One of the defining characteristics of big data is the diversity of its sources, which results in a wide range of data types. Data does not arrive perfectly ordered and ready for processing, and one of the biggest challenges is extracting meaningful information from it. This is where VARIETY comes in. The first technique used is SQL-NoSQL integration: combining the relational and non-relational worlds brings together the strengths of both for analytics and provides storage options for various data types. Linked data and semantics are two techniques that have also gained some popularity. NLP plays a role in entity extraction, and statistics plays a big role in flattening out and extracting data sets. The open source statistical language R provides integration points for several big data tools and solutions. Apache projects also offer a couple of solutions that cater to this space, which, along with a couple of proprietary technologies, are currently used to solve the problems of variety.
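As a rough illustration of the SQL-NoSQL integration idea, the sketch below joins rows from a relational table with schemaless JSON documents. It uses only the Python standard library: an in-memory SQLite table stands in for the relational store and a list of JSON strings stands in for a document store. The table name, field names, sample records, and the `events_by_customer` helper are all hypothetical and chosen only for this example.

```python
import json
import sqlite3

# Relational side: an in-memory SQLite table with a fixed schema (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)

# Non-relational side: JSON documents whose fields vary from record to record,
# standing in for a document store.
raw_events = [
    '{"customer_id": 1, "type": "review", "text": "Great product"}',
    '{"customer_id": 2, "type": "click", "page": "/pricing"}',
    '{"customer_id": 1, "type": "click", "page": "/home", "referrer": "ad"}',
]
events = [json.loads(doc) for doc in raw_events]


def events_by_customer(conn, events):
    """Group event types by customer name, joining on the customer id."""
    # Each row is an (id, name) pair, so dict() builds an id -> name lookup.
    names = dict(conn.execute("SELECT id, name FROM customers"))
    combined = {name: [] for name in names.values()}
    for event in events:
        name = names.get(event["customer_id"])
        if name is not None:  # skip events with no matching customer row
            combined[name].append(event["type"])
    return combined


result = events_by_customer(conn, events)
print(result)
```

The relational table keeps the well-structured attributes, while the documents are free to carry different fields per record, which is the kind of variety the answer describes. A real deployment would replace both stand-ins with actual database systems.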
Asked In: Many Interviews