Columnar databases have been traditionally developed with horizontal scalability
as a primary design goal. As such, they’re particularly suited to “Big
Data” problems, living on clusters of tens, hundreds, or thousands of nodes.
They also tend to have built-in support for features such as compression and
versioning. The canonical example of a good columnar data storage problem
is indexing web pages. Pages on the Web are highly textual (benefits from
compression), somewhat interrelated, and change over time (benefits from
versioning).

Different columnar databases have different features and therefore different
drawbacks. But one thing they have in common is that it’s best to design
your schema based on how you plan to query the data. This means you should
have some idea in advance of how your data will be used, not just what it’ll
consist of. If data usage patterns can’t be defined in advance (for example,
for fast ad hoc reporting), then a columnar database may not be the best fit.
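
To make the idea of query-driven schema design concrete, here is a minimal sketch in Python. It is not the API of any real columnar database: the ToyColumnStore class, the row_key helper, and the "text:body" column name are all hypothetical, invented for illustration. It assumes the main query is "fetch every page for a given domain," so row keys start with the reversed host name and related pages sort next to each other. Each cell also keeps timestamped versions, mirroring the web page example.

```python
from bisect import bisect_left
from collections import defaultdict


def row_key(host: str, path: str) -> str:
    """Hypothetical key scheme: reverse the host so that pages from the
    same domain become neighbors in sorted key order."""
    return ".".join(reversed(host.split("."))) + path


class ToyColumnStore:
    """In-memory stand-in for a column-family table with versioned cells."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value), oldest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, key, column, value, timestamp):
        cells = self._rows[key][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0])

    def get(self, key, column, versions=1):
        """Return up to `versions` values for a cell, newest first."""
        if versions < 1:
            return []
        cells = self._rows.get(key, {}).get(column, [])
        return [value for _, value in reversed(cells[-versions:])]

    def scan_prefix(self, prefix):
        """Return the row keys that start with `prefix`, in sorted order."""
        keys = sorted(self._rows)
        matches = []
        for key in keys[bisect_left(keys, prefix):]:
            if not key.startswith(prefix):
                break
            matches.append(key)
        return matches


store = ToyColumnStore()
page = row_key("www.example.com", "/a")
store.put(page, "text:body", "first crawl", timestamp=1)
store.put(page, "text:body", "second crawl", timestamp=2)
store.put(row_key("blog.example.com", "/b"), "text:body", "a post", timestamp=1)
store.put(row_key("www.other.org", "/"), "text:body", "elsewhere", timestamp=1)

print(store.scan_prefix("com.example."))
print(store.get(page, "text:body", versions=2))
```

The key scheme serves the one query it was designed for. A question the key doesn't anticipate, such as "which pages changed in the last hour?", would require visiting every row in this sketch, which is the trade-off described above.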