Here’s an image for you. There is no such thing as a data lake. The multi-petabyte storage racks nearly overflowing with unstructured and semi-structured data that are being built by hyperscalers, enterprises, and governments are probably best described as a vast data lava lamp, with different kinds of data rising and falling as they warm and then cool.
Systems of record – you know, boring ERP, supply chain, customer relationship, and other systems – sit off to the side, with relatively small amounts of white-hot data that needs to be correlated with this larger pool of churning data. The …
“Making Spark and Hadoop Run SQL Better And Faster” was written by Timothy Prickett Morgan at The Next Platform.