Let's face it. It is time to say goodbye to the white elephant. Perhaps as you read more, you will appreciate the analogy. In case you don't, it is at the bottom. The monolithic, static, money-guzzling reporting infrastructure, often masquerading as "Business Intelligence," is finally replaceable. I am referring to what we commonly refer to as data warehousing. OLAP, in hindsight, was a workaround for poor reporting options with traditional relational databases that primarily optimize and maintain the integrity of business transactions. It was a 'bad patch.'
So what makes this shift possible? While there are many aspects of the technology, we call "Big Data," two developments stand out.
Firstly, the advent of analytical SQL engines, such as Presto, Impala, BigQuery, SPARKsql, and Trino (a Presto Fork).
Secondly, optimized columnar and row storage file formats, such as Parquet, ORC, and Avro.
Finally, HADOOP's scalability, performance, and low cost are now available for enterprise business intelligence and eliminate data warehouses.
Together, they have made it possible to connect to multiple databases with different data formats, making it easy to eliminate OLAP storage options (onerous cubes, materialized views, star schemas, and denormalized tables).
Performance - SQL queries with computational functions such as sum, count, and average that require scanning a large set of data have poor response times (ROLAP) and are resource-intensive. It gave birth to OLAP cubes (MOLAP), which allowed you to store pre-computed values
Latency - All pre-computational storage formats - cubes, star schema tables, materialized views, are built on snapshots and have inherent latency in building these data stores. At best, they were suitable for a day-old-information euphemistically labeled "Daily Intelligence"
ETL - Data warehousing required expensive ETL to take data from diverse transactional sources such as ERP, CRM, MES, HCM, and SCM and bring it together in a single database. In theory, this seemed like a good idea. In practice, however, it is expensive to build and maintain. In many cases, it is very fragile and is prone to error
Hardware and Software Cost - Despite its design to solve the performance problem, OLAP systems require significant tuning. When hardware is the constraint, the only way to improve it is to scale vertically and add expensive and specialized hardware. Besides, software vendors exploited this by tying their licensing to hardware performance (e.g., cores). Horizontal scaling with cheap hardware is not an option
With the advent of the afore-mentioned Analytical SQL query engines and storage formats, you can:
Eliminate expensive transformation from ETL
Eliminate latency and provide real-time reporting and analytics
Scale horizontally with commodity hardware
Provide very high performance with Terabytes of raw, untransformed data.
Use open-source software
P.S. In case you have not already figured it out, the mascot for Hadoop is Horton, as in "Horton hears a who".