Using Hadoop for Real-time Business Intelligence



Adopting Big Data as the repository and query platform is in keeping with Triniti's philosophy of providing actionable insights in real time. In the past, we picked materialized views over traditional cubes to avoid latency using LEAN. Continuing our approach to modernizing business intelligence, we now offer Hadoop ecosystem consulting services to modernize data warehouses. The Hadoop ecosystem provides low-cost, highly scalable data lakes as an alternative. We use a 6-step approach:

  1. Discovery - We review your hardware and software landscape along with how users use the data warehouse. We also:

    • Present an end-state

    • Identify gaps in business processes and data that impede achieving the end-state

    • Define success criteria

  2. Roadmap - We translate the strategic output of step 1 into a tactical and operational action plan for business intelligence modernization. Triniti will identify components in the Hadoop, Apache, and open-source ecosystems, such as Sqoop, Airflow, Ambari, and Presto. The roadmap starts with the mandatory starter pack and ends with the final version, with all the bells and whistles, that migrates all your reporting and analytics

  3. Proof of Concept - We bring the concept to life with a rapid prototype (POC). It helps you visualize data sources and data flows, establishes performance benchmarks, and offers insight into things to come

  4. The Beachhead - Based on the inputs from the POC, and making course corrections as required, we implement the beachhead identified in the POC. It reinforces to the organization the benefits of modernizing the BI infrastructure

  5. Roll-out - We then use the organization's lessons learned during the initial implementation to fine-tune the solution and roll it out to the rest of the BI domains

  6. Continuous Improvement - As technology evolves and business requirements change, we enable you to become an agile enterprise: one that is proactive, not reactive, and is a driver of growth and success. It goes beyond business insight. E-commerce and media giants such as Amazon and Google, social networking sites such as Facebook, Twitter, and LinkedIn, and sharing-economy organizations such as Uber and Airbnb are already there. What is stopping you? Call 866-531-9587 or fill out the contact form.


Modernize Your Business Intelligence with Big Data

Get the imposter out!

Let's face it. It is time to say goodbye to the white elephant. Perhaps as you read more, you will appreciate the analogy. In case you don't, it is explained at the bottom. The monolithic, static, money-guzzling reporting infrastructure, often masquerading as "Business Intelligence," is finally replaceable. I am referring to what is commonly called data warehousing. OLAP, in hindsight, was a workaround for the poor reporting options of traditional relational databases, which are designed primarily to process business transactions and maintain their integrity. It was a 'bad patch.'
 

Seismic shift

So what makes this shift possible? While there are many aspects of the technology we call "Big Data," two developments stand out.
Firstly, the advent of analytical SQL engines, such as Presto, Impala, BigQuery, Spark SQL, and Trino (a Presto fork).  
Secondly, optimized columnar and row storage file formats, such as Parquet, ORC, and Avro.  
With them, Hadoop's scalability, performance, and low cost are now available for enterprise business intelligence and can eliminate data warehouses.
Together, they have made it possible to connect to multiple databases with different data formats, making it easy to eliminate OLAP storage options (onerous cubes, materialized views, star schemas, and denormalized tables).
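
To make the row-versus-columnar distinction concrete, here is a minimal sketch in plain Python. It is a toy illustration under stated assumptions: the sample orders, the field names, and the `to_columnar` helper are all hypothetical, and real formats such as Parquet and ORC add encoding, compression, and metadata that are not modeled here.

```python
# Toy illustration only: real columnar formats (Parquet, ORC) add
# encoding, compression, and metadata that are not modeled here.

# Row-oriented layout: the fields of each record are stored together.
rows = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 80.0},
    {"order_id": 3, "region": "East", "amount": 50.0},
]


def to_columnar(records):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# Row layout: every whole record is visited to total a single field.
row_total = sum(record["amount"] for record in rows)

# Columnar layout: only the "amount" column is read.
column_total = sum(columns["amount"])

print(row_total, column_total)
```

Both totals are the same; the difference is how much data has to be touched to compute them, which is why analytical engines tend to favor columnar files for aggregate queries.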
 

So what was the problem with the old ways?

  1. Performance - SQL queries with computational functions such as sum, count, and average that require scanning a large set of data have poor response times (ROLAP) and are resource-intensive. This gave rise to OLAP cubes (MOLAP), which allowed you to store pre-computed values

  2. Latency - All pre-computed storage formats (cubes, star schema tables, and materialized views) are built on snapshots, and building these data stores introduces inherent latency. At best, they were suitable for day-old information, euphemistically labeled "Daily Intelligence"

  3. ETL - Data warehousing required expensive ETL to take data from diverse transactional sources such as ERP, CRM, MES, HCM, and SCM and bring it together in a single database. In theory, this seemed like a good idea. In practice, however, it is expensive to build and maintain. In many cases, it is also fragile and error-prone

  4. Hardware and Software Cost - Despite being designed to solve the performance problem, OLAP systems require significant tuning. When hardware is the constraint, the only way to improve performance is to scale vertically by adding expensive, specialized hardware. Besides, software vendors exploited this by tying their licensing to hardware capacity (e.g., cores). Horizontal scaling with cheap hardware is not an option
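
The latency problem described above can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not a model of any particular warehouse: the `sales` and `sales_summary` tables and the sample figures are hypothetical, and the summary table merely stands in for a cube or materialized view built from a snapshot.

```python
import sqlite3

# In-memory database standing in for a transactional source (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("West", 200.0), ("East", 50.0)],
)

# Pre-computed summary built from a snapshot of the data, standing in
# for a cube or materialized view.
conn.execute(
    "CREATE TABLE sales_summary AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

# A new transaction arrives after the snapshot was taken.
conn.execute("INSERT INTO sales VALUES ('East', 25.0)")

# Reading the summary returns the value as of the snapshot.
stale_total = conn.execute(
    "SELECT total FROM sales_summary WHERE region = 'East'"
).fetchone()[0]

# Aggregating the source table directly returns the current value.
live_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'East'"
).fetchone()[0]

print(stale_total, live_total)
conn.close()
```

The summary stays at its snapshot value until it is rebuilt, while the direct aggregate reflects the new transaction immediately. The trade-off is that the direct query scans the source rows each time, which is the performance cost that pre-computation was meant to avoid.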
     

A new age

With the advent of the aforementioned analytical SQL query engines and storage formats, you can:

  1. Eliminate expensive transformation from ETL

  2. Eliminate latency and provide real-time reporting and analytics

  3. Scale horizontally with commodity hardware

  4. Provide very high performance with terabytes of raw, untransformed data

  5. Use open-source software

P.S. In case you have not already figured it out, the mascot for Hadoop is an elephant, and Hortonworks, one of its best-known distributors, took its name from Horton, as in "Horton Hears a Who!".

Call 866-531-9587 / Fill out the contact form.