Data lakes are trendy: more and more prospects mention their data lake when we ask about their data management processes. A data lake typically aggregates data from various departments, which is then used to run analytics. In PLM, analytics has always been a complicated piece of work. We've seen many PLM projects where it was initially decided that analytics would be handled by Business Intelligence (BI) specialists, but the project never really reached that point. The main issue is how much of the PLM data you can actually understand when all you are given is a database. (In most cases, teams end up developing custom dashboards connected directly to the PLM system.)
The data-lake concept triggered a lot of hype, mainly among business leaders who bought the promise of extracting value from their data (because data is gold) by gathering it in a single place. Snowflake, a data-lake leader, published "Cloud Data Lakes for Dummies", which contains an interesting paragraph about why so many of these projects failed in the past:
Unfortunately, many of these on-premises data-lake projects failed to fulfill the promise of data-lake computing, thanks to burdensome complexity, slow time to value, and heavy system management efforts. The inherent complexities of a distributed architecture and the need for custom coding for data transformation and integration, mainly handled by highly skilled data engineers, made it difficult to derive useful analytics and contributed to Hadoop's demise. In fact, some estimates place the failure rate for Hadoop data-lake projects as high as 70 percent.
Kayla Matthews, writing for Information Age, also points out five key mistakes that can turn a data lake into a data swamp.
The main reasons why your data lake is at risk are the following:
- Missing or poor metadata
- Irrelevant data
- Lack of data governance (always a problem in PLM projects)
- Not enough automated processes to maintain the data
- Missing data cleaning
Understanding the Data
Technology is getting better and the lessons of the past years have been learned. But that doesn't solve the main reason why PLM is hard to plug into existing BI systems: understanding the PLM data. Since Ganister relies on a graph database, and we believe a company can describe its PLM data and processes as a graph, one of our core principles is the following:
You should be able to understand your PLM data from the database alone.
We are an enabler for creating and maintaining PLM data so that it lasts. If you take a snapshot of your graph database, you end up with a knowledge base that you can easily query.
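To make that concrete, here is a minimal sketch of what "easily query" means on such a snapshot. This is not Ganister's actual data model or API; the part, assembly, and document names are invented, and the snapshot is reduced to plain (source, relation, target) triples queried with standard Python:

```python
# Hypothetical snapshot of a PLM graph as (source, relation, target) triples.
from collections import defaultdict

edges = [
    ("bolt-M6",      "USED_IN",   "bracket-assy"),
    ("plate-A",      "USED_IN",   "bracket-assy"),
    ("bracket-assy", "USED_IN",   "frame-assy"),
    ("spec-017",     "DESCRIBES", "plate-A"),
    ("cad-042",      "DESCRIBES", "bolt-M6"),
]

# Index incoming edges by (relation, target) for quick lookups.
incoming = defaultdict(list)
for src, rel, dst in edges:
    incoming[(rel, dst)].append(src)

def parts_of(assembly):
    """Recursively collect every part used in an assembly."""
    parts = set()
    for child in incoming[("USED_IN", assembly)]:
        parts.add(child)
        parts |= parts_of(child)
    return parts

def documents_for(assembly):
    """Documents describing any part used in the assembly."""
    return {doc
            for part in parts_of(assembly)
            for doc in incoming[("DESCRIBES", part)]}

print(sorted(documents_for("frame-assy")))  # ['cad-042', 'spec-017']
```

Because the relations are explicit, a question like "which documents are relevant to this assembly?" is a short traversal rather than a join puzzle over opaque tables, which is exactly what a BI team handed a raw relational dump struggles with.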
Road to Data Wisdom
The DIKW pyramid is an interesting description of the data usage stages.
The US Army enriched this pyramid to clarify what each level is about:
In the graph world, we have an interesting way of explaining DIKW:
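One way to read the four levels in graph terms (our own informal mapping, with invented example values, not an official definition) is that each level adds structure on top of the previous one:

```python
# Data: raw, disconnected values — no meaning attached.
data = ["M6", "8.8", "bolt-M6"]

# Information: the values become labeled properties of a node ("what").
information = {"id": "bolt-M6", "type": "Part", "thread": "M6", "grade": "8.8"}

# Knowledge: nodes become connected — the relations carry the "how".
knowledge = [
    ("bolt-M6", "USED_IN", "bracket-assy"),
    ("plate-A", "USED_IN", "bracket-assy"),
]

# Wisdom: acting on the connections, e.g. a where-used impact check
# before changing the bolt.
impacted = [dst for src, rel, dst in knowledge if src == "bolt-M6"]
print(impacted)  # ['bracket-assy']
```

The climb from data to wisdom is thus a climb from values, to labeled nodes, to relationships, to decisions made by traversing those relationships.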
Don't Fish for Intellectual Property
Once again, PLM is your brain: the connections have more value than the data itself. It's not just a matter of data management, it's knowledge management. Make sure you've got your knowledge graph right before going fishing in a data lake.