Querying complex data : SQL vs Cypher example

thumbnail

on June 29, 2021 • Back to Blog index

Graph databases have implemented specific query languages which better match the data representation of a graph. Neo4j has invented its own query language "Cypher" which was standardized then into OpenCypher.

Our use-case

For some time now, we have a slide in our main presentations representing a specific graph problem for which we explain the power of graphdb.

image-20210628104454884

The use case is the following. You have systems, and each system has a bill of materials made of parts. At any depth in this bill of material we want to know what is the top assembly part consumed in a system which contains the part we are looking at. In the slide, if we visit Part I, the answer should be part A and not D as it is not used in a system. If we visit part H it gives the same result. If we visit part G, the result would be B and A.

We implemented it with SQL server by creating a datamodel containing 4 tables:

  • System
  • SystemPart (relationship table between system and part)
  • Part
  • PartBom (relationship table between part and part to create a bill of material)

SQL Query

WITH RecursiveBOM(related_id,source_id,indent_level) AS
(
    SELECT parentPart_id,childPart_id, 0
    FROM  dbo.bom pb
    WHERE parentPart_id = 6
    UNION ALL
    SELECT pb.parentPart_id,pb.childPart_id, indent_level +1
    FROM dbo.bom  pb
    INNER JOIN RecursiveBOM rb ON rb.source_id = pb.parentPart_id
)

SELECT distinct(rb.source_id), rb.related_id, rb.indent_level
FROM RecursiveBOM rb
INNER JOIN dbo.systempart sp ON sp.childPart_id = rb.source_id 

Cypher Query

MATCH (sp:part{ref: 6})<-[:consumes..*]-(tp:part)<-[:has]-(s:system)
RETURN tp

For the graph database, we did not have to specify a schema in advance, we just built nodes and edges out of the blue.

Conclusion

We did not compare performances because it is very complicated for us to make sure both test are done in the same condition. With much larger datasets, comparing cost of infrastructure and speed of queries could give a better comparison result. What was really important for us is the ease of writing cypher queries. Such queries are very common in PLM and you don't need to be an expert to write the cypher query. Remember, shorter queries are easier to maintain.