Using NEO4J To Find Paths To All Parent Or Child Objects


If you don’t know already, there are many layers, upon layers of objects in Oracle. As I’ve wrote before a typical scenario could be:

TYPE
  TABLE
    VIEW
      FUNCTION
        PROCEDURE

When you make your own Oracle schemas, you also create many layers of objects, on top of other objects.

One classic problem I had as a DBA in, shall we say, “fast paced” environments, was that all the developers were making Oracle objects in a trial and error manner. This especially occured when their main skill set was not Oracle, but say, java. In these environments, it would not be unusual for a dozen objects to suddenly need to be moved into the test or production environment.

If you did not keep track of the objects, it would be a difficult time to figure out the exact order of operations to create them. If they were not done in the correct order, they would not compile or be created successfully.

SQL and RDBMS Do Not Represent Trees Well:

It’s partially because of this classic problem, that I did so much work creating SQL scripts to determine the order of operations. As I mentioned in my other posts,

https://rodgersnotes.wordpress.com/2012/01/05/finding-all-generations-of-an-objects-children/
https://rodgersnotes.wordpress.com/2011/12/27/scripts-to-find-object-dependencies/
https://rodgersnotes.wordpress.com/2011/12/29/the-parents-and-the-order-of-operations/

the scripts were never perfect. Much of the reason being that, the RDBMS objects are actually created in a tree structure. However, SQL returns information in rows and columns, a structure that is clearly not a tree.

It was not unusual to see the same object returned multiple times in the result set. An example might be an error log procedure that was used by every other procedure. If you didn’t know this, what would be the correct order of operations to create this object and the others?

For more on this classic problem, see this great slide presentation by
Lorenzo Alberton: Trees In The Database,
http://www.slideshare.net/quipo/trees-in-the-database-advanced-data-structures

It covers a number of attempts to represent trees in RDBMS/SQL. However, even with 128 slides, there is no clear or simple solution.

Using Cypher To Find All The Child Paths Of An Object:

After loading all DBA_OBJECTS into the NEO4J graph database, finding all the parents or children becomes pretty easy. First, start with an object.

start p = node:node_auto_index ( object_id = ‘3192’  )
return p

+————————————————————————————–+
| p                                                                                    |
+————————————————————————————–+
| Node[77758]{owner:”SYS”,object_name:”DBA_OBJECTS”,object_type:”VIEW”,object_id:3192} |
+————————————————————————————–+

Then run a 6 line Cypher query.  

start p = node:node_auto_index ( object_id = ‘3192’  )
MATCH path =  p – [ : PARENT_OF* ] -> ; c
return  distinct
length ( path )  AS PATH_LENGTH
, extract ( n in nodes ( path ) : n.object_id ) as the_path
order by length ( path )

Note: The phrase “PARENT_OF”, representing the type of relationship, is directly preceded by a colon. However, WordPress keeps converting it into a big smilie face. Surprise!

+—————————————+
| PATH_LENGTH | the_path                |
+—————————————+
| 1           | [3192,8575]             |
| 1           | [3192,73104]            |
| 1           | [3192,73098]            |
| 1           | [3192,73086]            |
| 1           | [3192,73081]            |
| 1           | [3192,73079]            |

| 2           | [3192,3193,13460]       |
| 2           | [3192,3193,13458]       |
| 2           | [3192,3193,13453]       |
| 3           | [3192,8575,12128,12131] |
| 3           | [3192,8575,12128,12130] |
| 3           | [3192,8575,12128,12129] |
+—————————————+
62 rows

For those familiar with the Oracle data dictionary, these final objects are:

8575:    PACKAGE         SYS.DBMS_COMPRESSION
12128:   PACKAGE         SYS.PRVT_COMPRESSION
12129:   PACKAGE BODY    SYS.PRVT_COMPRESSION
12130:   PACKAGE BODY    SYS.DBMS_COMPRESSION
12131:   TYPE BODY       SYS.WRI$_ADV_COMPRESSION_T

So, if SYS.DBA_OBJECTS was dropped or became invalid, many other objects would also become invalidated.

As you can see, the Packge SYS.DBMS_COMPRESSION is counted on both the first row, and multiple times in the final rows. In this case, the number of rows, 62, does not represent the correct number of distinct children of the object.

Using Cypher to Find All The Parent Paths Of An Object:

After copying all the DBA_OBJECTS from my database into NEO4J, there are 48,692 nodes, and 61,716 relationships. As you can imagine, there would be a lot of paths through the graph. And the length of the paths could get very big.

Finding all the parents of an object is similar to finding all the children. Using the View, SYS.KU$_M_VIEW_FH_VIEW,

start c = node:node_auto_index ( object_id = ‘10179’  )
return c

+———————————————————————————————-+
| c                                                                                            |
+———————————————————————————————-+
| Node[79870]{owner:”SYS”,object_FH_VIEW”,object_type:”VIEW”,object_id:10179} |
+———————————————————————————————-+

—-

start c = node:node_auto_index ( object_id = ‘10179’  )
MATCH path =  c <- [ : PARENT_OF* ] – p
return  distinct
length ( path )  AS PATH_LENGTH
, extract ( n in nodes ( path ) : n.object_id ) as the_path
order by length ( path )

+—————————————————————————————————————–+
| PATH_LENGTH | the_path                                                                                          |
+—————————————————————————————————————–+
| 1           | [10179,9906]                                                                                      |
| 1           | [10179,9905]                                                                                      |
| 1           | [10179,9887]                                                                                      |
| 1           | [10179,9885]                                                                                      |
| 1           | [10179,9883]                                                                                      |
| 1           | [10179,9878]                                                                                      |
| 1           | [10179,9868]                                                                                      |

| 7           | [10179,10169,9744,9743,9724,9673,9672,1219]                                                       |
| 7           | [10179,10169,9744,9743,9617,9616,9608,4908]                                                       |
| 7           | [10179,10169,9744,9743,9617,9616,9608,1219]                                                       |
| 7           | [10179,10169,10166,10165,10164,10163,10162,1219]                                                  |

| 8           | [10179,9906,9905,9888,9887,9886,9885,9884,9883]                                                   |
| 8           | [10179,9906,9905,9888,9887,9886,9885,9884,1219]                                                   |
| 8           | [10179,9906,9905,9888,9887,9886,9883,9882,1219]                                                   |
| 8           | [10179,9906,9905,9872,9609,9608,4908,4902,4900]                                                   |
| 8           | [10179,9906,9905,9872,9609,9608,4908,4902,4896]                                                   |

| 17          | [10179,9906,9870,9862,9763,9747,9745,9725,9674,9547,8243,8207,8205,8203,8201,8199,8197,1219]      |
| 17          | [10179,10169,9905,9869,9868,9867,9864,9758,9757,9743,9724,9705,9704,9608,4908,4902,4900,4895]     |
| 17          | [10179,10169,9905,9869,9868,9867,9864,9758,9757,9743,9724,9705,9704,9608,4908,4902,4900,1219]     |
| 17          | [10179,10169,9905,9869,9868,9867,9861,9758,9757,9743,9724,9705,9704,9608,4908,4902,4900,4895]     |
| 17          | [10179,10169,9905,9869,9868,9867,9861,9758,9757,9743,9724,9705,9704,9608,4908,4902,4900,1219]     |

| 18          | [10179,9906,9870,9865,9763,9747,9745,9725,9702,9698,9547,8243,8207,8205,8203,8201,8199,8197,1219] |
| 18          | [10179,9906,9870,9862,9763,9747,9745,9725,9702,9698,9547,8243,8207,8205,8203,8201,8199,8197,1219] |
+—————————————————————————————————————–+
29,533 rows

9660 ms (just under 10 seconds)

What Does This Query Mean?

There are many parents upon parent objects to the view, SYS.KU$_M_VIEW_FH_VIEW. In two cases, there are 18 layers of parents!

As with the other query, the number of rows does not represent the distinct number of parents. Node 9906, is found on the first row, and the last, so it would be double counted.

Sanity check:  there are only 48,692 nodes in the entire graph. So it’s doubtful that there are 29,533 (60%) distinct parents to this object.

What this does mean, is that with the many parent objects, there are over 29K distinct paths back to all the root objects.

More Distinct Paths Than Nodes Or Relationships:

Note that many paths end with the object_id: 1219. This object is actually: PACKAGE SYS.STANDARD. It’s interesting to note that PACKAGE SYS.STANDARD, by itself, actually has 7,128 immediate children! That is, 7,128 objects refer directly to SYS.STANDARD.

If all the paths, to all the children of PACKAGE SYS.STANDARD are counted, there are 543,827 distinct paths.  Wow!

start p = node:node_auto_index ( object_id = ‘1219’  )
MATCH path =  p – [ : PARENT_OF* ] -> c
return count (   distinct
extract ( n in nodes ( path ) : n.object_id )
)
as the_path

+———-+
| the_path |
+———-+
| 543827   |
+———-+

Again as a sanity check, there are only 48,692 nodes, and 61,716 relationships in the whole graph database. Obviously, 543,827 is multiples more than the 48,692 nodes in the database. The same nodes are being counted multiple times through the many distinct paths.

Cartesian Products:

These results are reminding me of cartesian products in RDBMS.

Do check out the PDF presentation that I did on cartesian products a few years ago. They are a very critical issue in SQL, but I’ve found often completely unknown to even experienced database professionals.

https://rodgersnotes.wordpress.com/2010/09/15/stamping-out-cartesian-products/

The Order Of Operations:

So, given that you need to recreate the object SYS.KU$_M_VIEW_FH_VIEW, which objects would you need to create first?

Then, after getting the distinct set of objects, what would be the correct order of operations to create them?  🙂

Next Time:

Cypher queries to get a more accurate result set of the parents and children.  Stay tuned.

3 Responses to Using NEO4J To Find Paths To All Parent Or Child Objects

  1. Hey Rodger, could you reformat your queries and results? Or put them into gists ? Would be much easier to read them that way! WordPress also has a plugin that you can use for that.

  2. Hi Roger! The query doesn’t seem to work. Can you re-format this?

  3. rodgersnotes says:

    These queries were done with NEO4J version 1 or 2. And the latest NEO4J is version 3. I’ve been very busy with other things recently, so I haven’t much time for the blog. Hopefully I’ll revisit them soon. Thanks, R.

Leave a comment