Recently, we were working on a group project in my Data Science Primer Course. A dataset scraped from iFlix.com is given in an Excel spreadsheet, tabbed with the category name in each sheet. My teammate managed to transform all these sheets into a single master dataset and export as .csv file (his article). I copied his first few sentenses to start off my post... thanks bro! My task is to import the .csv file into Neo4j and visualize it using Neo4j to help data story telling.

iFlix.com, HomepageiFlix.com, Homepage

iFlix CSV file from my teammateiFlix CSV file from my teammate

Step 1. Create a iFlix Neo4j database

We need to work on a clean database. So, I created a new folder call "iflix.graphdb" and selected as my "Database Location" on the Neo4j desktop app dialog. Click the "Start" button to start the Neo4j engine then click on the localhost link to launch the Neo4j browser.

Neo4j desktop app dialogNeo4j desktop app dialog

Neo4j browserNeo4j browser

Step 2. Create index

CREATE INDEX ON :Category(name)
CREATE INDEX ON :Title(name)
CREATE INDEX ON :Type(name)

Step 3. Import the CSV file into Neo4j

There are 3 columns in the CSV file: title, category and type. In order to get the correct result, I will need to apply some fundamental relational database design concept where normalization is needed.

I created a new CSV file which contains only title and category.

iFlix category csviFlix category csv

Load the CSV file into Neo4j:-

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///D:/cheet/Downloads/iflix_category.csv" AS csvline
WITH csvline
WHERE csvline.title IS NOT NULL AND csvline.category IS NOT NULL
MERGE (t:Title {name: csvline.title})
MERGE (c:Category {name: csvline.category})
CREATE (t)-[:CATEGORIZED_AS]->(c)

iFlix category Neo4jiFlix category Neo4j

This will basically:-

  1. Create Title nodes with the property name label as t
  2. Create Category nodes with the property name label as c
  3. Create the CATEGORIZED_AS relationship to link Title nodes to Category nodes

* The database will only create unique nodes due to the MERGE command

Then, I did a pivot on the original CSV file to get unique values of category and type.

iFlix type csviFlix type csv

Load the CSV file into Neo4j:-

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///D:/cheet/Downloads/iflix_type.csv" AS csvline
WITH csvline
WHERE csvline.type IS NOT NULL AND csvline.category IS NOT NULL
MERGE (p:Type {name: csvline.type})
MERGE (c:Category {name: csvline.category})
CREATE (c)-[:UNDER]->(p)

iFlix type Neo4jiFlix type Neo4j

This will basically:-

  1. Create Type nodes with the property name label as p
  2. Create Category nodes with the property name label as c
  3. Create the UNDER relationship to link Category nodes to Type nodes

Step 4. Result(s)

Run the following to get some result

MATCH (n) RETURN n

Neo4j MATCH Cypher statementNeo4j MATCH Cypher statement

 

Neo4j resultNeo4j result

Due to the Neo4j browser limitation, I can only make use of my Chrome zooming function to get a better view of the data.

Neo4j resultNeo4j result

Neo4j is so powerful!

The final step is to run the following to get the categories for TV, Movie & Studio

MATCH (c:Category)-[:UNDER]->(p:Type)
RETURN p, c

Visualize iFlix categories in Neo4jVisualize iFlix categories in Neo4j

From the graph, we can see clearly which are the common categories between TV and Movie, which category can be expanded into TV, etc...

I have uploaded the files to https://app.box.com/s/vwp3ixlcyknbkzsikbtedti7zzadtgj2. You may try it out :)

 

Anything to add? Do share with me in the comments :)