DEV Community

Cover image for How an RDBMS works #4: Creating a citation graph with PostgreSQL + Apache AGE
Carla Sanches
Carla Sanches

Posted on • Edited on

How an RDBMS works #4: Creating a citation graph with PostgreSQL + Apache AGE

In the previous post, I presented an example of an academic article database to explain the process of analyzing database requirements. Today, we'll take a more practical approach and explore this example by creating an academic paper database in PostgreSQL and a citation graph using Apache AGE. The queries for building this database are available in this GitHub gist.

1. What are graphs?

A graph G = (V, E) is a data structure composed of a set of vertices (V) and a set of edges (E) (Vilas Boas, 2016). We can represent a graph as shown in Figure 1. The letters from 'a' to 'e' represent the vertices, and the edges are the lines that connect them.

Representation of a graph formed by 5 vertices and 7 edges.


Figure 1 - Representation of a graph formed by five vertices and seven edges. Adapted from Vilas Boas (2016).

In the example of our database, each vertex represents an academic paper. The relationship between these papers is "cited by" since an article references (or cites) other articles. An edge represents a relationship.

2. Requirements

To follow this tutorial, you need to install PostgreSQL and Apache AGE. Below is a list of tutorials to install these programs:

In this tutorial, I created the database using PostgreSQL 13 and Apache AGE 1.3.0. At the time of this post, Apache AGE supports only PostgreSQL 11, 12, and 13. The operating system used is Ubuntu 22.04 LTS.

3. Building the database

Before we begin building the database, let's list some properties of scientific papers:

  • Title;
  • Author(s);
  • Year of publication;
  • Publication venue.

As defined in Section 1, the relationship between two papers can be defined as "cited by" or "cites" for simplicity. Since one paper cites another, we can represent this relationship as shown in Figure 2.

Figure 2 - Relationship between two papers. Elaborated by author.


Figure 2 - Relationship between two papers. Elaborated by author.

Now we can create the database. Assuming that PostgreSQL has just been installed, the following commands will start the PostgreSQL service and call the command-line interface:

sudo service postgresql start
sudo -u postgres psql
Enter fullscreen mode Exit fullscreen mode

Once in the interface, we create a database named papersdb and connect to it.

create database papersdb;
\c papersdb
Enter fullscreen mode Exit fullscreen mode

To use the Apache AGE extension, it is necessary to install it on the server and load it for each session. And to simplify queries, we add ag_catalog to the search_path.

CREATE EXTENSION age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
Enter fullscreen mode Exit fullscreen mode

Next, we create a graph called citation_graph:

SELECT * FROM create_graph('citation_graph');
Enter fullscreen mode Exit fullscreen mode

We now create some entries for the academic papers, which contain the properties defined at the beginning of this section. We will also define an ID to make it easier to connect the edges between each paper. The id structure is according to the last name of the first author, publication year, and the beginning of the publication title, as in the example oliveira2009automatic.

SELECT * FROM cypher('citation_graph', $$
CREATE (v:Article {id: 'williams2015affect', title: 'Investigating affect in algorithmic composition systems', publisher: 'Psychology of Music',year: 2015, author:['Duncan Williams', 'Alexis Kirke', 'Eduardo R Miranda', 'Etienne Roesch', 'Ian Daly', 'Slawomir Nasuto']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$
CREATE (v:Article {id: 'williams2013towards', title: 'Towards affective algorithmic composition', publisher: 'University of Jyväskylä, Department of Music', year: 2013, author:['Duncan Williams', 'Alexis Kirke', 'Eduardo R Miranda', 'Etienne Roesch', 'Slawomir Nasuto']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$
CREATE (v:Article {id: 'schubert1999measurement', title: 'Measurement and Time Series Analysis of Emotion in Music', publisher: 'University of New South Wales. Music & Music Education', year: 1999, author:['Emery Schubert']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$
CREATE (v:Article {id: 'scherer2004which', title: 'Which Emotions Can be Induced by Music? What Are the Underlying Mechanisms? And How Can We Measure Them?', publisher: 'Journal of New Music Research', year: 2004, author:['Klaus R. Scherer']})                          
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$
CREATE (v:Article {id: 'gabrielsson2001emotion', title: 'Emotion perceived and emotion felt: Same or different?', publisher: 'Musicae Scientiae', year: 2001, author:['Alf Gabrielsson']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$
CREATE (v:Article {id: 'scherer2001emotional', title: 'Emotional effects of music: Production rules', publisher: 'Oxford University Press', year: 2001, author:['Klaus Scherer', 'Marcel Zentner']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$
CREATE (v:Article {id: 'juslin2001communicating', title: 'Communicating emotion in music performance: A review and a theoretical framework', publisher: 'Oxford University Press', year: 2001, author:['Patrik Juslin']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$
CREATE (v:Article {id: 'williams2015dynamic', title: 'Dynamic Game Soundtrack Generation in Response to a Continuously Varying Emotional Trajectory', publisher: 'Audio Engineering Society Conference: 56th International Conference: Audio for Games', year: 2015, author:['Duncan Williams', 'Alexis Kirke', 'Joel Eaton', 'Eduardo Miranda', 'Ian Daly', 'James Hallowell', 'Etienne Roesch', 'Faustina Hwang', 'Slawomir Nasuto']})            
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$
CREATE (v:Article {id: 'williams2015investigating', title: 'Investigating Perceived Emotional Correlates of Rhythmic Density in Algorithmic Music Composition', publisher: 'Association for Computing Machinery', year: 2015, author:['Duncan Williams', 'Alexis Kirke', 'Joel Eaton', 'Eduardo Miranda', 'Ian Daly', 'James Hallowell', 'James Weaver', 'Asad Malik', 'Etienne Roesch', 'Faustina Hwang', 'Slawomir Nasuto']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$        
CREATE (v:Article { id: 'daly2015towards', title: 'Towards human-computer music interaction: Evaluation of an affectively-driven music generator via galvanic skin response measures',  publisher: '2015 7th Computer Science and Electronic Engineering Conference (CEEC)', year: 2015, author:['Duncan Williams', 'Alexis Kirke', 'Eduardo Miranda', 'Ian Daly', 'Faustina Hwang', 'Slawomir Nasuto', 'Asad Malik', 'James Weaver']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$        
CREATE (v:Article { id: 'kirke2013artificial', title: 'Artificial affective listening towards a machine learning tool for sound-based emotion therapy and control',  publisher: 'Proceedings of the Sound and Music Computing Conference', year: 2013, author:['Alexis Kirke', 'Eduardo Miranda', 'Slawomir Nasuto']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$        
CREATE (v:Article { id: 'kirke2012learningto', title: 'Learning to Make Feelings: Expressive Performance as a part of a machine learning tool for sound-based emotion therapy and control',  publisher: 'the 9th Intl Symp on Computer Music Modeling and Retrieval', year: 2012, author:['Alexis Kirke', 'Eduardo Miranda', 'Slawomir Nasuto']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$        
CREATE (v:Article { id: 'lopez2010real', title: 'Real-Time Emotion-Driven Music Engine',  publisher: 'the 9th Intl Symp on Computer Music Modeling and Retrieval', year: 2010, author:['Alex Lopez', 'Antonio Oliveira', 'Amilcar Cardoso']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$        
CREATE (v:Article { id: 'oliveira2008affective', title: 'Affective-driven music production: selection and transformation of music',  publisher: 'ARTECH', year: 2008, author:['Antonio Oliveira', 'Amilcar Cardoso']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$        
CREATE (v:Article { id: 'oliveira2008modeling', title: 'Modeling affective content of music: A knowledge base approach',  publisher: 'Sound and Music Computing Conference', year: 2008, author:['Antonio Oliveira', 'Amilcar Cardoso']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$        
CREATE (v:Article { id: 'livingstone2007controlling', title: 'Controlling musical emotionality: an affective computational architecture for influencing musical emotions',  publisher: 'Digital Creativity', year: 2007, author:['Steven R. Livingstone', 'Ralf Mühlberger', 'Andrew Brown', 'Andrew Loch']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$        
CREATE (v:Article { id: 'livingstone2005dynamic', title: 'Dynamic Response: Real-Time Adaptation for Music Emotion',  publisher: 'Creativity & Cognition Studios Press', year: 2005, author:['Steven R. Livingstone','Andrew Brown']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$        
CREATE (v:Article { id: 'oliveira2009automatic', title: 'Automatic manipulation of music to express desired emotions',  publisher: 'Proceedings of the 6th Sound and Music Computing Conference', year: 2009, author:['Antonio Oliveira', 'Amilcar Cardoso']})
RETURN v $$) as (v agtype);

SELECT * FROM cypher('citation_graph', $$        
CREATE (v:Article { id: 'russell1980circumplex', title: 'A circumplex model of affect',  publisher: 'American Psychological Association', year: 1980, author:['James Russel']})
RETURN v $$) as (v agtype);
Enter fullscreen mode Exit fullscreen mode

Finally, we define the relationships between articles based on the citations found within each one. We can use the id to retrieve the vertices and connect them:

SELECT * FROM cypher ('citation_graph', $$ 
MATCH (a:Article), (b:Article)
WHERE a.id = 'williams2015affect' and b.id = 'williams2013towards'
CREATE (a)-[e:CITES]->(b)
RETURN e                 
$$) as (e agtype);
Enter fullscreen mode Exit fullscreen mode

To avoid making this post too long, I will not repeat all the queries at this step. Unlike the entries with the information of the papers, which is only one for each paper, the relationships can be "one to many." This is the link to the GitHub gist containing all the queries to create the relationships between the papers. You can copy and paste on your PostgreSQL CLI to run the queries. With the definition of all edges, we have finished building our citation graph, presented in Figure 3:

Figure 3 - Citation graph. Developed by the author and generated with AGE Viewer.


Figure 3 - Citation graph. Elaborated by author and generated with AGE Viewer.

4. Conclusions

In this article, we have learned what graphs are and how to build a graph database using the PostgreSQL extension Apache AGE. In the next post, I will show you how to use the AGE Viewer to display the graph illustrated in Figure 3.

5. Errata

My intention is to provide access to technology information through reliable sources. If you have found any incorrect information, please let your contribution in the comments below 😊.

6. Related Articles

Collaborate to Innovate: How a Social Network Can Help You Find Your Dream Pair Programming Team
Create a Word Guessing Game With Apache AGE
Step-by-Step Guide to Install PostgreSQL via Package Management for Apache AGE on Ubuntu
Easy guide to install and configure PostgreSQL with Apache AGE on Windows
Installing Apache AGE with Docker on Windows
Easy Installation of PostgreSQL, AGE & Age-viewer on Mac

7. References

Vilas Boas, Matheus Guedes. Graph Theory Class Notes - Basic Concepts on Graphs - Definitions and Theorems. Federal University of Ouro Preto. 2016. Unpublished.

Banner designed by Freepik.

8. Contribute to Apache AGE

Apache AGE website: https://age.apache.org/

Apache AGE Github: https://github.com/apache/age

Top comments (0)