The world of ‘big’ graphs: storage and query optimization

Past Event! Note: this event has already taken place.

March 1, 2018 at 2:00 PM to 3:30 PM

Location:	5345 Herzberg Laboratories
Cost:	Free

Abstract:

Graph-structured data is ubiquitous, even if it is not conspicuously visible. For instance, Google supercharges its search results with knowledge graphs, and DBPedia, a crowd-sourced community effort to extract structured information from Wikipedia, was one of the databases used in IBM’s DeepQA project to build the Watson computer that went on to win the Jeopardy challenge. Graphs have penetrated other science and engineering fields as well: examples include huge protein interaction networks such as UniProt (by the Swiss Institute of Bioinformatics) and Bio2RDF. The US government launched a project called “data.gov”, where over nine thousand datasets are exported in the Resource Description Framework (RDF) graph format. These graphs are often on the order of a few billion edges and hundreds of millions of nodes. Thus, over the past decade there has been a proliferation of commercial and community graph databases. For example, BitMat, RDF-3X, gStore, Triplebit, S2RDF, and TriAD emerged from academic research, while Neo4j, Pregel, Apache Giraph, Oracle Spatial and Graph store, IBM Graph, etc. have come from commercial and large community efforts.

In this talk, the speaker will focus on the BitMat system, which she developed single-handedly from scratch to handle RDF graph data. She designed BitMat to target “low-selectivity” pattern queries, i.e., queries that need to access a large amount of graph data and cannot always benefit from heuristic cost-based optimization. She will discuss the details of BitMat’s novel indexing structure for graphs, a 2-phase pattern query processing algorithm, and theoretical and practical extensions of this algorithm to a broader spectrum of the SPARQL query language (a W3C standard). She will also discuss her ongoing work: (1) using modern hardware advances such as multi-core CPUs and GPUs for massively parallel processing of graphs, (2) optimizing “path pattern queries”, and (3) two of her current projects on the combination of machine learning, computer vision, and large-scale data management.

Bio:

Medha Atre has been an Assistant Professor in the Computer Science and Engineering department of the Indian Institute of Technology Kanpur since March 2016. Her research interests and vision are to take a holistic view of real-life data science problems, with solutions for the management, indexing, and retrieval of complex data spanning graphs, audio, video, and text, and to build systems that take into account the “vertical cross-section” of these problems. Previously, she worked as a postdoctoral researcher at the University of Pennsylvania (Philadelphia, PA, USA), and she holds a PhD in Computer Science from Rensselaer Polytechnic Institute (Troy, NY, USA). During her PhD she interned at Oracle Semantic Technologies Lab (Nashua, NH, USA) and IBM T. J. Watson (Yorktown Heights, NY, USA). She also has four and a half years of experience working in the software industry prior to her PhD.

Short URL: https://carleton.ca/cuids/?p=3479