Everything I Know About Web Data Integration for the Non-Expert

February 27, 2018 at 2:00 PM to 3:30 PM

Location: 5345 Herzberg Laboratories


Today, there is an abundance of structured data available on the web. This data comes from heterogeneous sources and must therefore be integrated for its informational value to be fully exploited. Due to the scale of the heterogeneous data sets available on the web, integrating them is typically an automatic process. However, automatic approaches are not very accurate because of the large degree of heterogeneity and the variety of domains. These automatic approaches can therefore be considered a first step that quickly produces a good-quality output, which can be used to issue queries over the data sources. The second step is refining this output over time as it is used. Interacting with these data sets through the data integration system's output, and refining that output, requires expertise in data management problems. This limits use of the output almost exclusively to power users, and consequently limits its usability.

In this talk, I focus on helping non-expert users access heterogeneous data sources via the output of data integration systems, without requiring any prior knowledge of the queried data sources and without exposing these users to the details of that output. More importantly, the users can give feedback on the answers to their queries. This feedback can then be used to refine and improve the quality of the output of data integration systems. Specifically, this talk focuses on helping non-expert users query heterogeneous RDF data sets and on using their feedback on query answers to improve the quality of the interlinking between the queried data sets.

Short Bio:

Ahmed El-Roby is a PhD candidate in the Data Systems Group at the David R. Cheriton School of Computer Science, University of Waterloo. His research focuses on web data integration. His most recent work focuses on incorporating non-expert users into web data integration systems and using their interactions with these systems to improve the systems' output. During his PhD, Ahmed also worked at Qatar Computing Research Institute on Automatic Linking and at Carnegie Mellon University on Peloton, the self-driving database management system. When he is not glued to a computer screen, he can be found glued to another screen watching a soccer game.

