Insights and Algorithms for Complex Data Domains
February 28, 2018, 10:30 AM to 12:00 PM
Location: 5345 Herzberg Laboratories
Data coupled with the right algorithms offers the potential to save lives, protect the environment and earn money. This potential, however, can be severely inhibited by adverse data properties that break the assumptions implicit in many learning algorithms. Real-world classification domains, for example, have challenging data properties such as class overlap, multi-modality, high dimensionality, noise, and imbalanced class priors. In this talk, I will focus on how adverse data properties impact classifier performance, and on my research into manifold-based synthetic oversampling as a means of improving classifier performance on high-dimensional, imbalanced datasets. In particular, I will focus on using autoencoder models to generate synthetic minority training examples. Furthermore, I will highlight a new technique that generates better synthetic samples by incorporating the majority class distribution into the model, and discuss extending the algorithm to multi-class problems. Finally, I will outline some interdisciplinary applications of my research in health care, security, environmental monitoring and machine failure prediction.
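The core idea of manifold-based synthetic oversampling can be illustrated with a minimal sketch. This is not the speaker's algorithm: it substitutes PCA (which is equivalent to a linear autoencoder) for the autoencoder models discussed in the talk, and the function name, parameters, and noise scale are all illustrative assumptions. Minority examples are encoded onto a learned low-dimensional manifold, perturbed in the latent space, and decoded back, so the synthetic samples stay on the minority manifold rather than scattering through the input space.

```python
import numpy as np

def oversample_minority(X_min, n_new, k=2, noise=0.1, seed=None):
    """Sketch of manifold-based synthetic oversampling.

    PCA stands in for a linear autoencoder: its principal components
    act as the encoder/decoder weights. New minority samples are made
    by perturbing real examples in the latent space and decoding.
    All parameter choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    mu = X_min.mean(axis=0)
    # Top-k principal components of the centered minority data
    _, _, Vt = np.linalg.svd(X_min - mu, full_matrices=False)
    W = Vt[:k]                                  # (k, d): encode with W, decode with W.T
    # Pick random minority examples as seeds for synthetic samples
    seeds = X_min[rng.integers(0, len(X_min), n_new)]
    Z = (seeds - mu) @ W.T                      # encode into the latent space
    Z += noise * rng.standard_normal(Z.shape)   # perturb along the manifold
    return Z @ W + mu                           # decode back to input space

# Usage: 20 minority points in 5 dimensions, generate 50 synthetic ones
X_min = np.random.default_rng(0).standard_normal((20, 5))
X_syn = oversample_minority(X_min, n_new=50, k=2, noise=0.05, seed=1)
print(X_syn.shape)  # (50, 5)
```

The synthetic samples would then be appended to the minority class before training, in the same way SMOTE-generated samples are used; a nonlinear autoencoder replaces the SVD step in the approach the talk describes.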
Colin Bellinger is a post-doctoral fellow with the Alberta Machine Intelligence Institute at the University of Alberta. He received his PhD from the University of Ottawa. His primary research goal is to develop machine learning and data mining algorithms that are robust on datasets with adverse properties, such as rarity, class imbalance and concept drift. To this end, his research focuses on understanding the effects of these properties on predictive performance and on developing new algorithms, including active learning, ensemble one-class SVMs, and generative oversampling with autoencoders. His research has been applied to a variety of domains, including health care, security, environmental monitoring and machine failure prediction.