Date: Friday, September 27, 2019
Time: 3:30 – 4:30 (coffee & refreshments starting at 3:00)
Place: HP 4351 (Macphail Room), School of Mathematics & Statistics, Carleton University
Speaker: John Healy
Title: Accelerated Hierarchical Density Clustering
Finding clusters is a powerful tool for understanding and exploring data.
While the task sounds easy, it can be surprisingly difficult to do well. Most standard clustering algorithms can, and do, provide very poor clustering results in many cases. Our intuitions for what a cluster is are not as clear as we would like, and can easily be led astray. I will introduce one useful definition of a cluster derived from density-based clustering, followed by an accelerated algorithm for performing hierarchical density-based clustering. This new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides performance comparable to that of DBSCAN, while supporting variable-density clusters and eliminating the need for the difficult-to-tune distance scale parameter. This has led to accelerated HDBSCAN* becoming the default choice of many practitioners of data science today.