Fishbein, E., and Patterson, R.T., 1993. “Error weighted maximum likelihood (EWML)”: a new statistically valid method to cluster quantitative micropaleontological data. Journal of Paleontology, 67:475-486.

The advent of readily available computer-based clustering packages has created some controversy in the micropaleontological community concerning the use and interpretation of computer-based biofacies discrimination, because dramatically different results can be obtained depending on the methodology used. Analysis of various clustering techniques reveals that, in most instances, no statistical hypothesis is contained in the clustering model and no basis exists for accepting one biofacies partitioning over another. Furthermore, most techniques do not consider the standard error in species abundances and so generate results that are not statistically relevant. When many rare species are present, accumulated statistically insignificant differences in rare species can overshadow the significant differences in the major species, leading to biofacies whose members have little in common.
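To make the point about standard error concrete, the following is a minimal sketch, not taken from the paper. It assumes that counting uncertainty on a fractional abundance can be approximated by the binomial standard error, sqrt(p(1 - p)/n); the function names and the example counts are hypothetical and chosen only for illustration.

```python
import math


def abundance_standard_error(count, total):
    """Binomial standard error of a fractional abundance (assumed error model)."""
    p = count / total
    return math.sqrt(p * (1.0 - p) / total)


def difference_in_sigmas(count_a, total_a, count_b, total_b):
    """Abundance difference between two samples, in units of its combined standard error."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    se = math.sqrt(
        abundance_standard_error(count_a, total_a) ** 2
        + abundance_standard_error(count_b, total_b) ** 2
    )
    if se == 0.0:
        # Both abundances are exactly 0 or 1; no error estimate is available.
        return 0.0
    return abs(p_a - p_b) / se


# Hypothetical counts out of 100 specimens per sample.
rare = difference_in_sigmas(2, 100, 4, 100)     # rare species: 2% vs 4%
major = difference_in_sigmas(30, 100, 60, 100)  # major species: 30% vs 60%
print(f"rare species:  {rare:.2f} sigma")
print(f"major species: {major:.2f} sigma")
```

With these made-up counts, the rare species differs by less than one standard error between the two samples, while the major species differs by more than four. A clustering method that ignores the error treats both differences as equally meaningful.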

A statistically based “error-weighted maximum likelihood” (EWML) clustering method is described that determines biofacies by assuming that samples from a common biofacies are normally distributed. The method also weights species variability to be inversely proportional to measurement uncertainty. The method has been applied to samples collected from the Fraser River Delta marsh and shows that five distinct biofacies can be resolved in the data. Similar results were obtained from readily available packages when the data set was preprocessed to reduce the number of degrees of freedom. Based on the sample results from the new algorithm, and on tests using a representative micropaleontological data set, a more conventional iterative processing method is recommended. This method, although not statistical in nature, produces results similar to those of EWML (which is not yet commercially available) using readily available analysis packages. Finally, some of the more common clustering techniques are discussed and strategies for their proper utilization are recommended.
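The idea of weighting inversely to measurement uncertainty can be illustrated with a short sketch. This is not the EWML algorithm from the paper, only an error-weighted distance between two samples under stated assumptions: binomial counting variance on each fractional abundance, and a variance floor equal to that of a single counted specimen so that absent species do not receive infinite weight. All names and counts below are hypothetical.

```python
def fractions_and_variances(counts):
    """Fractional abundances and their assumed binomial variances for one sample."""
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two counted specimens")
    # Assumption: floor the variance at that of one specimen in the sample,
    # so species with a count of zero do not get zero variance.
    p_min = 1.0 / total
    floor = p_min * (1.0 - p_min) / total
    fractions = [c / total for c in counts]
    variances = [max(p * (1.0 - p) / total, floor) for p in fractions]
    return fractions, variances


def error_weighted_distance(counts_a, counts_b):
    """Sum over species of squared abundance difference divided by combined variance."""
    f_a, v_a = fractions_and_variances(counts_a)
    f_b, v_b = fractions_and_variances(counts_b)
    return sum(
        (x - y) ** 2 / (vx + vy)
        for x, y, vx, vy in zip(f_a, f_b, v_a, v_b)
    )


# Hypothetical samples: counts for four species, 100 specimens each.
sample_a = [60, 36, 2, 2]
sample_b = [60, 36, 4, 0]  # differs from sample_a only in the rare species
sample_c = [36, 60, 2, 2]  # differs from sample_a in the major species

print(error_weighted_distance(sample_a, sample_b))
print(error_weighted_distance(sample_a, sample_c))
```

Under a normal error model with known variances, a sum of this form is proportional to the negative log-likelihood that two samples share the same underlying abundances, which is why the sketch uses it. Here the samples that differ only in rare species come out much closer together than the samples that differ in major species.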
