Machine Learning Strategies for Enhancing Bathymetry Extraction from Imbalanced Lidar Point Clouds

Publication Type: Conference Proceedings
Authors: Lowell, K; Calder, BR
Conference Name: Oceans 2019
Conference Dates: Oct 27-31
Conference Location: Seattle, WA
Keywords: bathymetric lidar, confusion matrix decomposition, Extreme Gradient Boosting, imbalanced samples, probability decision threshold

Abstract—Density-based approaches to extracting bathymetry from airborne lidar point clouds generally rely on histogram- or frequency-based disambiguation rules to separate noise from signal. The present work aims to improve such disambiguation rules by enhancing each pulse with a machine-learning-based estimate of its p(Bathy), i.e., its probability of truly being bathymetry. Extreme gradient boosting (XGB) is used to assess the strength of the bathymetric signal in pulse return metadata. Because lidar point clouds can be highly imbalanced between Bathymetry and NotBathymetry, three strategies for mitigating the effects of imbalanced samples were examined. The impacts of an imbalanced lidar point cloud were successfully mitigated by:

  • Applying to p(Bathy) an “optimal” decision threshold (ODT) that equalizes accuracy for Bathymetry and NotBathymetry, rather than the conventional probability decision threshold (PDT) of 0.50, and
  • Using proportional class weighting to fit the XGB model.
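The threshold search in the first bullet can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "accuracy for Bathymetry" and "accuracy for NotBathymetry" mean the fraction of each class labelled correctly at a given threshold, and that the ODT is found by a grid search (step 0.01) for the threshold minimizing the gap between those two accuracies. The function names `class_accuracies` and `optimal_decision_threshold` and the toy probabilities are hypothetical.

```python
def class_accuracies(p_bathy, is_bathy, threshold):
    """Per-class accuracy when points with p(Bathy) >= threshold are labelled Bathymetry."""
    n_bathy = sum(1 for y in is_bathy if y)
    n_not = len(is_bathy) - n_bathy
    tp = sum(1 for p, y in zip(p_bathy, is_bathy) if y and p >= threshold)
    tn = sum(1 for p, y in zip(p_bathy, is_bathy) if not y and p < threshold)
    return tp / n_bathy, tn / n_not


def optimal_decision_threshold(p_bathy, is_bathy, steps=100):
    """Grid-search the threshold whose Bathymetry and NotBathymetry accuracies are closest."""
    best_t, best_gap = None, float("inf")
    for i in range(steps + 1):
        t = i / steps
        acc_bathy, acc_not = class_accuracies(p_bathy, is_bathy, t)
        gap = abs(acc_bathy - acc_not)
        if gap < best_gap:  # strict '<' keeps the smallest threshold among ties
            best_t, best_gap = t, gap
    return best_t


# Toy imbalanced sample: 4 Bathymetry points, 6 NotBathymetry points.
p_bathy = [0.9, 0.7, 0.45, 0.4, 0.3, 0.25, 0.2, 0.1, 0.05, 0.02]
is_bathy = [True, True, True, False, True, False, False, False, False, False]

print(class_accuracies(p_bathy, is_bathy, 0.50))  # (0.5, 1.0) with the conventional PDT
odt = optimal_decision_threshold(p_bathy, is_bathy)
print(odt, class_accuracies(p_bathy, is_bathy, odt))
```

On this toy sample the conventional PDT of 0.50 labels only half of the Bathymetry points correctly while labelling every NotBathymetry point correctly; the search settles on a lower threshold (0.31) at which the two per-class accuracies are much closer (0.75 and about 0.83).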

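Proportional class weighting can be sketched as below, under the assumption (not stated in the abstract) that each class is weighted inversely to its frequency, so that Bathymetry and NotBathymetry contribute the same total weight to the fit. The helper names are hypothetical. With the `xgboost` Python package, per-point weights of this kind can be passed through the `sample_weight` argument of `XGBClassifier.fit`; for a binary problem, the `scale_pos_weight` parameter, whose documentation suggests sum(negative instances) / sum(positive instances) as a typical value, plays a similar role.

```python
from collections import Counter


def proportional_class_weights(labels):
    """Weight each class by n / (k * n_class), i.e. inversely to its frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def per_point_weights(labels):
    """Expand the class weights into one weight per point (e.g. for a sample_weight argument)."""
    class_w = proportional_class_weights(labels)
    return [class_w[label] for label in labels]


# Toy imbalanced sample: 2 Bathymetry points, 8 NotBathymetry points.
labels = ["Bathymetry"] * 2 + ["NotBathymetry"] * 8
weights = proportional_class_weights(labels)
print(weights)  # {'Bathymetry': 2.5, 'NotBathymetry': 0.625}

# Both classes now carry the same total weight (5.0 each).
point_w = per_point_weights(labels)
```

The ratio of the two weights (4.0) equals the NotBathymetry count divided by the Bathymetry count, which is the value the `scale_pos_weight` documentation suggests for the binary case.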
However, decomposing a confusion matrix by iteratively discarding misclassified points and re-fitting an XGB model did not improve the strength or detectability of the bathymetric signal in the metadata. Neither did iteratively discarding correctly classified points.
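The iterative decomposition described above can be sketched as follows. To keep the example self-contained, a one-feature decision stump (`fit_stump`, `predict_stump`) stands in for the XGB model; the stump, the single hypothetical metadata feature, and the `decompose` helper are illustrative assumptions, not the authors' procedure. Each round refits the model, records the sample size and training accuracy, and then keeps only the correctly classified points (`discard="misclassified"`) or only the misclassified ones (`discard="correct"`).

```python
def fit_stump(x, y):
    """Stand-in model: choose the threshold t on one feature that maximizes accuracy of x >= t."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(x)):
        acc = sum((xi >= t) == yi for xi, yi in zip(x, y)) / len(y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def predict_stump(t, x):
    """Predict Bathymetry (True) wherever the feature reaches the threshold."""
    return [xi >= t for xi in x]


def decompose(x, y, n_rounds=5, discard="misclassified"):
    """Refit repeatedly, discarding misclassified (or correctly classified) points each round.

    Returns a list of (sample size, training accuracy) pairs, one per round.
    """
    if discard not in ("misclassified", "correct"):
        raise ValueError("discard must be 'misclassified' or 'correct'")
    history = []
    for _ in range(n_rounds):
        if len(set(y)) < 2:  # cannot fit a two-class model to a single class
            break
        t = fit_stump(x, y)
        correct = [p == yi for p, yi in zip(predict_stump(t, x), y)]
        history.append((len(y), sum(correct) / len(y)))
        keep_if = discard == "misclassified"  # keep correct points when discarding errors
        kept = [i for i, c in enumerate(correct) if c == keep_if]
        if len(kept) in (0, len(y)):  # nothing left, or nothing changed
            break
        x = [x[i] for i in kept]
        y = [y[i] for i in kept]
    return history


# Toy data: one hypothetical metadata feature and True = Bathymetry.
feature = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
is_bathy = [False, False, False, True, False, False, True, True, True, True]

print(decompose(feature, is_bathy))                    # [(10, 0.9), (9, 1.0)]
print(decompose(feature, is_bathy, discard="correct"))  # [(10, 0.9)]
```

The rising training accuracy in the toy history reflects only the mechanics of discarding hard points; it is not evidence of a stronger bathymetric signal, and as reported above neither variant improved the signal in practice.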

The bathymetric signal in the metadata was found to be strong enough to warrant exploring the operational incorporation of these results into the disambiguation rules of density-based bathymetric extraction methods.
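As a purely hypothetical illustration of that direction (not a method from this work or from any existing extraction software), p(Bathy) could serve as a per-return weight in a frequency-based rule, so that the modal depth bin is chosen by summed probability rather than by raw return count. The function name `modal_depth_bin`, the 1 m bin width, and the toy depths and probabilities are all assumptions.

```python
from collections import defaultdict


def modal_depth_bin(depths, bin_width, weights=None):
    """Return the lower edge of the depth bin with the largest (optionally weighted) count."""
    if weights is None:
        weights = [1.0] * len(depths)  # plain frequency histogram
    totals = defaultdict(float)
    for depth, w in zip(depths, weights):
        totals[int(depth // bin_width)] += w
    best_bin = max(totals, key=totals.get)
    return best_bin * bin_width


# Toy returns in one cell (depth in metres, positive down): a dense shallow noise
# cluster with low p(Bathy) and a sparser deeper cluster with high p(Bathy).
depths = [1.2, 1.4, 1.5, 1.7, 1.9, 8.1, 8.3, 8.4]
p_bathy = [0.1, 0.2, 0.1, 0.15, 0.1, 0.9, 0.85, 0.95]

print(modal_depth_bin(depths, 1.0))           # 1.0: raw counts favour the noise cluster
print(modal_depth_bin(depths, 1.0, p_bathy))  # 8.0: p(Bathy) weighting favours the deeper cluster
```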