Having used WISDAM to label image datasets from aerial surveys conducted around the world (some of which are showcased on our Projects page), we now have a large training dataset that includes dugongs, whales, dolphins, turtles, sharks, rays, birds, fish and seasnakes.
Frederic Maire has been working with us for many years to explore the use of object detection deep learning models for detecting marine fauna in aerial survey imagery. It has been an iterative process, with Frederic updating the model as new object detection frameworks and architectures have been developed.
Our aim, to date, has been to produce a set of predicted animal detections that are then verified manually. The efficacy of these automated systems depends on recall, depicted on the left-hand side of the illustration below (from Axford et al. 2024): the proportion of target animals in the images that are actually detected.
We are aiming for a recall that is as high as possible (ideally above 90%), partly because that is what manual image reviewers can achieve, but also because in some of the areas where we want this system applied, animals occur in such low numbers that we cannot afford to miss many of them.
But we also want to maintain a reasonable level of precision, the proportion of all predictions that are true detections, to ensure the efficiency and reliability of the manual verification process.
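Both metrics are simple ratios over the detection counts. The sketch below illustrates the calculation with hypothetical numbers (not results from our surveys):

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Proportion of animals actually present that the model detected."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    """Proportion of the model's predictions that were real animals."""
    return true_positives / (true_positives + false_positives)

# Hypothetical survey: 100 animals in the imagery, 85 detected,
# and 230 predictions made by the model in total.
tp = 85
fn = 100 - tp        # animals the model missed
fp = 230 - tp        # predictions that were not animals

print(f"recall:    {recall(tp, fn):.2f}")     # 0.85
print(f"precision: {precision(tp, fp):.2f}")  # 0.37
```

Under these made-up numbers the model would meet neither target: raising the detection threshold would trade recall for precision, which is why we track both.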
Lastly, we aim to detect all animals that are visible to the human eye, which includes animals throughout the water column, regardless of image complexity.
Our original model was published in 2015 (available here) and was trained only to detect dugongs, using images that we had collected and labelled manually during a series of trial drone surveys. In that work we compared Convolutional Neural Network architectures and advocated a simple segmentation and region-proposal method using deep CNNs. We reported a recall of 80% and a precision of 27% on a test set of images randomly extracted from the dataset prior to training, as is standard practice.
Since then we have trained a series of CNN architectures within the Google TensorFlow framework.
Given the impact of transformers in computer vision with respect to efficiency and accuracy, Frederic most recently decided to fine-tune a pretrained transformer-based model called DETR.
AI example detections