
2025
- Rauch, Lukas, Raphael Schwinger, Moritz Wirth, René Heinrich, Denis Huseljic, Marek Herde, Jonas Lange, et al. “BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics”. In International Conference on Learning Representations (ICLR). ICLR, 2025. https://iclr.cc/.
@inproceedings{rauch2024birdset,
author = {Rauch, Lukas and Schwinger, Raphael and Wirth, Moritz and Heinrich, René and Huseljic, Denis and Herde, Marek and Lange, Jonas and Kahl, Stefan and Sick, Bernhard and Tomforde, Sven and Scholz, Christoph},
booktitle = {International Conference on Learning Representations (ICLR)},
keywords = {deepbirddetect},
publisher = {ICLR},
title = {BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics},
year = 2025
}
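For readers who want to work with the benchmark, the following is a minimal sketch of loading one BirdSet evaluation configuration with the Hugging Face datasets library. The repository id DBD-research-group/BirdSet, the HSN (High Sierra Nevada) configuration name, and the need for trust_remote_code are assumptions to verify against the dataset's hub page.

# Minimal sketch: load a BirdSet test configuration from the Hugging Face Hub.
# Repository id and configuration name are assumed -- verify before use.
from datasets import load_dataset

ds = load_dataset("DBD-research-group/BirdSet", "HSN", trust_remote_code=True)
print(ds)                   # shows the available splits and their features
split = next(iter(ds))      # take whichever split comes first
print(ds[split][0].keys())  # inspect one example's fields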
- Heinrich, René, Lukas Rauch, Bernhard Sick, and Christoph Scholz. “AudioProtoPNet: An Interpretable Deep Learning Model for Bird Sound Classification”. Ecological Informatics (2025): 103081. doi:10.1016/j.ecoinf.2025.103081.

  Deep learning models have significantly advanced acoustic bird monitoring by recognizing numerous bird species based on their vocalizations. However, traditional deep learning models are black boxes that provide no insight into their underlying computations, limiting their usefulness to ornithologists and machine learning engineers. Explainable models could facilitate debugging, knowledge discovery, trust, and interdisciplinary collaboration. We introduce AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) for multi-label bird sound classification. It is inherently interpretable, leveraging a ConvNeXt backbone to extract embeddings and a prototype learning classifier trained on these embeddings. The classifier learns prototypical patterns of each bird species’ vocalizations from spectrograms of instances in the training data. During inference, recordings are classified by comparing them to learned prototypes in the embedding space, providing explanations for the model’s decisions and insights into the most informative embeddings of each bird species. The model was trained on the BirdSet training dataset, which consists of 9734 bird species and over 6800 h of recordings. Its performance was evaluated on the seven BirdSet test datasets, covering different geographical regions. AudioProtoPNet outperformed the state-of-the-art bird sound classification model Perch, which is superior to the more popular BirdNet, achieving an average AUROC of 0.90 and a cmAP of 0.42, with relative improvements of 7.1% and 16.7% over Perch, respectively. These results demonstrate that even for the challenging task of multi-label bird sound classification, it is possible to develop powerful yet interpretable deep learning models that provide valuable insights for professionals in ornithology and machine learning.
@article{HEINRICH2025103081,
author = {Heinrich, René and Rauch, Lukas and Sick, Bernhard and Scholz, Christoph},
doi = {10.1016/j.ecoinf.2025.103081},
journal = {Ecological Informatics},
keywords = {2025},
pages = {103081},
title = {AudioProtoPNet: An interpretable deep learning model for bird sound classification},
year = 2025
}
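To make the prototype mechanism described in the abstract concrete, here is a minimal PyTorch sketch of a ProtoPNet-style classifier head: patch embeddings from a backbone (e.g., a ConvNeXt) are compared to learnable prototype vectors, the per-prototype maximum over patches serves as an interpretable activation, and a linear layer maps those activations to multi-label logits. The shapes, the use of cosine similarity, and the prototypes-per-class count are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn.functional as F

class PrototypeClassifier(torch.nn.Module):
    """ProtoPNet-style head: compares patch embeddings to learned prototypes."""

    def __init__(self, embed_dim: int, num_classes: int, protos_per_class: int = 5):
        super().__init__()
        n_protos = num_classes * protos_per_class
        # Learnable prototype vectors living in the backbone's embedding space.
        self.prototypes = torch.nn.Parameter(torch.randn(n_protos, embed_dim))
        # Linear layer turning prototype activations into multi-label logits.
        self.head = torch.nn.Linear(n_protos, num_classes)

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (batch, patches, embed_dim), e.g. flattened backbone features.
        sims = F.cosine_similarity(
            patch_embeddings.unsqueeze(2),              # (B, P, 1, D)
            self.prototypes.unsqueeze(0).unsqueeze(0),  # (1, 1, N, D)
            dim=-1,
        )                                               # (B, P, N)
        # Max over patches: how strongly each prototype fires anywhere in the
        # spectrogram; these activations are what makes the decision inspectable.
        proto_activations = sims.max(dim=1).values      # (B, N)
        return self.head(proto_activations)             # (B, num_classes)

logits = PrototypeClassifier(embed_dim=768, num_classes=4)(torch.randn(2, 49, 768))
print(logits.shape)  # torch.Size([2, 4])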
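The two evaluation metrics quoted in the abstract can be reproduced in miniature with scikit-learn. cmAP is read here as class-wise (macro-averaged) mean average precision, which may differ in detail from the benchmark's implementation.

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Toy multi-label targets and scores for three recordings and three species.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_score = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])

auroc = roc_auc_score(y_true, y_score, average="macro")
cmap = average_precision_score(y_true, y_score, average="macro")
print(f"macro AUROC = {auroc:.2f}, cmAP = {cmap:.2f}")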
2024
- Wood, Connor M., Felix Günther, Angela Rex, Daniel F. Hofstadter, Hendrik Reers, Stefan Kahl, M. Zachariah Peery, and Holger Klinck. “Real-Time Acoustic Monitoring Facilitates the Proactive Management of Biological Invasions”. Biological Invasions 26, no. 12 (December 1, 2024): 3989–3996. doi:10.1007/s10530-024-03426-y.

  Biological surveillance at an invasion front is hindered by low population densities and, among animals, high mobility of target species. Using the barred owl (Strix varia) invasion of western North American forests as a test case, we tested real-time autonomous recording units (the ecoPi, OekoFor GbR, Freiburg, Germany) by deploying them in an area known to be occupied by the target species. The ecoPi passively record audio, analyze it onboard with the BirdNET algorithm, and transmit audio clips with identifiable sounds via cellular network to a web interface where users can listen to audio to manually vet the results. We successfully detected and lethally removed three barred owls, demonstrating that real-time acoustic monitoring can be used to support rapid interventions at the forefront of an ongoing invasion in which proactive management may be essential to the protection of an iconic native species, the spotted owl (S. occidentalis). This approach has the potential to make a significant contribution to global biodiversity conservation efforts by massively increasing the speed at which biological invasions by acoustically active species, and other time-sensitive conservation challenges, can be managed.
@article{Wood2024,
author = {Wood, Connor M. and Günther, Felix and Rex, Angela and Hofstadter, Daniel F. and Reers, Hendrik and Kahl, Stefan and Peery, M. Zachariah and Klinck, Holger},
doi = {10.1007/s10530-024-03426-y},
journal = {Biological Invasions},
keywords = {deepbirddetect},
month = dec,
number = 12,
pages = {3989--3996},
title = {Real-time acoustic monitoring facilitates the proactive management of biological invasions},
volume = 26,
year = 2024
}
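The detect-and-transmit workflow the abstract describes reduces to a small loop: record a chunk, run the classifier onboard, and upload any high-confidence clip of the target species for human vetting. In the sketch below, record_chunk, classify, and upload_clip are hypothetical stand-ins rather than the ecoPi or BirdNET APIs, and the 0.85 threshold is an assumed value.

import random
import time

TARGET = "Barred Owl"
THRESHOLD = 0.85  # assumed confidence cutoff, not a value from the paper

def record_chunk(seconds: int) -> bytes:
    """Hypothetical stand-in for the unit's audio capture."""
    return bytes(seconds)

def classify(chunk: bytes) -> list[tuple[str, float]]:
    """Hypothetical stand-in for onboard BirdNET inference."""
    return [(TARGET, random.random())]

def upload_clip(chunk: bytes, species: str, confidence: float) -> None:
    """Hypothetical stand-in for the cellular upload to the review interface."""
    print(f"flagged clip: {species} at confidence {confidence:.2f}")

for _ in range(5):  # one short monitoring cycle per iteration
    chunk = record_chunk(seconds=3)
    for species, confidence in classify(chunk):
        if species == TARGET and confidence >= THRESHOLD:
            upload_clip(chunk, species, confidence)  # queued for human vetting
    time.sleep(0.1)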
- Rauch, Lukas, Denis Huseljic, Moritz Wirth, Jens Decke, Bernhard Sick, and Christoph Scholz. “Towards Deep Active Learning in Avian Bioacoustics”. CoRR abs/2406.18621 (2024). http://dblp.uni-trier.de/db/journals/corr/corr2406.html#abs-2406-18621.
@article{journals/corr/abs-2406-18621,
author = {Rauch, Lukas and Huseljic, Denis and Wirth, Moritz and Decke, Jens and Sick, Bernhard and Scholz, Christoph},
journal = {CoRR},
keywords = {deepbirddetect},
title = {Towards Deep Active Learning in Avian Bioacoustics},
volume = {abs/2406.18621},
year = 2024
}