Comparing instruments

Comparing instruments Nick Tregenza, Chelonia Ltd / Exeter University

Major issues in static acoustic monitoring: false positive detections (by species group), cost of frequent servicing, cost of data analysis, unstandardised performance, and sensitivity.

Species group: relatively easy: NBHF (narrow-band high-frequency) species, beaked whales, boat sonars. Much harder: dolphins (BBTs).

False positives: reliable comparisons include adverse conditions: sand movement, boat sonars at the 'wrong' frequencies, dolphins in porpoise areas, propellers, shrimps, etc. If these are not included, the applicability of the comparison is very restricted. False positive rates are most useful if they are expressed in terms of the incidence of the sounds that give rise to them; failing that, as a rate per day.
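As a small illustration of that rate arithmetic, a sketch with invented numbers (neither the variable names nor the values come from the slides):

```python
# Expressing false positives as rates rather than as a percentage of true positives.
# All numbers are invented for illustration.
false_positive_minutes = 18      # detection positive minutes later judged false
deployment_days        = 60      # length of the deployment
sand_noise_minutes     = 3500    # minutes in which sediment transport noise was present

fp_per_day = false_positive_minutes / deployment_days
# Relative to the incidence of the sound source that produced them:
fp_per_1000_sand_minutes = 1000 * false_positive_minutes / sand_noise_minutes

print(f"False positives per day: {fp_per_day:.2f}")
print(f"False positives per 1000 minutes of sand noise: {fp_per_1000_sand_minutes:.1f}")
```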

Click by click identification is unreliable for porpoises, and even worse for dolphins. Narrow Band High Frequency clicks recorded by the C-POD-F from sand, from harbour porpoises (HP), and also from shrimps, bubbles, propellers and sonars can all fall well inside the PAMGuard basic criteria. (Figure: example clicks labelled 'sand', 'HP', and 'sand / dolphin'.)

Vaquita Monitoring Trials, 2008. Towed hydrophone, PAMGuard basic NBHF detector: detections classed as 'probably false' or 'possibly true'. NOAA Technical Memorandum NOAA-TM-NMFS-SWFSC-459: Armando Jaramillo-Legoretta, Gustavo Cardenas, Edwyna Nieto, Paloma Ladron de Guevara, Barbara Taylor, Jay Barlow, Tim Gerrodette, Annette Henry, Nick Tregenza, Rene Swift, Tomonari Akamatsu.

False positives: reliable comparisons are made at similar false positive levels; otherwise a seriously over-sensitive detector can appear best.

Reliable comparisons are at similar false positive levels in adverse conditions. (Figure: true positive and false positive counts for three detectors A, B and C, asking 'Which is better?') B is twice as sensitive as A, but has 5 times as many false positives. C is even more 'sensitive', but is counting multipath, and has 10 times as many false positives. What sensitivity do you need? Most SAM projects have plenty of detections, and reducing false positives is much more valuable than getting more sensitivity. Multipath = echoes and longer refracted pathways (like the twinkling of stars).
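To make 'similar false positive levels' concrete in practice: if each detector assigns a score to candidate detections and a validated subset gives ground truth, one approach is to raise each detector's threshold until both sit at the same false-positive budget, and only then compare true positive counts. A minimal sketch of that idea, with invented scores and labels (none of the names or numbers come from the instruments discussed):

```python
import numpy as np

def true_positives_at_matched_fp(scores, is_true, deployment_days, target_fp_per_day):
    """Raise the score threshold until the false-positive rate per day falls to
    the target, then return the number of true positives retained."""
    order = np.argsort(scores)[::-1]          # strongest detections first
    is_true = np.asarray(is_true, dtype=bool)[order]
    fp_cum = np.cumsum(~is_true)              # false positives kept as the threshold is lowered
    tp_cum = np.cumsum(is_true)
    allowed_fp = target_fp_per_day * deployment_days
    keep = np.searchsorted(fp_cum, allowed_fp, side="right")  # largest prefix within the budget
    return int(tp_cum[keep - 1]) if keep > 0 else 0

# Invented toy data: detection scores and validated ground truth for two detectors.
scores_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
truth_a  = [True, True, False, True, False, False]
scores_b = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4]
truth_b  = [True, False, True, False, True, False, False, False]
for name, s, t in [("A", scores_a, truth_a), ("B", scores_b, truth_b)]:
    print(name, true_positives_at_matched_fp(s, t, deployment_days=1, target_fp_per_day=1))
```

Only after both detectors are held to the same false-positive budget does the comparison of true positive counts say anything about which is genuinely better.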

Cost of frequent servicing: time for an instrument to be downloaded, recharged, and replaced at sea. If one of these is slow, do you need a second set of instruments? How many sea trips per year of data? Mooring cost: can this be reduced by acoustic releases? Cost of lost instruments and data.
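A back-of-envelope sketch of how these servicing questions become numbers; all figures are invented for illustration:

```python
# Rough servicing-cost arithmetic for one monitoring site; every number is invented.
deployment_days  = 120     # how long one instrument can stay out before servicing
turnaround_days  = 7       # time ashore to download, recharge and redeploy
trip_cost        = 1500.0  # boat hire, fuel and staff per servicing trip
loss_probability = 0.05    # chance of losing an instrument (and its data) per deployment
instrument_cost  = 4000.0

trips_per_year = 365 / deployment_days
# Coverage lost per year if only one instrument is used at the site:
coverage_gap_days = trips_per_year * turnaround_days
annual_cost = trips_per_year * (trip_cost + loss_probability * instrument_cost)

print(f"Sea trips per year of data: {trips_per_year:.1f}")
print(f"Days of lost coverage per year without a second instrument: {coverage_gap_days:.0f}")
print(f"Expected annual servicing and loss cost: {annual_cost:.0f}")
```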

Cost of data analysis (post download): file handling time (moving, copying, archiving, distributing and managing file sets); automated analysis time (train detection, FFTs, long-term spectral averages); validation time (quality control, no editing); editing (analysis) time, which can be huge; time cost of analyst calibration (testing whether analysts give the same results).

Unstandardised performance: reliable comparisons include: availability of a standard metric (true click count, DPM (detection positive minutes), sum of train durations, etc.); performance by different analysts; calibration of analysis (for C-PODs there is a visual validation guidance document online; for PAMGuard no guidelines are currently available); calibration of recording: hydrophone and system noise.
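As an example of one such standard metric, DPM (detection positive minutes) can be computed directly from validated detection times. A minimal sketch; the input format is an assumption, not the output format of CPOD.exe or PAMGuard:

```python
from datetime import datetime

def detection_positive_minutes(detection_times):
    """Count the number of distinct clock minutes containing at least one detection."""
    return len({t.replace(second=0, microsecond=0) for t in detection_times})

# Invented example: three detections, two of them in the same minute -> DPM = 2.
times = [
    datetime(2024, 6, 1, 10, 15, 12),
    datetime(2024, 6, 1, 10, 15, 48),
    datetime(2024, 6, 1, 10, 17, 3),
]
print(detection_positive_minutes(times))
```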

Click counts need to exclude multipath: 40 dolphin clicks were found by the KERNO classifier among 608 dolphin clicks. What method of excluding multipath is used?
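The KERNO classifier's own method is not described here; one simple illustration of the general idea is to drop any click arriving within a short window after a stronger click, treating it as a likely echo or refracted replicate. A hypothetical sketch (window length, data format and function name are all assumptions):

```python
def drop_likely_multipath(clicks, window_s=0.002):
    """Keep a click only if no stronger click occurred within window_s before it.
    `clicks` is a list of (time_s, amplitude) tuples, assumed sorted by time."""
    kept = []
    for t, amp in clicks:
        if any(t - kt <= window_s and kamp >= amp for kt, kamp in reversed(kept)):
            continue                 # probable echo or refracted replicate of a stronger click
        kept.append((t, amp))
    return kept

# Invented example: a direct-path click followed 1 ms later by a weaker replicate.
clicks = [(0.1000, 160.0), (0.1010, 142.0), (0.2500, 158.0)]
print(drop_likely_multipath(clicks))   # the 142 dB click is removed
```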

Algorithm or brain? If data from instrument A is analysed by eye, a comparison needs to be made with the data from B when also analysed by eye, because: analysis by eye has potentially, and often, better performance than an algorithm; the level of agreement is likely to be higher; the time cost of visual validation may differ greatly; and A and B may offer different strategies for limiting the workload, with big differences in time costs and exposure to analyst differences, e.g. automated analysis may be good enough to direct the analyst to a very small subset of the total recorded time. Both methods will always be black boxes: a published algorithm is still generally a black box by virtue of both its complexity and its sensitivity to uncertain input features, so its performance cannot be predicted from the published code, and brains are always black boxes.

A test case: examination of this data shows that big differences can exist in the results of analysis by different analysts. Some of the input data from the SoundTrap is presented here:

Comparing the performance of C-PODs and SoundTrap/PAMGuard (POMA paper). SoundTrap/PAMGuard data: the amplitude plot shows typical click train amplitude envelopes going up to 160 dB, with weaker multipath clicks below. The Wigner plot shows a slight downsweep and matches the typical C-POD-F finding, and the waveform is typical. All aspects of these detections match C-POD detections of porpoises.

This data from the POMA paper is very different: it is a very long, weak train. Such trains are very rarely captured from swimming animals. It shows no very distinct amplitude envelopes, and no distinctly weaker multipath clicks. The Wigner plot shows an initial rise and then a slight fall, as seen in C-POD-F data from sand noise (previous slide). The waveform is poor: a ragged amplitude profile against a background of similar-frequency noise. The peak amplitude profile matches C-POD data from sand noise recorded elsewhere.

Comparing the performance of C-PODs and SoundTrap/PAMGuard (POMA paper): the second set of clicks from the SoundTrap is typical of sand noise, the 'blue grass' that is more easily recognised in CPOD.exe click amplitude displays, where the lack of train structure is apparent. It is currently difficult to know which results from wav file recorders plus PAMGuard to trust. No guidelines exist on how to 'calibrate' analysts. Critical study of PAMGuard + human filtering is essential: across analysts, including adverse conditions, and giving explicit identification of failure modes. Several international workshops have looked at these issues for C-POD data and characterised failure modes for NBHF species: dolphins, boat sonars, sediment transport noise, other noise, chink sounds, 'mini-bursts' and playbacks. All of these have been used in developing the KERNO and encounter classifiers. The classifiers give warnings where too many classifications are of marginal quality, or where sources of false positives are relatively numerous. Note: refereeing of papers will often fail to catch problems with pattern recognition results. The scientific community needs to make progress on how to handle such data in publications.

Comparing the performance of C-PODs and SoundTrap/PAMGuard (POMA Fig 2(b)): the figure shows two distinct distributions: 1. matching results (porpoises); 2. non-matching results. C-POD + visual observation: in multiple studies with sighting effort, not one ever saw the highest levels of porpoise activity entirely missed. These very high counts from PAMGuard/SoundTrap are probably false positives, as above.

Comparing the performance of C-PODs and SoundTrap/PAMGuard (POMA Fig 2(b)): where do the differences come from in the matching distribution? 1. 'Uncertain' trains rejected by the C-POD? 2. Multipath replicates (reflections and refractions)? Overall: 340 clicks in the raw data, 11 in the train; one train missed and one detected by the C-POD. (1) explains part of the difference; (2) explains more. The lower number from the C-POD is more useful, as it is not determined primarily by range and propagation conditions, but more by the actual number of clicks producing the received sound.

Valuable papers on performance include: false positives in adverse conditions; the time cost of analysis; the consistency of analysis; overall costs … and do not express false positives as a percentage of true positives (because false positives still exist when true positives are zero).

Thanks