A Data/Detector Characterization Pipeline (What is it and why we need one). Soumya D. Mohanty, AEI, January 18, 2001.


1 A Data/Detector Characterization Pipeline (What is it and why we need one)
Soumya D. Mohanty, AEI, January 18, 2001
Outline of the talk
  – Functions of a pipeline
  – A walk through a candidate pipeline
  – Requirements: issues
  – Proposal for a plan of work

2 The functions of a pipeline
Why have one?
  – Understanding a new feature or establishing confidence in a detection will require a fair amount of manual (human-intensive) work.
  – The large data rate (main + auxiliary channels) means that an automated tool that helps focus our attention is essential.
Definition: An automated tool to point out “interesting” segments.
  – Not meant for detector commissioning stage data.
  – Types: data/detector characterization; data preparation or conditioning.
  – It may not be possible to cleanly separate their design processes.
  – Byproducts: routine, uninteresting information (data summaries) to support data mining tasks.
Open issue: What is interesting?
  – An automated tool requires a precise definition of interesting features.
  – Examples: change in PSD, transients, change in cross-couplings, …
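
To make the "precise definition" point concrete, here is a minimal sketch of one possible rule for the "change in PSD" example. It is not the talk's method: it uses broadband mean-square power per segment as a crude stand-in for a PSD estimate, and the function names, the segment length, and the factor-of-two `ratio` are all assumptions chosen for illustration.

```python
import statistics

def segment_powers(samples, seg_len):
    """Mean-square power of each consecutive, non-overlapping segment."""
    return [
        sum(v * v for v in samples[i:i + seg_len]) / seg_len
        for i in range(0, len(samples) - seg_len + 1, seg_len)
    ]

def interesting_segments(samples, seg_len, ratio=2.0):
    """Indices of segments whose power differs from the median power
    by more than a factor of `ratio` (hypothetical definition)."""
    powers = segment_powers(samples, seg_len)
    reference = statistics.median(powers)
    return [
        k for k, p in enumerate(powers)
        if p > ratio * reference or p < reference / ratio
    ]

# Toy data: four segments of an alternating +/-1 pattern;
# the third segment is three times louder than the others.
pattern = [1.0, -1.0] * 4
data = pattern + pattern + [3.0 * v for v in pattern] + pattern
flagged = interesting_segments(data, seg_len=8)
print(flagged)
```

Even this toy rule forces choices (segment length, reference level, threshold) that an automated tool cannot leave vague.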

3 Pipeline: Not just a sum of its parts
Simple example
  – A transient test is characterized without studying its effect on, or the effect of, line noise.
  – A line removal tool is characterized without studying its effect on, or the effect of, transients.
  – When real data is passed through the line removal tool followed by the transient test, the result will differ from that of the transient test followed by line removal.
Other “cross-couplings” can exist that will affect the overall performance of a pipeline.
Computational costs need not be a simple sum of the parts.
Pipeline design and characterization will involve more than the study of tools in isolation.
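
The order dependence in the simple example can be reproduced with a toy sketch. Everything here is an assumption made for illustration: the "line" is a single sinusoid at a known frequency, line removal is a least-squares sinusoid fit, and the transient test flags samples above `k` times the segment RMS. None of these are the tools discussed in the talk.

```python
import math

N = 256          # samples in the toy segment (assumed)
LINE_BIN = 16    # line frequency in cycles per segment (assumed known)
SPIKE_AT = 100   # sample index of the single-sample transient (assumed)

def make_data():
    """Toy data: a narrow-band line of amplitude 5 plus one transient of height 8."""
    x = [5.0 * math.sin(2 * math.pi * LINE_BIN * i / N) for i in range(N)]
    x[SPIKE_AT] += 8.0
    return x

def remove_line(x):
    """Least-squares fit of a sinusoid at the line frequency; returns the
    residual and the fitted line amplitude."""
    n = len(x)
    c = [math.cos(2 * math.pi * LINE_BIN * i / n) for i in range(n)]
    s = [math.sin(2 * math.pi * LINE_BIN * i / n) for i in range(n)]
    a = 2.0 / n * sum(xi * ci for xi, ci in zip(x, c))
    b = 2.0 / n * sum(xi * si for xi, si in zip(x, s))
    residual = [xi - a * ci - b * si for xi, ci, si in zip(x, c, s)]
    return residual, math.hypot(a, b)

def transient_test(x, k=4.0):
    """Flag samples whose magnitude exceeds k times the RMS of the segment."""
    rms = math.sqrt(sum(xi * xi for xi in x) / len(x))
    return [i for i, xi in enumerate(x) if abs(xi) > k * rms]

raw = make_data()
flags_test_first = transient_test(raw)        # transient test on data that still contains the line
cleaned, line_amp = remove_line(raw)
flags_line_first = transient_test(cleaned)    # transient test after line removal

print(flags_test_first, flags_line_first, line_amp)
```

In this toy case the line inflates the RMS so the transient is missed when the test runs first, while the transient in turn biases the fitted line amplitude away from 5. Each tool affects the other, which is what characterizing them in isolation would not reveal.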

4 Analyzing pipeline performance
Basic criteria: The pipeline should not make too many mistakes. On the other hand, it should not lose interesting segments.
  – Extremely reliable statistical characterization will be required.
Open issue: Metrics for pipeline performance (or pipeline calibration).
  – A metric must include: false alarm and detection probabilities, dependence on a priori modeling of the data, computational costs, …
  – For a data preparation pipeline: calibrate by injecting GW signals into the input.
  – For a data/detector characterization pipeline: ?
Bottom line: A lot of experience with simulated and real data is required.
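
The injection-based calibration mentioned above can be sketched as a small Monte Carlo loop. This is a sketch under stated assumptions, not a proposed metric: `toy_pipeline` is a hypothetical stand-in that flags a segment when any sample crosses a threshold, the noise is white Gaussian, and the "signal" is a single-sample impulse rather than a realistic GW waveform.

```python
import random

def toy_pipeline(segment, threshold=4.5):
    """Hypothetical stand-in for a pipeline: flag the segment if any
    sample magnitude exceeds the threshold."""
    return max(abs(v) for v in segment) > threshold

def estimate_rates(pipeline, n_trials=200, seg_len=64, amplitude=10.0, seed=0):
    """Estimate the false alarm and detection fractions by running the
    pipeline on noise-only segments and on the same segments with an
    injected impulse."""
    rng = random.Random(seed)
    false_alarms = 0
    detections = 0
    for _ in range(n_trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(seg_len)]
        if pipeline(noise):
            false_alarms += 1
        injected = list(noise)
        injected[seg_len // 2] += amplitude   # assumed toy signal
        if pipeline(injected):
            detections += 1
    return false_alarms / n_trials, detections / n_trials

fa_rate, det_rate = estimate_rates(toy_pipeline)
print(fa_rate, det_rate)
```

The same loop says nothing about the dependence on a priori noise modeling or about computational cost, which the slide lists as further ingredients of a metric.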

5 A Candidate Pipeline Design
Status: At the stage of a blueprint that can be implemented.
  – Several new tools that need to be developed have been identified (e.g., a line removal method that is unaffected by transients).
  – The blueprint is concrete enough to begin computational cost studies and statistical characterization studies.
Origins
  – The word “pipeline” has been used on several occasions (e.g., in the LSC Data Analysis White Paper), but this is the first concrete design.
  – 1999: SDM commissioned to design one as part of the 40m/TAMA coincidence analysis project.
Important: A pipeline will affect planning for other data analysis components.
  – Examples: software/hardware environment, user interfaces, a sophisticated database or simple sequential files, interfaces to DAQ, ...

6 Data/Detector Characterization Pipeline

7 Requirements: Issues
Computing.
  – Should work online.
  – Memory requirements might be non-trivial if database access overheads turn out to be large.
Implementation language and environment.
  – Within LDAS (adapted to GEO)? Language: C++
  – TRIANA? Java
  – DMT? VEGA? C++
Database. Not an issue confined to this pipeline alone.
  – The need depends on what kind of data mining tasks will be required.
  – Examples: (1) collect data with a particular type of transient; (2) store information about new types of features.
Others.
  – Many ideas and guidelines from users are required for the design phase.
  – The code writing and testing phase will be manpower intensive.

8 Proposal for a plan of work (fastest)
Almost all components are available in MATLAB.
Use sequential files instead of a relational database.
Implement as a large MATLAB program.
Come up with some metrics of performance.
Test against simulated and some real data.
If there is a coincidence run with LIGO, aim to produce X hours of characterized data using this MATLAB code.
In the meantime, work on related issues and requirements definition.
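
As a rough illustration of the "sequential files instead of a relational database" option, the sketch below appends one record per characterized segment and answers a query by scanning the whole file. It is written in Python rather than MATLAB purely for illustration. The record fields (`gps_start`, `duration`, `feature`, `kind`) and the one-JSON-object-per-line layout are assumptions, not part of the proposal, and an in-memory stream stands in for a file on disk.

```python
import io
import json

def append_record(stream, record):
    """Append one segment summary as a single line of JSON."""
    stream.write(json.dumps(record) + "\n")

def scan(stream, predicate):
    """Read the file from the start and return the records matching the predicate."""
    stream.seek(0)
    return [rec for rec in map(json.loads, stream) if predicate(rec)]

# An in-memory text stream stands in for a sequential file on disk.
log = io.StringIO()
append_record(log, {"gps_start": 0, "duration": 16, "feature": "transient", "kind": "A"})
append_record(log, {"gps_start": 16, "duration": 16, "feature": "psd_change", "kind": None})
append_record(log, {"gps_start": 32, "duration": 16, "feature": "transient", "kind": "A"})
append_record(log, {"gps_start": 48, "duration": 16, "feature": "transient", "kind": "B"})

# Data mining task: collect segments with a particular type of transient.
hits = scan(log, lambda r: r["feature"] == "transient" and r["kind"] == "A")
print([r["gps_start"] for r in hits])
```

Every query is a full scan, which is simple to implement but is the kind of cost that decides whether a database is eventually needed.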

9 Conclusions
The large amount of data makes a pipeline necessary in order to direct our attention to where it is really required.
Pipeline design and characterization require more than listing tools and studying them in isolation.
Pipeline design can identify missing features.
A concrete design now exists. Several candidate pipelines must be generated and compared.
What is interesting? Guidelines, ideas, and experience with real data are required to evolve an answer.

