Mining Massive Datasets from the Allen Telescope Array Proposal for Stanford CS341 students to use IBM Cloud services to analyze massive datasets from.

Slides:



Advertisements
Similar presentations
Mining the MACHO dataset Markus Hegland, Mathematical Sciences Institute, ANU Margaret Kahn, ANU Supercomputer Facility.
Advertisements

A PowerPoint Presentation
Big Data and Predictive Analytics in Health Care Presented by: Mehadi Sayed President and CEO, Clinisys EMR Inc.
Data Mining Glen Shih CS157B Section 1 Dr. Sin-Min Lee April 4, 2006.
Building an Intelligent Web: Theory and Practice Pawan Lingras Saint Mary’s University Rajendra Akerkar American University of Armenia and SIBER, India.
Statistical Relational Learning for Link Prediction Alexandrin Popescul and Lyle H. Unger Presented by Ron Bjarnason 11 November 2003.
Dakota Johnson, Tildon Johnson, Kyle Barker Rowan County Senior High School Mentor: Mrs. Jennifer Carter Abstract Data Analysis Acknowledgements Radio.
Big Data A big step towards innovation, competition and productivity.
The Observatory delivers better spend visibility to public sector and higher education procurement and finance teams through a managed service which includes.
Digital Forensics Dr. Bhavani Thuraisingham The University of Texas at Dallas Lecture #12 Intelligent Digital Forensics September 30, 2009.
D ATABASE S ECURITY Proposed by Abdulrahman Aldekhelallah University of Scranton – CS521 Spring2015.
Space Technology Telescopes Chapter 18 Section 2.
Web Archives, IDEAL, and PBL Overview Edward A. Fox Digital Library Research Laboratory Dept. of Computer Science Virginia Tech Blacksburg, VA, USA 21.
Tyson Condie.
Tang: Introduction to Data Mining (with modification by Ch. Eick) I: Introduction to Data Mining A.Short Preview 1.Initial Definition of Data Mining 2.Motivation.
PSR J1400 – 1410 Jessica Pal Rowan County Senior High School Introduction Data Analysis Summary Acknowledgements Results A pulsar is a rapidly rotating.
1 1 Slide Introduction to Data Mining and Business Intelligence.
An RFI Mitigation Strategy for the Allen Telescope Array Geoffrey C. Bower UC Berkeley.
Introduction to Web Mining Spring What is data mining? Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web,
Service Computation 2010November 21-26, Lisbon.
Dr. Michael D. Featherstone Summer 2013 Introduction to e-Commerce Web Analytics.
Dan Rosenbaum Nir Muchtar Yoav Yosipovich Faculty member : Prof. Daniel LehmannIndustry Representative : Music Genome.
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
[ Search for Extraterrestrial Intelligence ] grUz6w/s1600/SETI_logo_CMYK.jpg Sadie.
Institute for Personal Robots in Education (IPRE)‏ CSC 170 Computing: Science and Creativity.
1 What is Data Mining? l Data mining is the process of automatically discovering useful information in large data repositories. l There are many other.
Advanced Analytics on Hadoop Spring 2014 WPI, Mohamed Eltabakh 1.
Transforming video & photo collections into valuable resources John Waugaman President - Tygart Technology, Inc.
Machine Learning as a Service
DDM Kirk. LSST-VAO discussion: Distributed Data Mining (DDM) Kirk Borne George Mason University March 24, 2011.
German Astrophysical Virtual Observatory Overview and Results So Far W. Voges, G. Lemson, H.-M. Adorf.
1 Melanie Alexander. Agenda Define Big Data Trends Business Value Challenges What to consider Supplier Negotiation Contract Negotiation Summary 2.
Implementation of a Relational Database as an Aid to Automatic Target Recognition Christopher C. Frost Computer Science Mentor: Steven Vanstone.
Automatic Discovery and Processing of EEG Cohorts from Clinical Records Mission: Enable comparative research by automatically uncovering clinical knowledge.
© 2006 Pearson Education Canada Inc. 3-1 Chapter 3 Database Management PowerPoint Presentation Jack Van Deventer Ward M. Eagen.
Big Data Yuan Xue CS 292 Special topics on.
1. ABSTRACT Information access through Internet provides intruders various ways of attacking a computer system. Establishment of a safe and strong network.
MESA A Simple Microarray Data Management Server. General MESA is a prototype web-based database solution for the massive amounts of initial data generated.
Lecture-6 Bscshelp.com. Todays Lecture  Which Kinds of Applications Are Targeted?  Business intelligence  Search engines.
Date of download: 7/8/2016 Copyright © 2016 SPIE. All rights reserved. A scalable platform for learning and evaluating a real-time vehicle detection system.
The KDD Process for Extracting Useful Knowledge from Volumes of Data Fayyad, Piatetsky-Shapiro, and Smyth Ian Kim SWHIG Seminar.
Cloud Analytics Platforms Christian Frey. About AIDA Our mission is to advance knowledge in data analytics through research, education and outreach Our.
Using the GAVRT Radio Telescope: The SETI Project
Data mining in web applications
Data Analytics 1 - THE HISTORY AND CONCEPTS OF DATA ANALYTICS
Big Data Analytics and HPC Platforms
ANOMALY DETECTION FRAMEWORK FOR BIG DATA
MIS2502: Data Analytics Advanced Analytics - Introduction
Metis Data Science Meetup:
David P. Anderson Space Sciences Lab UC Berkeley LASER
Data Mining, Distributed Computing and Event Detection at BPA
Using the GAVRT Radio Telescope: The SETI Project
Data Mining 101 with Scikit-Learn
An Arecibo HI 21-cm Absorption Survey of Rich Abell Clusters
به نام خدا Big Data and a New Look at Communication Networks Babak Khalaj Sharif University of Technology Department of Electrical Engineering.
What is Pattern Recognition?
DESIGN & IMPLEMENTATION
Radio Propagation Simulation Based on Automatic 3D Environment Reconstruction D. He A novel method to simulate radio propagation is presented. The method.
Presented by: Prof. Ali Jaoua
Aline Martin ECE738 Project – Spring 2005
Dr. Bhavani Thuraisingham The University of Texas at Dallas
Course Introduction CSC 576: Data Mining.
Data Mining (Don’t worry, I am not presenting these slides; just for your reading pleasure)
Understanding Customer Behaviors with Information Technologies
Storing and Accessing G-OnRamp’s Assembly Hubs outside of Galaxy
Data Mining, Distributed Computing and Event Detection at BPA
Big DATA.
Computer Services Business challenge
Big-Data Analytics with Azure HDInsight
Computer Services Business challenge
Presentation transcript:

Mining Massive Datasets from the Allen Telescope Array Proposal for Stanford CS341 students to use IBM Cloud services to analyze massive datasets from the Allen Telescope Array at Hat Creek Radio Observatory Contact: Graham Mackintosh,

IBM Backgrounder  The SETI Institute operates the Allen Telescope Array (ATA) to observe star systems for radio signals which may provide evidence of extraterrestrial intelligence.  IBM collaborating with the SETI Institute to use IBM Apache Spark services to analyze radio signal data from the ATA.  Astronomers and data scientists from around the world are using the environment to analyze millions of signal events collected from the ATA over 10 years.  Apache Spark with Jupyter Notebooks are proving to be highly effective in enabling experimental approaches to the analysis of the SETI Institute’s archive of signal data.  IBM is offering to make the entire environment available to Stanford CS341 students during the Spring 2016 quarter.

 Benefits of this proposal include:  Simple access to the Apache Spark platform to help expand the scope of CS341 projects beyond MapReduce computational methologies.  Use of the IBM Spark service at no charge, accessible on the IBM Cloud using a standard web browser.  Unlimited access to the ATA data archives, stored on the IBM cloud. for CS341

Students using for their CS341 projects will receive:  A free account on the IBM BlueMix preconfigured with Spark services and direct connections to the ATA data repository.  A “kick start” portfolio of Jupyter Notebooks that provide all the foundational Python code to query the ATA databases and read the recorded binary signals for analysis.  Access to a 200 million row relational database of ATA signal events  Access to approximately 15 million binary “complex amplitude” files that store the raw signal data from the ATA at the moment that a signal was detected  Support from the team for issues relating to platform usage (e.g. dropped Python kernels) CS341 Students and

 CS341 projects using should not restrict project goals strictly to identifying a signal of interest.  Much of the ongoing work in the initiative is focused on the inverse problem: that of improving signal classification methods to eliminate signals that are human radio frequency interference (RFI).  CS341 students may produce novel and valuable analytic results, which will be added to the repository of Spark notebooks for use by astronomers and data scientist from around the world.  Example: Igor Nikitin at the Fraunhofer Institute for Algorithms and Scientific Computing used ATA data to demonstrate how to apply the Radon transform to greatly improve the signal-to-noise ratio of narrow-band signals, thereby permitting the detection and classification of many new signals. CS341 Project Goals

Examples of potential CS341 student projects include:  Supervised machine learning of spectrogram images to classify signals according to known categories (e.g. radar interference, narrow-band zero drift, aircraft RFI).  Analysis of the 200 million signal event database for targets that show unusual consistency of signal features (e.g. consistent power) over long periods of time. Outliers could be reviewed by the radio telescope operations staff for potential re-observation.  Extracting scalar features from signal recordings with PCA and/or MDA results that indicate these features will help to spread and segment signals in useful ways for further analysis by scientists. CS341 - Examples