Einführung in Web- und Data-Science

Slides:



Advertisements
Similar presentations
Tests of Static Asset Pricing Models
Advertisements

Statistical Inference Statistical Inference is the process of making judgments about a population based on properties of the sample Statistical Inference.
Logistic Regression.
Huge Raw Data Cleaning Data Condensation Dimensionality Reduction Data Wrapping/ Description Machine Learning Classification Clustering Rule Generation.
Machine Learning CMPT 726 Simon Fraser University CHAPTER 1: INTRODUCTION.
CSCI 3 Introduction to Computer Science. CSCI 3 Course Description: –An overview of the fundamentals of computer science. Topics covered include number.
July 3, A36 Theory of Statistics Course within the Master’s program in Statistics and Data mining Fall semester 2011.
Bootstrap spatobotp ttaoospbr Hesterberger & Moore, chapter 16 1.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 12 Analyzing the Association Between Quantitative Variables: Regression Analysis Section.
ACCURATE TELEMONITORING OF PARKINSON’S DISEASE SYMPTOM SEVERITY USING SPEECH SIGNALS Schematic representation of the UPDRS estimation process Athanasios.
Hypothesis testing. Classical hypothesis testing is a statistical method that appeared in the first third of the 20 th Century, alongside the “modern”
Tests of significance & hypothesis testing Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics.
Artificial Intelligence: Its Roots and Scope
Tennessee Technological University1 The Scientific Importance of Big Data Xia Li Tennessee Technological University.
Service Teaching, a Drop-in Help Centre and Statistics CDC Steele University of Manchester.
Artificial Intelligence
EE325 Introductory Econometrics1 Welcome to EE325 Introductory Econometrics Introduction Why study Econometrics? What is Econometrics? Methodology of Econometrics.
Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented By Vinay Hoskere.
Using Bayesian Networks to Analyze Whole-Genome Expression Data Nir Friedman Iftach Nachman Dana Pe’er Institute of Computer Science, The Hebrew University.
CS Fall 2015 (© Jude Shavlik), Lecture 7, Week 3
Knowledge Representation of Statistic Domain For CBR Application Supervisor : Dr. Aslina Saad Dr. Mashitoh Hashim PM Dr. Nor Hasbiah Ubaidullah.
1 CS 385 Fall 2006 Chapter 1 AI: Early History and Applications.
Computer Science 210 Computer Organization Course Introduction.
Question paper 1997.
Inferential Statistics. The Logic of Inferential Statistics Makes inferences about a population from a sample Makes inferences about a population from.
ICORD, Rome Promises and risks of Bayesian analyses in trials of rare diseases Dr Simon Day
"Classical" Inference. Two simple inference scenarios Question 1: Are we in world A or world B?
STATISTICS AND OPTIMIZATION Dr. Asawer A. Alwasiti.
1 CSI5388 Practical Recommendations. 2 Context for our Recommendations I This discussion will take place in the context of the following three questions:
Medical Statistics Medical Statistics Tao Yuchun Tao Yuchun 9
Impact of the New ASA Undergraduate Curriculum Guidelines on the Hiring of Future Undergraduates Robert Vierkant Mayo Clinic, Rochester, MN.
Stats Term Test 4 Solutions. c) d) An alternative solution is to use the probability mass function and.
Prof. Roman Matousek Dr. Thao Nguyen Dr. Chris Stewart
Knowledge Representation
Measurement, Quantification and Analysis
By Arijit Chatterjee Dr
TRD 3 Bayesian engine for reproducible NMR computation
Modern Likelihood-Frequentist Inference
Lecture8 Test forcomparison of proportion
Chapter 1 The Science of Biology.
Rocky K. C. Chang September 4, 2017
Biostatistics 760 Random Thoughts.
Spring 2003 Dr. Susan Bridges
Biostatistics?.
Biostatistics 760 Random Thoughts.
Data Science Process Chapter 2 Rich's Training 11/13/2018.
What is Pattern Recognition?
Optimization Techniques for Natural Resources SEFS 540 / ESRM 490 B
CSCI 5822 Probabilistic Models of Human and Machine Learning
Statistics Branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. Practice or science of.
Introductory Econometrics
STAT120C: Final Review.
CSE 515 Statistical Methods in Computer Science
בדיקת השערות - שלב 1: השערות
(or why should we learn this stuff?)
Lecture 4: Econometric Foundations
TIGP: Computing biology II - Analysis of Categorical Data
Rick Hoyle Duke Dept. of Psychology & Neuroscience
MA171 Introduction to Probability and Statistics
Introduction to Artificial Intelligence Instructor: Dr. Eduardo Urbina
Logistic Regression.
Information Retrieval and Web Design
Why Study MCA. What is MCA? Master of Computer Applications (MCA) is a three-year (six semesters) professional Master's Degree Course in India. The course.
k-sample problems, k>2
Statistical Power.
Fractional-Random-Weight Bootstrap
Detecting Treatment by Biomarker Interaction with Binary Endpoints
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 7
Computer Science 210 Computer Organization
Presentation transcript:

Einführung in Web- und Data-Science Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme Tanya Braun (Übungen)

Statistics and Data Science [CASI 2017, p. 446 ff.] Statistical Inference: Deals with the why Mathematical foundations Algorithmics & Data Science as CASI sees it: Deals with the how Just pragmatism? ( side blow at computer science) Decision problems clearly identified in computer science w.r.t. semantics of representation formalisms Correctness of algorithms (the why) is very well an issue in computer science (and data science as subfield) Tractability issues added by CS

From Statistical Inference to Data Science…

From Statistical Inference to Data Science 1900 Karl Pearson’s chi-square paper Applied a new mathematical tool, matrix theory, in the service of statistical methodology. Pearson and Weldon went on to found Biometrika in 1901, the first recognizably modern statistics journal. Pearson’s paper, and Biometrika, launched the statistics discipline on a fifty-year march toward the mathematics pole of the triangle

From Statistical Inference to Data Science 1908 Student’s t statistic Crucial first result in small-sample “exact” inference Major influence on statistical thinking

From Statistical Inference to Data Science 1925 Fisher’s estimation paper Fundamental ideas: sufficiency, efficiency, Fisher information, maximum likelihood theory, and the notion of optimal estimation Optimality is a mark of maturity in mathematics, … ... making 1925 the year statistical inference went from a collection of ingenious techniques to a coherent discipline

From Statistical Inference to Data Science 1933 Neyman and Pearson’s paper on optimal hypothesis testing. Logical completion of Fisher’s program, it nevertheless aroused his strong antipathy (concern that mathematization was squeezing intuitive correctness out of statistical thinking)

From Statistical Inference to Data Science 1937 Neyman’s seminal paper on confidence intervals Mathematical treatment of statistical inference was a predecessor of decision theory 1950 Wald’s Statistical Decision Functions Decision theory completed the full mathematization of statistical inference

From Statistical Inference to Data Science 1962 Tukey’s paper “The future of data analysis” argued for a more application- and computation-oriented discipline Mosteller and Tukey later suggested changing the field’s name to data analysis, a prescient hint of today’s data science 1972 Cox’s proportional hazards paper Growing interest in biostatistical applications and particularly survival analysis

From Statistical Inference to Data Science 1979 The bootstrap, and later the widespread use of MCMC Electronic computation used for the extension of classic statistical inference.

From Statistical Inference to Data Science 1995 This stands for false-discovery rates and, a year later, the lasso Both are computer-intensive algorithms, firmly rooted in the ethos of statistical inference

From Statistical Inference to Data Science 2000 Microarray technology inspires enormous interest in large-scale inference, both in theory and as applied to the analysis of microbiological data. 2001 Random forests Joins boosting and the resurgence of neural nets in the ranks of machine learning prediction algorithms

From Statistical Inference to Data Science Data science: a more popular successor to Tukey and Mosteller’s “data analysis” At one extreme it seems to represent a statistics discipline without parametric probability models or formal inference. Data Science Association defines a practitioner as one who “. . . uses scientific methods to liberate and create meaning from raw data” In practice the emphasis is on algorithmic processing of large data sets for the extraction of useful information, with prediction algorithms as exemplars

From Statistical Inference to Data Science 2016b This represents the traditional line of statistical thinking, but now energized with a renewed focus on applications Of particular applied interest are biology and genetics Genome-wide association studies (GWAS) show a different face of big data. Prediction is important here, but not sufficient for the scientific understanding of disease

Computer Science and Data Science Since 1950 Logic Probability Theory Representation and Query Language Databases Algorithms and Data Structures Programming Systems (HW/SW) … Computer Science Efficient algorithms and suitable data structures Application Mathematics Computation (Aufgaben als Optimierungsprobleme dargestellt)