CS324e - Elements of Graphics and Visualization Java Intro / Review.

Slides:



Advertisements
Similar presentations
The Maximum Likelihood Method
Advertisements

Experiments and Variables
Lecture (11,12) Parameter Estimation of PDF and Fitting a Distribution Function.
CS324e - Elements of Graphics and Visualization A Little Java A Little Python.
PrasadL07IndexCompression1 Index Compression Adapted from Lectures by Prabhakar Raghavan (Yahoo, Stanford) and Christopher Manning.
Intelligent Information Retrieval 1 Vector Space Model for IR: Implementation Notes CSC 575 Intelligent Information Retrieval These notes are based, in.
1 1.Protein structure study via residue environment – Residues Solvent Accessibility Environment in Globins Protein Family 2.Statistical linguistic study.
CSci 143 Sets & Maps Adapted from Marty Stepp, University of Washington
Liang, Introduction to Java Programming, Sixth Edition, (c) 2007 Pearson Education, Inc. All rights reserved L16 (Chapter 22) Java Collections.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
Tirgul 8 Universal Hashing Remarks on Programming Exercise 1 Solution to question 2 in theoretical homework 2.
Review of Matrix Algebra
CSE331: Introduction to Networks and Security Lecture 17 Fall 2002.
IN350: Text properties, Zipf’s Law,and Heap’s Law. Judith A. Molka-Danielsen September 12, 2002 Notes are based on Chapter 6 of the Article Collection.
Statistics 800: Quantitative Business Analysis for Decision Making Measures of Locations and Variability.
Chapter 2 – Simple Linear Regression - How. Here is a perfect scenario of what we want reality to look like for simple linear regression. Our two variables.
CSE 143 Lecture 7 Sets and Maps reading: ; 13.2 slides created by Marty Stepp
Abstract Data Types (ADTs) Data Structures The Java Collections API
Cryptography Programming Lab
Dr. Serhat Eren DESCRIPTIVE STATISTICS FOR GROUPED DATA If there were 30 observations of weekly sales then you had all 30 numbers available to you.
Linear Trend Lines Y t = b 0 + b 1 X t Where Y t is the dependent variable being forecasted X t is the independent variable being used to explain Y. In.
Graphical Analysis. Why Graph Data? Graphical methods Require very little training Easy to use Massive amounts of data can be presented more readily Can.
Chapter 8: Regression Analysis PowerPoint Slides Prepared By: Alan Olinsky Bryant University Management Science: The Art of Modeling with Spreadsheets,
Chapter 16: Time-Series Analysis
1 What Is Forecasting? Sales will be $200 Million!
Information Retrieval and Web Search Text properties (Note: some of the slides in this set have been adapted from the course taught by Prof. James Allan.
CSE 143 Lecture 11 Maps Grammars slides created by Alyssa Harding
Announcements: Please pass in Assignment 1 now. Please pass in Assignment 1 now. Assignment 2 posted (when due?) Assignment 2 posted (when due?)Questions?
CSS446 Spring 2014 Nan Wang.  Java Collection Framework ◦ Set ◦ Map 2.
Exploring Text: Zipf’s Law and Heaps’ Law. (a) (b) (a) Distribution of sorted word frequencies (Zipf’s law) (b) Distribution of size of the vocabulary.
Building Java Programs Chapter 11 Lecture 11-1: Sets and Maps reading:
Chapter 18 Java Collections Framework
By: Amani Albraikan.  Pearson r  Spearman rho  Linearity  Range restrictions  Outliers  Beware of spurious correlations….take care in interpretation.
Computer Science 209 Software Development Java Collections.
Time series Decomposition Farideh Dehkordi-Vakil.
CSC 211 Data Structures Lecture 13
GG 313 Geological Data Analysis Lecture 13 Solution of Simultaneous Equations October 4, 2005.
CSE 143 Lecture 14 (B) Maps and Grammars reading: 11.3 slides created by Marty Stepp
C.Watterscsci64031 Term Frequency and IR. C.Watterscsci64032 What is a good index term Occurs only in some documents ??? –Too often – get whole doc set.
Hashing Hashing is another method for sorting and searching data.
Investigating Patterns Cornell Notes & Additional Activities.
Exploring Text: Zipf’s Law and Heaps’ Law. (a) (b) (a) Distribution of sorted word frequencies (Zipf’s law) (b) Distribution of size of the vocabulary.
Statistical Properties of Text
+ Chapter 5 Overview 5.1 Introducing Probability 5.2 Combining Events 5.3 Conditional Probability 5.4 Counting Methods 1.
1 Chapter 4, Part 1 Basic ideas of Probability Relative Frequency, Classical Probability Compound Events, The Addition Rule Disjoint Events.
Dynamics of Binary Search Trees under batch insertions and deletions with duplicates ╛ BACKGROUND The complexity of many operations on Binary Search Trees.
Copyright © Curt Hill Other Trees Applications of the Tree Structure.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc Chapter 17 Simple Linear Regression and Correlation.
CPS 100e 5.1 Inheritance and Interfaces l Inheritance models an "is-a" relationship  A dog is a mammal, an ArrayList is a List, a square is a shape, …
CSE 143 Lecture 11: Sets and Maps reading:
Chapter 14 Introduction to Regression Analysis. Objectives Regression Analysis Uses of Regression Analysis Method of Least Squares Difference between.
Lecturer: Ing. Martina Hanová, PhD.. Regression analysis Regression analysis is a tool for analyzing relationships between financial variables:  Identify.
Chapter 4 More on Two-Variable Data. Four Corners Play a game of four corners, selecting the corner each time by rolling a die Collect the data in a table.
AP Statistics From Randomness to Probability Chapter 14.
AP National Conference, AP CS A and AB: New/Experienced A Tall Order? Mark Stehlik
Plan for Today’s Lecture(s)
STATISTICAL ORBIT DETERMINATION Kalman (sequential) filter
BAE 5333 Applied Water Resources Statistics
Linear Regression.
This Week Review of estimation and hypothesis testing
Road Map CS Concepts Data Structures Java Language Java Collections
Building Java Programs
Quantifying the Structure of Language and Music
6-1 Introduction To Empirical Models
Maps Grammars Based on slides by Marty Stepp & Alyssa Harding
LING/C SC/PSYC 438/538 Lecture 12 Sandiway Fong.
The Language of Physics
Building Java Programs
slides created by Alyssa Harding
Hash Maps Implementation and Applications
Presentation transcript:

CS324e - Elements of Graphics and Visualization Java Intro / Review

A1 Demo Demo of A1 expected behavior Crack a substitution cipher assumes only letters encrypted and assumes upper and lower case substitutions the same initial key based on standard frequencies allow changes to be made

Java Intro / Review Instead of going over syntax of language we will write a program to solve a non trivial problem and discuss the syntax and semantics as we go

Zipf's Law Empirical observation - word frequency Named after George Zipf, a linguist Zipf's Law: The frequency of a word is inversely proportional to its rank among all words in the body of work

Zipf's Law Example Assume the is the most frequent word in a text and it occurs 10,000 times 2 nd most frequent word expected to occur 5,000 times (if top ranked word's frequency is as expected) ½ * 10,000 = 5,000 3 rd most frequent word expected to occur 3,333 times 1/3 * 10,000 = 3,333 Expected number of occurrences of 100 th most frequent word?

Zipf's Law Out of a work with N distinct words, the predicated probability of the word with rank k is: s is constant based on distribution. In classic version of Zipf's law s = 1

Zipf's Law Assume 35,000 words – N = 35,000 assume s = 1 35,000 th harmonic number is about 11 expected frequency of 10 th word, k = 10 Assume 1,000,000 words 1,000,000 / 10 / 11 = 9,090

Alternate Formula Probability of a given word being the word with rank r R = number of distinct words Multiply by total number of words in word to get expected number of words

Approach Read "words" from a file determine frequency of each word sort words by frequency Compare actual frequency to expected frequency – many ways to define expected frequency – freq * rank = constant – estimate constant, simple – or use formulas

Java Program Eclipse IDE Create Project Create Class(es) – procedural approach – object based approach – object oriented approach

Calculating Frequencies Reading from a file – Scanner class – built in classes – documentation – exceptions Try reading into native array Try reading into ArrayList – show some of "words" better delimiter: "[^a-zA-Z']+" – regular expressions

Calculate Frequencies Don't need to store multiple copies of every word Just the number of times a given word appears Another class / data structure is useful – A Map, aka a Dictionary – key, value pairs – HashMap or TreeMap, order of keys

Using the Map Read in words, count frequencies – "wrapper" classes Read in and print out some of the map TreeMap – ordered by keys HashMap – seemingly Random order We want sorted by frequency – why can't we use another map?

Sorting by Frequency Create another class, WordPair Have the class implement the Comparable interface – define compareTo method – 2 objects / variables involved Add to ArrayList, use Collections.sort Now list start of ArrayList

Does Zipf's Law Hold? plot rank vs. frequency on a log - log scale – should be a near straight line recall freq * rank = constant Estimate constant – simple average of first 1000 terms? – simple average of all words with freq > 10? – Simple linear regression, best fit line to log - log plot

Viewing Results Compare predicted frequency and actual frequency of top 100 words and % error