Week 6 Discussion Word Cloud.


Word Cloud
Steps to completing Word Cloud:
1. Read in the command line args
2. Parse the common.txt file
3. Parse the input file and record word frequencies
4. Account for "common" words
5. Sort the list
6. Print the output

String[] args
Have you ever wondered what the formal parameter "String[] args" in the main method does?
EX: public static void main(String[] args)
This array holds the String values that the user passed in from the command line, allowing us to get user input without having to use a Scanner.
java WordCloud poems.txt 10
In this case:
args[0] is the filename to read from (poems.txt)
args[1] is the number of words to print (10)

Command Line Arguments
These are arguments that can be passed in via the command line. WordCloud will take two command line arguments: the input file and the number of words to print out.
java WordCloud input_file.txt 10
This will print out the 10 most common words in input_file.txt.
These arguments are stored in an array of Strings, which is passed into the main method as String[] args.
args[0] = "input_file.txt", args[1] = "10"
NOTE THAT THE NUMBER IS STORED AS A STRING.

C.L.A. for WordCloud
WordCloud expects 2 command line arguments: the filename and the # of words. You should save these values in local variables in order to use them. The filename should be saved as a String, but the #words should be saved as an integer.
But wait... all command line arguments arrive in a String array. Get the numeric value of the #words by using the Integer.parseInt() method.
EX: String str = "45"; int i = Integer.parseInt(str); // i = 45
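The argument handling described above could be sketched as follows. This is a minimal illustration, not the assignment's required structure: the class name ArgsSketch, the helper parseWordCount, and the choice to return -1 for unusable input are all assumptions made for this example.

```java
class ArgsSketch {
    // Hypothetical helper: returns the number in args[1],
    // or -1 if there are not exactly two arguments or args[1] is not an integer.
    static int parseWordCount(String[] args) {
        if (args.length != 2) {
            return -1;
        }
        try {
            return Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            // args[1] was something like "ten" instead of "10"
            return -1;
        }
    }

    public static void main(String[] args) {
        int numWords = parseWordCount(args);
        if (numWords < 0) {
            System.err.println("Usage: java WordCloud input_file.txt 10");
            return;
        }
        String filename = args[0]; // the filename stays a String
        System.out.println("Reading " + filename + ", printing " + numWords + " words");
    }
}
```

Integer.parseInt() throws a NumberFormatException when the text is not a valid integer, so catching it lets the program print a usage message instead of crashing.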

Parsing the Input File
By now, you should all be familiar with using filestreams/Scanners. This assignment requires reading in and parsing two files, one of which will be passed in from the command line. The other file will always be common.txt, which is located in the directory: /home/linux/ieng6/cs11wb/public/HW6/
Scanner scnr = new Scanner(new File("___insert file name here___"));
This must be in a try/catch block (or in a method declared with throws FileNotFoundException), or it will not compile.
Use Scanner.next() to get each String delimited by whitespace.
Use Scanner.nextLine() to get each String delimited by a newline.
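A minimal sketch of opening a file with a Scanner inside a try/catch block is shown here. The class name ScannerSketch and the helper readTokens are hypothetical names chosen for this example; the helper takes a Scanner so the same loop works for either file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerSketch {
    // Hypothetical helper: collects every whitespace-delimited token from the Scanner.
    static List<String> readTokens(Scanner scnr) {
        List<String> tokens = new ArrayList<String>();
        while (scnr.hasNext()) {
            tokens.add(scnr.next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Expected a filename");
            return;
        }
        // new Scanner(File) can throw FileNotFoundException, a checked exception,
        // so the compiler requires it to be caught or declared.
        try (Scanner scnr = new Scanner(new File(args[0]))) {
            System.out.println(readTokens(scnr));
        } catch (FileNotFoundException e) {
            System.err.println("Could not open " + args[0]);
        }
    }
}
```

The try-with-resources form also closes the Scanner automatically once the block finishes.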

Formatting the String
Scanner.next() reads in every character until it reaches whitespace (spaces, tabs, newlines), so it will also read in punctuation. For our program, we don't want to keep anything that is not an alphabetic character. Also, we only want to use lowercase letters. The following methods found in the String class will be helpful for formatting:
replaceAll()
toLowerCase()

replaceAll() and toLowerCase()
replaceAll(String s1, String s2) - returns a new String in which every match of the regular expression s1 is replaced with s2
String str = "ToDAY!?/@$";
String str2 = str.replaceAll("[^a-zA-Z]", ""); // str2 = "ToDAY"
toLowerCase() - takes no arguments and returns a copy of the String with all letters changed to lowercase
String str3 = str2.toLowerCase(); // str3 = "today"
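The two calls can be chained into one small helper. This is only a sketch; the class name WordCleaner and the method name clean are made up for this example.

```java
class WordCleaner {
    // Hypothetical helper: strips every non-letter, then lowercases what is left.
    static String clean(String raw) {
        return raw.replaceAll("[^a-zA-Z]", "").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(clean("ToDAY!?/@$")); // prints "today"
    }
}
```

A token made only of punctuation or digits comes back as an empty String, so the calling code would probably want to skip empty results before counting them.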

Ignoring "Common" Words
There are 3 ways your program can ignore words found in common.txt:
1. Ignore common words when you are scanning the input file - RECOMMENDED
2. Remove all common words from the HashMap after you scan the input file
3. Don't print them

common.txt
In this program, we will be ignoring "common" words (like "the", "an", "is", etc.). We have given you a file named "common.txt" in the public directory which contains a list of all these words. We must therefore read through "common.txt" and save all the words in a data structure:
ArrayList
HashSet - RECOMMENDED
Remember, we want to go for SPEED.

HashSets
What are HashSets? HashSets are a data structure similar to ArrayLists and arrays. There are a few differences, however.
Each element of a HashSet is known as a KEY.
A HashSet has no order. You cannot index into it like you would an array, and a for-each loop visits its elements in no guaranteed order.
HashSets are really fast. Checking whether an element is present takes O(1) time on average.

Reading/Storing "common.txt"
We only care about whether a word EXISTS, and not about anything else. Therefore use a HashSet:
add() to add elements
contains() to check if an element exists
HashSet<String> common = new HashSet<String>();
Go through each word in "common.txt" and store it in the HashSet.
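Loading the common words into a HashSet might look like the sketch below. The names CommonWordsSketch and loadCommon are hypothetical, and lowercasing each word is an assumption made here so that lookups match the lowercased input words; the slides do not say how common.txt is cased. The demo reads from a String rather than the real file.

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class CommonWordsSketch {
    // Hypothetical helper: adds every token from the Scanner to a HashSet.
    static Set<String> loadCommon(Scanner scnr) {
        Set<String> common = new HashSet<String>();
        while (scnr.hasNext()) {
            // Assumption: lowercase so lookups match the cleaned input words.
            common.add(scnr.next().toLowerCase());
        }
        return common;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of common.txt.
        Set<String> common = loadCommon(new Scanner("the an is"));
        System.out.println(common.contains("the"));  // true
        System.out.println(common.contains("poem")); // false
    }
}
```

Adding a word that is already present leaves the set unchanged, so duplicates in the file do no harm.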

Reading/Storing Poems
You will need to store both the word you parsed from the file AND the number of times that word appeared in the file. We recommend putting this logic in its own method.
You can use two arrays or ArrayLists (one for words, one for frequencies), but it is very annoying to retrieve a word's frequency from the other list.
We recommend you use a HashMap for this. (You could also use an array or ArrayList of Pairs. However, we will be discussing HashMaps today.)
Info on HashMaps can be found at https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html

HashMap, Keys, and Values
What are HashMaps? HashMaps are a data structure similar to ArrayLists and arrays. There are a few differences, however.
Each element of a HashMap is a pair containing a Key and a Value.
A HashMap has no order. You cannot index into it like you would an array, and iterating over its keys gives no guaranteed order.
HashMaps are really fast. Looking up the value for a key takes O(1) time on average.

Using HashMaps
You can create a HashMap using the constructor like so:
HashMap<String, Integer> map = new HashMap<String, Integer>();
Integer num = new Integer(42); // values are Integer objects; a plain int such as 42 is converted automatically
To put things into the map, you can use the put() method. If you wanted to store the pair ("Blaze", 42) into the map, it would look like:
map.put("Blaze", 42); // this stores the pair in map
You can later change the value that a key is associated with using put() as well.
map.put("Blaze", 420); // now "Blaze" is mapped to 420

Accessing Elements of the HashMap
The most important part of HashMaps is getting back the value that is associated with a key. For example, let's say we want to get the value for the key "Blaze".
int i = map.get("Blaze"); // i = 420, the value "Blaze" is currently mapped to
You can also check to see if a key object is already in the map.
boolean b = map.containsKey("Blaze"); // b = true
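The put(), get(), and containsKey() calls can be tried together in a short sketch. The class name HashMapBasics and the method buildDemoMap are hypothetical names used only for this example.

```java
import java.util.HashMap;

class HashMapBasics {
    // Hypothetical helper: builds the small map used on the slides.
    static HashMap<String, Integer> buildDemoMap() {
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        map.put("Blaze", 42);  // stores the pair ("Blaze", 42)
        map.put("Blaze", 420); // same key, so the old value is replaced
        return map;
    }

    public static void main(String[] args) {
        HashMap<String, Integer> map = buildDemoMap();
        System.out.println(map.get("Blaze"));           // 420
        System.out.println(map.containsKey("Blaze"));   // true
        System.out.println(map.containsKey("Missing")); // false
        System.out.println(map.get("Missing"));         // null
    }
}
```

Because get() returns null for a key that is not in the map, assigning that result straight to an int would throw a NullPointerException. Checking containsKey() first avoids this.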

HashMaps in WordCloud
In order to associate each word with its frequency, we recommend you all use HashMaps. Not only will they make your program faster, but they will also make it easier to write.
Your HashMap should map a String to an Integer (i.e., it should look like: HashMap<String, Integer>; a primitive int is not allowed as a type parameter). Make it a class variable so all your methods will have access to it.
As you are reading in from the input file, check if the current word you scanned is already in the map via the containsKey() method.
If it isn't in the map, put it in there! Its frequency should be 1, since it's the first occurrence of that word in the file.
If it is already in the map, then you want to update the frequency. Increase the value that the word is associated with by 1.
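The counting logic above, combined with the recommended strategy of skipping common words while scanning, could be sketched like this. The names FrequencySketch and countWords are hypothetical, and the map is returned from the method rather than stored in a class variable purely to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

class FrequencySketch {
    // Hypothetical helper: counts each cleaned word that is not in the common set.
    static Map<String, Integer> countWords(Scanner scnr, Set<String> common) {
        Map<String, Integer> freq = new HashMap<String, Integer>();
        while (scnr.hasNext()) {
            String word = scnr.next().replaceAll("[^a-zA-Z]", "").toLowerCase();
            if (word.isEmpty() || common.contains(word)) {
                continue; // nothing left after cleaning, or a common word
            }
            if (freq.containsKey(word)) {
                freq.put(word, freq.get(word) + 1); // seen before: add 1
            } else {
                freq.put(word, 1); // first occurrence
            }
        }
        return freq;
    }
}
```

For example, scanning the text "The rose is a rose" with a common set containing "the", "is", and "a" leaves a single entry mapping "rose" to 2.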

Printing Output
map.keySet() will give you a Set object containing all of the keys (Strings). Use this Set to construct an ArrayList to hold all the Strings. Then:
1. Iterate through this ArrayList to find out which word has the highest frequency (you can check this using your map).
2. Print the most frequent word and its frequency. Then remove this word from the ArrayList.
Repeat steps 1 and 2 #words times (#words is the command line argument passed in).

Pseudo-code
ArrayList<String> words = new ArrayList<String>(map.keySet());
for (i = 0 to #words) {
    int maxFrequency = 0;
    String mostFrequent = null;
    for (each word in the array list named words) {
        if (word isn't common && frequency of word > maxFrequency) {
            maxFrequency = frequency of word;
            mostFrequent = current word;
        }
    }
    print mostFrequent and maxFrequency
    remove mostFrequent from the array list words
}
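One way the pseudo-code might translate into Java is sketched below. It assumes common words were already skipped while building the map, so the "isn't common" check is omitted. The names TopWordsSketch and topWords are hypothetical, and the method returns the output lines instead of printing them so the result is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TopWordsSketch {
    // Hypothetical helper: repeatedly picks the most frequent remaining word.
    static List<String> topWords(Map<String, Integer> map, int numWords) {
        List<String> words = new ArrayList<String>(map.keySet());
        List<String> lines = new ArrayList<String>();
        // Stop early if the file had fewer distinct words than requested.
        for (int i = 0; i < numWords && !words.isEmpty(); i++) {
            int maxFrequency = 0;
            String mostFrequent = null;
            for (String word : words) {
                int frequency = map.get(word);
                if (frequency > maxFrequency) {
                    maxFrequency = frequency;
                    mostFrequent = word;
                }
            }
            lines.add(mostFrequent + " " + maxFrequency);
            words.remove(mostFrequent); // so the next pass finds the runner-up
        }
        return lines;
    }
}
```

When two words have the same frequency, which one is picked first depends on the HashMap's iteration order, which is not guaranteed.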

Runtime of Other Solutions
The printing algorithm we gave you is O(n^2). Can you make it faster?
HINT: Most sorts take O(n log n). Look up binary sort/merge sort.
ALTERNATIVE SOLUTION: Create a class that has custom sorting.
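A sketch of the sorting alternative follows, using a Comparator with Collections.sort rather than a hand-written sort. The names SortedWordsSketch and sortByFrequency are hypothetical, and breaking ties alphabetically is an assumption made here only to give a predictable order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class SortedWordsSketch {
    // Hypothetical helper: returns the map's entries, most frequent first.
    static List<Map.Entry<String, Integer>> sortByFrequency(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                // Higher frequency first.
                int byCount = b.getValue().compareTo(a.getValue());
                if (byCount != 0) {
                    return byCount;
                }
                // Assumption: alphabetical order for equal frequencies.
                return a.getKey().compareTo(b.getKey());
            }
        });
        return entries;
    }
}
```

After one O(n log n) sort, printing the top #words entries is a simple loop over the front of the list.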

Style: Split Your Logic into Methods
Code looks nicer when it's readable. Please help make it readable by using methods. Write separate methods for printing the words, removing the common words, parsing the input file, parsing common.txt, etc.

Style: Other Guidelines (again)
Have class/file/method headers (method headers MUST be in javadoc format; look it up).
Have inline comments (comments explaining how your code works).
Have meaningful variable names (don't just name them all a, b, c, d...).
Don't use "magic numbers" (use constants; if you're not sure whether a number is magic, make it a constant anyway).
Have logical blank space separators (space out your code so it's readable).
Don't go over 80 characters in ANY line of ANY FILE.
Indent properly (use curly braces to make sure your indentations are correct).

Questions?