Calculating Word Frequency in a Document

  11/6( 四 ) 這個星期四小考, 5. Threaded Binary Tree 不考  11/15( 六 ) 10:10~12:00 期中考！

• About the extra-line problem
• >> version
    ifstream input(argv[1]);
    while (!input.eof() && input.peek() > 0) {
        input >> buf;
        cout << buf;
        input >> buf;
        input.get(); /* consume the '\n' character */
        cout << " " << buf << endl;
    }

• Getline version
    ifstream input(argv[1]);
    while (!input.eof()) {
        input.getline(buf, 500);
        if (input.gcount() > 0) /* check whether anything was actually read */
            cout << buf << endl;
    }
• Another one
    ifstream input(argv[1]);
    while (input.getline(buf, 500)) {
        cout << buf << endl;
    }

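• About the stray character in the output
  ◦ If a stray character showed up during the demo, you wrote '\0' (that is, 0) to the file.
  ◦ From now on, a demo program that outputs this character will not pass and will be counted as wrong.
• How to fix?
  ◦ The most common cause is writing a buffer/string to the file without computing its length correctly.
    int i; FILE* fw; const char *a = "123";
    fw = fopen(argv[1], "w");
    /* this does not output the stray character */
    for(i=0; i<3; i++) fprintf(fw, "%c", a[i]);
    /* this does output it: the fourth character is '\0' */
    for(i=0; i<4; i++) fprintf(fw, "%c", a[i]);
    fclose(fw);
  ◦ The string "123" is stored as 123\0.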

• For a make-up demo of project 1, first upload your code to ftp://mpc.cs.nctu.edu.tw and create a directory named after your student ID.
• Scores for the first demo:

• Input: a text file and a stop words list
  ◦ Using argc and argv
  ◦ ./a.out stopword textfile
• Output: pairs of a word and its number of occurrences
  ◦ To stdout (the screen)
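
A minimal sketch of the argument check implied by the command line above, assuming the program should refuse to run unless exactly two file names are given. The helper name parse_args and its out-parameters are hypothetical.

```cpp
#include <string>

// Hypothetical helper: expects exactly "./a.out stopword textfile".
// argv[0] is the program name, so a valid call has argc == 3.
bool parse_args(int argc, char* argv[],
                std::string& stop_path, std::string& text_path) {
    if (argc != 3) return false;
    stop_path = argv[1];  // first argument: stop words list
    text_path = argv[2];  // second argument: text file
    return true;
}
```

In main, a false return would typically lead to printing a usage message and exiting with a nonzero status.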

• Text file (without stop words)
    Hello, I'm Billy, not bi|ly or 6illy or b.
• Output
  ◦ Hello,: 1
  ◦ I'm: 1
  ◦ Billy,: 1
  ◦ not: 1
  ◦ bi|ly: 1
  ◦ or: 2
  ◦ 6illy: 1
  ◦ b.: 1

• Text file (same)
• Stop word list
  ◦ and
  ◦ not
  ◦ or
• Output
  ◦ Hello,: 1
  ◦ I'm: 1
  ◦ Billy,: 1
  ◦ bi|ly: 1
  ◦ 6illy: 1
  ◦ b.: 1

• Text file
  ◦ a b c d e f g h i j a b c d e
• Stop words list
  ◦ a b c d
• Output
  ◦ e:2 ; f:1 ; g:1 ; h:1 ; i:1 ; j:1

• Input
  ◦ Text file
    • Words are separated by ' ', '\t', or '\n'.
    • Case sensitive: Do and do are different words.
    • There are at most 2000 characters in one line.
    • There will be no Chinese input.
    • A text file may contain more than one line.
    • There might be consecutive '\t', ' ', or '\n' characters.
    • Program execution time is limited.
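
The separator rules above match how operator>> reads a std::string: it skips any run of spaces, tabs, and newlines and leaves case and punctuation untouched. A minimal sketch, with split_words as a hypothetical name:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into words on whitespace.
// Consecutive separators produce no empty words, and "Do" and "do"
// stay distinct because nothing is case-folded.
std::vector<std::string> split_words(const std::string& line) {
    std::vector<std::string> words;
    std::istringstream tokens(line);
    std::string word;
    while (tokens >> word) {
        words.push_back(word);
    }
    return words;
}
```

Punctuation stays attached to the word, so "Hello," and "Hello" would be counted separately, consistent with the sample output on the earlier slides.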

• Input
  ◦ Stop words list
    • One word per line
    • No space or '\t' within a line
    • No more than 2000 characters per line
• Correct
  ◦ Haha
  ◦ Hehe
  ◦ kerker
• Incorrect
  ◦ 囧 oo
  ◦ A b
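
These rules can be expressed as a small check on each line of the stop words list. This is a sketch under assumptions: the name is_valid_stop_word is hypothetical, an empty line is treated as invalid, and any byte outside ASCII is rejected on the assumption that the "no Chinese input" rule for the text file applies to the stop words list as well.

```cpp
#include <string>

// Hypothetical validator for one line of the stop words list.
bool is_valid_stop_word(const std::string& line) {
    // Assumption: an empty line is not a word; the limit is 2000 chars.
    if (line.empty() || line.size() > 2000) return false;
    for (unsigned char c : line) {
        // No space or tab inside a word; bytes >= 0x80 belong to
        // non-ASCII text such as Chinese (assumed to be disallowed).
        if (c == ' ' || c == '\t' || c >= 0x80) return false;
    }
    return true;
}
```

The test data is described as well formed, so a check like this is mainly useful while debugging your own stop words files.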

• Word occurrence
  ◦ String + ' ' + number + '\n'
      A 3
      B 5
• The order of the strings does not matter.
      B 5
      A 3

• You can use any data structure to store the (word, occurrence) pairs, such as an array. (Watch out for the large case.)
• For example, one array for the strings and another for the occurrence counts.
• Your data structure must be fast at insertion and at lookup (search).
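
The two-array layout mentioned above can be sketched as follows. The struct name ParallelCounts and its members are hypothetical. Lookup here is a linear scan over every distinct word seen so far, which is the cost that becomes a problem in the large case and the reason a faster structure (a hash table or a balanced tree, for instance) is worth considering.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical two-array store: words[i] pairs with counts[i].
struct ParallelCounts {
    std::vector<std::string> words;
    std::vector<long> counts;

    // Linear search: returns the index of w, or words.size() if absent.
    std::size_t find(const std::string& w) const {
        for (std::size_t i = 0; i < words.size(); ++i) {
            if (words[i] == w) return i;
        }
        return words.size();
    }

    // Increments the count of w, appending it with count 1 when new.
    void add(const std::string& w) {
        std::size_t i = find(w);
        if (i == words.size()) {
            words.push_back(w);
            counts.push_back(1);
        } else {
            ++counts[i];
        }
    }
};
```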

• We'll use a program to judge your homework.
  ◦ Please take care with the I/O format.
• You cannot read the whole file at one time.
  ◦ You may read at most one line at a time.
• We'll release some test data.
• Due: 11/21
• Your bonus will depend on the efficiency of your program.

• Large case
  ◦ A lot of different words (more than )
  ◦ A lot of words in a text file
  ◦ 30%
  ◦ One of them will be released
• 10% per test case
• We will release 2 normal test cases and 1 large test case for testing.

• A simple algorithm
• Assume STOPWORD has N words and TEXTFILE has M words.
• We build SW_LIST to store the stop words and TXT_LIST to store the text file words.

• Read in STOPWORD, store it as SW_LIST                    O(N)
• foreach ( word read from TEXTFILE )                       O(M) iterations
  {
      if ( the word is in SW_LIST )                         O(N) per lookup
          continue to read another word
      else ( the word is not in SW_LIST )
          if ( the word is in TXT_LIST )
              add 1 to the count of the word
          else ( the word is not in TXT_LIST )
              insert the word into TXT_LIST with count 1
  }
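
The pseudocode above can be sketched in C++ as follows. This is not the reference solution: it swaps the plain lists for std::unordered_set and std::unordered_map (an assumption made here), so each lookup takes expected constant time instead of a scan over the list. The function name count_words is hypothetical, and it takes streams so it can be tried in memory. The text is read one line at a time, in keeping with the rule against reading the whole file at once.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical sketch of the algorithm with hash containers.
std::unordered_map<std::string, long> count_words(std::istream& stop_in,
                                                  std::istream& text_in) {
    // Read in STOPWORD, store it as SW_LIST (one word per line).
    std::unordered_set<std::string> sw_list;
    std::string line;
    while (std::getline(stop_in, line)) {
        if (!line.empty()) sw_list.insert(line);
    }

    // TXT_LIST maps each non-stop word to its number of occurrences.
    std::unordered_map<std::string, long> txt_list;
    while (std::getline(text_in, line)) {   // at most one line at a time
        std::istringstream tokens(line);
        std::string word;
        while (tokens >> word) {            // split on ' ' and '\t'
            if (sw_list.count(word) != 0) continue;  // word is a stop word
            ++txt_list[word];  // inserts with 0 first if absent, then adds 1
        }
    }
    return txt_list;
}
```

On the a b c d ... example from the earlier slide, this yields e with count 2 and f through j with count 1 each.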

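• Faster programs on this assignment will earn a bonus.
• Everyone's programs will be run on a certain mystery workstation to see which are fast and which are slow.
• If you have doubts about the fairness of the bonus, raise them before class on Thu 11/6.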

• First create a folder named after your student ID on ftp://mpc.cs.nctu.edu.tw.
• Upload C/C++ source files that compile and run to ftp://mpc.cs.nctu.edu.tw.

• Any questions?