GOOGLE N-GRAMS ON AMAZON WEB SERVICES PART 2
Thomas Tiahrt, MA, PhD
Computer Science 482 – Introduction to Text Analytics

Google Books N-Grams
• n-gram viewer: http://books.google.com/ngrams/info
• n-gram datasets: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html

File Format for Google’s N-Grams
• Data is compressed
• Fields are separated by tabs ('\t')
• One record per line; a newline character ('\n') ends each record
• The N-gram field holds a 1-gram, 2-gram, 3-gram, 4-gram, or 5-gram
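The format rules above can be illustrated with a small Python sketch that reads records one line at a time and splits each line into its tab-separated fields. It assumes the files are gzip-compressed (the slide does not name the codec) and that the words inside an n-gram are separated by single spaces; the helper names `open_ngram_file`, `split_record`, `iter_records`, and `ngram_order` are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List, TextIO


def open_ngram_file(path: str) -> TextIO:
    """Open a compressed n-gram file for reading as text.

    Assumption: the file is gzip-compressed; other codecs need a different opener.
    """
    return gzip.open(path, mode="rt", encoding="utf-8")


def split_record(line: str) -> List[str]:
    """Split one record into fields: drop the newline that ends it, then split on tabs."""
    return line.rstrip("\n").split("\t")


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the list of fields for each record (one record per line)."""
    for line in lines:
        if line.strip():  # ignore blank lines, if any
            yield split_record(line)


def ngram_order(ngram: str) -> int:
    """Return n for an n-gram field: 1 for a 1-gram, up to 5 for a 5-gram.

    Assumption: words inside the n-gram are separated by single spaces.
    """
    return len(ngram.split(" "))
```

Inside a `with open_ngram_file(path) as handle:` block, `iter_records(handle)` streams one list of field strings per record, so even a very large file never has to be held in memory at once. Keeping the line splitting separate from file opening also lets the same parser run on already-decompressed input.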

Version 2
• Data created July 2012
• Version 2 file format:
  N-gram \t year \t match_count \t volume_count \n
• N-gram: the 1-gram, 2-gram, 3-gram, 4-gram, or 5-gram itself
• year: publication year
• match_count: number of times the n-gram occurred in that year
• volume_count: number of books in which the n-gram occurred in that year
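A version 2 record can be turned into a typed value by splitting on tabs and converting the three numeric fields to integers. The sketch below also sums `match_count` over all years for each n-gram, one common first aggregation. The `NgramRecord` type and the helpers `parse_v2` and `total_matches` are hypothetical names, and the input is assumed to be already-decompressed text lines.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class NgramRecord(NamedTuple):
    ngram: str         # the 1-gram to 5-gram itself
    year: int          # publication year
    match_count: int   # occurrences of the n-gram in that year
    volume_count: int  # number of books containing the n-gram in that year


def parse_v2(line: str) -> NgramRecord:
    """Parse one version 2 line with tab-separated fields:
    ngram, year, match_count, volume_count.

    Raises ValueError if the line does not hold exactly four fields
    or if a numeric field is not an integer.
    """
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return NgramRecord(ngram, int(year), int(match_count), int(volume_count))


def total_matches(lines: Iterable[str]) -> Dict[str, int]:
    """Sum match_count across all years for each distinct n-gram."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        record = parse_v2(line)
        totals[record.ngram] += record.match_count
    return dict(totals)
```

For example, `total_matches(["data analysis\t2000\t12\t5\n", "data analysis\t2001\t8\t4\n"])` returns `{"data analysis": 20}`. Summing `volume_count` the same way would not give a count of distinct books, because the same book can appear in more than one year's record only if it is counted per year; per-year values are best kept separate.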

End of Part Two
This is the end of part two. Please proceed to part three.

