Published by Sylvain Adam St-Denis. Modified over 5 years ago.
What is Serialization? Serialization is the process of turning structured objects into a byte stream for transmission over a network or for writing to persistent storage. Deserialization is the reverse process of turning a byte stream back into a series of structured objects.
Advantages of Serialization
Compact - A compact format makes the best use of network bandwidth, the scarcest resource in a data center.
Fast - Interprocess communication forms the backbone of a distributed system, so the serialization and deserialization process must add as little performance overhead as possible.
Extensible - Protocols change over time to meet new requirements, so it should be straightforward to evolve the protocol in a controlled manner for clients and servers.
Interoperable - For some systems, it is desirable to support clients written in a different language from the server, so the format needs to be designed to make this possible.
What is Writable? Hadoop defines its own 'box classes' for strings, integers, and so on:
– IntWritable for ints
– LongWritable for longs
– FloatWritable for floats
– DoubleWritable for doubles
– Text for strings
– etc.
The Writable interface makes serialization quick and easy for Hadoop. Any key or value type must implement the Writable interface.
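Hadoop's Writable contract boils down to two methods, write(DataOutput) and readFields(DataInput). The sketch below mirrors that contract without requiring Hadoop on the classpath: IntPairWritable is a hypothetical value type (not a Hadoop class), and the plain java.io stream classes stand in for the streams the framework supplies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical value type mirroring Hadoop's Writable contract:
// write() serializes the fields to a byte stream, readFields() restores them.
class IntPairWritable {
    int first;
    int second;

    void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }

    void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }
}

public class WritableSketch {
    public static void main(String[] args) throws IOException {
        IntPairWritable original = new IntPairWritable();
        original.first = 42;
        original.second = 7;

        // Serialize to a compact byte stream (8 bytes: two 4-byte ints).
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // Deserialize the byte stream into a fresh object.
        IntPairWritable copy = new IntPairWritable();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy.first + "," + copy.second); // prints 42,7
    }
}
```

Note how compact the result is: two ints occupy exactly 8 bytes, with no field names or type tags in the stream, which is one reason Writable serialization is fast.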
What is WritableComparable?
A WritableComparable is a Writable that is also Comparable:
– Two WritableComparables can be compared against each other to determine their 'order'
– Keys must be WritableComparables because they are passed to the Reducer in sorted order
Note that despite their names, all Hadoop box classes implement both Writable and WritableComparable:
– For example, IntWritable is actually a WritableComparable
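To illustrate why keys must be Comparable, the plain-Java sketch below uses a hypothetical key type (YearKey, not part of Hadoop) to show the kind of compareTo ordering the shuffle/sort phase relies on before keys reach the Reducer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical key type: serializable in a real job (write/readFields,
// omitted here) AND Comparable, so the framework can sort keys.
class YearKey implements Comparable<YearKey> {
    final int year;

    YearKey(int year) {
        this.year = year;
    }

    @Override
    public int compareTo(YearKey other) {
        // Hadoop's sort phase performs exactly this kind of comparison
        // on every pair of keys it needs to order.
        return Integer.compare(this.year, other.year);
    }
}

public class ComparableKeySketch {
    public static void main(String[] args) {
        List<YearKey> keys = new ArrayList<>();
        keys.add(new YearKey(2001));
        keys.add(new YearKey(1999));
        keys.add(new YearKey(2010));

        // The shuffle/sort phase does this for keys automatically.
        Collections.sort(keys);

        for (YearKey k : keys) {
            System.out.println(k.year); // prints 1999, 2001, 2010
        }
    }
}
```

In a real job the key class would implement Hadoop's WritableComparable interface, which combines this compareTo with the write/readFields methods from the previous slide.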
Instructions
Listing the contents of a given path on HDFS:
hadoop fs -ls /
hadoop fs -ls /user
hadoop fs -ls /user/training
Loading a file into HDFS (uploads a local input file to HDFS):
hadoop fs -put localpath/inputfile HDFS/
Creating a directory in HDFS:
hadoop fs -mkdir weblog
Running Hadoop — extract and upload the file in one step:
gunzip -c access_log.gz | hadoop fs -put - weblog/access_log
tar zxvf shakespeare.tar.gz && hadoop fs -put shakespeare input
Running Hadoop:
Upload the input files to HDFS.
Import the Java code into Eclipse, then export it as a jar file.
To compile and build the jar from the terminal instead (not supported on the provided VM):
javac -classpath `hadoop classpath` *.java
jar cvf wc.jar *.class
Run the job from the terminal with the hadoop command:
hadoop jar [jar file] [driver class name] [hdfs input path] [hdfs output path]
hadoop jar wc.jar WordCount shakespeare wordcounts
WordCount
Goal Input Output
Uploading the data and running the job:
$ cd ~/training_materials/developer/data
$ tar zxvf shakespeare.tar.gz && hadoop fs -put shakespeare input
$ hadoop jar WordCount.jar WordCount input/shakespeare/* /user/shakesOut
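The WordCount job's map phase emits a (word, 1) pair for every token, and the reduce phase sums the counts per word. The actual Mapper and Reducer classes require the Hadoop API, but the underlying logic can be sketched in plain Java:

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the logic the WordCount job implements:
// "map" tokenizes the text, "reduce" sums the 1s emitted per word.
public class WordCountSketch {
    static Map<String, Integer> wordCount(String text) {
        // TreeMap keeps words in sorted order, like sorted reducer keys.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum); // the "reduce" sum
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount("to be or not to be");
        System.out.println(counts); // prints {be=2, not=1, or=1, to=2}
    }
}
```

In the real job this work is split across machines: many Mappers tokenize input splits in parallel, and each Reducer receives one sorted slice of the words with all of their counts.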
Inverted Index
Goal Input Output (sample output entries: abominably hamlet@2787, abomination …, abominations …, abortive …, abortives …)
Uploading the data and running the job:
$ cd ~/training_materials/developer/data
$ tar zxvf invertedIndexInput.tgz
$ hadoop fs -put invertedIndexInput invertedIndexInput
$ hadoop jar InvertedIndex.jar InvertedIndex invertedIndexInput output
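The inverted index job maps each word to the list of document@offset locations where it occurs. The real job uses Hadoop's Mapper/Reducer API; the plain-Java sketch below shows just the indexing logic, using word position for the offset (an assumption for illustration — the slide's output, e.g. hamlet@2787, may use byte offsets instead).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of an inverted index: each word maps to the list of
// "document@offset" locations where it appears.
public class InvertedIndexSketch {
    static Map<String, List<String>> buildIndex(Map<String, String> docs) {
        Map<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            String[] words = doc.getValue().toLowerCase().split("\\W+");
            for (int offset = 0; offset < words.length; offset++) {
                if (words[offset].isEmpty()) continue;
                // Emit (word, doc@offset); the reduce phase collects all
                // locations for a word into one list.
                index.computeIfAbsent(words[offset], k -> new ArrayList<>())
                     .add(doc.getKey() + "@" + offset);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("hamlet", "to be or not to be");
        Map<String, List<String>> index = buildIndex(docs);
        System.out.println(index.get("be")); // prints [hamlet@1, hamlet@5]
    }
}
```

In MapReduce terms, the map phase emits (word, doc@offset) pairs, and the reduce phase concatenates all locations for each word — the shape of the output shown on the slide.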