The Chinese University of Hong Kong

Research on private cloud: Eucalyptus
Research on Hadoop MapReduce & HDFS

Technologies & Applications: use the technologies to solve real problems

Outline:
- What's the problem: advertisement searching
- What's our solution: Hadoop MapReduce & HDFS
- Global view of the whole system: major components, demo
- Details of the algorithm: IVS comparison
- Conclusion
- Future development
- Q & A

Find out how many times, and at what times, the advertisements are broadcast

* Considering one channel

Requirements: computational power, storage

[System pipeline diagram] TV recording (.mpg) → video preprocessing (.IVS) → frame comparison and mining (Hadoop used here) → advertisements (.flv, .JPG) served by an IIS web server. The diagram marks the comparison stage O(n) and retrieval O(1).

Invariant Video Signature (IVS)
- Developed by the VIEW lab
- Converts each frame to a vector with 1024 attributes
- Saves storage and computational power
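The exact IVS construction is the VIEW lab's. Purely as a sketch of the idea of reducing a frame to a fixed 1024-attribute vector, a quantized colour histogram works (the binning scheme below is our assumption, not real IVS):

    import numpy as np

    def frame_signature(frame: np.ndarray) -> np.ndarray:
        """Map one RGB frame (H x W x 3, uint8) to a 1024-attribute vector.

        Hypothetical stand-in for IVS: a normalized colour histogram
        with 16 x 8 x 8 = 1024 bins.
        """
        r = frame[..., 0].astype(np.uint32) >> 4   # 16 levels
        g = frame[..., 1].astype(np.uint32) >> 5   # 8 levels
        b = frame[..., 2].astype(np.uint32) >> 5   # 8 levels
        bins = (r << 6) | (g << 3) | b             # bin index 0..1023
        hist = np.bincount(bins.ravel(), minlength=1024).astype(np.float64)
        return hist / hist.sum()                   # normalize per frame

    # One hour at 25 fps becomes 90,000 such vectors instead of raw frames.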

Find the longest common subsequence: O(n²). For one hour of video, n = 60 × 60 × 25 = 90,000 frames, so n² = 8.1 × 10⁹. E.g., comparing one hour with another hour takes ~1 hour per comparison. We must do something to reduce n.
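For reference, the O(n²) baseline being avoided is the classic dynamic-programming LCS; a sketch (the frame-equality predicate same_frame is assumed):

    def lcs_length(a, b, same_frame) -> int:
        """Classic O(len(a) * len(b)) dynamic program: at n = 90,000
        frames per hour, that is ~8.1e9 cell updates per comparison."""
        prev = [0] * (len(b) + 1)
        for x in a:
            cur = [0]
            for j, y in enumerate(b, 1):
                cur.append(prev[j - 1] + 1 if same_frame(x, y)
                           else max(prev[j], cur[j - 1]))
            prev = cur
        return prev[-1]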

Step 1: Group similar frames. ~750 frames become ~10-20 groups (segments).
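A minimal sketch of this grouping step, assuming a dissimilarity function over signature vectors (the threshold reuses the "identical frames score < 100" figure quoted later, and is illustrative):

    def group_frames(signatures, dissimilarity, threshold=100.0):
        """Step 1: collapse runs of similar consecutive frames into segments.

        Returns a list of (start_index, end_index) pairs; ~750 frames
        typically collapse into ~10-20 such segments.
        """
        if not signatures:
            return []
        segments, start = [], 0
        for i in range(1, len(signatures)):
            if dissimilarity(signatures[i - 1], signatures[i]) > threshold:
                segments.append((start, i - 1))   # a shot boundary
                start = i
        segments.append((start, len(signatures) - 1))
        return segments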

Step 2: Compare the first frame of each group to locate the possible starting point.

Step 3: Compare the two IVSs frame by frame until a pair of frames differs.

We compare frames by computing a dissimilarity score. Identical frames usually have a dissimilarity < 100; two different frames score far higher. Computation time: ~10 s.
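A sketch tying steps 2 and 3 together. The real dissimilarity metric is the VIEW lab's; here it is assumed to be a plain L1 distance between 1024-attribute vectors, with the threshold scale borrowed from the < 100 figure above:

    import numpy as np

    def dissimilarity(u: np.ndarray, v: np.ndarray) -> float:
        """Assumed metric: L1 distance between two signature vectors."""
        return float(np.abs(u - v).sum())

    def match_length(a, b, i, j, threshold=100.0) -> int:
        """Step 3: extend a match frame by frame from a[i], b[j]
        until a pair of frames differs."""
        n = 0
        while (i + n < len(a) and j + n < len(b)
               and dissimilarity(a[i + n], b[j + n]) < threshold):
            n += 1
        return n

    def find_common_runs(a_segments, b_segments, a, b, threshold=100.0):
        """Step 2: compare only the first frame of each segment to find
        candidate starting points, then extend each candidate (step 3)."""
        runs = []
        for (i, _) in a_segments:
            for (j, _) in b_segments:
                if dissimilarity(a[i], b[j]) < threshold:
                    runs.append((i, j, match_length(a, b, i, j, threshold)))
        return runs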

The algorithm is provided by the VIEW lab. We put great effort into optimizing it: more accurate and much faster.

Fade-in/fade-out prevents us from finding the correct starting point; there is usually a 1-2 frame shift.

[Diagram] With a frame shift, the first starting point is wrong: matching breaks off into wrong continuous segments with larger dissimilarity, until another starting point is found that yields the correct continuous segment.

Dissimilarity: the best frame is found here. Compare more than one frame until an acceptable frame is found.
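One way to make the starting point robust to the 1-2 frame shift is to probe a small window of candidate start frames and keep the best one; a sketch (window size and threshold are assumptions):

    def best_start(a, b, i, j, dissimilarity, window=3, threshold=100.0):
        """Probe a few frame-shifted start candidates instead of
        trusting the single head frame; return the best acceptable one."""
        candidates = [(dissimilarity(a[i + d], b[j]), i + d)
                      for d in range(window) if i + d < len(a)]
        score, start = min(candidates)
        return start if score < threshold else None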

The old algorithm cannot handle the case where some frames are corrupted or contain noise.

Instead of finding only the common segments, we group all possible segments together. An advertisement can be identified even in the presence of noise. Preprocessing is required.

[Diagram] Comparison of the improved algorithm against the previous one: the previous algorithm produced a false negative that the improved one catches.

Some other improvements have been made; computation time is ~10 times faster.

Instead of directly finding the common length, we further group the common segments into larger groups to greatly reduce n. We handle the groups one by one, dividing n into many small n's. Some coding-style improvements were also made.
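The payoff of dividing n into many small n's, as a worked example with hypothetical group sizes:

    n = 90_000
    whole = n * n                        # one O(n^2) pass: 8.1e9 comparisons
    sizes = [300] * 300                  # hypothetical: 300 groups of 300 frames
    grouped = sum(s * s for s in sizes)  # 300 * 300^2 = 2.7e7 comparisons
    print(whole // grouped)              # -> 300, i.e. ~300x fewer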

Conclusion:
- We learnt cloud computing technologies: MapReduce & HDFS
- Using what we learnt, we improved the IVS algorithm and solved the scale-up problem
- We developed a MapReduce workflow: it provides great flexibility, can be applied to general applications, and runs 7×24
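A hedged sketch of how such a workflow could be wired with Hadoop Streaming. The record format and job script are illustrative, not the project's actual code; each input line is assumed to be one precomputed match record "ad_id<TAB>timestamp":

    #!/usr/bin/env python3
    """Hypothetical Hadoop Streaming job for the counting stage:
    answers "how many times, and when, was each advertisement broadcast".

    Run (illustrative):
      hadoop jar hadoop-streaming.jar -input matches/ -output counts/ \
        -file job.py -mapper 'job.py map' -reducer 'job.py'
    """
    import sys

    def mapper():
        # Identity map keyed on ad_id, so every sighting of one
        # advertisement reaches the same reducer.
        for line in sys.stdin:
            ad_id, timestamp = line.rstrip("\n").split("\t")
            print(f"{ad_id}\t{timestamp}")

    def reducer():
        # Hadoop delivers mapper output sorted by key.
        current, times = None, []
        for line in sys.stdin:
            ad_id, timestamp = line.rstrip("\n").split("\t")
            if ad_id != current and current is not None:
                print(f"{current}\t{len(times)}\t{','.join(times)}")
                times = []
            current = ad_id
            times.append(timestamp)
        if current is not None:
            print(f"{current}\t{len(times)}\t{','.join(times)}")

    if __name__ == "__main__":
        (mapper if sys.argv[1:2] == ["map"] else reducer)()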

- Fault tolerant: self-healing, data replication
- Scalable: processing power, storage

Future development:
- Improve the I/O performance
- Improve the MapReduce application
- Simplify the workflow
- Enhance the web application
- By changing the core component, we get another similar application: surveillance video searching

Q & A

The weak CPUs of the processing units make the system complex. Hadoop cluster machines: Pentium 4 with 1 GB RAM and a 60 GB HDD. With 2 GB RAM the system could be two times faster; with a 100 MB/s switch, three times faster; with dual-core CPUs, two jobs could run at the same time, eliminating Hadoop's overhead time.