PROJECT. Topics  Theoretical: Error Performance Analysis for Partitioned Sketch Data Structures  Survey: Security and Privacy for Big Data: A Survey.


1 PROJECT

2 Topics
• Theoretical: Error Performance Analysis for Partitioned Sketch Data Structures
• Survey: Security and Privacy for Big Data: A Survey and Future Directions
• Experiments:
  - Citizen Behavior of 7-21 Storm in Beijing, 2012
  - Music Knowledge Mining
  - Hadoop for Video Streaming on the Web
  - MapReduce Jobs For Video Conversion
• Your proposed one…

3 1. Error Performance Analysis for Partitioned Sketch Data Structures
• We have already talked about the time complexity (in terms of update time)
• TASK: What about error performance? How to optimally allocate the depth of each sketch (Zipfian)?
• Start by learning how the CM sketch analyzes its error performance (Theorem 1 and the like): http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
• Learn about P(d)-CU: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6574663
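As a starting point for the error analysis, the following is a minimal Python sketch of a plain Count-Min sketch sized by the standard CM bound: with width w = ceil(e/eps) and depth d = ceil(ln(1/delta)), a point query never underestimates and overestimates by more than eps * N (N = total count) with probability at most delta. The class name, the salted-BLAKE2 hashing (a stand-in for pairwise-independent hash functions), and the Zipf-like test stream are assumptions for illustration. It does not implement the partitioned P(d)-CU variant.

```python
import hashlib
import math
import random
from collections import Counter


class CountMinSketch:
    """Plain Count-Min sketch; hashing scheme is an illustrative assumption."""

    def __init__(self, eps, delta):
        self.width = math.ceil(math.e / eps)          # w = ceil(e / eps)
        self.depth = math.ceil(math.log(1.0 / delta))  # d = ceil(ln(1 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # N, the sum of all counts seen so far

    def _index(self, row, item):
        # One salted hash per row stands in for an independent hash function.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count
        self.total += count

    def query(self, item):
        # The minimum over rows is the estimate; it can only overcount.
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))


def simulate(eps=0.01, delta=0.01, n_items=1000, n_updates=20000, seed=1):
    """Feed a Zipf-like stream and report the fraction of items whose
    overestimate exceeds eps * N."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, n_items + 1)]
    stream = rng.choices(range(n_items), weights=weights, k=n_updates)
    sketch = CountMinSketch(eps, delta)
    truth = Counter()
    for item in stream:
        sketch.update(item)
        truth[item] += 1
    bad = sum(1 for item in truth
              if sketch.query(item) - truth[item] > eps * sketch.total)
    return sketch, truth, bad / len(truth)
```

Calling simulate() with different eps and delta values gives an empirical violation rate that can be compared against delta, which is the kind of initial correctness check the task asks for.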

4 How to determine this?

5 Result
• Analysis (e.g., mathematical derivations)
• Some initial simulation (correctness)

6 2. Survey
• Write a good survey in English on Security and Privacy for Big Data: A Survey and Future Directions
• Cite at least 40 references (IEEE Xplore and ACM Digital Library)
• Paper organization
  - Classify these works into different categories, from different angles
  - Extensive comparisons
  - Identify future directions (i.e., what are the missing pieces?)

7 Some Materials
• http://www-03.ibm.com/security/solution/intelligence-big-data/
• https://ssl.www8.hp.com/ww/en/secure/pdf/4aa4-4051enw.pdf
• http://www.emc.com/collateral/industry-overview/big-data-fuels-intelligence-driven-security-io.pdf
• http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
• http://www.trendmicro.com/cloud-content/us/pdfs/business/white-papers/wp_addressing-big-data-security-challenges.pdf
• http://scholarlycommons.law.northwestern.edu/njtip/vol11/iss5/1/
• Think about:
  - Storage
  - Analysis
  - Applications
  - Cloud, Internet-of-Things

8 3. Analyze Citizen Behaviors of 7-21 Storm in Beijing, 2012
• The Power of Social Networks and Public Crowd
• http://v.youku.com/v_show/id_XNDM5NjY1Mzc2.html
• Use social network APIs such as Sina Weibo: open.weibo.com/wiki
• Use keyword search to retrieve all related data
• #望京人赴机场免费救援# ("Wangjing residents offer free rescue rides to the airport"), #双闪车队# ("hazard-light convoy") (100+)
• 菠菜X6, @望京网

9 4. Music Knowledge Mining
• Million Song Dataset: http://labrosa.ee.columbia.edu/millionsong
• Example: calculating music density, http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/
• YOUR TASK: Predict which songs a user will listen to, http://www.kaggle.com/c/msdchallenge
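The density example can be sketched in a few lines of plain Python. This assumes that density means the number of segments per second of audio, and that each track is available as a dictionary with track_id, duration (seconds), and segments_start (one entry per segment) keys. Both the definition and the record layout are assumptions for illustration, and a real job would run the mapper under a MapReduce framework rather than in a list comprehension.

```python
def density_mapper(track):
    """Map step: emit (track_id, segments per second) for one track."""
    duration = track["duration"]
    if duration > 0:  # skip malformed records to avoid dividing by zero
        yield track["track_id"], len(track["segments_start"]) / duration


def densest_tracks(tracks, n=1):
    """Reduce step: collect all mapper output and keep the n densest tracks."""
    pairs = [pair for track in tracks for pair in density_mapper(track)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]
```

The same mapper/reducer split carries over to the prediction task, where the map step would emit per-user or per-song statistics instead of densities.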

10 5. Video Streaming on the Web
• Store your video as chunks in HDFS
• Case: the user suddenly jumps to a specific part of the video
• Seek in the file to position the cursor at a specific location
• HDFS can only be accessed through a Hadoop client, which the Apache web server is not.
• Apache/FUSE: all file system operations (directory browsing, file opening, and content access) are enabled over HDFS content through the FUSE interface.
• http://internetmemory.org/en/index.php/synapse/using_hadoop_for_video_streaming/
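Once HDFS content is exposed through a FUSE mount, a video file can be read like any local file, so serving a jump to the middle of the video comes down to a seek followed by a read. A minimal sketch, assuming the file is reachable under a local path (the mount point itself is hypothetical and not shown) and that the web server has already translated the playback position into a byte offset:

```python
def read_range(path, offset, length):
    """Return `length` bytes of the file at `path`, starting at byte `offset`."""
    with open(path, "rb") as video:
        video.seek(offset)         # position the cursor at the requested byte
        return video.read(length)  # may return fewer bytes near end of file
```

A web server would typically call something like this to answer a request for a byte range, returning only the requested chunk instead of the whole file.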

11 Result
• A demo
  - Choose at least one video format (e.g., flv)
  - A client to play video
  - A web server (Apache with FUSE)
  - HDFS to store your videos

12 6. MapReduce For Video Conversion
• Convert a huge number of video files from one format to another.
• Use the open source video converter FFmpeg (http://ffmpeg.org/download.html).
• Data stored on HDFS
• Create an app that does this (running on Google App Engine)
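The per-file work of the map step can be sketched as a small Python wrapper around the ffmpeg command line. This assumes ffmpeg is installed on each worker and that the input file has already been copied from HDFS to a local path. The function names are hypothetical, and the HDFS transfer and job submission are left out.

```python
import os
import subprocess


def build_ffmpeg_command(src_path, target_ext):
    """Build the ffmpeg argument list that converts src_path to target_ext."""
    base, _ = os.path.splitext(src_path)
    dst_path = base + "." + target_ext
    # -y overwrites an existing output file; -i names the input file.
    return ["ffmpeg", "-y", "-i", src_path, dst_path]


def convert(src_path, target_ext):
    """Map step for one video: run ffmpeg and return the output path."""
    command = build_ffmpeg_command(src_path, target_ext)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command[-1]
```

Since each file is converted independently, the job needs no real reduce step; the mapper output can simply list the converted paths.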

13 Mechanism
• Work in groups of 3-5 students with clear roles
• Email me (ase_bit@yahoo.com) by this Friday (Nov 22)
  - Team leader, team members
  - Topic
• Deadline: 28 December 2013!
• Deliverable: project report in Chinese
  - Introduction (motivation, WHY?)
  - Related Work (what others have done)
  - Your proposal (HOW?)
  - Performance Evaluation
  - Conclusion
• Presentation

14 Suggested Arrangement
• Week 1: Define your roles and start literature research
• Weeks 2 and 3: Propose solutions
• Weeks 4 and 5: Implement and obtain results
• Week 6: Write report

