Linear Clustering Algorithm BY Horne Ken & Khan Farhana & Padubidri Shweta.

2 Overview
- Introduction
- Data Preprocessing
- Data Mining
- Data Visualization
- Experiment
- Conclusion

3 Responsibility
- Data Preprocessing: Farhana & Ken
- Data Mining: Ken
- Data Visualization: Shweta

4 Overview: A Linear Clustering Algorithm
Applications
1. Feature selection: choose features based on information gain
2. Discretization: partition based on data set characteristics
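The slides do not say how information gain was computed. As a minimal sketch of the feature-selection idea, the snippet below uses the standard entropy-based definition and assumes each row carries a class label; that assumption, and the names `entropy`, `information_gain` and `rank_features`, are illustrative and not taken from the presentation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy within each feature value."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Return feature names ordered from highest to lowest information gain.

    `columns` maps a (hypothetical) feature name to its list of values.
    """
    return sorted(columns,
                  key=lambda name: information_gain(columns[name], labels),
                  reverse=True)
```

A feature whose values separate the labels perfectly gets a gain equal to the label entropy, and a feature that is unrelated to the labels gets a gain near zero, so sorting by gain puts the most informative features first.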

5 Data Preprocessing
DataFerrett (Federated Electronic Research, Review, Extraction & Tabulation Tool)
- Install the software
- Web version

6 Data Pre-processing: Step
- Extracted data from CPS (Current Population Survey)
- Pre-processing
  - Number of features: 43
  - Year ,000/month rows over 50 states
  - After preprocessing: 23
- Normalization

7 Data Mining
Algorithm
- Choose an ordinal attribute (X)
- Order data points based on the attribute
- List potential partition points (between successive values of X)
- For each potential partition point P
  - Calculate distance of data points where X < P and where X > P
Results
- Can partition data points
- Order data points by information gain
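The partition scan on this slide can be sketched as follows. The slide does not define the distance measure or how the gain is derived from it, so this sketch makes two loud assumptions: "distance" is the sum of squared Euclidean distances from each point to its group centroid, and the score of a partition point is the reduction in that sum after splitting. The score is a stand-in for the authors' information gain, not their formula, and all function names are hypothetical.

```python
def candidate_partition_points(values):
    """Midpoints between successive distinct values of the ordered attribute."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


def spread(points):
    """Sum of squared Euclidean distances from each point to the group centroid.

    Assumption: this is only one possible reading of "distance of data points".
    """
    if not points:
        return 0.0
    dims = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    return sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dims))


def score_partitions(rows, x_index):
    """Score every candidate partition point P on the attribute at x_index.

    Returns (P, score) pairs sorted by score, highest first. The score is the
    drop in spread obtained by splitting the rows at P (an assumed stand-in
    for information gain).
    """
    base = spread(rows)
    scored = []
    for p in candidate_partition_points([r[x_index] for r in rows]):
        left = [r for r in rows if r[x_index] < p]    # points on the low side of P
        right = [r for r in rows if r[x_index] > p]   # points on the high side of P
        scored.append((p, base - spread(left) - spread(right)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, calling `score_partitions(rows, 0)` on rows that form two well-separated groups along the first attribute should rank the midpoint between the groups first; the top-ranked points can then serve as discretization boundaries.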

8 Data Mining: Test dataset

9 Data Mining: Test dataset 2

10 Experimental Setup
Environment
1. DataFerrett: data pre-processing
2. Java platform: implement the data mining algorithm
3. Data visualization
   1. Google App Engine Datastore API
      Python, JavaScript and Django framework
   2. Google Chart API
Hardware: Windows XP laptop, Core GHz, 2.00 GB RAM (that hurt)

11 Visualization Demo
Link for the web-site

12 Conclusions
- Preliminary results are encouraging
- Discretization was successful
Lessons learnt and future work
- Comparison with other methods on well-known datasets
- Evaluate performance in feature selection
- OPTIMIZE
- Don't pick a novel dataset & novel algorithm at the same time

13 Thank you
Questions?