Distributed Data Mining System in Java
Group members: 王春笙, 林俊甫, 王慧芬


Overview of Project
Project participants: 王春笙, 林俊甫, 王慧芬

Project Programming Tasks: 林俊甫
– Polling and reply multicast between client and server
– Client/server socket programming
– Client dynamic join and leave mechanism
– Multi-thread programming
– Synchronization mechanism
– Data chunk maintenance and dispatching mechanism
– Client/server communication link control

Project Programming Tasks (cont'd)
– Client failure handling
  – Reassign the backup server if the failed client is the backup
  – Restore the failed client's work (with 王春笙)
– Server failure handling
  – Backup server designation mechanism and logic design
– RMI mechanism (with 王春笙)
– Basic GUI

System Infrastructure
[System diagram: a Server/Coordinator and clients on a LAN; mining data chunks and mining results are exchanged between them]

Basic Operation
Message sequence between server and client (time runs downward):
1. Client: polls the multicast group on port 4444: "Who is the server?"
2. Server: "Servername: I am the server"
3. Client: connects to the server
4. Server: "Client do: filechunk#"
5. Client: "ok"
6. Server: "Client do: next filechunk#"
7. ...
8. ...
Server side:
– Listen for multicast group queries and reply
– Fork a thread to handle each client connection
– Wait for the client's processed result, then order the client to get another file chunk
Client side:
– Server found; connect to the server
– Receive the server's instruction and invoke RMI to get the file chunk

Port Assignment
Port 4444: multicast
Port 4445: TCP/IP socket connection
Port 4446: RMI services

Finding a Server
Once a client starts up, it queries the multicast group on port 4444 every 3 seconds, sending a 1-byte string to locate the server host.
Once a server starts up, it forks a thread to handle these queries.
Client steps:
1. Client query: who is the server now?
2. Listen for the server's response.
3. Connect to the server on its TCP/IP socket port.
4. Use RMI to get a file chunk from the server.
5. Run the data mining and return the result to the server.
6. On detecting a server failure: if this client is the backup, go to the backup server procedure; otherwise go to step 1.
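
The query loop described on this slide might look roughly like the sketch below. The multicast group address, the content of the 1-byte query, and the assumption that the server answers with its host name sent back to the querying socket are all hypothetical; the slides only give the port (4444) and the 3-second interval.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

class ServerDiscovery {
    static final int MULTICAST_PORT = 4444;           // from the Port Assignment slide
    static final int QUERY_INTERVAL_MS = 3000;        // "every 3 sec."
    static final String GROUP_ADDRESS = "230.0.0.1";  // assumption: not given in the slides
    static final byte QUERY_BYTE = (byte) '?';        // assumption: payload content is unspecified

    /** Resolves the (assumed) multicast group address. */
    static InetAddress group() {
        try {
            return InetAddress.getByName(GROUP_ADDRESS);
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the 1-byte "who is the server?" datagram addressed to the group. */
    static DatagramPacket buildQuery(InetAddress group) {
        byte[] payload = { QUERY_BYTE };
        return new DatagramPacket(payload, payload.length, group, MULTICAST_PORT);
    }

    /** Decodes a reply; assumes the server answers with its host name as text. */
    static String parseReply(byte[] data, int length) {
        return new String(data, 0, length, StandardCharsets.UTF_8).trim();
    }

    /** Sends the query once per interval until a reply arrives or attempts run out. */
    static String findServer(int maxAttempts) throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(QUERY_INTERVAL_MS);
            byte[] buffer = new byte[256];
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                socket.send(buildQuery(group()));
                DatagramPacket reply = new DatagramPacket(buffer, buffer.length);
                try {
                    socket.receive(reply);
                    return parseReply(reply.getData(), reply.getLength());
                } catch (SocketTimeoutException timeout) {
                    // No answer within the interval: loop and query again.
                }
            }
        }
        return null; // no server answered
    }
}
```

A client would call ServerDiscovery.findServer(n) at startup and again after it detects a broken link to the server.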

File Dispatching
The server maintains a file chunk pool.
The server finds an available file chunk for a client, sets its state to 1, and orders the client to fetch that chunk via RMI.
The chunk's state is updated to 2 when the client returns its result.
Recovery: when the server detects that a client's link is broken, it resets the file chunks allocated to that client to 0.
The file chunk class is declared Serializable so that it can be passed to the backup server via RMI.
The file chunk class uses synchronization for concurrency control.
FileChunks states: -1: empty, 0: available, 1: using, 2: used
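
A minimal sketch of such a pool, using the four state codes from the slide. The class and method names are hypothetical, and tracking the owning client per chunk is an assumption made so that recovery can find the chunks of a failed client.

```java
import java.io.Serializable;
import java.util.Arrays;

class FileChunkPool implements Serializable {
    private static final long serialVersionUID = 1L;
    static final int EMPTY = -1, AVAILABLE = 0, USING = 1, USED = 2;

    private final int[] state;  // one state code per chunk slot
    private final int[] owner;  // client id holding the chunk, or -1 (assumption)

    FileChunkPool(int slots, int chunkCount) {
        state = new int[slots];
        owner = new int[slots];
        Arrays.fill(state, EMPTY);
        Arrays.fill(owner, -1);
        for (int i = 0; i < chunkCount && i < slots; i++) {
            state[i] = AVAILABLE;
        }
    }

    /** Marks the first available chunk as "using" and returns its index, or -1 if none. */
    synchronized int assign(int clientId) {
        for (int i = 0; i < state.length; i++) {
            if (state[i] == AVAILABLE) {
                state[i] = USING;
                owner[i] = clientId;
                return i;
            }
        }
        return -1;
    }

    /** Called when the client returns its result: using -> used. */
    synchronized void complete(int chunk) {
        if (state[chunk] == USING) {
            state[chunk] = USED;
            owner[chunk] = -1;
        }
    }

    /** Recovery: chunks held by a failed client go back to available. */
    synchronized int releaseClient(int clientId) {
        int released = 0;
        for (int i = 0; i < state.length; i++) {
            if (state[i] == USING && owner[i] == clientId) {
                state[i] = AVAILABLE;
                owner[i] = -1;
                released++;
            }
        }
        return released;
    }

    synchronized int stateOf(int chunk) {
        return state[chunk];
    }
}
```

Every method is synchronized because each client connection is handled by its own server thread, so several threads may touch the pool at once.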

Backup Server Selection
The server maintains and assigns a unique ID to each client.
The unique ID is incremented as a serial number.
The client with the smallest ID is assigned as the backup server.
When a client fails, the server checks whether it was the backup server; if so, it restarts the selection process.

Nodes Maintenance
The server keeps the records of connected clients in an ArrayList.
The ArrayList is composed of Nodes objects, each of which records a client's detailed information.
[Diagram: ArrayList ht of Nodes; columns Key, Value; Nodes fields: Id, Address, Port, Work on, Status]
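
The node list and the smallest-ID rule from the Backup Server Selection slide could be combined as in the sketch below. ClientNode and NodeRegistry are hypothetical names, and starting the serial numbers at 1 is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** One connected client, with the fields listed on the slide. */
class ClientNode {
    final int id;
    final String address;
    final int port;
    int workOn = -1;  // chunk number being processed, -1 if none
    int status = 1;   // 1 = online in the GUI status table

    ClientNode(int id, String address, int port) {
        this.id = id;
        this.address = address;
        this.port = port;
    }
}

class NodeRegistry {
    private final List<ClientNode> nodes = new ArrayList<ClientNode>();
    private int nextId = 1;     // assumption: serial numbers start at 1
    private int backupId = -1;  // -1 means no backup is designated

    /** Registers a client under the next serial id; the first client becomes the backup. */
    synchronized ClientNode join(String address, int port) {
        ClientNode node = new ClientNode(nextId++, address, port);
        nodes.add(node);
        if (backupId == -1) {
            backupId = selectBackup();
        }
        return node;
    }

    /** Removes a client; reruns the selection only if that client was the backup. */
    synchronized void leave(int id) {
        nodes.removeIf(n -> n.id == id);
        if (id == backupId) {
            backupId = selectBackup();
        }
    }

    /** The connected client with the smallest id, or -1 if none is connected. */
    private int selectBackup() {
        int smallest = -1;
        for (ClientNode n : nodes) {
            if (smallest == -1 || n.id < smallest) {
                smallest = n.id;
            }
        }
        return smallest;
    }

    synchronized int currentBackup() {
        return backupId;
    }
}
```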

RMI Services
The RMI services are written as an independent program because both the server and the client (when acting as backup server) use them.
The RMI services provide:
– Back up server data to the backup server
– Get a file chunk from the server
– Return mining results to the server
– Receive node information from the server
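
As an illustration only, the four services could be declared as one remote interface. All method names and parameter types below are hypothetical; the slides do not show the project's real signatures. The in-memory class is a stand-in that can be called locally without a registry.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical remote interface mirroring the four services on the slide. */
interface MiningRmiService extends Remote {
    byte[] getFileChunk(int chunkNo) throws RemoteException;
    void returnResult(int clientId, int chunkNo, String result) throws RemoteException;
    Serializable getBackupData() throws RemoteException;
    List<String> getNodesInfo() throws RemoteException;
}

/** In-memory stand-in for local calls; not exported over the network. */
class LocalMiningService implements MiningRmiService {
    private final byte[][] chunks;
    private final ArrayList<String> results = new ArrayList<String>();
    private final ArrayList<String> nodes = new ArrayList<String>();

    LocalMiningService(byte[][] chunks) {
        this.chunks = chunks;
    }

    public synchronized byte[] getFileChunk(int chunkNo) {
        return chunks[chunkNo].clone();
    }

    public synchronized void returnResult(int clientId, int chunkNo, String result) {
        results.add(clientId + ":" + chunkNo + ":" + result);
    }

    /** Returns a copy of the collected results for the backup server. */
    public synchronized Serializable getBackupData() {
        return new ArrayList<String>(results);
    }

    public synchronized List<String> getNodesInfo() {
        return new ArrayList<String>(nodes);
    }

    synchronized void addNode(String description) {
        nodes.add(description);
    }
}
```

A networked version would export such an object with java.rmi.server.UnicastRemoteObject and bind it in an RMI registry. The slides assign port 4446 to RMI services but do not say whether that is the registry port or the export port.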

Client Failure
Server's actions:
– Recovery
– Reassignment
– Redo backup server selection if the failed node is the backup
Client's actions:
– Do nothing, unless the server tells the client to act as backup

Server Failure
Timeline for Server S, Client A (backup), and Client B:
Server S:
– Runs the backup selection and chooses A as the backup
– Server crashes (X)
Client A:
1. A is told by S that it is the backup; A invokes RMI to get all server data (A: "Do backup"; RMI get file; RMI reply)
2. A periodically gets the server's services and file chunk data
3. A broken communication link is detected; A starts the ServerAction class
4. A creates a server socket on port 4445 and forks a thread to listen for queries and wait for connections; A replies: "I am the server"
Client B:
1. B receives instructions as discussed before ("Client do #"; "do reply")
2. A broken communication link is detected; B multicasts the query "Who is the server now?"
3. B learns that A is the backup and reconnects to A (connect to A:4445)

Server/Client Life Cycle
[Diagram: server and client life cycles; each ends in normal or abnormal termination; transition label "evolve"]

Project Programming Tasks: 王春笙
– Web log file preprocessing and separating
– Web page traversal sequence parsing
– Page item transferring and mapping
– Web page sequential pattern mining
– Mining result maintenance
– RMI mining result transfer
– Mining result lookup and display

Project Programming Tasks (cont'd)
– Backup mechanism
  – A separate thread backs up server files and in-memory data
  – Restore the failed client's work (with 林俊甫)
– RMI mechanism (with 林俊甫)
– GUI global state refresh
– System integration
  – Testing and debugging

Web Log File Format
Fields: User IP, Date, Time, Web page URL

Web File Preprocessing
Select *.htm and *.html pages
First sort by user ID
Second sort by time
Page sequences are separated by time
– more than 30 seconds between requests
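
These preprocessing steps can be sketched as follows. The LogEntry fields are assumptions: the user is identified by the IP string, and the date and time are already converted to seconds. A gap of more than 30 seconds, or a change of user, starts a new sequence.

```java
import java.util.ArrayList;
import java.util.List;

/** One parsed log line; field layout is an assumption for this sketch. */
class LogEntry {
    final String user;   // user IP
    final long timeSec;  // date and time as seconds
    final String url;

    LogEntry(String user, long timeSec, String url) {
        this.user = user;
        this.timeSec = timeSec;
        this.url = url;
    }
}

class LogPreprocessor {
    static final long GAP_SECONDS = 30;

    /** Filters, sorts, and splits log entries into per-user page sequences. */
    static List<List<String>> toSequences(List<LogEntry> entries) {
        // Step 1: keep only *.htm and *.html pages.
        List<LogEntry> kept = new ArrayList<LogEntry>();
        for (LogEntry e : entries) {
            if (e.url.endsWith(".htm") || e.url.endsWith(".html")) {
                kept.add(e);
            }
        }
        // Steps 2 and 3: sort by user first, then by time.
        kept.sort((a, b) -> {
            int byUser = a.user.compareTo(b.user);
            return byUser != 0 ? byUser : Long.compare(a.timeSec, b.timeSec);
        });
        // Step 4: start a new sequence on a new user or a gap above 30 seconds.
        List<List<String>> sequences = new ArrayList<List<String>>();
        List<String> current = null;
        LogEntry previous = null;
        for (LogEntry e : kept) {
            boolean startNew = previous == null
                    || !previous.user.equals(e.user)
                    || e.timeSec - previous.timeSec > GAP_SECONDS;
            if (startNew) {
                current = new ArrayList<String>();
                sequences.add(current);
            }
            current.add(e.url);
            previous = e;
        }
        return sequences;
    }
}
```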

Chunk Data Files
Part*.ppp
Items.ppp
  /~visualdep/htm/p5b.htm               168
  /~businessdep/student/picture.html    169
  /~comedu/inde.htm                     170
  /~account/91tuition.htm               171
  /~stuaffair/life/procedure-17.htm     172
  /~stuaffair/life/procedure-25.htm     173

Apriori algorithm
1: find all L1
2: generate C2 from L1
3: count C2 and find all L2
4: k = 3
5: generate and prune Ck from Lk-1
6: count Ck and find all Lk
7: if Lk is not empty then k++, go to 5

Apriori algorithm (cont'd)
Join phase: s1 joins s2 if s1 (drop first) = s2 (drop last)
– s1 join s2 =>
Prune phase: delete a k-candidate if any (k-1)-subsequence is not large
C and L are stored in a hash data structure
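
The join and prune phases could be sketched as below for sequences of page IDs. Two points are interpretations rather than facts from the slides: the joined candidate is taken to be s1 followed by the last item of s2, and "any (k-1)-subsequence" is read as dropping any single item. A HashSet stands in for the hash structure mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SequenceCandidates {

    /** Generates the candidates Ck from the large sequences Lk-1 (join, then prune). */
    static List<List<Integer>> generate(Set<List<Integer>> large) {
        List<List<Integer>> candidates = new ArrayList<>();
        for (List<Integer> s1 : large) {
            for (List<Integer> s2 : large) {
                // Join condition: s1 without its first item equals s2 without its last item.
                List<Integer> s1Tail = s1.subList(1, s1.size());
                List<Integer> s2Head = s2.subList(0, s2.size() - 1);
                if (s1Tail.equals(s2Head)) {
                    List<Integer> joined = new ArrayList<>(s1);
                    joined.add(s2.get(s2.size() - 1));
                    if (allSubsequencesLarge(joined, large)) {
                        candidates.add(joined);
                    }
                }
            }
        }
        return candidates;
    }

    /** Prune test: every sequence obtained by dropping one item must be large. */
    static boolean allSubsequencesLarge(List<Integer> candidate, Set<List<Integer>> large) {
        for (int drop = 0; drop < candidate.size(); drop++) {
            List<Integer> sub = new ArrayList<>(candidate);
            sub.remove(drop); // int argument, so this removes by index
            if (!large.contains(sub)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with L2 = {(1,2), (2,3), (1,3), (2,4)}, the join yields (1,2,3) and (1,2,4); the second is pruned because (1,4) is not large.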

Mining Result Display
Client frequent patterns
– Web page ID
– Support
– Saved as *.pppl files
Client frequent patterns
– Web page ID
– Support
– Web page name

Backup Mechanism
When a backup server is selected, that client starts a backup thread
The backup thread loops every 0.5 second
RMI data transfer
– Chunk data files (part*.ppp, items.ppp)
– Client information
– File chunk information
  – Determine MaxID and set "in use" to "available"
– Frequent pattern information
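
A rough sketch of the backup side, under stated assumptions: the backup stores the latest copy of the chunk states and client IDs, and on takeover it resets "in use" chunks (state 1) to "available" (state 0) and determines the highest client ID seen. Reading MaxID as the highest client ID is an interpretation. All names are hypothetical, and the fetch task passed to the timer would be an RMI call in the real system.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class BackupState {
    static final long PERIOD_MS = 500; // "loop every 0.5 second"

    private int[] chunkStates = new int[0];
    private int[] clientIds = new int[0];

    /** Stores the latest snapshot fetched from the server. */
    synchronized void store(int[] states, int[] ids) {
        chunkStates = states.clone();
        clientIds = ids.clone();
    }

    /** On takeover: chunks marked 1 (using) go back to 0 (available). */
    synchronized int[] chunkStatesForTakeover() {
        int[] restored = chunkStates.clone();
        for (int i = 0; i < restored.length; i++) {
            if (restored[i] == 1) {
                restored[i] = 0;
            }
        }
        return restored;
    }

    /** Highest client id in the snapshot, or -1 if there is none. */
    synchronized int maxId() {
        int max = -1;
        for (int id : clientIds) {
            max = Math.max(max, id);
        }
        return max;
    }

    /** Runs the given fetch task every 0.5 s on a separate thread. */
    static ScheduledExecutorService startBackupThread(Runnable fetchFromServer) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(fetchFromServer, 0, PERIOD_MS, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

The caller should shut the returned executor down when the client stops acting as backup; otherwise its thread keeps the JVM alive.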

System Integration
Java class integration
– Server component
– Client component
– Data mining component
– GUI component
Testing
Debugging

Project Programming Tasks: 王慧芬
– Graphical User Interface
Because the system performs data mining in a distributed way, its GUI provides four panels:
– A system console
– A result window
– A connection table
– A graphical network configuration

GUI
The system console shows how the system proceeds

GUI (cont'd)
The result window displays the progress and results of data mining

GUI (cont'd)
A connection table lists the connection information of all online clients

GUI (cont'd)
A connection table consists of 5 fields:
– NO: client-server connection ID
– IP address: client's IP address
– Port: client's port number
– Status: connection status, one of
  0: offline
  1: online
  2: file transfer from server to client
  3: client is doing data mining
  4: client returns the result to the server once data mining has finished
  5: client is doing the backup and data mining at the same time
– # chunk works on: during data mining and backup, the number of the chunk that the connection is working on
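
The six status codes map naturally onto an enum. The constant names below are hypothetical; only the numeric codes and their meanings come from the slide.

```java
enum ConnectionStatus {
    OFFLINE(0),
    ONLINE(1),
    FILE_TRANSFER(2),      // file transfer from server to client
    MINING(3),             // client is doing data mining
    RETURNING_RESULT(4),   // client returns the result to the server
    BACKUP_AND_MINING(5);  // client backs up and mines at the same time

    final int code;

    ConnectionStatus(int code) {
        this.code = code;
    }

    /** Looks up the status for a numeric code stored in the connection table. */
    static ConnectionStatus fromCode(int code) {
        for (ConnectionStatus s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        throw new IllegalArgumentException("unknown status code: " + code);
    }
}
```

The GUI could call ConnectionStatus.fromCode(n) when it reads the table to pick the icon for each client.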

GUI (cont'd)
A graphical network configuration follows the connection table to depict the dynamic network configuration

GUI (cont'd)
In the dynamic network configuration, different client GIFs express the status:
– Offline
– On-line
– Data mining
– Backup and mining

GUI interface
mw.showMsg()
– provided by the GUI for the server/client module to show console messages
mw.showResultString()
– provided by the GUI for the server/client module to show the results of data mining
Connection table
– modified by the server/client module with connection information
– read by the GUI every 0.01 second to depict the dynamic network configuration

GUI design
Java Swing is used to generate labels, text, scrollbars, tables, etc.
Java AWT 2D painting is used to animate the connection lines in the dynamic configuration panel
'Photo Impact' and 'GIF animator' are used to generate the node icons
EasyRGB is used to tune the color harmonies

GUI design (cont'd)
A new thread is forked from the GUI task to animate the connection lines in the dynamic configuration panel
– It reads the table every 0.03 second and shows the connection status with a moving ball

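Installation
Example: running one server and two clients
– Create three folders: Ser (Server), Cli (Client 1), Cli2 (Client 2)
– Extract the attached archive into the Ser folder; also download weblog10.zip into this folder and extract it
– Extract the attached archive into the empty Cli and Cli2 folders
– Open two DOS windows (windows 1 and 2) and change to the Ser folder
– Open three DOS windows (windows 3, 4, and 5); in windows 3 and 4 change to the Cli folder, in window 5 change to the Cli2 folder
– Window 1: run compile.bat, then run rmi.bat
– Window 2: run server.bat
– Window 3: run compile.bat, then run rmi.bat
– Window 4: run client.bat
– Window 5: run compile.bat, then run client.bat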