Status report of the new NA60 “cluster” Our OpenMosix farm will increase our computing power, using the DAQ/monitoring computers. NA60 weekly meetings.


1 Status report of the new NA60 “cluster”
Our OpenMosix farm will increase our computing power, using the DAQ/monitoring computers.
NA60 weekly meetings, Pedro Martins, 16/06/2005

2 Goals presented in the last presentation, and what has been reached
I. Create a heterogeneous cluster using the monitoring + DAQ PCs.
– The system is ready for “mass production”. This has not yet been done because of several constraints that will be presented later.
II. Optimize the network access of these machines to the “teras” PCs.
– Ready. We simply need to switch na60tera1 from its 100 Mbit network interface to the 1 Gbit one (already physically installed).

3 Review of the Technical Objectives
– The user should see only one machine, like in a supercomputer. Status: na60pc08; “job queuing” tool in preparation.
– The maintenance should be kept to a minimum. Status: partially done.
– The system should be flexible, able to easily add and remove nodes (PCs). Status: done, but without the “on-the-fly” option.

4 OpenMosix
– Easy implementation, node discovery tool, migration of processes.
– A single process, like a macro, is NOT going to be shared among all computers. Each process goes to one CPU only.
– In principle we can run as many processes as the number of CPUs we have.
Gentoo
– Scientific Linux does not allow non-IT users to create clusters.
– Simple maintenance, using Portage.
– CERN/IT does not support Gentoo-based OSes. Our IT experts will be the only ones solving the problems. Advantage?
– The previous data on the machines is kept, meaning that we always have a fallback option.
– Remote boot, diskless support.
na60pc08 is our cluster's disk server: it provides network information to all nodes, controls the process migration, and acts as the interface with the user.
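Since each process runs on one CPU only, a job has to be split into several independent processes before the farm can help. A minimal sketch of that splitting, assuming the work is a list of input files; the program name `./reco` and the run file names are hypothetical and appear only for illustration:

```python
def split_round_robin(items, n_workers):
    """Deal items into at most n_workers lists, one list per process."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    chunks = [[] for _ in range(n_workers)]
    for index, item in enumerate(items):
        chunks[index % n_workers].append(item)
    # Drop empty chunks so no process is started with nothing to do.
    return [chunk for chunk in chunks if chunk]


def build_commands(program, items, n_workers):
    """Return one command line (argv list) per independent process."""
    return [[program] + chunk for chunk in split_round_robin(items, n_workers)]


if __name__ == "__main__":
    # Hypothetical program and input names, not taken from the slides.
    runs = ["run%04d.dat" % n for n in range(1, 11)]
    for argv in build_commands("./reco", runs, n_workers=5):
        print(" ".join(argv))
```

With 5 CPUs available, this produces 5 command lines of 2 files each; each one would be started as a separate process, which is the unit that can be placed on a CPU.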

5 Original idea [diagram]
Elements: user (ssh, vnc); na60pc08 (NAT, OpenMosix migrator); Node 1 (diskless): boot and network information, OS data; Node 10 (OS on local disk): network information; CASTOR, na60tera1, na60tera2... (nfs).

6 New version [diagram]
Elements: user (ssh, vnc); na60pc08; Node 1 and Node 10: network information; CASTOR, na60tera1, na60tera2... (nfs).
What has changed?
– NAT is no longer used.
– The usefulness of the migration between nodes is still to be discussed.
– Diskless nodes are still to be tested (are they needed?).

7 NA60ROOT and OpenMosix
A complex system like NA60ROOT is very sensitive to migration. NA60ROOT depends (obviously) strongly on ROOT, and this leads us to 2 important aspects of this framework:
– ROOT can use Python, and OpenMosix can’t migrate Python processes (you’ve been warned...).
– The migration of shared libs is still in an experimental phase in OM.
We are “mounting” the offline account on all nodes, thus having exactly the same files on every node. Hopefully this will minimize (end?) the problems with shared libs.
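One way to check that every node really sees exactly the same shared libs is to compare checksums of the library files. This is only a sketch under assumptions: the `.so` suffix filter and the idea of running it once per node and comparing the results are illustrative choices, not part of the setup described above.

```python
import hashlib
import os


def library_manifest(root, suffix=".so"):
    """Map each relative path under root ending in suffix to its SHA-256."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest


def manifest_differences(reference, other):
    """Return sorted relative paths that are missing or differ between manifests."""
    paths = set(reference) | set(other)
    return sorted(p for p in paths if reference.get(p) != other.get(p))
```

An empty list from `manifest_differences` means the two manifests describe identical library sets; any path in the result is a candidate cause of a shared-lib problem.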

8 Present status
– 4 nodes with a local system disk are fully working. Together with na60pc08, we therefore have 5 PCs working on the farm.
– The addition of new PCs to the farm depends strongly on the heat dissipation of the room where the cluster will be installed. Are we buying a new A/C? Can we get the new room at the end of the corridor? Can we put the meeting room’s A/C in this new room?
– Recently (today), I found out that the OpenMosix filesystem (OMFS) has problems working together with NFS. Therefore, I’m going to change to a newer OM kernel, where OMFS has been deprecated.

9 To do list
– Queuing tool, like auson. It will use the “cpujob” and “iojob” commands, which send the processes to the adequate nodes.
– Debug the NFS vs OMFS implementation.
– See how many extra machines we can connect.
– AFS support is now available for 2.4 (namely OM kernels) and 2.6 (all Gentoo laptops at NA60).
– Write a Howto/FAQ for the offliners.
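The queuing tool could look roughly like the sketch below. It is not the tool in preparation: the `JobQueue` class, the "cpu"/"io" job kinds, and the assumption that “cpujob” and “iojob” take the command to run as their arguments are all illustrative. The sketch only builds command lines and tracks free slots; it does not start anything.

```python
from collections import deque

# Launcher names come from the to-do list; prefixing them to the
# command line is an assumption made for this sketch.
LAUNCHERS = {"cpu": "cpujob", "io": "iojob"}


class JobQueue:
    """FIFO queue that releases at most max_running jobs at a time."""

    def __init__(self, max_running):
        if max_running < 1:
            raise ValueError("max_running must be at least 1")
        self.max_running = max_running
        self.waiting = deque()
        self.running = 0

    def submit(self, kind, argv):
        """Queue a command, prefixed with the launcher for its kind."""
        if kind not in LAUNCHERS:
            raise ValueError("unknown job kind: %r" % kind)
        self.waiting.append([LAUNCHERS[kind]] + list(argv))

    def start_ready(self):
        """Release queued jobs until the running limit is reached."""
        started = []
        while self.waiting and self.running < self.max_running:
            started.append(self.waiting.popleft())
            self.running += 1
        return started

    def finished(self):
        """Record that one running job ended, freeing a slot."""
        if self.running == 0:
            raise RuntimeError("no job is running")
        self.running -= 1
```

Setting `max_running` to the number of CPUs in the farm keeps the rule of one process per CPU; a real tool would start each released command line and call `finished()` when it exits.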

