Samgrid – OSG Interoperability
Parag Mhashilkar, Fermi National Accelerator Laboratory (Fermilab)
December 07, 2006


1 Samgrid – OSG Interoperability
  Parag Mhashilkar, Fermi National Accelerator Laboratory

2 Overview
  Basic Architecture
  Job Types and Data Flow
  Basic Debugging

3 Basic Architecture for Reprocessing
  Diagram labels: Samgrid, OSG, SAM-Grid / OSG Forwarding Node, Flow of Job Submission, Offers Services
  Samgrid Client Site: d0mino0x.fnal.gov
  Job Forwarding: d0srv047.fnal.gov
  OSG Sites: Fermilab, USCMS Farm, Oklahoma University, Indiana University, University of Nebraska – Lincoln, …
  SAM Services
    OSG Station: osg-ouhep on d0srv047.fnal.gov
    Station Caches: ouhep00.nhn.ou.edu, d0srv015.fnal.gov, d0rsam01.fnal.gov
    Durable Location: ouhep00.nhn.ou.edu, d0srv063.fnal.gov, d0srv065.fnal.gov

4 Job Types and Data Flow
  Production
    Fetching job files:
      Fermilab worker nodes (USCMS farm) fetch from Fermilab caches.
      Non-Fermilab worker nodes fetch from OU and from any other caches that might come up in the future (IU).
    Fetching raw files:
      Every worker node fetches from Fermilab caches.
    Storing unmerged thumbnails:
      Every worker node stores the unmerged thumbnail to the durable location at Fermilab.
  Merging
    Done on Fermilab worker nodes.
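
The fetch and store rules on this slide can be summarized as a small lookup. The Python sketch below is illustrative only and is not SAM-Grid code: the names `plan_data_flow` and `can_run_merging`, the `is_fermilab_node` flag, and the dictionary keys are all assumptions made for this example, and the values are descriptive labels rather than real endpoints.

```python
def plan_data_flow(is_fermilab_node: bool) -> dict:
    """Return where a production job on a worker node reads and writes data.

    Hypothetical helper: it only restates the rules from the slide.
    """
    return {
        # Job files: Fermilab nodes use Fermilab caches, all other nodes use OU.
        "job_files_from": "Fermilab caches" if is_fermilab_node else "OU caches",
        # Raw files: every worker node fetches from Fermilab caches.
        "raw_files_from": "Fermilab caches",
        # Unmerged thumbnails: every worker node stores to Fermilab.
        "unmerged_thumbnails_to": "Fermilab durable location",
    }


def can_run_merging(is_fermilab_node: bool) -> bool:
    """Merging is done only on Fermilab worker nodes."""
    return is_fermilab_node
```

Only the source of the job files depends on where the worker node sits; raw-file input and thumbnail output always go through Fermilab.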

5 Basic Debugging
  Grid Job
    Monitoring shows the grid job is held.
      The reasons should be available on the monitoring page of that job.
    The grid job is idle for a long time.
      Look at the resource monitoring page and verify that the Samgrid-OSG resource is available:
      http://samgrid.fnal.gov:8080/list_of_resources.php
  Local Jobs
    Local jobs show status failed.
      Usually this means that there was a problem starting your local job on the OSG resource.
    Local jobs are done but there are some problems.
      There could be several reasons. Download the log files from the monitoring page of the grid job. Extract the logs and look for more information in files std_out * and std_err *, where n = local job number.
  Reporting Problems to Experts
    *Always* include the grid job ID in your mail.
    For local jobs, also include the local job numbers or IDs if possible.
    If all the local jobs fail with similar symptoms, include the error messages, if you can identify them.
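
The log check and the reporting checklist above can be sketched in Python. This is a minimal sketch under stated assumptions, not a SAM-Grid tool: it assumes the downloaded logs are already extracted into a local directory, that the interesting lines contain the word "error", and that the helper names `collect_error_lines` and `build_report` are hypothetical. The exact log file names are not given on the slide, so the code matches any file whose name starts with `std_out` or `std_err`.

```python
from pathlib import Path


def collect_error_lines(log_dir, keyword="error"):
    """Map each std_out*/std_err* file under log_dir to its matching lines.

    Assumption: a case-insensitive search for `keyword` is a useful first pass.
    """
    root = Path(log_dir)
    hits = {}
    for pattern in ("std_out*", "std_err*"):
        for path in sorted(root.rglob(pattern)):
            if not path.is_file():
                continue
            text = path.read_text(errors="replace")
            lines = [ln for ln in text.splitlines() if keyword.lower() in ln.lower()]
            if lines:
                hits[str(path.relative_to(root))] = lines
    return hits


def build_report(grid_job_id, local_job_ids=(), error_lines=()):
    """Assemble the text of a problem report; the grid job id is mandatory."""
    if not grid_job_id:
        raise ValueError("always include the grid job id")
    parts = [f"Grid job id: {grid_job_id}"]
    if local_job_ids:
        parts.append("Local job ids: " + ", ".join(str(i) for i in local_job_ids))
    if error_lines:
        parts.append("Error messages:")
        parts.extend(f"  {line}" for line in error_lines)
    return "\n".join(parts)
```

Making `build_report` refuse an empty grid job id mirrors the one item the slide marks as mandatory; local job ids and error messages stay optional.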

