Improving the OFED Development Process
2 Panelists
- Betsy Zeller, QLogic (Moderator)
- Tziporet Koren, Mellanox
- Christoph Raisch, IBM
- Dave Sommers, NetEffect
- John Jolly, Novell
- Many people, via email, and …
- You, out there in the audience!
3 Improving the OFED Development Process
- Focus for this session is on OFED on Linux, rather than on Unix(es) or Windows
- Focus is on process, rather than on ways to improve the features of a release
- Presentation is based on input from panelists, OFED phone sessions, emails to the EWG list, and private emails
- Goals of session:
  - Make sure major issues are identified
  - Identify solutions where possible
  - Plan for follow-up
4 Agenda
- Original OFED goals from 2006
- How are we doing on meeting all these goals?
- Process improvements since 2006
- What issues have been raised?
  - Includes question of kernel.org/OFED!
- Handling support – bug fixes and point releases
- Open the floor to issues
- Smoothing the way for distro integration
- Next steps
5 Original OFED Goals (March, 2006)
- Move away from each vendor providing their own unique snapshot of software.
- Enable vendors to create a common and supportable enterprise-grade distribution of OpenFabrics SW - something they can stand behind and support 24x7.
- Note: Expectation was that eventually distros (RH and SUSE) would ramp up to provide support for enterprise customers.
6 How are we doing?
- 4 major OFED releases over the last two years, installed on many sites.
- Vendors are no longer pulling from top-of-trunk of SVN trees and delivering random, disparate SW releases.
- RH integrated OFED 1.3 into RHEL 5 U2
- Novell SLES 10 SP2 to ship with OFED 1.3
7 Process improvements since ’06
- All developers now use git
- Bi-weekly or weekly teleconferences, with detailed minutes
- Feature list and schedule are discussed in advance, posted on OF website, and updated regularly
- Processes are documented on website
- Nightly builds allow vendors to test their changes before more public releases
- Process is evolving for doing point releases
- But …
8 Some issues have been raised
- Release process is flexible, but:
  - Some major features have been integrated long after feature freeze, and even after the RC process has started.
  - There’s a delicate balance between “holding to the rules” and meeting vendors’ needs for critical features and bug fixes.
  - OFED release dates tend to slide a little, not unlike many SW release dates!
- Should OFED patches have to go through an additional review process, especially after RC?
9 Issues (continued)
- Issue: How do you get a fast bug fix turnaround (e.g. 2-3 days)?
- Proposed Solution:
  - For a kernel issue, add a patch to the existing release using the OFED patch script
  - For a kernel or userland issue, send to the list requesting a sub-minor release (eg ). For a vendor-specific problem, this can be turned around fast.
  - This gets rolled into the next release cycle, where it will be tested by all vendors.
10 Issue – Balance with kernel.org
- Issue: Currently, there is kernel SW in OFED which didn’t come in through kernel.org
  - SDP – no plan to get it into kernel.org – is this a problem?
  - In other components, there are fixes/changes which “missed the kernel.org train”.
- Proposed Solution: Topic for discussion
11 Issues (continued)
- Issue: What’s the process if a vendor delivers new HW off-cycle, and needs SW?
  - Has been a major issue for other vendors when a newer kernel version was required
- Proposed Solution: Topic for discussion
12 Issues (continued)
- Issue: Interoperability events use GA OFED SW. If a vendor fails to pass testing, they are off the Integrators List for a minimum of six months.
- Proposed Solution: Run initial interoperability testing with a release candidate (RC). Do final “real” testing with the GA version, which should have no surprises. Vendors can choose to be present at one or both of these events.
- Issue: “Final” versions of tests/test plans are not available until the event, which makes it difficult to prepare in advance.
- Proposed Solution: Make “final” test plans available at least two weeks before the event.
13 Issues (continued)
- Issue: There’s occasionally a difference between what’s in OFED and what’s in the latest package from the maintainer (e.g., changes in the verbs API to support XRC didn’t go into Roland’s released verbs package).
- Proposed Solution: Topic for discussion
14 Other Issues
- Are there other issues about the Linux OFED development process you’d like to raise?
15 Proposed Solutions
- Clearly publish (and reiterate) Feature Freeze dates from early on in the release.
  - As part of this, clearly differentiate between “feature” and “bug fix”. (No, it’s not a bug that your feature is missing!)
- Review new feature proposals, to understand the implications for others
- Clear discussion/negotiation of implied API changes related to new features.
- Showcase process documents on the OpenFabrics website, so they can all be easily found and accessed
- As a community, either accept that some vendors will miss the opportunity to submit a change they care about, or accept that release dates will slip.
- Run initial interoperability testing with a release candidate (RC). Do final “real” testing with the GA version, which should have no surprises. Vendors can choose to be present at one or both of these events.
16 Novell – Build Constraints
- Packages
  - That work well with the 'quilt' open source patch management utility
  - Use OpenSUSE Build Service
- Consistent backport implementation
  - ofa-kernel uses kernel versions
  - ib-bonding uses specific distributions
  - #include_next can break distro use
- Packaging
17 Issues – Packaging
- Original plan for OFED was that it would go away when distros were ramped up to deliver and support OpenFabrics SW.
- What would have to be true before the distros can handle everything?
- Is there a better solution than backport patches?
  - Backport patches are required so users don’t have to compile kernels
  - SUSE and RH can’t directly use OFED backport patches because “include_next” is not transparent.
18 Packaging Proposal
- Kernel distribution
  - Aggregate kernel patches/modules in one package
- Userspace distribution
  - Tar-balls + sample RPM spec files for releases/RCs
  - Use git (for daily builds) + pull script
- Solution needs to meet the needs of distributors, vendors, and those who want to roll their own
- Comments?
19 Point Releases
- Release frequency:
  - Every two to three months
  - Can be more frequent if a critical bug is found
- Change guidelines:
  1. Use the same kernel base as the major OFED release
  2. No API changes (in either the kernel or the user libs)
  3. Core and ULPs (including MPI): critical and high-priority bug fixes only
  4. Low-level driver changes: responsibility of the HW vendor
  5. Add backports to support a new OS (e.g. SLES10 SP2, FC8, etc.)
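The change guidelines above can be read as a simple acceptance check. The following is a minimal sketch only, not an OFED tool: the `Change` record, its field names, and the component labels are all hypothetical and exist purely to illustrate how guidelines 1 to 3 restrict a change, while guidelines 4 and 5 leave driver changes and new-OS backports open.

```python
from dataclasses import dataclass

# Hypothetical component labels, named after the guideline wording.
CORE_OR_ULP = "core_or_ulp"       # core stack and ULPs, including MPI
LOW_LEVEL_DRIVER = "ll_driver"    # a HW vendor's low-level driver
BACKPORT = "backport"             # backport adding support for a new OS


@dataclass
class Change:
    """Hypothetical description of a change proposed for a point release."""
    component: str
    priority: str = "low"             # "critical", "high", "medium" or "low"
    changes_api: bool = False         # touches a kernel or user library API
    changes_kernel_base: bool = False


def point_release_objections(change):
    """Return the guidelines a change violates; an empty list means acceptable."""
    objections = []
    if change.changes_kernel_base:
        objections.append("guideline 1: keep the kernel base of the major release")
    if change.changes_api:
        objections.append("guideline 2: no kernel or user library API changes")
    if change.component == CORE_OR_ULP and change.priority not in ("critical", "high"):
        objections.append("guideline 3: core/ULP fixes must be critical or high priority")
    # Guidelines 4 and 5 add no restriction here: low-level driver changes are
    # the HW vendor's responsibility, and backports for a new OS are allowed.
    return objections


if __name__ == "__main__":
    for objection in point_release_objections(
        Change(CORE_OR_ULP, priority="low", changes_api=True)
    ):
        print(objection)
```

A release manager could run a check like this over each submitted patch description before the point release is built; what counts as "critical" or "high" priority would still be a human decision.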
20 Point Releases (cont’d)
- Release verification:
  - All vendors should run at least a basic QA/verification cycle
  - Full QA by any vendor who changes their low-level driver
- Release process:
  - Release manager will publish the release target date 4 weeks prior to the release
  - Patches will be sent against the major release git repositories
  - A release will be built and tested by all companies in the usual way
21 Next Steps
- Collect a clear statement of the issues and proposed solutions.
- Send these out to the EWG mailing list, as pending decisions from Sonoma Workshop.
- Deal with any issues which are raised.
- Summarize feedback.
- Execute!