IT Metrics/Dashboards at Duke: Curation, Automation, Aggregation?
CSG, January 11, 2012
Background
- Initiated metrics effort in 2008 (1 dedicated FTE)
- Aligned with finance initially; now integrated with the service management team
- Focused on availability, capacity, service usage/demand, and internal resources
- Consult with units on what metrics they should capture
- Help with collection and analysis of metrics
- Ensure consistent, universal reporting of data
Initial scope and progress
- Monthly reporting process for managers and the community
- Internal: data collection, report development, management review, publication of a detailed report
- External: availability, usage, and performance summaries
- Historically a limited range of services: network/voice, paging, HR, IT security, telepresence
The challenges of our “curation” process
- Labor-intensive
- Monthly periodicity limits immediacy: fine for long-term trending, but less useful for day-to-day awareness
- Reliance on local units to self-report
- Leveraging Duke’s post-incident review (PIR) process, but the remaining data is still sui generis
- Curation: work by content specialists immersed in a specialized discipline and imbued with analysis
Metrics and newsgathering: classic curation processes?
Duke’s metrics curation challenges
- No one wants to read “yesterday’s news,” which is hard to avoid with curation
- The stories are too long: a new monthly executive summary focused on trends (like the WSJ’s “What’s News” box) has helped, to a point
- Focus on putting out a daily paper takes away from the “longreads”: the time required to produce manual reports cuts into time meant for consulting with units so they can become the primary data-gatherers
Curation vs. automation
Automation: leveraging monitoring data
- System/network monitoring produces thousands of data points a day – how can we use them?
- Left: daily low-level alert analysis (SPC methodology); a sketch of the approach follows this slide
- Not shown: loss-of-redundancy reports
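As a minimal sketch of the SPC-style alert analysis mentioned above (not Duke’s actual tooling), the idea is to compute control limits over a trailing window of daily alert counts and flag the days that fall outside them for human review. The data shape and the three-sigma threshold are illustrative assumptions.

// Sketch: flag days whose alert count falls outside SPC control limits.
// Input: array of { date, count } objects for one device or service.
function mean(xs) {
  return xs.reduce(function (a, b) { return a + b; }, 0) / xs.length;
}

function stddev(xs, m) {
  var variance = xs.reduce(function (a, x) { return a + (x - m) * (x - m); }, 0) / xs.length;
  return Math.sqrt(variance);
}

// Returns the days whose counts fall outside mean +/- 3 sigma over the
// trailing window, i.e. "out of control" days an analyst should review.
function outOfControlDays(dailyCounts) {
  var counts = dailyCounts.map(function (d) { return d.count; });
  var m = mean(counts);
  var sigma = stddev(counts, m);
  var upper = m + 3 * sigma;
  var lower = Math.max(0, m - 3 * sigma);
  return dailyCounts.filter(function (d) {
    return d.count > upper || d.count < lower;
  });
}

// Example: a two-week window of low-level alert counts (made-up numbers).
var trailingWindow = [
  { date: "2012-01-01", count: 42 }, { date: "2012-01-02", count: 38 },
  { date: "2012-01-03", count: 45 }, { date: "2012-01-04", count: 40 },
  { date: "2012-01-05", count: 39 }, { date: "2012-01-06", count: 41 },
  { date: "2012-01-07", count: 44 }, { date: "2012-01-08", count: 37 },
  { date: "2012-01-09", count: 43 }, { date: "2012-01-10", count: 118 }, // spike
  { date: "2012-01-11", count: 40 }, { date: "2012-01-12", count: 42 }
];
console.log(outOfControlDays(trailingWindow)); // -> [{ date: "2012-01-10", count: 118 }]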
Duke’s metrics automation challenges
- Automation is a great start, but plenty of “curation” still occurs!
- Monitoring’s “service dashboard” is useful, but not ready to be published directly
- Monitoring events don’t (and can’t) catch everything; human adjustment of the raw data is still needed (a sketch of one such adjustment follows this slide)
- Low-level alert reports are automated… and then reviewed by daily operational staff
- Ultimately, there are infinite ways to leverage the data, but what do you actually care about?
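One concrete example of the “human adjustment” problem: raw monitor downtime has to be reconciled against planned maintenance windows and analyst corrections before an availability figure is publishable. The sketch below uses assumed data shapes and numbers, not Duke’s actual reporting code.

// Sketch: compute adjusted availability for a service over a reporting period.
// Raw outage minutes come from monitoring; maintenanceMinutes and
// analystCorrectionMinutes are supplied by humans reviewing the raw data.
function adjustedAvailability(periodMinutes, outages, maintenanceMinutes, analystCorrectionMinutes) {
  var rawDowntime = outages.reduce(function (total, o) { return total + o.minutes; }, 0);
  // Remove downtime that fell inside announced maintenance windows, plus any
  // minutes the reviewing analyst has marked as false alarms.
  var adjustedDowntime = Math.max(0, rawDowntime - maintenanceMinutes - analystCorrectionMinutes);
  return 100 * (periodMinutes - adjustedDowntime) / periodMinutes;
}

// Example: a 30-day month (43,200 minutes) with two raw outages; the 60-minute
// outage was a planned upgrade, and 15 minutes of the other were written off
// as monitor flapping.
var availability = adjustedAvailability(
  43200,
  [{ start: "2012-01-10T02:00", minutes: 60 }, { start: "2012-01-21T14:05", minutes: 25 }],
  60,   // planned maintenance
  15    // analyst-marked false positives
);
console.log(availability.toFixed(3) + "%"); // -> "99.977%"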
Curation, automation, aggregation
Duke’s aggregation questions
- What data mix should appear?
  - Curated data from the monthly metrics, a live report on high-priority user tickets, performance graphs, and availability measures against targets, all on one screen
  - Broad enough to be watched; specific enough to be useful
- What platform should we use?
  - SharePoint? Improve collection and visualization
  - ServiceNow? Leverage its dashboards, APIs, and data structure; easy access to support tickets
  - JavaScript/JSON components for easily customized dashboards? node.js and d3.js (see the sketch below)
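To make the JavaScript/JSON option concrete, here is a minimal sketch of the kind of customizable dashboard component we have in mind: a d3.js bar chart of availability against target for a handful of services, assuming the d3 v2/v3-era API (d3.scale.linear) and a placeholder element id. The service names and numbers are illustrative, not real Duke data; in practice the data would arrive as JSON from a small node.js aggregation service.

// Sketch: render per-service availability vs. target as a simple d3.js bar chart.
// Assumes an element <div id="availability-chart"></div> and d3.js already loaded.
var services = [
  { name: "Email",     availability: 99.95, target: 99.9 },
  { name: "Network",   availability: 99.99, target: 99.9 },
  { name: "ERP",       availability: 99.70, target: 99.9 },
  { name: "Telephony", availability: 99.92, target: 99.9 }
];

var width = 500, barHeight = 24;

// Map 99.0–100.0% onto the pixel width so small differences stay visible.
var x = d3.scale.linear().domain([99.0, 100.0]).range([0, width]);

var svg = d3.select("#availability-chart").append("svg")
    .attr("width", width + 120)
    .attr("height", barHeight * services.length);

// One row (group) per service, stacked vertically.
var row = svg.selectAll("g").data(services).enter().append("g")
    .attr("transform", function (d, i) { return "translate(100," + i * barHeight + ")"; });

// Bar: green if the service met its target, red otherwise.
row.append("rect")
    .attr("width", function (d) { return x(d.availability); })
    .attr("height", barHeight - 4)
    .attr("fill", function (d) { return d.availability >= d.target ? "#2a2" : "#c33"; });

// Service name label to the left of each bar.
row.append("text")
    .attr("x", -5)
    .attr("y", barHeight / 2)
    .attr("text-anchor", "end")
    .text(function (d) { return d.name; });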