Data Mining and the Innovation of the Crowds
Jeff Lynn
21 October 2011
Text/data mining already flourishes, but only the owners of the material and their licensees can participate
That creates two problems:
o It means less material can be mined
   o If relevant material is owned by a multitude of rights-holders, each may only be able to mine a portion
   o This is bad, for obvious reasons
o It means fewer people can do the mining
   o Only those with a direct connection to the rights-holder can get involved
   o I see this as even worse, but the reasons may not be obvious
Procter & Gamble had long relied on its internal product development staff of over 7,000 people.
◦ In 2000, they realised that 7,000 would not be nearly enough to innovate fast enough to meet customer demand
◦ The traditional approach would have been to hire more internal staff
Instead, they turned to crowdsourcing and developed a programme called Connect & Develop.
P&G posts product development tasks to a public website
◦ Includes the price they will pay for the project to be completed
◦ Members of P&G’s extended dev team respond with proposals
◦ The work is awarded to the best solution
As a result:
◦ There are now 1.5 million people in P&G’s extended network
◦ Over 50% of P&G’s product initiatives involve significant collaboration with outside innovators
◦ P&G remains one of the most successful consumer goods companies in the world, with its share price increasing ~150% since the programme started
P&G’s a great story, but at the end of the day it is just about making tastier Pringles
Think about what would happen if you had a pool of 1.5 million people using different techniques to mine data from thousands of:
o Biomedical research papers
o Historical newspaper articles
o Analyses of public sentiment
o Endless other data sources that are already in the public domain
As much as anything, crowdsourced data mining is what the digital economy is supposed to be about
◦ Utilising the low costs of communication to tap the talents of lots of people
◦ Improving collective human knowledge by taking advantage of the individual knowledge of people spread around the world
And it’s also what copyright is supposed to foster
o We have IP laws solely to promote innovation
o If the IP laws don’t allow crowdsourced data mining, then what are they for?