What’s the Big Deal About R?
Tom Tiedeman, OCIO
July 21, 2015
Typical Patent / Trademark Questions
- Is my idea actually new?
- How much innovation comes from our state university?
- Has state support paid off?
- How can I easily track new patent grants and applications in my interest area?
- How can actual use of the newest technology be increased?
Making sense of big data
- Diverse user interests
- Interest in a particular needle, not the haystack
- Inference / judgment are key
- Continued monitoring for new developments
- Possible huge economic impacts, or not :o)
- Total volume of complex questions could be extreme
- Several data sets needed for an answer
USPTO’s Challenge: What Good is Open Data if People Can’t Use It?
- Terabytes of data
- Fast-changing (~30–50 GB per week)
- Complicated data structure (XML / relational)
- Fuzzy information (images, non-standard text)
USPTO Data and Existing Tools
- USPTO web downloads very constrained
- Page scraping is insufficient
- XML is not just rows and columns
- Formats like PDF are non-trivial
- Data scale is much too large for tools like Excel and Access
- What USPTO provides / does will change
Why “R”?
- Loose fit for a wide range of problems
- Statistical / graphical computing focus
- Free PC-based open source software
- Links with other languages
- Growing power, application, user base
- Online download capability
- Tools for XML, APIs, JSON, other data formats
- 6,900 packages, plus framework for more
- Many training courses, academic base
- Just Google “R”
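As a rough illustration of the XML point, the sketch below flattens a small nested XML document into rows and columns in R. It assumes the xml2 package is installed (one of several XML packages for R, chosen here for illustration and not named in the slides). The element names and records are invented and do not reflect the actual USPTO schema.

```r
# Sketch only: the XML layout and values are made up for illustration.
library(xml2)  # assumption: xml2 is installed

sample_xml <- '<patents>
  <patent id="A1">
    <title>Widget</title>
    <inventors>
      <inventor>Lee</inventor>
      <inventor>Cruz</inventor>
    </inventors>
  </patent>
  <patent id="B2">
    <title>Gadget</title>
    <inventors>
      <inventor>Shah</inventor>
    </inventors>
  </patent>
</patents>'

doc <- read_xml(sample_xml)
patents <- xml_find_all(doc, "//patent")

# One patent can have several inventors, so a single record does not map to
# a single row. Emit one row per (patent, inventor) pair instead.
rows <- lapply(patents, function(p) {
  data.frame(
    id = xml_attr(p, "id"),
    title = xml_text(xml_find_first(p, "./title")),
    inventor = xml_text(xml_find_all(p, "./inventors/inventor")),
    stringsAsFactors = FALSE
  )
})

flat <- do.call(rbind, rows)
print(flat)
```

The patent with two inventors becomes two rows, which is the kind of reshaping that nested data needs before it fits a table.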
Learn “R”
- MOOCs and courses to learn R
- EdX.org: Explore Statistics with R
- Coursera.org: Data Science Specialization