Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing
ACM SIGKDD Explorations, Volume 11, Issue 1, July 2009
Presenter: 黃啟智
Student ID:
Outline
– Introduction
– Interoperability and Open Standards
– Putting Models to Work
– Performance
– Conclusion
Introduction
Deployment and practical application of predictive models:
– Limited choice of options
– Integrating and deploying a model often takes months (time-consuming)
– Custom coding or proprietary processes (costly)
Open standards and Internet-based technologies are available to provide a more effective end-to-end solution for deployment.
Introduction
SOA: Service-Oriented Architecture
– For the design of loosely coupled IT systems (e.g., based on Web Services)
SaaS: Software-as-a-Service
– A licensing model
– Vendors deliver software solutions as a cost-effective service
PMML: Predictive Model Markup Language
– An open standard that allows users to exchange predictive models among various software tools
Interoperability and Open Standards
[Diagram labels: Cloud Computing (a computing architecture); SaaS, IaaS, PaaS; Web Services; SOAP, WSDL, UDDI, RPC, SOA, REST; (access); (SOA-related standards)]
Interoperability and Open Standards
Cloud Computing
– Reduces cost and management overhead for IT
– A shift in the geography of computation
– The Internet as a platform
– A set of services that provide computing resources
– A variety of services: storage capacity, processing power, business applications, …
– Cloud infrastructures: Amazon Web Services (AWS), Sector/Sphere, Hadoop, …
– The OCC, Open Cloud Consortium (
Interoperability and Open Standards
Web Service
– W3C definition
– Provides the foundation of SOA
– Uses XML to encode and decode data
– Uses the SOAP (Simple Object Access Protocol) standard to transport data
– Data can be easily exchanged between different applications and platforms
– Can be described by a WSDL (Web Services Description Language) file
– UDDI (Universal Description, Discovery, and Integration): a platform-independent, XML-based registry for businesses to list themselves on the Internet
Interoperability and Open Standards
A SOAP request for a PMML file (the file/model was previously uploaded to the service provider)
A JDM (Java Data Mining) call
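The request shown on this slide did not survive as text, so the following is only a minimal sketch of what a SOAP 1.1 scoring request could look like when built in Python. The service namespace, the `apply` operation, and the `modelName` and `field` elements are hypothetical names chosen for illustration; they are not the actual JDM or ADAPA interface. Only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 envelope namespace
SVC_NS = "http://example.com/scoring"  # hypothetical service namespace (assumption)


def build_score_request(model_name, record):
    """Return a SOAP 1.1 envelope (as a string) asking a service to score one record."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("svc", SVC_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # "apply", "modelName", and "field" are illustrative names, not a real API.
    call = ET.SubElement(body, f"{{{SVC_NS}}}apply")
    ET.SubElement(call, f"{{{SVC_NS}}}modelName").text = model_name
    for name, value in record.items():
        field = ET.SubElement(call, f"{{{SVC_NS}}}field", {"name": name})
        field.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# The model is referenced by name only, since it was uploaded beforehand.
request_xml = build_score_request("iris_model", {"sepal_length": 5.1, "petal_width": 0.2})
print(request_xml)
```

The sketch only constructs the message; sending it would additionally require the provider's endpoint URL and the operation names from its WSDL file.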
Interoperability and Open Standards
SaaS – Software as a Service
– A licensing model: users access software via the Internet (rather than actually “buying and installing” it)
– Users pay only for the right to use it for a certain time period (e.g., NT$100 for an hour)
– No upfront costs for setting up servers or software
– Minimizes the risk of purchasing costly software that may not provide an adequate return on investment
– E.g., Salesforce.com, Google Apps
Interoperability and Open Standards
PMML – Predictive Model Markup Language
– Developed by the Data Mining Group (
– An open standard for representing data mining models
– An XML-based language
– Can describe data preprocessing and predictive algorithms
– Can represent input data and data transformations
Interoperability and Open Standards
PMML Structure examples (a test data file)
[Figure annotations: required (active) data fields; predicted data field]
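Since the PMML listing from this slide is not available as text, here is a small sketch of how active and predicted fields could be read out of a PMML document in Python. The fragment is trimmed and not schema-valid, and the field names (`age`, `income`, `churn`) are invented for illustration; the `DataDictionary`, `MiningSchema`, `MiningField`, and `usageType` names follow the PMML 4.0 vocabulary.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative fragment; field names are hypothetical.
PMML_FRAGMENT = """
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="churn" optype="categorical" dataType="string"/>
  </DataDictionary>
  <RegressionModel functionName="classification">
    <MiningSchema>
      <MiningField name="age" usageType="active"/>
      <MiningField name="income" usageType="active"/>
      <MiningField name="churn" usageType="predicted"/>
    </MiningSchema>
  </RegressionModel>
</PMML>
"""

NS = {"p": "http://www.dmg.org/PMML-4_0"}


def split_fields(pmml_text):
    """Return (active_fields, predicted_fields) listed in the MiningSchema."""
    root = ET.fromstring(pmml_text)
    active, predicted = [], []
    for mining_field in root.findall(".//p:MiningSchema/p:MiningField", NS):
        # PMML treats a missing usageType as "active".
        usage = mining_field.get("usageType", "active")
        if usage == "active":
            active.append(mining_field.get("name"))
        elif usage == "predicted":
            predicted.append(mining_field.get("name"))
    return active, predicted


active_fields, predicted_fields = split_fields(PMML_FRAGMENT)
print("active:", active_fields, "predicted:", predicted_fields)
```

A scoring client could use the active list to check that every required input is present in a record before submitting it.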
Interoperability and Open Standards
PMML Structure examples
[Figure annotation: array of counts of different field values under different class labels]
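To illustrate why such count arrays are enough to score a record, the sketch below computes naïve Bayes posteriors directly from per-value, per-class counts. The data, the field names, and the `threshold` used in place of zero counts are assumptions made for this example, and the code is a plain textbook formulation rather than the exact scoring procedure of any PMML engine.

```python
# counts[field][value][label] = number of training records with that value and class label
counts = {
    "outlook": {"sunny": {"yes": 2, "no": 3}, "rainy": {"yes": 3, "no": 2}},
    "windy": {"true": {"yes": 1, "no": 3}, "false": {"yes": 4, "no": 2}},
}
class_counts = {"yes": 5, "no": 5}  # records per class label


def posterior(record, counts, class_counts, threshold=0.001):
    """Return normalized class probabilities for one record."""
    scores = {}
    for label, n_label in class_counts.items():
        # Start from the class count (proportional to the prior).
        score = float(n_label)
        for field, value in record.items():
            c = counts[field][value][label]
            # Conditional probability of the value given the class;
            # a small threshold replaces zero counts (assumed value).
            score *= (c / n_label) if c > 0 else threshold
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


probs = posterior({"outlook": "sunny", "windy": "false"}, counts, class_counts)
print(probs)
```

The model itself is nothing more than the two count tables, which is why it can be exchanged between tools as a compact XML document.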
Interoperability and Open Standards
PMML
Model specifics (parameters, architecture) are defined under different model elements, including:
– Neural Networks
– Support Vector Machines
– Regression Models
– Decision Trees
– Association Rules
– Clustering
– Sequences
– Naïve Bayes
– Text Models
– Rules
Interoperability and Open Standards
PMML On-The-Go
– PMML 4.0: time series, Boolean data types, model segmentation, lift/gain charts, expanded range of built-in functions, …
– More applications support export and import functionality in PMML
– Open-source environments: KNIME( The R project(
Putting Models to Work
Amazon EC2
– Elastic Compute Cloud
– Powered by Amazon Web Services
ADAPA scoring engine
– Uses JDM (Java Data Mining) Web Service calls, and therefore allows automatic decisions to be virtually embedded into enterprise systems and applications
– Available as a service to minimize total cost
Putting Models to Work
Model Verification and Execution
Typical tasks in the life cycle of a data mining project:
– Building, deploying, testing, and using data mining models (a cross-platform and multi-vendor environment)
Putting Models to Work
Model Verification and Execution
– Model testing/verification
  Ensures that the scoring engine and the model development environment produce exactly the same results
  A test file containing any number of records, each with all the necessary input variables and the expected result, can be uploaded for score matching
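Score matching as described here can be sketched in a few lines of Python. Everything specific in the example is an assumption: the column name `expected`, the toy linear model standing in for the scoring engine, and the absolute tolerance (the slide asks for exactly equal results; a small tolerance is used only to absorb floating-point formatting differences). The third test record is deliberately wrong to show how a mismatch is reported.

```python
import csv
import io
import math


def verify(rows, score_fn, expected_col="expected", tol=1e-9):
    """Return a list of (row_index, expected, computed) for records that do not match."""
    mismatches = []
    for i, row in enumerate(rows):
        inputs = {k: float(v) for k, v in row.items() if k != expected_col}
        computed = score_fn(inputs)
        expected = float(row[expected_col])
        if not math.isclose(computed, expected, rel_tol=0.0, abs_tol=tol):
            mismatches.append((i, expected, computed))
    return mismatches


def toy_model(x):
    """Hypothetical stand-in for the deployed model: a fixed linear regression."""
    return 1.0 + 2.0 * x["x1"] - 0.5 * x["x2"]


# Test file: input variables plus the result expected by the development environment.
TEST_FILE = "x1,x2,expected\n1.0,2.0,2.0\n0.0,4.0,-1.0\n3.0,0.0,6.5\n"

rows = list(csv.DictReader(io.StringIO(TEST_FILE)))
mismatches = verify(rows, toy_model)
for index, expected, computed in mismatches:
    print(f"record {index}: expected {expected}, engine produced {computed}")
```

An empty mismatch list would indicate that the deployed model reproduces the development environment's scores on the test file.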
Putting Models to Work
Model Verification and Execution
– Model execution
  Batch mode: via the web console, by uploading a data file containing records (in CSV format or zipped)
  Real-time mode: via web services, using embedded calls (SOAP requests)
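As a rough illustration of batch mode, the sketch below accepts an uploaded payload that is either plain CSV or a zip archive containing a CSV file, scores every record, and returns the records with an added score column. The function names, the `score` column, and the toy scoring function are hypothetical; this is not the engine's actual implementation.

```python
import csv
import io
import zipfile


def read_records(payload):
    """Return dict records from CSV bytes, or from the first file inside a zip archive."""
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            text = archive.read(archive.namelist()[0]).decode("utf-8")
    else:
        text = payload.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def score_batch(payload, score_fn):
    """Score every record in the payload and return CSV text with a 'score' column."""
    records = read_records(payload)
    if not records:
        return ""
    out = io.StringIO()
    fieldnames = list(records[0].keys()) + ["score"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({**record, "score": score_fn(record)})
    return out.getvalue()


def add_inputs(record):
    """Hypothetical scoring function: the sum of the two inputs."""
    return float(record["x1"]) + float(record["x2"])


plain = b"x1,x2\n1,2\n3,4\n"

# Build the zipped variant of the same file in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("batch.csv", plain.decode("utf-8"))
zipped = zip_buffer.getvalue()

print(score_batch(plain, add_inputs))
```

Real-time mode differs mainly in transport: each record arrives as its own web-service call instead of as a row in an uploaded file.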
Putting Models to Work
Demo: Excel add-in
Putting Models to Work
Security on the Cloud
– Uploading proprietary information to a third-party service raises security and control questions
– The engine should not store any data
– An instance shares nothing with other instances
– An instance is private (via authentication)
– Access to an instance is only via HTTPS
– Models and data are deleted after an instance is terminated
Performance
Instance type reference:
Conclusion
Cloud computing
– Offers a powerful and revolutionary way to put data mining models to work
Open standard (PMML)
– Helps predictive models be easily accessed from anywhere in the enterprise (via web-service calls or uploaded data files)
The combination of both accelerates the deployment of predictive models and makes it more affordable.
Questions
– Security (transmission via the Internet to a third-party vendor) and privacy
– High dimensionality / large databases: transmission time + processing time
Biocep-R within the Technology Environment