Data-Driven Inference of API Mappings
Amruta Gokhale, Daeyoung Kim, Vinod Ganapathy
Department of Computer Science, Rutgers University
PROMOTO 2014

Personal story: Change in environment is hard!
Nagpur, India (45° C) to New Jersey, USA (-10° C)

Mobile app for a single platform: iPhone app

Challenge: Porting apps across multiple mobile platforms
Platforms: Android, Windows Phone, BlackBerry 10
iPhone app -> Android app, Windows Phone app, BlackBerry app

Porting assistance
Porting to Windows Phone:
– Developer guides for porting
– Discussion forums on porting

Challenges in porting apps
– Different SDKs for app development
– Different programming languages
– Different development environments
– Different debugging aids
Every mobile platform exposes its own programming API

Platform        Language      Development Tools
Android         Java          Eclipse
iOS             Objective-C   Xcode
Windows Phone   C#            Visual Studio

Using API documentation to write an app (iPhone app)

iOS class    iOS method name                                        Description
CGGeometry   CGRect CGRectMake(CGFloat x, y, width, height)         Returns a rectangle with the specified coordinate and size values.
CGGeometry   bool CGRectContainsPoint(CGRect rect, CGPoint point)   Returns whether a rectangle contains a specified point.

Using API documentation to write an app (Android app)

Android class      Android method name                   Description
android.graphics   void drawRect(Rect r, Paint paint)    Draws the specified Rect using the specified Paint.
android.graphics   bool contains(int x, int y)           Returns true if (x,y) is inside the rectangle.

Can we do better than searching API documentation for each new platform?

APIs often have similar functionality

Android class name   Android method name
android.graphics     void drawRect(Rect r, Paint paint)
android.graphics     bool contains(int x, int y)

iOS class name   iOS method name
CGGeometry       CGRect CGRectMake(CGFloat x, y, width, height)
CGGeometry       bool CGRectContainsPoint(CGRect rect, CGPoint point)

API mapping databases
API mapping databases map methods in a source API to methods in a target API.

iOS Method                         Android Method
CGGeometry.CGRectMake()            android.graphics.drawRect()
CGGeometry.CGRectContainsPoint()   android.graphics.contains()

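As a rough illustration, a mapping database of this kind can be pictured as a lookup table from source methods to candidate target methods. The sketch below uses only the two pairs listed on this slide; the dictionary layout and the `lookup` helper are assumptions made for illustration, not the format of any existing mapping tool.

```python
# Hypothetical in-memory API mapping database (the layout is an assumption).
# Each source method maps to a list, since a method may have several candidates.
API_MAPPINGS = {
    "CGGeometry.CGRectMake()": ["android.graphics.drawRect()"],
    "CGGeometry.CGRectContainsPoint()": ["android.graphics.contains()"],
}


def lookup(source_method):
    """Return the candidate target methods for a source method (empty if unknown)."""
    return API_MAPPINGS.get(source_method, [])
```

A developer porting an iPhone app would query such a table by the iOS method name and inspect the returned Android candidates.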
Platform APIs ~ Natural languages
Source API ~ Unknown source language
Target API ~ Unknown target language

English language text + Spanish language text -> NLP Toolkit -> Word mappings

English word   Spanish word
north          norte
exit           salida

Mappings between English and Spanish words
English: enlargement, society, state, control, importance
Spanish: amplificacion, estado, sociedad, importancia, control

iOS API methods "text" + Android API methods "text" -> NLP Toolkit -> API method mappings

iPhone API method     Android API method
CGRectMake            drawRect
CGRectContainsPoint   contains

iOS and Android API methods' mappings
iOS: CGRectGetHeight, CGRectGetWidth, CGRectMake, CGRectContainsPoint, CGContextFillRect
Android: height, drawRect, width, setStyle, contains

API mapping tools
windowsphone.interoperabilitybridges.com/porting
API mappings from Android, iPhone to Windows Phone

Creating API mapping databases
Mapping databases are populated manually by domain experts
Painstaking, error-prone and expensive
– Hard to evolve API mapping databases as the corresponding APIs evolve

Our contribution
We propose to automatically create API mapping databases
Prototyped in a tool called DDR (Data-Driven Rosetta)
– Creates mappings between the iOS API and the Android API
Leverages an NLP approach to identify likely API mappings

Workflow of DDR
Source Apps -> Source Program Path Extraction -> Source Program Paths
Target Apps -> Target Program Path Extraction -> Target Program Paths
Source Program Paths + Target Program Paths -> NLP Inference Engine -> Output Mappings

Source program paths: GetWidth(), GetHeight(), RectMake(), ...
Target program paths: height(), width(), setStyle(), drawRect(), ...

Output mappings:
Source method   Target method   PR
CGRectMake()    drawRect()      0.60
GetWidth()      width()         0.45

Program path extraction
Mobile app binary -> Disassembler -> Intermediate code representation -> Control flow graph constructor -> Control flow graph -> Program path extractor -> Program paths

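The last stage of this pipeline can be sketched as a walk over the control flow graph that records the API calls seen along each path. The slides do not say how paths are enumerated, so the sketch below assumes acyclic entry-to-exit paths; the `cfg` and `calls` dictionaries and the function name `extract_paths` are hypothetical.

```python
def extract_paths(cfg, calls, entry):
    """Enumerate acyclic paths from the entry block, as sequences of API calls.

    cfg   : dict mapping a block id to the list of its successor block ids
    calls : dict mapping a block id to the API methods invoked in that block
    Assumption: loops are cut by never revisiting a block on the same path.
    """
    paths = []

    def walk(block, visited, trace):
        trace = trace + calls.get(block, [])
        successors = [s for s in cfg.get(block, []) if s not in visited]
        if not successors:
            # Exit block, or every successor would close a loop: emit the trace.
            paths.append(trace)
            return
        for nxt in successors:
            walk(nxt, visited | {nxt}, trace)

    walk(entry, {entry}, [])
    return paths
```

Each returned list plays the role of one "sentence" of API method "words" handed to the inference engine.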
NLP Inference engine
Matching Canonical Correlation Analysis (MCCA) [ACL '08*]
1. Define a generative model
2. Inference on the model done via the Expectation-Maximization (EM) algorithm
* Learning Bilingual Lexicons from Monolingual Corpora, Haghighi et al., ACL '08

Generative model
Source feature extraction -> Source word features
Target feature extraction -> Target word features
Source word features + Target word features + Seed Mappings -> Generative Model

Generative model
Features computed from individual languages:
1. Frequency of words
2. Substring properties
3. Context counts
Features form the observed data explained via a generative process

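The three feature families can be illustrated on program paths, treating each API method as a word. This is a minimal sketch under assumptions: the context window size, the use of character trigrams as the "substring properties", and the names `method_features` and `char_ngrams` are not taken from the slides.

```python
from collections import Counter, defaultdict


def method_features(paths, window=1):
    """Compute frequency and context counts for each method in the paths.

    paths  : list of program paths, each a list of API method names
    window : number of neighbours on each side counted as context (assumed)
    """
    freq = Counter()
    context = defaultdict(Counter)
    for path in paths:
        for i, method in enumerate(path):
            freq[method] += 1
            lo, hi = max(0, i - window), min(len(path), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    context[method][path[j]] += 1
    return freq, context


def char_ngrams(name, n=3):
    """Return the set of character n-grams of a method name (substring feature)."""
    return {name[i:i + n] for i in range(len(name) - n + 1)}
```

These features are computed separately for the source and target APIs; no cross-platform information is needed at this stage.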
Generative model
Relating a pair of mapped methods
CGRectMake (source method features) and drawRect (target method features)
Common, hidden concept behind the generation processes

Inference algorithm
E-step: Find the maximum weighted (partial) bipartite matching
M-step: Find the best parameters of the model by performing canonical correlation analysis (CCA)

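The E-step can be illustrated on a tiny example. The sketch below finds a maximum weighted partial matching by exhaustive search, which is only practical for a handful of methods; a real implementation would use a polynomial-time assignment algorithm. The weight matrix is taken as given, the rule that non-positive edges stay unmatched is an assumption, and the M-step (CCA) is not shown.

```python
from itertools import permutations


def max_weight_partial_matching(weights):
    """Exhaustive maximum weighted partial bipartite matching (tiny inputs only).

    weights[i][j] is the score for pairing source method i with target method j.
    Assumption: pairs with non-positive weight are left unmatched.
    Returns (list of (source index, target index) pairs, total weight).
    """
    n_src = len(weights)
    n_tgt = len(weights[0]) if weights else 0
    size = max(n_src, n_tgt)
    best_score, best_pairs = 0.0, []
    for perm in permutations(range(size)):
        # Indices beyond n_tgt act as "no partner" slots for a source method.
        pairs = [(i, perm[i]) for i in range(n_src)
                 if perm[i] < n_tgt and weights[i][perm[i]] > 0]
        score = sum(weights[i][j] for i, j in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_pairs, best_score
```

In the full EM loop, the matching found here would be used to refit the model parameters, which in turn yield new edge weights for the next E-step.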
Our modifications to inference algorithm
String similarity function: method names instead of method signatures
Output: a list of top 10 mappings sorted in decreasing order of edge weights

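Both modifications can be sketched briefly. The slides do not name the string similarity function, so the example below substitutes Python's `difflib.SequenceMatcher` ratio on lowercased method names as a stand-in; `name_similarity` and `top_mappings` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def name_similarity(source_name, target_name):
    """Similarity in [0, 1] between two method names (signatures are ignored).

    Stand-in measure: the actual similarity function is not given in the slides.
    """
    return SequenceMatcher(None, source_name.lower(), target_name.lower()).ratio()


def top_mappings(edges, k=10):
    """Return the k highest-weight mappings, sorted by decreasing edge weight.

    edges : list of (source_method, target_method, weight) tuples
    """
    return sorted(edges, key=lambda edge: edge[2], reverse=True)[:k]
```

With this stand-in measure, a pair such as `CGRectGetWidth` and `width` scores higher than `CGRectGetWidth` and `setStyle`, because the former shares the substring "width".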
Implementation
Collected 50 Android apps and 50 iOS apps
– 3,414 unique iOS API methods
– 2,229 unique Android API methods
Evaluation in progress!

Conclusion
It is becoming increasingly important to port apps to a variety of platforms
Key challenge: Different platforms use different programming APIs
API mapping databases help, but they are created manually by domain experts
We presented a methodology to automate the creation of API mapping databases