CIC Identifying smart contract users by analyzing their coding style

Alina Matyukhina, Shlomi Linoy, Nguyen Cong Van, Rongxing Lu, Natalia Stakhanova
Canadian Institute for Cybersecurity (CIC), University of New Brunswick

ABSTRACT

In blockchain, users are identified only by their account addresses. An attacker wishing to de-anonymize users will try to construct a one-to-many mapping between a user and that user's account addresses, and to associate information external to the system with those users. Blockchain tries to prevent this attack by keeping the mapping from a user to their account addresses known only to that user, and each user can generate as many account addresses as required.

This project seeks to better understand the traceability of smart contract owners (authors) and, through this understanding, to explore whether smart contract owners can be de-anonymized by their coding style using authorship attribution techniques. If two different smart contract addresses can be linked to the same user, an adversary can use such techniques to link all the agreements and transactions these addresses participate in. This makes linkability a serious threat to the anonymity of smart contract users.

Previous research

Related work              Number of authors   Number of features   Accuracy
Source code attribution
  Dauber et al.           106                 451,368              73%
  Caliskan et al.         1600                120,000              93%
Binary code attribution
  Alrabaee et al.         10                  6,500                80%
  Caliskan-Islam et al.   600                 4,500                83%
  Rosenblum et al.        190                 10,000               95%

Dataset

Set A: 585 keys, 4834 contracts, 8 contracts per key on average (minimum 4), 394.67 LOC.
Set B: 5086 keys, 65624 contracts, 124.59 LOC; only one contracts-per-key figure (11) is listed.

Our approach: description of the feature set

Level              Feature              Description
Source code        TF unigrams          Term frequency of word unigrams in the source code after tokenization
                   AST features         Derived from the abstract syntax tree (max depth of the AST, etc.)
                   Layout features      Type of comments, type of brackets, spaces (tabs), lines
Bytecode (opcode)  Idioms               Short sequences of instructions intended to capture stylistic characteristics
                   CFG graphlets        3-node subgraphs of the CFG (control-flow graph)
                   CFG supergraphlets   Obtained by collapsing and merging neighboring nodes of the CFG
                   Libcalls             Function names of imported libraries
                   N-grams              Short opcode sequences of length N

Results

Data              Keys   Contracts   Feature type   Features   After info gain   Classifier      Accuracy
Source code       585    4834        TF unigrams    143200     1275              Random Forest   75.88%
Contract ABI                                        18944      31                                58.56%
Contract opcode                                     44504      44                                62.7%

Conclusion and future work

We obtain more than 75% accuracy when classifying the authors of Solidity source code, and more than 60% on bytecode, using easy-to-extract features (TF unigrams).
Future work:
- further study of the features extracted from the AST and CFG;
- clustering contract accounts by their users.
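The results above rest on two steps: turning each contract into term-frequency unigram features, then keeping only the most informative features with information gain before classification. The following is a minimal Python sketch of those steps under stated assumptions, not the authors' implementation. The tokenizer regex, the treatment of each token as a present/absent feature when computing information gain, and every function name (`tokenize`, `term_frequencies`, `opcode_ngrams`, `information_gain`, `select_features`) are hypothetical. The Random Forest step is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical Solidity tokenizer (an assumption, not the poster's): identifiers
# and keywords, integer literals, and single punctuation characters are tokens.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")


def tokenize(source: str) -> list[str]:
    """Split source code into word-level unigram tokens."""
    return TOKEN_RE.findall(source)


def term_frequencies(tokens: list[str]) -> dict[str, float]:
    """TF of each unigram: its count divided by the total number of tokens.

    Works the same way for source tokens and for opcode sequences.
    """
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {tok: c / total for tok, c in counts.items()}


def opcode_ngrams(opcodes: list[str], n: int) -> Counter:
    """Count contiguous opcode sequences of length n."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of the author-label distribution."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(present: list[bool], labels: list[str]) -> float:
    """Information gain of a binary feature (token present/absent) w.r.t. authors."""
    remainder = 0.0
    for value in (True, False):
        subset = [lab for p, lab in zip(present, labels) if p == value]
        if subset:
            remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def select_features(docs: list[dict[str, float]], labels: list[str], k: int) -> list[str]:
    """Keep the k tokens with the highest positive information gain (ties by name)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    scored = [(information_gain([tok in doc for doc in docs], labels), tok) for tok in vocab]
    ranked = sorted((pair for pair in scored if pair[0] > 0), key=lambda p: (-p[0], p[1]))
    return [tok for _, tok in ranked[:k]]


if __name__ == "__main__":
    # Toy contracts: "alice" writes uint, "bob" writes uint256.
    contracts = [
        "contract A { uint x ; }",
        "contract B { uint y ; }",
        "contract C { uint256 z ; }",
        "contract D { uint256 w ; }",
    ]
    labels = ["alice", "alice", "bob", "bob"]
    docs = [term_frequencies(tokenize(src)) for src in contracts]
    print(select_features(docs, labels, k=2))  # prints ['uint', 'uint256']
```

Tokens shared by every author (such as `contract` or braces) get zero information gain and are dropped, which mirrors the large reduction from 143200 to 1275 source-code features in the results table. The selected features could then be turned into per-contract TF vectors and passed to an off-the-shelf classifier such as scikit-learn's `RandomForestClassifier`; the poster does not say which library was used.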