Chasing those perfect Hash functions… by DANIEL “3ICE” BEREZVAI ELTE.3ICE.HU/2013-2014-2/ALG2/GY/ALG2 GY 2 HF.

Presentations are not allowed. “A PowerPoint presentation cannot be made for this assignment, only a program.” I immediately thought: let’s make a presentation, then! Of course, I wrote 6 apps too.

I don’t want to be a copycat… This assignment has been unchanged for years; I checked. Last year, everyone did the same boring thing: (97*N1-79*N2+5*N3*N3-67*N4+99*N5-26*N6)%17, or improvised only a little: √(A/4)+B/4-4*C+D+4*(4/E)-F. So I set out to do something unique instead: Excel macros, a focus on speed, cheating, minimalist hashers that ignore more than half of the input characters, everything coded in pure ANSI C for speed, experiments with different invariants, and more!

Excel: finding the minimum number of collisions (3ICE-Minimal-Hash.xlsx)

Excel demo points: I explain every column extensively in cell comments. If you are interested, download the spreadsheet from my website and read it for yourself. I even documented the helper column! (Screenshot: Excel, with comments on every column.)

Competition: what I’m up against from last year… This is a beautiful, feature-complete app, but it’s obviously very, very slow: it only does 7000 checks per second.

What I achieved: even my lazy random-search Java implementation is relatively fast (1 minute to find 3 solutions). That’s checks per second! This is on my quad-core gaming laptop, of course, with 2 GPUs helping out (CUDA assist technology).

Finding 30 solutions in 11 minutes
Solution: (14a+72b+859c+89d+517e+660f) (mod 767) (mod 19)
Solution: (242a+454b+675c+732d+486e+685f) (mod 312) (mod 19)
Solution: (701a+407b+562c+707d+539e+946f) (mod 724) (mod 19)
Solution: (747a+732b+975c+316d+767e+701f) (mod 538) (mod 19)
Solution: (782a+940b+801c+267d+835e+405f) (mod 327) (mod 19)
Solution: (5a+525b+65c+431d+104e+802f) (mod 175) (mod 19)
Tested random combinations.
Solution: (690a+932b+172c+533d+847e+199f) (mod 848) (mod 19)
Solution: (154a+505b+49c+170d+304e+66f) (mod 158) (mod 19)

At this point it has checked over 200 million combinations.
Solution: (753a+377b+866c+734d+226e+65f) (mod 214) (mod 19)
Solution: (422a+104b+103c+864d+707e+44f) (mod 29) (mod 19)
Solution: (405a+4b+537c+235d+452e+487f) (mod 544) (mod 19)
Tested random combinations.
Solution: (857a+567b+630c+914d+909e+448f) (mod 541) (mod 19)
Solution: (44a+147b+424c+157d+403e+664f) (mod 81) (mod 19)
Solution: (594a+882b+610c+148d+53e+352f) (mod 733) (mod 19)
Solution: (95a+826b+173c+124d+756e+210f) (mod 826) (mod 19)
Solution: (501a+913b+736c+580d+397e+412f) (mod 5) (mod 19)

We are halfway done after only 5 minutes of work.
Solution: (381a+395b+206c+733d+359e+248f) (mod 173) (mod 19)
Solution: (481a+29b+482c+303d+490e+503f) (mod 290) (mod 19)
Tested random combinations.
Solution: (768a+154b+580c+135d+279e+332f) (mod 699) (mod 19)
Solution: (518a+591b+493c+622d+534e+556f) (mod 560) (mod 19)
Solution: (972a+386b+519c+686d+443e+610f) (mod 107) (mod 19)
Solution: (66a+377b+83c+377d+735e+980f) (mod 863) (mod 19)
Solution: (816a+411b+87c+978d+757e+705f) (mod 567) (mod 19)
Solution: (866a+43b+921c+0d+422e+310f) (mod 793) (mod 19)

Finding 30 solutions took 11 minutes, with over 500 million generated hash functions checked.
Tested random combinations.
Solution: (126a+208b+625c+509d+165e+976f) (mod 532) (mod 19)
Solution: (288a+492b+640c+178d+688e+392f) (mod 127) (mod 19)
Solution: (830a+447b+859c+222d+42e+41f) (mod 966) (mod 19)
Solution: (772a+643b+255c+962d+732e+947f) (mod 50) (mod 19)
Solution: (325a+77b+494c+152d+976e+308f) (mod 326) (mod 19)
Tested random combinations.
Solution: (552a+854b+857c+426d+886e+262f) (mod 145) (mod 19)
Done. Found 30 solutions after iterations.
BUILD SUCCESSFUL (total time: 11 minutes 56 seconds)

Even faster solving speed in a pure ANSI C implementation. That is solutions tested per second.

Here is a failed run over a small pool of candidate combinations (477 million tested in 50 seconds, ≈10 million per second). This run of course uses the full set of 19 codes, while the previous screenshot shows a reduced set of just 18 codes being hashed.

How did I code the C version? Unconventionally… It all starts with a nested loop 7 levels deep… (See main.c for the commented version.)

Main program logic, showing just one code being checked.

A “loop” without iteration overhead or array pointer arithmetic, using hardcoded values.

This repeated code would look so much better in a loop, wouldn’t it? But I’d lose performance! It would be the nested loop’s 8th level, and even 7 is far too deep: we are only supposed to use 1.

Computation time goes up ridiculously fast! This is far worse than n²; even exponential growth doesn’t compare…
16 or fewer codes: <1 second
17 codes: 1 minute
18 codes: 50 minutes
19 codes: 70+ hours!

Therefore, some groups had an easier job than others :) Group 7 was only given 13 codes; they had an easy time, spending 1 second to calculate all their required solutions. Group “E” had 18 codes to work with, and their apps were done in an hour. My group, group 8, had 19 codes, however. That meant I’d have to leave my computer on for days if I wanted to calculate solutions the usual way… except I didn’t have that kind of time! I only started working on this project the previous night and finished just before the deadline. Also, running my CPU and GPUs at 100%, or at least near peak, for days would burn quite a bit of life out of them. So I had to improvise. The end.