Hashing: Then and Now
Mike Smorul, ADAPT Project
Commodity Storage Performance
- 2003: JetStor III IDE-FC, 62 MB/s (large block)
- 2013: workstation SSD, 218 MB/s; Perc 6/MD1000, 400+ MB/s
Chip Speed
- 2003: Pentium 4, 3.2 GHz
- 2013: Core i7 Extreme, 3.5 GHz
Hashing Performance (SHA-256)
- Java: 85 MB/s
- Crypto++: MB/s
- Real-world penalty: Java loses 20-40% on a slow-seek disk
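The slide's figures come from Java and Crypto++ on the author's hardware. One way to measure digest speed on your own machine, separate from disk speed, is to hash a buffer that is already in memory. The sketch below uses Python's `hashlib` (in common CPython builds it is backed by OpenSSL, so expect numbers that differ from the Java figure); the function name and the 256 MB default are illustrative assumptions, and MB means 2^20 bytes.

```python
import hashlib
import time


def sha256_throughput(total_mb: int = 256, chunk_mb: int = 1) -> float:
    """Return SHA-256 throughput in MB/s (MB = 2**20 bytes).

    The data is a zero-filled buffer already in RAM, so disk speed is
    excluded and only the digest algorithm is timed.
    """
    chunk = bytes(chunk_mb << 20)
    rounds = max(1, total_mb // chunk_mb)
    h = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(rounds):
        h.update(chunk)
    elapsed = time.perf_counter() - start
    return rounds * chunk_mb / elapsed


if __name__ == "__main__":
    print(f"SHA-256: {sha256_throughput():.0f} MB/s")
```

Comparing this in-memory number with a read-and-hash run over a real file on a slow-seek disk gives a rough view of the real-world penalty on that hardware.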
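Implications
- Flipped bottlenecks: storage used to be the slow stage; now a single SHA-256 stream (85 MB/s in Java) cannot keep up with 400+ MB/s storage, while clock speed barely moved (3.2 to 3.5 GHz)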
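The flipped bottleneck can be made concrete with two simple rate models, a sketch that assumes constant per-stage throughput and ignores seeks. If reading and hashing overlap, the slower stage sets the pace; if one thread reads a block and then hashes it, the per-byte times add, so the combined rate is s·d/(s+d). The 85 MB/s digest figure is applied to both years as an assumption, since the slides give no 2003 hashing number.

```python
def pipelined_rate(storage_mb_s: float, digest_mb_s: float) -> float:
    """IO and hashing overlap: the slower stage sets the pace."""
    return min(storage_mb_s, digest_mb_s)


def serial_rate(storage_mb_s: float, digest_mb_s: float) -> float:
    """Read a block, then hash it: per-byte times add, giving s*d/(s+d)."""
    return 1.0 / (1.0 / storage_mb_s + 1.0 / digest_mb_s)


DIGEST_MB_S = 85.0  # Java SHA-256 figure from the slides, assumed for both years

for year, storage in ((2003, 62.0), (2013, 400.0)):
    bound = "storage" if storage < DIGEST_MB_S else "digest"
    print(
        f"{year}: pipelined {pipelined_rate(storage, DIGEST_MB_S):.0f} MB/s "
        f"({bound}-bound), serial {serial_rate(storage, DIGEST_MB_S):.0f} MB/s"
    )
```

Under these assumptions 2003 comes out storage-bound and 2013 digest-bound: at 400 MB/s the storage could feed almost five SHA-256 streams.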
How to Overcome
- Faster/weaker digests
- Simultaneous transfers
- Data locality (tape?)
- Improve single-stream performance
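Simultaneous transfers only help if several digests run at once. A minimal Python sketch, assuming files on local disk: each worker thread hashes one whole file. CPython's `hashlib` releases the GIL while updating with buffers larger than 2047 bytes, so the threads can use multiple cores. `sha256_file` and `hash_many` are hypothetical helper names; swapping `hashlib.sha256` for a faster, weaker digest such as `hashlib.md5` is a one-line change.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash one file in fixed-size chunks so memory use stays bounded."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def hash_many(paths: list[Path], workers: int = 4) -> dict[Path, str]:
    """Hash several files at once, one file per worker thread.

    CPython's hashlib releases the GIL for updates larger than 2047 bytes,
    so these threads can run digests on separate cores.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(sha256_file, paths)))
```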
Parallelize a Single Stream
- Independent IO and digest threads
- Always have work for the digest algorithm
- Large files reached over 95% of the algorithm's potential; small files were unchanged
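Independent IO and digest threads can be sketched as a producer/consumer pair joined by a bounded queue: the reader stays up to `depth` chunks ahead so the digest always has work waiting. This is a Python illustration of the idea, not the implementation behind the slide's measurements; the chunk size and queue depth are assumed tuning values. It relies on file reads and large `hashlib` updates both releasing the GIL.

```python
import hashlib
import queue
import threading
from typing import BinaryIO


def pipelined_sha256(stream: BinaryIO, chunk_size: int = 1 << 20, depth: int = 8) -> str:
    """SHA-256 of a stream with reading and hashing in separate threads.

    The bounded queue lets the reader run ahead of the digest by up to
    `depth` chunks; an empty bytes object marks the end of the stream.
    """
    chunks = queue.Queue(maxsize=depth)
    errors: list[BaseException] = []

    def reader() -> None:
        try:
            while chunk := stream.read(chunk_size):
                chunks.put(chunk)
        except BaseException as exc:  # hand IO errors back to the caller
            errors.append(exc)
        finally:
            chunks.put(b"")  # sentinel: no more data

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    h = hashlib.sha256()
    while chunk := chunks.get():  # digest thread: never waits on the disk directly
        h.update(chunk)
    t.join()
    if errors:
        raise errors[0]
    return h.hexdigest()
```

Small files gain little from this because there is only a chunk or two to overlap, which is consistent with the slide's observation.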
Securing Data in Motion?
Where to Apply Fixity
- Internal integrity services
- At transfer, via manifests
- End to end?
Operational Integrity
- Internal auditing: prove your hardware (detects error, not malice)
- Peer auditing: prove your friends
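An internal audit amounts to re-hashing stored files and comparing them against digests recorded at ingest. The sketch assumes a hypothetical in-memory `registry` mapping relative paths to SHA-256 hex digests; a real service would persist it. Because anyone able to alter a file could also alter a local registry, this catches hardware and media errors rather than malice, which is where peer auditing comes in.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def audit(registry: dict[str, str], root: Path) -> dict[str, str]:
    """Re-hash every registered file and report anything that changed.

    `registry` (hypothetical) maps relative paths to the SHA-256 recorded
    at ingest. Returns {path: "missing" | "corrupt"} for every failure.
    """
    failures = {}
    for rel, expected in registry.items():
        path = root / rel
        if not path.is_file():
            failures[rel] = "missing"
        elif sha256_file(path) != expected:
            failures[rel] = "corrupt"
    return failures
```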
Transporting Integrity
- Manifest lists: transfer validation
- Digital signatures: prove identity
- Token based: prove time
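A manifest list for transfer validation can be as simple as one `digest  path` line per file, the layout used by `sha256sum` output and, loosely, by BagIt manifest files. The helpers below are hypothetical; the manifest is written outside the data directory so it does not list itself. A matching manifest shows the bytes arrived intact, but says nothing about who produced them or when, which is what signatures and tokens add.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Write one "<sha256>  <relative/path>" line per file under root."""
    lines = [
        f"{sha256_file(p)}  {p.relative_to(root).as_posix()}\n"
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    manifest.write_text("".join(lines), encoding="utf-8")


def validate_manifest(root: Path, manifest: Path) -> list[str]:
    """Return relative paths that are missing or whose digest changed."""
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        digest, rel = line.split(maxsplit=1)
        path = root / rel
        if not path.is_file() or sha256_file(path) != digest:
            failures.append(rel)
    return failures
```

The sender runs `write_manifest` before the transfer and ships the manifest alongside the data; the receiver runs `validate_manifest` on arrival.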
Chronopolis Integrity
- Current: producer-supplied authoritative manifest
- Peers locally monitor integrity
- Manual trace back to the point of ingest
Chronopolis Integrity
- In progress: a single integrity token back to ingest
- Ideal: tokens issued prior to arrival, to ‘prove’ the state of the data at a point before it reached Chronopolis
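One generic way to turn many file digests into a single time-stampable value is a Merkle tree: digests collected during a round are hashed pairwise up to one root, and each file keeps the sibling hashes on its path as its token. Publishing the root with a timestamp to an independent party (not shown) would let anyone later show that a given digest existed by that time. This is a sketch of the general technique, not the token format used by Chronopolis; the 0x00/0x01 prefixes separating leaf and interior hashes and the duplication of the last node on odd levels are assumed design choices.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_round(leaves: list[bytes]) -> tuple[bytes, list[list[tuple[bytes, bool]]]]:
    """Aggregate file digests into one root plus a proof (token) per file.

    Each proof is a list of (sibling_hash, sibling_is_left) pairs, leaf to root.
    """
    if not leaves:
        raise ValueError("a round needs at least one digest")
    level = [_h(b"\x00" + leaf) for leaf in leaves]  # domain-separated leaf hashes
    proofs: list[list[tuple[bytes, bool]]] = [[] for _ in leaves]
    index = list(range(len(leaves)))  # each leaf's position in the current level
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd level: pair the last node with itself
        for leaf, pos in enumerate(index):
            sibling = pos ^ 1
            proofs[leaf].append((level[sibling], sibling < pos))
            index[leaf] = pos // 2
        level = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0], proofs


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from one digest to the root using its token."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        node = _h(b"\x01" + (sibling + node if sibling_is_left else node + sibling))
    return node == root
```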
Manifests 2.0
- Beyond a simple transfer list: token manifests
- Portable and embeddable (Python, etc.)
Cloud Integrity
- Digests in a cloud validate the transfer only
- HTTP headers can pass extended integrity information
- End-user verification
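HTTP already has headers for carrying a body digest, which a client can recompute for end-user verification. The sketch builds three variants: `Content-MD5` (RFC 1864, dropped from the core HTTP/1.1 specification in RFC 7231 but still accepted by some storage services), the RFC 3230 `Digest` header, and its RFC 9530 successor `Content-Digest`. The helper names are hypothetical, and the verifier is a simplification that reads only a single `SHA-256` entry and matches header names case-sensitively, unlike real HTTP.

```python
import base64
import binascii
import hashlib


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def digest_headers(body: bytes) -> dict[str, str]:
    """Integrity headers a sender could attach to an HTTP body."""
    sha256 = b64(hashlib.sha256(body).digest())
    return {
        "Content-MD5": b64(hashlib.md5(body).digest()),  # RFC 1864
        "Digest": f"SHA-256={sha256}",                    # RFC 3230
        "Content-Digest": f"sha-256=:{sha256}:",          # RFC 9530
    }


def verify_sha256(body: bytes, headers: dict[str, str]) -> bool:
    """End-user check: recompute SHA-256 and compare with the Digest header.

    Simplified: expects a single "SHA-256=<base64>" entry and exact
    header-name case.
    """
    algo, _, encoded = headers.get("Digest", "").partition("=")
    if algo.strip().upper() != "SHA-256":
        return False
    try:
        expected = base64.b64decode(encoded, validate=True)
    except binascii.Error:
        return False
    return hashlib.sha256(body).digest() == expected
```

A header computed by the server only proves what the server sent; to prove the stored object is unchanged since creation, the end user still needs a digest obtained independently of that server.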
Integrity as Provenance
- Integrity checking forward in time
- Consumer-level verification of data
- Integrity from object creation: start integrity checking before archiving
Closing
- Why are you hashing? What do you want to prove?
- Hashing cost/performance
Contact
Mike Smorul