Your Botnet is my Botnet: Analysis of a Botnet Takeover
Brett Stone-Gross, Marco Cova, Lorenzo Cavallaro, Bob Gilbert, Martin Szydlowski, Richard Kemmerer, Christopher Kruegel, and Giovanni Vigna
Presentation by Sam Klock
Background
- Botnet: a network of machines compromised by malware (bots) and under the control of an external agent
- Botmaster: the agent controlling a botnet
- Command and control (C&C): the mechanism by which the botmaster controls a botnet
Motivation
- Botnets: a big and growing security issue on the Internet
  - More broadband Internet access makes them easier to build
  - The wealth of information transported makes them profitable
  - Sizeable botnets can participate in large-scale malicious acts
- We want to know more about them
  - How do they grow?
  - What can they do?
  - How do we address the threats that existing and potential botnets pose?
  - How do we preempt their growth (address user vulnerabilities)?
Problem
- Analyzing botnets is difficult
  - Topologies vary: top-down, P2P, random
  - Protocols and goals vary
  - Sizes vary widely
- Several techniques are typical
  - Passive analysis: collect spam likely sent from bots; analyze query patterns to DNS/DNSBL; examine network traffic
    - Can’t scale to the entire Internet
    - Some metrics only work for botnets engaging in certain activities
  - Infiltration: join the botnet and monitor
    - Most botnets avoid supplying information to member bots
Images: Wang, Sparks, Zou, “An Advanced Hybrid Peer-to-Peer Botnet”, in IEEE Transactions on Dependable and Secure Computing, 7(2):
Approach
- Hijack the botnet
  - Idea: investigate botnet C&C, then tamper with it
  - Learn about botnet behavior from the perspective of the botmaster
- Two ways to accomplish this
  - Seize the botmaster’s C&C machines
    - Law enforcement’s job
  - Better: collaborate with DNS providers
    - Goal: redirect C&C traffic to us
    - Then mimic C&C behavior
- Approach depends on the targeted botnet
Target: Torpig
- “One of the most advanced pieces of crimeware ever created”
- Mainly harvests personal information
- Opens ports for HTTP and SOCKS on victim machines
  - Useful for anonymous browsing, sending spam
  - Not yet clear what Torpig does with them
- Good candidate for DNS-based hijacking
  - Centralized C&C
  - Bots identify C&C via domain names
  - Communication via HTTP
Torpig vs. Others
- Torpig has interesting characteristics
  - Domain flux
  - Bot identifiers
  - A lot of harvested information
  - Implementable protocol
- Past attempts:
  - Conficker: no bot identifiers, protocol authentication
    - Size estimation is hard
    - Hijackers cannot authenticate, so no data
  - Kraken: no data collection (spam sending)
    - Little insight into data harvesting
Torpig Characteristics
- Basic idea: Trojan-horse-based rootkit
  - Uses Mebroot
- Attack via drive-by download
  - Vulnerable web server
  - Vulnerable client/OS
- Install Mebroot, then install Torpig malware
(0) Inject URL
(1) Client HTTP GET
(2) Deliver injected URL
(3) Client HTTP GET from DbD server
(4) Download & run Mebroot
Torpig Characteristics (cont’d)
(5) Fetch Torpig libraries
(6) Configure, monitor
(7) Execute man-in-the-browser phishing
Bot Behavior
- Periodic C&C communication
  - Every ~20 minutes
  - Uploads harvested data
  - Server responds okn or okc
- Man-in-the-browser is more complex
  - List of targeted URLs
  - Requests sent to a special injection server
  - Bypasses SSL, certificates, etc.
  - Can be hijacked (not attempted here)
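The periodic check-in can be sketched from the server's side. This is a minimal illustration under stated assumptions, not the actual Torpig wire format. The slides only name the two replies, `okn` and `okc`. Reading `okn` as "no new configuration" follows the later slide note "No new config (okn only)". Treating `okc` as a prefix for a configuration payload is an assumption, and any payload obfuscation is left out. The names `server_reply` and `checkins_per_day` are hypothetical.

```python
from typing import Optional

# Bots check in roughly every 20 minutes, per the slide.
POLL_INTERVAL_SECONDS = 20 * 60


def server_reply(new_config: Optional[bytes] = None) -> bytes:
    """Build the reply to one bot check-in.

    Assumption: "okn" means no new configuration, and "okc" precedes
    a configuration payload. The real encoding is not reproduced here.
    """
    if new_config is None:
        return b"okn"
    return b"okc" + new_config


def checkins_per_day(interval_seconds: int = POLL_INTERVAL_SECONDS) -> int:
    """Number of check-ins one always-on bot would make in 24 hours."""
    return (24 * 60 * 60) // interval_seconds


print(server_reply())
print(checkins_per_day())
```

A takeover that answers only with the "no new configuration" reply keeps bots reporting without pushing anything new to them.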
Hijacking Torpig
- Domain flux
  - Related to fast flux
  - C&C hidden behind shifting domains
  - Bots generate a list of domains to check periodically
  - Iterate through the list; stop on a valid response
- Domain generation algorithm (DGA) reverse-engineered
- Botmaster didn’t register domains in advance: big weakness
[Figure: pseudocode for daily DGA]
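The domain-flux loop (generate a list, iterate, stop on the first valid response) can be sketched as follows. This is not Torpig's actual DGA, which is not reproduced in these slides. The hash-based generator, the seed, the TLD list, and the `is_valid_response` callback are all hypothetical stand-ins. The point is only that the output depends on nothing but the date, so anyone who knows the algorithm can compute upcoming domains and register them first.

```python
import hashlib
from datetime import date
from typing import Callable, Iterable, List, Optional

# Hypothetical TLD list, chosen for illustration only.
TLDS = ("com", "net", "biz")


def candidate_domains(day: date, seed: str = "example-seed") -> List[str]:
    """Derive a deterministic list of candidate C&C domains for one day.

    NOT the real algorithm: a stand-in that hashes the date with a made-up
    seed to show that the result depends only on predictable inputs.
    """
    digest = hashlib.sha256(f"{seed}:{day.isoformat()}".encode()).hexdigest()
    # Map the first ten hex digits to letters a..p to form a hostname label.
    label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:10])
    return [f"{label}.{tld}" for tld in TLDS]


def find_cc(domains: Iterable[str],
            is_valid_response: Callable[[str], bool]) -> Optional[str]:
    """Walk the list in order and stop at the first domain that answers validly."""
    for domain in domains:
        if is_valid_response(domain):
            return domain
    return None


# Usage: pretend someone registered the second candidate for a given day.
todays = candidate_domains(date(2009, 1, 25))
registered = {todays[1]}
print(find_cc(todays, lambda d: d in registered))
```

Because the bot accepts whichever candidate answers first, an unregistered domain early in the list is enough for a third party to receive the traffic.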
Hijacking Torpig (cont’d)
- Conceptually simple with DGA, protocol, botmaster carelessness
  - Register domains first
  - Mimic protocol (encryption easily broken)
- Not a general approach
  - Conficker: 50,000 domains per day
  - Nondeterministic
  - Estimated cost: > $91.3m per year
- In practice:
  - Two different hosting providers, two different registrars
    - Redundancy
  - Apache handled requests
  - Collected data was downloaded, then deleted from the hosts
  - Total: 8.7 GB Apache logs, 69 GB pcap
  - Up for three weeks; collected data for ten days
Hijacking Torpig (cont’d)
- Legal/ethical implications
  - Botnet is a criminal instrument
  - Precedent in past research
- Follow-through:
  - No new config (okn only)
  - Shared data with DoD, FBI, ISPs
Torpig Data Format
- Communication via HTTP POST
  - URL: bot ID (nid), header
  - Body: stolen data
- Header info:
  - ts
  - ip
  - hport, sport
  - os, cn
  - bld, ver
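The header fields listed above can be modeled as a small record type. The field names come from the slide. Everything else here is an assumption for illustration: the query-string layout, the field types, the comments on what each field means, and the example values. The real submissions were obfuscated, and that layer is not shown.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs


@dataclass(frozen=True)
class SubmissionHeader:
    # Field meanings below are interpretations, not stated on the slide.
    nid: str    # bot identifier
    ts: int     # timestamp field
    ip: str     # IP address reported by the bot
    hport: int  # HTTP proxy port
    sport: int  # SOCKS proxy port
    os: str     # operating system version
    cn: str     # locale or country code
    bld: str    # build identifier
    ver: str    # version


def parse_header(query: str) -> SubmissionHeader:
    """Parse a hypothetical 'key=value&...' header into a typed record."""
    fields = {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
    return SubmissionHeader(
        nid=fields["nid"],
        ts=int(fields["ts"]),
        ip=fields["ip"],
        hport=int(fields["hport"]),
        sport=int(fields["sport"]),
        os=fields["os"],
        cn=fields["cn"],
        bld=fields["bld"],
        ver=fields["ver"],
    )


# Made-up example values; 192.0.2.10 is a documentation-only address.
example = parse_header(
    "nid=0123ABCD&ts=0&ip=192.0.2.10&hport=8080&sport=1080"
    "&os=5.1.2600&cn=US&bld=dxtrbc&ver=1"
)
print(example)
```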
Torpig Data Collected
Analysis: Botnet Size
- nid may be used to count bots
  - Computed from HDD model/serial
  - Not completely unique: couple with os, cn, bld, ver
  - Subtract researchers, probes, casual machines
  - Found 182,800 likely infected hosts
- Identifying researchers
  - Intuition: analyze in controlled environment
    - Use virtual machine
  - VMs have default hardware specs (HDD model/serial)
  - Eliminate nids computed from VM defaults
  - Discounted 40 hosts
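The counting rule above (combine nid with os, cn, bld, ver, then drop nids derived from VM defaults) amounts to counting distinct tuples after a filter. The sketch below assumes submissions have already been reduced to those five fields. The `Fingerprint` type, the `estimate_bots` name, and the sample values are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class Fingerprint(NamedTuple):
    nid: str
    os: str
    cn: str
    bld: str
    ver: str


def estimate_bots(submissions: Iterable[Fingerprint], vm_nids: Set[str]) -> int:
    """Count distinct (nid, os, cn, bld, ver) tuples, skipping known VM nids."""
    return len({fp for fp in submissions if fp.nid not in vm_nids})


# Made-up sample: two hosts share a nid but differ in locale, one host
# reports twice, and one nid matches a virtual machine's default disk.
sample = [
    Fingerprint("n1", "xp", "US", "b1", "1"),
    Fingerprint("n1", "xp", "DE", "b1", "1"),
    Fingerprint("n2", "xp", "IT", "b2", "1"),
    Fingerprint("n2", "xp", "IT", "b2", "1"),
    Fingerprint("vm-default", "xp", "US", "b1", "1"),
]
print(estimate_bots(sample, vm_nids={"vm-default"}))
```

Counting on nid alone would merge the first two hosts into one, which is why the extra fields are coupled with it.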
Analysis: Botnet Size (cont’d)
- Much more accurate than IP counting
  - DHCP churn causes overcount
    - 706 machines: > 100 IPs
    - One host: 694 unique IPs
  - NAT causes undercount
  - 1,247,642 unique IPs vs. 182,800 est. bots
- Traffic characteristics
  - Peaks at 9am PST, troughs at 9pm PST
  - Within an hour: unique IPs = unique bots
  - Within a day: unique IPs > unique bots
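A toy log shows why IP counts and bot counts diverge. The record layout `(hour, bot id, source IP)` and all sample values are assumptions for illustration; the addresses are from documentation-only ranges. One bot changes address over the day (DHCP churn), while two bots share one address (NAT).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[int, str, str]  # (hour index, bot id, source IP)


def unique_counts(records: Iterable[Record]) -> Tuple[int, int]:
    """Return (number of unique IPs, number of unique bots)."""
    ips, bots = set(), set()
    for _, bot, ip in records:
        ips.add(ip)
        bots.add(bot)
    return len(ips), len(bots)


def ips_per_bot(records: Iterable[Record]) -> Dict[str, int]:
    """Count how many distinct IPs each bot was seen from."""
    seen = defaultdict(set)
    for _, bot, ip in records:
        seen[bot].add(ip)
    return {bot: len(ips) for bot, ips in seen.items()}


log = [
    (0, "A", "198.51.100.1"),  # bot A, first DHCP lease
    (8, "A", "198.51.100.2"),  # bot A, second lease
    (16, "A", "198.51.100.3"), # bot A, third lease
    (0, "B", "203.0.113.9"),   # bots B and C sit behind the same NAT
    (0, "C", "203.0.113.9"),
]

print(unique_counts(log))                             # whole day
print(unique_counts([r for r in log if r[0] == 0]))   # first hour only
print(ips_per_bot(log))
```

Over the whole day the toy log has more IPs than bots (churn dominates). The slide's totals show the same effect at scale: 1,247,642 unique IPs against an estimated 182,800 bots is roughly 6.8 IPs per bot.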
Analysis: Botnet Growth
- Most bots in U.S., Germany, Italy
  - Intuition: targeted websites mainly English, German, Italian
  - IP counting overestimates Italian/German infections
- Found 49,294 new infections
  - Most on Jan 25, 27
  - How? ts = 0
Analysis: Botnet as Service
- Why bld?
  - Twelve different values
  - Some values more active than others
    - dxtrbc: 5,432,528 submissions
    - mentat: 1,582,547 submissions
  - Features do not seem to differ from build to build
- Explanation: customers
  - Treat bld as identifier for customers
  - Can process output on basis of customer payment, wants
- Q: The paper doesn’t mention the distribution of builds over members. Could build activity be attributable to that?
Analysis: Stolen Data
- Institutional data
  - 8,310 accounts, 410 institutions
    - PayPal (1,770)
    - Poste Italiane (765)
    - Capital One (314)
    - E*Trade (304)
    - Chase (214)
  - 310 institutions: < 10 accounts
  - Notifying victims: complicated
- 38% of credentials stolen from password managers
Analysis: Stolen Data (cont’d)
- Credit cards
  - Checked prefixes, used Luhn heuristic
  - Found 1,660 unique debit/credit card numbers
    - 1,056 Visa
    - 447 MasterCard
    - 81 American Express
    - 36 Maestro
    - 24 Discover
  - 49% in U.S., 12% in Italy, 8% in Spain, rest in 40 other countries
  - 86% of cases: only one card number
  - One case: 30 numbers
- Value (via Symantec):
  - $0.10 to $25 per card
  - $10 to $1000 per account
  - $83k to $8.3m over ten days: profitable
  - Assumes all data is fresh
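The Luhn heuristic mentioned above is a standard checksum. A minimal version is sketched below, together with the arithmetic behind the value range. The function names are hypothetical. That the $83k to $8.3m figure combines the 1,660 cards and 8,310 accounts with the quoted per-item prices is an assumption, although the numbers are consistent with it.

```python
from typing import Tuple


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def value_range(cards: int, accounts: int,
                card_price: Tuple[float, float] = (0.10, 25.0),
                account_price: Tuple[float, float] = (10.0, 1000.0)) -> Tuple[float, float]:
    """Low and high estimates in dollars for the given counts and price ranges."""
    low = cards * card_price[0] + accounts * account_price[0]
    high = cards * card_price[1] + accounts * account_price[1]
    return low, high


print(luhn_valid("4111 1111 1111 1111"))  # a widely used test number
print(value_range(cards=1660, accounts=8310))
```

The checksum only filters out strings that cannot be card numbers; it says nothing about whether a card is active, which is one reason the slide notes that the value estimate assumes all data is fresh.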
Analysis: Proxies and Other Uses
- HTTP/SOCKS proxies
  - 20.2% of machines publicly accessible
  - Looked at 10,000 most active IPs
    - Most likely to be used
  - Checked IPs against Spamhaus list
    - One is a known spammer
    - 244 flagged as proxies or malware-infected
  - Conclusion: usable, but can’t claim current use
- Distributed denial-of-service (DDoS)
  - Question: how much bandwidth?
  - Looked up connection types for IPs via ip2location
  - 65% of analyzable IPs used cable/DSL
    - Low baseline of 435 kbps upstream: 19 Gbps total
  - Add in corporate connections (22%): much higher
  - Caveat: could not look up for two-thirds of hosts
Analysis: Passwords
- Sophos poll (March 2009): 33% of Internet users use poor password practices (n = 676)
- Torpig supplied a lot of passwords: we can validate
  - 297,262 user/password pairs from 52,540 machines
  - 28% reused passwords for 368,501 sites, similar to Sophos
- Password strength
  - Fed 173,686 unique passwords to John the Ripper
    - 65 minutes: ~56,000 cracked (simple replacement)
    - +10 minutes: ~14,000 cracked (wordlist)
    - +24 hours: ~30,000 cracked (brute force)
  - 40% cracked in < 75 mins
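The "40% cracked in < 75 mins" figure follows from the stage counts on this slide. The short calculation below accumulates time and cracked passwords per stage. The counts are the approximate values from the slide, and the `cumulative` helper is a hypothetical name.

```python
from typing import List, Tuple

TOTAL_UNIQUE = 173_686  # unique passwords fed to the cracker

# (stage, duration in minutes, approximate passwords cracked in that stage)
STAGES = [
    ("simple replacement", 65, 56_000),
    ("wordlist", 10, 14_000),
    ("brute force", 24 * 60, 30_000),
]


def cumulative(stages: List[Tuple[str, int, int]],
               total: int) -> List[Tuple[str, int, int, float]]:
    """Return (stage, elapsed minutes, cracked so far, fraction of total)."""
    rows = []
    minutes = 0
    cracked = 0
    for name, duration, count in stages:
        minutes += duration
        cracked += count
        rows.append((name, minutes, cracked, cracked / total))
    return rows


for name, minutes, cracked, fraction in cumulative(STAGES, TOTAL_UNIQUE):
    print(f"{name}: {minutes} min elapsed, {cracked} cracked ({fraction:.1%})")
```

After the first two stages, 70,000 of 173,686 passwords (about 40%) are recovered in 75 minutes. Adding the brute-force day brings the total to roughly 100,000, or about 58%.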
Conclusion
- Contributions:
  - Comprehensive analysis of Torpig
  - Insight into victims
  - Usability of botnets for fun, profit, attack
- Lessons:
  - IP counting is wildly imprecise; do not use it
  - User culture is a big problem
    - Lots of passwords were guessed easily in this sample
    - Intuition: users do not understand usage risks
    - Solution: educate, educate, educate
  - Coordination with registrars, hosting facilities, victim institutions, law enforcement is hard
    - Makes redressing victims difficult
    - Solution: regulatory intervention