Web Security CS 236 Advanced Computer Security Peter Reiher April 22, 2008
Groups for This Week Golita Benoodi, Zhen Huang, Ioannis Pefkianakis Darrell Carbajal, Vishwa Goudar, Jason Liu Andrew Castner, Michael Hall, Kuo-Yen Lo Chia-Wei Chang, Nikolay Laptev, Hootan Nikbakht Chien-Chia Chen, Chen-Kuei Lee, Faraz Zahabian Yu Yuan Chen, Chieh-Ning Lien, Min-Hsieh Tsai Michael Cohen, Jih-Chung Fan, Abhishek Jain Peter Wu, Peter Peterson
Outline Security on the web Some approaches An idea to critique
Web Security Lots of Internet traffic is related to the web Much of it is financial in nature Also lots of private information flow around web applications An obvious target for attackers
The Web Security Problem Many users interact with many servers Most parties have little other relationship Increasingly complex things are moved via the web No central authority Many developers with little security experience Many critical elements originally designed with no thought to security Sort of a microcosm of the overall security problem
Aspects of the Web Problem
Who Are We Protecting? The clients From the server From the client From each other
What Are We Protecting? The client’s private data The server’s private data The integrity (maybe secrecy) of their transactions The client and server’s machines Possibly server availability For particular clients?
Some Real Threats Buffer overflows and other compromises Client attacks server SQL injection Malicious downloaded code Server attacks client
More Threats Cross-site scripting Clients attack each other Threats based on non-transactional nature of communication Client attacks server Denial of service attacks Threats on server availability (usually)
Compromise Threats Much the same as for any other network application Web server might have buffer overflow Or other remotely usable flaw Not different in character from any other application’s problem
What Makes It Worse Web servers are complex They often also run supporting code Which is often user-visible Large, complex code base is likely to contain such flaws Nature of application demands allowing remote use
Solution Approaches Patching Use good code base Minimize code server executes Maybe restrict server access When that makes sense Lots of testing and evaluation
SQL Injection Attacks Many web servers have backing databases Much of their information stored in database Web pages are built (in part) based on queries to database Possibly using some client input . . .
SQL Injection Mechanics Server plans to build a SQL query Needs some data from client to build it E.g., client’s user name Server asks client for data Client, instead, provides a SQL fragment Server inserts it into planned query Leading to a “somewhat different” query
An Example Intent is that user fills in his ID and password “select * from mysql.user where username = ‘ “ . $uid . “ ‘ and password=password(‘ “. $pwd “ ‘);” Intent is that user fills in his ID and password What if he fills in something else? ‘or 1=1; -- ‘
What Happens Then? $uid has the string substituted, yielding “select * from mysql.user where username = ‘ ‘ or 1=1; -- ‘ ‘ and password=password(‘ “. $pwd “ ‘);” This evaluates to true Since 1 does indeed equal 1 And -- comments out rest of line If script uses truth of statement to determine valid login, attacker has logged in
Basis of SQL Injection Problem Unvalidated input Server expected plain data Got back SQL commands Didn’t recognize the difference and went ahead Resulting in arbitrary SQL query being sent to its database With its privileges
Solution Approaches Carefully examine all input To filter out injected SQL Use database access controls Of limited value Randomization of SQL keywords Making injected SQL meaningless
Malicious Downloaded Code Modern web relies heavily on downloaded code Full language and scripting language Instructions downloaded from server to client Run by client on his machine Using his privileges Without defense, script could do anything
Types of Downloaded Code Java Full programming language Scripting languages Java Script VB Script ECMAScript XSLT
Solution Approaches Disable scripts Not very popular Use secure scripting languages Also not popular Particularly with code writers Isolation mechanisms VM or application-based Vista mandatory access control
Cross-Site Scripting XSS Many sites allow users to upload information Blogs, photo sharing, Facebook, etc. Which gets permanently stored And displayed Attack based on uploading a script Other users inadvertently download it And run it . . .
The Effect of XSS Arbitrary malicious script executes on user’s machine In context of his web browser At best, runs with privileges of the site storing the script Often likely to run at full user privileges
Why Is XSS Common? Use of scripting languages widespread For legitimate purposes Most users leave them enabled in browser Only a question of getting user to run your script Often only requires fetching URL
Typical Effects of XSS Attack Most commonly used to steal personal information That is available to legit web site User IDs, passwords, credit card numbers, etc. Such information often stored in cookies at client side
Solution Approaches Don’t allow uploading of scripts Usually by carefully analyzing uploaded data Provide some form of protection in browser
Exploiting Statelessness HTTP is designed to be stateless But many useful web interactions are stateful Various tricks used to achieve statefulness Usually requiring programmers to provide the state Often trying to minimize work for the server
A Simple Example Web sites are set up as graphs of links You start at some predefined point A top level page, e.g. And you traverse links to get to other pages But HTTP doesn’t “keep track” of where you’ve been Each request is simply the name of a link
Why Is That a Problem? What if there are unlinked pages on the server? Should a user be able to reach those merely by naming them? Is that what the site designers intended?
A Concrete Example The ApplyYourself system Used by colleges to handle student applications For example, by Harvard Business School in 2005 Once admissions decisions made, results available to students
What Went Wrong? Pages representing results were created as decisions were made Stored on the web server But not linked to anything, since results not yet released Some appliers figured out how to craft URLs to access their pages Finding out early if they were admitted
Another Example A e-commerce site that uses a shopping cart paradigm Users collect things they plan to buy in a shopping cart HTTP won’t help maintain state So Cartman software stored shopping cart state in a cookie Which wasn’t encrypted . . .
What Went Wrong? The cookie held the names of the products And the offered prices Cookies were kept by users Presented on each web request Users could alter cookies to set their own prices
Another Example Airlines offer on-line choice of seats to users buying tickets Different seats offered to different people Depending on price, frequent flyer status, etc. Users consult map to choose seat Choice represented in HTTP message
What Went Wrong? Since HTTP is stateless, no memory of seat map that was sent to a user If user crafts URL asking for seat he wasn’t offered, System doesn’t know it wasn’t offered Allowed users to request seats the airline didn’t want to offer to them
The Common Thread No protocol memory of what came before So no protocol way to determine that response matches request Could be built into the application that handles requests But frequently isn’t Or is wrong
Solution Approaches Get better programmers Or better programming tools Back end system that maintains and compares state Front end program that observes requests and responses Producing state as a result
A Proposed Version of the Latter A system sitting between web server and the users Observes all requests going to server And responses to users Builds stateful representation of communication Catches attempts to deceive based on underlying statelessness
Seats 12A, 15F, and 28B available Conceptually, REJECTED! Request list of seats Seats 12A, 15F, and 28B available We didn’t offer 5C! I choose seat 5C
How Do We Do This? Keep track of state of communications between client and server Match requests and responses Understand nature of server messages well enough to analyze client responses Figure out what’s OK and what’s not
Other Things You Can Do This Way Use static analysis of server content to find unlinked pages Reject requests to get them Analyze client side scripts To deduce plausible, acceptable results
Why Do it This Way? Works with unaltered web server applications Of arbitrary character Provides second line of defense if apps poorly written Handles issues of incautious server maintenance
Challenges to the Approach How well can you understand the flow? Matching requests and responses properly Special cases like bookmarks Analyzing general scripts Isn’t that solving the halting problem?
Another Issue for Web Security How could we design more secure web activity? Without altering the general paradigm Without requiring users to learn anything Without assuming programmers get better
Are There Other Approaches? What else can we do? Should we help web programmers? Should we help clients? Should we build standalone appliances? Should we build tools that analyze and correct servers?
Is the Core Approach Wrong? Is statelessness just a bad security idea? Or is it the fault of the design of web servers? Or is it the fault of design of operating systems? E.g., should we be able to attach/enforce policies to files? Or the fault of ivory tower professors who don’t teach students how to code safely?
A Philosophical Question Ultimately, how stupid can various players in the Internet be, given that the Internet must be secured to a chosen degree? How stupid can users be? How stupid can programmers be? How stupid can administrators be? How much can a few smart people/programs/systems do to cover for the stupid?