M.P. Johnson, DBMS, Stern/NYU, Sp2004
C : Database Management Systems
Lecture #23
Matthew P. Johnson
Stern School of Business, NYU
Spring, 2004

Agenda
- Previously: Security
- Next:
  - Finish Security
  - XML
  - Hardware
- Project part 5 is up
  - >1 multi-table query
  - Cite (in app) any sources of data!
- Returning project parts 3, 4
- 1-minute responses

Review: Why security is hard
- It’s a “negative deliverable”
- It’s an asymmetric threat
- Tolstoy: “Happy families are all alike; every unhappy family is unhappy in its own way.”
- Analogs: “homeland”, jails, debugging, proofreading, Popperian science, fishing, MC algs
- So: fix biggest problems first

Injection attacks – MySQL/Perl/PHP
- Consider another input:
    user: your-boss
    pass: ' OR 1=1 OR pass = '
- Query template:
    SELECT * FROM users WHERE user = u AND pass = p;
- After substitution:
    SELECT * FROM users WHERE user = 'your-boss' AND pass = '' OR 1=1 OR pass = '';
- AND binds tighter than OR, so the 1=1 disjunct makes the WHERE clause true for every row

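The substitution on this slide can be reproduced in a few lines. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the helper name `naive_login_query` and the table rows are made up for the example.

```python
import sqlite3

def naive_login_query(u: str, p: str) -> str:
    # Vulnerable on purpose: user input is pasted straight into the SQL text.
    return f"SELECT * FROM users WHERE user = '{u}' AND pass = '{p}';"

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE users (user TEXT, pass TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("your-boss", "s3cret"), ("intern", "hunter2")],  # hypothetical rows
)

attack_query = naive_login_query("your-boss", "' OR 1=1 OR pass = '")
print(attack_query)

# AND binds tighter than OR, so "OR 1=1" makes the condition true for all rows.
rows = conn.execute(attack_query).fetchall()
print(len(rows), "row(s) matched without knowing any password")
```

Every row in the table comes back, which is why a login check that only tests "did the query return something?" lets the attacker in.
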
Multi-command injection attacks
- Consider another input:
    user: '; DROP TABLE users; SELECT * FROM users WHERE pass = '
    pass: abc
- Query template:
    SELECT * FROM users WHERE user = u AND pass = p;
- After substitution:
    SELECT * FROM users WHERE user = ''; DROP TABLE users; SELECT * FROM users WHERE pass = '' AND pass = 'abc';

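A rough sketch of the multi-command case follows, again with sqlite3 standing in for the real DBMS. Here `executescript()` plays the role of a driver that accepts several semicolon-separated statements in one call (many drivers do not), and the table row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user TEXT, pass TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'abc')")  # hypothetical row
conn.commit()

u = "'; DROP TABLE users; SELECT * FROM users WHERE pass = '"
p = "abc"
# Naive substitution into the query template from the slide.
script = f"SELECT * FROM users WHERE user = '{u}' AND pass = '{p}';"
print(script)

try:
    # Assumption: the driver runs every statement in the string, in order.
    conn.executescript(script)
except sqlite3.Error as exc:
    # The trailing SELECT fails, but only after DROP TABLE has already run.
    print("error after the damage was done:", exc)

tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print("remaining tables:", tables)
```

The error raised by the last statement does not undo the DROP TABLE that ran before it.
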
Multi-command injection attacks
- Consider another input:
    user: '; SHUTDOWN WITH NOWAIT; SELECT * FROM users WHERE pass = '
    pass: abc
- Query template:
    SELECT * FROM users WHERE user = u AND pass = p;
- After substitution:
    SELECT * FROM users WHERE user = ''; SHUTDOWN WITH NOWAIT; SELECT * FROM users WHERE pass = '' AND pass = 'abc';

Injection attacks – MySQL/Perl/PHP
- Consider another input:
    user: ' OR 1=1 OR user = '   (corrected!)
    pass: ' OR 1=1 OR user = '
- Delete everyone!
- Query template:
    DELETE FROM users WHERE user = u AND pass = p;
- After substitution:
    DELETE FROM users WHERE user = '' OR 1=1 OR user = '' AND pass = '' OR 1=1 OR user = '';

Preventing injection attacks
- Source of problem (in SQL case): use of quotes
- Soln 1: don’t allow quotes!
  - Reject any entered data containing single quotes
  - Q: Is this satisfactory?
  - Does Amazon need to sell O’Reilly books?
- Soln 2: escape any single quotes
  - Replace any ' with '' or \'
  - In PHP, turn on the magic_quotes_gpc flag in .htaccess (deprecated in PHP 5.3, removed in PHP 5.4)
  - show both versions

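Soln 2 can be sketched as a one-line string replacement. The helper names below are hypothetical, and the sketch covers only the quote-doubling variant ('' rather than \').

```python
def escape_single_quotes(value: str) -> str:
    # SQL-standard escaping: a literal ' inside a string is written as ''.
    return value.replace("'", "''")

def quoted_literal(value: str) -> str:
    # Wrap the escaped value in the quotes that delimit a SQL string literal.
    return "'" + escape_single_quotes(value) + "'"

safe_name = quoted_literal("O'Reilly")
safe_attack = quoted_literal("' OR 1=1 OR pass = '")
print(safe_name)
print(safe_attack)
```

With the quotes doubled, the attack string stays inside one string literal instead of closing it early. Quote doubling alone does not cover every DBMS: MySQL, for instance, also treats backslash as an escape character by default, which is one reason to prefer parameterized queries.
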
Preventing injection attacks
- When to do security checking for quotes, etc.?
- Natural choice: in client-side data validation
- But not enough!
  - As we saw: can still manually submit GET and POST
- Must do security checking on server

Preventing injection attacks
- Soln 3: use prepared/parameterized queries
  - Supported in JDBC, Perl DBI, PHP ext/mysqli
- Very dangerous: using tainted data to run commands at the Unix command prompt
  - Semi-colons, prime char, etc.
- Safest: define set of legal chars, not illegal ones

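A minimal sketch of Soln 3 plus the "legal chars" advice, assuming Python's sqlite3 module in place of JDBC/DBI/mysqli. The function names and the username policy are hypothetical, and the plaintext password column is kept only to match the earlier slides.

```python
import re
import sqlite3

# Hypothetical allow-list policy: letters, digits, underscore, hyphen.
LEGAL_USERNAME = re.compile(r"[A-Za-z0-9_-]{1,32}")

def is_legal_username(u: str) -> bool:
    # Accept only if the whole string consists of legal characters.
    return LEGAL_USERNAME.fullmatch(u) is not None

def check_login(conn: sqlite3.Connection, u: str, p: str) -> bool:
    if not is_legal_username(u):
        return False
    # The ? placeholders keep u and p as data; they are never parsed as SQL.
    cur = conn.execute(
        "SELECT 1 FROM users WHERE user = ? AND pass = ?", (u, p)
    )
    return cur.fetchone() is not None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user TEXT, pass TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("your-boss", "s3cret"))

ok = check_login(conn, "your-boss", "s3cret")
injected = check_login(conn, "your-boss", "' OR 1=1 OR pass = '")
dropped = check_login(conn, "'; DROP TABLE users; --", "abc")
print(ok, injected, dropped)
```

The injected password is compared as an ordinary string and simply fails to match, and the malicious username is rejected before any query runs.
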
More Info
- phpGB MySQL Injection Vulnerability
- “How I hacked PacketStorm”

And now for something completely different: XML
- XML: eXtensible Mark-up Language
- Very popular language for semi-structured data
- Mark-up language: consists of elements composed of tags, like HTML
- Emerging lingua franca of the Internet, Web Services, inter-vendor comm

Unstructured data
- At one end of continuum: unstructured data
  - Text files
  - Stock market prices
  - CIA intelligence intercepts
  - Audio recordings
- “Just one damn bit after another” (Henry Ford)
- No (intentional, formal) patterns to the data
- Difficult to manage/make sense of
  - Why we need data-mining

Structured data
- At the other end: structured data
  - Tables in RDBMSs
- Data organized into semantic chunks: entities
- Similar/related entities grouped together
  - Relationships, classes
- Entities in same group have same structure
  - Same fields/attributes/properties
- Easy to make sense of
- But sometimes too rigid a req.
- Difficult to send: convert to tab-delimited

Semi-structured data
- Not too random
  - Data organized into entities
  - Similar/related grouped to form other entities
- Not too structured
  - Some attributes may be missing
  - Size of attributes may vary
  - Support of lists/sets
- Juuust Right
  - Data is self-describing

Semi-structured data
- Predominant examples:
  - HTML: HyperText Mark-up Language
  - XML: eXtensible Mark-up Language
- NB: both mark-up languages (use tags)
- Mark-up lends itself to semi-structured data
  - Demarcate boundaries for entities
  - But freely allow other entities inside

Data model for semi-structured data
- Usually represented as directed graphs
- Graph: set of vertices (nodes) and edges
  - Dots connected by lines; not nec. a tree!
- In model,
  - Nodes ~ entities or fields/attributes
  - Edges ~ attribute-of/sub-entity-of
- Example: publisher publishes >=0 books
  - Each book has one title, one year, >=1 authors
- Draw publishers graph

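One way to picture the publisher graph is as a list of labeled edges. This is a sketch only: the node ids (`pub1`, `b1`, `a1`, ...) and all values are invented, and an edge list is just one of several reasonable encodings.

```python
from collections import defaultdict

# Edges are (parent, label, child); node ids and values are made up.
edges = [
    ("pub1", "name", "Example Press"),
    ("pub1", "book", "b1"),
    ("pub1", "book", "b2"),
    ("b1", "title", "First Book"),
    ("b1", "year", "2003"),
    ("b1", "author", "a1"),
    ("b2", "title", "Second Book"),
    ("b2", "year", "2004"),
    ("b2", "author", "a1"),  # a1 has two incoming edges: not a tree
    ("b2", "author", "a2"),
    ("a1", "lastname", "Doe"),
    ("a2", "lastname", "Roe"),
]

children = defaultdict(list)
for parent, label, child in edges:
    children[parent].append((label, child))

def values(node, label):
    """Return the targets of all edges leaving `node` with the given label."""
    return [child for lbl, child in children[node] if lbl == label]

books = values("pub1", "book")
b2_authors = values("b2", "author")
incoming_a1 = sum(1 for _, _, child in edges if child == "a1")
print(books, b2_authors, incoming_a1)
```

Because author `a1` is shared by both books, the structure is a graph rather than a tree, which is the point of "not nec. a tree!".
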
XML is an SSD language
- Standard published by W3C
  - Officially announced/recommended in 1998
- XML != HTML
- XML != a replacement for HTML
- Both are mark-up languages
- Big diffs:
  1. XML doesn’t use predefined tags (!)
     - But it’s extensible: tags can be added
  2. HTML is about presentation; XML is about content

XML syntax
- Like HTML in many respects but more strict
- All tags must be closed
  - Can’t have an unclosed start tag, e.g.: <p>this is a line
  - Every start tag has an end tag
  - Although empty-element style (<tag/>) can replace both
- IS case-sensitive
- IS space-sensitive
- XML doc has a unique root element

XML syntax
- Tags must be properly nested
  - Not allowed (overlapping tags), e.g.: <b><i>I’m not kidding</b></i>
  - Intuition: file folders
- Elements may have quoted attributes
- Comments same as in HTML: <!-- ... -->
- Draw publishers XML

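The syntax rules from the two XML syntax slides can be checked with any XML parser. The sketch below uses Python's xml.etree.ElementTree; the sample documents and the helper name `is_well_formed` are invented for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    # A conforming parser must reject anything that breaks the syntax rules.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "closed and nested": "<doc><b><i>ok</i></b></doc>",
    "unclosed tag": "<doc><p>this is a line</doc>",
    "overlapping tags": "<doc><b><i>oops</b></i></doc>",
    "case mismatch": "<doc><Title>x</title></doc>",
    "empty-element style": "<doc><br/></doc>",
    "two roots": "<a/><b/>",
}
results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

Only the first and fifth samples parse; an HTML browser would typically tolerate most of the others.
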
Escape chars in XML
- Some chars must be escaped
  - Distinguish content from syntax
      <   &lt;
      >   &gt;
      &   &amp;
      "   &quot;
      '   &apos;
- Can also declare value to be pure text (CDATA):
      <![CDATA[jsdljsd <>>]]>
- Examples:
      3 &lt; 5
      &quot;Don&apos;t call me &apos;Shirley&apos;!&quot;

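As an illustration, Python's standard library can apply these escapes. `xml.sax.saxutils.escape` handles &, < and > by default; the `QUOTES` mapping below is an extra table supplied by this sketch to cover the two quote characters.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

# Extra entities for quotes; escape() only covers &, < and > on its own.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

comparison = escape("3 < 5")
quote = "\"Don't call me 'Shirley'!\""
escaped_quote = escape(quote, QUOTES)
print(comparison)
print(escaped_quote)

# Unescaping restores the original text.
round_trip = unescape(escaped_quote, UNQUOTES)

# A CDATA section is read back as plain character data, markup chars and all.
note = ET.fromstring("<note><![CDATA[jsdljsd <>>]]></note>")
print(note.text)
```
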
XML Namespaces
- Different schemas/DTDs may overlap
  - XHTML and MathML share some tags
- Soln: namespaces
  - as in Java/C++/C#
- Example (tags lost in extraction): … 15 …

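A small sketch of how namespaces keep same-named tags apart, using xml.etree.ElementTree. The prefixes, the `urn:example:...` URIs, and the element names are all invented; only the value 15 echoes the slide.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define <table>; the URIs are made up.
doc = """
<order xmlns:furn="urn:example:furniture" xmlns:html="urn:example:markup">
  <furn:table><furn:height>15</furn:height></furn:table>
  <html:table><html:tr/></html:table>
</order>
"""
root = ET.fromstring(doc)
ns = {"furn": "urn:example:furniture", "html": "urn:example:markup"}

height = root.find("furn:table/furn:height", ns)
furniture_tables = root.findall("furn:table", ns)
markup_tables = root.findall("html:table", ns)

# ElementTree stores the expanded name as {namespace-uri}localname.
print(height.tag, height.text)
print(len(furniture_tables), len(markup_tables))
```

The parser treats the two `table` elements as different names because their namespace URIs differ, much as two classes named the same can live in different Java packages.
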
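From Relational Data to XML Data
- Relational table persons:
    name   phone
    John   3634
    Sue    6343
    Dick   6363
- XML: tree with root persons; one row element per tuple, each with a name and a phone
    <persons>
      <row><name>John</name><phone>3634</phone></row>
      <row><name>Sue</name><phone>6343</phone></row>
      <row><name>Dick</name><phone>6363</phone></row>
    </persons>
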
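The table-to-tree mapping is mechanical, as the following sketch suggests. It uses xml.etree.ElementTree and the three tuples from the slide; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]

# One <row> per tuple, one child element per column.
persons = ET.Element("persons")
for name, phone in rows:
    row = ET.SubElement(persons, "row")
    ET.SubElement(row, "name").text = name
    ET.SubElement(row, "phone").text = phone

xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```
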
Semi-structured Data Explained
- List-valued attributes
- XML is not 1NF!
- Example: Mary has two phones!
- Impossible in (single) tables:
    name   phone
    Mary   ???

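To make the list-valued case concrete, here is a hedged sketch in which Mary's row simply carries two phone elements. The phone numbers for Mary are invented, since the slide's values did not survive.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<persons>"
    "<row><name>Mary</name><phone>2345</phone><phone>3456</phone></row>"
    "<row><name>John</name><phone>3634</phone></row>"
    "</persons>"
)

# In XML an element may repeat, so "phone" is naturally list-valued.
phones = {
    row.findtext("name"): [p.text for p in row.findall("phone")]
    for row in doc
}

# A 1NF table needs one (name, phone) tuple per phone instead.
flat = [(name, phone) for name, plist in phones.items() for phone in plist]
print(phones)
print(flat)
```
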
Object ids and References
- SSD graphs might not be trees!
  - But XML docs must be
  - Would cause much redundancy
- Soln: same concept as pointers in C/C++/Java
  - Object ids and references
- Graph example:
  - Movies: Lost in Translation, Hamlet
  - Stars: Bill Murray, Scarlett Johansson
- XML example (tags lost in extraction): movie Lost in Translation (2003), movie Hamlet (1999), star Bill Murray

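A possible encoding of the movie graph with ids and references is sketched below. The attribute names `id` and `stars`, the id values, and the assignment of stars to movies are assumptions made for the example; the original markup is not recoverable.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<db>
  <movie id="m1" stars="s1 s2"><title>Lost in Translation</title><year>2003</year></movie>
  <movie id="m2" stars="s1"><title>Hamlet</title><year>1999</year></movie>
  <star id="s1"><name>Bill Murray</name></star>
  <star id="s2"><name>Scarlett Johansson</name></star>
</db>
""")

# Index every element that carries an id, like a table of pointers.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id") is not None}

def stars_of(movie):
    # Follow each reference in the space-separated "stars" attribute.
    return [by_id[ref].findtext("name") for ref in movie.get("stars", "").split()]

cast = {m.findtext("title"): stars_of(m) for m in doc.findall("movie")}
print(cast)
```

Each star is stored once and referenced from several movies, so the document stays a tree while still describing a graph.
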
What do we do with XML?
- Things done with XML:
  - Send to partners
  - Parse XML received
  - Convert to RDBMS rows
  - Query for particular data
  - Convert to other XML
  - Convert to formats other than XML
- Lots of tools/standards for these…

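Three of the tasks listed (parse, convert to rows, query) fit in a short sketch. It assumes Python's xml.etree.ElementTree and sqlite3 rather than any particular tool from the lecture, and the incoming document is a made-up persons list.

```python
import sqlite3
import xml.etree.ElementTree as ET

received = (
    "<persons>"
    "<row><name>John</name><phone>3634</phone></row>"
    "<row><name>Sue</name><phone>6343</phone></row>"
    "<row><name>Dick</name><phone>6363</phone></row>"
    "</persons>"
)

# 1. Parse the XML received.
root = ET.fromstring(received)

# 2. Convert to RDBMS rows.
records = [(r.findtext("name"), r.findtext("phone")) for r in root.iter("row")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, phone TEXT)")
conn.executemany("INSERT INTO persons VALUES (?, ?)", records)

# 3a. Query the relational copy with SQL.
sql_phone = conn.execute(
    "SELECT phone FROM persons WHERE name = ?", ("Sue",)
).fetchone()[0]

# 3b. Or query the XML directly with ElementTree's limited XPath subset.
xpath_phone = root.findtext("row[name='Sue']/phone")
print(sql_phone, xpath_phone)
```
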
DTDs & understanding XML
- XML is extensible
  - Advantage: when creating, we can use any tags we like
  - Disadv: when reading, they can use any tags they like
- Using XML docs a priori is very difficult
- Solution: impose some constraints

DTDs
- DTD: Document Type Definition
- You and partners/vertical industry/academic discipline decide on a DTD/schema for your docs
  - Specify which entities you may use/must understand
  - Specify legal relationships
- DTD specifies the grammar to be used
  - DTD = set of rules for creating valid entities
- DTD tells your software what to look for in doc

DTD examples
- Well-formed XML v. valid XML
- Partial publisher example rules:
    Root      -> publisher
    Publisher -> name, book*, author*
    Book      -> title, date, author+
    Author    -> firstname, middlename?, lastname

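The publisher rules read like regular expressions over child element names, and that reading can be sketched directly. This is not a DTD validator and does not parse DTD syntax: the `RULES` table is a hand translation of the four rules above, and the function names and test documents are invented.

```python
import re
import xml.etree.ElementTree as ET

# Each rule is a regex over the sequence of child tag names,
# where every name is followed by one space.
RULES = {
    "publisher": r"name (book )*(author )*",
    "book": r"title date (author )+",
    "author": r"firstname (middlename )?lastname ",
}

def conforms(elem) -> bool:
    children = "".join(child.tag + " " for child in elem)
    rule = RULES.get(elem.tag)
    if rule is not None and re.fullmatch(rule, children) is None:
        return False
    # Elements without a rule (name, title, ...) are treated as plain text.
    return all(conforms(child) for child in elem)

def is_valid(text: str) -> bool:
    root = ET.fromstring(text)  # raises ParseError if not well-formed
    return root.tag == "publisher" and conforms(root)

good = (
    "<publisher><name>P</name><book><title>T</title><date>2004</date>"
    "<author><firstname>A</firstname><lastname>B</lastname></author>"
    "</book></publisher>"
)
# Well-formed, but the book has no author, so it breaks "author+".
bad = "<publisher><name>P</name><book><title>T</title><date>2004</date></book></publisher>"
print(is_valid(good), is_valid(bad))
```

The second document shows the distinction on the slide: it parses without error (well-formed) yet fails the grammar (not valid).
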
Partial DTD example (typos!)
    <!DOCTYPE PUBLISHER [ …
- DTD is not XML, but can be embedded in or ref.ed from XML
- Replacement for DTDs is XML Schemas

XML Applications/dialects
- MathML: Mathematical Markup Language
  - ations/ictp99/ictp99N8059.html
- VoiceXML:
  - es/rps.xml
- ChemML: Chemical Markup Language
- XHTML: HTML retrofitted as an XML application

Next time
- Next: Hardware, etc.
- For next time: reading online
- Now: one-minute responses