Welcome to CIS 455 / 555 – Internet and Web Systems Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems January 13, 2010.


1 Welcome to CIS 455 / 555 – Internet and Web Systems
Zachary G. Ives, University of Pennsylvania
January 13, 2010

2 What this Course Is About
- How do we build services like Google, Akamai, iTunes, Facebook, eBay, …?
- What are the principles behind them? (This is NOT a course on building Web sites!)
- How do “cloud computing,” P2P, and Web services relate?
- The main themes of the course:
  - Distributed systems concepts, with emphasis on data, scalability and interoperability (including “the cloud”)
  - Data representation fundamentals, with emphasis on XML
  - Information retrieval concepts, including ranking and indexing
- It’s a course that involves building software using the principles learned, evaluating it, and programming in teams

3 How Does this Relate to Other CIS Courses?
CIS 330/550:
- Data representation and management
- Relational querying with SQL; XML querying with XQuery
- DBMS-backed web sites
- 455/555 focuses on data with respect to interoperability
CIS 350/573: software engineering and mashups
CIS 505: focuses on distributed systems and algorithms
- CIS 505 is less project-oriented than CIS 555
- CIS 555 covers Web services, cloud architectures in more detail

4 Some Things We’ll Look at
- What are the principles behind building systems that work on the Internet?
- How do these relate to many of today’s hot technologies?
  - Web servers, DHTML, Servlets, JSP, …
  - XML
  - Web services
  - Peer-to-peer
  - Application servers
  - Cloud computing environments
  - Content distribution networks
  - Web search
  - Mash-ups
  - The cloud
  - …

5 Staff
- Instructor: Zack Ives, zives@cis
  - Office: 576 Levine North
  - Office hours Th 3:30-4:30 (and by arrangement)
- TA: Katie Gibson, gibsonk@seas
  - Office hours TBA
- Discussion group:
  - cis-455-555-spring10@googlegroups.com
  - http://groups.google.com/cis-455-555-spring10

6 Textbooks
- Distributed Systems: Principles and Paradigms, 2nd ed., Tanenbaum and van Steen
  - We’ll read from the book ~50% of the time
- Frequent supplementary handouts
  - Excerpts from several books
  - Many recent research papers
  - Your first one, which you should read by Wed: http://research.microsoft.com/en-us/um/people/blampson/33-Hints/Acrobat.pdf (linked off the CIS 555 “Schedule & Slides” page)
- Send me mail if it’s difficult for you to find a way of printing the paper yourself

7 Prerequisites, Workload, etc.
Necessary skills:
- Ability to code in Java: there is a substantial implementation project
- Good debugging skills – this will be the biggest time sink!
- The ability to work as a team with classmates (towards the end)
- A willingness to learn how to read API documentation
- Some exposure to threads and concurrent programming
- A willingness to “push the envelope”
Workload:
- Several programming/debugging-based homework assignments
- A substantial term project with experimental evaluation and a report
- Two midterms
Payoff:
- Lots of practical development and debugging experience
- A good working knowledge of the fundamentals behind scalable systems
- A working “academic clone of Google,” hosted on Amazon EC2!
WARNING: this course should be considered 1.5 CU!

8 A Disclaimer…
- This remains a “bleeding edge” course!
  - Goal 0: an understanding of scalable distributed data-centric systems
  - Goal 1: a look under the covers of today’s hottest topics – in lectures and in projects
  - Goal 2: a level of comfort in managing large, complex software development with others’ code
- Part of this means doing a substantial implementation project
  - As in the real world: learning APIs, dealing with inadequate tools
  - Most of you will find this a struggle! You’ll spend many hours debugging!
- We will be using some immature technology
  - Not everything has been tested and validated ahead of time
  - e.g., this will be the first year we are using Amazon Elastic Compute Cloud
  - We’ll do the best we can to smooth over the bugs
- We hope it will be a fun course, though… and an interesting one!

9 A Bit of Context for the Course

10 What Exactly Is the Web?
- The Web consists of HTTP servers that publish HTML, XML, and a few other content types
- These are hyperlinked via URLs (a subset of URIs)
- Plus there are a huge number of web clients
- The Web is built on a number of Internet protocols:
  - DNS, TCP, IP
- Other Internet services use other protocols
  - SMTP, IMAP, POP, AIM, FTP, …
  - Streaming media, music swapping protocols, …
- Web services, custom applications may actually also use HTTP in ways it wasn’t designed for
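
To make the URL and HTTP bullets concrete, here is a minimal sketch of how a URL turns into the text a client would send over a TCP connection. The class name `HttpRequestSketch` and the method `buildGet` are made up for this illustration, and the request is deliberately bare (no headers beyond `Host` and `Connection`); it only builds the string and does not open a socket.

```java
import java.net.URI;

class HttpRequestSketch {
    // Builds the text of a minimal HTTP/1.1 GET request for the given URL.
    // Hypothetical helper for illustration; real clients send more headers.
    static String buildGet(URI url) {
        String path = url.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // an empty path means the server root
        }
        if (url.getRawQuery() != null) {
            path += "?" + url.getRawQuery();
        }
        String host = url.getHost();
        if (url.getPort() != -1) {
            host += ":" + url.getPort(); // include the port only when it is explicit
        }
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n"; // a blank line ends the header section
    }

    public static void main(String[] args) {
        System.out.print(buildGet(URI.create("http://example.com/index.html")));
    }
}
```

The URL supplies the host to connect to and the path to ask for; DNS, TCP, and IP handle everything underneath that.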

11 The Internet is Built in Layers
- Link: WiFi, ZigBee, Ethernet, WiMax
- IP: IPv4, IPv6; unicast, (multicast)
- Transport: TCP (session-based), UDP (sessionless)
- Session: lightweight streaming, etc.; SSH, FTP, HTTP, IM, P2P, …
- Middleware: Web Services, distributed transactions, …
- Your Application: …

12 What Is an Internet System?
- Not just a web server or web application…
- An application built over the Internet, whose functionality is distributed across more than one machine
  - Typically, at least in a client-server or server-to-server fashion, but may have many more participants
  - Typically, data and/or code must be exchanged in distributed fashion for the functioning of the application
  - Often, the data must be partitioned, replicated, translated, etc. (“shards” in Google-speak)
  - Often, the code is written in multiple different environments, languages, etc.
  - Often, there are concerns about handling failures, firewalls, attacks, …

13 Why Are Internet System Topics Interesting?
- Understanding what’s underneath today’s Web
  - How does it work?
  - What are its shortcomings?
  - What are its strengths?
- Understanding distributed algorithms
- Using the right approach when designing new protocols and web systems
- Being able to anticipate what’s actually possible in the future

14 Example: Web Search, a Cloud Service
[Diagram. Components: client, Search Interface Servers, Index Servers, Crawlers, Web Pages. Edge labels: HTML forms; results; queries; query results; pages; keywords + locations.]
Uses a model of document/word similarity to rank matches
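
The "keywords + locations" flow into the index servers can be sketched as an inverted index. This is a toy, single-machine illustration: the class `TinyIndex` is hypothetical, and the ranking below simply sums term frequencies, which is a stand-in for (not an implementation of) the document/word similarity model the slide mentions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TinyIndex {
    // word -> (document id -> number of occurrences of the word in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    // What a crawler would hand to the index: a document id and its text.
    void add(String docId, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            postings.computeIfAbsent(word, w -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Returns matching document ids, highest score first (ties broken by id).
    // Score = summed term frequency over the query words (an assumed, simplistic ranking).
    List<String> search(String query) {
        Map<String, Integer> scores = new HashMap<>();
        for (String word : query.toLowerCase().split("\\W+")) {
            Map<String, Integer> docs = postings.getOrDefault(word, Collections.emptyMap());
            for (Map.Entry<String, Integer> e : docs.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("d1", "Web systems and web search");
        index.add("d2", "Distributed systems");
        System.out.println(index.search("web systems"));
    }
}
```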

15 Example: Social Networking (Facebook / Twitter), a Cloud Service
[Diagram. Components: client, User Page Servers, Recommender, Users & entities. Edge labels: clicks; pages & notifications; suggestions; common properties, usage logs, …; updates, posts.]

16 Example: Information Integration
[Diagram. Components: client, Mediator System, XML sources, Relational sources, HTML sources. Edge labels: queries; results in “mediated schema”; XQuery + XPath over XML; SQL; ODBC results; HTTP POST; HTML.]
Maps all data into a single format and virtual schema

17 Example: SETI@home
[Diagram. Components: Problem Partitioning, Data Aggregation, clients. Edge labels: New sub-problems; Computed subresults.]
Breaks computation into many parts and distributes them to the clients
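
The partition/aggregate pattern can be sketched with a toy computation. Everything here is an assumption made for illustration: the problem (counting primes below a limit) has nothing to do with SETI@home's actual workload, the class `WorkUnits` is hypothetical, and the "clients" are simulated by a plain loop rather than remote machines.

```java
import java.util.ArrayList;
import java.util.List;

class WorkUnits {
    // One sub-problem: the half-open range [start, end).
    static final class Unit {
        final int start;
        final int end;
        Unit(int start, int end) { this.start = start; this.end = end; }
    }

    // Problem partitioning: split [0, limit) into units of at most unitSize numbers.
    static List<Unit> partition(int limit, int unitSize) {
        List<Unit> units = new ArrayList<>();
        for (int s = 0; s < limit; s += unitSize) {
            units.add(new Unit(s, Math.min(s + unitSize, limit)));
        }
        return units;
    }

    // What one client would compute for its unit: the number of primes in the range.
    static int countPrimes(Unit u) {
        int count = 0;
        for (int n = Math.max(u.start, 2); n < u.end; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) { prime = false; break; }
            }
            if (prime) count++;
        }
        return count;
    }

    // Data aggregation: combine the computed subresults into the final answer.
    static int aggregate(List<Unit> units) {
        int total = 0;
        for (Unit u : units) {
            total += countPrimes(u); // in a real system this result arrives from a client
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(aggregate(partition(100, 30)));
    }
}
```

The pattern works well here because the units are independent: no client needs another client's result.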

18 Example: P2P File Sharing
[Diagram. Components: client nodes. Edge labels: request; data.]
Processes name-based requests for data; each node can make requests, forward requests, return data

19 What are the Hard Problems?
- Disclaimer: most of the hard problems AREN’T solved (or solvable) – and there often isn’t any single BEST solution
  Much of systems design is about finding the right compromise for each specific problem
- We can divide them into:
  - Scalability
  - Availability / reliability
  - Consistency
  - Interoperability
  - Location and resource discovery

20 Scalability
- How do we support a large number of clients or requests?
  - Distribute work!
- Challenges:
  - Coordination – takes significant overhead in the general case
  - Load balancing – avoid having bottlenecks
- Parts of the solution:
  - Client-server, multi-tier, P2P architectures
  - Restricted programming models, e.g., MapReduce
  - Data partitioning, replication, remote procedure calls, …
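
As a rough idea of what a "restricted programming model" looks like, here is an in-process word count written in the MapReduce style. It is a sketch only: the class `MiniMapReduce` is hypothetical, everything runs on one machine, and a real framework would run the map and reduce calls on many workers and handle the grouping step over the network.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MiniMapReduce {
    // map: one input line -> a list of (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // reduce: all counts emitted for one word -> a single total
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle step: group the mapped values by key.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        // Reduce step: one call per distinct key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c")));
    }
}
```

Because `map` looks at one line and `reduce` at one key, neither needs coordination with other calls, which is what makes the model easy to distribute.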

21 Availability/Reliability
- How do we ensure the system is “up” when we want it to be, and doing the “right” thing?
  - Replication and redundancy
  - Security measures against attacks
  - Ability to undo/redo
- Challenges:
  - Keeping things consistent
  - Performance vs. security
  - Acknowledgments
- Parts of the solution:
  - Data partitioning, replication, …
  - Logging, transactions, …
  - Redundant hardware, multiple sites, …
  - Quorum and consensus algorithms
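
One common quorum rule is that, with N replicas, a read quorum R and a write quorum W chosen so that R + W > N always share at least one replica. The sketch below simulates that in memory. It is an illustration under strong assumptions: the class `QuorumRegister` is hypothetical, there is a single writer, no failures, and the read and write sets are picked to be as far apart as possible to show that the overlap still holds.

```java
class QuorumRegister {
    private final long[] versions; // version number stored at each replica
    private final String[] values; // value stored at each replica
    private final int readQuorum;
    private final int writeQuorum;
    private long nextVersion = 1;  // assumes a single writer handing out versions

    QuorumRegister(int replicas, int readQuorum, int writeQuorum) {
        if (readQuorum < 1 || writeQuorum < 1
                || readQuorum > replicas || writeQuorum > replicas
                || readQuorum + writeQuorum <= replicas) {
            throw new IllegalArgumentException("need 1 <= R, W <= N and R + W > N");
        }
        this.versions = new long[replicas];
        this.values = new String[replicas];
        this.readQuorum = readQuorum;
        this.writeQuorum = writeQuorum;
    }

    // A write reaches only the first W replicas.
    void write(String value) {
        long version = nextVersion++;
        for (int i = 0; i < writeQuorum; i++) {
            versions[i] = version;
            values[i] = value;
        }
    }

    // A read contacts only the last R replicas and keeps the newest version it sees.
    // Because R + W > N, at least one contacted replica holds the latest write.
    String read() {
        int newest = versions.length - readQuorum;
        for (int i = newest + 1; i < versions.length; i++) {
            if (versions[i] > versions[newest]) newest = i;
        }
        return values[newest];
    }

    public static void main(String[] args) {
        QuorumRegister reg = new QuorumRegister(5, 3, 3);
        reg.write("a");
        reg.write("b");
        System.out.println(reg.read());
    }
}
```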

22 Consistency / Consensus
- Replication, distribution, and failures make it difficult to keep a unified, consistent view of the world – how do we combat this?
  - Locking, concurrency control, and invalidation schemes
  - Clock synchronization
- Challenges:
  - Locking has huge performance overhead
  - Network partitions, disconnected operation
- Parts of the solution:
  - Optimistic concurrency control, 2-phase locking
  - Distributed clock sync
  - Conflict resolvers
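
The idea behind optimistic concurrency control can be shown with a single versioned value: read without holding a lock, then commit only if nobody else committed in between. This is a minimal sketch; the class `VersionedCell` is hypothetical, and real systems validate whole read and write sets rather than one cell.

```java
class VersionedCell {
    private int version = 0;
    private String value = "";

    synchronized int version() { return version; }

    synchronized String value() { return value; }

    // Commits newValue only if the cell is still at the version the caller read.
    // Returns false on conflict, in which case the caller re-reads and retries.
    synchronized boolean commit(int versionRead, String newValue) {
        if (versionRead != version) {
            return false; // someone else committed since our read
        }
        value = newValue;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        int seen = cell.version();
        System.out.println(cell.commit(seen, "first"));  // succeeds
        System.out.println(cell.commit(seen, "second")); // conflicts: version has moved on
    }
}
```

The lock here is held only for the brief validate-and-write step, not for the whole time a client spends computing its update, which is where the performance gain over pessimistic locking comes from.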

23 Interoperability
- How do we coordinate the efforts of components that have different data formats and/or source languages, and are on different machines?
  - Standardization!
- Challenges:
  - Everything has a different semantics!
- Parts of the solution:
  - Standard data formats: XML, XML schemas
  - “Schema mediation” and data translation
  - Remote procedure calls: CORBA, XML-RPC, …
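
To show why a standard data format helps, the sketch below writes an XML-RPC style method call as text and reads it back with the JDK's standard DOM parser, so sender and receiver need to agree only on the XML structure. The class `XmlRpcSketch` and the method name `examples.getStateName` are placeholders for illustration; the method name is assumed to contain no XML special characters, and no request is actually sent.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlRpcSketch {
    // Builds an XML-RPC style request body with a single integer parameter.
    static String call(String method, int arg) {
        return "<?xml version=\"1.0\"?>"
             + "<methodCall>"
             + "<methodName>" + method + "</methodName>"
             + "<params><param><value><int>" + arg + "</int></value></param></params>"
             + "</methodCall>";
    }

    // What a receiver written in any language would do: parse the XML, pull out the method name.
    static String methodName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("methodName").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed method call", e);
        }
    }

    public static void main(String[] args) {
        String request = call("examples.getStateName", 41);
        System.out.println(request);
        System.out.println(methodName(request));
    }
}
```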

24 Location & Resource Discovery
- How do you find what you’re looking for?
  - Naming
  - Declarative queries over standard schemas
  - Advertisements
- Challenges:
  - Naming has implicit semantics
  - What do you do when you don’t know what to call something?
- Parts of the solution:
  - Directory systems – DNS, LDAP, etc.
  - Resource discovery and advertising protocols
  - Overlay networks, sharding schemes
  - Standardized schemas
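
One possible sharding scheme is consistent hashing, where both node names and keys are hashed onto a ring and each key belongs to the next node clockwise. The sketch below is an assumption-laden illustration, not a scheme the slides prescribe: the class `HashRing` is hypothetical, CRC32 is used only because it ships with the JDK, and there are no virtual nodes or replication.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;
import java.util.zip.CRC32;

class HashRing {
    // Position on the ring -> node name, kept sorted so we can find the next node clockwise.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) { ring.put(hash(node), node); }

    void removeNode(String node) { ring.remove(hash(node)); }

    // The owner of a key is the first node at or after the key's position, wrapping around.
    String lookup(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Long point = ring.ceilingKey(hash(key));
        return ring.get(point != null ? point : ring.firstKey());
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        System.out.println(ring.lookup("user:42"));
    }
}
```

The useful property is that removing a node moves only the keys that node owned; every other key keeps its owner, so most lookups stay valid as membership changes.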

25 Our First Focus: Single Machines, aka Servers
- How do you handle large numbers of concurrent users?
  - Processes
  - Threads
  - Events
  - Hybrids (e.g., thread pools)
  - Staged architectures
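
As a small preview of the thread-pool option, the sketch below hands a batch of "requests" to a fixed number of worker threads using the JDK's `ExecutorService`. The class `PoolDemo` and the string-based requests are made up for illustration; a real server would read requests from sockets instead of a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolDemo {
    // Handles every request on a pool of `workers` threads and returns the
    // responses in the same order as the requests.
    static List<String> handleAll(List<String> requests, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String request : requests) {
                // Each task runs on whichever worker thread is free.
                pending.add(pool.submit(() -> "handled " + request));
            }
            List<String> responses = new ArrayList<>();
            for (Future<String> f : pending) {
                responses.add(f.get()); // blocks until that task has finished
            }
            return responses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(handleAll(Arrays.asList("a", "b", "c"), 2));
    }
}
```

The pool caps the number of threads regardless of how many requests arrive, which is the main difference from spawning one thread per request.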

26 Next Time (Wed due to MLK Day)…
- We’ll look under the covers of an HTTP server
- Key ideas in building scalable systems
- Principles of HTTP and web servers
- Management of concurrent sessions
- To read by next Wednesday:
  - Lampson and Saltzer paper: http://research.microsoft.com/en-us/um/people/blampson/33-Hints/Acrobat.pdf
  - Tanenbaum Ch. 3.1
  - If necessary: Review Tanenbaum “Modern OS,” Ch. 2.3 or a similar OS book on interprocess communication

