Published by Dale Maxwell. Modified over 9 years ago.
Introduction
Zachary G. Ives
University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
January 17, 2008
What this Course Is About
The focus is NOT on building Web applications in PHP, or servlets, or ASP…
It’s about how to build services like Google, Akamai, iTunes, …
What are the principles behind them?
  Distributed systems concepts, with emphasis on scalability and interoperability
  Data representation fundamentals, with emphasis on XML
  Information retrieval concepts, including ranking and indexing
It’s a course that involves building software, evaluating it, and programming in teams
How Does this Relate to Other CIS Courses?
CIS 505: focuses on distributed systems with an emphasis on concurrency
CIS 330/550:
  Data representation and management
  Building a DBMS-backed, servlet-based web site (e.g., mashup)
  455/555 focuses on data with respect to interoperability
CIS 350: focuses on software development and engineering
CIS 573: software engineering & mashups
Assumptions: You know something about threads and synchronization primitives, and you’re at least vaguely familiar with database ideas (or can quickly learn them)
Some Things We’ll Look at
What are the principles behind building systems that work on the Internet?
How do these relate to many of today’s hot technologies?
  Web servers, DHTML, Servlets, JSP, …
  XML
  Web services
  Peer-to-peer
  Application servers
  Content distribution networks
  Web search
  Mash-ups
  …
Staff
Instructor: Zack Ives, zives@cis
  Office: 576 Levine North
  Office hours: T 3:30-4:30 (and by arrangement)
TA: Mengmeng Liu, mengmeng@seas
  Office: 575 Levine North
  Office hours: TBA
Discussion group:
  cis-455-555-spring08@googlegroups.com
  http://groups.google.com/cis-455-555-spring08
Textbooks
Distributed Systems: Principles and Paradigms, 2nd ed., Tanenbaum and van Steen
Frequent supplementary handouts
  Excerpts from several books
  Many recent research papers
Prerequisites, Workload, etc.
Necessary skills:
  The ability to code in Java – there is a substantial implementation project
  The ability to work as a team with a classmate
  A willingness to “push the envelope”
  Knowledge of threads & sync: CSE 380 – Operating Systems – or equivalent
Suggested:
  CIS 330 / 550 – Databases – or equivalent
Workload:
  Several programming/debugging-based homework assignments
  A substantial term project with experimental evaluation and a report
  Midterm and final exam
WARNING: this course should be considered 1.5 CU (and we’re in the process of making that happen)
A Disclaimer…
This is a “bleeding edge” course!
Goal 1: give you a look under the covers of today’s hottest topics – in lectures and in projects
Goal 2: give you a level of comfort in managing large, complex software development with others’ code
  Part of this means doing a substantial implementation project
  As in the real world: learning APIs, dealing with inadequate tools
Most of you will find this a struggle!
  We will be using some immature technology
  Not everything has been tested and validated ahead of time
  We’ll do the best we can to smooth over the bugs
We hope it will be a fun course, though…
… And an interesting one!
A Bit of Context for the Course
What Exactly Is the Web?
The Web consists of HTTP servers that publish HTML, XML, and a few other content types
  These are hyperlinked via URLs (a subset of URIs)
  Plus there are a huge number of web clients
The Web is built on a number of Internet protocols: DNS, TCP, IP
The Internet has many other protocols
  SMTP, IMAP, POP, AIM, FTP, …
  Streaming media, music swapping protocols, …
  Web services, custom applications
The Internet is Built in Layers
Link layer (802.11x, 802.3, …)
IP layer – point-to-point and multicast
Transport, session layers: TCP, UDP – session-based vs. sessionless; reliable vs. unreliable
  UDP is used as the core of many multimedia protocols (e.g., Real or WMP streaming protocols)
  TCP is used as the basis of most of our session-oriented protocols:
    Telnet, SSH, FTP
    HTTP
    Other protocols are built over HTTP (e.g., XML-RPC)
    IM, P2P protocols, …
Middleware and application layers
Sometimes we interpose extra layers that are invisible, e.g., Akamai
What Is an Internet System?
Not just a web server or web application…
An application built over the Internet, whose functionality is distributed across more than one machine
  Typically, at least in a client-server or server-to-server fashion, but may have many more participants
  Typically, data and/or code must be exchanged in distributed fashion for the functioning of the application
  Often, the data must be partitioned, replicated, translated, etc.
  Often, the code is written in multiple different environments, languages, etc.
  Often, there are concerns about handling failures, firewalls, …
Why Are Internet System Topics Interesting?
Understanding what’s underneath today’s web
  How does it work?
  What are its shortcomings?
  What are its strengths?
Understanding distributed algorithms
Using the right approach when designing new protocols and web systems
Being able to anticipate what’s actually possible in the future
Example: Web Search
[Diagram. Components: client, Search Interface Servers, Index Servers, Crawlers, Web Pages. Edge labels: "queries", "HTML forms; results", "query results", "pages", "keywords + locations".]
Uses a model of document/word similarity to rank matches
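As a rough illustration of the "keywords + locations" index and similarity-based ranking named on this slide, here is a minimal sketch. The page contents, file names, and the simple TF-IDF-style score are assumptions made for the example; the slide does not say which similarity model a real search engine uses.

```python
import math
from collections import Counter, defaultdict

def build_index(pages):
    """Map each keyword to {url: [word positions]} (keywords + locations)."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(url, []).append(pos)
    return index

def search(index, num_pages, query):
    """Rank pages by a summed TF-IDF-style score over the query words."""
    scores = Counter()
    for word in query.lower().split():
        postings = index.get(word, {})
        if not postings:
            continue  # word appears on no page
        # Rarer words get a higher weight (assumed weighting, for illustration).
        idf = math.log(1 + num_pages / len(postings))
        for url, positions in postings.items():
            scores[url] += len(positions) * idf
    return [url for url, _ in scores.most_common()]

# Hypothetical crawled pages.
pages = {
    "a.html": "distributed systems and web systems",
    "b.html": "web search ranking",
    "c.html": "cooking recipes",
}
index = build_index(pages)
results = search(index, len(pages), "web systems")
```

In this toy setup the crawler's output is the `pages` dictionary, the index server is `build_index`, and the search interface is `search`.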
Example: Information Integration
[Diagram. Components: client, Mediator System, XML sources, Relational sources, HTML sources. Edge labels: "queries", "results in “mediated schema”", "XQuery + XPath over XML", "SQL", "ODBC results", "HTTP POST", "HTML".]
Maps all data into a single format and virtual schema
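The idea of mapping heterogeneous sources into one mediated schema can be sketched in a few lines. Everything specific here is an assumption for illustration: the `title`/`year` schema, the sample XML document, and the relational rows are invented, and a real mediator would translate queries to XQuery or SQL rather than pulling all data in first.

```python
import xml.etree.ElementTree as ET

def from_xml(xml_text):
    """Wrapper for an XML source: emit records in the mediated schema."""
    root = ET.fromstring(xml_text)
    return [
        {"title": book.findtext("title"), "year": int(book.findtext("year"))}
        for book in root.findall("book")
    ]

def from_rows(rows):
    """Wrapper for a relational source: rows are (title, year) tuples."""
    return [{"title": title, "year": int(year)} for title, year in rows]

def mediated_query(sources, min_year):
    """Answer one query over all sources using the single virtual schema."""
    merged = [record for source in sources for record in source]
    matches = (r for r in merged if r["year"] >= min_year)
    return sorted(matches, key=lambda r: r["title"])

# Hypothetical source data.
xml_source = (
    "<books>"
    "<book><title>XML Basics</title><year>2004</year></book>"
    "<book><title>Old Web</title><year>1996</year></book>"
    "</books>"
)
rows = [("Distributed Systems", "2006")]
result = mediated_query([from_xml(xml_source), from_rows(rows)], 2000)
```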
Example: SETI@home
[Diagram. Components: Problem Partitioning, Data Aggregation, client. Edge labels: "New subproblems", "Computed subresults".]
Breaks computation into many parts and distributes them to the clients
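The partition, compute, and aggregate pattern on this slide can be sketched as follows. This is only a single-process stand-in: worker threads play the role of remote clients, and the sum-of-squares workload is an arbitrary assumption chosen so the result is easy to check.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, num_parts):
    """Problem partitioning: split the input into roughly equal subproblems."""
    size = max(1, -(-len(data) // num_parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def client_compute(chunk):
    """Work done by one client on its subproblem (assumed workload)."""
    return sum(x * x for x in chunk)

def run_job(data, num_clients):
    """Distribute subproblems to clients, then aggregate the subresults."""
    chunks = partition(data, num_clients)
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        subresults = list(pool.map(client_compute, chunks))
    return sum(subresults)  # data aggregation

total = run_job(list(range(10)), 3)
```

The aggregation step works here because addition is associative; workloads whose subresults cannot be combined independently need a different decomposition.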
Example: P2P File Sharing
[Diagram. Components: client nodes. Edge labels: "request", "data".]
Processes name-based requests for data; each node can make requests, forward requests, return data
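The three roles of a node (make requests, forward requests, return data) can be illustrated with a toy in-process simulation. The hop limit (`ttl`), the shared `seen` set, and the node and file names are assumptions for this sketch; it does not model any particular file-sharing protocol, and real peers would exchange network messages rather than make method calls.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []   # peers this node can forward requests to
        self.files = {}       # name -> data held locally

    def request(self, filename, ttl=3, seen=None):
        """Return the data for filename, forwarding to neighbors if needed."""
        seen = set() if seen is None else seen
        if self.name in seen:
            return None       # already visited: avoid forwarding loops
        seen.add(self.name)
        if filename in self.files:
            return self.files[filename]   # return data
        if ttl == 0:
            return None       # hop limit reached: stop forwarding
        for peer in self.neighbors:
            data = peer.request(filename, ttl - 1, seen)   # forward request
            if data is not None:
                return data
        return None

# Hypothetical three-node chain: a <-> b <-> c, with the file stored on c.
a, b, c = Node("a"), Node("b"), Node("c")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
c.files["song.mp3"] = b"audio bytes"
found = a.request("song.mp3")
too_far = a.request("song.mp3", ttl=1)
```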
What are the Hard Problems?
Disclaimer: most of the hard problems AREN’T solved (or solvable) – and there often isn’t any single BEST solution
Much of systems design is about finding the right compromise for each specific problem
We can divide them into:
  Scalability
  Availability / reliability
  Consistency
  Interoperability
  Location and resource discovery
Scalability
How do we support a large number of clients or requests?
  Distribute work!
Challenges:
  Coordination – takes significant overhead
  Load balancing – avoid having bottlenecks
Parts of the solution:
  Client-server, multi-tier, P2P architectures
  Data partitioning, replication, remote procedure calls, …
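Data partitioning, one of the solution ingredients listed above, is often done by hashing a key to pick a server. The sketch below assumes simple modulo hashing over a fixed number of partitions; the key names are made up, and the slide does not prescribe this particular scheme.

```python
import hashlib

def partition_for(key, num_partitions):
    """Pick a partition for a key using a stable hash (same on every machine)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def distribute(keys, num_partitions):
    """Group keys by the partition that would store them."""
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions

# Hypothetical user keys spread over four servers.
keys = [f"user{i}" for i in range(100)]
partitions = distribute(keys, 4)
```

A cryptographic hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, so two machines would disagree about where a key lives. Modulo hashing also reshuffles most keys when the partition count changes, which is one reason load balancing is listed as a challenge.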
Availability/Reliability
How do we ensure the system is “up” when we want it to be?
  Replication and redundancy
  Security measures against attacks
  Ability to undo/redo
Challenges:
  Keeping things consistent
  Performance vs. security
  Acknowledgments
Parts of the solution:
  Data partitioning, replication, …
  Logging, transactions, …
  Redundant hardware, multiple sites, …
Consistency
Replication and distribution make it difficult to keep a unified, consistent view of the world – how do we combat this?
  Locking, concurrency control, and invalidation schemes
  Clock synchronization
Challenges:
  Locking has huge performance overhead
  Network partitions, disconnected operation
Parts of the solution:
  Optimistic concurrency control, 2-phase locking
  Conflict resolvers
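Optimistic concurrency control, named above as part of the solution, lets clients read without holding locks and rejects a write if the data changed in the meantime. The version-number scheme, the class name `VersionedStore`, and the `balance` key below are assumptions for this sketch, not a description of any specific system.

```python
import threading

class VersionedStore:
    """Key-value store where each write must name the version it was based on."""

    def __init__(self):
        self._lock = threading.Lock()   # guards only the short validate-and-write step
        self._data = {}                 # key -> (version, value)

    def read(self, key):
        """Return (version, value); version 0 means the key was never written."""
        with self._lock:
            return self._data.get(key, (0, None))

    def write_if_unchanged(self, key, expected_version, value):
        """Apply the write only if nobody else has written since our read."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False            # conflict: caller must re-read and retry
            self._data[key] = (current_version + 1, value)
            return True

# Two clients read the same version, then both try to write.
store = VersionedStore()
version_a, _ = store.read("balance")
version_b, _ = store.read("balance")
first = store.write_if_unchanged("balance", version_a, 100)
second = store.write_if_unchanged("balance", version_b, 200)
```

The second write fails instead of silently overwriting the first. What the losing client does next (retry, merge, or report an error) is the job of the conflict resolver.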
Interoperability
How do we coordinate the efforts of components that have different data formats and/or source languages, and are on different machines?
  Standardization!
Challenges:
  Everything has a different semantics!
Parts of the solution:
  Standard data formats: XML, XML schemas
  “Schema mediation” and data translation
  Remote procedure calls: CORBA, XML-RPC, …
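To make the XML-RPC item concrete, the sketch below uses Python's standard `xmlrpc.client` module to marshal a call into the standard XML wire format and decode it again, with no network involved. The method name `add` and its arguments are hypothetical.

```python
import xmlrpc.client

# Marshal a call to a hypothetical remote method add(2, 3) into XML-RPC's wire format.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")

# Any XML-RPC implementation, in any language, can decode the same document.
params, method = xmlrpc.client.loads(request_xml)

print(request_xml)
```

Because both sides agree on the XML encoding of the method name and typed parameters, the caller and the server need not share a programming language or platform.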
Location & Resource Discovery
How do you find what you’re looking for?
  Naming
  Declarative queries over standard schemas
  Advertisements
Challenges:
  Naming has implicit semantics
  What do you do when you don’t know what to call something?
Parts of the solution:
  Directory systems – DNS, LDAP, etc.
  Resource discovery and advertising protocols
  Standardized schemas
Our First Focus: Single Machines, aka Servers
How do you handle large numbers of concurrent users?
  Processes
  Threads
  Events
  Hybrids (e.g., thread pools)
  Staged architectures
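Of the options listed, the thread pool hybrid is easy to sketch: a fixed number of worker threads serves an arbitrary number of requests. The pool size, the `worker` name prefix, and the fake request handler are assumptions for illustration; a real server would read requests from sockets.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    """Stand-in for real request handling; records which worker ran it."""
    return f"response-{request_id} from {threading.current_thread().name}"

def serve(request_ids, pool_size=4):
    """Serve all requests with a bounded pool rather than one thread per request."""
    with ThreadPoolExecutor(max_workers=pool_size,
                            thread_name_prefix="worker") as pool:
        return list(pool.map(handle_request, request_ids))

responses = serve(range(10))
```

Bounding the pool caps memory use and context-switching cost under load, at the price of queueing requests when all workers are busy.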
Next Time…
We’ll look under the covers of an HTTP server
  Key ideas in building scalable systems
  Principles of HTTP and web servers
  Management of concurrent sessions
To read:
  Lampson and Saltzer paper
  Tanenbaum Ch. 3.1
For next week: “HTTP Made Really Easy” and Rexford Ch. 4
If necessary: Review Tanenbaum “Modern OS,” Ch. 2.3 or a similar OS book on interprocess communication