Presentation is loading. Please wait.

Presentation is loading. Please wait.

PHP at Yahoo! Michael J. Radwin

Similar presentations


Presentation on theme: "PHP at Yahoo! Michael J. Radwin"— Presentation transcript:

1 PHP at Yahoo! http://public.yahoo.com/~radwin/ Michael J. Radwin
Abstract: In 2002, Yahoo selected PHP for Web site development and began to phase out its own proprietary server-side scripting language. Three years later, Michael Radwin reflects on how the switch to PHP offered both technical challenges and productivity increases. The first part of the presentation offers a look inside Yahoo's decision-making process to adopt an open-source scripting language. Radwin addresses why Yahoo selected PHP over other languages, focusing on the performance and stability required to serve billions of page views a day. In the second part, Radwin discusses Yahoo's PHP development methodology, which has enabled its engineers to rapidly implement features while still creating software that is maintainable over long periods of time. Biography: Michael J. Radwin is an engineering manager for Yahoo's Infrastructure Software group. His team develops and supports Web platform technologies such as Apache, PHP, and MySQL, and more recently SOAP/REST toolkits. Radwin has been hacking on Apache since 1998 in high-performance environments and his team has been instrumental in helping Yahoo migrate from proprietary to open source software. Michael J. Radwin October 20, 2005

2 Outline Yahoo!, as seen by an engineer Choosing PHP in 2002
PHP architecture at Yahoo!

3 The Internet’s most trafficked site

4 25 countries, 13 languages

5 Yahoo! by the Numbers 411M unique visitors per month
191M active registered users 11.4M fee-paying customers 3.4B average daily pageviews October 2005 Numbers from Q Yahoo! Earnings October 18, 2005

6

7 Engineering Values Security & Privacy High Availability Performance
We must protect our customers’ information High Availability If the site is offline, we’re missing the opportunity to serve our customers Performance We serve billions of pageviews a day Flexibility & Innovation Customize site for each market Rapid development of new features

8 From Proprietary to Open Source
Web Server Apache “Filo Server” DB Flat Files Web Lang yScript

9 How and Why We Selected PHP
Choosing a Language How and Why We Selected PHP

10 Choosing PHP: brief history
October 2001: 3 proprietary languages Costly to continue to maintain each Limited features (no subroutines!) Committee began researching Compare features, performance Build vs. Buy vs. Open Source PHP selected May 2002

11 Ideal Language Criteria
High performance Robust, sand-boxed Language features Loops, conditionals Complex data-types C/C++ extensions Runs on FreeBSD Interpreted or dynamically compiled i18n support Clean separation of presentation/content/app semantics Low training costs Doesn’t require CS degree to use

12 Top 10 Language Choices yScript mod_include XSLT

13 Performance: Requests
mod_perl yScript Compared PHP 4.1.2, mod_perl, yScript (Yahoo proprietary) Pentium III 800Mhz, 512M RAM, FreeBSD 4.3 (average for early 2002) Sample app: 33K input script, 41K output Included and evaluated 3 other files Header, navbar, footer Arithmetic, regex, echo variables Pseudo-personalization (“Hello, mradwin”) A few calls to C++ extension Fetch user profile from profile server Insert advertisements from adserver

14 Performance: Memory mod_perl yScript

15 Why we picked PHP Designed for web scripting High performance
Large, Open Source community Documentation, easy to hire developers “Code-in-HTML” paradigm <html> <?php echo "Hello World"; ?> </html> Integration, libraries, extensibility Tools: IDE, debugger, profiler

16 PHP at Yahoo! Today

17 Yahoo!’s Development Methodology
Server Architecture File Layout Dependency Management Security Performance Globalization

18 Server Architecture Web Server web server web server Load Balancer
Scripts User Profile Server Web Services Apache Yahoo property (sports, finance, personals, etc…) Load balancer - which server can most handle requests coming in based on algorithm (round robin, least connections, etc..) Running on server are bunch of PHP scripts. Can make remote calls to relational databases, or to other web services. Ad Server

19 Data access, Networking, Crypto
File Layout HTML Templates /usr/local/share/htdocs/*.php 95% HTML 5% PHP Template Helpers /usr/local/share/htdocs/*.inc 50% HTML 50% PHP Business Logic /usr/local/share/pear/*.inc 0% HTML 100% PHP Web pages go regular Apache htdocs dir /usr/local/share/htdocs/dk/login.php Business logic goes in PEAR directory /usr/local/share/pear/HTML/Form.php /usr/local/share/pear/Yahoo/Sports/Teams.php C/C++ Core Code Data access, Networking, Crypto 0% HTML 0% PHP

20 Dependency Management
Base PHP package depends only on XML parser ./configure --disable-all Self-Contained Extensions mysql, dba, curl, ldap, pcre, gd, iconv To enable Install /usr/local/lib/php/ /mysql.so Add “extension = mysql.so” to php.ini Avoids unnecessary dependencies Smaller Apache memory footprint

21 Security: INI Settings
open_basedir Insurance against /etc/passwd exploits allow_url_fopen = Off Use libcurl extension instead Avoid open proxy exploits display_errors = Off However, log_errors = On safe_mode = Off Intended for shared hosting environment

22 Security: Input Filtering
Cross Site Scripting (XSS) most common attack Also “SQL Injection” Normal approach strip_tags() mysqli_escape_string() Examine every line code Tedious and error-prone Use input_filter hook Sanitize all user-submitted data GET/POST/Cookie

23 Performance: Opcode Caches
Easiest performance boost Cache parsed .php scripts in shared memory Optimizations No code modifications! Several products available Zend Performance Suite APC Turck MMCache

24 Performance: PHP Extensions in C++
PHP ships with 80 extensions written in C/C++ Yahoo! develops its own proprietary extensions Fast execution speed Access to client libraries Longer development cycle Edit, compile, link, debug Manual memory-management Profile with APD to see where your hot spots are. If you see a function being called 8,000 times on one page, that might be a good candidate to port to C Focus on scripts (or include files) that get hit a lot Don’t bother optimizing a script that only gets called once in a while Examples of candidates for extensions Distributed locking i18n Advertisements UDB (user database) Cookies DBM-like flat files Security Input Filtering

25 Globalization: PHP Unicode
+ + ICU = 6 Native Unicode support in 2006 Collaborative effort Andrei Zmievski (Yahoo!) Andi Gutmans (Zend) Many members of PHP Community

26


Download ppt "PHP at Yahoo! Michael J. Radwin"

Similar presentations


Ads by Google