Download presentation
Presentation is loading. Please wait.
1
PHP at Yahoo! http://public.yahoo.com/~radwin/ Michael J. Radwin
Abstract: In 2002, Yahoo selected PHP for Web site development and began to phase out its own proprietary server-side scripting language. Three years later, Michael Radwin reflects on how the switch to PHP offered both technical challenges and productivity increases. The first part of the presentation offers a look inside Yahoo's decision-making process to adopt an open-source scripting language. Radwin addresses why Yahoo selected PHP over other languages, focusing on the performance and stability required to serve billions of page views a day. In the second part, Radwin discusses Yahoo's PHP development methodology, which has enabled its engineers to rapidly implement features while still creating software that is maintainable over long periods of time. Biography: Michael J. Radwin is an engineering manager for Yahoo's Infrastructure Software group. His team develops and supports Web platform technologies such as Apache, PHP, and MySQL, and more recently SOAP/REST toolkits. Radwin has been hacking on Apache since 1998 in high-performance environments and his team has been instrumental in helping Yahoo migrate from proprietary to open source software. Michael J. Radwin October 20, 2005
2
Outline Yahoo!, as seen by an engineer Choosing PHP in 2002
PHP architecture at Yahoo!
3
The Internet’s most trafficked site
4
25 countries, 13 languages
5
Yahoo! by the Numbers 411M unique visitors per month
191M active registered users 11.4M fee-paying customers 3.4B average daily pageviews October 2005 Numbers from Q Yahoo! Earnings October 18, 2005
7
Engineering Values Security & Privacy High Availability Performance
We must protect our customers’ information High Availability If the site is offline, we’re missing the opportunity to serve our customers Performance We serve billions of pageviews a day Flexibility & Innovation Customize site for each market Rapid development of new features
8
From Proprietary to Open Source
Web Server Apache “Filo Server” DB Flat Files Web Lang yScript
9
How and Why We Selected PHP
Choosing a Language How and Why We Selected PHP
10
Choosing PHP: brief history
October 2001: 3 proprietary languages Costly to continue to maintain each Limited features (no subroutines!) Committee began researching Compare features, performance Build vs. Buy vs. Open Source PHP selected May 2002
11
Ideal Language Criteria
High performance Robust, sand-boxed Language features Loops, conditionals Complex data-types C/C++ extensions Runs on FreeBSD Interpreted or dynamically compiled i18n support Clean separation of presentation/content/app semantics Low training costs Doesn’t require CS degree to use
12
Top 10 Language Choices yScript mod_include XSLT
13
Performance: Requests
mod_perl yScript Compared PHP 4.1.2, mod_perl, yScript (Yahoo proprietary) Pentium III 800Mhz, 512M RAM, FreeBSD 4.3 (average for early 2002) Sample app: 33K input script, 41K output Included and evaluated 3 other files Header, navbar, footer Arithmetic, regex, echo variables Pseudo-personalization (“Hello, mradwin”) A few calls to C++ extension Fetch user profile from profile server Insert advertisements from adserver
14
Performance: Memory mod_perl yScript
15
Why we picked PHP Designed for web scripting High performance
Large, Open Source community Documentation, easy to hire developers “Code-in-HTML” paradigm <html> <?php echo "Hello World"; ?> </html> Integration, libraries, extensibility Tools: IDE, debugger, profiler
16
PHP at Yahoo! Today
17
Yahoo!’s Development Methodology
Server Architecture File Layout Dependency Management Security Performance Globalization
18
Server Architecture Web Server web server web server Load Balancer
Scripts User Profile Server Web Services Apache Yahoo property (sports, finance, personals, etc…) Load balancer - which server can most handle requests coming in based on algorithm (round robin, least connections, etc..) Running on server are bunch of PHP scripts. Can make remote calls to relational databases, or to other web services. Ad Server
19
Data access, Networking, Crypto
File Layout HTML Templates /usr/local/share/htdocs/*.php 95% HTML 5% PHP Template Helpers /usr/local/share/htdocs/*.inc 50% HTML 50% PHP Business Logic /usr/local/share/pear/*.inc 0% HTML 100% PHP Web pages go regular Apache htdocs dir /usr/local/share/htdocs/dk/login.php Business logic goes in PEAR directory /usr/local/share/pear/HTML/Form.php /usr/local/share/pear/Yahoo/Sports/Teams.php C/C++ Core Code Data access, Networking, Crypto 0% HTML 0% PHP
20
Dependency Management
Base PHP package depends only on XML parser ./configure --disable-all Self-Contained Extensions mysql, dba, curl, ldap, pcre, gd, iconv To enable Install /usr/local/lib/php/ /mysql.so Add “extension = mysql.so” to php.ini Avoids unnecessary dependencies Smaller Apache memory footprint
21
Security: INI Settings
open_basedir Insurance against /etc/passwd exploits allow_url_fopen = Off Use libcurl extension instead Avoid open proxy exploits display_errors = Off However, log_errors = On safe_mode = Off Intended for shared hosting environment
22
Security: Input Filtering
Cross Site Scripting (XSS) most common attack Also “SQL Injection” Normal approach strip_tags() mysqli_escape_string() Examine every line code Tedious and error-prone Use input_filter hook Sanitize all user-submitted data GET/POST/Cookie
23
Performance: Opcode Caches
Easiest performance boost Cache parsed .php scripts in shared memory Optimizations No code modifications! Several products available Zend Performance Suite APC Turck MMCache
24
Performance: PHP Extensions in C++
PHP ships with 80 extensions written in C/C++ Yahoo! develops its own proprietary extensions Fast execution speed Access to client libraries Longer development cycle Edit, compile, link, debug Manual memory-management Profile with APD to see where your hot spots are. If you see a function being called 8,000 times on one page, that might be a good candidate to port to C Focus on scripts (or include files) that get hit a lot Don’t bother optimizing a script that only gets called once in a while Examples of candidates for extensions Distributed locking i18n Advertisements UDB (user database) Cookies DBM-like flat files Security Input Filtering
25
Globalization: PHP Unicode
+ + ICU = 6 Native Unicode support in 2006 Collaborative effort Andrei Zmievski (Yahoo!) Andi Gutmans (Zend) Many members of PHP Community
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.