Washington Area SGML/XML Users Group – 21 June 2000
BeOpen.com

Python, XML, and PythonLabs
Fred L. Drake, Jr.

Outline

Python 1.6 and XML
  – What does Python offer XML users in release 1.6?
PythonLabs at BeOpen.com
  – What does the formation of PythonLabs mean for Python?

Python 1.5.* and SGML, XML

sgmllib, htmllib
  – Just enough SGML to work with HTML-as-deployed … somewhat.
  – Dispatcher model usable for small projects (SAX-like).
  – Does not process any DTD information.
xmllib
  – Simple XML support for ASCII-only element and attribute names.
  – Namespace support, but difficult to use.
  – Shared dispatch model from sgmllib, htmllib, so familiar to existing user base.
  – Not XML 1.0 compliant.
  – No Unicode support.
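
The dispatcher model mentioned above routes each tag to a handler method named after it. The sketch below illustrates the pattern only; it is not sgmllib itself. It assumes a current Python 3 interpreter and uses the standard html.parser module as a stand-in tokenizer, and the names DispatchingParser, LinkCollector, and collect_links are hypothetical.

```python
from html.parser import HTMLParser


class DispatchingParser(HTMLParser):
    """Route each tag to a start_<tag> / end_<tag> method, if one exists."""

    def handle_starttag(self, tag, attrs):
        # Look up a handler by name; tags without a handler are ignored.
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(dict(attrs))

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()


class LinkCollector(DispatchingParser):
    """Hypothetical handler class: collect href values from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def start_a(self, attrs):
        href = attrs.get("href")
        if href is not None:
            self.links.append(href)


def collect_links(html_text):
    """Feed a document through the dispatcher and return the links found."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Adding support for another element means adding one more start_<tag> method, which is why the model suits small projects.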

Python 1.6 and XML

Existing modules remain for backward compatibility
  – But xmllib is deprecated.
Expat interface is included in standard distributions
  – Can generate UTF-8 or UTF-16.
  – Installed by default on Windows.
  – Add-on package for Linux (RPMs, etc.) – probably installed by default on common distributions.
  – Requires getting & building Expat separately when building from source.
  – Jack Jansen, Paul Prescod, Andrew Kuchling.
SAX 2 Interface
  – Contributed by Lars Marius Garshol.
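
As a rough illustration of the two interfaces named above, the sketch below collects element names first through the Expat binding and then through a SAX handler. The module paths (xml.parsers.expat, xml.sax) are those of the current standard library and are an assumption about how the 1.6-era modules are exposed; the function and class names are hypothetical.

```python
import xml.parsers.expat
import xml.sax


def element_names_expat(xml_bytes):
    """Collect element names in document order using the Expat interface."""
    names = []

    def start_element(name, attrs):
        names.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return names


class NameCollector(xml.sax.ContentHandler):
    """SAX handler that records the name of every element it sees."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names_sax(xml_bytes):
    """Collect element names in document order using the SAX interface."""
    handler = NameCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.names
```

Both functions walk the same event stream; the SAX version trades Expat's per-parser callback attributes for a handler class that can be reused with other SAX drivers.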

PyXML Extension Package

Validating parser
  – 100% Pure Python by Lars Marius Garshol!
Level 1 DOM
  – Contributed by FourThought, LLC.
Many convenience modules
  – Build DOM documents from ESIS streams.
  – ISO 8601 date format support.
  – SAX handler classes to dump a nicely indented XML document.
Coordinated by Andrew Kuchling
  – A product of the XML Special Interest Group at python.org.
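
To give a feel for DOM-style access, here is a minimal sketch using only core DOM calls (getElementsByTagName, getAttribute). It does not use the PyXML DOM described above; it assumes the standard library's xml.dom.minidom as a stand-in, and the chapter/title vocabulary is made up for the example.

```python
from xml.dom.minidom import parseString


def chapter_titles(xml_text):
    """Return the title attribute of every <chapter> element, in order."""
    document = parseString(xml_text)
    try:
        chapters = document.getElementsByTagName("chapter")
        return [chapter.getAttribute("title") for chapter in chapters]
    finally:
        # Release the tree explicitly once we are done with it.
        document.unlink()
```

Unlike the event-driven interfaces, the whole document is held in memory, so the code can revisit any node after parsing.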

Unicode Support

Python 1.6 includes Unicode support in the core!
  – In source code: u'abc'
  – From data: unicode('raw data from file', 'iso ')
  – From file objects:
  – Support for over 60 codecs in the standard library.
  – Uses UTF-16 to avoid excess memory consumption; no support beyond the basic multilingual plane.
Basic string type is still 8-bit characters
  – Avoids breaking legacy code.
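
The conversions listed above can be sketched as follows. This is written for a current Python 3 interpreter, where the unicode() built-in is spelled str() and 8-bit data is a bytes object. The ISO 8859-1 default is an assumption chosen for the example, and the function names are hypothetical.

```python
def to_unicode(raw_bytes, encoding="iso-8859-1"):
    """Decode 8-bit data into a Unicode string (the encoding is an assumption)."""
    return str(raw_bytes, encoding)


def to_utf8(text):
    """Encode a Unicode string as UTF-8 bytes."""
    return text.encode("utf-8")
```

For example, to_unicode(b"Marc-Andr\xe9 Lemburg") yields the same string as the literal u"Marc-Andr\xe9 Lemburg", and to_utf8() of that string produces the two-byte sequence \xc3\xa9 for the accented character.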

Unicode in Files

>>> import codecs
>>> # Write a Unicode string through a UTF-8 encoding file wrapper.
>>> f = codecs.open('test.utf8', 'w', encoding='utf-8')
>>> f.write(u'Marc-Andr\xE9 Lemburg')
>>> f.close()
>>> # A plain open() returns the raw UTF-8 bytes as an 8-bit string.
>>> open('test.utf8').readline()
'Marc-Andr\303\251 Lemburg'
>>> # Reading through codecs.open() decodes back to a Unicode string.
>>> codecs.open('test.utf8', encoding='utf-8').readline()
u'Marc-Andr\351 Lemburg'

Unicode and Regular Expressions

New regular expression matching engine
  – Supports both Unicode and 8-bit strings.
  – Matches faster than pcre library used in Python 1.5.*.
  – Regular expression compiler is 100% Pure Python.
  – Keeps the Perl-compatible syntax for regular expressions.
  – Written by Fredrik Lundh of Secret Labs, AB.
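
A small sketch of one engine serving both string types, assuming a current Python 3 interpreter (where 8-bit strings are bytes objects and the pattern must be of the same type as the data). The words() helper is hypothetical.

```python
import re


def words(data):
    """Find runs of word characters in either a Unicode or an 8-bit string."""
    # The pattern type has to match the data type.
    pattern = rb"\w+" if isinstance(data, bytes) else r"\w+"
    return re.findall(pattern, data)
```

On a Unicode string, \w follows Unicode character properties, so an accented letter counts as part of a word. On 8-bit data without extra flags, \w covers only ASCII letters, digits, and the underscore, so the bytes of a UTF-8 encoded accent split the word.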

PythonLabs at BeOpen.com

Who is PythonLabs?

The old crew from CNRI:
  – Guido van Rossum, the creator of Python
  – Barry Warsaw, maintainer of JPython, MailMan developer
  – Fred Drake, Python's Documentation Tzar
  – Jeremy Hylton, the pragmatic academician
And a familiar voice from the community:
  – Tim Peters, the universal expert

Why ?

Core development team will devote full time to Python
  – Core language development & implementation.
  – Community building.
  – Extend our efforts to improve development and deployment tools:
      IDLE (Python IDE using Tk)
      KDevelop integration?
      CPAN/CTAN-like repository for 3rd-party packages?
  – Improve integration facilities
      Database API.
      Web-related APIs should support the latest standards.
Better visibility in corporate development shops
Our development efforts will be 100% Open Source
  – All software will have a license that conforms to the Open Source Definition (

Late Breaking News