All Things Python

Blog by Marc-André Lemburg, Senior Solution Architect, Python Core Developer, Trainer, Coach and Consultant

Introduction to PyRun - Python in 3.5MB

Yesterday, we announced a new version of our open-source eGenix PyRun, the “one-file Python run-time”. So what is this “one-file Python run-time” ?


In 2008, eGenix started work on a product which had to run Python on Linux servers. We had looked into distributing the code just as Python files running on OS-provided Python run-times, but the many different Linux distributions and their Python variants quickly caused us to reconsider the approach. Instead, we opted to ship Python together with the product, as many companies do when distributing Python products. That way, we avoided the problem of having to adjust to all the differences found in Python installations across Linux distributions.

Now, shipping Python together with a product usually means shipping around 100MB worth of code. This may look nice from a marketing perspective (lots of code for the money), but it’s not really an ideal way of distributing products to customers.

Back to the 90s…

In the late 1990s, I had started a project called mxCGIPython. At the time, web hosters only supported FTP access and Perl/shell as CGI run-times. Of course, I wanted to use Python on the hosters, so I thought to myself: wouldn’t it be great to upload a single file to the hoster’s CGI directory and then have a shell script make this executable to use as a basis for CGI scripting ?

I ran some tests with simple executables and the idea actually worked pretty well.

Next, I had to turn Python together with the standard library into a single binary. Python came with a tool called freeze to create stand-alone binaries for applications, so I pointed freeze at the standard library to create such a binary.

This worked, but did require some additional tweaks to actually make the setup work. See the README of freeze to get an idea of how it works (or read the code, like I did at the time :-)).

Since I did not have access to all the different web hosting platforms, I made the project open source. People loved the idea and sent in lots of pre-compiled binaries for all kinds of platforms - covering most of the ones used by web hosters at the time.

After a few years, hosters finally caught on and started supporting Python as a CGI platform, and nowadays it’s normal to run complete web stacks using Python as the implementation language.

Aside: The platform module you find in the Python standard library was the result of this project. I wanted a clean way to name the mxCGIPython binaries, so I wrote the platform module as a way to come up with a standardized name.
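As a small illustration of that idea, here is a minimal sketch of how a platform string can be turned into a file name. The helper binary_name() and the name layout are made up for this example; they are not the naming scheme mxCGIPython actually used.

```python
import platform

# Hypothetical helper: combine a prefix with the platform string.
def binary_name(prefix="python"):
    # platform.platform() returns a single string describing the
    # underlying system; its value depends on the machine it runs on.
    return "{0}-{1}".format(prefix, platform.platform())

name = binary_name("mxCGIPython")
print(name)
```

The output differs from machine to machine, which is exactly the point: the same code yields a distinct, readable name per platform.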

Fast forward again…

Right, so we were looking for a solution to ship Python, but not using the 100MB heavy-weight approach. I remembered the mxCGIPython project and how small the footprint of those binaries was.

We gave it a try and, voilà, it worked great; well, after a few tweaks, of course.

Now, you might ask: why didn’t you simply freeze just the product into a single executable ? The reason is simple. We wanted to be able to use this platform for future products as well and ideally be able to send out patches by just distributing ZIP files with the Python code.
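Shipping code in ZIP files works because Python can import modules straight from a ZIP archive placed on sys.path. The following is only a sketch of that standard mechanism; the archive name patch.zip and the module hotfix are invented for the example and do not describe how any eGenix product applies its patches.

```python
import os
import sys
import tempfile
import zipfile

# Build a small ZIP archive holding one module. The names patch.zip and
# hotfix are made up for this example.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "patch.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hotfix.py", "def version():\n    return 'patched'\n")

# Putting the archive on sys.path lets the regular import statement find
# the module inside it.
sys.path.insert(0, archive)
import hotfix

result = hotfix.version()
print(result)
```

Placing the archive at the front of sys.path means its modules take precedence over same-named modules found later on the path, which is what a patch file would need.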

And, of course, we also believe that others can make good use of the technology as well, so we improved the code, turned it into a product and open sourced it.

That’s how eGenix PyRun was born, again, from the ashes, so to speak.

Working on the UI

After a few releases, we found that installation using unzip/untar is great, but having to find the location of the distribution files for the platform is not. As a result, we added a bash script install-pyrun to take on this task, which automates the installation and also adds pip and setuptools.

First, you get the script and install it somewhere as an executable:

tmp/pyrun-demo> wget https://downloads.egenix.com/python/install-pyrun
tmp/pyrun-demo> chmod 755 ./install-pyrun

Then you run it in a directory where you want the PyRun environment to be installed:

tmp/pyrun-demo> ./bin/pyrun
eGenix PyRun 2.7.10 (release 2.1.1, default, Oct  1 2015, 12:01:41)
Thank you for using eGenix PyRun. Type "help" or "license" for details.
>>> 

And that’s it.

If you want a Python 2.6 version, pass --python=2.6 to the script; for Python 3.4, use --python=3.4.

Seeing is believing

Let’s have a look at the sizes:

tmp/pyrun-demo> ls -l bin/pyrun*
-rwxr-xr-x 1 lemburg lemburg 11099374 Oct  1 12:03 pyrun2.7
-rwxr-xr-x 1 lemburg lemburg 18784684 Oct  1 12:03 pyrun2.7-debug

That’s around 11MB for an almost complete Python run-time in a single file. Not bad. But we can improve this even more by using an exe-compressor such as upx:

tmp/pyrun-demo> upx bin/pyrun2.7
        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
  11099374 ->   3549128   31.98%  linux/ElfAMD   pyrun2.7          

upx will uncompress the executable during load time, so load time increases, but it’s still impressive how small Python can get:

tmp/pyrun-demo> ls -l bin
-rwxr-xr-x 1 lemburg lemburg  3549128 Oct  1 12:03 pyrun2.7
-rwxr-xr-x 1 lemburg lemburg 18784684 Oct  1 12:03 pyrun2.7-debug

Ok, it’s not as small as Turbo Pascal was when it first hit the market with a binary of only 48k, including a compiler, editor and run-time lib, but 3.5MB is mobile app size and that alone should ring a few bells :-)
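To double-check the numbers, here is a quick calculation using the byte counts from the ls and upx listings above. It only redoes the arithmetic; the variable names are chosen for this example.

```python
# Sizes in bytes, copied from the ls and upx output above.
original = 11099374
compressed = 3549128

# Compressed size as a percentage of the original, as upx reports it.
ratio = 100.0 * compressed / original
saved = original - compressed

print("ratio: %.2f%%" % ratio)
print("compressed: %.1f MB" % (compressed / 1e6))
print("saved: %.1f MB" % (saved / 1e6))
```

The ratio matches the 31.98% shown by upx, and the compressed size rounds to the 3.5MB quoted in the title (counting 1MB as one million bytes).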

Just think of how much bandwidth you’d save compared to the 100MB gorilla, when pushing your executable to all those containers in your cluster farms in order to run your application.

To make things even easier to install, we’ve recently added an -r requirements.txt parameter to install-pyrun, so you can have it install all your dependencies together with eGenix PyRun in one go.

Some things not included in eGenix PyRun

To be fair, some shared modules from the standard library are not included (e.g. ctypes, parser, readline). install-pyrun installs them in lib/pythonX.X/lib-dynload/, so that they can optionally be used, for a total of 2.5MB in .so files.

The main purpose of eGenix PyRun is to work as a run-time, so we optimized for this use. The optional shared modules can be added to the binary as well, if needed, by adding appropriate lines to the Setup.PyRun-X.X files used when building eGenix PyRun.

Anyway, give it a try and let me know what you think.

Enjoy,

Marc-André

Simplifying print() string formatting

A couple of weeks ago, there was a lengthy discussion on the python-ideas mailing list about how to simplify string formatting in Python 3, esp. related to print() output.

Several proposals were made and three PEPs came out of the discussion.

When will this be available ?

The accepted PEP 498 will only make it into Python 3.6, which is still around 18 months away from release.
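For reference, this is roughly what the PEP 498 syntax looks like. The snippet is only a sketch of the literal string interpolation feature and needs Python 3.6 or later to run.

```python
# Requires Python 3.6 or later: the f prefix makes the string literal
# evaluate the expression inside the braces in the current scope.
a = 1
line = f'a = {a}'
print(line)
```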

However, you don’t really have to wait that long and it’s even possible to come close to the proposal in Python 2.7, if all you want is to print such strings.

With print() being a function in Python 3 and optionally in Python 2.7, you can use simple helper functions to define your formatting functions:

# For Python 2 you need to make print a function first:
from __future__ import print_function
import sys

# Keep a reference to the native print() function
_orig_print = print

# Use .format() as basis for print()
def fprint(template, *args, **kws):
    caller = sys._getframe(1)
    context = caller.f_locals
    _orig_print(template.format(**context), *args, **kws)

# Use C-style %-formatting as basis for print()
def printf(template, *args, **kws):
    caller = sys._getframe(1)
    context = caller.f_locals
    _orig_print(template % context, *args, **kws)

# Examples:
a = 1
fprint('a = {a}')
printf('a = %(a)s')

# Let's use fprint() as standard print() in this module:
print = fprint
b = 3
print('b = {b}')

Running the above script gives the expected output:

a = 1
a = 1
b = 3

Both in Python 2.7 and Python 3, and now rather than later.

Good idea or bad ?

Whether it’s a good idea to have code implicitly look up locals remains to be seen. I generally prefer to write things down explicitly - even if there is a bit of duplication.
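For comparison, the explicit style looks like this. It is just a sketch using plain str.format() and %-formatting, with the variables passed in by hand instead of being picked up from the caller's locals.

```python
a = 1
b = 3

# Explicit keyword argument with str.format()
line1 = 'a = {a}'.format(a=a)

# Explicit mapping with C-style %-formatting
line2 = 'b = %(b)s' % {'b': b}

print(line1)
print(line2)
```

The variable names appear twice in each call, which is the duplication mentioned above, but a reader can see at a glance which values end up in the output.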

Tricks such as the above certainly make it harder to see where variables are being used and can thus be a source of error. Then again, the formatting routines will complain loudly if they cannot find a variable, instead of just generating empty strings as output:

>>> printf('%(c)s')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "myprint.py", line 23, in printf
    _orig_print(template % context, *args, **kws)
KeyError: 'c'
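The same behaviour can be checked without the helper functions. This sketch applies both formatting styles to a mapping that lacks the requested name and records the resulting errors; the context dictionary stands in for the caller's locals.

```python
context = {'a': 1}  # stands in for the caller's locals; has no entry 'c'

errors = []
for render in (lambda: '{c}'.format(**context),
               lambda: '%(c)s' % context):
    try:
        render()
    except KeyError as exc:
        # The missing name is available as the first exception argument
        errors.append(exc.args[0])

print(errors)
```

Both styles raise a KeyError naming the missing variable, so neither silently produces an empty string.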

Enjoy,

Marc-André

Starting a Python blog

Long ago, in the late 1990s, I had a website on Christian Tismer’s Starship Python to show my Python projects to the community and report on new developments, provide tips, hints and small utilities. The site was called “Marc’s Python Pages”:

image

Turning a hobby into business

I then launched my company eGenix.com Software, Skills and Services GmbH in 2000 to market a web application server I had been working on for a couple of years. As it turned out, I was too early with the product. The market was still thriving using CGI scripts, Perl and a couple of static pages to run websites. At the same time, the Internet bubble burst, so it wasn’t exactly perfect timing for starting a DotCom company.

I eventually sold a single license of the application server to Steilmann in Bochum and then turned to consulting work, using the eGenix mx Extensions I had written for the application server as a way to market the company and myself to companies using Python.

This worked reasonably well and I have since run several projects, in-house at clients or outsourced to eGenix, and continued to add new useful commercial products to our portfolio - mostly around the commercial ODBC database Python interface mxODBC I had originally written for the application server.

Commercial and Open Source Software

Incidentally, the mxODBC development in 1997 also triggered the development of a date/time library mxDateTime at the time, since Python had no way of storing date/time values as objects apart from using Unix ticks values. Since it was a basic building block, I open-sourced it, in the same spirit as Python itself was open source (even though the term wasn’t known at the time). mxDateTime became the de facto standard for date/time storage, until Python itself received a datetime module.

I also open sourced several other C extensions for Python, which were all used in the application server, such as mxTextTools, mxTools, mxProxy, etc. - what was then to become the eGenix mx Extensions.

Still enjoying Python and its community

Over 15 years later, I still haven’t lost interest in Python, which I think means something. I continue to enjoy working with it and for it, and I enjoy the community that has developed around Python every single day.

I usually write lots of emails on mailing lists to discuss and stay in touch with people. Lately, I found I was missing a more persistent way of writing down ideas and snippets, something along the lines of what once were the Starship pages, so here you go… hope you’ll enjoy the ride.

If you want to follow the blog, please see the contact page. It’s currently possible to use RSS, Twitter and Tumblr for this.

Enjoy,

Marc-André

PS: The website is not yet complete, e.g. the project pages don’t work yet. I’ll add more content over the next few weeks.