Tall, Snarky Canadian

Moving my blog to Silvrback

2015-08-11T11:03:51-07:00

I have been using Svbtle since it opened up to the public and in general I have enjoyed it. It has a very clean design and you get to blog in Markdown, which is great when you typically talk about code.

But I have been wanting to set up a custom domain for my blog which typically requires paying for blog hosting (which I’m fine with). And if I’m going to be putting money into something I want to make sure it’s the best I can get for the money I’m spending. And that’s when I found Silvrback.

Basically Silvrback does everything that Svbtle does in terms of features, from custom domains to letting people subscribe by email. While I still prefer Svbtle’s overall look, Silvrback isn’t far off. And then there is the fact that Silvrback is USD 30/year while Svbtle works out to USD 72/year. Look quibbles aside, I just couldn’t ignore the difference, especially when Damian of Silvrback has been so responsive to the questions/suggestions I had.

And so my blog/bio page is now at http://www.snarky.ca/ and is hosted by Silvrback (if you prefer HTTPS access you can get it through https://nothingbutsnark.silvrback.com/ but obviously that is not as stable as the custom domain).

Python 3 support on PyPI

2015-06-12T08:11:14-07:00

At (and since) PyCon 2015, there has been interest in trying to get quantified numbers in relation to Python 3 adoption (see PyPI download numbers and uptake in the astronomy community). One number I am personally interested in is per-project adoption of Python 3. While the Python 3 Wall of Superpowers shows wide support for Python 3 with the top projects by download, I wanted to take a larger look at PyPI as a whole to measure project adoption. So I wrote a script that downloads the JSON data for every project on PyPI and then analyzes the data in various ways.

Methodology #

The methodology I used was to classify a project as supporting Python 3 if its trove classifier said it did (in other words, putting a project in the “Python 3” bin means it at least supports Python 3 and may support Python 2 as well). The same goes for Python 2 (although if the project also supports Python 3 it goes in that bin instead), and all other cases are bucketed as “unknown”. Unfortunately you will discover that the majority of projects don’t have a trove classifier specifying what versions of Python they support. Because of this, and knowing that Python 2 is still the dominant version of Python, it might be tempting to automatically clump all unknown projects into the Python 2 bucket, but as caniusepython3’s overrides file shows, Python 3 projects also don’t always specify their trove classifiers properly. So while you can group all unknown projects into Python 2 as a simple worst-case scenario, it is also conservative and thus not totally accurate (as is anything where human beings are relied upon to provide the data). Unfortunately I don’t know how often people leave the trove classifier off of Python 3 projects without doing a random sample, so I have erred on the side of being conservative and clumped all unknown projects into the Python 2 bucket in any adoption rate numbers cited below. Those numbers are calculated by simply doing Python 3 / (Python 2 + unknown), so each is a ratio more than an overall percentage: 100% would indicate half of all projects are in Python 3 and the other half are in Python 2.
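A minimal sketch of that bucketing and ratio is below. It is not the actual script: the function names are made up for illustration, and it assumes the classifiers arrive as a list of strings in the usual `Programming Language :: Python :: …` form.

```python
def classify(classifiers):
    """Bucket a project as 'Python 3', 'Python 2', or 'unknown'."""
    prefix = "Programming Language :: Python :: "
    versions = [c[len(prefix):] for c in classifiers if c.startswith(prefix)]
    # Python 3 wins: a project supporting both versions goes in the Python 3 bin.
    if any(v.startswith("3") for v in versions):
        return "Python 3"
    if any(v.startswith("2") for v in versions):
        return "Python 2"
    return "unknown"


def adoption_ratio(python3, python2, unknown):
    """Python 3 / (Python 2 + unknown), i.e. unknown is counted as Python 2."""
    return python3 / (python2 + unknown)
```

With this definition `adoption_ratio(1, 1, 0)` is 1.0, which is the "100% means a 50/50 split" case.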

Another thing to realize about these numbers is that when one project switches to Python 3 it doesn’t necessarily lead to only one other project porting, but possibly many. Think of when Django added Python 3 support: it didn’t open the possibility of just one other project to port to Python 3 but hundreds. That means the impact of these numbers can be a little misleading since it is not going to be a linear porting rate due to network effects.

And finally, Python 3 will never hit 100%. This isn’t just because there will be people who never port to Python 3 (because there will be), but also because some people have created separate projects for their Python 3 ports. I don’t know how widespread this practice is, but it does mean in some instances the numbers actually cancel each other out because one project counts once for Python 2 support and then again for Python 3 support under a different name. In other words whatever number you want to consider meaning Python 3 has been adopted by the community, you need to make sure it’s below 100%.

Some numbers #

By release date #

First I looked at every project on PyPI that has ever uploaded a release (i.e., created a project and uploaded some file):

Unknown: 34,447
Python 2: 8,064
Python 3: 11,377

This is a rather junk number since Python 2 is heavily weighted thanks to simply existing longer and thus having more abandoned projects. But even in this instance, Python 3 is supported by over 26% of projects.

The next number I looked at is the number of projects which released a version in the last 2 years:

Unknown: 19,760
Python 2: 5,898
Python 3: 10,295

40% of projects support Python 3 with this filter. I would argue this is the bare minimum release cadence to consider a project not fully abandoned.

Now it’s probably more reasonable to claim that a project that has been updated in the last year is a better metric:

Unknown: 12,864
Python 2: 4,091
Python 3: 8,329

At about 49%, there is a clear increase in Python 3 support over the past year. This might correspond with Python 3.4.0 being released on 2014-03-16 which falls closer to being a year ago than two years ago.

What happens if you make the cut-off at 6 months?

Unknown: 8,183
Python 2: 2,809
Python 3: 6,134

That’s over 55% for Python 3. This seems to suggest that more and more people are using Python 3 and that the growth from 2 years ago versus this past year is simply increased Python 3 support.

Now looking at just the latest release still runs into the issue of new projects that were simply uploaded and then never touched again, which is a similar issue to just looking at all PyPI projects without any sort of time horizon. So what do the numbers look like if 2 releases have to be made within the time horizon? For instance, what if a project has to make 2 releases within the past year?

Unknown: 8,033
Python 2: 2,779
Python 3: 5,889

That puts Python 3 adoption at over 54% compared to Python 2. That’s a measurable increase compared to projects that had to have only a single release within the past year.

How about 2 releases in the last six months?

Unknown: 4,747
Python 2: 1,732
Python 3: 3,920

This gets us to 60% adoption for Python 3 compared to Python 2. Once again, an increase compared to the single release case. This continues to suggest that a decent number of active projects are supporting Python 3.

By downloads #

Having looked at release dates and how recently two releases have occurred, you might have noticed that popularity has still not come into the picture. What happens if you take monthly download counts into consideration? Take for instance all projects that are downloaded more than 1,440 times a month (which is equal to the project being downloaded twice an hour over 30 days):

Unknown: 4,115
Python 2: 1,215
Python 3: 3,332

Over 62%. Now download numbers are really unreliable as continuous integration can muck with the numbers, so this should be taken with a grain of salt. And to reiterate, when people tell me they have a dependency blocking them from moving to Python 3, it is usually a long tail issue and not a major project.

Now what if you use all of the measurement angles that I have suggested? What if you only look at projects that have had 2 releases in the past year which were downloaded at least 1,440 times in the past month?

Unknown: 2,241
Python 2: 868
Python 3: 2,465

Over 79% in this instance. Obviously the same issues with the other download-based numbers apply here, especially in the instance of people saying the long tail is their blocking dependency and not major projects. This also aligns with the Python 3 Wall of Superpowers suggesting that very active, popular projects have mostly added Python 3 support.

By some definition of “actively maintained” #

All these numbers are nice, but the definition of an “actively maintained” project is still rather loose. What if we tightened it up a bit more? One could argue that a project that has made two releases over the past year that were at least 90 days apart seems like an active project. This suggests a rough average of a release at least every six months and weeds out any quick releases due to bugs being found immediately after release. Those numbers look like:

Unknown: 3,158
Python 2: 1,188
Python 3: 2,786

That’s 64% and more in the range of measurements based on downloads rather than the strict latest release measurements. This suggests to me that a good number of fairly active projects have been ported (which probably also tend to be popular, either because of their activity or their activity is because of their popularity).
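The "two releases in the past year, at least 90 days apart" test can be sketched roughly as follows. The function name and parameters are hypothetical, and it assumes each project's release dates are already available as `datetime.date` objects.

```python
from datetime import date, timedelta


def is_actively_maintained(release_dates, today, window_days=365, min_gap_days=90):
    """True if two releases inside the window are at least min_gap_days apart."""
    cutoff = today - timedelta(days=window_days)
    recent = [d for d in release_dates if cutoff <= d <= today]
    if len(recent) < 2:
        return False
    # The earliest and latest recent releases are the farthest apart,
    # so checking that single pair is enough.
    return (max(recent) - min(recent)).days >= min_gap_days
```

A burst of bugfix releases a few days apart fails the check, while one release last autumn and another this spring passes it.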

New projects #

And finally, the last way to slice the PyPI data is to look at new projects. Way back when Python 3.0 was released, Guido said that he hoped a majority of new projects would be using Python 3 within 5 years (which, based on how I’m calculating ratios, means hitting 100%). It’s now over 6 years since Python 3.0’s release in December 2008. How has that hope panned out?

Let’s first look at projects created over the past year:

Unknown: 8,242
Python 2: 2,589
Python 3: 5,050

That’s over 46%. It should also be mentioned that since there is no historical data to compare against this is entirely based on the oldest release for a project. If a project deletes old releases – which would be bad as it would break people using older versions for some reason – then that could cause them to look newer than they actually are (you also used to be able to re-upload a file under the same release number, but PyPI no longer supports that). But using this definition of project creation does allow for calculating the rate throughout the past which may be a nice way to graph the adoption rate (it also makes it easier to re-calculate values in case more robust adoption measurements are used, e.g., checking whether any uploaded wheel files support Python 3).
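As a rough sketch of "creation date is the oldest release", assuming the upload timestamps for a project have already been collected as ISO-style strings such as `2015-06-12T08:11:14` (the function name and the input format are assumptions, not taken from the actual script):

```python
from datetime import datetime


def creation_date(upload_times):
    """Approximate a project's creation as its oldest surviving upload."""
    return min(datetime.strptime(t, "%Y-%m-%dT%H:%M:%S") for t in upload_times)
```

Because only surviving uploads are visible, a project that deleted its early releases will look newer than it really is, which is the caveat mentioned above.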

What happens if we limit it to projects created in the past 6 months?

Unknown: 4,260
Python 2: 1,401
Python 3: 2,952

52%. Is this an increasing rate of new projects supporting Python 3? Let’s just consider projects created in the past 30 days (which might be too noisy due to the number of projects under consideration):

Unknown: 761
Python 2: 247
Python 3: 534

Over 52% still. So at the moment the percentage is close to constant, but we are also diving into numbers that are small enough that little fluctuations will throw things off.

Conclusion #

A whole bunch of numbers that vary from 26% to 79% comparing Python 3 to Python 2 (remember: 100% would mean a 50/50 split between the two versions).

Based on all of this, I think there are a couple of statistics we can use to measure Python 3 adoption as a community. For new members of the Python community, the percentage of projects created within the last year seems like a good measure. For long tail acceptance, you can either measure from a set date like the release of Python 3.1.0 or a sliding window like any project that has released in the past e.g., 5 years; I’m not sure which way to measure will be the best nor what values to use to detect long tail adoption (feel free to hit me up on social media if you have any input on this). Major project support is mostly there so I don’t think there is a need to be measuring that separate from the other two measurements. Maybe toss in one or two other metrics – downloads from python.org? – and we should have the metrics suite we want to measure Python 3 adoption as an overall community.

Looking at Python 3 usage in the astronomy community

2015-05-15T06:45:49-07:00

Thomas Robitaille ran a survey on Python usage in the astronomy community and wrote up a great blog post on the results (a big thanks from me to Thomas for taking the time to do this!). The highlights of the survey results are:

17% - 20% of respondents are using Python 3 depending on how you want to count
No one really uses any version other than Python 2.7 and 3.4
Windows users are better about using Python 3 than Linux and OS X users
Seasoned Python programmers are using Python 3 more than new users
The biggest reason people give for not using Python 3 is lack of incentives

Now this data is very much skewed towards astronomers using Python, but I still think it’s worth looking at the results as a microcosm of the general community.

The amount of Python 3 adoption #

I have seen PyPI numbers that provide a lower bound of 5% adoption across the community, while I have seen other numbers like the one Thomas found where in parts of the Python community that have had their popular libraries and frameworks ported the uptake is closer to 20% (I have primarily seen these kinds of adoption numbers for the scientific and web communities). And while I would like to see the overall number be higher than 5% (which might be a little low due to how the PyPI numbers are calculated), I’m quite happy with the 20% penetration in communities who have decided to embrace Python 3.

You should probably drop support for Python 2.6 and 3.3 #

If you want an idea of how much skew there might be in the PyPI numbers due to caching and what version of Python the initial project fetch was done for, look at the fact that the PyPI numbers suggest that just under 30% of the community overall is on Python 2.6. Anecdotal evidence from people I have talked to at PyCon suggests that number is a bit high, and Thomas’ survey shows that at least in astronomy circles it’s very high. This year I have begun to advocate that people drop Python 2.6 support as it was released on October 1, 2008, which is over 6 years ago and thus past the 5 year support period for security fixes from the Python development team. Python 2.7 also has nearly 5 years of bugfixes on Python 2.6 since 2.6.6’s release on August 24, 2010. That means Python 2.7 has significantly fewer bugs, making it not only nicer to use but also easier to port to Python 3.

As for Python 3.3, the percentage of users from an absolute perspective is just too small. Both the PyPI numbers and Thomas’ numbers suggest only about 1% of people use Python 3.3 directly. And with Python 3.5 due out in a few months you will want to only be supporting Python 3.4 and 3.5 anyway.

Windows users are the most up-to-date #

Thomas’ survey showed that Windows users are using Python 3 at a rate of about 2x to 3x the rate of Linux and OS X users, respectively. I bet this is a side-effect of Windows users having to install Python themselves, and thus grabbing the newest release. The implication of this is people do typically use the version of Python that comes installed on their system.

For OS X 10.10 users, they get Python 2.4, 2.6, and 2.7 installed by default. Now I have no clue when Apple will include Python 3 by default with OS X, so who knows how long the lack of Python 3 coming with OS X could go on for.

But for Linux users, this will start changing in 2016. Ubuntu plans to only ship Python 3 in 16.04 which also happens to be an LTS release. Fedora plans to only ship Python 3 in Fedora 23. That means the two most popular Linux distributions are going to make Python 3 what you get automatically and force you to install Python 2 manually if you want it in 2016.

Getting new users into Python 3 #

Thomas’ data suggests that the people using Python 3 are not the people new to Python, but more seasoned users who know what Python 3 gets you. I suspect part of this ties into the system installation of Python not being Python 3 yet, and for some they just use whatever is already installed.

But for others it’s possible they are being taught Python 2 over 3. In that instance I think it’s a mistake. In my experience it’s easier to teach Python 3 since it’s a more consistent language and then teach exceptions to the rules for Python 2 use. Going the other way from Python 2 to learn what rough edges have been softened in Python 3 typically isn’t as smooth for the new user.

Incentives for switching to Python 3 #

Communicating the benefits of Python 3 has always been difficult if you don’t use Unicode. I have always said that once you have used Python 3 and had the sharp corners of Python 2 removed you simply won’t want to use Python 2 unless you absolutely have to. Literally everyone I have ever talked to who gets to write code in pure Python 3 agrees with me; once you have fully switched to Python 3 you simply don’t want to go back to Python 2.

Unfortunately because this is an instance of a lot of little things adding up to an overall improvement it’s hard to point to any one single feature that makes people go, “ooh, I really want that” which causes them to make the switch. It’s hard to say that keyword-only arguments, __pycache__ directories, or enhanced exceptions are good enough on their own to cause people to make the switch compared to Python 2. This is especially true for things from the stdlib which people have backported to Python 2, thus minimizing what is exclusive to Python 3 to only that which is based on syntax.
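For a concrete picture of two of those small things, here is a short sketch of keyword-only arguments and of exception chaining (my reading of "enhanced exceptions"). The function names are invented for illustration.

```python
def connect(host, *, timeout=10.0, retries=3):
    """Everything after the bare * must be passed by keyword."""
    return (host, timeout, retries)


def parse_port(text):
    """Convert text to a port number, keeping the original error attached."""
    try:
        return int(text)
    except ValueError as exc:
        # "raise ... from" records the original error as __cause__.
        raise RuntimeError("bad port: {!r}".format(text)) from exc
```

Calling `connect("example.org", 2.5)` raises a `TypeError`, so the meaning of each option stays visible at every call site. Neither piece of syntax is valid in Python 2.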

But features unique to Python 3 which can’t be backported and can be motivation on their own started in Python 3.4 with asyncio combined with yield from. That was the first instance where there was a very clear benefit to a group of people – those writing asynchronous code – where switching to Python 3 would be truly useful, no questions asked (yes, trollius does exist but yield from makes asyncio nicer to work with).

But Python 3.5 is going to bring two big changes which simply won’t work in Python 2. The first is matrix multiplication thanks to PEP 465. The numpy users will now have a syntax available to them for when they matrix multiply two arrays. This should make a large number of scientists very happy.
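To show what the new operator hooks into without depending on numpy, here is a toy sketch. The `Matrix` class is purely illustrative; what PEP 465 adds is the `@` operator and the `__matmul__` special method it calls.

```python
class Matrix:
    """Tiny list-of-rows matrix, only here to demonstrate the @ operator."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __matmul__(self, other):
        # zip(*other.rows) iterates over the columns of the right-hand matrix.
        columns = list(zip(*other.rows))
        return Matrix(
            [sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in self.rows
        )
```

With this in place, `Matrix([[1, 2], [3, 4]]) @ Matrix([[5, 6], [7, 8]])` multiplies the two matrices on Python 3.5 or newer, while the same line is a syntax error on Python 2.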

The second feature landing in Python 3.5 is async/await syntax thanks to PEP 492. This will essentially do away with the need to use yield from with asyncio or any other event loop for asynchronous programming and instead use syntax similar to that which is planned for/used in C#, Dart, ES7, etc. This should lead to a bunch more asynchronous programming in Python since the syntax makes it very explicit and easy to understand asynchronous programming.
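A minimal sketch of what that syntax looks like with asyncio follows. The coroutine names and delays are made up; `asyncio.sleep` and `asyncio.gather` are the real library calls.

```python
import asyncio


async def fetch(name, delay):
    # await suspends this coroutine without blocking the event loop.
    await asyncio.sleep(delay)
    return name


async def main():
    # Run both coroutines concurrently; results keep argument order.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))
```

On Python 3.7 and later this can be driven with `asyncio.run(main())`; on 3.5 the equivalent is `asyncio.get_event_loop().run_until_complete(main())`.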

I think 2016 is going to be a milestone in Python 3 adoption #

I think the confluence of Linux distributions switching over to Python 3, syntactic features in Python 3.5 that won’t be available in Python 2, all of the tooling now available to port to Python 3 (HOWTO and PyCon presentation), the end-of-life for Python 2.7 being less than 5 years away, and just simply time passing is going to lead to a nice bump in Python 3 adoption during 2016 that will hopefully be sustained going forward.

Porting to Python 3 is like "eating your vegetables"

2015-04-23T12:14:19-07:00

While at PyCon this year, someone pointed out that the hope/goal/expectation when Python 3 was released was to have over 50% of new projects using Python 3 within five years, and that had not come to fruition (while I have not personally crunched the numbers and this was told to me anecdotally, I will just assume they are right). Immediately after pointing this out they said that Python 3 is thus a failure at this point and we need to start figuring out when we call it quits on Python 3. I obviously disagree with this view.

We can’t predict the future #

Just because some prediction didn’t come true does not mean we have lost hope in Python 3. All it means is that the community’s uptake of Python 3 has been slower than we initially thought, and honestly that’s fine. The extension of Python 2.7’s end-of-life to 2020 is an acknowledgment of this slower pace. There are plenty of companies and projects which have switched to Python 3 fully or are at least supporting it and seem happy with Python 3. The polling of the audience by Guido during his keynote along with some uptake numbers shared in confidence with the core developers shows that the level of use is actually too high to consider abandoning Python 3 without screwing over a sizable chunk of the community (Donald Stufft’s post on PyPI download statistics shows this).

So why are we as a community, more than 6 years after Python 3 was released, not transitioning to Python 3 faster? At the language summit, Larry Hastings said porting to Python 3 is like “eating your vegetables”; you know you should, but that doesn’t make it enjoyable. I think that is a very fair analogy as to why most people don’t find porting fun.

Why the hell we separated text and binary data in the language #

I have a sneaking suspicion that some people think we decided to clearly separate text and binary data “just because” or it seemed like the proper thing to do from some theoretical language design standpoint or something; all of those reasons are wrong. The reason we made the separation was to provide language level support/restrictions to prevent people from making subtle bugs when they mixed text and binary data. Talk to a Django core developer and ask them how much code and work it is to properly handle the separation of text and binary data in Python 2 and I’m sure you will be told it’s a lot and it’s a pain. I think Jacob Kaplan-Moss tells me almost annually at PyCon that the day Django stops supporting Python 2 they will be able to rip out a ton of code that exists purely because it was so easy to mix text and binary data and get it wrong. In other words most of us are mediocre programmers and this text/binary separation helps us not to screw up (making sure code works with Unicode for text is also a benefit of this separation).
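A small illustration of the restriction, assuming Python 3 (the function names are hypothetical): mixing the two types fails loudly, so the decode step has to happen explicitly at the boundary.

```python
def greet(name_bytes, encoding="utf-8"):
    """Decode bytes at the boundary, then work purely with text."""
    return "Hello, " + name_bytes.decode(encoding)


def mixing_is_rejected():
    """Return True if Python refuses to concatenate text and binary data."""
    try:
        "Hello, " + b"world"
    except TypeError:
        return True
    return False
```

On Python 3 `mixing_is_rejected()` returns True. On Python 2 the same concatenation silently succeeds for ASCII data and only blows up later when non-ASCII bytes show up, which is exactly the kind of subtle bug the separation is meant to prevent.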

This means dealing with the text/binary separation in Python 3 is part of eating your vegetables. It was done to help people in the long term. People also seem to forget that the core developers have to eat the exact same vegetables as everyone else. The core developers are not some magical group of people who get to code in Python 3 all day long; plenty of us core developers still work with Python 2 as part of our job. That means decisions like the text/binary separation are not taken lightly as it impacts our lives just like everyone else’s.

How to make eating your vegetables easier #

So you should eat your vegetables, but that doesn’t mean we can’t try to make it easier. I gave a talk at PyCon 2015 on making Python 2 code be 2/3 compatible. You should be able to get all the same information from the Python 2/3 porting HOWTO. The tooling and such should actually be to the point that if you run your code against Python 2.7 and Python 3.5 (when it’s released), any porting issue shouldn’t be silent but instead raise a warning, exception, message, etc. In other words the only bit that should require a good amount of thinking is the text/binary separation for your APIs and making that separation happen.

You can also view the various steps to support Python 3 as part of your code’s health if that helps motivate you (or your manager). Since the changes in Python 3 are to make your code better, you could argue that modernizing your code and cleaning it up in Python 2 has a side-effect of supporting Python 3. And much like code maintenance, you can make porting a piecemeal process where you don’t have to do all of it at once but in pieces to gain different things (e.g., moving to iterator versions of map and other built-ins can lead to better performance and more use of iterators which just happens to align with what Python 3’s built-ins do).
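As a sketch of the iterator point, here is what a lazy `map` buys you. The function name is invented; on Python 2 the same behaviour is available through `from future_builtins import map` or `itertools.imap`.

```python
from itertools import count, islice


def first_squares(n):
    """Return the first n squares without building an intermediate list."""
    # In Python 3, map() is lazy, so it can safely consume an infinite iterator.
    squares = map(lambda x: x * x, count(1))
    return list(islice(squares, n))
```

With Python 2's built-in `map` this function would never return, because `map` would try to build the whole (infinite) list first.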

But please don’t assume porting is free. It can be fairly straightforward for some projects, but harder for others. Either way there is some work involved. There have been reports of people in the community being somewhat mean to project maintainers for not porting which isn’t right. You can always ask someone to port their code, but you should always be cordial about it and be understanding if they choose not to.

Getting treats for eating your vegetables #

Like many of you, when I was younger and I didn’t want to eat my vegetables my parents would tell me that if I did eat my vegetables I would get dessert. The Python development team is constantly trying to toss in more desserts into Python 3.

For some it may be asyncio that they get to look forward to when they port (and I’m willing to bet the first person to create a library which really harnesses asyncio is going to have a popular project on their hands). Others may be looking forward to type hints inlined in their code (for Python 2 you can put the type hints in stub files). Assuming things go the way I think they will, the async/await syntax for co-routines will be landing in Python 3.5 and that seems to be exciting people. For others it might be performance improvements that we have landed over the years which help with things like memory usage from strings, etc. But the point is that we are always trying to make your effort in porting to Python 3 worth that much more.
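For anyone who has not seen inline type hints yet, a tiny sketch (the function is made up; `typing.List` is part of the standard library as of Python 3.5):

```python
from typing import List


def mean(values: List[float]) -> float:
    """Annotations are hints for tools; Python does not enforce them at runtime."""
    return sum(values) / len(values)
```

The annotations are stored in `mean.__annotations__` for type checkers and editors to use, and the function otherwise behaves like any other.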

We are one community #

Regardless of what version of Python you run, we are still a single community. While I think Python 3.4 is currently the best version of Python available, you’re welcome to disagree with me or agree but not think it’s worth your time and effort to use it; all I ask is you be informed in your opinion. Honestly, if some people never switch to Python 3 and make Python 2 the COBOL of dynamic programming languages I’m totally fine with it (just as long as you don’t make me maintain it past 2020 ☺). The one thing I really want to make sure never goes away is how awesome the Python community is.

Going all-in on the mobile web

2015-03-06T10:44:03-08:00

In this world of Android vs. iOS – with a smattering of Windows Mobile and Blackberry – I find native apps somewhat annoying. In the beginning, iOS was actually not going to have any third-party apps on the phone and everything would run through Safari on your iPhone. Developer backlash due to worries about performance, though, helped lead to Apple changing its position on native apps on iOS.

I think this is a shame. I appreciate that web apps embody SLICE (Secure, Linkable, Indexable, Composable, Ephemeral, and I have heard it suggested that Updatable get tossed in). I also appreciate that the web embodies a common denominator platform that is cross-platform (I’m not naïve enough to think the entire web platform is cross-platform thanks to differences in what APIs are implemented at any given time by the various browsers). Why do developers need to choose whether to launch on iOS or Android first or exclusively on one platform? Why can’t developers simply launch simultaneously and instantaneously on all platforms through the web?

The answer is that many can, but they choose not to for various reasons (some legitimate, some not).

What leads to requiring a native app? #

What has traditionally set native apps apart from web apps on mobile phones? I would argue it’s the following:

Performance (varies between browser releases and phone generations, but low-level stuff like controlling socket connections outside of WebSockets or SIMD for GPU calculations isn’t possible)
Offline access (being solved thanks to Service Workers as AppCache is not pleasant to work with; actual storage space can also be an issue)
Periodic background processing (Service Workers could do it if the task scheduler API gets accepted)
Notifications (w3c spec for text-based notifications, plus there is a new Push API for Service Workers to work in the background to push notifications to the user)
Sensors (stuff like geolocation is available, some other things are not)
OS-specific features (e.g., intents on Android)

As you can see, a good number of features have either just landed in browsers – Service Workers arrived in Chrome 40 – or are coming very soon – the Push API for Service Workers is coming in Chrome 42. Unfortunately not everything is actively scheduled to land in a browser – scheduling a Service Worker to run isn’t planned for any browser yet – and OS-specific features like intents are typically off the table since they are not OS-agnostic and thus won’t work in a browser no matter what OS it’s running on. In terms of raw performance, it’s a constantly fluctuating thing that’s always being discussed, e.g. Mozilla, Google, and Intel looking into SIMD in JavaScript. In other words claiming the browser is “slow” doesn’t hold a priori.

Making myself a guinea pig #

How many apps do people use that lack a mobile web version but which actually could get away with having one (either full-featured or with some degraded UX)? Taking stock of what I have on the homescreen of my Android phone, I have the following list of applications grouped by category which I use almost daily and looked at whether they could have a web app experience of some form that was still useful. Anything in italic means that a web experience of some sort is possible today, and if something is bold then someone actually implemented a mobile-friendly browser experience.

Alarm (Timely): not possible as a web app due to lack of background scheduling and the ability to force a web page to appear to play some alarm constantly
Podcast player (Pocket Casts): they have a web app, but it isn’t mobile-optimized
Navigation (Waze): would require keeping the app upfront, but there is no reason a navigation app couldn’t exist through a browser
Email (Inbox): Inbox doesn’t have a mobile web app, but Gmail does
Local search (Maps): Maps has a full-featured mobile experience
Messaging (Hangouts, WhatsApp, Cord, and Facebook Messenger): a degraded experience until the Push API is available is possible when the web app is running upfront; Facebook has Messenger as part of their mobile web app while Hangouts does not and the way WhatsApp is designed simply won’t allow a web experience that doesn’t require the native app without better storage guarantees in the browser
Phone calls (Hangouts Dialer): WebRTC shows that making phone calls doesn’t require anything special from the browser on your phone
Check-in (Swarm): A degraded experience that doesn’t tell you about when your friends are nearby is totally doable in the browser
Learning a language (Duolingo): already works on the desktop, so no reason not to work on mobile
Lists (Keep): Keep has a mobile web app
Membership cards (Stocard): Geofencing would let you get deals pushed to you, but scanning barcodes in a browser and showing them later is totally doable
Streaming (Netflix and Play Movies): Netflix is obviously on the desktop, so as long as mobile browsers have the DRM support necessary it should work in mobile browsers
Movie/TV info (IMDb and Series Guide): IMDb works fine on mobile – albeit with some missing features, like custom lists – and Series Guide can store its data in Trakt which has a mobile web site
Music (Play Music and Sonos): Sonos requires special socket access so that’s not doable, but Play Music could totally be done in the browser
News (News & Weather): Google News has a mobile website
Social media (Google+ and Facebook): Google+ and Facebook both have good mobile websites
Feeds (gReader Pro backed by Feedly): Feedly lacks a mobile site, but The Old Reader does not and gReader can read from either as a backend service.
Read later (Pocket): Pocket has a desktop site but no mobile site; Instapaper does have one, though
Calendar (Calendar): might not be able to raise event reminders, but Google Calendar is available on mobile
Passwords (Oplop and Dashlane): Oplop has been mobile-friendly for years while Dashlane is not (nor is LastPass)

Out of 20 categories, 19 could have a useful web experience but only 11 do (and 3 of them would require changing who provided me the service to get a web experience). I would be really curious to see a study done that evaluated if doing a mobile web app for Android and iOS – or even just one of the platforms – led to more or less work than doing a native app. But my point remains that just because someone provides a native app doesn’t mean a web app wouldn’t also work for the same use-case.

Alternative voting systems for Canada

2015-01-26T14:56:56-08:00

With 2015 bringing a federal election where current polling suggests that either the Liberals or Conservatives could have a minority government based on how people vote, I’m rather interested in how election systems could influence the outcome. Last month there was an article about two alternative voting systems and how they would influence the elections. While reading into these systems I also found out that voting system reform is not a new thing in Canada and that both the NDP and Liberals have actually discussed changing how federal elections work after the 2015 election. With the importance of this election being so high – the Conservatives have been the ruling party since 2006 with a majority since 2011 and so it is expected they will at minimum lose their majority stance – I decided to try and learn what alternative voting systems are being proposed and how they would influence the political landscape in Canada.

Quick primer on Canada’s federal government #

Canada’s government is based on the Westminster system where there is a Parliament made up of representatives elected by the public. Each member of Parliament – known as an MP – is elected by a riding; the largest riding has just over 180,000 people, but the size hovers around 100,000 nationally (which I have been told is one of the best ratios of people to representatives in the world). The political party with the largest number of representatives appoints the prime minister – known as the PM – who is always the party leader. Federal elections are called whenever Parliament is dissolved, which typically occurs when a majority of MPs vote for elections to occur.

One key thing to realize about this kind of system is that whenever a party has a majority government – which cannot last longer than 5 years by law and what Canada currently has – they can basically pass any law they want as long as the Senate signs off on it (which they almost always do). The only check on Parliament then comes from the Queen of England and the Governor General who is the Queen’s representative in Canada. Either of these two people can dissolve Parliament and are the only people who can do so (when Parliament votes to dissolve Parliament it’s just a suggestion to the Governor General).

We also have five national parties: Conservatives, Liberals, NDP, Greens, and Bloc Quebecois. The first three are the big ones, while the Greens are fairly new and only have 2 MPs, and the Bloc is exclusively from Quebec and also only has 2 MPs. In other words, fairness in voting is a big deal here since voting is not a binary choice.

First-past-the-post voting (what the Conservatives want and what the current system is) #

The voting system currently used in Canada is called first-past-the-post (abbreviated FPTP). In this system you get a single vote and you cast it for the person who you want to win. The person with the largest number of votes wins the election. Very simple.

Unfortunately it also leads to disproportionate representation. Let’s pretend we have a total of 3 MPs named Brett, Andrea and Gidget. Each MP represents a riding made up of 10 people. Brett is a Liberal and gets elected by 6 votes with the other 4 going to the Greens. Andrea is also elected by a 6-4 vote along Liberal/Greens party lines. Gidget, though, gets all 10 votes in her riding and she is a Green representative. So while Parliament would be split 2-1 Liberal/Greens, the 30 voters actually voted 12-18 Liberal/Greens. In other words while people as a region voted more for Greens than Liberals, because of the way first-past-the-post works the Liberals are actually the party in power.
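To make the mismatch concrete, here is a minimal sketch that tallies the example above. The riding labels and the `fptp_winner` helper are just names I made up for illustration; the vote counts are the ones from the example.

```python
from collections import Counter

# Votes per riding from the example above, keyed by the MP who ran there.
ridings = {
    "Brett": {"Liberal": 6, "Green": 4},
    "Andrea": {"Liberal": 6, "Green": 4},
    "Gidget": {"Liberal": 0, "Green": 10},
}


def fptp_winner(votes):
    """Return the party with the most votes in a single riding."""
    return max(votes, key=votes.get)


# Seats: one per riding, awarded to whoever got the most votes there.
seats = Counter(fptp_winner(votes) for votes in ridings.values())

# Popular vote: every ballot in the region added together.
popular_vote = Counter()
for votes in ridings.values():
    popular_vote.update(votes)

print("Seats:", dict(seats))
print("Popular vote:", dict(popular_vote))
```

The seat count comes out 2-1 for the Liberals even though the popular vote is 12-18 in favour of the Greens.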

This example shows why first-past-the-post causes sacrificial voting. In our example, it obviously is not worth voting for the Greens if you know they won’t win your riding, regardless of whether you think they are the best party to run Canada. And so people end up voting for the best party they think can win instead of the best party period. While a two-party system like they have in the United States doesn’t really need to worry about this so much, in Canada where there are five officially recognized political parties at the federal level, sacrificial voting is definitely something one has to take into consideration, e.g. I might prefer the Greens but if I would rather the Liberals win out in a close race against the Conservatives I would be compelled to vote for the Liberals to help ensure the least bad result which actually has a chance of occurring.

Alternative voting (what the Liberals want) #

Also known as instant-runoff voting, alternative voting – abbreviated AV – is a voting system where you rank the candidates in order of preference. Once all votes are cast each ballot counts as a single vote for the person’s top choice. If no one got a majority of votes, then the candidate with the fewest votes is eliminated and then the ballots are counted again. In this next round, though, any ballot that had the now-eliminated candidate as its primary choice shifts to having its second pick count as its vote. This process of elimination and shifting who gets the vote from a ballot continues until someone gets a majority of votes. Another way of looking at it is to imagine everyone casts their votes for their top choice; if there is no clear winner then the least popular person is eliminated, and then there is an instant runoff where everyone votes again with one less person to consider (this is where the name of the system comes from).
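The counting procedure can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `instant_runoff` is a made-up function name, the ballots are invented, and ties for last place are broken alphabetically, which real election rules would handle differently.

```python
from collections import Counter


def instant_runoff(ballots):
    """Return the winner of an instant-runoff count.

    Each ballot is a list of candidates in order of preference.
    """
    remaining = {candidate for ballot in ballots for candidate in ballot}
    while remaining:
        # Each ballot counts for its highest-ranked candidate still in the race.
        tally = Counter()
        for ballot in ballots:
            for candidate in ballot:
                if candidate in remaining:
                    tally[candidate] += 1
                    break
        leader, leader_votes = tally.most_common(1)[0]
        if 2 * leader_votes > sum(tally.values()):
            return leader  # strict majority of the ballots still counting
        # Nobody has a majority: eliminate the candidate with the fewest votes
        # (alphabetical tie-break, purely an assumption of this sketch).
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.discard(loser)


# Invented ballots for a single riding of 9 voters.
ballots = (
    [["Conservative", "Liberal", "Green"]] * 4
    + [["Liberal", "Green", "Conservative"]] * 3
    + [["Green", "Liberal", "Conservative"]] * 2
)
winner = instant_runoff(ballots)
print(winner)
```

With these ballots the Conservatives lead the first round 4-3-2 but lack a majority, so the Greens are eliminated and their two ballots shift to the Liberals, who then win 5-4. Under first-past-the-post the same ballots would have elected the Conservative.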

In Canada, AV is used to elect the leaders of the Liberal and Conservative parties. It’s also used in Australia to elect members of their House of Representatives.

AV does a better job of proportional representation than FPTP which is obviously good. It also makes key ridings less important as you may not win a riding simply by having the most #1 votes if it’s less than a majority while someone else has a majority of #2 votes.

AV is not flawless though, as it does lead to collusion between parties in telling people to vote specific parties lower than another one. The idea is that I will tell my constituents to put you second if you tell your constituents to put me second. So there is room to influence votes with AV.

Mixed-member proportional representation (what the NDP and Greens want) #

Abbreviated MMP, mixed-member proportional representation has a somewhat interesting history in Canada. Back in 2004, the Law Commission of Canada – de-funded since 2011 – released a report suggesting MMP would be a good voting system for Canada (PDF; relevant discussion of MMP is in Section 4.4 starting on page 83 of the printed text, page 105 of the PDF). So what is MMP and why was it suggested for Canada (and is already used in New Zealand and Germany among others for their federal elections)?

In MMP you vote for two separate things. First you vote for the party you want to lead the federal government. Second, you vote for the person you want to represent you. In both FPTP and AV this is one and the same vote which means if you don’t like the person running in your riding but they represent the party you want to run the country then you have to decide which is more important to you. But in MMP who you think can represent your riding the best is not conflated with what political party should be setting the agenda in Ottawa. Think of it as you’re voting for the party you want running the federal government and you’re separately suggesting the individual you want to personally represent you regardless of what party they represent.

The other interesting aspect of MMP is how these two seemingly separate votes consolidate with each other so that you end up with a representative government that is proportional to what your region wants. For each riding, the representative MP that you directly voted for wins based on first-past-the-post. These elected riding MPs make up 2/3 of all MPs representing a region (which can be as big as a province). The remaining 1/3 of MPs are filled based on how many MPs a party would need to top up their MP count to more accurately represent the party vote in the region.

These “filler” MPs are called list MPs because they are essentially pulled from a list provided by the party (the Law Commission of Canada suggested letting people either vote for specific list MPs that they preferred or simply vote for a party and let that party choose which list MP would get that vote, that way people can choose how involved they want to be in choosing their list MPs; pulling from the list exclusively is called closed list MPs, always voting for the list MP is called open list, and allowing either is called mixed list).

Consider a region with 10 MPs: 6 riding MPs and 4 list MPs. Let’s also say that the Liberals got 50% of the party vote along with 3 riding MPs elected. That would mean the Liberals get 2 list MPs since the region said they should make up 50% of the government but fell short in the riding MP votes by 2 representatives. Notice how voting for a riding MP does not mean you can vote for your party twice; if you voted in a Liberal MP and voted for the Liberal party you don’t get to make 2 MPs suddenly appear for the Liberals, just that you got to help select one of the Liberal MPs directly because you liked them personally. A good example of how this whole thing works can be found in the Law Commission of Canada’s findings on printed page 94.
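The top-up arithmetic from this example can be written down as a small sketch. The `list_seats` name is hypothetical, and simply rounding the seat entitlement is a stand-in of mine; an actual MMP system would specify its own seat-allocation formula.

```python
def list_seats(total_seats, party_vote_share, riding_seats_won):
    """Return how many list MPs a party gets in one region.

    Simplification: the party's entitlement is its share of the party vote
    times the region's seats, rounded to the nearest whole seat.
    """
    entitled = round(total_seats * party_vote_share)
    # A party that already won more ridings than its entitlement gets no
    # top-up (and keeps its riding MPs).
    return max(0, entitled - riding_seats_won)


# The example above: 10 MPs in the region, Liberals have 50% of the party
# vote and 3 riding MPs.
liberal_list_mps = list_seats(10, 0.5, 3)
print(liberal_list_mps)
```

For the numbers in the example this gives the Liberals 2 list MPs, bringing them to 5 of the region's 10 seats.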

A huge effect of MMP is that minority parties can get much larger representation in government. Since your party vote directly correlates to proportional representation for your region it means you can vote for e.g., the Greens knowing that your vote actually counts towards them representing you in Parliament. This more favourable representation is why the NDP actually brought a vote on MMP this past December and have a petition to try and change the voting system in Canada after the 2015 election.

MMP is not perfect, though. Your vote can still not count the same as someone in another region based on how the number of MPs a region represents is divided. In some places that use MMP they solve this by having the list MPs be filled in nationally so that everyone’s vote is equal (this wouldn’t work in Canada for historical reasons, but basically certain provinces and all territories have minimum representation guarantees in Parliament and so having overflow representation would break those guarantees). But compared to both FPTP and AV, MMP usually has the better representation of voter intent.

The other issue is the size of the ballot. With a mixed list vote you would have a ballot listing all of the riding MPs you have to choose from and then you would have the mixed list vote where you either vote for a party generally or vote for a list MP specifically which could be as long as 15 people. I have looked at the example ballot and it isn’t complex, but it definitely isn’t small if you have up to 5 parties listing 15 people each for their list MP votes (this is more of a technicality and is a solvable problem, but it is something to consider).

My personal opinion #

After reading about all of these voting systems, I have to admit that MMP appeals to me the most. It comes the closest to making sure my vote for a party actually leads to representation by that party in Parliament. I also appreciate that my personal representation in Parliament by an MP is a separate vote and does not impact which party I want to represent me.

It also has the side-effect that majority governments would become a rarity. With small parties able to win more seats it leads to more compromise and teamwork between parties in order to get legislation passed. MMP just seems the most fair, and as a Canadian I like being fair.

Why WhatsApp made its web app use your phone as a server

2015-01-23T09:27:18-08:00

Ever since WhatsApp announced their web app, I have seen various people complain about having to keep your phone on to send messages. But in all of these posts people seem to be overlooking two key points about the design of WhatsApp which either necessitate this design or at least facilitate WhatsApp in keeping their service lean and fast.

Encryption #

In case you didn’t hear, WhatsApp (supposedly) does end-to-end encryption. That means WhatsApp can’t read what people say in their messages. What this also means, though, is that if someone sends you a message and you have your phone and web app, there is no way to read that message independently on both devices without sharing the decryption key, which is not good security practice. You could have separate keys for your phone and web browser, but then you would need to make sure to share keys between your phone, your browser, and all of your contacts so they could receive messages from any of your devices.

But if you treat your phone as your personal WhatsApp server, then your phone can continue to have the master keys for your account and then you only need to manage keys between your phone and your other devices, making it a hub-and-spoke system where your phone is the hub and your other devices are the spokes. This keeps WhatsApp out of the key management business and allows you to easily revoke keys for your other devices without WhatsApp having to store anything for you. So tying everything through your phone keeps everything encrypted, simplifies key sharing between users to just their phones, and keeps security simple, which is how you want it to be.

Storage #

Someone pointed out to me that WhatsApp is not in the storage business, and when you stop and look at how the service was structured before the web app came along you will realize that WhatsApp tried to minimize what data it had to store from its inception. For instance, if you get a new phone you actually have to migrate your messages over yourself. Since WhatsApp doesn’t keep messages on their servers, your phone ends up being the keeper of truth when it comes to what messages you sent and received.

But the key thing about WhatsApp not storing messages on their servers is how much it simplifies their service. Consider their last publicly stated user count of 500,000,000. Since WhatsApp doesn’t store messages for you, they really only need to store messages that have yet to be delivered to a user’s phone and your account’s configuration data. So let’s assume every user suddenly sent a bunch of photos that came to a total of 1 MB of pending messages (remember that WhatsApp is only going to show you a small version of a photo and so they can compress them such that they don’t take up much space; 1 MB should go a long way). That’s 1 MB * 500,000,000 = 500 TB of storage. OK, not a puny number for most services.

But let’s look at this from a cloud perspective. Let’s say you wanted an extremely fast service, so you would want to use local SSD which Google Compute Engine offers. As I write this, a local SSD on GCE is 375 GB and you can have up to 4 per instance. At 375 GB that means you would need 1,334 SSDs to store 500 TB of data (it would also require at minimum 334 instances and probably enough for the service to run, but I’m not pricing out computation or bandwidth costs to keep this simple). Now according to GCE’s pricing, a local SSD costs USD 81.75/month. That means it would cost you USD 81.75 * 1,334 SSDs = USD 109,054.50/month or USD 1,308,654/year. Now let’s be totally extravagant and say you want the data replicated near the sender, near the receiver, and on one other continent for safekeeping until delivery (when the sender and receiver are on the same continent the data could go to two separate clusters). So we are talking about 3N data replication. That works out to USD 3,925,962/year in local SSD storage costs if you used Google Compute Engine at full retail and always maintained this level of storage constantly. In the current startup climate, USD 4 million/year is a pittance (and unnecessary as I bet WhatsApp could get away with 2N data replication, if that, since this is only for buffering purposes). And for a company like Facebook who have their own clusters? Storing 1.5 PB of data would not be difficult at all, so WhatsApp could go as far as backing up the data on every populated continent if they wanted to for 3 PB or less than USD 8 million/year.
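If you want to check or tweak the back-of-the-envelope numbers, the arithmetic can be reproduced with a short script. The constant names are mine, the 1 MB of pending messages per user is the assumption stated earlier, and decimal units (1 TB = 1,000 GB) are used throughout.

```python
import math

USERS = 500 * 1000 * 1000   # last publicly stated user count
PENDING_MB_PER_USER = 1     # assumed buffer of undelivered messages
SSD_GB = 375                # size of one GCE local SSD when this was written
SSDS_PER_INSTANCE = 4
SSD_USD_PER_MONTH = 81.75

total_gb = USERS * PENDING_MB_PER_USER / 1000    # 1 GB = 1000 MB
ssds = math.ceil(total_gb / SSD_GB)              # round up to whole disks
instances = math.ceil(ssds / SSDS_PER_INSTANCE)  # round up to whole instances
monthly_usd = ssds * SSD_USD_PER_MONTH
yearly_usd = monthly_usd * 12

print("Storage:", total_gb / 1000, "TB")
print("SSDs:", ssds, "on at least", instances, "instances")
print("Cost: USD", monthly_usd, "per month, USD", yearly_usd, "per year")
print("3N replication: USD", yearly_usd * 3, "per year")
print("6 continents: USD", yearly_usd * 6, "per year")
```

Changing `PENDING_MB_PER_USER` or the replication factor is an easy way to see how sensitive the total is to the assumptions.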

In other words by relying on your phone as the storage mechanism for messages and not having to keep anything on their own servers, WhatsApp can run very cheaply and efficiently (and only cost you USD 1/year as a user). This makes WhatsApp basically nothing more than a fast routing service with some buffering between phones, much like the telcos (which might be what inspired WhatsApp to use Erlang). It’s rather ingenious and probably why the service seems so fast and has such great uptime.

And I bet users don’t care about the potential loss of messages either; when was the last time you scrolled back into the history on your phone to a point that preceded you getting that phone or having catastrophic data loss on the device? Plus you have to consider how many users of WhatsApp are really going to use the web app; I bet a large portion of their users don’t even have their own computers beyond their phones so this web app service probably isn’t critical to WhatsApp’s success.

How to get your parents off of Skype using WebRTC

2015-01-07T11:50:33-08:00

My mother-in-law has a cousin she likes to talk to. Because I like to simplify my tech support requirements by standardizing my entire family on a single platform when possible, all of my immediate family uses Google Hangouts (promoting saving money through free phone calls to the US and Canada also helps). Unfortunately my mother-in-law’s cousin is on Yahoo and Skype and is not that tech-savvy. That means I want a video chat solution that is dead-simple and doesn’t require a common account (if she had a Google account then my mother-in-law could just email a link to a hangout and just reuse that hangout every time).

Luckily it turns out that you can overcome these obstacles thanks to WebRTC. If you don’t happen to know what WebRTC is, think of it as providing real-time voice and video in the browser without plug-ins. On a more technical level, think of it as browser-supported session and signaling support for voice and video communication. It’s currently supported in Chrome, Firefox, and Opera (including Chrome and Firefox for Android); Safari and IE don’t support WebRTC natively at the moment, but do have plugins (IE has announced plans to add support in the future). All of this means that you can now do video chats through the browser without installing a plug-in.

The common format is a website that will launch a video chat room at some URL (sometimes of your own choosing by picking a chat room name). You can then send that URL to other people that they visit in a supported browser. At that point all they have to do is allow the web page to access their microphone and camera (probably saying “look for something asking if the web page can use the camera and microphone at the top or bottom of the page” should be enough instruction for most people). The sites that I found which worked in some form from Android on Chrome are appear.in, Talky, and vLine, although none of them are mobile or tablet-optimized (Talky is my personal favourite). Mozilla is also adding support through Hello which requires Firefox to initiate the video chat but which can be joined through a WebRTC-supported browser.

And for you developers out there, a lot of open source code exists to facilitate using WebRTC, so if you want to add audio/video support to your own web page it shouldn’t be too difficult.

So for those instances where you and someone you want to video chat with don’t have a common messaging platform, WebRTC works out well as a common denominator.

Commentary on getting your code to run on Python 2/3

2014-12-05T13:00:08-08:00

Today I committed a heavily updated version of the Python 2/3 porting HOWTO. Basically the doc has shifted from suggesting you use 2to3 to gain Python 3 compatibility for your Python 2 code to suggesting you aim for a Python 2/3 source-compatible code base using various tools. Read the HOWTO for the details.

This blog post is about how the shift in approach for Python 3 support came about, general timelines on what will (not) be coming in the future to help with porting, and to emphasize that you should start porting your code today no matter what your dependencies say.

The story behind the shift in direction for the HOWTO #

When Python 3 was being developed, python-dev thought a transpiler from Python 2 code to Python 3 code would be the best solution. So we created 2to3 to transpile Python 2 code to Python 3 and structured it so that distutils would do the transpiling at install-time. That way people could continue to write in idiomatic Python 2 code that could then also run under Python 3 with an extra step.

But opinions had changed by the language summit at PyCon 2014. By then several years had passed since Python 3 came out in December 2008. It had become clear that the transpile step was not the way the community had decided they preferred to make Python 2 code support Python 3. Instead, the community had noticed that the differences to get Python 2 code to run under Python 3 were not really that drastic, and so they had begun writing source-compatible Python 2/3 code, completely cutting out 2to3 and its transpiling step. And so at the summit we talked about what Python the language could do in Python 3.5 to help with this source-compatibility approach and realized that more tooling was necessary to facilitate supporting Python 3.

Being a staunch supporter of Python 3, I decided to take it upon myself to get the tooling to where it needed to be so that source-compatible Python 2/3 was as automated as possible. I had already created caniusepython3 to help people track their dependencies and their Python 3 status, so that part of the puzzle was solved. To actually help transition code I looked around and there was Armin Ronacher’s Modernize and python-future’s Futurize. Since Futurize extended Modernize I figured I would help on the lower-level tool.

Right when I decided to start helping out, Armin let the project be spun out by Thomas Kluyver and Daira Hopwood. I looked at what Modernize did and what it could potentially do, and then set out to make the discrepancy between those two lists disappear. In the end I was actually made a project owner by Thomas and Daira and managed to get all of my desired changes in.

With that out of the way, I realized that just because someone updated their code to be Python 2/3 source-compatible it didn’t mean their dependencies had caught up. How were projects to stay Python 3 compatible if they weren’t able to run under Python 3 yet due to their dependencies? That’s when I realized I could update Pylint to have checkers for things that would not work under Python 3 (either syntactically or semantically). This would allow projects to basically make Python 2/3 source-compatibility part of their style guide and have Pylint help enforce it. In the end a bunch of checks got added by me and Pylint added a --py3k flag to run Pylint with just the Python 3 checkers so people weren’t forced to buy into all of Pylint’s other checkers.

I would like to point out that when I started this endeavour I didn’t announce it ahead of time, nor was I in some special position with either Modernize or Pylint. I was just another developer out there contributing to open source. And yet I was able to affect both projects and (hopefully) improve them for the better by contributing code. Open source is quite an amazing process when it works.

What is (not) coming in the near future to help with porting? #

So with the tooling all there now for porting to Python 2/3 code, what exactly is coming in the future that might influence when you start porting? Basically the only thing is modulo/% operator support for the bytes type in Python 3.5 thanks to PEP 461. That should help with the usual text/binary data separation that can trip people up when they do a lot of binary manipulation.
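As a small illustration of what PEP 461 allows, the snippet below builds binary data with the % operator. It assumes Python 3.5 or newer (on Python 3.4 and earlier the same lines raise a TypeError), and the header and record values are invented for the example.

```python
# %-formatting directly on bytes, as added back by PEP 461.
header = b"Content-Length: %d\r\n" % 42

# %s interpolates other bytes objects without a round-trip through str.
record = b"%s=%s" % (b"key", b"value")

print(header)
print(record)
```

Before this, the usual workarounds were joining bytes objects by hand or formatting a str and encoding it afterwards.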

But otherwise from a language perspective I wouldn’t expect any more backporting of Python 2 features to close the gap between Python 2 & 3. Basically, once Python 3.5 comes out, the language support you have for running on Python 2 and 3 simultaneously is what you should expect to have until Python 2.7 support ends in 2020.

Start porting TODAY #

You have Modernize and Futurize to automate a large portion of the transition. You have Pylint to help make sure you don’t regress in your Python 3 support. You have caniusepython3 to let you know when your dependencies are no longer the hold-up. The tooling is all there, and so there is no reason to wait to port. Even if PEP 461 would help you, you can port now and simply wait to run on Python 3.5.

And you shouldn’t wait for your dependencies either because transitioning now can be helpful. You can start using new practices that Python 3 introduced and get used to them sooner rather than later. And Python 2/3 code is actually very readable; the only major differences from straight Python 2 code are that you have to import some functions that used to be built-ins and you use functions to access iterators over dictionaries instead of methods.
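Here is a rough sketch of what those two differences look like in practice. The `iteritems` function is a hypothetical helper written out for illustration; in a real project you would more likely get it from a compatibility library rather than define it yourself.

```python
# reduce() was a built-in in Python 2; importing it works on both versions.
from functools import reduce


def iteritems(mapping):
    """Hypothetical helper: iterate over (key, value) pairs lazily."""
    try:
        return mapping.iteritems()    # Python 2 dictionaries
    except AttributeError:
        return iter(mapping.items())  # Python 3 dictionaries


counts = {"spam": 2, "eggs": 3}

# A function call replaces the Python 2-only counts.iteritems() method call.
total = reduce(lambda acc, pair: acc + pair[1], iteritems(counts), 0)
print(total)
```

The same file runs unchanged under Python 2.7 and Python 3, which is the whole point of the source-compatible approach.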

Please don’t postpone porting your code. You might as well enjoy the newer idioms you get when you start using all the __future__ statements in Python 3 along with other idiomatic changes. It also means that once your dependencies get ported you will be able to hit the ground running on Python 3 instead of making yourself the hold-up.

The long-term view of Python 2.7

2014-11-28T11:02:49-08:00

When Python 2.7.9rc1 was released, I shared the news through the +Python Google+ account. Comments on the post ranged from “thanks for keeping Python 2.7 alive!” to “why haven’t you just killed off Python 2.7?” To help frame these discussions, realize that Python 2.7.9 has two big themes:

Getting HTTPS/SSL support and security more in line with Python 3
Adding ensurepip so that pip will be installed alongside Python 2.7.9

If you don’t know the reasoning behind these changes it might seem like we are suddenly adding features to Python 2.7 to help extend its life and keep it relevant. The truth, though, is more nuanced.

We as the Python development team have decided to support Python 2.7 until 2020 (that was announced at PyCon 2014 so hopefully this isn’t news to you). As with other releases of Python, “support” means bugfixes and not new features. So why the heck have we backported the ssl module from Python 3.4 back to Python 2.7 and made these changes to the default security setup?

It boils down to security and backwards-compatibility. The security part is hopefully pretty obvious: you want HTTPS and SSL to be up-to-date and secure whenever possible. Since Python 2.7 didn’t secure HTTPS by default, this was an issue. And with Python – both overall and specifically Python 2 – being so popular, having insecure software running a large chunk of the internet isn’t good. And so the decision was made through PEPs 466 and 476 to make Python 2.7 secure by default when it comes to networking.

The backwards-compatibility part comes into play since this is a change in defaults in the name of security. Since Python 2.7 has been essentially insecure for so long we wanted to give users who want that specific setup a way to opt-out of the new defaults and get back to the way things were.
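To give a feel for the difference between the new default and the opt-out, here is a minimal sketch using the ssl module. It relies on `ssl._create_unverified_context()`, the opt-out PEP 476 describes (note the leading underscore), and it only inspects the two contexts rather than making a network request; passing the unverified context to a call such as `urlopen(..., context=...)` is how I understand it is meant to be used.

```python
import ssl

# The secure default: certificates and hostnames are checked.
verified = ssl.create_default_context()

# The opt-out from PEP 476: a context with verification turned off,
# for people who need the old behaviour back.
unverified = ssl._create_unverified_context()

print(verified.verify_mode == ssl.CERT_REQUIRED, verified.check_hostname)
print(unverified.verify_mode == ssl.CERT_NONE, unverified.check_hostname)
```

Opting out this way should be a deliberate, per-call decision, since it gives up exactly the protection the new defaults add.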

As for the addition of ensurepip, since that is a side benefit of the installer and not the language or standard library it was not viewed as backwards-incompatible. Plus we want to help move the community towards switching to pip for their project installation needs to unify the community around a single tool (it also helps move the community towards using wheels more).

In other words, these changes are in line with the Python development team’s normal dedication to security and backwards-compatibility. This doesn’t mean we don’t want people to be switching to Python 3 actively (we do and I should have news about that before the year is out), nor does it mean we are going to let Python 2.7 be a security hazard simply because we want people to move on to Python 3. Python 2.7 is still in bugfix-only mode and support will end in 2020 as planned; we just happened to fix a long-standing security issue after we decided we were not going to drop Python 2.7 support in 2015 as originally planned.