Get ready for machines to make mistakes

In 1994, Intel acknowledged a bug in its Pentium processor that produced incorrect results in some rare circumstances. Intel calculated that the average user would hit this bug once every 27,000 years. Consumers reacted quite negatively to this news, and Intel was eventually forced to recall processors worth some 500 million USD.

Ever since computers were invented we have marveled their perfection. Machines are good at what human’s aren’t: reproducibly repeating billions and billions of calculations, never once making a mistake. We have come to embrace this expectation as a fundamentally held belief: machines are flawless. Intel violated this expectation through its now infamous FDIV bug, and it reaped customers’ wrath for it.

AI is about to do what Intel failed to do in 1994: reshape our expectations in computers.

The past of computers belongs to software and software is deterministic. If you send data through a program, there is a certain output we expect, and we can reason over why that output was produced, or why not.

AI is inherently non-deterministic. One of the strength of AI is that it allows us to solve problems we simply don’t know how to write algorithms for. For decades researchers struggled to write algorithms that recognize handwriting, for example. The advent of modern machine learning solved this problem without actually solving it. We still have no clue how to design an algorithm that recognizes handwriting, but we successfully used machine learning to create models that do so.

The caveat is that AI recognizes handwriting probabilistically. Even a perfectly well shaped letter or number may at any time be completely misread by a model. Well trained models are mostly right, most the time, and well … sometimes randomly wrong.

And thats not a bug. Its simply a fundamental property of AI as a probabilistic data-driven system. The key strength of AI is that it can produce often correct results for data it has never seen before. And this key strength is also its key weakness. There is simply no guarantee that the results will be correct for any particular input. At best we can make probabilistic predictions regarding accuracy.

So are consumers going to revolt against AI making mistakes the way they revolted against the FDIV bug 20 years ago? I don’t think so. The more “human-level” problems (voice recognition, image recognition) AI solves, the easier it becomes for us humans to relate to machines making mistakes. We make similar mistakes ourselves, after all. If Siri misunderstands my voice command in a noisy environment, its hard to get upset about that. Humans sometimes also have to ask again if there is a lot of background noise. Machines solving problems that humans naturally solve and sometimes fail at on a daily basis will help us adapt to this new world where machines make mistakes. To use psychology terminology, its a lot easier to have empathy for AI than for an algorithm.

AI silicon vs AI software

There is a lot of activity in the neural network hardware space. Intel just bought Nervana for $400m and Movidius for an undisclosed amount. Both make dedicated silicon to run and train neural networks. Most other chipset vendors I have talked to are similarly interested in adding direct support for neural networks to future chips. I think there is some risk in this approach. Here is why.

Most of the time executing a neural network is spent in massive matrix operations such as convolution and matrix multiplication. The state of the art is to use GPUs to do these operations because GPUs have a lot of ALUs and are well optimized for massively data parallel tasks. If you spent time optimizing neural networks for GPUs (we do!), you probably know that a well optimized engine achieves about 40-90% efficiency (ALU utilization). Dedicated neural network silicon aims to raise that to 100%, which is great, but in the end a 2x speedup only.

The problematic part is that chipset changes have a long lead time (2-3 years), and you have to commit today to the architectures of tomorrow. And thats where things get tricky. A paper published earlier this year showed that neural networks can be binarized, which reduces the precision of weights to 1 bit (-1, 1). Slow and energy inefficient floating point math turns into very efficient binary math (XNOR), which speeds up the network on existing commodity silicon by 400% with a very small loss in precision. Commodity GPUs support this because they are relatively general purpose computers. Dedicated neural network silicon is much more likely designed for a specific compute mode.

In the last 12 months or so alone we have seen dramatic advances in our understanding how to train and evaluate neural networks. Binarization is just one of them, and its fair to expect that over the next 2-3 years similarly impactful advances will be published. Committing to hardware designs based on today’s understanding of neural networks is ill advised in my opinion. My recommendation to chipset vendors is to beef up their GPUs and DSPs, and support a wide range of fixed point and floating point resolutions in a fairly generic manner. Its more likely that that will cover what we’ll want from silicon over the next 2-3 years.

Oracle sinks its claws into Android

This is my first blog post since leaving my role as Mozilla’s CTO 6 months ago. As you may have read in the press, a good chunk of the original Firefox OS founding team has moved on from mobile and we created a startup to work on some cool products and technologies for the Internet of Things. You’ll hear more about what we are up to next month.

While I am no longer working directly on mobile, a curious event got my attention: A commit appeared in the Android code base that indicates that Google is abandoning its own re-implementation of Java in favor of Oracle’s original Java implementation. I’ll try to explain why I think this is a huge change and will have far-reaching implications for Android and the Android ecosystem.

Why did Google create its own Java clone?

To run a Java app, you need a runtime library written in Java called the Java standard classes. This library implements basic language constructs like hash tables or strings.

Since the early days, Android didn’t use Sun’s version of the Java standard classes. Instead, the Android team enhanced the open source Apache Harmony Java standard libraries. Harmony is an independent “clean room” open-source implementation of the Java standard libraries maintained by the Apache Foundation.

There is basically no technical advantage in using Harmony. Its a strictly less complete and less correct version of Sun’s original implementation. Why did Android invest all this effort to duplicate Sun’s open source Java standard classes?

Apache vs GPL

Over the course of Android’s meteoric rise, the powers behind Android demonstrated a deep strategic understanding of different classes of open source licenses, their strength, and their weaknesses. Android has from its early days successfully used open source licenses to enable proprietary technology. Sounds counter-intuitive, but it explains why Google rewrote so much open technology for Android.

Java is actually not the only major open technology piece Google reinvented. Since Android 1.0 Google uses bionic as its standard C library. There were very few strong technical reasons to use bionic over open source alternatives such as the GNU libc. Quite to the contrary, at Mozilla in the earlier days of Android we had to constantly fight deficiencies in bionic in comparison to existing open source standard C libraries. Famously, bionic was not thread safe in many cases, crashing multi-threaded applications.

Writing a standard C library from scratch is crazy. Its one of the most commoditized pieces of software. Its almost impossible to do it significantly better than existing implementations, and it costs a ton of time and money and compatibility is a huge pain. Why did Google do it anyway? There is a simple answer: Licensing.

Bionic (as Google’s Java implementation) is licensed under the non-viral Apache 2 (APL) license. You can use and modify APL code without having to publish the changes. In other words, you can make proprietary changes and improvements. This is not possible with the GNU libc, which is under the LGPL. I am pretty sure I know why Google thought that this is important, because as part of launching Firefox OS I talked to many of the same chipset vendors and OEMs that Google works with. Silicon vendors and OEMs like to differentiate at the software level, trying to improve Android code all over the stack. Especially silicon vendors often modify library code to take advantage of their proprietary silicon, and they don’t want to share these changes with the world. Its their competitive moat–their proprietary advantage. Google rewrote bionic — and Java standard classes — because silicon vendors and OEMs probably demanded that most parts of Android are open (ironic use of the word in this context) to this sort of proprietary approach.

UpdateBob Lee who worked for Google at the time commented below that OpenJDK didn’t exist yet when Android 1.0 launched (2007), so Google couldn’t use OpenJDK back then. GNU Classpath (LGPL) did exist since 2004, however. Google still chose Harmony, and stuck with it even after Apache abandoned the Harmony project in 2011. The switch now is clearly driven by the Oracle vs Google lawsuit.


OpenJDK is the name for Oracle’s Java that you can obtain under the GPL2, a viral open source license with strong protections. Any changes to OpenJDK have to be published as source code (except if you are Oracle).

Because Oracle has means to control Java beyond source code, OpenJDK is about as open as a prison. You can vote on how high the walls are, and you can even help build the walls, but if you are ever forced to walk into it, Oracle alone will decided when and whether you can leave. Oracle owns much of the roadmap of OpenJDK, and via compatibility requirements, trademarks, existing agreements, and API copyright lawsuits (Oracle vs Google) Oracle is pretty much in full control where OpenJDK is headed. What does this mean for Android?

In short: there is a new sheriff in town. The app ecosystem is at the heart of every mobile OS. Its what made Android and iOS successful, and its what made Firefox OS struggle. The app ecosystem rests on the app stack, in Android’s case Harmony in the past, and going forward OpenJDK. In other words, Oracle has now at least one hand at the steering wheel as well.

Its anyone’s guess what Oracle will do with it, but Google and Oracle have a long history of not getting along, so its going to be quite curious to watch. Java itself aims to be a platform, and it is similarly vast in scope as Android. Java includes its own user interface (UI) library Swing, for example. Google has of course its own Android UI framework. Swing will now sit on every Android phone, using up resources. Its unlikely that Oracle will try to force Google to actually use Swing, but Google has to make sure it works and is present and apps can use it. And, Oracle can easily force Google to include pretty much any other code or service that pleases Oracle. How about Java Push Notifications, specified and operated by Oracle? All Oracle has to do is add it to OpenJDK, and it will make its way into Android. Google is now on Oracle’s Hamster wheel.

A rough year ahead

In the short term Google’s biggest challenge will be to rip out Harmony and replace it with OpenJDK. They actually have been working on this for a while. It seems this project started in secret already 11 months ago, and is now being merged into the public repository.

All this code and technology churn will have massive implications for Android at a tactical level. Literally millions of lines of code are changing, and the new OpenJDK implementation will often have subtly different correctness or performance behavior than the Harmony code Google used previously. Here you can see Google updating a test for a specific boundary condition in date handling. Harmony had a different behavior than Oracle’s OpenJDK, and the test had to be fixed.

The app ecosystem runs on top of these shifting sands. The Android app store has millions of apps that rely on the Java standard classes, and just as tests have to be fixed, apps will randomly break due to the subtle changes the OpenJDK transition brings. Breakage will not be limited to correctness. Performance variances will be even harder to track down. Past performance workarounds will be obsolete but it will be hard to tell which ones, and entirely new performance problems will pop up. Fun.

And of course, licensing is changing as you can see here. The core of Android’s ecosystem runtime is now powered by GPL2 library code, copyright Oracle.

I also have a very hard time imagining Android N coming out on time with this magnitude of change happening behind the scenes. Google is changing engines mid-flight. The top priority will be to not crash. They won’t have much time to worry about arriving on schedule.

The winner

No matter how you look at this, this is a huge victory for Oracle. Oracle never had much of a mobile game, and all the sudden Oracle gained a good amount of roadmap and technology influence over the most important mobile ecosystem by scale. Oracle is a mobile titan now. I didn’t see that one coming.

The losers

Google, and silicon vendors. The entire middle part of the Android stack will be subject to proprietary Oracle control. Google calls this “reduced fragmentation” in their press release. That’s true, kind of. There will be less fragmentation because Oracle will control anything Java, including Android.

Silicon vendors will be still allowed to do proprietary enhancements if they obtain the same library code under a difference (non-viral, non-GPL2) license from Oracle — for a fee. Oracle has actually already a history of up-selling OpenJDK. They are offering certain components of the Java VM only for royalty payments. You get a basic garbage collector for free. If you want the really good one, it’ll cost you. I expect Oracle to attempt to monetize in similar ways the billions of mobile users it just stumbled upon.