Search works differently than you may think

Search is the main way we all navigate the Web, but it works very differently than you may think. In this blog post I will try to explain how it worked in the past, why it works differently today and what role you play in the process.

The services you use for searching, like Google, Yahoo and Bing, are called search engines. The very name suggests that they go through a huge index of Web pages to find every one that contains the words you are searching for. Twenty years ago, search engines did indeed work this way. They would “crawl” the Web and index it, making the content available for text searches.

As the Web grew larger, searches would often find the same word or phrase on more and more pages. This made search results less and less useful, because humans don’t like to read through huge lists to manually find the page that best matches their search. A search for the word “door” on Google, for example, gives you more than 1.9 billion results. It’s impractical, even impossible, for anyone to look through all of them to find the most relevant page.

Google finds about 1.9 billion results for the search query “door”.

To help navigate the ever growing Web, search engines introduced algorithms to rank results by their relevance. In 1996, two Stanford graduate students, Larry Page and Sergey Brin, discovered a way to use the information available on the Web itself to rank results. They called it PageRank.

Pages on the Web are connected by links. Each link contains anchor text that explains to readers why they should follow the link. The link itself points to another page that the author of the source page felt was relevant to the anchor text. Page and Brin discovered that they could rank results by analyzing the incoming links to a page and treating each one as a vote for its quality. A result is more likely to be relevant if many links point to it using anchor text that is similar to the search terms. Page and Brin founded a search engine company in 1998 to commercialize the idea: Google.
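To make the idea of links as votes concrete, here is a minimal sketch in Python. It is not Google’s implementation: it uses the textbook power-iteration form of PageRank, and the toy link graph, the damping value of 0.85 and the helper names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy link graph: (source page, target page, anchor text).
LINKS = [
    ("a.example", "doors.example", "buy a door"),
    ("b.example", "doors.example", "door shop"),
    ("c.example", "doors.example", "front door ideas"),
    ("a.example", "band.example", "the doors"),
    ("c.example", "hinges.example", "door hinges"),
]


def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration: each page splits its score among its out-links."""
    pages = sorted({p for src, dst, _ in links for p in (src, dst)})
    out_links = defaultdict(list)
    for src, dst, _ in links:
        out_links[src].append(dst)

    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            # Pages without out-links spread their vote over every page.
            targets = out_links[page] or pages
            share = damping * score[page] / len(targets)
            for target in targets:
                new[target] += share
        score = new
    return score


def rank_for_query(links, query):
    """Keep pages whose incoming anchor text mentions a query word, best score first."""
    words = set(query.lower().split())
    score = pagerank(links)
    matches = {dst for _, dst, anchor in links if words & set(anchor.lower().split())}
    return sorted(matches, key=lambda page: score[page], reverse=True)


print(rank_for_query(LINKS, "door"))  # ['doors.example', 'hinges.example']
```

In this toy graph, doors.example wins because three pages link to it with “door” in the anchor text, while hinges.example collects only half of one page’s vote.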

PageRank worked so well that it completely changed the way people interact with search results. Because PageRank correctly offered the most relevant results at the top of the page, users started to pay less attention to anything below that. This also meant that pages that didn’t appear on top of the results page essentially started to become “invisible”: users stopped finding and visiting them.

To experience the “invisible Web” for yourself, head over to Google and try to look through more than just the first page of results. So few users ever wander beyond the first page that Google doesn’t even bother displaying all 1.9 billion search results it claims to have found for “door.” Instead, the list just stops at page 63, about 100 million pages short of what you would have expected.

Despite reporting over 1.9 billion results, in reality Google’s search results for “door” are quite finite and end at page 63.

With publishers and online commerce sites competing for that small number of top search results, a new business was born: search engine optimization (or SEO). There are many different methods of SEO, but the principal goal is to game the PageRank algorithm in your favor by increasing the number of incoming links to your own page and tuning the anchor text. With sites competing for visitors — and billions in online revenue at stake — PageRank eventually lost this arms race. Today, links and anchor text are no longer useful to determine the most relevant results and, as a result, the importance of PageRank has dramatically decreased.

Search engines have since evolved to use machine learning to rank results. People perform 1.2 trillion searches a year on Google alone; that’s about 3 billion a day and 40,000 a second. Each search becomes part of this massive query stream as the search engine simultaneously “sees” what billions of people are searching for all over the world. For each search, it offers a range of results and remembers which one you considered most relevant. It then uses these past searches to learn what is most relevant to the average user, so that it can provide the most relevant results for future searches.

Machine learning has made text search all but obsolete. Search engines can answer 90% or so of searches by looking at previous search terms and results. They no longer search the Web in most cases — they instead search past searches and respond based on the preferred result of previous users.
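A rough way to picture “searching past searches” is a lookup table built from what earlier users chose. The sketch below is a deliberately simplified illustration, not how any real search engine works: the click log, the function names and the fallback to a `text_search` callable are all hypothetical, and plain click counting stands in for actual machine learning.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (query, result the user picked).
CLICK_LOG = [
    ("door", "doors.example"),
    ("door", "doors.example"),
    ("door", "band.example"),
    ("door hinges", "hinges.example"),
]


def learn_preferences(click_log):
    """Count, per normalized query, how often each result was chosen."""
    clicks = defaultdict(Counter)
    for query, chosen in click_log:
        clicks[query.strip().lower()][chosen] += 1
    return clicks


def answer(query, clicks, text_search):
    """Serve past users' preferred results; fall back to text search for new queries."""
    seen = clicks.get(query.strip().lower())
    if seen:
        return [result for result, _ in seen.most_common()]
    return text_search(query)
```

With this log, a search for “door” is answered straight from the table, with doors.example ranked first, while a query nobody has typed before still has to go through the text index.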

This shift from PageRank to machine learning also changed your role in the process. Without your searches — and your choice of results — a search engine couldn’t learn and provide future answers to others. Every time you use a search engine, the search engine uses you to rank its results on a massive scale. That makes you its most important asset.

WebVR is coming to Firefox Nightly

In 2014 Mozilla started working on adding VR capabilities to the Web. Our VR team proposed a number of new Web APIs and made available an experimental VR build of Firefox that can render Web-based VR content to Oculus Rift headsets.

Consumer VR products are still in a nascent state, but clearly there is great promise for this technology. We have enough confidence in the new APIs we have proposed that we are today taking the step of integrating them into our regular nightly Firefox builds. Head over to MozVR for all the details, and if you own an Oculus Rift headset or mobile VR-capable hardware we support, give it a spin!

 

It takes many to build the Web we want

Mozilla is announcing today the creation of a WebRTC competency center jointly with Telenor.

Mozilla’s purpose is to build the Web. We do so by building Firefox and Firefox OS. The Web is pretty unusual when it comes to interoperable technology stacks, because it is not built by standards bodies. Instead, the Web is built by browser vendors whose browsers implement the Web, and what those browsers implement in the end pretty much defines what the Web is.

The Web adds new technologies whenever a majority of browser vendors agree to extend it in an interoperable way. Standards bodies merely help coordinate this process. Very rarely do new Web capabilities originate in a standards body. New Web capabilities merely end up there eventually, once there is sufficient interest from multiple browser vendors to warrant standardization.

Mozilla doesn’t, and can’t, build the Web alone. What makes the Web unique is that it is owned by no one and cannot be held back by anyone. It doesn’t take unanimous consent to extend the Web. A mere majority of browser vendors can popularize a new Web capability, forcing the rest of the browser vendors to eventually come along.

While several browser vendors build the Web, Mozilla has a unique vision for the Web that is driven by our mission as a non-profit foundation. Whereas all other browser vendors are for-profit corporations, advancing the Web in the interest of their shareholders, Mozilla advances the Web for users.

The primary browser vendors today are Google, Apple, Microsoft and Mozilla. These four organizations have a direct path to bring new technologies to the Web. While many other technology companies have a strong interest in the Web, they lack the ability to directly move the Web ahead because only these four browser vendors develop a rendering engine that implements the Web stack.

There is one more aspect that sets Mozilla apart from its browser vendor competitors. We are several orders of magnitude smaller than our peers. While this might appear to be a market disadvantage at first, combined with our neutral and non-profit status it actually creates a unique opportunity. Many more technology companies have an interest in working on the Web, but if you aren’t Google, Apple, or Microsoft, it’s very difficult to contribute core technologies to the Web. These three companies have direct control over a rendering engine. No other technology company can equally influence the Web. Mozilla is looking to change that.

Jointly with Telenor we are launching a new initiative that will allow parties with a strong technology interest in WebRTC to participate as equals in the development process of the WebRTC standard. Since standards are really just a result of delivering new Web technologies in a rendering engine, Telenor will assign engineering staff to work on Mozilla’s implementation of WebRTC in Firefox and Firefox OS.

The goal of this new center is to implement WebRTC with a broad, neutral vision that captures the technology needs of many, not just the technology needs of individual browser vendors.

Mozilla is an open source project where every opinion and technical contribution matters. The WebRTC Competency Center will accelerate the development of WebRTC, and ensure that WebRTC serves the diverse technology interests of many. If you would like to see WebRTC (or any other part of the Web) grow capabilities that are important to you, join us.

Yahoo and Mozilla Form Strategic Partnership

SUNNYVALE, Calif. and MOUNTAIN VIEW, Calif., Wednesday, November 19, 2014 – Yahoo Inc. (NASDAQ: YHOO) and Mozilla Corporation today announced a strategic five-year partnership that makes Yahoo the default search experience for Firefox in the United States on mobile and desktop. The agreement also provides a framework for exploring future product integrations and distribution opportunities to other markets.

The deal represents the most significant partnership for Yahoo in five years. As part of this partnership, Yahoo will introduce an enhanced search experience for U.S. Firefox users which is scheduled to launch in December 2014. It features a clean, modern and immersive design that reflects input from the Mozilla team.

“We’re thrilled to partner with Mozilla. Mozilla is an inspirational industry leader who puts users first and focuses on building forward-leaning, compelling experiences. We’re so proud that they’ve chosen us as their long-term partner in search, and I can’t wait to see what innovations we build together,” said Marissa Mayer, Yahoo CEO. “At Yahoo, we believe deeply in search – it’s an area of investment, opportunity and growth for us. This partnership helps to expand our reach in search and also gives us an opportunity to work closely with Mozilla to find ways to innovate more broadly in search, communications, and digital content.”

“Search is a core part of the online experience for everyone, with Firefox users alone searching the Web more than 100 billion times per year globally,” said Chris Beard, Mozilla CEO. “Our new search strategy doubles down on our commitment to make Firefox a browser for everyone, with more choice and opportunity for innovation. We are excited to partner with Yahoo to bring a new, re-imagined Yahoo search experience to Firefox users in the U.S. featuring the best of the Web, and to explore new innovative search and content experiences together.”

To learn more about this, please visit the Yahoo Corporate Tumblr and the Mozilla blog.

About Yahoo

Yahoo is focused on making the world’s daily habits inspiring and entertaining. By creating highly personalized experiences for our users, we keep people connected to what matters most to them, across devices and around the world. In turn, we create value for advertisers by connecting them with the audiences that build their businesses. Yahoo is headquartered in Sunnyvale, California, and has offices located throughout the Americas, Asia Pacific (APAC) and the Europe, Middle East and Africa (EMEA) regions. For more information, visit the pressroom (pressroom.yahoo.net) or the Company’s blog (yahoo.tumblr.com).

About Mozilla

Mozilla has been a pioneer and advocate for the Web for more than a decade. We create and promote open standards that enable innovation and advance the Web as a platform for all. Today, hundreds of millions of people worldwide use Mozilla Firefox to discover, experience and connect to the Web on computers, tablets and mobile phones. For more information please visit https://www.mozilla.com/press

Yahoo is a registered trademark of Yahoo! Inc. All other names are trademarks and/or registered trademarks of their respective owners.

Firefox and Cisco’s Project Squared

Yesterday I was at Cisco’s Collaboration Summit where Cisco’s CTO for Collaboration Jonathan Rosenberg and I showed Cisco’s new WebRTC-based Project Squared collaboration service running in Firefox, talking to a Cisco Collaboration Desktop endpoint without requiring transcoding.

This demo is the culmination of a year-long collaboration between Cisco and Mozilla in the WebRTC space. WebRTC enables voice and video communication directly from within the browser. This means that anyone can build a video conferencing service using just WebRTC and HTML5 standards, without the need for the user to download a plugin or a native application.

Cisco is not only developing WebRTC-based services that run on the Web. They have also joined a growing number of organizations and companies helping Mozilla to build a better Web. Over the last year Cisco has contributed numerous technical improvements to Mozilla’s WebRTC implementation, including support for screen sharing and the H.264 video codec. These features are now shipping in Firefox. We intend to use them in the future in Mozilla’s own Hello communication service that we are bringing to Firefox.

Cisco’s contributions to the Web go beyond just advancing Firefox. For the last three years the IETF, the standards body defining the networking protocols for WebRTC, has been unable to agree on a mandatory video codec for WebRTC, putting ubiquitous interoperability in doubt.

One of the major blockers to reaching a consensus was that H.264 is subject to royalty-bearing patents, which made it problematic for open source projects such as Firefox to deploy it. To break this logjam, Cisco open-sourced its H.264 code base and made it available in plugin form. Any product, not just Firefox, can download the plugin and use it to enable H.264 without paying any royalties.

This collaboration between Mozilla and Cisco enabled Firefox to add support for H.264 in WebRTC, and also played a significant role in the compromise reached at the last IETF meeting to adopt both H.264 and VP8 as mandatory video codecs for WebRTC in browsers. As a result of this compromise, in the future all browsers should match the capabilities already available in Firefox.

Mozilla will continue to work on advancing Firefox and the Web, and we are excited to have strong partners like Cisco who share our commitment to the open Web as a shared technology platform.

Let’s Encrypt: One more step on the road to TLS Everywhere

Principle 4 of the Mozilla Manifesto states: Individuals’ security and privacy on the Internet are fundamental and must not be treated as optional.

Unfortunately treating user security as optional is exactly what happens when sites let users connect over insecure HTTP rather than HTTP over TLS (HTTPS). What insecure means here is that your network traffic is totally unprotected and can be read and/or modified by anyone who shares a network with you, including random people sharing Starbucks or airport WiFi.

One of the biggest reasons that web sites don’t deploy TLS is the requirement to get a digital certificate, a cryptographic credential that allows a user’s browser to know it’s talking to the right site and not to an attacker. Certificates are issued by Certificate Authorities (CAs), often using a clumsy and error-prone manual process. A further disincentive to deployment is that most CAs charge a fee for their certificates, which not only prices some people out of the market but also interferes with automatic issuance and renewal.

Mozilla, along with our partners Akamai, Cisco, EFF, and IdenTrust, decided to do something about this situation. Together, we’ve formed a new consortium, the Internet Security Research Group, which is starting Let’s Encrypt, a new certificate authority designed to bring security to everyone. Let’s Encrypt is built around a few key principles:

  • Free: Certificates will be offered at no cost.
  • Automatic: Certificates will be issued via a public and published API, allowing Web server software to automatically obtain new certificates at installation time and without manual intervention.
  • Independent: No piece of infrastructure this important should be controlled by a single company. ISRG, the parent entity of Let’s Encrypt, is governed by a board drawn from industry, academia, and nonprofits, ensuring that it will be operated in the public interest.
  • Open: Let’s Encrypt will be publishing its source code and protocols, as well as submitting the protocols for standardization so that server software as well as other CAs can take advantage of them.
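To illustrate what “automatic” could mean for server software, here is a minimal sketch of the renewal decision. It does not model the Let’s Encrypt protocol itself: `request_certificate` is a hypothetical callable standing in for whatever client talks to the CA, and the 30-day renewal window is an assumption chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def needs_certificate(expires_at, now, renew_before=timedelta(days=30)):
    """True when there is no certificate yet or it expires within the renewal window."""
    return expires_at is None or expires_at - now <= renew_before


def ensure_certificate(expires_at, request_certificate, now=None):
    """Return the expiry of a usable certificate, requesting a new one only if needed.

    `request_certificate` is a hypothetical callable standing in for whatever
    client talks to the CA; it must return the new certificate's expiry time.
    """
    now = now or datetime.now(timezone.utc)
    if needs_certificate(expires_at, now):
        return request_certificate()
    return expires_at
```

A server could run `ensure_certificate` at installation time and then on a daily timer, so that neither the first certificate nor any renewal needs a human in the loop.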

Let’s Encrypt will be issuing its first real certificates in Q2 2015. In the meantime, we have published some initial protocol drafts along with a demonstration client and server at: https://github.com/letsencrypt/node-acme and https://github.com/letsencrypt/heroku-acme. These are functional today and can be used to issue test certificates.

It’s been a long road getting here and we’re not done yet, but this is an important step towards a world with TLS Everywhere.

VP8 and H.264 to both become mandatory for WebRTC

WebRTC is one of the most exciting things to happen to the Web in years: it has the potential to bring instant voice and video calling to anyone with a browser, finally unshackling us from proprietary plugins and installed apps. Firefox, Chrome, and Opera already support WebRTC, and Microsoft recently announced future support.

Unfortunately, the full potential of the WebRTC ecosystem has been held back by a long-running disagreement about which video codec should be mandatory to implement. The mandatory-to-implement audio codecs were chosen over two years ago with relatively little contention: the legacy codec G.711 and Opus, an advanced codec co-designed by Mozilla engineers. The IETF RTCWEB Working Group has been deadlocked for years over whether to pick VP8 or H.264 for the video side.

Both codecs have merits. On the one hand, VP8 can be deployed without having to pay patent royalties. On the other hand, H.264 has a huge installed base in existing systems and hardware. That is why we worked with Cisco to develop their free OpenH264 plugin, and as of October this year Firefox supports both H.264 and VP8 for WebRTC.

At the last IETF meeting in Hawaii the RTCWEB working group reached strong consensus to follow in our footsteps and make support for both H.264 and VP8 mandatory for browsers. This compromise was put forward by Mozilla, Cisco and Google. The details are a little bit complicated, but here’s the executive summary:

  • Browsers will be required to support both H.264 and VP8 for WebRTC.
  • Non-browser WebRTC endpoints will be required to support both H.264 and VP8. However, if either codec becomes definitely royalty free (with no outstanding credible non-RF patent claims) then endpoints will only have to do that codec.
  • “WebRTC-compatible” endpoints will be allowed to do either codec, both, or neither.
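The three rules above can be written down as a small decision function. This is only a sketch of the summary as given here, not of the proposal’s actual wording: the endpoint labels are names chosen for this example, and it assumes the royalty-free clause in the second rule applies to non-browser endpoints only.

```python
BOTH = frozenset({"H.264", "VP8"})


def required_video_codecs(endpoint, royalty_free=None):
    """Return the video codecs an endpoint must implement under the compromise.

    endpoint: "browser", "non-browser", or "compatible" (labels chosen for this sketch).
    royalty_free: name of a codec that has become definitely royalty free, if any.
    """
    if endpoint == "browser":
        return BOTH
    if endpoint == "non-browser":
        # Once one codec is definitely royalty free, only that codec is required.
        return frozenset({royalty_free}) if royalty_free in BOTH else BOTH
    if endpoint == "compatible":
        return frozenset()  # either codec, both, or neither is allowed
    raise ValueError(f"unknown endpoint kind: {endpoint!r}")
```

For example, `required_video_codecs("non-browser", royalty_free="VP8")` returns only VP8, while a browser is still required to support both codecs.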

See the complete proposal by Mozilla Principal Engineer Adam Roach here. There are still a few procedural issues to resolve, but given the level of support in the room, things are looking good.

We believe that this compromise is the best thing for the Web at this time: It lets us move forward with confidence in WebRTC interoperability and allows people who for some reason or another really can’t do one of these two codecs to be “WebRTC-compatible” and know they can interoperate with any WebRTC endpoint. This is an unmitigated win for users and Web application developers, as it provides broad interoperability within the WebRTC ecosystem.

It also puts a stake in the ground that what the community really needs is a codec that everyone agrees is royalty-free, and provides a continuing incentive for proponents of each codec to work towards this target.

Mozilla has been working for some time on such a new video codec which tries to avoid the patent thickets around current codec designs while surpassing the quality of the latest royalty-bearing codecs. We hope to contribute this technology to an IETF standardization effort following the same successful pattern as with Opus.