Browser Push Techniques: Pre-Medieval to Modern Day

So you’re building a web application and you need to push data from the server to the browser. The good news is that you have some options. The bad news is that you have some options. The question of which technology to use does not necessarily have an immediate and obvious answer.

After struggling with the choices for browser push available to me on several new projects, I did a comparison and came to the conclusion described here. You may disagree with me, but I’ll outline my logic because it addressed the concerns I had, and maybe it will help inform yours.

The truth is, there is no such thing as true “browser push”. The reason being that when using HTTP (version 1.1 or 2.0) browsers must connect to the server and not vice versa. However, in practice this is not a major limitation. Once the connection is established from browser to server, the server can decide when and how it will respond. And it can use this timing to effectively “push” information.

What to Consider

Before listing out the technologies, it’s worth briefly running through the major factors taken into account when judging them.

Among the most obvious concerns is browser support. If the browser of your end user does not support a technology, that’s clearly a deal breaker.

Once the browser supports a technology, next comes server support: how well does your back-end technology stack support it?

The simplicity of the implementation is also a significant factor. Solutions that can be implemented easily and do what you need are almost always better than something more complicated, especially when it comes to supporting them moving forward.

Reliability is also a factor, although if we assume the browser and the server both support the technology in question, reliability should not differ radically between these options. Recovering from timed-out connections and properly maintaining state generally has to be dealt with at the application layer (i.e. by you as the developer), although there are certainly some differences between these approaches.

Integration with HTTP/2 is also a significant factor, which I’ll cover in more detail below.

Push Technologies

Now that we have the context, let’s roll through the available solutions. Note that the examples shown are trivial and error checking and some edge cases have been elided for clarity. The purpose of the examples is really to be clear about exactly what approach is being referred to.

Dumb Polling

The simplest approach is to not bother with “push” at all and simply fetch the latest information from the server every X seconds. This is unsophisticated and very simple. For things which change at a predictable rate, it may be an effective solution. If something needs to show “when it happens”, this is not a good approach. Polling faster increases load on the server unnecessarily, and polling slower will make the information show up later for the user.

The browser implementation in JS looks something like this:

setInterval(function() {
    $.ajax({ url: "/updates", success: function(data) {
        // update the page
        document.getElementById('somewidget').innerHTML = data.value;
    }, dataType: "json"});
}, 30000);

Summary: Don’t use dumb polling unless it doesn’t matter if updates arrive late.

Long-polling/Comet

Since dumb polling isn’t push at all, the original “push” method was to simply request a resource from the server and have the server wait until something happens and respond with the information from the event.

This is a rather effective solution. Browser and server support are essentially universal. How well thousands of clients with simultaneous open connections scale has a lot to do with how things are implemented on the server side. Luckily, this problem is not new, and servers have been continually improving how they handle this scenario.

The client might look like this:

function fetchUpdates() {
    $.ajax({ url: "/updates", success: function(data) {
        // update the page
        document.getElementById('somewidget').innerHTML = data.value;
    }, complete: function() {
        // start polling again immediately, regardless of success or error
        fetchUpdates();
    }, dataType: "json"});
}
// start polling
fetchUpdates();

And the server (all server-side examples shown are in Go):

func (s *MyServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    if r.URL.Path == "/updates" {
        // get our data from whatever source, block until we have it
        data := <-dataChannel
        // write a json response
        w.Header().Set("Content-Type", "application/json")
        b, _ := json.Marshal(data)
        w.Write(b)
        return
    }
}

In the code above, the server blocks until it gets whatever data the client is waiting for and then returns once it has it. The client makes an ajax call. When the call returns, if it’s successful, the page is updated accordingly. Regardless of whether or not the request succeeded or failed, it simply tries again. The cost of (potentially) reconnecting to the server each time can be a source of delay in HTTP/1.1.

Another common variation on this concept is to use a single response, but stream back to the client the updates one chunk at a time. HTTP’s “chunked” encoding mechanism lets you send back multiple discrete blocks of data for the same request, allowing the server to pause between chunks and wait for more data to arrive. Rather than covering that in more detail here, I’ll show an example of that kind of behavior in the next section.

It is also worth noting that implementing this properly on the server side requires some consideration of the number of concurrent connections. I’ve been writing servers in Go for a while, and its goroutines mitigate this problem significantly; Node’s event-based/callback model can also be used to good effect here. Many other popular environments, however, such as PHP+Apache, launch a separate process for each connection, which can be prohibitively expensive in server resources, particularly RAM. Note that this is not unique to long-polling: servers that support technologies like websockets are likely to have similar issues and require the same consideration.

Summary: Long-polling works reasonably well but is not very sophisticated.

SSE (Server-Sent Events)

Server-Sent Events are essentially a form of long-polling made easier and more consistent with the EventSource JS class.

MDN has a decent article on how to use EventSource, so I won’t rehash that here. But in essence, EventSource takes care of reconnecting when the connection times out, reading chunked data from a response and handing you each distinct message, and, if you provide an “id” field with each message, sending that back in a header with the next request so the server can pick up where it left off.

It’s basically long-polling made easy and more standardized.

The client would look something like:

var eventSource = new EventSource("/updates");
eventSource.onmessage = function(e) {
    // update the page
    document.getElementById('somewidget').innerHTML = JSON.parse(e.data).value;
}

And on the server side the idea is that each event being pushed to the client has an ID. When the client connects, the first time it will not provide a previous ID, but reconnects will tell the server where to pick up from. This provides a simple and effective mechanism to keep track of the messages in your stream:

func (s *MyServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    if r.URL.Path == "/updates" {
        flusher, ok := w.(http.Flusher)
        if !ok {
            http.Error(w, "streaming unsupported", http.StatusInternalServerError)
            return
        }
        w.Header().Set("Content-Type", "text/event-stream")
        w.Header().Set("Cache-Control", "no-store")

        enc := json.NewEncoder(w)

        lastEventID := r.Header.Get("Last-Event-ID")

        // find all the events that happened since lastEventID
        if lastEventID != "" {
            missedEvents := ...
            for _, event := range missedEvents {
                fmt.Fprintf(w, "id: %s\n", event.ID)
                fmt.Fprintf(w, "data: ")
                enc.Encode(event)
                _, err := fmt.Fprintf(w, "\n\n")
                if err != nil { return }
                flusher.Flush()
            }
        }

        // now listen for future events and send each as we get them
        for event := range newEventsChannel {
            fmt.Fprintf(w, "id: %s\n", event.ID)
            fmt.Fprintf(w, "data: ")
            enc.Encode(event)
            _, err := fmt.Fprintf(w, "\n\n")
            if err != nil { return }
            flusher.Flush()
        }

        return
    }
}

EventSource is supported by all major browsers except Internet Explorer and pre-Chromium versions of Edge, where it can be polyfilled.

When using HTTP/1.1, creating an EventSource often results in a new HTTP connection. Subsequent timeouts and reconnects would require the overhead of connection establishment. However the picture changes when using HTTP/2, where requests and responses are multiplexed over a single TCP connection. This means that setup/teardown cost of a “connection” is much cheaper in HTTP/2 (intentionally and by design), thus mitigating one of the major drawbacks of the SSE approach.

Summary: If you are thinking of using long-polling, you probably want SSE/EventSource instead. SSE works well with HTTP/2 and transparently takes advantage of its lower overhead.

BOSH

Bidirectional-streams Over Synchronous HTTP (BOSH) is a form of long-polling created for XMPP, i.e. to allow chat to occur over HTTP. It adds some extra trickery: it uses two connections, one to send data and one to receive, and the connections switch roles as a (supposed) performance optimization.

Unless you are building a chat application with XMPP (and even then), it does not have an advantage over other simpler and more common techniques.

Websockets

For the past few years, websockets have been the gold standard for bidirectional browser communication and pushing data to the browser.

Example client:

var webSocket = new WebSocket("ws://www.example.com/updates-socket");
webSocket.onmessage = function(event) {
    // update the page
    document.getElementById('somewidget').innerHTML = JSON.parse(event.data).value;
}

Example server:

import "github.com/gorilla/websocket"
...
var upgrader = websocket.Upgrader{ ReadBufferSize:  1024, WriteBufferSize: 1024 }
func (s *MyServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    if r.URL.Path == "/updates-socket" {
        conn, err := upgrader.Upgrade(w, r, nil)
        if err != nil { return }

        // the body here could look a lot like the SSE example above, except the actual
        // data sending would be done something like this:

        if err = conn.WriteMessage(websocket.TextMessage, data); err != nil {
            return
        }
    }
}

On the client side, websockets are well supported these days. Server support may be a bit more difficult, depending on your environment. Unlike SSE, websockets commandeer the HTTP connection and speak an entirely different protocol over it, which can cause complications in some cases, or at least require more complexity to implement. Some proxies, obscure network configurations, and certain CDN setups also do not work with websockets. That said, websockets are a perfectly viable option for many use cases. They also have a simple, well-supported mechanism for sending raw binary data, which can be useful if your application needs it.

It is also worth noting that if you need the “Last-Event-ID” functionality that SSE provides, you’ll have to implement this manually on top of websockets.

One thing to take into account is that since websockets hijack the HTTP connection and speak their own protocol, they are fundamentally incompatible with HTTP/2 as originally specified. In practice this is not a significant issue, but it does mean that as HTTP/2 moves forward and evolves, related improvements and code maturity will not benefit your websocket-backed application.

Summary: Websockets are a popular and viable mechanism. They support binary data, whereas SSE does not, but they lack built-in “Last-Event-ID” functionality. And they are fundamentally incompatible with HTTP/2, which may become more of an issue moving forward.

HTTP/2's "Server Push" Feature

This unfortunately doesn’t do what it sounds like from its name.

The purpose of HTTP/2 “server push” is not to inform an already loaded web page of new information, but to send a file along with a response that it knows the client will need. For example the server, in response to a request for “index.html”, may send not only that page but also the CSS file that it knows the browser will request when rendering it (saving the browser the time of having to ask for it). As such, it doesn’t work as a replacement for the other technologies listed here. I’ve only listed it because when you see the name you think “oh, can that be used to push data from server to client?” No, it can’t. It’s just a somewhat crappily named feature (although useful for its intended purpose).

WebRTC

Theoretically one could use WebRTC to achieve browser push as well, but it is not designed for it. WebRTC addresses the needs of realtime audio/video and peer-to-peer browser communication, and sacrifices a lot of simplicity to solve those harder problems; that is aside from its fledgling browser support. Unless you really need that additional functionality, you almost certainly want to look elsewhere.

Libraries Supporting Multiple Approaches

Libraries like socket.io and faye work by supporting multiple approaches - usually trying several in sequence when the page first loads and picking one that works. For many applications, this can be an effective solution to the problem - just let the library deal with it. However, the more disparate and varied your clients and servers are, the more of a problem this may become. Like any library, you can get stuck with it and the baggage it carries. If you are reasonably confident that socket.io or another such library will suit your needs for the lifetime of your application - and any new clients that may need to be built against your server - then it’s probably a good choice. But if you are unsure, choosing one of the other approaches gives you a single mechanism to support and may make integration with future clients simpler.

Philosophical Sidebar: “Bi-Directional” vs “Push”

Of the various options above, the only viable one that is truly “bi-directional” is websockets. Either end of a websocket can send or receive at any time.

Contrast this with SSE or long-polling, where a request is issued solely so the server can wait until there is something to send back. In this sense, the response is just “push”.

However, my take is that while “bi-directional” and “push” are technically different, it doesn’t make much difference for most real-world applications. If you want to send data from the client to the server, you simply make an AJAX request. With HTTP/2 you can do this with very little overhead (and often even in HTTP/1.1 you still have a connection open and idling). Looked at that way, the benefits of a truly bi-directional pipe are really not worth much. If you need to correlate requests and responses, you have the exact same problem either way. You have to perform authentication when the user connects, regardless of how the connection is made - websockets, SSE, any request is bound to have the same issue. So while “bi-directional” websockets may seem more enticing, my experience is that they’re not really any better than a “push” technique like SSE; you still have the same problems to solve.

Conclusion

In short, I think that SSE/EventSource, especially in conjunction with HTTP/2, offers the best balance of simplicity, feature-completeness, performance and compatibility. The other major alternative is to use websockets, and that is definitely a viable approach as well. But it is also a separate protocol that does not work with HTTP/2, something to take into account as more and more applications upgrade.

The optimizations that have been addressed in HTTP/2, including connection multiplexing, smaller header sizes, and fixing the “block at head of line” problem (at least partially) are important parts of making the web faster. I think choosing a push technology that takes advantage of this, even though it is otherwise fairly old school, is advantageous.

Sometimes the tools of old still serve us best today.

And if you read all this and said “well, I use websockets, not that old SSE crap”, then hey that works too.
