Eric / Brooklyn


Multi-tab issues with WebSockets

September 2020

Using ActionCable in Rails

The out-of-the-box solution for WebSockets in Rails is ActionCable. With ActionCable, we get these little Channels that allow us to send information directly to clients (i.e. users) without clients asking for it in the traditional way (an HTTP request).

Channels are meant to encapsulate some logical piece of your application, similar to how you might organize your Controllers.

This is all to say that we can have a lot of Channels — they are just a way of segmenting our WebSockets work.

A Status Channel

We have a real-time chat application in which messages can only be sent between people who are online.

To keep track of who’s online at any given moment, we have a StatusChannel.

For each channel, there's a JavaScript component on the front-end (status_channel.js) and a Ruby component on the back-end (status_channel.rb).

When a user arrives on our website, the JavaScript runs, and connects the user to the Status Channel (via consumer.subscriptions.create).

// status_channel.js

import consumer from "./consumer"

consumer.subscriptions.create("StatusChannel", {
  connected() {
    // Subscription is ready for use on the server
  },

  disconnected() {
    // Subscription has been terminated by the server
  },

  received(data) {
    // There's incoming data on the websocket
  }
});

Quick side note: because we're using Turbolinks, we don't really need to worry about this code rerunning on each page load. (Turbolinks swaps the body content when moving between pages and does not disconnect/reconnect the WebSocket.)

Now that the user is connected to the channel on the front-end, we receive a notice on the backend, in the subscribed method in our status_channel.rb.

class StatusChannel < ApplicationCable::Channel
  # current_user has subscribed
  def subscribed
    stream_from 'StatusChannel'
    current_user.set_online
  end

  # runs whenever this subscription closes (e.g. a tab is closed)
  def unsubscribed
    current_user.set_offline
  end
end

Another quick side note: the current_user in subscribed is different from the Devise helper current_user, although they can refer to the same user (yes, this is confusing). This one is set up in the connection.rb file for use specifically in our Channels.

Subscribed

When in the subscribed method, we’ll mark this user as online (current_user.set_online).

After marking the user as online (in our database, but could also be in Redis), we want to broadcast this information to all other online users.

We do this with something like:

class User < ApplicationRecord
  # ...

  def set_online
    # inside the model, the record itself is the user (no current_user here)
    update(online: true)

    # the message is passed as an explicit hash
    ActionCable.server.broadcast(
      'StatusChannel',
      { action: 'connected', user: username }
    )
  end
end

This broadcast will be received by all users who are receiving streams from the StatusChannel — this is why we had the stream_from 'StatusChannel' inside the subscribed method above.

These broadcasts will be delivered to the received(data) function on each user’s frontend (we saw this earlier in status_channel.js).
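To make the fan-out concrete, here is a toy, in-memory model in plain Ruby. `ToyPubSub` is a hypothetical class, not ActionCable's API: it only mimics the idea that `stream_from` registers interest in a named stream, and that a broadcast to that stream is handed to every registered subscriber (in the real app, the `received(data)` callback in each browser).

```ruby
# Toy stand-in for ActionCable's pub/sub (hypothetical, NOT the real API).
class ToyPubSub
  def initialize
    # stream name => list of subscriber callbacks
    @streams = Hash.new { |hash, name| hash[name] = [] }
  end

  # Roughly the role of stream_from: register a callback for a stream.
  def stream_from(name, &callback)
    @streams[name] << callback
  end

  # Roughly the role of a broadcast: call every subscriber on the stream
  # and report how many were reached.
  def broadcast(name, message)
    @streams[name].each { |callback| callback.call(message) }
    @streams[name].size
  end
end

pubsub = ToyPubSub.new
inboxes = Hash.new { |hash, user| hash[user] = [] }

# Three online users, each subscribed to the shared status stream.
%w[alice bob carol].each do |user|
  pubsub.stream_from('StatusChannel') { |data| inboxes[user] << data }
end

# A fourth user comes online; everyone on the stream hears about it.
reached = pubsub.broadcast('StatusChannel', { action: 'connected', user: 'dave' })
```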

This setup is somewhat similar to the default Rails example in their docs, if you want to take a look.

Multi-tabs

Since our app is SPA-like (single-page app), we don't expect a user to open multiple tabs or windows, but what happens if he does? Does he re-subscribe every time?

Unfortunately, yes, he does. This is a pain to deal with.

As we saw before, the stream_from 'StatusChannel' line says that we want this newly online user to receive updates from the StatusChannel. This allows us to broadcast to all online users when other users log on and log off.

So for every new tab a user opens, the subscribed method runs. This may or may not be a problem, depending on how we set up our set_online instance method.

For us, this isn’t much of a problem — we just make sure a user appears online to all the other users.

Problem

However, if a user closes one of his tabs, we have an issue. The unsubscribed method runs every time a tab (or window) is closed, and it calls set_offline on the current_user.

The problem is — set_offline makes the user appear offline to all other users. If a user has 5 tabs open, and closes one tab, he still has 4 open. Obviously in this case we don’t want him to appear offline.
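The failure is easy to reproduce outside Rails. This is a hypothetical plain-Ruby model (`NaivePresence` is not from the app) in which each tab is its own subscription and `unsubscribed` unconditionally marks the user offline:

```ruby
# Hypothetical model of the naive setup: one subscription per tab, and
# every unsubscribe marks the user offline regardless of other tabs.
class NaivePresence
  def initialize
    @online = {}
  end

  def subscribed(user)   # runs once per opened tab
    @online[user] = true
  end

  def unsubscribed(user) # runs once per closed tab
    @online[user] = false
  end

  def online?(user)
    @online.fetch(user, false)
  end
end

presence = NaivePresence.new
5.times { presence.subscribed('eric') } # five tabs open
presence.unsubscribed('eric')           # close just one of them

presence.online?('eric') # => false, even though 4 tabs are still open
```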

So, ultimately, before setting a user offline, we want to know if he has other connections open.

Wrong Solution #1

Before we set a user offline in unsubscribed, can we check to see if he has other tabs open?

Perhaps, something like:

def unsubscribed
  # connections expose the identifier set up in connection.rb (current_user)
  connections = ActionCable.server.connections
    .count { |c| c.current_user == current_user }
  return if connections > 0

  current_user.set_offline
end

If we find there's still a connection alive (connections > 0), a guard clause returns from the method before setting the user offline.

Unfortunately — this doesn’t work.

Calling ActionCable.server.connections is unpredictable and often undercounts, because it only sees the connections held in memory by the server process it's called in. More info here.
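Here's a toy illustration of how a local lookup can undercount, assuming each server instance only sees the connections it holds itself. The two arrays stand in for two hypothetical servers, and `Conn` is a made-up struct mimicking a connection that exposes `current_user`:

```ruby
# Each array stands in for one server's local, in-memory connection list.
Conn = Struct.new(:current_user)

# Eric's tab on server A just closed; his other tab is held by server B.
server_a = [Conn.new('jane')]
server_b = [Conn.new('eric')]

def open_connections(connections, user)
  connections.count { |c| c.current_user == user }
end

# Running the check on server A only sees A's list...
seen_from_a = open_connections(server_a, 'eric') # => 0, so A marks him offline
# ...while the full picture says he is still connected.
actual = open_connections(server_a + server_b, 'eric') # => 1
```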

Wrong Solution #2

Since we're using Redis, we can query the connections there, so we aren't bound by that limitation. This is detailed here.

Redis won’t show us duplicate connections though — it will only tell us if the user in question is connected (once or a million times).

To get all the active connections, we can call Redis.new.pubsub('channels', 'action_cable/*'), and it will return an array that looks something like this:

[
"action_cable/Z2lkOi8vYm91he93jiu3j5IvMQ", 
"action_cable/kf9fjskjdf91bmNlL1VzZXIvMQ",
"action_cable/LfJ84jldgfjsd9898798ZXIvMQ"
]

Each of these items represents a connected user. The values are encoded, so we’ll have to decode them.

Redis.new.pubsub('channels', 'action_cable/*')
.map { |c| Base64.decode64(c.split('/').last).split('/').last.to_i }

We map through the array, and for each connection, we split the encoded string (c) at the / and grab the last part.

This will give us something like “Z2lkOi8vYm91he93jiu3j5IvMQ”.

Then we decode that part (Base64.decode64), which will yield something like:

"gid://bounce/User/1"

Now we split this part at each /, and grab the last one, which represents the user_id.

All that’s left is converting this string to an integer, so we can find this user in the database.
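The same decoding steps as a standalone, testable helper. `user_ids_from_channels` is a hypothetical name. It assumes each connection is identified by a single GlobalID such as `gid://bounce/User/1`, encoded as URL-safe Base64 without padding (which appears to be how GlobalID's `to_param` encodes it). `Base64.urlsafe_decode64` accepts unpadded input and handles the `-` and `_` characters, which `Base64.decode64` would silently skip.

```ruby
require 'base64'

# Hypothetical helper: turn ActionCable's internal Redis channel names
# ("action_cable/<encoded GlobalID>") into user ids.
# Assumes a single GlobalID identifier, URL-safe Base64 without padding.
def user_ids_from_channels(channels)
  channels.map do |channel|
    encoded = channel.split('/').last      # "Z2lkOi8vYm91bmNlL1VzZXIvMQ"
    gid = Base64.urlsafe_decode64(encoded) # "gid://bounce/User/1"
    gid.split('/').last.to_i               # 1
  end
end

user_ids_from_channels(['action_cable/Z2lkOi8vYm91bmNlL1VzZXIvMQ']) # => [1]
```

In the app, the argument would be the array returned by Redis.new.pubsub('channels', 'action_cable/*').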

So let’s get back to the unsubscribed method:

def unsubscribed
	connections = Redis.new.pubsub('channels', 'action_cable/*')
		.map do |c| 
			Base64.decode64(c.split('/').last)
			.split('/').last.to_i 
		end
	return if connections.include? current_user.id
	
	current_user.set_offline
end

This is a similar setup to the previous wrong solution: we check whether the connections array includes the current_user's id. If it does, a guard clause returns from the method before setting the user offline.

This seems promising, but there’s a catch.

The Redis connection for the unsubscribed user is still alive while in the unsubscribed method.

In other words — you’re using the website and only have 1 tab open. You close that tab. The unsubscribed method fires. We check to see if you’re still online using this Redis query. It says you are still online!

Wrong Solution #3 (Slight variation of #2)

Well, it seems that the Redis connection isn’t severed until after the unsubscribed method runs.

The good news is that we have an after_unsubscribe callback available to us in the ApplicationCable::Channel class.

So maybe, if we wait until after the unsubscribed method runs, our Redis query will give us accurate results (i.e. that the connection has been disconnected).

This will look something like:

class StatusChannel < ApplicationCable::Channel
  after_unsubscribe :handle_offline

  def subscribed
    # subscribed stuff here
  end

  def unsubscribed
  end

  private

  def handle_offline
    connections = Redis.new.pubsub('channels', 'action_cable/*')
      .map do |c|
        Base64.decode64(c.split('/').last)
          .split('/').last.to_i
      end
    return if connections.include? current_user.id

    current_user.set_offline
  end
end

The bad news is that the Redis connection is still not severed in the after_unsubscribe callback.

Only after the after_unsubscribe callback runs does the Redis connection disappear.

I don't really understand why this is the case, but my best guess is that Redis doesn't drop the connection until Rails is done doing its thing, and in the after_unsubscribe callback, Rails is still doing its thing. (Again, this is an unsubstantiated guess.)

Wrong Solution #4

We’ve had some bad luck trying to query the WebSocket connections. Either they’ve been inaccurate, or unrepresentative of the current state of things.

What if we decided, instead, to keep track of the number of connections manually?

One idea is to add a connection_count attribute to the User model with a default value of 0.

Every time a user hits subscribed in the StatusChannel, we add 1 to the connection count.

When he unsubscribes, we subtract 1 from the connection_count.

Then, if his connection_count is still positive (i.e. not zero), we can infer that he has other connections open, so we don't set him offline. If the connection_count is 0, we set him offline.
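A minimal sketch of the counter idea, with a hypothetical plain-Ruby `CountedUser` standing in for the `User` record and its `connection_count` column:

```ruby
# Hypothetical stand-in for a User record with a connection_count column.
class CountedUser
  attr_reader :connection_count

  def initialize
    @connection_count = 0
    @online = false
  end

  # StatusChannel#subscribed: one more open tab.
  def subscribed
    @connection_count += 1
    @online = true
  end

  # StatusChannel#unsubscribed: one fewer tab; offline only at zero.
  def unsubscribed
    @connection_count -= 1
    @online = false if @connection_count.zero?
  end

  def online?
    @online
  end
end

user = CountedUser.new
3.times { user.subscribed } # three tabs
user.unsubscribed
user.online? # => true, two tabs left
2.times { user.unsubscribed }
user.online? # => false
```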

While this seems like a decent idea, it can break down in practice, and when it breaks down, it’s a major problem.

If for whatever reason, the connection_count tally gets miscalculated, and it’s off by as little as 1, that user will never be reliably marked online or offline.

It could get miscalculated for a variety of reasons:

  • if the server crashes and unsubscribed never runs (so those connections are never subtracted)
  • if a user's browser crashes and restarts, refreshing multiple tabs at once: several tabs then read and write the same attribute of the same record (user.connection_count) at the same time, and one tab's update can overwrite another's, leaving the count wrong.
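Both failure modes can be replayed deterministically. In this hypothetical sketch a plain Hash plays the user's database row, and the second case interleaves two tabs' read-then-write updates by hand, which is the classic lost-update race:

```ruby
row = { connection_count: 0 } # stands in for the user's database row

# 1. Server crash: two tabs subscribe, then the process dies, so the
#    matching unsubscribed calls (and their decrements) never happen.
2.times { row[:connection_count] += 1 }
after_crash = row[:connection_count] # => 2, even after those tabs are gone

# 2. Lost update: two tabs reconnect at the same moment and each
#    reads the count before either one writes it back.
row[:connection_count] = 0
read_by_tab1 = row[:connection_count]
read_by_tab2 = row[:connection_count]
row[:connection_count] = read_by_tab1 + 1
row[:connection_count] = read_by_tab2 + 1
after_race = row[:connection_count] # => 1, though two tabs are open
```

An atomic SQL increment (UPDATE users SET connection_count = connection_count + 1 ...) would avoid the second case, but nothing in the counter itself repairs the first.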

Wrong Solution #5

Let’s go back to our idea of putting the Redis query into the after_unsubscribe method.

What if we set a delayed job in there, and waited a few seconds before checking Redis? The theory is that Rails would be done closing the connection, allowing Redis to drop it too, so when the job runs (let's say 5 seconds later), things should be settled.

class StatusChannel < ApplicationCable::Channel
  after_unsubscribe :handle_offline

  def subscribed
    # subscribed stuff here
  end

  def unsubscribed
  end

  private

  def handle_offline
    CheckRedisConnectionsJob
      .set(wait: 5.seconds)
      .perform_later(current_user)
  end
end

class CheckRedisConnectionsJob < ApplicationJob
  queue_as :default

  def perform(user)
    connections = Redis.new.pubsub('channels', 'action_cable/*')
      .map do |c|
        Base64.decode64(c.split('/').last)
          .split('/').last.to_i
      end
    return if connections.include? user.id

    user.set_offline
  end
end

This works okay, but it isn’t ideal.

For starters, the 5-second delay is arbitrary. What if the connection doesn't close in time? It should work most of the time, but it isn't guaranteed.
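The arbitrariness is easy to state in code. In this hypothetical timing model the user closed their last tab at t = 0, the job checks Redis after `delay` seconds, and Redis drops the channel at `cleanup_at` seconds:

```ruby
# Hypothetical timing model (seconds). The user's last tab closed at t = 0.
# Returns true if the delayed job ends up marking the user offline.
def marks_offline?(delay:, cleanup_at:)
  channel_still_listed = cleanup_at > delay # Redis hasn't caught up yet
  !channel_still_listed                     # otherwise the guard clause returns early
end

marks_offline?(delay: 5, cleanup_at: 0.5) # => true
marks_offline?(delay: 5, cleanup_at: 7)   # => false: stuck "online"
```

Nothing retries in the second case, so the user stays online until something else changes their status.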

Also, and more importantly, we don’t always get reliable information from the Redis query in a variety of edge cases, usually around multiple page refreshes, or server restarts.

It also seems to go against the pub/sub philosophy: your pub/sub tool shouldn't really keep track of the connections; its main purpose is to publish to them.

If you want to keep track of connections, it’s your application’s responsibility. Here’s a potential setup laid out in pseudo-code.
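For illustration only, here is one hypothetical shape application-side tracking could take (it is not the linked pseudo-code). Instead of a counter, it stores each connection's id with a last-seen time, so repeated disconnects are harmless and connections from a crashed server eventually expire. `PresenceRegistry`, the TTL, and the idea of periodic heartbeats are all assumptions; a real version would live in Redis or the database rather than in memory.

```ruby
# Hypothetical in-memory presence registry keyed by connection id.
class PresenceRegistry
  def initialize(ttl:)
    @ttl = ttl # seconds a connection may go without a heartbeat
    # user => { conn_id => last_seen }
    @connections = Hash.new { |hash, user| hash[user] = {} }
  end

  # Called on connect and on every heartbeat; refreshes last_seen.
  def touch(user, conn_id, now)
    @connections[user][conn_id] = now
  end

  # Idempotent: removing an unknown id does nothing.
  def disconnect(user, conn_id)
    @connections[user].delete(conn_id)
  end

  # Drops stale entries first, then checks whether anything is left.
  def online?(user, now)
    @connections[user].delete_if { |_id, last_seen| now - last_seen > @ttl }
    @connections[user].any?
  end
end

registry = PresenceRegistry.new(ttl: 30)
registry.touch('eric', 'tab-1', 0)
registry.touch('eric', 'tab-2', 0)
registry.disconnect('eric', 'tab-1')
registry.disconnect('eric', 'tab-1') # duplicate close: no harm done
registry.online?('eric', 1)          # => true, tab-2 is still open
```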

So, unfortunately, it seems all of these attempts at capturing the connections were misguided.

Solution #6 (Current winner as of Sept 2020)

This is the current solution for now. It seems to work reliably, but if you (yes, you, the person reading this) can offer a better 7th solution, please email me :)

First, let's create a separate channel called PresenceChannel. It doesn't do much.

class PresenceChannel < ApplicationCable::Channel
  def subscribed
    stream_from 'PresenceChannel'
    stream_for current_user
  end

  def unsubscribed
  end
end

In our StatusChannel, we have a similar setup to our previous failed solutions: an after_unsubscribe callback enqueues a job that waits 5 seconds.

class StatusChannel < ApplicationCable::Channel
  after_unsubscribe :handle_offline

  def subscribed
    # subscribed stuff
  end

  def unsubscribed
  end

  private

  def handle_offline
    HandleOfflineJob
      .set(wait: 5.seconds)
      .perform_later(current_user)
  end
end

In this job, we check if the user is still_connected?. This is a new instance method we haven't seen yet. Note that we queue the job as critical so it can jump in front of less important work, if need be (assuming your queue backend gives the critical queue priority).

class HandleOfflineJob < ApplicationJob
  queue_as :critical

  def perform(user)
    return if user.still_connected? # might have other tabs open

    user.set_offline
  end
end

Here’s our still_connected? method.

class User < ApplicationRecord
  # ...
  # lots of stuff
  # ...

  def still_connected?
    still_there = PresenceChannel
      .broadcast_to(self, { action: 'presence-check' })

    # broadcast_to can return nil, so only trust a positive Integer
    return true if still_there.is_a?(Integer) && still_there.positive?

    false
  end
end

So, after waiting 5 seconds, we try to broadcast_to the user who triggered the unsubscribed method in the first place.

Broadcasting to an individual user is available to us, because in our PresenceChannel above, we set stream_for current_user.

The action of the PresenceChannel broadcast is arbitrary (action: 'presence-check'). broadcast_to simply requires a message argument in addition to the destination (in our case self, which is just the user who closed his browser).
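Why does a broadcast to self reach only this user's tabs? Each user gets their own stream name. The sketch below re-creates that name in plain Ruby under an assumed format, `<channel name>:<URL-safe Base64 GlobalID>`, which is what stream_for and broadcast_to appear to build for a PresenceChannel and a User record. `per_user_stream` is a hypothetical helper, not a Rails method.

```ruby
require 'base64'

# Hypothetical re-creation of a per-user broadcasting name.
# Assumed format: "<channel name>:<URL-safe Base64 of the GlobalID>".
def per_user_stream(channel_name, gid)
  "#{channel_name}:#{Base64.urlsafe_encode64(gid, padding: false)}"
end

per_user_stream('presence', 'gid://bounce/User/1')
# => "presence:Z2lkOi8vYm91bmNlL1VzZXIvMQ"
```

Every tab that user 1 opens subscribes to that same name, while other users land on different names, so a broadcast to it can only be heard by user 1's tabs.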

Sending a broadcast to an individual user returns the status of that broadcast. I haven’t found clear documentation of why and what is returned, but in my experiments, here’s how it seems to work:

  • If a broadcast is received by 0 users, it returns 0.
  • If a broadcast is received by 1 user, it returns 1.
  • If a broadcast is received by more than 1 user, it returns 2.
  • If a broadcast fails, it returns nil.

What that means for our application:

  • If a user has fully gone offline (0 tabs open), we should expect to receive a 0 when trying to broadcast to him.
  • If a user has 1 tab still open, we should expect to receive a 1 when broadcasting.
  • If a user has 2 or more tabs open, we should expect to receive a 2.

To recap: our still_connected? method tries to broadcast to the user whose WebSocket just fired the unsubscribed method. If the broadcast is received, we infer that he still has tabs open!

Inside the still_connected? method, we first check that the broadcast returns an integer, because we don't want a nil value to break our method.

Then we check whether the response is positive. If we get a 1 or 2 (both positive), then the user still has one or more tabs open somewhere that are receiving the broadcast.

If we get a 0 (not positive), there are no tabs open to receive broadcasts, so the guard clause doesn't trigger and we can safely set the user offline.
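The guard in still_connected? boils down to a small predicate, shown here as a hypothetical standalone method so the four observed return values can be checked directly. As for why the values look the way they do, one plausible explanation (an assumption, not confirmed above): with the Redis adapter, the broadcast may be returning the reply of Redis PUBLISH, which the Redis docs describe as the number of clients that received the message. Since ActionCable's Redis adapter seems to use one subscriber connection per server process, that number would count server processes with a listener rather than tabs, which could explain why it tops out at a small value like 2. Either way, only "positive or not" matters for this check.

```ruby
# Hypothetical standalone version of the guard in User#still_connected?.
# Only a positive Integer counts as "someone is still listening";
# 0 and nil (a failed broadcast) both mean nobody received it.
def still_listening?(broadcast_result)
  broadcast_result.is_a?(Integer) && broadcast_result.positive?
end

[0, 1, 2, nil].map { |result| still_listening?(result) }
# => [false, true, true, false]
```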

Celebrate?

Again, I'm not thrilled with the 5-second delay, as it seems somewhat hacky, but so far in testing it seems to work just fine. If this is the last sentence you are reading, I haven't found a better solution, or I just forgot to update this blog post :)