Summary

Recently I took over a Web push gateway. Its main job is to accept front-end connections over the Socket.IO protocol and push messages to the corresponding front-end. While debugging it, we ran into a problem where a client could not connect successfully, and this post summarizes what we found.

The Golang library used in this article is https://github.com/googollee/go-socket.io,
and the Node library is https://github.com/socketio/socket.io.

Reproducing the Problem

The actual problem was that when I used Postman (yes, Postman now supports the WebSocket and Socket.IO protocols) to connect to our gateway, the connection stayed in the Connecting state, as the figure shows:

Figure 1: Socket.IO stays in the Connecting state

A colleague then found that if we remove the query param from the URL, the connection succeeds:

Figure 2: the connection succeeds after removing the query param

Problem Investigation

After I found that removing the query param allowed me to connect normally, I felt that the problem was probably not in my own code, but, out of rigour, I had to check step by step. First, I needed to rule out my business code, so I wrote the simplest possible Hello World program using the same version of the Golang Socket.IO library, and it reproduced the problem. That eliminated my business code as the cause.

Then I noticed that the project was using an old version of the library (1.0.1), so I upgraded to the latest version (1.7.0) and found that the interfaces had changed (it's worth complaining about this: how can you change the interfaces within the same major version?). That did not matter much. I adjusted the Hello World code, tried again with Postman, and could not reproduce the problem, which means the latest version behaves normally.

Distrusting a library that changes its interfaces within the same major version, I wanted to see how the official implementation handles the protocol (the official Socket.IO documentation does not describe this clearly). So I tried the Node Socket.IO library, version 2.5.0, and found that the problem could not be reproduced there either, i.e., it connects normally. So we can finally conclude that the problem lies in the old version (1.0.1) of the Golang library.

Problem Analysis

Since we know that the problem is in the version of the library we are using, let's trace the code. The problem turns out to lie in the handling of the Connect packet. The server seems to process the packet normally, but when it sends the response back, Postman does not consider it valid and ignores the Connect response packet, so the connection stays in the Connecting state.

The problematic code is here:

    [root@liqiang.io]# cat parser.go
    if next[0] == '/' {
        path, err := reader.ReadBytes(',')
        if err != nil && err != io.EOF {
            return err
        }
        pathLen := len(path)
        if pathLen == 0 {
            return fmt.Errorf("invalid packet")
        }
        if err == nil {
            path = path[:pathLen-1]
        }
        v.NSP = string(path)
        if err == io.EOF {
            return nil
        }
    }
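
To see concretely what this branch produces, here is a minimal standalone sketch that repeats the same steps on a made-up payload. The function name readNamespace and the sample payload "/chat?token=abc," are assumptions for illustration only and are not taken from the library.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readNamespace repeats the steps of the excerpt above on a raw string:
// it reads up to the first ',' and keeps everything before it.
// Hypothetical helper, not part of go-socket.io.
func readNamespace(raw string) (string, error) {
	reader := bufio.NewReader(strings.NewReader(raw))
	path, err := reader.ReadBytes(',')
	if err != nil && err != io.EOF {
		return "", err
	}
	if len(path) == 0 {
		return "", fmt.Errorf("invalid packet")
	}
	if err == nil {
		// A ',' was found: drop it from the end.
		path = path[:len(path)-1]
	}
	return string(path), nil
}

func main() {
	// Made-up namespace portion of a Connect packet.
	nsp, err := readNamespace("/chat?token=abc,")
	if err != nil {
		panic(err)
	}
	fmt.Println(nsp) // prints "/chat?token=abc"
}
```

Nothing in these steps looks for a '?', so whatever follows the path up to the comma ends up in the namespace string.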

When parsing Connect packets, the Golang implementation takes the requested Path as the Namespace. Note that this Path is raw and unprocessed, so it still carries the query param, which means the query param becomes part of the Namespace. In the standard Node implementation, however, the Namespace is the parsed URL Path with the query param removed, as seen here:

    [root@liqiang.io]# cat lib/client.js
    Client.prototype.ondecoded = function(packet) {
      if (parser.CONNECT == packet.type) {
        this.connect(url.parse(packet.nsp).pathname, url.parse(packet.nsp, true).query);
      } else {
        var socket = this.nsps[packet.nsp];
        if (socket) {
          process.nextTick(function() {
            socket.onpacket(packet);
          });
        } else {
          debug('no socket for namespace %s', packet.nsp);
        }
      }
    };

You can see that the Namespace is resolved here as url.parse(packet.nsp).pathname, without the query param. This is the difference between the Golang and Node implementations.
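
For comparison, the same normalization can be written in Go with the standard net/url package. This is only a sketch, assuming the namespace string has the shape of a URL path with an optional query string. splitNamespace is a hypothetical helper, not a function of go-socket.io, and this is not necessarily how the newer library version fixed the issue.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitNamespace separates the path from the query string, similar in
// spirit to url.parse(nsp).pathname and url.parse(nsp, true).query in
// the Node code. Hypothetical helper for illustration.
func splitNamespace(raw string) (string, url.Values, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", nil, err
	}
	return u.Path, u.Query(), nil
}

func main() {
	// Made-up namespace carrying a query param.
	nsp, query, err := splitNamespace("/chat?token=abc")
	if err != nil {
		panic(err)
	}
	fmt.Println(nsp)                // prints "/chat"
	fmt.Println(query.Get("token")) // prints "abc"
}
```

With this split, the namespace used for lookups and replies is "/chat" alone, while the query values remain available separately.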

Some questions

The first question I had was this: our front-end connects normally with a query param, so if the back-end has this problem, why is the front-end able to handle it? Again, the answer has to come from the source code, and the fact is that the client side of the code also has special logic to deal with this.

So why doesn't Postman handle it correctly? Postman is not open source, and I could not tell from its documentation whether it uses an open source library or its own implementation. Judging from the symptom, though, it apparently does not handle this Namespace case correctly either. So I created a ticket to follow up.
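
To make the suspected failure mode concrete, here is a toy model of two ways a client might match a Connect reply against the namespace it asked for. Both functions are hypothetical: they are not taken from Postman or from the Socket.IO client, and the sample namespaces are made up. The sketch only illustrates why a reply whose namespace still carries the query param could be ignored by one client and accepted by another.

```go
package main

import (
	"fmt"
	"strings"
)

// strictMatch models a client that accepts a Connect reply only when the
// namespace is exactly the one it requested. Hypothetical.
func strictMatch(requested, replied string) bool {
	return requested == replied
}

// tolerantMatch models a client that ignores any query string on either
// side before comparing. Hypothetical.
func tolerantMatch(requested, replied string) bool {
	return stripQuery(requested) == stripQuery(replied)
}

// stripQuery returns s up to, but not including, the first '?'.
func stripQuery(s string) string {
	if i := strings.IndexByte(s, '?'); i >= 0 {
		return s[:i]
	}
	return s
}

func main() {
	// The server echoes a namespace that still carries the query param.
	fmt.Println(strictMatch("/chat", "/chat?token=abc"))   // prints "false"
	fmt.Println(tolerantMatch("/chat", "/chat?token=abc")) // prints "true"
}
```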

Keywords