net: implement poll #14336

pull pstratem wants to merge 5 commits into bitcoin:master from pstratem:2018-09-24-socket-handler-poll changing 5 files +172 −43
  1. pstratem commented at 10:37 PM on September 26, 2018: contributor

    Implement poll() on systems which support it properly.

    This eliminates the restriction on maximum socket descriptor number.

  2. fanquake added the label P2P on Sep 26, 2018
  3. pstratem force-pushed on Sep 27, 2018
  4. pstratem force-pushed on Sep 27, 2018
  5. pstratem force-pushed on Sep 27, 2018
  6. DrahtBot commented at 2:37 AM on September 27, 2018: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    No conflicts as of last run.

  7. pstratem force-pushed on Sep 27, 2018
  8. DrahtBot cross-referenced this on Sep 27, 2018 from issue net: Refactor message parsing (CNetMessage), adds flexibility by jonasschnelli
  9. DrahtBot cross-referenced this on Sep 27, 2018 from issue Add p2p layer encryption with ECDH/ChaCha20Poly1305 by jonasschnelli
  10. pstratem force-pushed on Sep 27, 2018
  11. pstratem force-pushed on Sep 27, 2018
  12. pstratem force-pushed on Sep 28, 2018
  13. pstratem force-pushed on Sep 28, 2018
  14. pstratem force-pushed on Sep 29, 2018
  15. pstratem commented at 12:51 AM on September 29, 2018: contributor

    It seems that lots of the functional tests have races cause this pull request keeps triggering random seeming failures.

  16. in src/net.cpp:1392 in 7fd976b2f0 outdated
    1517 | +    if (interruptNet)
    1518 | +        return;
    1519 | +
    1520 | +    std::vector<struct pollfd> pollfds;
    1521 | +    struct pollfd pollfd;
    1522 | +    memset(&pollfd, 0, sizeof(struct pollfd));
    


    practicalswift commented at 7:18 AM on September 29, 2018:

    struct pollfd pollfd = {} instead of memset?


    pstratem commented at 3:50 AM on November 11, 2018:

    done

  17. in src/net.cpp:1470 in 7fd976b2f0 outdated
    1605 | -            for (CNode* pnode : vNodesCopy)
    1606 | -                pnode->AddRef();
    1607 | +            LOCK(pnode->cs_hSocket);
    1608 | +            if (pnode->hSocket == INVALID_SOCKET)
    1609 | +                continue;
    1610 | +            recvSet = recv_set.count(pnode->hSocket);
    


    practicalswift commented at 7:20 AM on September 29, 2018:

    recvSet = recv_set.count(pnode->hSocket) > 0 to avoid implicit conversion from unsigned long to bool. Applies also to the following to lines. Explicit is better than implicit :-)


    pstratem commented at 3:50 AM on November 11, 2018:

    done

  18. in src/net.cpp:1382 in 7fd976b2f0 outdated
    1507 | +void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    1508 | +{
    1509 | +    std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    1510 | +    GenerateSelectSet(recv_select_set, send_select_set, error_select_set);
    1511 | +
    1512 | +    if (recv_select_set.empty() && send_select_set.empty() and error_select_set.empty()) {
    


    practicalswift commented at 7:22 AM on September 29, 2018:

    Use && instead of and :-)


    pstratem commented at 3:50 AM on November 11, 2018:

    done

  19. in src/net.cpp:1424 in 7fd976b2f0 outdated
    1540 | +    }
    1541 | +
    1542 | +    if(poll(&pollfds[0], pollfds.size(), 50) < 0)
    1543 | +        LogPrint(BCLog::NET, "poll failure %d %s\n", errno, strerror(errno));
    1544 | +
    1545 | +    for (struct pollfd pollfd : pollfds) {
    


    practicalswift commented at 7:23 AM on September 29, 2018:

    Use another variable name: pollfd shadows the already existing local variable with the same name :-)


    pstratem commented at 3:50 AM on November 11, 2018:

    done

  20. in src/net.cpp:1326 in d9ae35592c outdated
    1346 | @@ -1347,8 +1347,6 @@ void CConnman::SocketHandler()
    1347 |  
    1348 |      int nSelect = select(have_fds ? hSocketMax + 1 : 0,
    1349 |                           &fdsetRecv, &fdsetSend, &fdsetError, &timeout);
    1350 | -    if (interruptNet)
    


    jimpo commented at 11:16 PM on September 29, 2018:

    Move this into SocketHandler after the SocketEvents call? Or this method cool return a bool, true if it completes, false if it is interrupted.

    NOTE: this was fixed in the commit maybe fix things?

  21. in src/net.cpp:1393 in 5ab601601d outdated
    1380 | +void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    1381 | +{
    1382 | +    std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    1383 | +    GenerateSelectSet(recv_select_set, send_select_set, error_select_set);
    1384 | +
    1385 | +    std::vector<struct pollfd> pollfds;
    


    jimpo commented at 11:25 PM on September 29, 2018:

    pollfds.reserve(recv_select_set.size() + send_select_set.size() + error_select_set.size())

  22. in src/net.cpp:1424 in 5ab601601d outdated
    1402 | +        pollfd.fd = hSocket;
    1403 | +        pollfd.events = POLLERR|POLLHUP;
    1404 | +        pollfds.push_back(pollfd);
    1405 | +    }
    1406 | +
    1407 | +    if(poll(&pollfds[0], pollfds.size(), 50) < 0)
    


    jimpo commented at 11:26 PM on September 29, 2018:

    nit: vectors now have .data(), which returns &pollfds[0] in a more object-oriented way.


    jimpo commented at 11:31 PM on September 29, 2018:

    nit: Make the 50 a named constant along with timeout.tv_usec in the select impl.

  23. in src/net.cpp:1331 in 7fd976b2f0 outdated
    1327 |  
    1328 |      std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    1329 |      GenerateSelectSet(recv_select_set, send_select_set, error_select_set);
    1330 |  
    1331 | +    if (recv_select_set.empty() && send_select_set.empty() && error_select_set.empty()) {
    1332 | +        interruptNet.sleep_for(std::chrono::milliseconds(50));
    


    jimpo commented at 11:34 PM on September 29, 2018:

    nit: Replace 50 with timeout.tv_usec/1000

  24. in src/net.cpp:1406 in 7fd976b2f0 outdated
    1365 | -                FD_SET(i, &fdsetRecv);
    1366 | -        }
    1367 | +        int nErr = WSAGetLastError();
    1368 | +        LogPrintf("socket select error %s\n", NetworkErrorString(nErr));
    1369 | +        for (unsigned int i = 0; i <= hSocketMax; i++)
    1370 | +            FD_SET(i, &fdsetRecv);
    


    jimpo commented at 11:37 PM on September 29, 2018:

    I see this is the same as the code currently, but it seems like this should be fdsetError instead of fdsetRecv, even though they are handled exactly the same.


    pstratem commented at 5:35 PM on October 1, 2018:

    I don't think it should be there at all really. select() failures dont mean the sockets themselves have failed, but i guess it could mean we made a mistake and called select() with a closed socket?


    jimpo commented at 9:53 PM on October 2, 2018:

    Yeah, then iterating through each one and calling recv would identify which peer to disconnect. Though, if it's fdsetError and not fdsetRecv, it probably shouldn't disconnect when nBytes == 0.

  25. jimpo commented at 11:39 PM on September 29, 2018: contributor

    Concept ACK

  26. pstratem force-pushed on Oct 1, 2018
  27. pstratem force-pushed on Oct 1, 2018
  28. pstratem force-pushed on Oct 1, 2018
  29. laanwj commented at 7:37 PM on October 2, 2018: member

    Concept ACK, will review in detail when my brain works again.

  30. pstratem force-pushed on Oct 2, 2018
  31. theuni commented at 8:19 PM on October 2, 2018: member

    Concept ACK. Will also review shortly.

  32. jnewbery cross-referenced this on Oct 8, 2018 from issue [tests] Remove rpc_zmq.py by jnewbery
  33. pstratem force-pushed on Oct 16, 2018
  34. pstratem force-pushed on Oct 16, 2018
  35. in src/net.cpp:1386 in 5f99a7861c outdated
    1377 | @@ -1377,6 +1378,58 @@ void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_s
    1378 |          if(FD_ISSET(hSocket, &fdsetError))
    1379 |              error_set.insert(hSocket);
    1380 |  }
    1381 | +#else
    1382 | +void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    1383 | +{
    1384 | +    std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    1385 | +    GenerateSelectSet(recv_select_set, send_select_set, error_select_set);
    


    jamesob commented at 8:10 PM on October 16, 2018:

    Could be worth having GenerateSelectSet return a bool that indicates whether or not the sets have any content so that we can avoid the .empty() checks below (and the repetition necessary for the legacy select impl).

  36. in src/net.cpp:1413 in 5f99a7861c outdated
    1407 | +            pollfds.push_back(pollfd);
    1408 | +        }
    1409 | +
    1410 | +        for (SOCKET hSocket : error_select_set) {
    1411 | +            pollfd.fd = hSocket;
    1412 | +            pollfd.events = POLLERR|POLLHUP;
    


    jamesob commented at 8:13 PM on October 16, 2018:

    This is ignored and reported on revents in any case, no? I guess no harm in being explicit.


    pstratem commented at 11:55 PM on October 18, 2018:

    I'm reusing the struct pollfd so it's necessary to overwrite the previous value.

  37. in src/net.cpp:1417 in 5f99a7861c outdated
    1412 | +            pollfd.events = POLLERR|POLLHUP;
    1413 | +            pollfds.push_back(pollfd);
    1414 | +        }
    1415 | +    }
    1416 | +
    1417 | +    if(poll(pollfds.data(), pollfds.size(), 50) < 0)
    


    jamesob commented at 8:19 PM on October 16, 2018:

    Should we make use of SELECT_TIMEOUT_MILLISECONDS here?


    jamesob commented at 8:19 PM on October 16, 2018:

    nit: space after if + braces


    pstratem commented at 11:55 PM on October 18, 2018:

    yeah

  38. in src/net.cpp:1311 in 5f99a7861c outdated
    1307 | @@ -1308,6 +1308,7 @@ void CConnman::GenerateSelectSet(std::set<SOCKET> &recv_set, std::set<SOCKET> &s
    1308 |      }
    1309 |  }
    1310 |  
    1311 | +#ifdef WIN32
    


    jamesob commented at 8:26 PM on October 16, 2018:

    Apparently poll has been flaky on previous versions of MacOS. Do we want to fall back to select on Darwin systems just to be safe?


    pstratem commented at 11:53 PM on October 18, 2018:

    I'm avoiding using poll() or select() to sleep specifically because of this class of bug.

  39. jamesob commented at 8:28 PM on October 16, 2018: member

    Some minor comments and a question about conservatively falling back to select for MacOS systems. Code looks good!

  40. in src/net.cpp:1313 in 84fae8d4cc outdated
    1312 |              }
    1313 |          }
    1314 |      }
    1315 | +}
    1316 | +
    1317 | +#ifndef HAVE_POLL
    


    jamesob commented at 8:29 PM on October 16, 2018:

    (from a previous comment on the outdated #ifdef WIN32) Apparently poll has been flaky on previous versions of MacOS. Do we want to fall back to select on Darwin systems just to be safe?


    pstratem commented at 7:53 PM on October 20, 2018:

    i'm specifically avoiding calling poll (or select) with nothing to listen for because of that class of bugs, i'll add a comment

  41. pstratem renamed this:
    net: implement poll
    wip: net: implement poll
    on Oct 20, 2018
  42. pstratem force-pushed on Oct 25, 2018
  43. pstratem force-pushed on Oct 29, 2018
  44. pstratem force-pushed on Oct 30, 2018
  45. pstratem force-pushed on Oct 30, 2018
  46. pstratem force-pushed on Oct 30, 2018
  47. pstratem force-pushed on Oct 31, 2018
  48. pstratem force-pushed on Oct 31, 2018
  49. pstratem force-pushed on Oct 31, 2018
  50. pstratem force-pushed on Oct 31, 2018
  51. pstratem force-pushed on Oct 31, 2018
  52. pstratem force-pushed on Oct 31, 2018
  53. pstratem force-pushed on Oct 31, 2018
  54. pstratem force-pushed on Oct 31, 2018
  55. pstratem force-pushed on Oct 31, 2018
  56. pstratem renamed this:
    wip: net: implement poll
    net: implement poll
    on Oct 31, 2018
  57. in src/net.cpp:1371 in ec652e12ee outdated
    1379 | +    //
    1380 | +    // Find which sockets have data to receive
    1381 | +    //
    1382 | +    struct timeval timeout;
    1383 | +    timeout.tv_sec  = 0;
    1384 | +    timeout.tv_usec = SELECT_TIMEOUT_MILLISECONDS * 1000; // frequency to poll pnode->vSend
    


    laanwj commented at 12:32 PM on November 1, 2018:

    don't know how long the old code waited, but isn't 50 milliseconds a bit short? if there are no events, why not sleep as long as possible? (saving CPU cycles) can you please add a comment explaining the reasoning for this specific value


    pstratem commented at 5:52 PM on November 1, 2018:

    I carried over the existing value.

    There are two reasons for this to be a short value:

    • we cant modify the list of sockets while waiting, so the sleep time is potentially the delay before a new connection is serviced at all
    • the disconnectnodes and inactivitychecks are time based and independent of socketevents

    pstratem commented at 5:57 PM on November 1, 2018:

    i just checked and this is the value from satoshi, so im sure it was entirely arbitrarily chosen

  58. in src/net.cpp:1269 in ec652e12ee outdated
    1263 | @@ -1262,28 +1264,10 @@ void CConnman::InactivityCheck(CNode *pnode)
    1264 |      }
    1265 |  }
    1266 |  
    1267 | -void CConnman::SocketHandler()
    1268 | +bool CConnman::GenerateSelectSet(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    


    laanwj commented at 12:52 PM on November 1, 2018:

    I guess we want these sets to be sorted by fd, or at least predictably sorted, as it's iterated over--so using set over unordered_set makes sense


    pstratem commented at 5:55 PM on November 1, 2018:

    regardless of iterating i cant imagine we'd see any change in a benchmark here


    sipa commented at 7:35 PM on November 10, 2018:

    It's probably slightly more efficient to return sorted vectors, and use std::binary_search to look up things in it (as std::map needs one allocation per element), but agree that it is unlikely to matter here.

  59. laanwj commented at 1:10 PM on November 1, 2018: member

    some minor comments but otherwise utACK from me

  60. pstratem force-pushed on Nov 1, 2018
  61. sipa added this to the "Blockers" column in a project

  62. pstratem force-pushed on Nov 5, 2018
  63. in src/compat.h:105 in efb25c0a31 outdated
     101 | @@ -102,12 +102,12 @@ typedef void* sockopt_arg_type;
     102 |  typedef char* sockopt_arg_type;
     103 |  #endif
     104 |  
     105 | +#if not defined WIN32
    


    ken2812221 commented at 1:09 AM on November 6, 2018:

    nit: MSVC does not like this. I would suggest to use #ifndef


    laanwj commented at 7:02 AM on November 6, 2018:

    I think the official syntax, which works with any C++ compiler, is

    #if !defined(WIN32)
    
  64. sdaftuar commented at 6:25 PM on November 6, 2018: member

    I haven't reviewed the code at all, but this doesn't seem to work on my mac (macos 10.13.3). It looks like I can't maintain a connection to any peers, here's an example of debug.log output when I grep for a single peer, to show an example:

    2018-11-06T18:20:54Z Added connection peer=1788
    2018-11-06T18:20:54Z sending version (103 bytes) peer=1788
    2018-11-06T18:20:54Z send version message: version 70015, blocks=549011, us=[::]:0, peer=1788
    2018-11-06T18:21:55Z socket no message in first 60 seconds, 0 1 from 1788
    2018-11-06T18:21:55Z disconnecting peer=1788
    2018-11-06T18:21:55Z Cleared nodestate for peer=1788
    

    This pattern is holding for all my peers: my node is making an outbound connection and then disconnecting after a minute.

  65. bitcoin deleted a comment on Nov 6, 2018
  66. pstratem force-pushed on Nov 6, 2018
  67. pstratem force-pushed on Nov 6, 2018
  68. pstratem commented at 11:25 PM on November 6, 2018: contributor

    @sdaftuar the last commit insures that there is a single struct pollfd per fd and adds significantly more logging

    can you give that a try, i dont have an os x system

  69. MarcoFalke added the label Needs gitian build on Nov 6, 2018
  70. sdaftuar commented at 1:54 AM on November 7, 2018: member

    @pstratem New version seems to be working, I'll let it run overnight and report back.

  71. DrahtBot commented at 5:51 PM on November 7, 2018: contributor

    <!--a722867cd34abeea1fadc8d60700f111-->

    Gitian builds for commit d864e45730be82879abe9c096c4d577975fdda7d (master):

    Gitian builds for commit 66701dc4abf7c110b7048ddd5249225f8b406ac2 (master and this pull):

  72. DrahtBot removed the label Needs gitian build on Nov 7, 2018
  73. pstratem commented at 8:48 PM on November 7, 2018: contributor

    @sdaftuar that's interesting, seems there's a bug in the os x implementation of poll regarding the same fd having multiple pollfd entries

  74. pstratem force-pushed on Nov 7, 2018
  75. pstratem force-pushed on Nov 8, 2018
  76. in src/net.cpp:1341 in 44f9c1ab13 outdated
    1344 | +        pollfds[hSocket].fd = hSocket;
    1345 | +        // These flags are ignored, but we set them for clarity
    1346 | +        pollfds[hSocket].events |= POLLERR|POLLHUP;
    1347 | +    }
    1348 | +
    1349 | +    std::vector<struct pollfd> vpollfds;
    


    jamesob commented at 2:42 AM on November 9, 2018:

    vpollfds.reserve(pollfds.size())?


    pstratem commented at 2:45 AM on November 9, 2018:

    i believe std::map::size is o(n)


    sipa commented at 2:46 AM on November 9, 2018:

    pstratem commented at 6:48 PM on November 9, 2018:

    ok changing now

  77. in src/net.cpp:1343 in 44f9c1ab13 outdated
    1346 | +        pollfds[hSocket].events |= POLLERR|POLLHUP;
    1347 | +    }
    1348 | +
    1349 | +    std::vector<struct pollfd> vpollfds;
    1350 | +    for(auto it : pollfds)
    1351 | +        vpollfds.push_back(it.second);
    


    jamesob commented at 2:42 AM on November 9, 2018:

    .push_back(std::move(it.second))?

  78. in src/net.cpp:1343 in 44f9c1ab13 outdated
    1345 | +        // These flags are ignored, but we set them for clarity
    1346 | +        pollfds[hSocket].events |= POLLERR|POLLHUP;
    1347 | +    }
    1348 | +
    1349 | +    std::vector<struct pollfd> vpollfds;
    1350 | +    for(auto it : pollfds)
    


    practicalswift commented at 8:33 AM on November 9, 2018:

    Could use std::transform?


    jamesob commented at 3:21 PM on November 9, 2018:

    Transform looks overly complicated for this: https://stackoverflow.com/a/771482

  79. jamesob commented at 2:58 PM on November 9, 2018: member

    I've been trying to figure out why an earlier iteration of this PR fails on macOS (Darwin). The earlier version passes in a separate pollfd struct for each (fd, event type) pair, potentially having duplicate entries for file descriptors listed in multiple *_select_set sets. The current version deduplicates pollfd structs across file descriptors, passing a single struct per fd which consolidates all events we're interested in.

    I did a little reading yesterday to try and figure out why the first approach works on some BSD platforms (per @sdaftuar's testing on FreeBSD), but not macOS. Bear with me while we briefly venture into the weeds.

    Darwin's poll() implementation basically wraps kqueue use; for each pollfd struct passed into Darwin's poll(), 3 separate kevent_register() calls are made and then waited upon using kqueue_scan() up to the timeout.

    I found an apparent bug in Darwin's kqueue implementation: for each kevent registered, a knote is created and enqueued on a per-fd list on the process' file descriptor table. These knotes track which events we're waiting on and are supposed to be unique per (fd, kevent filter) pair, where the kevent filter is either EVFILT_READ (for POLLIN) or EVFILT_WRITE (for POLLOUT), per the kqueue API. However, knote_fdfind() looks to be improperly implemented so that the first knote per fd is always returned, effectively ignoring the difference in filter. As far as I can tell, this could result in a situation where output events are ignored for an fd also being polled for input events (because the POLLIN kevent_register() call comes before the POLLOUT one in poll_nocancel()).

    Unfortunately, this doesn't explain why POLLIN events are ignored in the previous version of this change - in @sdaftuar's logs above and in my own testing on macOS, we always disconnect peers because we think there's nothing to read on their sockets. This isn't explained by the supposed bug I note above, though my confidence in macOS' implementation of poll() isn't very high given the complexity of the kqueue implementation backing it.

    Other versions of BSD implement poll in a much simpler way, iterating through all the pollfd structs passed in and calling a filetype-specific poll function which returns revents immediately without having to deal with kevents, kqueues, or callbacks. This delegation eventually reduces to a socket-specific poll function.

    Anyway, this is a longwinded way of saying that macOS's poll() doesn't inspire confidence and that we should consider disabling its use on that platform. The other BSDs look okay to me.

  80. pstratem force-pushed on Nov 9, 2018
  81. pstratem commented at 7:06 PM on November 9, 2018: contributor

    I've modified to not use poll() on OSX. However that should probably by an automake m4 of some kind, but I'm not sure how to do that.

  82. pstratem force-pushed on Nov 10, 2018
  83. in src/net.cpp:1325 in 4963f3110e outdated
    1328 | +        interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS));
    1329 | +        return;
    1330 | +    }
    1331 | +
    1332 | +    std::map<SOCKET, struct pollfd> pollfds;
    1333 | +    for (SOCKET hSocket : recv_select_set) {
    


    sipa commented at 7:37 PM on November 10, 2018:

    Tiny nit: variable naming style for hSocket.

  84. in src/net.cpp:1343 in 4963f3110e outdated
    1346 | +        pollfds[hSocket].events |= POLLERR|POLLHUP;
    1347 | +    }
    1348 | +
    1349 | +    std::vector<struct pollfd> vpollfds;
    1350 | +    vpollfds.reserve(pollfds.size());
    1351 | +    for(auto it : pollfds)
    


    sipa commented at 7:37 PM on November 10, 2018:

    Tiny nit: space after for (and after if further).

  85. in src/net.cpp:1353 in 4963f3110e outdated
    1356 | +
    1357 | +    if (interruptNet)
    1358 | +        return;
    1359 | +
    1360 | +    for (struct pollfd pollfd : vpollfds) {
    1361 | +        if (pollfd.revents & POLLIN)
    


    sipa commented at 7:39 PM on November 10, 2018:

    Style nit: braces whenever the next lines are indented (or put the then clause on the same line as the if).

  86. in src/net.cpp:1413 in 4963f3110e outdated
    1428 | -        if (!interruptNet.sleep_for(std::chrono::milliseconds(timeout.tv_usec/1000)))
    1429 | +        if (!interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS)))
    1430 |              return;
    1431 |      }
    1432 |  
    1433 | +    for (SOCKET hSocket : recv_select_set)
    


    sipa commented at 7:42 PM on November 10, 2018:

    Style nit: braces when indenting, and spaces after if.

  87. in src/netbase.h:20 in 4963f3110e outdated
      16 | @@ -17,6 +17,10 @@
      17 |  #include <string>
      18 |  #include <vector>
      19 |  
      20 | +#ifndef WIN32
    


    sipa commented at 7:43 PM on November 10, 2018:

    It doesn't seem like the current code is using poll on Windows.


    pstratem commented at 9:36 PM on November 10, 2018:

    it isnt?


    sipa commented at 9:38 PM on November 10, 2018:
    #if defined(__linux__)
    #define USE_POLL
    #endif
    

    sipa commented at 9:38 PM on November 10, 2018:

    Oh, damn, I misread ifndef.


    Empact commented at 4:21 PM on November 22, 2018:

    Seems this include should be in netbase.cpp, given there's no publicly exposed reference.

  88. in src/net.h:35 in 4963f3110e outdated
      30 | @@ -31,6 +31,7 @@
      31 |  
      32 |  #ifndef WIN32
      33 |  #include <arpa/inet.h>
      34 | +#include <poll.h>
    


    sipa commented at 7:44 PM on November 10, 2018:

    It doesn't seem like the current code is using poll on Windows.


    pstratem commented at 9:36 PM on November 10, 2018:

    it isnt?

  89. pstratem force-pushed on Nov 10, 2018
  90. pstratem force-pushed on Nov 10, 2018
  91. pstratem force-pushed on Nov 10, 2018
  92. pstratem force-pushed on Nov 10, 2018
  93. pstratem force-pushed on Nov 10, 2018
  94. pstratem force-pushed on Nov 10, 2018
  95. in src/compat.h:98 in ce79907bd7 outdated
     101 | @@ -102,8 +102,12 @@ typedef void* sockopt_arg_type;
     102 |  typedef char* sockopt_arg_type;
     103 |  #endif
     104 |  
     105 | +#if defined(__linux__)
    


    sipa commented at 2:54 AM on November 11, 2018:

    Perhaps add a comment here listen the reason for excluding every platform not included here, so it is easy to re-asses later.

  96. in src/net.cpp:1351 in ce79907bd7 outdated
    1346 | +
    1347 | +    if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) return;
    1348 | +
    1349 | +    if (interruptNet) return;
    1350 | +
    1351 | +    for (struct pollfd pollfd : vpollfds) {
    


    sipa commented at 2:57 AM on November 11, 2018:
    net.cpp: In member function ‘void CConnman::SocketEvents(std::set<unsigned int>&, std::set<unsigned int>&, std::set<unsigned int>&)’:
    net.cpp:1351:10: warning: types may not be defined in a for-range-declaration
         for (struct pollfd pollfd : vpollfds) {
              ^~~~~~
    
  97. sipa commented at 3:01 AM on November 11, 2018: member

    Concept ACK

    You'll want to make the maxconnections limiting code in init.cpp:

    nMaxConnections = std::max(std::min<int>(nMaxConnections, FD_SETSIZE - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0);
    

    not depend on FD_SETSIZE when USE_POLL is enabled (just limit to 1000 or so).

  98. pstratem force-pushed on Nov 11, 2018
  99. pstratem force-pushed on Nov 11, 2018
  100. pstratem force-pushed on Nov 11, 2018
  101. pstratem force-pushed on Nov 11, 2018
  102. pstratem force-pushed on Nov 11, 2018
  103. pstratem force-pushed on Nov 11, 2018
  104. pstratem force-pushed on Nov 13, 2018
  105. pstratem force-pushed on Nov 16, 2018
  106. in src/net.h:35 in feca5fb7e4 outdated
      31 | @@ -32,6 +32,7 @@
      32 |  
      33 |  #ifndef WIN32
      34 |  #include <arpa/inet.h>
      35 | +#include <poll.h>
    


    Empact commented at 4:18 PM on November 22, 2018:

    Should this be behind #ifdef USE_POLL instead? To match the pollfd reference below.

  107. in src/init.cpp:960 in feca5fb7e4 outdated
     948 |      nFD = RaiseFileDescriptorLimit(nMaxConnections + MIN_CORE_FILEDESCRIPTORS + MAX_ADDNODE_CONNECTIONS);
     949 | +#ifdef USE_POLL
     950 | +    nMaxConnections = std::max(std::min<int>(nMaxConnections, nFD - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0);
     951 | +#else
     952 | +    nMaxConnections = std::max(std::min<int>(nMaxConnections, FD_SETSIZE - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0);
     953 | +#endif
    


    Empact commented at 4:30 PM on November 22, 2018:

    nit: would be nice to avoid the duplication between these lines, something like:

    #ifdef USE_POLL
        int fd_max = nFD;
    #else
        int fd_max = FD_SETSIZE;
    #endif
        nMaxConnections = std::max(std::min<int>(nMaxConnections, fd_max - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0);
    
  108. in src/net.cpp:1322 in feca5fb7e4 outdated
    1325 | +void CConnman::BuildPollfds(std::vector<struct pollfd> &vpollfds) {
    1326 | +    std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    1327 | +    if (!GenerateSelectSet(recv_select_set, send_select_set, error_select_set)) {
    1328 | +        interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS));
    1329 | +        return;
    1330 | +    }
    


    Empact commented at 4:37 PM on November 22, 2018:

    If this fails, we probs don't want to call poll in SocketEvents, right? Seems better to return false and handle the error in the caller.

  109. in src/net.cpp:1316 in feca5fb7e4 outdated
    1320 | -                         &fdsetRecv, &fdsetSend, &fdsetError, &timeout);
    1321 | +    return !recv_set.empty() || !send_set.empty() || !error_set.empty();
    1322 | +}
    1323 | +
    1324 | +#ifdef USE_POLL
    1325 | +void CConnman::BuildPollfds(std::vector<struct pollfd> &vpollfds) {
    


    Empact commented at 4:40 PM on November 22, 2018:

    This could overload GenerateSelectSet, maybe under a different name - e.g. GenerateEventSet


    pstratem commented at 11:02 PM on November 29, 2018:

    This is really just temporary to ensure there's no performance issues, the last commit will probably be removed actually.

  110. in src/net.cpp:1419 in feca5fb7e4 outdated
    1429 | +        if (!interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS)))
    1430 |              return;
    1431 |      }
    1432 |  
    1433 | +    for (SOCKET hSocket : recv_select_set) {
    1434 | +        if(FD_ISSET(hSocket, &fdsetRecv)) {
    


    Empact commented at 4:43 PM on November 22, 2018:

    There are a few whitespace issues, could do with running clang-format.

  111. pstratem force-pushed on Nov 29, 2018
  112. pstratem force-pushed on Nov 29, 2018
  113. pstratem force-pushed on Nov 29, 2018
  114. pstratem force-pushed on Nov 30, 2018
  115. Introduce and use constant SELECT_TIMEOUT_MILLISECONDS. 1e6afd0dbc
  116. Move GenerateSelectSet logic to private method.
    This separates the socket event collection logic from the logic
    deciding which events we're interested in at all.
    7e403c0ae7
  117. Move SocketEvents logic to private method.
    This separates the select() logic from the socket handling logic, setting up
    for a switch to poll().
    28211a4bc9
  118. pstratem force-pushed on Nov 30, 2018
  119. pstratem commented at 11:04 PM on November 30, 2018: contributor

    With 87 connections the effect of generating the pollfd list shows up on a profile but isn't significant.

    The poll() implementation is on the order of 1000 ns slower per call than the select() implementation.

    I don't think that's significant.

  120. in src/net.cpp:1347 in 20c9601e6b outdated
    1350 | +    vpollfds.reserve(pollfds.size());
    1351 | +    for (auto it : pollfds) {
    1352 | +        vpollfds.push_back(std::move(it.second));
    1353 | +    }
    1354 | +
    1355 | +    if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) return;
    


    jamesob commented at 3:50 PM on December 3, 2018:

    (non-blocking) Wondering if it's worth logging a negative error code here, eg

    diff --git a/src/net.cpp b/src/net.cpp
    index 7b8b6e5ea2..3b05aad6d1 100644
    --- a/src/net.cpp
    +++ b/src/net.cpp
    @@ -1344,7 +1344,10 @@ void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_s
             vpollfds.push_back(std::move(it.second));
         }
     
    -    if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) return;
    +    if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) {
    +        LogPrint(BCLog::NET, "Poll failed (errno %d)\n", WSAGetLastError());
    +        return;
    +    }
     
         if (interruptNet) return;
     
    
    

    pstratem commented at 5:09 PM on December 3, 2018:

    It might be, but I'd like to cleanup the error logging more in the future, the select() code randomly spams the log with EINTR errors as well, so I'd say it's a separate issue.

  121. in src/net.cpp:1358 in 20c9601e6b outdated
    1361 | +        if (pollfd_entry.revents & POLLOUT)           send_set.insert(pollfd_entry.fd);
    1362 | +        if (pollfd_entry.revents & (POLLERR|POLLHUP)) error_set.insert(pollfd_entry.fd);
    1363 | +    }
    1364 | +}
    1365 | +#else
    1366 | +void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    


    jamesob commented at 3:56 PM on December 3, 2018:

    (non-blocking)

    diff --git a/src/net.cpp b/src/net.cpp
    index 7b8b6e5ea2..03d165d701 100644
    --- a/src/net.cpp
    +++ b/src/net.cpp
    @@ -1355,6 +1355,7 @@ void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_s
         }
     }
     #else
    +// Use select() on platforms where poll() is unavailable.
     void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
     {
         std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    
  122. in src/netbase.cpp:277 in 20c9601e6b outdated
     272 | +                struct pollfd pollfd = {};
     273 | +                pollfd.fd = hSocket;
     274 | +                pollfd.events = POLLIN | POLLOUT;
     275 | +                int nRet = poll(&pollfd, 1, std::min(endTime - curTime, maxWait));
     276 | +#else
     277 |                  struct timeval tval = MillisToTimeval(std::min(endTime - curTime, maxWait));
    


    jamesob commented at 4:07 PM on December 3, 2018:

    (non-blocking) std::min(endTime - curTime, maxWait) could be deduplicated.


    pstratem commented at 7:26 PM on December 3, 2018:

    agreed

  123. in src/netbase.cpp:284 in 20c9601e6b outdated
     278 |                  fd_set fdset;
     279 |                  FD_ZERO(&fdset);
     280 |                  FD_SET(hSocket, &fdset);
     281 |                  int nRet = select(hSocket + 1, &fdset, nullptr, nullptr, &tval);
     282 | +#endif
     283 |                  if (nRet == SOCKET_ERROR) {
    


    jamesob commented at 4:12 PM on December 3, 2018:

    (non-blocking) This might be academic, but since this clause is handling both select() and poll() might be safer to say if (nRet < 0)? Though right now there's no difference in the return value for both APIs (and I doubt that'll ever change...).

  124. jamesob approved
  125. jamesob commented at 4:18 PM on December 3, 2018: member

    utACK https://github.com/bitcoin/bitcoin/pull/14336/commits/20c9601e6b7c4297f39de149d26292c368b63ff8

    Code looks good, feel free to ignore the small comments. I'm running tests at the moment and will report back with results within the next day - testnet happy-path behavior looks good so far.

  126. Implement poll() on systems which support it properly.
    This eliminates the restriction on maximum socket descriptor number.
    11cc491a28
  127. Increase maxconnections limit when using poll. 4927bf2f25
  128. pstratem force-pushed on Dec 3, 2018
  129. jamesob approved
  130. jamesob commented at 3:38 PM on December 4, 2018: member

    Tested ACK https://github.com/bitcoin/bitcoin/pull/14336/commits/4927bf2f257ac53569978980eaf1f61c2c6b04cc

    Tested on Linux 4.15.0-39-generic and macOS 10.12.1. Tested IBD on testnet as well as tweaking -maxconnections and observing connections with lsof -a -itcp -p $(pidof bitcoind).

    Only code change since my last utACK is a small cleanup in netbase.cpp.

    Thanks for the work here @pstratem.

  131. laanwj commented at 6:41 PM on December 13, 2018: member

    utACK 4927bf2f257ac53569978980eaf1f61c2c6b04cc

  132. laanwj merged this on Jan 2, 2019
  133. laanwj closed this on Jan 2, 2019

  134. laanwj referenced this in commit 62cf608e93 on Jan 2, 2019
  135. fanquake removed this from the "Blockers" column in a project

  136. BlockMechanic commented at 9:47 AM on October 12, 2019: contributor

    Crashes on android 7 8 and 9 armv7a. May need to implement a work around

  137. codablock cross-referenced this on Apr 7, 2020 from issue Backport poll() code from Bitcoin by codablock
  138. codablock referenced this in commit d901baebc7 on Apr 8, 2020
  139. codablock referenced this in commit 8481d6c110 on Apr 8, 2020
  140. deadalnix referenced this in commit 7ef2a7b761 on Jun 9, 2020
  141. deadalnix referenced this in commit 716ac6e899 on Jun 9, 2020
  142. deadalnix referenced this in commit 96827c2176 on Jun 9, 2020
  143. deadalnix referenced this in commit 898802efaa on Jun 9, 2020
  144. deadalnix referenced this in commit 6f41fa19a3 on Jun 10, 2020
  145. hebasto referenced this in commit ee7891a0c4 on Sep 25, 2021
  146. hebasto cross-referenced this on Sep 25, 2021 from issue doc: Remove outdated comments by hebasto
  147. laanwj referenced this in commit 8b523f2e55 on Sep 27, 2021
  148. sidhujag referenced this in commit e84c360598 on Sep 27, 2021
  149. hebasto cross-referenced this on Sep 29, 2021 from issue doc: Revert "Remove outdated comments" and place comment correctly by hebasto
  150. laanwj referenced this in commit dbbb7fbcc0 on Sep 30, 2021
  151. sidhujag referenced this in commit a2b232de94 on Sep 30, 2021
  152. rebroad referenced this in commit 147cd3022a on Oct 13, 2021
  153. bitcoin locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-20 06:54 UTC