Skip to content

net: shutdown gracefully#1164

Closed
indutny wants to merge 230 commits into
nodejs:v1.xfrom
indutny:fix/graceful-shutdown-in-tls
Closed

net: shutdown gracefully#1164
indutny wants to merge 230 commits into
nodejs:v1.xfrom
indutny:fix/graceful-shutdown-in-tls

Conversation

@indutny

@indutny indutny commented Mar 16, 2015

Copy link
Copy Markdown
Member

Wait for the shutdown request completion before emitting the finish
event and destroying the socket. While this might not be that relevant
in case of plain TCP sockets, the TLS implementation sends close-notify
packet during shutdown request. Destroying socket whilst this write is
in progress tends to cause ECONNRESET errors in our tests.

@indutny

indutny commented Mar 16, 2015

Copy link
Copy Markdown
Member Author

cc @piscisaureus should fix that nasty issue on windows. Going to test it now.

@calvinmetcalf

Copy link
Copy Markdown
Contributor

you beat me to opening this, through I was going call it _end for symmetry with the end method

@indutny

indutny commented Mar 16, 2015

Copy link
Copy Markdown
Member Author

Do not land this yet, it doesn't fix the issue on windows. Needs some investigation.

@indutny

indutny commented Mar 16, 2015

Copy link
Copy Markdown
Member Author

Going to push here a couple of commits just for my own testing, sorry about this!

@indutny

indutny commented Mar 17, 2015

Copy link
Copy Markdown
Member Author

Ok, now it seems to be ready. @piscisaureus : this fixes the issue on windows once and for free ;)

@Fishrock123

Copy link
Copy Markdown
Contributor

@piscisaureus

Copy link
Copy Markdown
Contributor

@indutny

First of all, great work for digging into this and finding a fix.

I have one question though. Back in the day there we deliberately chose not to gracefully close TCP connections when serving http, for performance reasons. Instead we'd just wait until all writes were completed, and then close the connection. The kernel then takes care of doing a graceful close (note that it's still a graceful close!).

This is also the source of the subtle differences between the various finalization methods for TCP connections:

  • end() flushes all the writes, then calls shutdown(3), then close(3).
  • destroySoon() flushes all the writes and calls close(3) to do a graceful shutdown.
  • destroy() does a hard close, potentially causing a RST packet to be sent (so that the other side gets an ECONNRESET error).

I think it would be best if the TLS wrap also supported this kind of behavior.

I am not sure I completely understand how this patch affects that behavior.

@bnoordhuis I would also like to hear some deep thoughts from you.

@indutny

indutny commented Mar 17, 2015

Copy link
Copy Markdown
Member Author

@piscisaureus I guess we could use it for TLS only then, putting this __flush method to TLSSocket.

@Fishrock123

Copy link
Copy Markdown
Contributor

Looks like this cropped up on both windows machines:

=== release test-tls-server-verify ===
Path: parallel/test-tls-server-verify
connecting with agent1
Running 'Do not request certs. Everyone is unauthorized.'
- unauthed connection: null
  * unauthed
events.js:141
      throw er; // Unhandled 'error' event
            ^
Error: read ECONNRESET
    at exports._errnoException (util.js:734:11)
    at TLSWrap.onread (net.js:514:26)
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2008r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2008r2\test\parallel\test-tls-server-verify.js

@indutny

indutny commented Mar 17, 2015

Copy link
Copy Markdown
Member Author

@Fishrock123 yeah, I see it on my windows box too. Looking.

@indutny

indutny commented Mar 18, 2015

Copy link
Copy Markdown
Member Author

Ok, let's hope this one is a green now: https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/316/

@bnoordhuis

Copy link
Copy Markdown
Member

What's the reason for always calling shutdown() on TCP sockets now? I have a gut feeling that's going to impact performance for no good reason.

@Fishrock123

Copy link
Copy Markdown
Contributor

@indutny

=== release test-child-process-stdout-flush-exit ===
Path: parallel/test-child-process-stdout-flush-exit
assert.js:87
  throw new assert.AssertionError({
        ^
AssertionError: false == true
    at ChildProcess.<anonymous> (c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-child-process-stdout-flush-exit.js:43:5)
    at emitTwo (events.js:87:13)
    at ChildProcess.emit (events.js:169:7)
    at maybeClose (child_process.js:984:16)
    at Process.ChildProcess._handle.onexit (child_process.js:1057:5)
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-child-process-stdout-flush-exit.js

Edit: this doesn't seem related? Weird. Haven't seen this before/recently..

@indutny

indutny commented Mar 18, 2015

Copy link
Copy Markdown
Member Author

@Fishrock123 is it related or not?

@bnoordhuis I was trying to figure out shutdown in the middle thing, but it might be not worth it, indeed.

@indutny indutny closed this Mar 18, 2015
@Fishrock123

Copy link
Copy Markdown
Contributor

@indutny I think you'd be able to tell better than me, but I'd wager to say it might be, given it tests functionality related to flush and this changes some stream flush code.

@indutny indutny reopened this Mar 18, 2015
@indutny

indutny commented Mar 18, 2015

Copy link
Copy Markdown
Member Author

@Fishrock123 I was wrong to close it yesterday :) Now I see that it should be still useful.

@indutny

indutny commented Mar 18, 2015

Copy link
Copy Markdown
Member Author

CI again: https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/324/

I don't see any failures except fs-access on my box.

@indutny

indutny commented Mar 18, 2015

Copy link
Copy Markdown
Member Author

It appears to me that parallel/test-net-reconnect-error is just very slow on windows.

@indutny

indutny commented Mar 18, 2015

Copy link
Copy Markdown
Member Author

Looking into other tests too.

@indutny

indutny commented Mar 18, 2015

Copy link
Copy Markdown
Member Author

parallel/test-http-curl-chunk-problem works just fine without timing out.

@indutny

indutny commented Mar 18, 2015

Copy link
Copy Markdown
Member Author

Same about parallel/test-process-kill-null, all of the rest works too.

@indutny

indutny commented Mar 18, 2015

Copy link
Copy Markdown
Member Author

@piscisaureus may I ask you to take a look at these? I'm not sure how I could reproduce...

@Fishrock123

Copy link
Copy Markdown
Contributor

@indutny Odd, that test failure did not manifest itself again.

The timeouts seem standard and almost certainly are not related to the PR. (That should be for a different issue.)

@indutny indutny force-pushed the fix/graceful-shutdown-in-tls branch from c57e567 to aff0cb3 Compare March 19, 2015 01:56
@indutny

indutny commented Mar 19, 2015

Copy link
Copy Markdown
Member Author

@jbergstroem

Copy link
Copy Markdown
Member

Looks like the os x box fails sequential/test-child-process-fork-getconnections.js. Related?

@Fishrock123

Copy link
Copy Markdown
Contributor

An OS X failure has popped up, I reported this one in #1100 as I had previously seen it locally. This is the first time I've seen it on the CI. (should be unrelated)

Trott and others added 9 commits June 4, 2015 11:38
It's not clear what additional tests are wanted.
The current malformed URL test seems adequate.

PR-URL: nodejs#1875
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
`printDeprecationMessage` is used to deprecate modules
and execution branches.

PR-URL: nodejs#1822
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
It makes no sense to allow people use constants from
`smalloc`, since it will be removed completely eventually.

PR-URL: nodejs#1822
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
When loading session, OCSP response, SNI, always check that the
`self._handle` is present. If it is not - the socket was closed - and we
should emit the error instead of throwing an uncaught exception.

Fix: nodejs/node-v0.x-archive#8780
Fix: nodejs#1696
PR-URL: nodejs#1702
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
* Destroy `SSL*` and friends on a next tick to make sure that we are not
  doing it in one of the OpenSSL callbacks
* Add more checks to the C++ methods that might be invoked during
  destructor's pending queue cleanup

Fix: nodejs/node-v0.x-archive#8780
Fix: nodejs#1696
PR-URL: nodejs#1702
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
PR-URL: nodejs#1702
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Add string encoding option for fs.createReadStream and
fs.createWriteStream. and check argument type more strictly

PR-URL: nodejs#1845
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
When using `iojs debug -p <pid>` with an invalid pid, the debugger
printed an internal error message because it wasn't smart enough
to figure out that the target process didn't exist.  Now it is.

PR-URL: nodejs#1863
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Fedor Indutny <fedor@indutny.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Yosuke Furukawa <yosuke.furukawa@gmail.com>
This allows `graceful-fs` to evaluate `fs` source
without access to internals.

This is a temporary workaround that makes npm work.

See: isaacs/node-graceful-fs#41
Fixes: nodejs#1898
PR-URL: nodejs#1903
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@bnoordhuis

Copy link
Copy Markdown
Member

@indutny Is this PR still relevant? I remember when I last reviewed it I didn't really get the point. Perhaps some TLS regression tests would help?

@indutny

indutny commented Jun 5, 2015

Copy link
Copy Markdown
Member Author

Yes, it is relevant. I will work on regression test for this.

saghul and others added 4 commits June 5, 2015 22:12
PR-URL: nodejs#1905
Refs: nodejs#1791
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
On case insensitive platforms, the rule was catching the debug module
under npm and eslint.

See: nodejs#1899 (comment)
PR-URL: nodejs#1908
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
os.homedir() calls libuv's uv_os_homedir() to retrieve the current
user's home directory.

PR-URL: nodejs#1791
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Rod Vagg <rod@vagg.org>
Wait for the `shutdown` request completion before emitting the `finish`
event and destroying the socket. While this might not be that relevant
in case of plain TCP sockets, the TLS implementation sends close-notify
packet during shutdown request. Destroying socket whilst this write is
in progress tends to cause ECONNRESET errors in our tests.
@indutny indutny force-pushed the fix/graceful-shutdown-in-tls branch from bd4daca to f3503ba Compare June 6, 2015 16:26
@indutny

indutny commented Jun 6, 2015

Copy link
Copy Markdown
Member Author

Thinking about it, it looks like it is no longer needed. Thanks for raising the question, @bnoordhuis .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

net Issues and PRs related to the net subsystem.

Projects

None yet

Development

Successfully merging this pull request may close these issues.