What are persistent logs?
In this context, a log is an ordered sequence of messages that you can append to, but cannot go back and change.
Thanks for the article, but there are some important things missing.
One large missing point is reconnection and NATS/STAN restarts, which isn't trivial to implement.
Another is that the producer may need to store outgoing events in its own persistent outgoing queue before trying to send them to STAN, to avoid losing events on producer restarts.
This won't work unless updating this file can be done atomically with handling the message, which is impossible in the general case. It usually works only in cases where handling the message means updating data in a SQL database and this ID is updated in the same database, in the same transaction.
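Roughly, a sketch of what that looks like on the consumer side (the orders and consumer_offsets tables here are hypothetical, with Postgres-style placeholders), where the data write and the offset update share one transaction:

```go
package consumer

import (
	"database/sql"

	"github.com/nats-io/stan.go"
)

// handleMsg applies the message's effect and records its sequence in the
// same SQL transaction, so the stored offset can never drift from the data.
// The orders and consumer_offsets tables are hypothetical.
func handleMsg(db *sql.DB, msg *stan.Msg) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	if _, err := tx.Exec(
		`INSERT INTO orders (payload) VALUES ($1)`,
		msg.Data,
	); err != nil {
		return err
	}
	if _, err := tx.Exec(
		`UPDATE consumer_offsets SET last_seq = $1 WHERE consumer = $2`,
		msg.Sequence, "orders-consumer",
	); err != nil {
		return err
	}
	return tx.Commit()
}
```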
Both the NATS and STAN clients reconnect automatically. One configurable setting is how many reconnect attempts should take place, and you can set this to unlimited.
On a publish, the client will return an error if it cannot contact the server or if the server fails to ack. If you do this asynchronously (not in the request hot path, for example), then you are correct: you will need to handle this locally by storing the events and retrying perpetually. However, if this is part of a request transaction, then presumably any effects could be compensated for and an error would be returned to the caller.
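As a sketch of that distinction, using the Go streaming client (the subject name is made up):

```go
package publisher

import (
	"log"

	"github.com/nats-io/stan.go"
)

func publishExamples(sc stan.Conn) {
	// Synchronous publish: blocks until the server acks (or errors), so a
	// request handler can compensate and return the error to the caller.
	if err := sc.Publish("orders", []byte("order-123")); err != nil {
		log.Println("publish failed:", err)
	}

	// Asynchronous publish: the ack handler fires later, so a failure here
	// has to be handled locally, e.g. by re-queueing the event and retrying.
	_, err := sc.PublishAsync("orders", []byte("order-124"), func(guid string, err error) {
		if err != nil {
			log.Printf("ack failed for %s: %v (re-queue and retry)", guid, err)
		}
	})
	if err != nil {
		log.Println("async publish failed:", err)
	}
}
```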
Yes absolutely, the devil is in the details. The general case is that if you are doing at least two kinds of I/O in a transaction, you will need some kind of two-phase commit or consensus + quorum since any of the I/O can fail.
The intent for the examples here is to demonstrate patterns, not the failure cases (which are very interesting in their own right!). But thanks for pointing out a couple of (very serious) failure cases!
NATS - maybe, but STAN doesn't reconnect.
The MaxReconnects option for NATS, when set to -1, will cause the client to retry indefinitely. For the STAN client, you can achieve the same thing by passing in the NATS connection using this option. So..
The STAN client just builds on a NATS connection, so any NATS connection options will be respected by the STAN client.
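A minimal sketch of wiring that together with the Go clients (the cluster and client IDs are placeholders):

```go
package messaging

import (
	"time"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/stan.go"
)

// connect builds a NATS connection that retries forever and hands it to the
// streaming client, so those reconnect settings are the ones in effect.
func connect(clusterID, clientID string) (stan.Conn, error) {
	nc, err := nats.Connect(nats.DefaultURL,
		nats.MaxReconnects(-1),
		nats.ReconnectWait(2*time.Second),
	)
	if err != nil {
		return nil, err
	}
	sc, err := stan.Connect(clusterID, clientID, stan.NatsConn(nc))
	if err != nil {
		nc.Close()
		return nil, err
	}
	return sc, nil
}
```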
STAN doesn't reconnect automatically; just test it yourself. Also, if no NATS connection is provided to STAN, it creates a default connection with the same -1 setting.
From the README, I read that the intent is for reconnection to happen transparently, but there are cases where this fails (there appear to have been quite a few changes and improvements since I wrote this article). It describes the case where the server doesn't respond to PINGs and the client decides to close the connection. Likewise, if there is a network partition or something, the server may disconnect, but the client (in theory) could re-establish the connection after the interruption.
I am not arguing that it can't happen, but I am simply stating that the intent is for the client to reconnect. If you are observing otherwise, then that may be an issue to bring up to the NATS team.
The team is aware of this. I agree the intent is to support reconnects, but for now you have to implement it manually, and it's unclear when (if ever) this will be fixed in the library. The main issue with reconnects is the need to re-subscribe, and how this should be done depends on the application, so it's not easy to provide a general way to restore subscriptions automatically. This is why it's not implemented yet and is unlikely to be implemented soon.
Good to know. Agreed, in the case of consumers, how subscriptions need to or should be re-established can vary based on the application.
Thanks for the write-up. One thing I noticed in your code snippets is the combination of log.Fatal and defer calls. Deferred calls are never executed after log.Fatal -> github.com/golang/go/issues/15402#...
Good catch and thanks for letting me know! I do know this but I seem to let that bad habit creep in for examples and such. But here it is a bigger issue because properly closing connections is very important. I will update the examples shortly!
I guess I didn't realize this is a bit of a pain to do. Not terrible, but easy to miss. Basically only main should call Fatal or os.Exit, and no defers should be used. More generally, it's fine as long as no function in the stack above this call uses a deferred function.
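Roughly, a sketch of the shape that avoids the problem: do the work and the defers in a helper that returns an error, and let main be the only place that calls log.Fatal.

```go
package main

import (
	"log"

	"github.com/nats-io/stan.go"
)

// run owns all the deferred cleanup; returning an error instead of calling
// log.Fatal ensures the defers actually execute.
func run() error {
	sc, err := stan.Connect("test-cluster", "client-1")
	if err != nil {
		return err
	}
	defer sc.Close() // runs because run returns normally

	// ... publish/subscribe work ...
	return nil
}

func main() {
	// main has no defers of its own, so log.Fatal is safe here.
	if err := run(); err != nil {
		log.Fatal(err)
	}
}
```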
Why do you use atomic.SwapUint64(&lastProcessed, msg.Sequence)? As far as I understand, the handler function (for a single subscription) won't be executed in parallel, so lastProcessed = msg.Sequence should be safe here.

Hi Alex, you are correct. This example doesn't have two threads competing for the lastProcessed variable. I did it to be explicit about the kind of operation it is suggesting. It's not obvious to many that a race condition could easily be introduced here if, for example, the value is being written to disk or a database or something to keep track of the offset.
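To illustrate the kind of setup that has in mind, here is a sketch where a second goroutine checkpoints the offset (the save function is hypothetical); it is exactly this concurrent reader that makes the atomic access necessary:

```go
package consumer

import (
	"sync/atomic"
	"time"

	"github.com/nats-io/stan.go"
)

// trackOffset processes messages and periodically checkpoints the last
// processed sequence via the (hypothetical) save function.
func trackOffset(sc stan.Conn, save func(uint64) error) (stan.Subscription, error) {
	var lastProcessed uint64

	sub, err := sc.Subscribe("events", func(msg *stan.Msg) {
		// ... process msg.Data ...
		atomic.SwapUint64(&lastProcessed, msg.Sequence)
	})
	if err != nil {
		return nil, err
	}

	// The checkpointing goroutine reads lastProcessed concurrently with the
	// handler's writes, so both sides must use atomic operations.
	go func() {
		for range time.Tick(5 * time.Second) {
			_ = save(atomic.LoadUint64(&lastProcessed))
		}
	}()
	return sub, nil
}
```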
Thanks for your article! I have a question: how can I simultaneously realize delivering all messages available and controlling acks manually? If I only set the options DurableName and DeliverAllAvailable, I can receive all messages from the beginning after a subscription restart. However, if I also set the SetManualAckMode option, I only receive messages sent after the subscription restart. I'm so confused about that.
Thank you! There are a few different concerns you are describing. One is defining the start message for a given subscription, such as the beginning for a new subscription or some other arbitrary position. The second is implicit vs. manual acking of messages as the consumer processes them. If manual ack mode is enabled, then the consumer is required to call Ack() in order to tell the server (NATS Streaming) that the message should not be redelivered. Not doing manual acks means the server will implicitly assume that once it delivers the message to the consumer, it has been handled and/or the consumer doesn't want to receive it again even if there was a problem processing it locally.

Hopefully that helps a bit?
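A sketch of combining those options with the Go streaming client (the subject and durable name are made up); note that with manual acks, anything left unacked is redelivered after the AckWait interval:

```go
package consumer

import (
	"log"
	"time"

	"github.com/nats-io/stan.go"
)

// subscribeDurable creates a durable subscription with manual acks.
func subscribeDurable(sc stan.Conn) (stan.Subscription, error) {
	return sc.Subscribe("orders", func(msg *stan.Msg) {
		// ... process msg.Data ...
		if err := msg.Ack(); err != nil {
			log.Println("ack failed:", err)
		}
	},
		// DeliverAllAvailable sets the start position only when the durable
		// is first created; on later restarts the durable resumes from the
		// last acknowledged message.
		stan.DurableName("orders-durable"),
		stan.DeliverAllAvailable(),
		stan.SetManualAckMode(),
		stan.AckWait(30*time.Second), // unacked messages are redelivered after this
	)
}
```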
One of NATS Streaming's biggest drawbacks is the lack of clustering.
FYI ... Clustering is now available.
Full clustering is being actively worked on. However, it does currently support "partitioning", which enables configuring multiple NATS Streaming servers to handle different sets of channels. In this mode, they are "clustered" in that a client can connect to any server and it will be routed to the correct server for the given channel. So for scaling needs, that is one option. The best option for replication/fault tolerance of the data is to set up a consumer that relays data from the various channels into a separate server on another host.
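A sketch of such a relay consumer: subscribe on the source server and republish to the destination (channel and durable names are placeholders).

```go
package relay

import "github.com/nats-io/stan.go"

// Relay copies every message on a channel from the source server to the
// same channel on a destination server. A message is only acked after the
// destination publish succeeds, so failures lead to redelivery rather than loss.
func Relay(src, dst stan.Conn, channel string) (stan.Subscription, error) {
	return src.Subscribe(channel, func(msg *stan.Msg) {
		if err := dst.Publish(channel, msg.Data); err != nil {
			return // not acked; the message will be redelivered
		}
		msg.Ack()
	},
		stan.DurableName("relay-"+channel),
		stan.DeliverAllAvailable(),
		stan.SetManualAckMode(),
	)
}
```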
Hi Byron, thanks for the article! I have a question: does a durable queue group guarantee in-order messages after all consumers restart?