This is the last one of these posts I'll ever respond to, I promise. And I'll give you the same response I've given every other time before:
You should not base your choice of database (or anything else, for that matter) on marketing copy. For something as important as your primary data store, you should at minimum read the full documentation and run some tests with dummy data to see if it will even plausibly work for your use case.
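To make that concrete, here's roughly what I mean by a dummy-data smoke test (a minimal sketch in Python with pymongo, assuming a throwaway local deployment; the database and collection names are placeholders):

    from pymongo import MongoClient

    # Placeholder names (testdb/items); point this at a disposable instance.
    coll = MongoClient("mongodb://localhost:27017").testdb.items
    coll.delete_many({})

    # Write a batch of dummy documents, then read every one back.
    for i in range(10000):
        coll.insert_one({"_id": i, "payload": "x" * 100})

    missing = [i for i in range(10000) if coll.find_one({"_id": i}) is None]
    print("missing documents:", len(missing))

A real test would obviously mirror your actual document shapes, indexes, and failure scenarios, but even something this crude catches basic surprises before production does.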
I used MongoDB successfully with a large data set (>1TB) and 100% production uptime for more than 3 years. I never lost data. Your claim that you will unavoidably lose data is baseless and without merit. In fact, every issue you listed has been fixed, again contrary to your claim.
Personally, these days I prefer TokuMX if I'm looking for something compatible with MongoDB, but these baseless attacks on MongoDB have to stop.
EDIT: Every time I make a post like this, I get some downvotes without responses. Please tell me why I'm wrong. If it's just that I'm abrasive... Well, you would be too if you were addressing the same thing for the Nth time.
Not the downvoter - but I can totally understand the downvote.
The fact that your anecdotal evidence shows you didn't lose any data doesn't mean the internet isn't full of people who have lost data with Mongo. I have no idea what your workload is, but my experience with data loss and uptime has not been as great as yours.
I'm not for bashing things either; I think there are cases where Mongo might be appropriate. I just don't like countering claims with "it worked for me on this one data set". If it drops writes for one out of 100 users, that's still a big reason to avoid it if data loss is a big concern for you. As for "these issues have been fixed", you're welcome to open the issue tracker - no one at Mongo claims all of these issues have been fixed (then again, PostgreSQL has open issues too), so your claim that "these issues have been fixed" is kind of odd...
I only brought that up to counter his claim that data loss is inevitable. Of course my anecdote doesn't mean it's not common =) But anecdotes are all anyone else has, and every time I've read one about someone losing data, they either hadn't read the documentation, or just didn't understand the semantics of what they were doing. Very very rarely, especially these days, has it been an actual DB bug (though I will admit I got Mongo to core one time on 2.4 doing a compaction).
And it's a little disingenuous to point at the issue tracker -- as you say, everyone has open issues. The specific things that were mentioned, though, have been fixed: writes are acknowledged by default now, the global lock has been broken up into per-collection locks, etc. There may still be common issues that aren't being addressed, but if there are, I'm not aware of them.
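For anyone who hasn't followed the changes: "acknowledged by default" refers to the driver-level write concern. A rough sketch of being explicit about it (pymongo; the w/j values and names here are illustrative, not a recommendation):

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")

    # Ask the server to confirm each write, wait for a replica-set
    # majority, and wait for the journal before returning.
    events = client.mydb.get_collection(
        "events", write_concern=WriteConcern(w="majority", j=True))

    result = events.insert_one({"type": "signup", "user": "alice"})
    print(result.acknowledged)  # True only if the server acknowledged it

With w=0 (the old fire-and-forget behavior) none of that applies, which is where a lot of the older "Mongo ate my data" stories came from.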
Anecdotes are not all anyone has. We have researchers, and it seems you might have missed yesterday's https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-read... (or previous posts on the matter). That post debunks the idea that the losses come down to not reading the docs or misunderstanding the API, and that an actual DB bug is a rarity.
I think it is disingenuous to say that because a fuzzer found certain obscure scenarios where there are issues, everyone is automatically going to be affected by them and the database has no merit.
Also, if I have to choose a datastore: marketing shouldn't be important, but funding is, and MongoDB has had some huge funding rounds in the past. That gives me a lot more confidence in choosing it.
If I say "its inevitable that you will get into an accident while drunk driving" and you say "I've been drunk driving for years with 100% no accidents" I would assume you are being dense.
How is it not valid? There are documented tests of exactly how and why the database loses data (just like there are studies showing the effects of alcohol), and you have claimed that "it's fine, because it never happened to me". You said the claim was baseless when it wasn't - there was another very popular HN post recently documenting how someone ran a test, proved the database was losing data, and saw the issue closed as wontfix (but later reopened). Is aphyr's entire article baseless (and the one he wrote 2 years ago)?
In the face of actual data and reproducible tests, isn't saying something like "well, it didn't happen to me" dense?
The comparison might be insensitive, so excuse me for hurting your feelings, but I don't see how it's invalid.
A more apt analogy, then, would be someone saying "My database has run with 100% uptime for 3 years, so there is no reason for me to keep backups."
I promise I'm not trying to troll. Given how data loss can occur in MongoDB (partitions, silently lost writes), how do you know that you "never lost data"? How do you verify this?
I'm pretty sure that I could kill 0.01% of writes in any random application (one that doesn't require extensive audits, unlike banking, for example) and nobody would notice for a really long time. And if the effect were ever noticed, application code would be the first place people would look for the cause.
Amen brother. You should never base your db decisions on either marketing copy _or_ HN know-it-all complainypants. :)
We actually evaluated TokuMX extensively last year, pre pluggable storage engine. We might have pursued it if they had implemented a compatible oplog at the time, but a migration path that consisted of dumping and importing production data -- with no way to re-elect the old primary if there were any production problems -- made it simply a non-starter for me.
They did eventually implement a compatible oplog, which was a good product decision, but the entire TokuMX engineering team recently quit Tokutek en masse so it's still not a great option in my book. Too bad.
> You should not base your choice of database (or anything else, for that matter) on marketing copy. For something as important as your primary data store, you should at minimum read the full documentation and run some tests with dummy data to see if it will even plausibly work for your use case.
The issue is that they even lie in their documentation[1].
Also, Mongo doesn't necessarily lose data in a catastrophic way; you might just have some old or inconsistent data here and there. If you have an authoritative source of data, I would compare the data in Mongo against it. Also, [1] shows how you can get data corruption even with the highest safety settings due to broken design.
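A rough sketch of that kind of reconciliation (Python; fetch_authoritative_rows() is a hypothetical stand-in for your system of record, and the "version" field is just an example of how you might detect stale copies):

    from pymongo import MongoClient

    def fetch_authoritative_rows():
        # Hypothetical stand-in for the system of record (a SQL query,
        # a verified export, etc.). Replace with real data.
        yield 1, {"version": 3}
        yield 2, {"version": 7}

    coll = MongoClient("mongodb://localhost:27017").mydb.orders
    missing, stale = [], []

    for row_id, expected in fetch_authoritative_rows():
        doc = coll.find_one({"_id": row_id})
        if doc is None:
            missing.append(row_id)                # write never landed
        elif doc.get("version") != expected.get("version"):
            stale.append(row_id)                  # old/inconsistent copy

    print("missing:", len(missing), "stale:", len(stale))

It won't tell you why a record diverged, but run periodically it at least tells you whether "I never lost data" is actually true.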