How Discord stores trillions of messages (2023)

foobazgt · 2024-09-29T00:04:26 1727568266

This blog post seems to blame GC heavily, but if you look back at their earlier blog post [0], it seems to be more shortcomings in either how they're using Cassandra or how Cassandra handles heavy deletes, or some combination:

"It was at that moment that it became obvious they deleted millions of messages using our API, leaving only 1 message in the channel. If you have been paying attention you might remember how Cassandra handles deletes using tombstones (mentioned in Eventual Consistency). When a user loaded this channel, even though there was only 1 message, Cassandra had to effectively scan millions of message tombstones (generating garbage faster than the JVM could collect it)."

And although the blog post talks about GC tuning, there's mention here [1] that they didn't do much tuning and were actually running on an old version of Cassandra (and presumably JVM) - having just switched over from CMS (!).

  0) https://discord.com/blog/how-discord-stores-billions-of-messages
  1) https://news.ycombinator.com/item?id=33136453
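
The quoted failure mode can be illustrated with a toy model. This is a minimal sketch, not Cassandra's actual storage engine or read path: `Partition`, `read_live`, and `compact` are hypothetical names, and the partition is modelled as a plain dict in which a delete writes a tombstone marker instead of removing the cell.

```python
from dataclasses import dataclass, field

TOMBSTONE = object()  # marker written in place of a deleted value


@dataclass
class Partition:
    """Toy partition: deletes add tombstones, they do not remove cells."""
    cells: dict = field(default_factory=dict)  # message_id -> text or TOMBSTONE

    def insert(self, message_id, text):
        self.cells[message_id] = text

    def delete(self, message_id):
        # A delete is just another write: the cell is overwritten by a marker.
        self.cells[message_id] = TOMBSTONE

    def read_live(self):
        """Return (live messages, number of cells scanned to find them)."""
        scanned = 0
        live = []
        for message_id in sorted(self.cells):
            scanned += 1
            value = self.cells[message_id]
            if value is not TOMBSTONE:
                live.append((message_id, value))
        return live, scanned

    def compact(self):
        """Drop tombstones (a real system waits for a grace period first)."""
        self.cells = {k: v for k, v in self.cells.items() if v is not TOMBSTONE}


channel = Partition()
for i in range(100_000):
    channel.insert(i, f"msg {i}")
for i in range(99_999):          # delete everything except the last message
    channel.delete(i)

live, scanned = channel.read_live()
print(len(live), scanned)        # one live message, but every cell was scanned

channel.compact()
live_after, scanned_after = channel.read_live()
print(len(live_after), scanned_after)
```

Until compaction runs, the cost of the read tracks the number of tombstones rather than the number of live messages, which is the behaviour the quote describes.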

Aeolun · 2024-09-29T02:48:39 1727578119

But then it’s still nice that they’re using ScyllaDB and now it’s not a concern at all right?

Even if they were using their original solution wrong, I think the solution that cannot be used wrong is superior.

ericvolp12 · 2024-09-29T04:39:29 1727584769

The funny part is ScyllaDB still uses tombstones for deletions, though they do have configurable compaction strategies and iirc Discord uses Scylla's Incremental Compaction Strategy that I suppose solves the specific issue they were dealing with. iirc that compaction strategy will trigger a compaction once a certain threshold of a partition is tombstones and then the table is rebuilt without the tombstoned content (which effectively pauses writes on that specific node and that specific table and partition for the duration of that process). Compacting a massive partition is really expensive. Scylla defaults to warning you that a partition is too large if it has at least 100,000 rows in it. My guess is when they moved to ScyllaDB they also adopted a new strategy for partitioning messages in a channel that keeps partition sizes reasonable so compactions don't take a super long time.

jhgg · 2024-09-29T16:41:42 1727628102

We did not change schema or partitioning strategy.

sroussey · 2024-09-29T18:12:31 1727633551

Good default configurations can mean quite a lot if people don’t tune them.

roenxi · 2024-09-29T04:46:47 1727585207

I don't see anything here that looks untoward. They increased their data storage by 3 orders of magnitude and decided to use a different DB system. Fair enough, maybe they've learned more about the nature of their data.

But that logic isn't sound. When dealing with huge amounts of data there are going to be trade-offs. Picking a system that makes different trade-offs to an existing system is not automatically helpful. Yes you don't have the old problems. However, you are about to discover new problems. There is always something of a gamble around which will be more of a problem to your business.

frr149 · 2024-09-30T08:44:25 1727685865

What's the problem with Scylla? Honest question, BTW

vips7L · 2024-09-29T15:13:47 1727622827

> having just switched over from CMS (!)

This is really interesting. CMS was deprecated in Java 9, when G1GC became the default collector, and removed in Java 14. They were probably running an antiquated Java 8 or 11 runtime. So that means that in 2022 they were either running a 4 year old Java 11 runtime or an 8 year old Java 8 runtime. They were really leaving a lot of performance on the table.

gorset · 2024-09-29T16:59:57 1727629197

They could also have gone the commercial route and gotten Zing with their pauseless GC. It’s been around forever and they even cover Cassandra in their marketing.

https://www.azul.com/technologies/cassandra/

pebal · 2024-10-07T19:54:21 1728330861

This is not pauseless GC.

leetrout · 2024-09-28T23:02:41 1727564561

Needs (2023)

That services layer reminds me of a big, fancy, distributed Varnish Cache... they don't mention caching and they chose the word coalesce so I assume it doesn't do much actual caching. But it made me think of Varnish's "grace mode" and its use to prevent the thundering herd problem (which is where I first heard of 'request coalescing') https://varnish-cache.org/docs/6.1/users-guide/vcl-grace.htm...

Also love to see consistent hashing come up again and again. It's a great piece of duct tape that has proven useful in many similar situations. If you know where something should be then you know where everything is gonna come look for it!
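
As a rough illustration of the consistent hashing idea, here is a minimal hash ring sketch. It is not how Discord's data services route requests; the node names, the `HashRing` class, and the choice of 64 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Each node owns many points on a ring; a key goes to the next point clockwise."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"channel-{i}" for i in range(1000)]

ring = HashRing(["svc-a", "svc-b", "svc-c"])
before = {k: ring.node_for(k) for k in keys}

# Adding a node only inserts new points, so keys either stay put or move to it.
bigger = HashRing(["svc-a", "svc-b", "svc-c", "svc-d"])
after = {k: bigger.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to:", {after[k] for k in moved})
```

Every caller that hashes the same key lands on the same node, which is what lets a routing layer send all requests for one channel to one place.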

loloquwowndueo · 2024-09-28T23:07:29 1727564849

Coalescing and “origin shielding” tend to be more common terms for that - I’ve never heard of “grace” until today :)

atombender · 2024-09-30T20:50:24 1727729424

Varnish does call it coalescing. Grace is used for a specific situation: When a previously cached object has expired, Varnish won't evict it from the cache immediately, but will continue to serve the old content, while sending exactly 1 request to the background to refetch. How long an object can live after expiring is called the grace. The HTTP standard calls this behaviour "stale-while-revalidate".
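
The grace behaviour described above can be sketched as a small cache with an explicit clock. This is not Varnish's implementation: `GraceCache`, `run_background_refreshes`, and the TTL and grace values are hypothetical, and the background refetch is simulated by a queue that the caller drains rather than a real worker.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: str
    expires_at: float


class GraceCache:
    def __init__(self, fetch, ttl, grace):
        self.fetch = fetch
        self.ttl = ttl
        self.grace = grace
        self.entries = {}
        self.refreshing = set()   # keys with a refetch already queued

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is not None and now < entry.expires_at:
            return entry.value                  # fresh hit
        if entry is not None and now < entry.expires_at + self.grace:
            self.refreshing.add(key)            # a set, so one refetch per key
            return entry.value                  # stale, but still within grace
        return self._store(key, now)            # miss or past grace: fetch inline

    def run_background_refreshes(self, now):
        # Stand-in for the background worker: refetch each queued key once.
        for key in list(self.refreshing):
            self._store(key, now)
        self.refreshing.clear()

    def _store(self, key, now):
        value = self.fetch(key)
        self.entries[key] = Entry(value, now + self.ttl)
        return value


calls = []


def origin(key):
    calls.append(key)
    return f"{key}-v{len(calls)}"


cache = GraceCache(origin, ttl=10, grace=30)
first = cache.get("page", now=0)                        # miss, fetched inline
stale = [cache.get("page", now=15) for _ in range(5)]   # expired, served stale
cache.run_background_refreshes(now=15)                  # exactly one refetch
fresh = cache.get("page", now=16)
print(first, stale[0], fresh, len(calls))
```

Five requests arrive after expiry, all are answered immediately with the old object, and the origin sees a single refetch.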

mnutt · 2024-09-29T14:39:31 1727620771

Grace mode itself doesn’t prevent thundering herd; varnish coalesces all requests automatically and grace mode is used to increase the likelihood of clients receiving cached (albeit stale) responses.
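
Request coalescing itself, as distinct from grace, can be sketched with asyncio: concurrent requests for the same key share one in-flight backend call. This is an illustrative sketch only, not Varnish's or Discord's code; `Coalescer` and `slow_backend` are hypothetical names.

```python
import asyncio


class Coalescer:
    def __init__(self, fetch):
        self._fetch = fetch
        self._in_flight = {}   # key -> asyncio.Task for the pending fetch

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the fetch; later callers join it.
            task = asyncio.ensure_future(self._run(key))
            self._in_flight[key] = task
        return await task

    async def _run(self, key):
        try:
            return await self._fetch(key)
        finally:
            # Nothing is cached: the next request after completion fetches again.
            self._in_flight.pop(key, None)


backend_calls = []


async def slow_backend(key):
    backend_calls.append(key)
    await asyncio.sleep(0.01)   # pretend this is a database query
    return f"rows for {key}"


async def main():
    coalescer = Coalescer(slow_backend)
    return await asyncio.gather(*(coalescer.get("channel-1") for _ in range(100)))


results = asyncio.run(main())
print(len(results), "responses from", len(backend_calls), "backend call(s)")
```

Unlike the grace sketch, nothing outlives the request here, which matches the guess upthread that the layer coalesces without doing much actual caching.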

hinkley · 2024-09-29T05:39:20 1727588360

Nginx always more businesslike.

    proxy_cache_use_stale updating;

dang · 2024-09-29T04:17:23 1727583443

Year added above. Thanks!

dorlaor · 2024-09-29T08:26:48 1727598408

Some additional nuggets by ScyllaDB co-founder:

- Discord couldn't complete repair with Cassandra. Not the case with Scylla.
- Scylla has a lot in common with Cassandra, for good reason, like the LSM tree, compaction, etc. However, Scylla has unique CPU and I/O schedulers which allow us to prioritize queries over compaction, and defer compaction to the half millisecond where we have enough idle bandwidth. We have plenty of articles about it.
- Scylla has a new (1.5 years) tombstone_gc=repair mode, which is much safer.
- Scylla's new architecture of Raft and tablets was recently launched and is the next big thing for our users. Watch the cool YouTube video of tablet load balancing.

aaptel · 2024-09-29T07:35:14 1727595314

This whole problem wouldn't exist if we used distributed chat protocols which have been around for over 40 years (IRC). With the added benefit of having an open specification and multiple implementations. No walled gardens.

And if you think IRC is too old for the modern world take a look at matrix or xmpp.

How did we let discord take over is a mystery to me, or rather a tragedy.

rollcat · 2024-09-29T08:13:36 1727597616

IRC does not store messages, it only relays them to clients. You need an add-on solution to store chat history, something we've been taking for granted for ~30 years.

IRC all but requires using a bouncer to follow a conversation from more than a single device.

IRC does not encrypt messages, only (optionally) the client<->server connection. Without E2EE, you have no privacy against the server/operator, which is an easily targeted SPOF.

Matrix (the protocol) is still in flux, and the implementations are lagging behind the spec. If you're not using Element, you're behind on features and security.

XMPP is (similarly to IRC) relying on optional protocol add-ons for basic things, like E2EE, which clients may or may not support fully or correctly.

I recommend reading these breakdowns by soatok: https://soatok.blog/2024/08/04/against-xmppomemo/ https://soatok.blog/2024/08/14/security-issues-in-matrixs-ol...

2013/Snowden happened 11 years ago. E2EE should by now be considered a basic feature, a commodity, something we should be calling for as relentlessly as we did for HTTPS. (Discord of course does not implement E2EE.)

grishka · 2024-09-29T11:44:09 1727610249

Truth is, E2EE isn't a "basic thing". It's an add-on feature that most people don't want. It is impossible to have E2EE that doesn't leak into the UX, and most people would rather have a streamlined UX than deal with key management. It is also much more complex to have robust E2EE in a group chat.

The thing that sets E2EE apart from HTTPS is that HTTPS requires nothing from the end user. It just works. And as a site owner, you just set it up once and forget about it.

rollcat · 2024-09-29T13:43:40 1727617420

> It is impossible to have E2EE that doesn't leak into the UX

True, but one is also free to study the UX solutions implemented on platforms such as iMessage, WhatsApp, and Signal, which all have strong E2EE and see plenty of mainstream usage.

> [...] HTTPS requires nothing from the end user.

Depends on how you define "nothing". We've collectively put an insane amount of work to bring HTTPS to where it is today. Also, HTTPS continues to rely heavily on each server operator's skills and diligence.

There's also plenty of edge cases where HTTPS clients need to go an extra mile, such as containers (many base images do not include a cacert bundle), IoT/retrocomputing/other underpowered devices, and so on. There's always a cost, but it's usually worth it.

grishka · 2024-09-29T15:41:51 1727624511

I should've said "true E2EE".

On iMessage, your keys are managed by Apple. You effectively fully trust them (which seems to be the assumption in most of Apple products anyway). I wouldn't call this a "real" E2EE implementation.

In WhatsApp, you're limited to one device logged into your account, and the rest are proxied through it. And message backups, those are annoying.

In Signal, you have all those stupid backups too, and while you're able to log into multiple devices (it seems), your past messages don't load "for your own security", and there's also this stupid time component so you get logged out on your computer if you haven't used the Signal desktop app for some weeks (which I don't).

Whereas on Discord, Telegram, Slack and other IM services without end-to-end encryption, you log in on a new device and that's it. You instantly get access to all your messages since the beginning of time, and stay logged in forever.

rollcat · 2024-09-29T20:10:42 1727640642

> On iMessage, your keys are managed by Apple. You effectively fully trust them (which seems to be the assumption in most of Apple products anyway).

I'd argue there are many scenarios in which this might be preferable to a lengthier/wider supply chain. Personally I'd sooner trust Apple than Microsoft+(Lenovo/HP/Dell/...)+(Intel/AMD/Qualcomm/Broadcom/...)+(every device with DMA (PCIe/TB), unless you trust your IOMMU)+(.../...)... (you get the point). And the alternatives to Microsoft are each its own kitchen sink.

> In Signal [...] your past messages don't load "for your own security" [...]

I agree that this is quite annoying. HTTPS clients resolved a somewhat similar problem (usage of self-signed certificates) by trusting the user to make an informed choice. I wish Signal would trust their user base to make their own choices there as well.

> Whereas on Discord, Telegram, Slack and other IM services without end-to-end encryption, you log in on a new device and that's it. You instantly get access to all your messages since the beginning of time, and stay logged in forever.

Same with iMessage. Whether this is a feature or a bug, depends on your threat model.

But we're in a situation where we don't even get to make an informed choice - every solution (as you pointed out) comes with its own bag of UX shortcomings. These trade-offs should be user choices, not something the vendor forces upon you. But these are not fundamental shortcomings of E2EE as a concept, but particular issues with its different implementations. WhatsApp shows you can restore messages from a backup; Signal shows you can have "real" multi-device presence; etc. If we could spend 1/100th of the effort we did to push HTTPS everywhere, E2EE could be just as ubiquitous today.

brobdingnagians · 2024-09-29T16:07:10 1727626030

Just spitballing, but couldn't you have a new device login with three fields: username, password, and encryption key? Then if you don't add the encryption key you don't get the history, but can still access the account. If password managers saved all three, that would simplify it for more people (at least those with password managers). But there still has to be a cultural shift toward password managers for a lot of non-tech people.

iknowstuff · 2024-09-29T16:48:44 1727628524

I think whatsapp no longer proxies via a single device.

On iMessage, you can verify keys now.

NotPractical · 2024-10-01T03:15:11 1727752511

> On iMessage, your keys are managed by Apple. You effectively fully trust them

Not really? You can choose whether to upload your recovery key to iCloud or not. The software abstracts over the details of course, but Signal does that too. Unless you're arguing that it's impossible for closed source software to have "true E2EE", which may have some merit, but Discord is proprietary, and something is better than nothing.

saberience · 2024-09-29T14:12:55 1727619175

Yes but see the group size limits on iMessage which is 32!!

Effectively making it useless for so many people, the reason is due to e2e encryption.

In contrast, Telegram has groups with 1000s of participants, but only possible as they don’t use e2e encryption.

NotPractical · 2024-09-29T23:58:38 1727654318

iMessage.

AnonCoward42 · 2024-09-29T10:33:32 1727606012

> IRC does not encrypt messages, only (optionally) the client<->server connection. Without E2EE, you have no privacy against the server/operator, which is an easily targeted SPOF.

Same as Discord.

> Matrix (the protocol) is still in flux, and the implementations are lagging behind the spec. If you're not using Element, you're behind on features and security.

Discord also only has one reference client, but for me even with that client Matrix/Element was not as reliable. I still use and like it, but it's not a like for like in that regard.

> XMPP is (similarly to IRC) relying on optional protocol add-ons for basic things, like E2EE, which clients may or may not support fully or correctly.

But if you use current clients like Conversations or Dino or the likes it does work. There is no point in counting the clients that don't support it if these aren't the reference or biggest ones. The problem here is more that it's not meant to be used like Discord in any way. Not for big group chats/channels nor for big voice chats (not even sure this possible).

Zambyte · 2024-09-29T12:15:48 1727612148

> IRC does not encrypt messages, only (optionally) the client<->server connection. Without E2EE, you have no privacy against the server/operator, which is an easily targeted SPOF.

FWIW this point isn't relevant to the IRC vs Discord discussion, since Discord is also very not E2EE. That said, XMPP is my preferred protocol that checks all of the boxes.

rollcat · 2024-09-29T13:24:09 1727616249

> [...] since Discord is also very not E2EE.

I have stated that at the end of my original comment. I'm not advocating for Discord (merely enumerating IRC's and XMPP's shortcomings), but I would like to point out once again, that post-2013 any solution that does not enable strong E2EE by default should not be advocated for - at all.

> That said, XMPP is my preferred protocol that checks all of the boxes.

Read up on soatok's breakdown of the design & status of OMEMO. I'm not a cryptographer, but I do trust a cryptographer when they say some protocol's design/crypto is broken.

vidarh · 2024-09-29T15:48:27 1727624907

Maybe for your use. For my use, not a single thing that goes over discord is something I'd object to being posted on a public website. That includes DMs. Not having E2EE means something isn't a solution for actually private conversations, but a lot of conversations happen in settings that are not actually private in any sense.

collingreen · 2024-09-29T16:50:14 1727628614

I personally think I am unable to perfectly guess today what I will want/need to have private forever.

This is one of the tenets underpinning my thoughts about why privacy matters.

multjoy · 2024-09-29T19:11:30 1727637090

But Discord & IRC aren't generally private spaces. They're no different to web forums in that you would reasonably expect that something you write today would be accessible without reference to you 10 years hence.

That's a very different proposition to a private/group message exchange in WhatsApp/iMessage etc.

vidarh · 2024-09-30T20:13:48 1727727228

It's really quite simple: Would I be happy to discuss it in a public space where people might record?

I wouldn't plan a controversial political movement in public, or on Discord. I would discuss video game programming in either place.

crtasm · 2024-09-29T14:32:13 1727620333

Nothing stopping a server also acting as a bouncer and storing messages: https://ergo.chat/about

timeon · 2024-09-29T17:47:42 1727632062

> IRC does not encrypt messages

Wasn't SILC later used for this instead of IRC?

voidnap · 2024-09-29T09:28:42 1727602122

> IRC does not store messages, it only relays them to clients.

Some people consider this a feature and prefer using IRC bouncers to discord.

OMEMO solved encryption for XMPP a decade ago. I haven't seen it on IRC yet though.

brysonreece · 2024-09-29T09:38:05 1727602685

Some (most) people want to easily talk to their friends or interest groups without having to worry about it.

voidnap · 2024-09-29T21:20:09 1727644809

I get that, I wasn't passing judgement. You guys must be super sensitive to be downvoting me for just sharing another point of view.

Personally, I find xmpp and IRC to be easier ways to talk to friends and interest groups when they use those networks. The software is simpler, faster, and a better experience for me.

Matrix is a bit of an exception where it's slow and buggy and barely hanging on.

But me and my friends don't care about discord stickers or nitro or giphy links or the discord store or any of that kinda stuff that you go to discord to use. And that's fine if you do.

People can want and enjoy different things and also "want to easily talk to their friends or interest groups without having to worry about it."

dakom · 2024-09-29T10:39:46 1727606386

I do consider it a feature, in hindsight. Learning to program by asking "dumb" questions was great, because chats were ephemeral, nobody cared if the same question was asked for the 10 millionth time or risk of embarrassment being like 12 years old and asking greybeards for help.

Nobody also felt bad saying "RTFM" because, whatever, it blows over in a minute, there's no permanent record of having a harsh moment, more free to just move on.

The same old questions being asked due to no search also provided more opportunities to answer those questions, so, newbies could start to learn by teaching.

So, yeah, I think something beneficial was lost, even if I wouldn't go back to that approach- it's more of a tradeoff than a definitive improvement

znpy · 2024-09-29T10:44:28 1727606668

> I do consider it a feature, in hindsight. Learning to program by asking "dumb" questions was great, because chats were ephemeral, nobody cared if the same question was asked for the 10 millionth time or risk of embarrassment being like 12 years old and asking greybeards for help.

I pity the new generations for not having this kind of opportunity: the opportunity to make mistakes, say dumb stuff and goof off with all these things vanishing in a matter of minutes, hours at most.

I miss the old internet: at any point you could pick a new nickname and get a fresh and clean new email address from many of the webmail providers and just start a new online life.

And it was considered normal. It was actually a "best practice" to never use your real name.

I miss the old internet.

MichaelZuo · 2024-09-29T15:27:05 1727623625

This approach simply doesn’t work when users are allowed to vote or have any sort of scoring mechanism, since bad actors will also create multiple “online lives” and manipulate those systems with a few clicks.

sham1 · 2024-09-29T14:02:51 1727618571

Remember when phrases like "Never use your real name online" used to be near universal? Yeah, this is something I also miss about the old Internet.

Like, even back then you could absolutely tie your IRL identity up with your online identity, but the difference of course was that it wasn't a requirement of existing online, like it is now. Like yeah, you can stay anonymous but a) it's super difficult since the modern day assumption is that you're not doing that and b) that you're up to no good, because why would you be hiding who you are, unless you were doing something shady. And now even "normal" people lament just where we went wrong and what happened to online privacy. To the aware, privacy dying like this was clear as day, but I suppose most just didn't hear, or chose to ignore, the alarm bells.

And now everything is logged, analysed, and associated with the people who produced the messages and other sundry content. There is no ephemera, we need laws just to be forgotten by services (as an EU citizen, I'm glad about law existing here, but it shouldn't need to be a law, it ideally should be assumed), and we're constantly getting watched by both states and surveillance capitalists alike. Not actively in most cases, mind you, but passively, with our movements, our interactions online, and just what we do, just getting aggregated into these humongous data sets of Big Data, to train statistical models on. Mostly to surveil us even harder, or to manipulate us in the form of advertisement, which can be even more insidious in some ways.

I'm sure that stuff like the Cambridge Analytica fiasco could have occurred even without this destruction of privacy, anonymity, and ephemeral content, but I posit that it would have been way more difficult had people not been encouraged to put everything about themselves into services that would log them and build evermore complex models about them and their thoughts. And now this kind of stuff can be used to destroy democracies, and as alluded to earlier, manipulate for example our spending habits. And now we all wonder just where this all went wrong.

I miss the old Internet.

Ecoste · 2024-09-29T12:49:52 1727614192

> How did we let discord take over is a mystery to me, or rather a tragedy.

The fact that you're baffled why discord took over is exactly why it took over. You can't even acknowledge that the user experience is 10x better and it's suitable for a general non-technical audience.

mystified5016 · 2024-09-29T21:33:30 1727645610

New quest available! Buy nitro for stickers! Buy nitro as a gift! New quest available! New quest available! Restart to update. New quest available! Look at the new emojis you could use with nitro! New update available! New update available again! Third update today! New quest available! Look at these profile decorations you could use with nitro! Boost this server! *NEW QUEST AVAILABLE*

dewey · 2024-09-29T07:57:30 1727596650

I’m a huge IRC fan and I dislike Discord, but all these other services are way too clunky and IRC is really only usable through IRCCloud that has a relatively okay mobile app these days.

Recently a very technical group I’m part of migrated from Telegram to Matrix and the user experience is just not very good. The apps are buggy, don’t look good, then in the new “Element” app SSO isn’t supported so I can’t use my account with it. There’s lots of paper cuts that are okay for someone like me who likes to figure it out but I’d never try to convince my friends to use it.

nunobrito · 2024-09-29T11:37:54 1727609874

For telegram refugees then maybe SimpleX is an option, except it has no bots nor other options for clients at the moment.

What I personally use is the nostr protocol through a client like Amethyst or OxChat. Messages and groups can be E2EE private, or you can just use the public groups.

The biggest advantage is that you are joining a bigger community of apps and services built on top of the same protocol, rather than joining some isolated island (again).

dewey · 2024-09-29T11:43:34 1727610214

I recently listened to a nostr podcast and even people working on it said it would not be reasonable to recommend it as a secure messaging app at this point, just because very early things like metadata leaking are not addressed yet. So not really an alternative.

nunobrito · 2024-09-29T13:04:03 1727615043

I don't know what podcast you are mentioning or the context. Anyone can say anything on youtube.

We are talking about a transition from telegram. Compared to that platform, NOSTR is undoubtedly more secure, given that telegram doesn't even encrypt conversations by default and doesn't inform users of this. Whereas in NOSTR you are made aware when a conversation is private between both parties.

Metadata is fetchable for 99% of messaging apps out there. If you'd ask me about making a more secure app then this involves continuous streaming of data, padding of messages to avoid content guessing and avoid the usage of internet as data channel.

So it really depends on what you consider secure and what it is compared against. Compared to Telegram it is more secure. Compared to a piece of paper encrypted with a custom algorithm and delivered by a trusted human transporter? Not really.

high_na_euv · 2024-09-29T09:36:12 1727602572

>How did we let discord take over is a mystery to me, or rather a tragedy.

Orders of magnitude better product than anything competition had at the time?

doublerabbit · 2024-09-29T16:22:19 1727626939

> Orders of magnitude better product than anything competition had at the time?

Nah, it just comes down to non-techy folk wanting to play/chat with their friends in a just-work configuration.

Mumble, TeamSpeak were always janky, needed a hosted server. IRC is multiplayer notepad.

Geeks care about E2E, and all that glory but these folks don't. And that's what Discord dishes; as did Y!M, MSN, ICQ, AIM back in the day.

All discord has done is replaced those above as GitHub has replaced SourceForge.

We didn't care if the message were encrypted or not back then. Why do we now?

StableAlkyne · 2024-09-29T17:35:59 1727631359

> Geeks care about E2E

*Some* geeks. Specifically those who are into encryption.

There is nothing wrong with wanting an application to just work, especially when it's significantly better than what came before (contemporary competitors were Skype and IRC)

pphysch · 2024-09-29T18:05:30 1727633130

You're just describing why Discord was a much better product.

high_na_euv · 2024-09-30T09:19:00 1727687940

I used all

Ventrilo mumble ts3 skype

Discord was way better and had more features and was more safe than self hosted alternatives

Krasnol · 2024-09-29T07:42:15 1727595735

Usability did it.

You download an exe, install it, make an account and it runs. Just like that. Everybody can do it.

There are tons of useful and great software out there. Most of it is not easy for the public. Some (most?) of it doesn't even have a GUI. People would rather sell their identity and even pay than suffer through too many hops.

Intralexical · 2024-09-29T10:59:21 1727607561

Not even an EXE. The web version is feature-complete, so you only need to click a link.

Krasnol · 2024-09-29T11:21:08 1727608868

You're right. I forgot about that.

I also forgot all those people who came from the TeamSpeak servers.

throw16180339 · 2024-09-29T16:01:40 1727625700

> How did we let discord take over is a mystery to me, or rather a tragedy.

Anyone can set up or join a Discord server. If you give users the choice between a complex open platform and an easy proprietary solution, they will pick the latter every time.

maccard · 2024-09-29T07:53:09 1727596389

If you want to know why, look at the App Store reviews for Discord and TeamSpeak and compare them.

Discord just works.

tannhaeuser · 2024-09-29T09:19:08 1727601548

There’s no lack of open chat protocols and federated services but those have mostly torpedoed themselves: by usability and discoverability problems, holier-than-thou attitudes, and plain nerd attention wars. Such as XMPP (used a lot until around 2010 but easily dragged into the mud because XML and overengineering), Mastodon (saw a surge as twitter was faltering but then seemingly stopped being everyone‘s darling as its limitations became obvious, among them Mastodon admins taking their audience hostage; also ActivityPub fans going around advertising it for each and everything when RSS is just fine for web sites, damaging news feeds altogether in the process).

Where spamming, or the systematic exploitation of digital communication by the „ad industry“, was killing it in the past (Usenet, and arguably the web), today there‘s also the problem of being consumed by LLMs to push non-public messaging. Though I‘m not sure the latter is really a concern for many, as developers not only are giving away their code, but their entire activity log/issues and their solutions on github such that they can easily be digested and replaced by coding assistant LLMs, git being a distributed system in the first place.

Terr_ · 2024-09-29T09:33:46 1727602426

> among them Mastodon admins taking their audience hostage

I was excited first hearing all the "fediverse" stuff, but having to hand over control of your online identity to a particular node forever felt a little bit like "old boss, same as the new boss."

(Yes, I know some folks are working on the identity issue.)

nunobrito · 2024-09-29T12:25:21 1727612721

Reminds me of when I joined the largest mastodon server for my country. Advertised by the owner as a bastion for free speech, democracy and fair treatment. Then in 2020 it started mass banning everyone "that went against science" on the covid fraudemia in our country.

Twitter in those days was bad, but that mastodon server sure became even worse. Nowadays I've found a breath of fresh air and innovation with Nostr. No more servers with your data and followers locked inside.

You can silence the people you don't want to hear, but you can no longer hold them hostage in forced silence.

paulryanrogers · 2024-09-29T14:28:09 1727620089

Mastodon means you can at least pick your boss, be your own boss, and take your identity and followers to a new boss. (Possibly even taking your content too, though maybe not links)

MichaelZuo · 2024-09-29T15:33:07 1727623987

Picking a ‘boss’ in a system where the average ‘employee’ has no credible way of assessing or evaluating them, or their superiors, and zero prospects of ever getting a face to face meeting with, is effectively no different to having the boss picked by an anonymous shareholder meeting in SF.

If all of the potential bosses have roughly the same degree of accessibility… which is the case for Mastodon for anything over a few hundred users.

ThrowawayTestr · 2024-09-29T16:30:39 1727627439

What's stopping you from messaging server owners or stalking their profile to see whether they're ideologically compatible?

throw16180339 · 2024-09-30T13:03:27 1727701407

That's a lot more effort than using Discord and getting on with my life.

paulryanrogers · 2024-09-29T16:36:41 1727627801

Compared to closed gardens like Discord and Xitter, Mastodon is a significant improvement.

MichaelZuo · 2024-10-01T00:08:15 1727741295

But not in terms of the ‘choosing a boss’ aspect for the median user.

StableAlkyne · 2024-09-29T17:53:38 1727632418

Did they ever address the problem of migration from a bad server?

For example, a scenario where your server dies and does not return. Or a malicious actor takes over and bans the user base. Or a honeypot encouraging user account migration, followed by bans.

In all 3 cases, you are effectively screwed the moment you migrate to a malicious server, or your server becomes malicious.

I remember Bluesky trying to address this by tying your identity to a DNS record or something, but it's a severe limitation in anything trying to be decentralized.

elcomet · 2024-09-29T08:09:29 1727597369

IRC and distributed protocols in general had a big issue: you lose history every time you disconnect.

menaerus · 2024-09-29T08:48:58 1727599738

In the age we are living this starts to sound more like a feature to me.

MatthiasPortzel · 2024-09-29T14:03:17 1727618597

The other reply goes to airplanes but there are much more common ways to get disconnected. Locking my phone or closing my laptop lid disconnects me from IRC. A lot of Discord users have desktops that are always on (since Discord originally advertised to gamers), but a lot of Discord users don’t.

Discord is fundamentally a very versatile platform. If you lose one seemingly unimportant feature, you lose a lot of versatility. Maybe I’ll write a blog post just with examples of how I’ve used it. It replaces IRC, but it also replaces Facebook groups, Skype, a lot of group texts, and a lot of email for me.

agumonkey · 2024-09-29T14:19:48 1727619588

It does alter the meaning of chat tremendously. In discord, often things become heavy, because we're not talking, we're accumulating information, and you have to stay on purpose so data is manageable and seekable.

The few times I join IRC I know we're only here to chat, it's semi-transient (a little bit more if logs are stored) and I feel lighter.

rtpg · 2024-09-29T10:32:37 1727605957

Is it really that much of a jump to say "I would like to see the chat that has happened between my friends between the time I got on a plane and then got back off"? Does that sound odd?

Imagine if you couldn't receive e-mail while you were offline!

This isn't to disparage IRC and friends too much, obviously there's huge value in it existing as a synchronous chat room. Just... async chat is a thing that totally happens for most people.

serf · 2024-09-29T12:22:16 1727612536

a non-technical person wouldn't consider the implications of a history log with regards to security or data hoarding, they just see it work and think of it as a convenience.

this value sell shifts in the mind of the non-technical person once they're told that the feature they want implies non-ephemeral data that will be systematically sifted through either for legal or financial benefit by a third party.

in other words : the reason why 'async chat is a thing that totally happens for most people.' is because a vast majority of people are simply unqualified to even see the problem, much less seek alternatives or solutions to the data hoarding that they must comply with.

this creates a social effect and pulls everyone into Discord, regardless of their beliefs on the matter, simply because it has become 'the only game in town'.

regardless of personal preference, centralization of these kinds of things is BAD for the user in nearly all circumstances aside from convenience.

Shog9 · 2024-09-29T17:21:35 1727630495

Please stop pretending that "data hoarding" didn't / doesn't happen on IRC. There's nothing inherently friendly to security or privacy in the protocol; if anything, it's quite the opposite.

That you can, with augmentation and diligent op-sec, get something a bit better than Discord isn't a great selling point unless you have the time and resources and buy-in already, not just for yourself but from everyone in your group. At which point, there are still better options than IRC.

For decades now, the main draw of IRC has remained a fetish for conspicuous configuration, as it embodies a sort of brutalist architecture of communication software. The excuses change every few years, but the love for cobbling together a barely workable system from parts remains core.

menaerus · 2024-09-29T11:05:46 1727607946

Sure, the advantages of async communication are obvious but the crucial difference is that in that case vendor has to store your data somewhere in the data center. Reusing that data for unsolicited purposes is what many people will have a concern with.

indeyets · 2024-09-29T13:23:40 1727616220

But logs are stored on IRC as well. It’s not a part of the standard protocol, but a lot of IRC servers can do that automatically, and there are bots which do that, not to mention personal archives. The difference is that end users don’t have easy access to these logs. And on Discord they do (because it is a part of the protocol).

cmiller1 · 2024-09-29T12:27:36 1727612856

How about a secure async chat where the vendor simply stores a list of message IDs, and then the client requests if anyone has a copy of any message you haven't received yet from the other users in chat when you log on
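To make the idea concrete, here is a rough sketch of that flow. Everything in it (the `Vendor` and `Client` classes, the method names) is made up for illustration, and it leaves out the "secure" part entirely: a real design would need end-to-end encryption and some way to authenticate the bodies peers hand back.

```python
from dataclasses import dataclass, field


@dataclass
class Vendor:
    """Hypothetical vendor: stores only message IDs per chat, never bodies."""
    ids: dict[str, list[str]] = field(default_factory=dict)

    def record(self, chat: str, message_id: str) -> None:
        self.ids.setdefault(chat, []).append(message_id)

    def list_ids(self, chat: str) -> list[str]:
        return list(self.ids.get(chat, []))


@dataclass
class Client:
    """Hypothetical client: keeps message bodies in its own local store."""
    name: str
    store: dict[str, str] = field(default_factory=dict)  # message_id -> body

    def send(self, vendor: Vendor, chat: str, message_id: str, body: str,
             online_peers: list["Client"]) -> None:
        # The body goes only to peers that are online; the vendor sees the ID.
        self.store[message_id] = body
        vendor.record(chat, message_id)
        for peer in online_peers:
            peer.store[message_id] = body

    def catch_up(self, vendor: Vendor, chat: str,
                 peers: list["Client"]) -> list[str]:
        """Fetch missing messages from peers; return IDs nobody could supply."""
        unresolved = []
        for message_id in vendor.list_ids(chat):
            if message_id in self.store:
                continue
            for peer in peers:
                if message_id in peer.store:
                    self.store[message_id] = peer.store[message_id]
                    break
            else:
                unresolved.append(message_id)
        return unresolved
```

The obvious weak spot shows up in `catch_up`: if every peer holding a message is offline or has dropped it, the ID is known but the body is gone.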

menaerus · 2024-09-29T14:26:09 1727619969

Such a vendor would have a hard time finding a business model, since plenty of chat services already exist on the market and all of them have access to the data of their users in one way or another. Thus I don't know what other type of leverage they would be able to pull off to sustain their business.

StableAlkyne · 2024-09-29T18:00:55 1727632855

You and your friends lost history, but the server owner never did :)

Intralexical · 2024-09-29T10:51:11 1727607071

> How did we let discord take over is a mystery to me, or rather a tragedy.

I think I'm reasonably technically competent, and I also dislike Discord's issues with privacy, data sovereignty, siloing information away from the open web, etc.

But you know what I think whenever I click a Matrix link, or IRC? I just don't want to deal with it. You get a list of apps you've never heard of, some of which may not be feature-complete, some with more than one version, some which are advertised using words like "GNOME", "Rust", "Qt5", and "C++" that have no meaning or relation to actually using them as a chat app, and all of which I guess are different and would need to be tried and learned separately. Then picking and clicking one tries to open an outside program which probably isn't installed and I don't want to install because I don't really know/care what it is. And if at that point, out of the dozen or so app options it showed you, you happened to choose one with a web version like Element, and you figure out you can click the "Continue in your browser" button out of the four or five unexplained buttons that pop up as a result ("XDG-Open", "Cancel", "FlatHub", "Download", and "Continue in Browser")— You get a static screen that shows just enough message history to not be useful, with a confusing UI you can't seem to interact with, hidden behind a login wall that still hasn't really explained what in the Internet tubes you're actually looking at.

E.G.: https://matrix.to/#/#invidious:matrix.org

If you try to Google "What is Matrix"— You get pages about math. So then you Google "What is Matrix chat". And all the results harp on using words like "open network", "decentralised", "protocol", "real-time communication", "open standard", "federated"— Which, again, may be technically interesting if you're into that, but doesn't actually have anything to do with how it directly serves the user as a chat app and how you can use it or sign up for it.

It takes way too many clicks, and you get bombarded with way too much information… To still not end up using the app, and in fact end up more confused than before about what a "Matrix" even is. Let's say you lose 15% of incoming users at each step. That rapidly scares off most of the mainstream, before they've even tried it. Maybe Matrix and Element are great. But it just seems like such an ordeal.
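For a sense of how quickly that compounds, here is the arithmetic as a small sketch. The 15% figure is just the guess above, and the step counts are arbitrary.

```python
def remaining_fraction(steps: int, loss_per_step: float = 0.15) -> float:
    """Fraction of users left after `steps` steps, losing `loss_per_step` at each."""
    return (1.0 - loss_per_step) ** steps


for steps in (1, 3, 5, 8):
    print(f"{steps} steps: {remaining_fraction(steps):.0%} of users remain")
```

Under that assumption, fewer than half of the people who clicked the link are still around after five steps.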

Compare that with Discord. You click a link. And then either you're already in the server, or it has a single text box and a single button you click to funnel you through making an account and joining the server.

It doesn't try to convince you to install a Desktop app until you're already fully using it in the web version. You get clear answers and reasons to use it if you search "What is Discord" or go to the website. It doesn't overwhelm you with options and then hound you with technical explainers that you didn't ask for.

IRC goes the other way in usability. People want voice chat, message history, different channels in the same "server", PM channels, etc.

/rant

weaksauce · 2024-09-29T17:43:33 1727631813

because the voice chat function is so leaps and bounds better than anything out there and it was primarily used for that to game in real time. the text was an afterthought for gamers.

EGreg · 2024-09-29T09:42:26 1727602946

I keep writing about this tragedy, but few people care. Even on HN:

https://cointelegraph.com/news/how-a-web-that-lost-its-way-c...

and

https://community.qbix.com/t/the-debate-about-end-to-end-enc...

philipwhiuk · 2024-09-29T13:12:13 1727615533

> Own this piece of crypto history

I would argue that the web lost its way as much with "web3" as with the platforms of web 2.

EGreg · 2024-09-29T17:26:29 1727630789

I didn’t write that.

You must be quoting an ad, and dismissing everything else

RadiozRadioz · 2024-09-29T17:24:04 1727630644

There are loads of comments exactly like OP's, and they always make the mistake of mentioning IRC alongside XMPP and Matrix. Inevitably repliers can't help themselves and spend their replies discussing IRC's unsuitability for modern IM and how it's not federated. When IRC is mentioned, commenters ignore XMPP and Matrix and attack the point in terms of IRC. (Though this thread in particular is better than average).

Matrix and XMPP are the far more appropriate competitors for Discord, we need to steer the conversation toward them. I deliberately never mention IRC when I make these types of comments so people don't latch onto it and ignore everything else I said.

lofaszvanitt · 2024-09-29T08:08:40 1727597320

Discord wrapped irc in shiny paper.

urza · 2024-09-30T11:19:46 1727695186

100% !! It's so sad :(

jimkoen · 2024-09-28T23:15:25 1727565325

My takeaway from this is maybe somewhat different from what the authors intended:

> The last one? Our friend, cassandra-messages. [...] To start with, it’s a big cluster. With trillions of messages and nearly 200 nodes, any migration was going to be an involved effort.

To me, that's a surprisingly small number of nodes for message storage, given the size of Discord. I had honestly expected a much more intricate architecture, engineered towards quick scalability, involving a lot more moving parts. I'm sure the complexity is higher than stated in the article, but it makes me wonder, given that I've been partially responsible for more than 200 physical nodes that did less, how much of modern cloud architecture is over-engineered.

romanhn · 2024-09-28T23:45:38 1727567138

They are talking about 177 database nodes, which is not an indicator of architecture complexity. I assume they have dozens/hundreds of services consisting of multiple highly available nodes each across various geographies.

Having seen a much smaller set of Cassandra nodes used to store billions (rather than trillions) of records, I can say that Cassandra was definitely a total PITA for on-call, and a cause of several major outages.

nicholasjarnold · 2024-09-28T23:34:59 1727566499

> ...how much of modern cloud architecture is over engineered.

I would wager a good majority of it is. The Stack Overflow architecture[0] sticks out to me in this regard as an example on the other end of the spectrum.

[0] https://news.ycombinator.com/item?id=34950843

hiyer · 2024-09-29T02:17:21 1727576241

Also bear in mind that they're now doing the same with just 72 nodes.

hiyer · 2024-09-29T02:16:12 1727576172

Very well-written article. I'm happy for them that part of the solution was switching from Cassandra to drop-in replacement Scylla, rather than having to deal with something entirely different.

dean2432 · 2024-09-29T08:23:37 1727598217

They make it literally impossible to delete your old messages. It's a privacy nightmare and I wonder why the EU hasn't stepped in.

Intralexical · 2024-09-29T13:29:36 1727616576

I do think there is a balance to be struck, because directed communication means the recipients of old messages are also stakeholders, such that maintaining a consistent record by default is a fundamental part of the "service" they offer. The message contents are different from e.g. secretly hoovering up click patterns. Matrix had some thoughts when they faced the same questions:

  The key question boils down to whether Matrix should be considered more like email (where people would be horrified if senders could erase their messages from your mail spool), or should it be considered more like Facebook (where people would be horrified if their posts were visible anywhere after they avail themselves of their right to erasure).

  Solving this requires making a judgement call, which we've approached from two directions: firstly, considering what the spirit of the GDPR is actually trying to achieve…

https://matrix.org/blog/2018/05/08/gdpr-compliance-in-matrix...

Xen9 · 2024-09-29T10:13:40 1727604820

In Discord culture, indeed, users usually share a shit-ton of PII in "introduction" messages from images to specific hobbies to medical information (EG "support" communities).

The problem from a GDPR perspective is that Discord makes it impossible to delete those, since once they detect your interest in trying to delete any of your accounts' data, they will try to "anonymize" it. Then at least publicly your username is disconnected from those messages, but they can still be traced back to specific persons. Now if this is also done server side, then they would be in a situation where you'd either have to go through a ton of messages or bulk delete everyone's past messages to enforce the GDPR demands of a user wanting their PII deleted.

The EU Parliament is not a real parliament in the sense that ONLY the Commission can propose new laws, and the elected parliament basically just votes on those. Who controls the Commission if not the people? The US State Department. Newsguard and non-Musk US bigtechs including Discord are in the same poli-financial bed of the establishment here. And they are full of previous State Department workers.*

Unless there is public outrage, the EU-level bodies at least will probably be owned. But Public opinion is controlled by the cyberpunk establishment that trains their LLMs & targets their campaign ads using that illegal Discord data to get political advantage.

You, in my view, ought to "worry" about the fact that it's possible there will sooner or later no longer be an escape from a permanent establishment, Orwell-style. Goes along with the theme that "cybersecurity" at the United States government level has been a "war against hate speech" for years, and of course "hate speech" meaning "censorship of internal and external enemy speech."

Budd Dwyer, if I recall correctly, shot himself on TV after writing to Biden (???) that under some conditions (that became true), the Department of Justice should have "Justice" removed from its name.

---

Most of this I hold only at 50+% confidence of being broadly correct. Take with lots of salt.

r3d0c · 2024-09-29T13:30:29 1727616629

incoherent babbling

Xen9 · 2024-09-30T02:50:37 1727664637

Reasons as to why I should believe that the comment or parts of it were "incoherent babbling"?

I did express a low confidence.

My information is limited. You ought to expect to feel my points being "incoherent babbling".

intelVISA · 2024-09-29T14:53:29 1727621609

Given the sheer size and extent of the user data collected and processed one imagines the EU is working on a big case... quietly.

robmccoll · 2024-09-29T11:43:18 1727610198

Cassandra is essentially an append-mostly distributed fault-tolerant hash table. If you need specifically that with high write throughput, it's a good choice. I don't understand why people use it as a database. You run into its limitations immediately and the pain of trying to use it like a database only gets worse with scale.

LeifCarrotson · 2024-09-29T15:19:46 1727623186

FTA:

> In Cassandra, reads are more expensive than writes.

This makes it insane as a message store for a chat server to me. It seems appropriate for a logging destination for a distributed system, one where you want lots of clients to dump data but most of the time you don't even need to audit the logs, so the number of reads for a given item is less than one. This is obviously not true for Discord messages.

atombender · 2024-09-30T21:01:34 1727730094

The sentence makes it sound like Cassandra and Scylla are slow for reads, which isn't the case at all. It's just that writes require a bit less I/O. Reads are still very fast. If reads were slow, nobody would use Cassandra and Scylla for the purposes that they're being used for.

menaerus · 2024-10-01T07:32:59 1727767979

Actually read performance is one of the main challenges in LSM based storages.

Squeeeez · 2024-09-29T15:48:04 1727624884

Not too sure - I would have guessed that most of the messages are written once, read by the constant number of participants (say 1-100 or so) and then they disappear off the screen and are never accessed again, ever. Maybe a few people will scroll or search, or use some custom extension to load and export the history, but very rarely.

mianos · 2024-09-29T12:41:56 1727613716

All the Cassandra documentation and the website say it is a database. You can't blame anyone for getting confused. In my experience, I have never seen a project that started to use it continue to use it after a year or so. It may take a year to run into its limitations before having to replace it with a database like Postgres.

PaulHoule · 2024-09-29T13:51:00 1727617860

How is it they just can’t shard the thing? Isn’t each Discord ‘server’ isolated from the others (can’t send a message from one to the other)? Why can’t they address trillions of messages by having thousands of shards that each handle billions?

DylanSp · 2024-10-01T13:54:32 1727790872

The partition key included the channel ID, and they were still having problems with hot partitions even with that fine-grained sharding.
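A minimal sketch of why a channel-based key still leaves hot partitions, assuming a key of (channel ID, time bucket). The epoch and the 10-day bucket width below are assumptions picked for illustration, and the function names are made up.

```python
EPOCH_MS = 1_420_070_400_000           # assumed custom epoch: 2015-01-01 UTC, in ms
BUCKET_MS = 10 * 24 * 60 * 60 * 1000   # assumed bucket width: 10 days, in ms


def bucket_for_timestamp(unix_ms: int) -> int:
    """Index of the fixed-width time window that a timestamp falls into."""
    return (unix_ms - EPOCH_MS) // BUCKET_MS


def partition_key(channel_id: int, unix_ms: int) -> tuple[int, int]:
    """Hypothetical partition key: one partition per channel per time window."""
    return (channel_id, bucket_for_timestamp(unix_ms))
```

Every message sent to one busy channel inside the same window maps to the same key, so the load for that channel lands on a single partition no matter how many shards the cluster has.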

hun3 · 2024-09-29T14:00:51 1727618451

Last time I checked the Discord bot API, it had explicit provisions for sharding.

codexon · 2024-09-28T23:11:26 1727565086

> The ScyllaDB team prioritized improvements and implemented performant reverse queries, removing the last database blocker in our migration plan.

I wonder how much they paid ScyllaDB to do this before even using ScyllaDB.

jsnell · 2024-09-28T23:17:33 1727565453

The article says they were using ScyllaDB for everything except the message store two years before they did the migration for messages.

molszanski · 2024-10-08T14:36:31 1728398191

> In an afternoon, we extended our data service library to perform large-scale data migrations. It reads token ranges from a database, checkpoints them locally via SQLite, and then firehoses them into ScyllaDB. We hook up our new and improved migrator and get a new estimate: nine days!

How many machines was this migrator running on? One? :D Sounds absurdly amazing!
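For anyone curious what "read token ranges, checkpoint in SQLite, write to the new cluster" could look like, here is a rough single-process sketch. It is not Discord's code: `split_token_ring`, `migrate`, and the `copy_range` callback are hypothetical, and the signed 64-bit token space is an assumption.

```python
import sqlite3
from typing import Callable

TokenRange = tuple[int, int]


def split_token_ring(parts: int, lo: int = -2**63,
                     hi: int = 2**63 - 1) -> list[TokenRange]:
    """Split the inclusive token space [lo, hi] into `parts` contiguous ranges."""
    step = (hi - lo + 1) // parts
    ranges = []
    start = lo
    for i in range(parts):
        end = hi if i == parts - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def migrate(ranges: list[TokenRange],
            copy_range: Callable[[int, int], None],
            checkpoint_db: sqlite3.Connection) -> int:
    """Copy every range not yet checkpointed; return how many were copied."""
    checkpoint_db.execute(
        "CREATE TABLE IF NOT EXISTS done "
        "(lo INTEGER, hi INTEGER, PRIMARY KEY (lo, hi))"
    )
    copied = 0
    for lo, hi in ranges:
        already = checkpoint_db.execute(
            "SELECT 1 FROM done WHERE lo = ? AND hi = ?", (lo, hi)
        ).fetchone()
        if already:
            continue  # finished in an earlier run, skip on restart
        copy_range(lo, hi)  # stand-in for: read range from source, write to target
        checkpoint_db.execute("INSERT INTO done VALUES (?, ?)", (lo, hi))
        checkpoint_db.commit()
        copied += 1
    return copied
```

The checkpoint is written only after a range is copied, so a crash means at most one range gets redone on restart.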

tcfhgj · 2024-09-28T23:16:34 1727565394

Storing is one thing. Performing data mining on them is another

philipwhiuk · 2024-09-29T13:13:30 1727615610

That's a separate problem with hugely different latency concerns, likely done on a separate copy.

CamperBob2 · 2024-09-28T23:20:26 1727565626

Also, people need to keep in mind that those trillions of messages are archived nowhere. Thanks to the walled gardens we're obsessed with building, far-future anthropologists will know more about Pompeii and Machu Picchu than San Francisco.

squigz · 2024-09-29T06:15:45 1727590545

Firstly, no they won't. That's silly.

Secondly, how would such an archive work? Who would pay for it? How would it be safeguarded in such a way that it can be read by 'far future anthropologists' but not the people paying for the storage?

geysersam · 2024-09-29T11:39:20 1727609960

If we're only talking about public chat rooms, it shouldn't be difficult to archive the content of those.

There are open repositories of the entire internet text content (common crawl). These scrapes are periodically repeated. That's orders of magnitude more data than all discord messages ever.

So technically it's not a problem making such an archive. The financing is of course always an issue, but not because the costs are large.

xboxnolifes · 2024-09-28T23:48:35 1727567315

I don't think every single individual message ever needs to be archived. Every text, every email, every post-it, every poke, every emoji, every reaction GIF...

ktosobcy · 2024-09-29T06:18:20 1727590700

Well, considering the annoying push for "let's resolve the issue on discord", it's very annoying. With things like GitHub issues you can search for a problem and find a solution. Even ancient mailing lists most of the time have archives. Not so much with all those fancy "realtime" :/

klabb3 · 2024-09-29T10:00:00 1727604000

I agree with the sentiment but GitHub issues is not a good replacement. First, it’s also owned by a corporation and is available on the open web today because they let us (is it even scrape/api available today? Can people build tooling on top?). Anyway, this “openness” can easily be changed once the “value extraction knob” is turned.

Secondly, GitHub is a developer platform, not a user/enjoyer platform. Issue reports are high-barrier even for devs. People get upset if you’re asking a random question, don’t check for duplicates, etc. Some people even get upset about issues without a PR.

Again, I’m all for good open alternatives but when HN is like “you just configure Gentoo and type 30 commands” we don’t stand a chance to actually win users over, gotta accept reality before we can improve it…

ktosobcy · 2024-09-30T16:30:28 1727713828

GH was only an example of something quite common and searchable. It could be codeberg.org or similar.

famahar · 2024-09-28T23:59:14 1727567954

Definitely not everything, but it's still wild to me that so many products and services have all their troubleshooting and customer support in a discord server.

proteal · 2024-09-29T06:28:30 1727591310

It makes sense to me. The number of people who actually create useful open source software is so vanishingly small compared to the number of people who use OSS, it seems obvious that we should optimize for their time, not the other way around. I agree with you that using mailing lists or GitHub issues or whatnot would be globally more efficient, but if I’m working on a product, I’m going to work in the way that is most efficient for my time. I owe my “customers” nothing because they are not paying for my work. We keep seeing Discord as a means to communicate about products because devs see it as the best use of their time. The fact that so many people use it should be an indictment of the alternatives, not the devs who choose to use Discord.

foobazgt · 2024-09-29T00:13:04 1727568784

Sadly, I can understand why Discord doesn't have a lot of incentive to do this. Maybe the community should popularize an open-source, free/low-cost bot and hosting solution for exported chat? (I couldn't find one in a few minutes of searching).

tbrockman · 2024-09-29T06:08:01 1727590081

Here ya go: https://github.com/AnswerOverflow/AnswerOverflow

ekianjo · 2024-09-29T04:44:36 1727585076

Even FOSS communities. shame on the devs who decide to do so.

Kiro · 2024-09-29T06:57:18 1727593038

It used to be IRC channels on Freenode and I didn't see anyone complaining back then.

CamperBob2 · 2024-09-29T14:54:33 1727621673

That's the thing. No one ever complains at the time.

squigz · 2024-09-29T06:12:55 1727590375

Why do you and GP think so many FOSS projects choose to use Discord like this?

daedrdev · 2024-09-29T05:27:51 1727587671

For many people the fact that discord is not easily discoverable is a benefit, just like in many other messaging services

dang · 2024-09-29T04:16:50 1727583410

Discussed (a bit) at the time:

How Discord Stores Trillions of Messages - https://news.ycombinator.com/item?id=35048410 - March 2023 (10 comments)

bofaGuy · 2024-09-29T13:07:26 1727615246

I’m lost as to why a DB (Cassandra) with better write performance than read performance was ever selected for a messaging system. I feel like it’s obvious that a message will be read more than it is written (once).

remram · 2024-09-29T15:05:01 1727622301

The fact that it has better write speed than read speed doesn't mean that it has bad read speed. It just happens to have even better write speed.

It's like how I connect my phone to my home's cable connection to send a big file. It is better at downloading than uploading, but that doesn't mean it's not the best solution for uploading.

SpikeMeister · 2024-09-29T13:27:18 1727616438

While it’s true that messages are read more, reading can be cached so not every read necessarily results in a DB call.

axelthegerman · 2024-09-29T14:36:30 1727620590

Which seems something they added recently but was not part of the original design of using Cassandra

cynicalpeace · 2024-09-29T00:41:28 1727570488

Is there a fundamental reason you wouldn't use postgres for something like this? Scale certainly wouldn't be it.

ericvolp12 · 2024-09-29T04:46:03 1727585163

ScyllaDB scales horizontally on a shard-per-core architecture with a ballpark throughput of 12,500 reads and 12,500 writes per second per shard. If you're running Scylla across a total of 64 cores (maybe on 4 VMs with 16 vCPUs each), you can get up to 800k reads and 800k writes per second of throughput, with p99 writes of <500us and p99 reads of <2ms.

You will not be able to get that performance out of Postgres and the write scaling will also be impossible on a non-sharded DB.

If you're a company like Discord and are running dozens (70-something?) of ScyllaDB nodes, likely each with 32 or 64 vCPUs, you've got capacity for 50M+ reads/writes per second across the cluster assuming your read/write workloads are evenly balanced across shards.
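The back-of-the-envelope math, as a sketch. The 12,500 per-shard figure is the ballpark above; the 72-node count is taken from elsewhere in this thread, and the vCPU counts are guesses.

```python
PER_SHARD_OPS = 12_500  # ballpark reads (or writes) per second per shard


def cluster_ops_per_sec(nodes: int, vcpus_per_node: int,
                        per_shard: int = PER_SHARD_OPS) -> int:
    """Ideal throughput for one operation type, assuming one shard per vCPU
    and a perfectly even spread of load across shards."""
    return nodes * vcpus_per_node * per_shard


print(cluster_ops_per_sec(4, 16))    # 64 cores total
print(cluster_ops_per_sec(72, 32))   # assumed 72 nodes with 32 vCPUs each
print(cluster_ops_per_sec(72, 64))   # assumed 72 nodes with 64 vCPUs each
```

With these inputs the 64-core case comes out at 800,000, and the 72-node cases at 28.8M and 57.6M, so the 50M+ figure needs the 64-vCPU assumption. Hot partitions and larger rows would pull real numbers well below this ideal.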

jhgg · 2024-09-29T06:42:53 1727592173

Fwiw the benchmarked numbers are for writing very small rows. When doing the messages migration, with no read traffic, and the cluster/compaction settings tuned for writes we only managed approx 3m inserts/sec while fully saturating the Scylla cluster.

ericvolp12 · 2024-09-29T23:52:53 1727653973

Interesting, we've got to 5M+ reads/sec in realistic simulated benchmarks and ~2M reads/sec of real-world-throughput on our clusters that are <10 nodes (though really high density). I don't think I've pushed writes beyond 1M QPS in real-world or simulated loads yet though. Thankfully our partitioning schemes are super well distributed though and our rows are very small (generally 1-5k) so I don't think we'd have a problem hitting some big numbers.

menaerus · 2024-09-29T14:06:53 1727618813

How about per-node memory pressure, did it change in favor of Scylla? I ask because I would legitimately expect that GC-based system would have a larger pressure on the memory subsystem.

jhgg · 2024-09-29T16:38:44 1727627924

Scylla just eats all the ram it can with cache. So it's hard to say really. On Cassandra we allocated half the ram to the JVM which it gladly used up and left the other half to the OS for disk cache. On Scylla, since it uses direct io, there is no need for OS disk cache.

riku_iki · 2024-09-29T17:18:47 1727630327

> You will not be able to get that performance out of Postgres

if writes are batched, I get this and higher performance from postgres. If 800k on 64 cores is Scylla's best result, it is not that impressive.

But also you probably mean writes/reads to indexed table, then it is another story.

ryanjshaw · 2024-09-29T06:22:35 1727590955

Okay but this is where I get confused. Why does Discord need a single database system when discord servers are independent, right?

And the volume of traffic per Discord server must be human-processable or what would the point be? A Discord server doing 800k writes per second makes no sense.

So why not a RDBMS per Discord server, and if you want to ship all that out to a warehouse for analytics you do that as a separate problem?

Or is it that spinning up a Postgres instance per Discord server ends up being significantly more expensive than these mega distributed database systems?

jhgg · 2024-09-29T06:45:41 1727592341

There are, ballpark, a few hundred million Discord servers... do you really want to run that many Postgres instances? And even so, what do you do about DMs/GDMs? Easier to just run one big mega cluster for messages.

ryanjshaw · 2024-09-29T06:59:08 1727593148

Okay so the latter then - economies of scale. Surprised to hear that few hundred million figure - I thought it'd be 1/10th of that at most! Wow.

Although I did expect there'd be a very long tail, and you might choose to host a bunch of servers on a single RDBMS, at that scale yeah it wouldn't solve much.

Thanks for coming back to me, appreciate it.

Drew_ · 2024-09-29T16:21:30 1727626890

Apple kind of does something like this with iCloud however their per user "databases" are only virtual:

https://news.ycombinator.com/item?id=39028672

justnoise · 2024-09-29T01:38:33 1727573913

I'd guess that Discord's storage systems lean towards processing a lot more writes than reads. Postgres and other databases that use B-tree indexing are ideally suited for read heavy workloads. LSM based databases like Cassandra/Scylla are designed for write intensive workloads and have very good horizontal scaling properties built into the system.
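A toy model of the LSM side of that trade-off, purely for illustration. `ToyLSM` is made up and is nothing like the real Cassandra or Scylla storage engines (no bloom filters, no sorted files, no commit log), but it shows the shape: writes touch only the in-memory table, while a read may have to probe several flushed tables until compaction merges them.

```python
TOMBSTONE = object()  # marker written in place of a value on delete


class ToyLSM:
    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}
        self.sstables = []  # flushed immutable tables, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write is one in-memory update, plus an occasional flush.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # Deletes are writes too: they append a tombstone.
        self.put(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def get(self, key):
        """Return (value or None, number of tables probed)."""
        probes = 1
        if key in self.memtable:
            return self._visible(self.memtable[key]), probes
        for table in reversed(self.sstables):  # newest first
            probes += 1
            if key in table:
                return self._visible(table[key]), probes
        return None, probes

    @staticmethod
    def _visible(value):
        return None if value is TOMBSTONE else value

    def compact(self):
        # Merge all flushed tables (newer values win) and drop tombstones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [{k: v for k, v in merged.items() if v is not TOMBSTONE}]
```

In this toy, the older a key is, the more tables a read has to walk through, and deleted keys linger as tombstones until `compact()` runs. That is a rough picture of both the read cost and the tombstone problem discussed upthread.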

Aeolun · 2024-09-29T03:45:54 1727581554

Would you actually have more writes than reads? Are messages read by fewer people than post them?

sadeshmukh · 2024-09-29T04:06:18 1727582778

When you send a message, afaik it sends to all people looking at it at the time. So there is no read when in a conversation, and maybe the reads are batched when reading multiple.

jhgg · 2024-09-29T06:55:30 1727592930

Read traffic is much higher than write traffic due to mobile clients needing to sync chat history more often as their sessions are much shorter lived. Also search queries execute 1 query per result. And don't forget people doing GDPR data dump requests. It adds up.

cowthulhu · 2024-09-29T00:47:04 1727570824

I’m not sure if Postgres would have enough horizontal scaling to accommodate the insane volume of reads and writes. I would be super interested to be proven wrong though… anyone know of a cluster being run at that scale?

riku_iki · 2024-09-29T01:18:02 1727572682

> Scale certainly wouldn't be it.

vanilla postgres can't scale to such size, you need some sharding solution on top, which likely will be much harder to maintain than ScyllaDB..

pavel_lishin · 2024-09-28T23:01:28 1727564488

Anyone else reading this and being quite happy that they're not working at this scale?

wavemode · 2024-09-29T01:00:04 1727571604

I don't mind scale. I mind the bureaucracy and promotion-driven-development that comes with working in a bloated engineering org.

pm90 · 2024-09-29T01:05:30 1727571930

+100

Many companies have products that operate at “scale”. They manage to do so with pretty boring techniques (sharding, autoscaling) and technologies (postgres, cloud storage).

Because of the insane blog-driven tech culture, many of these teams get questioned by clueless leadership (who read these blogs) asking why the company isn’t using Cassandra / some other hot technology. And it always causes much consternation and wastage.

rnts08 · 2024-09-29T09:22:11 1727601731

Anyone wanting to introduce $new/$other language, database, library, deployment system, build system into a large enough system that doesn't solve any actual problem is a nightmare for someone working at this scale.

I don't mind the scale, I like it. I don't like having to fend off questions and complaints why we aren't deploying the latest shiny new thing in our core this week.

secondcoming · 2024-09-29T16:21:39 1727626899

Well we use Cassandra (actually ScyllaDB) because Redis no longer cut it.

Twirrim · 2024-09-29T00:32:40 1727569960

But that's where the really fun and complicated problems are. The ones that really make you stop and think, and not just think, but be creative.

95% of the work is still the same "treading in well trod paths", same old same old tech work, but that 5% is really something.

Olreich · 2024-09-29T12:46:04 1727613964

This was a “double-pump” migration to a faster database and building a caching service. There’s nothing particularly fancy or creative about their solutions. The migration efforts and working out issues with the reverse table scan were probably way more creative, but they didn’t get into that unfortunately.

pavel_lishin · 2024-09-29T12:41:01 1727613661

I think I can understand the appeal, but it's just not there for me. I have enough complicated problems outside of work, some of which are even fun to solve.

twelve40 · 2024-09-29T12:39:40 1727613580

I'm happy I'm currently not working at this scale. I'm not happy when idiots (including one of our self-important ex-Google VP's) set this as a benchmark for backend interviews (for careers that 99% likely will never come close to such problems).

mystified5016 · 2024-09-28T23:06:28 1727564788

Any time I read anything about any web-adjacent technology I'm incredibly thankful that I don't work anywhere near that industry.

Embedded can be complex, but web stuff is just a Lovecraftian nightmare in comparison

milesvp · 2024-09-29T00:14:11 1727568851

I have stared into the abyss and seen the eyes of Cthulhu. I am much happier writing embedded drivers than I was trying to make sense of why previous devs thought it was a good idea to move bounded, tunable server-side API calls to the client, allowing it to effectively write arbitrary SQL calls across multiple databases.

bdcravens · 2024-09-29T01:52:42 1727574762

Fortunately the web is starting (very slowly) to return to sanity, pushing back towards the simpler server-rendered pattern with Javascript being relegated to specific use cases.

Aeolun · 2024-09-29T03:00:31 1727578831

I really like the client rendered UI part. It’s a lot more efficient than sending the whole page again every time.

bdcravens · 2024-09-29T03:40:27 1727581227

Which is precisely what is meant by specific use cases. We don't have to throw out the first 25 years of the web and reimplement all of our business logic in a minified JS blob. Even when client side code is necessary, the trend of pushing rendered HTML rather than JSON that must be parsed and rendered keeps us as close to browser primitives as possible.

Aeolun · 2024-09-29T04:24:07 1727583847

Why would you implement the business logic there? You can still keep (most of) that in the backend.

The client just does orchestration.

bdcravens · 2024-09-29T15:44:31 1727624671

Once you move beyond basic CRUD, business requirements work their way into the UI. For instance, making fields read-only based on access level, adding additional form fields, or conditionally hiding and showing entire portions of the UI. All of which require you to either pass around UI directives in your data or implement business logic in your client code. Better to just ship HTML, and if we're worried about full page loads, just use one of the many over-the-wire options to only change small bits of a page.

This is before we get into having to implement application primitives like authentication on the client, and all of the state management that goes with. The absolute amount of scaffolding and plumbing we've built up just to save a few ms is always worth questioning. Doesn't mean the answer is no, just that we need to ask the question and not assume the default is carved in stone.

gonzo41 · 2024-09-29T03:08:12 1727579292

But you can cache the whole server side page and the cost is once. Whereas if you have the client side do the render then every client wears the cost.

Aeolun · 2024-09-29T04:24:56 1727583896

That’s your generation that happens once. The browser still needs to render it. Sure, rendering it on the client may cost the client a bit more, but the client generally has the computational power to spare.

bdcravens · 2024-09-29T03:44:14 1727581454

Which becomes a far more important issue when dealing with bandwidth or CPU constrained devices, or artificially imposed constraints due to data usage costs.

asynchronous · 2024-09-29T03:38:04 1727581084

We can also cache some of the dynamic JavaScript, depending on the scenario but your point stands.

iknowstuff · 2024-09-29T03:17:29 1727579849

You usually can't because of users who are signed in needing slightly different pages etc.

bdcravens · 2024-09-29T03:33:31 1727580811

While not as fast as a purely client cached page, the server can selectively cache content, even when some bits of the page are dynamic.

qudat · 2024-09-29T00:21:03 1727569263

Iteration speed is significantly faster on the client. Perf is an afterthought, for better or worse.

swyx · 2024-09-29T01:53:24 1727574804

spoken like someone who doesnt deploy clients at discord scale?

the 200 backend nodes surely update significantly faster than the hundreds of millions of clients.

artursapek · 2024-09-29T01:14:20 1727572460

Sounds like a fun time lol

est · 2024-09-29T02:29:37 1727576977

I am happy that I don't have to deal with this.

I am sad that my business isn't at this scale.

Aeolun · 2024-09-29T02:51:02 1727578262

Honestly, 77 nodes doesn’t sound like a terrific scale? The more I scale things up, the more I realize that the tone of the problems doesn’t really change. You just get more layers to your data structures.

m-hodges · 2024-09-29T02:38:49 1727577529

Fun article. Also fun to think about how many people have decided to document their crimes in these Cassandra nodes.

7bit · 2024-09-29T07:26:02 1727594762

The blog post shows how great the technical expertise is at Discord. I work in IT and in my company devs are so incompetent, they don't even know how to create an M365/Azure dev tenant and constantly request *.ReadWrite.All on our production tenant. I'm so envious!

On the other hand, the HOME/END keys jump to the beginning of the input field rather than the line and the frontend devs are unable to fix this non-default behaviour for years, which makes it a fucking pain in the ass to use the Posts feature within a Discord channel. I believe the budget for the backend geniuses meant that frontend had to be juniors only.

crop_rotation · 2024-09-29T17:08:35 1727629715

Hiring well is probably the most important thing for a company and also one of the hardest problems. I have seen a team of competent engineers outperform their sibling teams by 5-10x as long as each member of the team is good enough. Just 2 bad hires will slow down a team drastically. One terrible hire can do -5x the work of a normal engineer.

fastball · 2024-09-29T07:26:55 1727594815

In their defense, Azure is terrible.

7bit · 2024-09-30T17:36:38 1727717798

I haven't found a difference to AWS, for example. They are all terrible in their own ways. But if one or the other is what you earn your money with, then at least put in the effort to be proficient with it, and not a complete dumbass. (Not you as in "you!")

andrewstuart · 2024-09-29T06:48:51 1727592531

When you get to scale like this, I wonder if the access patterns of the application and its data might be best served by a custom data retrieval and storage application.

I may be wrong but I just wonder if efficiency is lost to the generalized nature of any data storage system.

The other question that comes to mind is, to what extent have the developers made a systematic effort to optimize how data is stored and retrieved? If you’re building a gigantic back end system and simply accepting that the system load is what it is then you might be missing a chance to dramatically impact the size of the task of managing that data.

lyu07282 · 2024-09-29T10:35:07 1727606107

They did give one example, if someone does a @everyone in a big channel, they specifically optimized their architecture to make that efficient using their custom data services.

znpy · 2024-09-29T10:40:28 1727606428

Interesting read on one hand, a bit disappointing on the other: when the solution is just "we moved to this other product" it smells of a lack of serious and rigorous investigation.

Also, having worked with the JVM and with GC issues, I don't buy the "GC problems" point: there are a number of improvements in recent JVM releases, the main one being ZGC (and generational ZGC in particular).

ZGC is great, I've personally witnessed sub-millisecond GC pauses (and i mean sub-millisecond stop-the-world pauses) on machines serving millions of requests per second. Garbage Collection is largely a solved problem in the industry as of today, thanks to ZGC.

Other than this, comparing latencies for machines with 9TB disks against machines with 4TB disks is a bit like comparing apples and oranges: we will never know if issues at the storage layer were affecting tail latencies. Were the nodes having, I don't know, filesystem fragmentation issues? Does the 9TB storage configuration deliver higher IOPS than the previous 4TB storage configuration? Is it the same kind of hardware underneath (same disk type? same disk bus? or are we talking SSD vs NVMe)?

As somebody that's been doing performance engineering for work, this piece is a bit appalling.

Glad to see they've solved their issue though!

ozgrakkurt · 2024-09-29T14:43:17 1727620997

GC is a problem, and it always will be at some level. You can improve it but that doesn’t mean it is not a problem. Memory allocation and management is a problem even in C/C++ programs if you want to optimize your program; there is no universe where GC is not a problem.

tonetegeatinst · 2024-09-28T23:10:32 1727565032

My love of embedded stuff is growing. I'm teaching myself C and assembly to get better at low-level programming and interactions with hardware, but it all seems much simpler than the big data systems. Granted, I'm sure it can all be broken down into steps and issues to solve like any programming issue, but I'm happy focusing on low-level stuff for now.

airocker · 2024-09-29T16:17:14 1727626634

Just wondering if anyone considered using Postgres or another relational DB. I understand it won’t do multi-master replication as well, but it is much more stable and predictable if you give it the right amount of traffic. I guess the team had to do that part anyways for ScyllaDB.

crop_rotation · 2024-09-29T16:59:42 1727629182

I don't think anyone runs Postgres at that scale (unless with a very specialized sharding setup). Given the choice between using ScyllaDB like everyone else and using Postgres in a super specialized best-in-the-world setup, the choice becomes clear. Also keep in mind that Discord is not a huge, super profitable company, so for them to develop something like Vitess for Postgres would not make sense. For a small company with huge data like Discord, using existing data solutions makes a lot more sense.

airocker · 2024-09-29T18:12:47 1727633567

They could use vitess, citus or alloydb. They could use read replicas for read operations and single master in a shard for write. They would get many SQL features (upgrades, referential integrity etc) for free. It would allow them to extend their business logic considerably.
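
A minimal sketch of the "read replicas for reads, single primary per shard for writes" routing described above. Everything here is hypothetical (the `Shard` and `Router` names, the host strings, hashing on a channel ID); it is not how Vitess, Citus, or AlloyDB route queries, just an illustration of the idea.

```python
import hashlib
import itertools


class Shard:
    """One shard: a single writable primary plus read-only replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        # Round-robin over replicas; fall back to the primary if there are none.
        self._next_reader = itertools.cycle(self.replicas or [self.primary])

    def read_target(self):
        return next(self._next_reader)


class Router:
    """Pick a shard by hashing the channel ID, then pick a host by operation."""

    def __init__(self, shards):
        self.shards = list(shards)

    def shard_for(self, channel_id):
        # A stable hash, so the same channel always lands on the same shard.
        digest = hashlib.sha256(str(channel_id).encode()).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def target(self, channel_id, write):
        shard = self.shard_for(channel_id)
        return shard.primary if write else shard.read_target()


router = Router([
    Shard("pg-0-primary", ["pg-0-replica-a", "pg-0-replica-b"]),
    Shard("pg-1-primary", ["pg-1-replica-a", "pg-1-replica-b"]),
])
```

Note that a plain modulo over the shard count means adding a shard remaps most channels, which is one reason real sharding layers are harder to operate than this sketch suggests.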

gigatexal · 2024-09-29T08:51:58 1727599918

What a fun write up and a huge confidence building post for me in ScyllaDB.

yas_hmaheshwari · 2024-09-29T11:31:57 1727609517

Does this article imply that you shouldn't use Cassandra, and should use ScyllaDB whenever you think you want Cassandra?

crakhamster01 · 2024-09-29T18:06:36 1727633196

Interesting technical read, but I appreciated the lighthearted jokes/comments the author threw in as well. Felt like they struck the right balance - nice work!

KaoruAoiShiho · 2024-09-29T01:30:36 1727573436

Did they go with ScyllaDB just because it was compatible with Cassandra? Would it make sense to use a totally different solution altogether if they hadn't started with that?

jhgg · 2024-09-29T01:49:34 1727574574

Yes, we wanted to migrate all our data stores away from Cassandra due to stability and performance issues. Moving to something that didn't have those issues (or at least had a different set of less severe issues) while also not having to rewrite a bunch of code was a positive.

ericvolp12 · 2024-09-29T04:51:27 1727585487

Did you guys end up redesigning the partitioning scheme to fit within Scylla's recommended partition sizes? I assume the tombstone issue didn't disappear with a move to Scylla but incremental compaction and/or STCS might have helped a bunch?

jhgg · 2024-09-29T06:51:12 1727592672

Nope. Didn't change the schema, mainly added read coalescing and used ICS. I think the big thing is when Scylla is processing a bunch of tombstones it's able to do so in a way that doesn't choke the whole server. Latest Scylla version also can send back partial/empty pages to the client to limit the amount of work per query that is run.
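
For readers unfamiliar with read coalescing, here is a rough sketch of the general idea: concurrent reads for the same key share one in-flight database query instead of each issuing their own. This is an illustration only, not Discord's implementation; `Coalescer`, `fake_db_read`, and the channel key are made-up names.

```python
import asyncio


class Coalescer:
    """Collapse concurrent reads for the same key into one backend query."""

    def __init__(self, fetch):
        self._fetch = fetch      # async function: key -> value (assumed)
        self._in_flight = {}     # key -> asyncio.Task for the pending query

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the real query.
            task = asyncio.ensure_future(self._run(key))
            self._in_flight[key] = task
        # shield() keeps one cancelled caller from cancelling the shared query.
        return await asyncio.shield(task)

    async def _run(self, key):
        try:
            return await self._fetch(key)
        finally:
            # Forget the entry so later reads go to the backend again.
            self._in_flight.pop(key, None)


async def demo():
    calls = []

    async def fake_db_read(channel_id):
        calls.append(channel_id)      # record each real backend hit
        await asyncio.sleep(0.01)     # simulate query latency
        return f"messages for {channel_id}"

    coalescer = Coalescer(fake_db_read)
    results = await asyncio.gather(
        *(coalescer.get("channel-1") for _ in range(100))
    )
    return results, calls
```

Running `asyncio.run(demo())` issues 100 concurrent reads for one channel while the stand-in backend is queried only once. This is not a cache: once the query completes, the next read goes to the backend again.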