
> I asked them to create a library (...) I showed them a two-line solution with a single function (...) and nobody liked it. I was ridiculed right there, for not solving the problem "properly".

And did you solve the problem "properly"?

Odds are you did not.

I mean, it's terribly easy to come up with all sorts of two-liners if you just ignore all requirements and constraints, and don't pay attention to usecases.

This is a major problem plaguing software development. Plenty of people act like they have to carry the burden of being the only competent and smart individual in a sea of fools, and proceed to criticize everything that everyone around them does, for the sin of not doing something that exactly matches their personal opinions and tastes.




This is just another potential problem of "overengineered" libraries: they often try to offer a general solution to a whole set of somewhat related problems, but each individual user may just need one specific problem solved, and the correct solution for this one problem may indeed just be a very simple "two-liner".

The best examples are C++ stdlib classes like std::vector. Sometimes you just need a trivial growable array implemented in a few dozen lines of code, but std::vector pulls in roughly 20kloc of code into each compilation unit, and only if you're very lucky the compiler will condense this down to the same few dozen lines of actually needed code (the fabled "zero cost abstraction").

IMHO OOP languages are more prone to this sort of code bloat problem because they require (or at least "expect") to write a lot of "ceremony code" which is entirely unrelated to the actual problem (such as in C++: constructors, destructors, copy-, move-operators, etc etc... and the actually important "two-liner" problem-solving method is completely buried).


> The best examples are C++ stdlib classes like std::vector. Sometimes you just need a trivial growable array implemented in a few dozen lines of code, but std::vector pulls in roughly 20kloc of code into each compilation unit, and only if you're very lucky the compiler will condense this down to the same few dozen lines of actually needed code (the fabled "zero cost abstraction").

This is actually a very poor and ill-thought-out example. C++'s standard template library is expected to provide generic components that are flexible enough to meet practically all conceivable usecases, so that they meet anyone and everyone's needs.

This, obviously, means it supports way more usecases than the naive implementation you can whip out with "a few dozen lines of code".

There are very good reasons why everyone just picks up std::vector over any other implementation, and only extremely rare edge cases (like stuff that sits in the hot path of some games) ever justify using something exotic.


> to meet practically all conceivable usecases

...and that is exactly the fallacy of the C++ stdlib design philosophy: one-size-fits-all classes like std::vector should either be split into several much more specialized classes, or their runtime behaviour should be much more configurable (ideally both) - in any case there's no justification for pulling in such an enormous amount of code into each compilation unit.


> in any case there's no justification for pulling in such an enormous amount of code into each compilation unit.

Can you specify what exactly your problem with that is? So far you've handwaved at "I don't trust the optimizer to produce good code", but loc is not inherently tied to how many user-facing abstraction layers there are. In fact, you'll find that there's maybe one additional layer of data abstraction in the actual std::vector itself. There's not much to cut through for the optimizer, and in my experience it has zero trouble doing so.

In terms of compilation speed, you pay the frontend cost for those 20 kloc exactly once if you use precompiled headers.

> should either be split into several much more specialized classes

Could you elaborate what "specialized" (or runtime-configurable) versions of std::vector you are thinking of? Just the fact that some people care about exception guarantees, and some people care about move construction, and some people care about custom allocators, and some people care about emplace semantics, and that not all of these are the same people, doesn't mean that it's inherently good to have separate classes for these aspects.


It's rather trivial to write your own specialised form of array management, that is both easier to read & debug & also produces less bloat. I've worked at several places where std:: is verboten.

Only a few of the people coming onboard never got over the shock of such a rule; for the most part, all developers embraced the different paradigm. Teams interoperated faster, iterated changes faster, & there were fewer issues with inter-department libraries. Bugs were far fewer.

It really gives you freedom to have complete ownership of your code, rather than 100% relying on boilerplate libraries. I now see std:: as a disempowering limitation to modern development.

Of course, YMMV.

It also makes it easier to work on other platforms, like embedded, where resources are at a premium. Writing your own vectors & hashmaps then becomes part of your muscle memory & it's no longer daunting to write a custom allocator.


> Teams interoperated faster, iterated changes faster, & there were fewer issues with inter-department libraries. Bugs were far fewer.

It's hard for me to come up with a scenario where even a single one of these follows from not using the STL. How do you "iterate changes faster" when you re-invent vectors and hashmaps so much that it becomes muscle memory? How does each team writing their own containers mean "teams interoperated faster"? How does the STL cause "issues with inter-department libraries", and what bugs were caused by using standard library containers? At which point did you ever have to debug a std::vector?

I'm genuinely interested if you can retell what problems you ran into there. Maybe my creativity is limited, but I'm drawing blanks.

(To be clear, at the scale of Facebook or Google, writing folly or abseil can easily pay off because you can integrate, say, your profiling or debugging tooling more tightly. But that doesn't appear to be what you're alluding to. I'll also concede resource management on embedded devices.)


Good thing about standard libraries is that I don't have to debug them, because they just work.


> Can you specify what exactly your problem with that is?

Compile times mainly. This quickly adds up in a C++ project using the stdlib and gets worse with each new C++ version, it's almost at a point now where each stdlib header includes everything else from the stdlib.

> Could you elaborate what "specialized" (or runtime-configurable) versions of std::vector you are thinking of?

First and foremost more control over growth (e.g. when growing is triggered, by how much the memory is grown). More control over what erasing an element means (e.g. whether the remaining elements are required to stay in order, or if the gap can be filled by swapping in the last element). A POD version which is allowed to replace piece-wise memory operations with bulk operations (here I'm actually not sure if optimizers are clever enough to replace many unique moves with a single memmove). A more straightforward way to define an allocator (most importantly, the allocator shouldn't be part of the vector's type signature - not sure if that's what the new polymorphic_allocator in C++17/20 is about though).

...those are just some obvious requirements I had in the past, but I'm sure other people will have different requirements.


Actually, I wanted to ask good software engineers this question.

I once had a coding problem interview where half of the logic could be handled by an autobalancing tree. I've never really used autobalancing trees in real software before, but I knew how to make them from scratch quickly, as RB trees are a very common school problem. I spent twenty minutes choosing between coding one from scratch and picking an already existing solution. I ended up choosing gnl's RB trees, with all the makefile/autoinstall issues that I would have to fix instead. I did not gain any time, really, but I wanted to show I did not suffer from NIH syndrome. Was that a mistake? Should I stay within the stdlib during coding interviews? (I don't know if they could run the code; I think the interviewer was running Windows.)


Depends on the interviewer I guess. Some people will be adamant about using the stdlib as much as possible because they have a war story of how something went terribly wrong by not using the stdlib, others will insist that the stdlib is rubbish because they have a war story of how something went terribly wrong by relying on the stdlib.

The problem is that both are right.

(but jokes aside, I guess that the interviewer is more interested in your ability to solve a problem from scratch instead of your ability to google for an existing solution - even if googling makes a lot of sense in the real world)


I keep alternating between those positions

And I use FreePascal. Its stdlib is vastly larger than the C++ stdlib, but also completely untested/unusable, because no one is using it.

I wrote my own unicode handling functions for everything.

Yesterday I found a new test case and noticed that my convert-utf8-to-lowercase function did not handle the Turkish İ correctly (it should turn into two codepoints rather than one, for reasons; although I had a check for that symbol, it was in the wrong branch). And my function had quadratic runtime, so it was also nearly unusable.

So I fixed my function. Then I thought, why am I even writing a convert to lowercase function? FreePascal already has a convert to lowercase function. There is a lesson here, do not write your own functions, you will miss cases.

So I loaded the stdlib Unicode functions to compare my function to their function. And, segmentation fault. Not in the convert to lowercase function, but just loading that Unicode part of the stdlib broke something.

Although while writing this post, I thought, perhaps I should test it again, just the convert to lowercase function, without the crashing part. It also fails to handle the İ. And worse, it returns a string that is one byte too large, like a garbage null terminator. There is a lesson here, do not use the stdlib, it is just broken.

Then I compared it to a convert to lowercase function from another library I had included. Twice as fast as even my new fixed function. But it also does not handle the İ symbol. There is a lesson here, do not use other libraries, they do not do what you need them to do.


Sorry for the İ/i I/ı bug. Even modern software in other languages falls into that trap from time to time, so it's probably not handled well in other languages either.


I would ask the interviewer and use it as an opportunity to show your knowledge of data structures and your ability to reason about trade-offs and talk through priorities.

I would say something like: "I think a balanced tree, such as an rb-tree, would be useful here for <reasons that make sense given the problem and the properties of rb trees>. I've written rb trees before and think I could write a basic one in 10-15 minutes or I could use <class from the std library, which uses a balanced tree>. Which would you prefer?"

Assuming what you said made sense I would take an interaction like that as a positive signal.


Part of this problem is underconstrained requirements. If you do not specify that "a" can't be zero in "ax^2", it's reasonable to expect the code to behave sensibly in that case. It's one thing to be a free designer who states requirements and constraints; it's another to be an implementer who was given an isolated task and no more details. You have to either be fully defensive or renegotiate edge cases, which could itself lead to unnecessary complexity.


> and only if you're very lucky the compiler will condense this down to the same few dozen lines of actually needed code

The code consists almost entirely of templates and inline functions and every modern C++ compiler will optimize that away. In a release build there is basically no overhead compared to using raw pointers.



