
Is this a response to Alma Kitten?

In any event, I don't understand why frame pointers need to be on by default instead of developers enabling them where needed.

Having Kitten include frame pointers by default seems reasonable enough, since Kitten is a devel system.






In reality, all this discussion has its origin in a design mistake Intel made as far back as the 8086 CPU, launched in 1978.

They designed the instruction set in such a way that two distinct registers were necessary to fill the roles of stack pointer and frame pointer.

In better-designed instruction sets, for example IBM POWER, a single register is enough to fill both roles, serving simultaneously as stack pointer and frame pointer.

Unfortunately, the Intel designers did not think about this problem at all; in 1978 they simply followed the example of the architectures popular at the time, e.g. the DEC VAX, which had made the same mistake of reserving two distinct registers for the roles of stack pointer and frame pointer.

In the architectures where a single register plays both roles, the stack pointer always points to a valid stack frame that is part of a linked list of all stack frames. For this to work, there must be an atomic instruction that both creates a new stack frame (which consists of storing the old frame pointer in the right place in the new stack frame) and updates the stack pointer to point to the new stack frame. The Intel/AMD ISA has no such atomic instruction, and this is why two registers are needed to create a new stack frame safely (safe meaning that the frame pointer always points to a valid stack frame and the stack pointer always points to the top of the stack).
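
As a rough sketch of what that linked list looks like on x86-64 when frame pointers are kept (GCC or Clang, built with something like -O0 -fno-omit-frame-pointer; the struct and function names are mine, purely for illustration): every frame starts with the caller's saved frame pointer, with the return address right above it, so walking the whole stack is just following a chain of pointers.

    #include <stdio.h>

    /* Shape of an x86-64 frame after the usual prologue
       (push %rbp; mov %rsp,%rbp): the saved caller frame pointer
       sits at %rbp and the return address right above it. */
    struct frame {
        struct frame *prev;   /* caller's saved frame pointer */
        void         *ret;    /* return address pushed by `call` */
    };

    static void show_backtrace(void)
    {
        /* GCC/Clang builtin: the current function's frame pointer. */
        struct frame *f = __builtin_frame_address(0);

        /* The ABI marks the outermost frame with a zero frame pointer,
           but the walk is only safe if everything on the stack was
           built with frame pointers, hence the depth limit. */
        for (int depth = 0; f != NULL && depth < 16; depth++) {
            printf("frame %d: return address %p\n", depth, f->ret);
            f = f->prev;
        }
    }

    static void leaf(void)   { show_backtrace(); }
    static void middle(void) { leaf(); }

    int main(void)
    {
        /* Expect return addresses inside leaf, middle, main, then the
           C library's startup code. On a libc built without frame
           pointers the walk past main may stop or misbehave, which is
           rather the point of this whole discussion. */
        middle();
        return 0;
    }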


The real benefit is being able to turn on profiling when a problem is spotted, or in some cases to be able to profile continuously in production (as apparently they do at Netflix).

I get it. This frustrated me to no end. But still I did what I had to do --- recompiled random software throughout the stack, enabled random flags, etc. It was doable and now I can do it much faster. I don't think it's fair for upstream to disable a useful optimization just so I don't have to do this additional work to fix and optimize my system.

Doing real world, whole system profiling, we've found performance was affected by completely unexpected software running on the system. Recompiling the entire distribution, or even the subset of all software installed, is not realistic for most people. Besides, I have measured the overhead of frame pointers and it's less than 1%, so there's not really any trade-off here.

Anyway, soon we'll have SFrame support in the userspace tools and the whole issue will go away.


In one of my jobs, a 1% perf regression (on a more stable/reproducible system, not PCs) was a reason for a customer raising a ticket, and we'd have to look into it. For dynamically dispatched but short functions, the overhead is easily more than 1% too. So, there is a trade-off, just not one that affects you.
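
To make that concrete, here is a hedged sketch (hypothetical names, compiler output paraphrased roughly from what GCC emits on x86-64) of why short, indirectly called functions feel it most: keeping the frame pointer adds a fixed push/mov/pop to every non-inlined call, and a function reached through a function pointer or vtable can't be inlined away, so those few instructions are a sizeable fraction of its whole body.

    /* Hypothetical callback type standing in for a virtual call. */
    typedef int (*op_fn)(int);

    /* With -fno-omit-frame-pointer a leaf like this compiles to
       roughly:
           push %rbp
           mov  %rsp, %rbp
           lea  1(%rdi), %eax
           pop  %rbp
           ret
       Three of the five instructions are frame-pointer bookkeeping;
       in a long function the same three instructions are noise. */
    static int add_one(int x)
    {
        return x + 1;
    }

    /* When `op` is genuinely unknown at the call site (a real vtable
       or plugin table, not this toy program), the call cannot be
       inlined, so the callee pays that fixed cost on every call. */
    int apply(op_fn op, int x)
    {
        return op(x);
    }

    int main(void)
    {
        return apply(add_one, 41) == 42 ? 0 : 1;
    }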

If 1% shows up out of nowhere, it's very much worth investigating and trying to fix. You shouldn't let such regressions happen freely and pile up.

But there are some 1% costs that are worth it.


I think it comes down to numbers. What are most installed systems used for? Do more than 50% of installed systems need to be doing this profiling, all the time, on all binaries, such that everything just needs to be built this way without having to identify and prepare specific binaries ahead of time?

If so, then it should be the default.

If it's a close call, then there should be 2 versions of the iso and repos.

As many developers and service operators as there are, and as much as everyone on this page, including both you and me, is one, I still do not believe the profiling use case is the majority use case.

The way I am trying to judge "majority" is: Pick a binary at random from a distribution. Now imagine all running instances of that binary everywhere. How many of those instances need to be profiled? Is it really most of them?

So it's not just unsympathetic "F developers/services problems". I are one myself.


Everyone benefits from the net performance wins that come from an ecosystem where everyone can easily profile things. I have no doubt that works out to more than a 1% lifetime improvement. Same reason you log stuff on your servers. 99.9% pure overhead, never even seen by a human. Slows stuff down, even causes uptime issues sometimes from bugs or full disks. It's still worthwhile though because occasionally it makes fixes or enhancements possible that are so much larger than the cost of the observability.

This at least makes sense. Thanks.

Do 50% of users need to be able to:

* modify system services?

* run a compiler?

* add custom package repositories?

* change the default shell?

I believe the answer to all of the above is "no".


All those things are free in terms of performance though.

I don't see how this applies. Some shell has to be the default one, and not all systems even pick the same one. Most systems don't install a compiler by default. Thank you for making my point?

All these things are possible to do, even though only developers need them. Why shouldn’t the same be true for useful profiling abilities? Because of the 1-2% penalty?

Are you serious?

Visa makes billions per year off of nothing but collecting a mere 2%-3% tax on everything else.


Visa sells money for money, skimming off a percentage.

CPUs spend cycles on features (doing useful work). Enabling frame pointers skims off a percentage of the cycles. But it's the impact on useful work that matters, not how many cycles you lose. The cycles are just a means to an end. So x% of cycles is fundamentally incomparable to x% of money.


I don't see how Visa is in any way relevant here.

I don't see why not.

The whole point of an analogy is to expose a blind spot by showing the same thing in some other context where it is recognized or perceived differently.


Meta also continuously profiles in production, FWIW.

Then Netflix can enable it for their systems? Are they actually still profiling cat and ls that come from the OS, or are they profiling their own applications and the interpreters and daemons they run on?

This does not explain why a distribution should have such a feature on by default. It only explains why Netflix wants it on some of their systems.


People across the industry are suffering from incomplete stack traces because their applications call into libraries like glibc or OpenSSL that their distro has built with frame pointers omitted. It's pretty ridiculous to have to pull off a Linux From Scratch on CentOS just to get a decent stack trace. Needless to say, this has nothing at all to do with profiling cat and ls.

OpenSSL is the worst because some configurations execute asm generated by a specialised program. That code clobbers the frame pointer (gotta go fast!) but isn't annotated with DWARF unwinding info (what do you mean you want to know what led to your app crashing in OpenSSL?)...

> Then Netflix can enable it for their systems?

And they did.

The question, though, is why only Netflix should benefit from that. It takes a lot of effort to recompile an entire Linux distribution.


Quoting my other comment in this thread:

---

I think it comes down to numbers. What are most installed systems used for? Do more than 50% of installed systems need to be doing this profiling, all the time, on all binaries, such that everything just needs to be built this way without having to identify and prepare specific binaries ahead of time?

If so, then it should be the default.

If it's a close call, then there should be 2 versions of the iso and repos.

As many developers and service operators as there are, and as much as everyone on this page, including both you and me, is one, I still do not believe the profiling use case is the majority use case.

The way I am trying to judge "majority" is: Pick a binary at random from a distribution. Now imagine all running instances of that binary everywhere. How many of those instances need to be profiled? Is it really most of them?

So it's not just unsympathetic "F developers/services problems". I are one myself.

---

"people across the industry" is a meaningless and valueless term and is an empty argument.


> such that everything just needs to be built this way without having to identify and prepare specific binaries ahead of time

I think a big enough fraction of potentially-useful crash reports come from those systems to make it a good default.


> I don't understand why frame pointers need to be on by default instead of developers enabling them where needed

If you enable frame pointers, you need to recompile every library your executable depends on. Otherwise, the unwind will fail at the first function that's not part of your executable. Usually library functions (glibc's, for example) are at the top of the stack, so for a large portion of the samples in a typical profile, you won't get any stack unwind at all.
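
A small sketch of that failure mode, reusing the frame-pointer chain idea from earlier in the thread (GCC-specific and simplified: the attribute makes one function drop its frame pointer to stand in for a library built without them; build with something like gcc -O2 -fno-omit-frame-pointer -fno-optimize-sibling-calls).

    #include <stdio.h>

    struct frame {
        struct frame *prev;   /* caller's saved frame pointer */
        void         *ret;    /* return address */
    };

    __attribute__((noinline))
    static void walk(void)
    {
        struct frame *f = __builtin_frame_address(0);
        for (int depth = 0; f != NULL && depth < 16; depth++) {
            printf("frame %d: return address %p\n", depth, f->ret);
            f = f->prev;
        }
    }

    /* Stand-in for a distro library built without frame pointers:
       this function never sets up %rbp, so the frame that walk()
       links to is application_code()'s, and application_code()
       silently disappears from the backtrace. (If your GCC ignores
       the optimize attribute, compile this function in a separate
       file with -fomit-frame-pointer instead.) */
    __attribute__((noinline, optimize("-fomit-frame-pointer")))
    static void library_code(void)
    {
        walk();
        putchar('\n');        /* keep the call out of tail position */
    }

    __attribute__((noinline))
    static void application_code(void)
    {
        library_code();
        putchar('\n');
    }

    int main(void)
    {
        /* Printed chain: library_code, then straight to main,
           skipping application_code. */
        application_code();
        return 0;
    }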

In many (most?) cases recompiling all those libraries is just infeasible for the application developers, which is why the distro would need to do it. Developers can still choose whether to include frame pointers in their own applications (and so they can still pick up those 1-2% performance gains in their own code). But they're stuck with frame pointers enabled on all the distro provided code.

So the choice developers get to make is more along the lines of: should they use a distro with FP or without. Which is definitely not ideal, but that's life.


It's useful to be able to profile on production workloads


