It looks for knowledge islands and relates those to frequently modified code, to identify hotspots, or areas of high risk due to low knowledge distribution in areas of high change.
Another use is if someone hands in their notice you can easily see all the code that only they know, so your handover planning is mapped out easily.
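To make the two uses above concrete, here is a minimal sketch, not the tool's actual implementation. It assumes the history has already been flattened into (author, path) pairs, for example parsed from `git log --name-only`. The data, the thresholds and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical input: one (author, path) pair per file touched in a commit.
CHANGES = [
    ("ana", "billing/invoice.py"),
    ("ana", "billing/invoice.py"),
    ("ana", "billing/invoice.py"),
    ("ana", "billing/tax.py"),
    ("bo", "web/views.py"),
    ("cy", "web/views.py"),
    ("bo", "web/views.py"),
    ("cy", "docs/README.md"),
]

def summarize(changes):
    """Return {path: (change_count, set_of_authors)}."""
    counts = defaultdict(int)
    authors = defaultdict(set)
    for author, path in changes:
        counts[path] += 1
        authors[path].add(author)
    return {path: (counts[path], authors[path]) for path in counts}

def hotspots(changes, min_changes=3, max_authors=1):
    """Files that change often but are known by few people."""
    return sorted(
        path
        for path, (n, who) in summarize(changes).items()
        if n >= min_changes and len(who) <= max_authors
    )

def sole_owner_files(changes, author):
    """Files that only `author` has ever touched (a handover list)."""
    return sorted(
        path
        for path, (_, who) in summarize(changes).items()
        if who == {author}
    )

print(hotspots(CHANGES))                 # frequently changed, single author
print(sole_owner_files(CHANGES, "ana"))  # what to hand over if ana leaves
```

A real tool would presumably weight recent changes more heavily and ignore trivial edits; this sketch counts every touch equally.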
I’ve never thought of it being used maliciously, it’s for visibility. It would be a shitty manager that would use it that way and if they’re already shitty then this tool won’t change that.
>I’ve never thought of it being used maliciously, it’s for visibility. It would be a shitty manager that would use it that way and if they’re already shitty then this tool won’t change that.
You are a member of the intelligence community of a country, let's call it Tussia, which has been locked out of the leading kernel for military hardware in the world. Let's call that Kinux.
You know that the guy down the office has started a project to fork that kernel for your country's own internal use. You're an overachiever and want a promotion before he gets one. You call acquisitions for 8 female agents with special training for intimacy with nerds, and you also make a backup call for 8 doses of polonium in case the agents aren't successful.
In case you think the above is fiction I know a CEO of a unicorn startup who got the first part of the treatment when he was looking for seed funding.
> I’ve never thought of it being used maliciously, it’s for visibility. It would be a shitty manager that would use it that way and if they’re already shitty then this tool won’t change that.
I've had three jobs where Pluralsight Flow was introduced. At two of them, the managers immediately started using the metrics for feedback, performance reviews, employment decisions. At the third, the developers saw this coming a mile away and refused to engage with or evaluate the tool.
Unfortunately, the absurd pricing of these tools means that people who approve them have to get some sort of ROI. Since they don't have a good way to measure productivity/output/knowledge silos, they instead turn to "Well Jose had less PRs this week..."
Excellent points about visibility, as long as you can keep it in that domain.
But this always lurks in the near shadows:
>>I’ve never thought of it being used maliciously, it’s for visibility. It would be a shitty manager that would use it that way
Therein lies the problem, on both sides. It would just become another arms race, as the developers would use it to identify and move into target project areas/components to get themselves on the list of un-fireable workers. Ideally, the workers would work together to ensure that the truck_factor was zero, i.e., none of them could be fired.
Of course all of this rapidly becomes a (nearly) complete waste of time, proving the blogger's friends' original point: >>"My coworkers said it would immediately hit Goodhart’s Law."
Amazon has these numbers easily accessible as reports on their code systems runnable at any manager level, and many other ways to inspect what the team is doing and the risks you might have. I find them useful.
Bus factor is one way to think of it. Another is it lets you spot silos, or engineers who aren't working with others, or places where you can't as easily move engineers around (so you can fix that).
Some developers fear fungibility; they think that the one system only they know is job security. I see it the other way: I see that as a technical risk, but also a thing that might be keeping a great engineer from working on more important projects. Or the way to work on something else when you get fed up with that one system you hate.
> Some developers fear fungibility; they think that the one system only they know is job security. I see it the other way: I see that as a technical risk, but also a thing that might be keeping a great engineer from working on more important projects. Or the way to work on something else when you get fed up with that one system you hate.
I don't fear fungibility - if a place doesn't want me there I don't want to be there either.
I dislike the idea of fungibility because it creates huge overhead and avoids using talented people to their full potential, precisely because they are not fungible.
In some places that's warranted, but in others the process overhead is a far greater risk to your project's success than the bus factor.
Each department in my company can designate someone as a "critical man" so they can't change teams. However, those people usually get the highest possible ratings and raises. I've only seen it used maybe one time.
This is the other reason I want people to be somewhat fungible... so management can't shoehorn them. Really the ability to move is a good thing for both the company and the employee. Moving doesn't HAVE to happen, but it's really much better when everyone has options.
But a promotion also means change, maybe people are happy where they are? After all, you're most productive in a familiar codebase. Then again, if a codebase is "done", productivity is less important and often work dries up. There's plenty of people that are happy with working 10 but getting paid for 40 hours a week.
Some companies have parallel career progressions for individual contributors and managers. For example, Amazon has a distinct career progression for software development engineers (SDE): SDE I, SDE II, Senior SDE, Principal SDE, Senior Principal SDE. Sr Principal is a VP level position, but has no direct reports. It’s an advisory role.
I've noticed that, while these tracks are still technical, you will still find yourself in more and more meetings and writing less code. I think everyone needs to come to terms with whether or not they want a promotion regardless of the track.
Only if you aren't at a terminal level already, and you are easy to replace. When I was there one could find plenty of SDE IIIs (and a smattering of SDE IIs) who had found nice quiet niches of the codebase to hide out in for 5 years or more...
I don’t want to frame it positively. In my experience, people who hoard knowledge and/or relationships (as with other teams or customers) are toxic.
I hate all sorts of gatekeeping behavior but I especially despise having to wheedle information or effort out of someone like that. They almost invariably have an ego that is out of proportion to their contributions.
Side note: Occasionally I have seen people who take on the thankless task of being a maintainer of something no one else wants. Those people tend to welcome others and share as much as others are willing to listen to, so being the single person who knows something doesn’t make you a gatekeeper.
honestly every system I saw someone corner themselves in didn't take very long to have someone else figure out. you just toss some smart motivated folks at it. paging them every time something breaks also seems to help the knowledge transfer. I know it motivated me to get in there and do a gut rehab.
I've spent my entire career trying to make myself and every other cog-head as replaceable as possible. Part of this is because that's the job in large parts of digitalisation, but another big part is because it's annoying to deal with knowledge silos. One of the reasons we use Typescript for a lot of our back-end stuff is because it's the one language that our small team has to know. This means our front-end dev can go on a vacation and actually disconnect because other people can cover. It also means it's less of a pain when someone changes jobs.
It has never once been an issue and I think fungibility is a part of any healthy system. I spent a few years on the other side of the table, and one of the first lessons you learn in management is that everyone is replaceable and that's solely a matter of cost. Because of this, high levels of knowledge may also work against you as management will likely work on reducing the risk you present. Another thing you'll learn is how random layoffs can be, especially if they happen for economic reasons.
That being said, I avoid working at places with these silly metrics. The more red tape you put in the way of good work the less likely I'll want to work with you. There are a lot of reasons for this but the primary one is that these sorts of things tend to create work cultures which just aren't good for productivity and quality as people "game the metrics" instead of doing good work.
At the end of the day the engineer has a valuable skill and you're the one telling them they need to be more replaceable. You're the enemy. Of course you're going to think the tools that let you dehumanise your employees at scale are good things.
Hard disagree. I think a huge part of our job as engineers is to build systems that can outlive us and comfortably change hands (without the next team cursing the former).
Maybe this is born from spending so many years in Amazon (with its high turnover and near-quarterly re-org shuffling), but what's getting called "replaceable" here I'd call "writing maintainable software."
The goal is to get knowledge out of your head and into the codebase so everyone can reap the benefits. Knowledge hoarding is lame.
> Why does gnu parallel use all the cores when we tell it to use only 8? We only saw eight git clone processes at a time, but a large number of git index-pack processes that maxed out all 32 cores on my laptop. I’m guessing git’s index-pack is a forked subprocess and allows parallel to start another git clone?
Parallel is running 8 git clone jobs at once (as asked for) and each git clone is starting as many index-pack threads as it wants. Temporarily setting pack.threads to 1 (via git config) would help here.
This is an increasingly common problem: two layers, both trying to utilise the entire machine by parallelising by the number of CPUs/cores, and so the inner layer gets N² threads/processes. Being quadratic growth, this means that the problem becomes worse the more CPUs you have. With 32 cores, 32² = 1024, though in practice Parallel was told 8 so it’d cap out at only 256 index-pack processes. But that’s a lot of memory needed to support that, especially given that it’s doing no good.
The solution is for only one of the two layers to parallelise.
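A rough sketch of the arithmetic above, using the numbers from this thread (8 jobs, 32 cores). The helper names are made up for illustration. `git -c pack.threads=N` sets the config value for a single invocation; whether that fully tames index-pack during a clone is the parent comment's suggestion, not something verified here. Note that splitting the cores between the layers is a softer variant of "only one layer parallelises"; a budget of 1 gives the strict version.

```python
import os

def inner_thread_budget(outer_jobs, cores=None):
    """Threads each job may use so that outer_jobs * threads <= cores (minimum 1)."""
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, cores // outer_jobs)

def clone_command(url, threads):
    """Build a git clone invocation with pack.threads set for this one command."""
    return ["git", "-c", f"pack.threads={threads}", "clone", url]

# Numbers from the comment above: 8 parallel jobs on a 32-core machine.
unbounded = 8 * 32                         # every clone auto-detects all 32 cores
budget = inner_thread_budget(8, cores=32)  # cores shared out between the 8 jobs
print(unbounded, budget, 8 * budget)

# Placeholder URL; pass threads=1 to let only the outer layer parallelise.
print(clone_command("https://example.com/repo.git", budget))
```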
—⁂—
Given that you talked about pack.threads, here’s its description from `man git-config`:
> Specifies the number of threads to spawn when searching for best delta matches. This requires that git-pack-objects(1) be compiled with pthreads otherwise this option is ignored with a warning. This is meant to reduce packing time on multiprocessor machines. The required amount of memory for the delta search window is however multiplied by the number of threads. Specifying 0 will cause Git to auto-detect the number of CPUs and set the number of threads accordingly.
Reminds me of a Zig proposal I saw recently to make the std.Thread.Pool API respect the Linux jobserver and macOS dispatcher automatically out of the box:
Those are some really interesting proposals. I agree that a lot of code ignores resource cleanup, especially when it comes to driver-userspace communication. For example, driver authors just implement a netlink/ioctl system which manages persistent state in kernel space, even though they could bind the state to a device file descriptor, which automatically gets cleaned up upon process exit (but still can be handed to another process, if really needed)
The solution is a global scheduler (the one thing that Kubernetes does well). It would be aware of what every program needs, not just in terms of CPU, but also memory, devices, time etc. and split the available resources fairly.
I don't like this take. This post is for any engineering leader.
The Bus Factor is how hard your team would suffer if you - or anyone else on the team - got hit by a bus.
Ideal Bus Factor for all team members is 0. This might sound counter-intuitive at first, almost like "make everybody expendable", but it's quite the opposite and kind of the point.
Teams should be good enough that they are a) autonomous and b) there are no mysteries. In the ideal state, everyone understands how everything works. New employees should hit the ground running and be able to produce value immediately. Departing employees should feel comfort in knowing that there are no unknowns.
An ideal team with 0 BF across the board is desirable. It means that team members are fungible. It means that every single team member can fill in the gaps if someone is ill, or on vacation, or actually leaves or is removed.
More importantly, a 0 BF is a reflection of simplicity. The software, its build/test/deployment pipelines, documentation, and support, should all be cohesive and coherent. Siloing information in team members is bad, everyone should be able to build and deploy.
0 BF is a healthy metric, but it is absolutely 100% not measured in email rate, commit rate, PR rate, lines of code, timeliness, GitHub heatmaps, etc. Those metrics indicate nothing at all. Quite the opposite. They are harmful, awful metrics.
Measuring people by these metrics is just monkeys on typewriters. More startups need to hear this.
> The Bus Factor is how hard your team would suffer if you - or anyone else on the team - got hit by a bus.
> Ideal Bus Factor for all team members is 0. This might sound counter-intuitive at first, almost like "make everybody expendable", but it's quite the opposite and kind of the point.
I've always heard bus factor described in the inverse fashion, as in "how many people would need to get hit by a bus for the project not to be able to continue", with the optimal number being the same as the number of people on the team. It sounds like the idea is the same, but I'm surprised to find out that the number people use to convey the concept isn't always the same.
A number between 0 and 1 can easily scale to whatever company you have. A number that differs depending on how many people are on your team is harder to compare between teams. I guess it depends on if you ask a programmer, administrator or mathematician what a logical system would be :-)
For sure! The simplicity of having the same ideal bus factor for all sizes definitely appeals to me, but maybe due to familiarity, I think I prefer the well-defined units from the version I cited. It's a bit of a https://xkcd.com/883/ situation in terms of how bad your imagination is for what "maximum suffering" would be.
> Teams should be good enough that they are a) autonomous and b) there are no mysteries. In the ideal state, everyone understands how everything works. New employees should hit the ground running and be able to produce value immediately. Departing employees should feel comfort in knowing that there are no unknowns.
I've worked on projects where we had engineers who were one of a countable handful of people in the world with their particular skillset.
The bus factor was most certainly 1 at that point.
> More importantly, a 0 BF is a reflection of simplicity. The software, its build/test/deployment pipelines, documentation, and support, should all be cohesive and coherent.
For projects which push the frontiers of what is possible, simplicity isn't an option. (Granted these are a small % of overall software projects that exist!) When something has never been done before, you aren't worried about keeping the code As Simple As Possible, you are worried about how the hell you can even do this particular thing.
I'm not saying the code should be low quality! However sometimes doing hard stuff involves complex code, and maybe a couple generations later people have figured out design patterns so the hard stuff can have less complex code, but that may be a decade down the line!
> grug understand all programmer platonists at some level wish music of spheres perfection in code. but danger is here, world is ugly and gronky many times and so also must code be
If you only build things that are so simple that anyone can understand the code on day one and you don't need any domain knowledge, then what is your value proposition?
If the most complex thing you can build is a todo-app, then I think you don't produce much value to society.
Being able to write code that is able to be understood by someone new to the project is a skill set. It is certainly one that is not universal. And it is certainly one that should be admired. Solving hard problems in the simplest way, with clear information about why/how it works is one of the most important skills of a developer, imo.
Not day one, but anyone should be able to follow the code using a debugger and understand it. If you write spaghetti code segments, it's high time to change them.
If you ask society what has helped them the most, you will be surprised to learn how many claim the to-do list (paper or app in whatever time frame) is their main way of actually getting anything done.
Um, I would argue that what has helped society the most is agriculture, sewage systems, healthcare, electricity and heating, etc. All of which are technological innovations.
Also, how many variations of a to-do app does the world need?
Agreed. People who are good at their jobs and confident in what they do actively try to make their Bus Factor as low as possible. If you have a high Bus Factor that means your employer keeps you because of what you have done in the past, not your potential to do great stuff in the future.
The military operates that way with 99% of their personnel, who are grunts, expected to only ever follow orders, to never think for themselves. They're expendable cannon fodder - think of them as pieces of hardware in a software company. But with the 1% at the very top (basically just generals), I'd say the bus factor comes into play, same as in any other organisation - certain individuals have all the knowledge of certain domains, and if enough of those individuals are taken out, the wheels grind to a halt. That's why targeted assassinations happen to the top brass.
Sure, if you manage to assassinate the entire command chain, things will go pear shaped.
I dare say you could likely assassinate half the command chain, and the military will still manage to get where they need to be, when they need to be there. Military command chains have levels of redundancy that civilian organisations wouldn't dream of.
As a concrete example, it's estimated that the British lost ~40% of their officers in the Battle of Albuera, and they still managed to repel Napoleon's forces.
> Our estimation relies on a coverage assumption: a system will face serious delays or will be likely discontinued if its current set of authors covers less than 50% of the current set of files in the system
An author of a file is defined as a user who has made significant (based on pre-calculated weights) contributions to a file.
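The coverage assumption quoted above suggests a simple greedy estimate: keep removing whoever authors the most files until fewer than 50% of files still have an author. The sketch below is one reading of that idea, not the paper's code. It assumes the set of authors per file has already been decided (the pre-calculated weights are not reproduced here), and the function name, tie-breaking rule and sample data are invented for illustration.

```python
def truck_factor(file_authors, threshold=0.5):
    """Greedy estimate: remove top authors until coverage drops below threshold.

    file_authors maps each file to the set of its (already determined) authors.
    Returns (truck_factor, list_of_removed_authors).
    """
    remaining = {f: set(a) for f, a in file_authors.items()}
    total = len(remaining)
    removed = []
    while total:
        # A file is covered while at least one of its authors is still around.
        covered = sum(1 for authors in remaining.values() if authors)
        if covered / total < threshold:
            break
        # Count how many files each remaining author covers.
        counts = {}
        for authors in remaining.values():
            for author in authors:
                counts[author] = counts.get(author, 0) + 1
        if not counts:
            break
        # Ties are broken alphabetically so the result is deterministic.
        top = min(counts, key=lambda a: (-counts[a], a))
        removed.append(top)
        for authors in remaining.values():
            authors.discard(top)
    return len(removed), removed

# Made-up example: ana is the only author of most of the files.
SILOED = {
    "a.py": {"ana"},
    "b.py": {"ana"},
    "c.py": {"ana"},
    "d.py": {"ana", "bo"},
    "e.py": {"bo", "cy"},
}
print(truck_factor(SILOED))
```

Losing ana alone leaves only 2 of 5 files covered, so this example reports a truck factor of 1.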
Yep, we have a tech lead who has great code writing abilities. But he has terrible leadership, feedback, etc skills. He would be much better as a senior dev instead of a tech lead. I can't imagine how miserable his team would be if he was a manager.
Well, it certainly isn't "development". It's a hoop we jump through so that management can tick a box labelled "all commits were reviewed by another engineer" on the SOC2 audit...
Just going to second this. Good code reviews (not just typo nitpicking) can be a great way to simplify down code, and spread knowledge horizontally across the org. Not to mention catching bugs.
Unit tests aren't a substitute because unit tests check that the success paths are good. That's a good start, but it's not the same as verifying all the possible ways code could go wrong in a complex system, and one of the cheapest ways to spot those problems is with people familiar with the complex system looking at new code.
Code reviews give you the double benefit of building more people who understand the whole system, and having the code looked at by people who understand the whole system.
If a non-trivial percentage of your code review feedback is about code quality and bugs, you are severely underinvesting in autoformatters/linters/strong type systems/testing/continuous-integration. It's not a cost-effective use of (expensive) software engineers to have them scanning every PR on the off-chance they notice a typo.
I'll grant you they can help break down silos, but the question you should be asking, is why your codebase is so convoluted that silos are developing in the first place?
On the one hand, I would be very surprised if this sort of dashboard metric didn't already exist in some enterprise software somewhere. My management at a previous company literally asked me if we could do a daily report on who sent and received the most email in our department.
I refused to do it because I didn't like where I saw it going, but another coworker did. Unsurprisingly, the person who got the most email and sent the most email was the Sysadmin whose email account was set as sender for all the automated emails from the various servers, as he would literally email himself hundreds of alert emails a day, on top of all the crap newsletters and digests he subscribed to.
On the other hand, all of your coworkers asking you to not do something that could potentially impact their jobs, and you do it anyway as a hobby project? Sounds like kind of a jerk move.
The sad irony of this is that it's still the wrong question.
When startups have to do layoffs, the question isn't "who can I fire and keep the existing business going"; it is instead "who is the team I want building the next version of the product quickly enough to not go out of business?"
But every fork in the road is just exactly that, and not choosing a path quickly enough has been the death of many.
CPAN has been tracking the Bus Factor for a long time now. For example, https://metacpan.org/pod/Moose shows a Bus Factor of 5 in the left column info.
Once the big tech employers started having security march employees out of the office the moment they handed in their notice, a lot of folks stopped giving them two weeks notice.
Also, it's not guaranteed that your management will actually tell you if they did - one employer asked me not to tell my team I was leaving until the last day for morale...
> Ye using sudden death as a euphemism for changing employer leaves a bad taste.
Well, see, we can't just straight-up tell employees that we're not going to give them promotions or raises, so they'll have to jump ship in a year... That'd be a disaster for morale!
I really don’t like the term "bus factor" because I lost someone in a bus accident.
On the other hand, millions of lives have literally been lost on various hills on various battlefields, yet no one seems to mind when someone says, "this is the hill I’m willing to die on." :)
(I am old - I start with the poor bus driver scenario - but then try to inject the new phrasing into the conversation - as typically I am the only technical person on my team and they begin to rely on me for EVERYTHING)
i worked a software job that sometimes involved literally going to the busyard to debug on-board equipment. people at that job used the phrase attrition risk rather than bus factor because the latter was just a little too "real".
From TFA:
>In 2015 or so, my employer had layoffs. One of them was the only contributor to part of the codebase that made money for our company.
Maybe I missed it, but I didn't see the author mention what came of this. I'm very curious: did someone else take it over, or did the employer go down in flames?
Probably, but I'm always hoping to hear a story about a company laying off a highly valuable employee(s) (usually for cost-cutting reasons) and then going bankrupt as a result.
Not only is this a metric that doesn’t make any sense, simply floating the concept that this metric corresponds to anything in real life is harmful.
Teams are social constructs, and you simply cannot apply an algo to some observable code metric and get any kind of proper result. People leave, others step up, or don’t. End of story.
The problem with these kinds of metrics is that even if people know the numbers are ridiculously off, some still think that the idea they convey is correct, i.e., that if x people were to leave, the project would stall. That premise is simply not true.
Don’t even think about this stuff. It’s stupid. If you want to know more about these risks, talk to the team. People on the team know if anyone is irreplaceable.
There's a very quick fix to this, which also radically improves the strength and potency of any organization:
Do not ever accept management directives from someone who couldn't do your job.
Yes, you heard me. If your manager cannot do what they are asking you to do, fire them immediately.
This has the added bonus of ensuring that the Peter Principle doesn't become a major mountain instead of a molehill.
This does not mean that the manager has to do your job. It does mean that making sure your manager knows how to do your job is your responsibility and something that you, yourself as a worker-cog in a large machine, actually do control.
I have shipped software all over the planet in all kinds of markets for all kinds of users. The most successful project is composed of individuals who have great deals of trust in each others' ability to perform, and who share the load in ways adequate to the task. In the most successfully managed projects, those managers who I knew could do my job, but didn't (because I did it), were absolutely the best to work for .. whereas those who had no idea how to wrangle a single line of code, yet gave themselves the altitude required to be a 'manager' were, across the board, a catastrophe.
BTW, if you feel 'seen' by this comment - i.e. you are a manager who feels a bit imposter'ish - don't worry, this is the Peter Principle at work, and you can easily fight this by better communication with the folks whose work you cannot do ..
The title and the first paragraph mentioned a plugin but the article is not about a plugin but about trying to replicate some old results, and they've failed at that since they didn't 100% respect the method of original authors.
Wonder how many managers are thinking about the bus/lottery numbers of all the open source projects their developers are relying on, and then doing something about it.
This is what I was most interested in with this project (especially in light of the https://opensourcepledge.com/). We didn't run it against any new libraries, so it would be interesting to see the state of the most starred libraries.
While it is the platonic ideal for operating an existing service, this is also a great way to kill your velocity if you are an early-stage project which needs to deliver results yesterday, and/or pivot on a dime, in order to stay funded...
Or kill your velocity by playing with creaky, fragile hacks only you understand that duplicate code and are a huge mess. I've seen countless wannabe startups that stupidly accumulated so much technical debt with a "let's worry about that tomorrow" techbro attitude that they had a relative velocity of zero when asked to make minor changes.
"Bus" and "Truck" terminologies invite you to imagine your coworkers being crushed to death. This isn't great.
I've always preferred the term "lottery factor". If this employee won the lottery (almost certainly leaving their day job behind), would you be able to survive their sudden departure?
Over 30-ish years of work, I’ve lost about 0 coworkers without warning or transition to lotteries or similar events, and more than a few (whether permanently or for an extended period) to unexpected sudden death, injury (traffic related being particularly common), or other unheralded misfortune (law enforcement involvement, on a short out of country trip and then encountered visa issues, whatever.)
“Bus factor” is a more realistic reflection of the threat model, and resonates with experiences anyone who has been working more than a short time probably has more than “lottery factor”.
Put a positive spin on it. We don’t talk about coworkers getting hit by a bus, we talk about them winning the lottery and quitting on the spot due to their newfound fortune.
Same discussion but it’s less morbid, and doesn’t end up sounding like you’re prioritizing the health of your project above the literal lives of your employees.
The implications are different. There has to be a high euro sum you can offer to get someone who won the lottery to still brief a colleague. You can throw money at a dead person all you want but that will not have the same result. It is a sudden end in knowledge that can not be restored with time or money (like running out of lottery funds)
Also, the idea that the only reason people work on the project is that they need the money surely seems less appealing than that the only thing stopping them from working on it is death? Or are we now just okay with saying the quiet part out loud and acknowledging the exploitative nature of the economic system?
> Also, the idea that the only reason people work on the project is that they need the money
No one has presented this idea. You're arguing that we should stop paying employees, because they will obviously continue to work unless they are only working because they need the money?
Saying that if someone wins the lottery they’ll likely leave to pursue the opportunities that bag of money will present them is not the same as saying the only thing keeping them at the job is money.
You don’t need to plan for people leaving the company if you have an emergency reserve big enough to literally compete with the lottery payout-wise in an emergency. If someone leaves you can just hire a couple of hundred people to replace them and still save money.