Reflow [1] is a similar attempt at a slightly different domain: bioinformatics and ETL pipelines. Reflow exposes a data model and a programming model that reclaim programmability in these systems and, by leaning on these abstractions, give the runtime much more leeway to do interesting things. It unties the hands of the implementer.
> I want an Airflow that runs and scales in the cloud
I'd encourage you to look at Reflow [1], which takes a different approach: it's entirely self-managing. You run Reflow like you would a normal programming language interpreter ("reflow run myjob.rf"), and Reflow creates ephemeral nodes that scale elastically and tear themselves down, existing only for the purpose of running the program.
> has extensive observability (monitoring, tracing)
Reflow includes a good amount of observability tooling out of the box; we're also working on integrating tracing facilities (e.g., reporting progress to AWS X-Ray).
> has a full API, and maybe some clear way to test workflows.
Reflow's approach to testing is exactly like any other programming language: you write modules that can either be used in a "main" program, or else be used in tests.
Reflow [1] is also well-suited for ETL workloads. It takes a different tack: it presents a DSL with data-flow semantics and first-class Docker integration. The result is that you don't write graphs; instead, you write ordinary programs that, because of their semantics, can be automatically parallelized and widely distributed, with all intermediate evaluations memoized and programs evaluated in a fully incremental fashion.
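To make that evaluation model concrete without presuming Reflow's actual DSL syntax, here's a toy Java sketch (all names and structure here are mine, not Reflow's): each step is a future keyed by its inputs, so independent steps run in parallel, every intermediate result is memoized, and re-evaluating the same program recomputes nothing.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Toy sketch of a memoized, incremental dataflow evaluator (not Reflow code).
public class DataflowMemo {
    static final Map<String, CompletableFuture<String>> MEMO = new ConcurrentHashMap<>();
    static final AtomicInteger EVALS = new AtomicInteger();

    // A "step" is memoized on (name, input): re-evaluation is a cache hit.
    static CompletableFuture<String> step(String name, String input,
                                          Function<String, String> f) {
        return MEMO.computeIfAbsent(name + "(" + input + ")",
            key -> CompletableFuture.supplyAsync(() -> {
                EVALS.incrementAndGet();   // count real evaluations
                return f.apply(input);
            }));
    }

    public static String run() throws Exception {
        // Two independent steps evaluate in parallel; no graph is spelled out.
        CompletableFuture<String> a = step("upper", "etl", String::toUpperCase);
        CompletableFuture<String> b = step("rev", "etl",
            s -> new StringBuilder(s).reverse().toString());
        String first = a.get() + "+" + b.get();
        // "Incremental" re-run: same step, nothing is recomputed.
        String again = step("upper", "etl", String::toUpperCase).get();
        return first + "/" + again + "/" + EVALS.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());   // two evaluations total, despite three gets
    }
}
```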
I mean this as politely and constructively as possible, but this looks like it has quite a few easy-to-fall-into failure modes. If I'm not using a Twitter Future, or if I've been given a Future from somebody else, it looks like I need to surround every compositional operation with this save-and-restore, otherwise my context is permanently lost. Additionally, I have to trust that any code I hand a closure to will do the right thing, otherwise my context is potentially lost, since I have no guarantee about which thread that closure will actually execute on. It seems virtually impossible to avoid this happening at least once in even a simple codebase, and when context is lost, it happens silently.
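For anyone who hasn't hit this, here's the failure mode in plain java.util.concurrent terms rather than Twitter's futures (the names here are illustrative): a ThreadLocal set on the calling thread is invisible on the pool thread that runs the continuation, so the context is silently dropped unless every composition point saves and restores it by hand.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of silent thread-local context loss across a future composition.
public class ContextLoss {
    static final ThreadLocal<String> CTX = new ThreadLocal<>();

    public static String lost() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CTX.set("request-42");
            // The continuation runs on a pool thread: CTX.get() is null there.
            return CompletableFuture
                .supplyAsync(() -> String.valueOf(CTX.get()), pool)
                .get();
        } finally {
            pool.shutdown();
        }
    }

    public static String saved() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CTX.set("request-42");
            String captured = CTX.get();        // manual save...
            return CompletableFuture
                .supplyAsync(() -> {
                    CTX.set(captured);          // ...and restore in the closure
                    try {
                        return CTX.get();
                    } finally {
                        CTX.remove();           // don't leak onto the worker
                    }
                }, pool)
                .get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lost());    // context silently dropped
        System.out.println(saved());   // context survives, at a per-callsite cost
    }
}
```

Forget the save/restore at a single composition point and nothing fails loudly; downstream code just sees an empty context.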
You are right that this fails when you move outside of the model. However, we don't.
You may be surprised (amazed?) to learn that, internally, 100% of composition happens in this manner. We have a massive code base, and we've not seen this be an issue.
Further, we've worked with the Scala community to standardize the idea of an "execution context", which helps make these ideas portable and the particulars of the implementation transparent to arbitrary producers and consumers of futures, so long as they comply with the standard Scala future API. (Twitter futures will when we migrate to Scala 2.10.)
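The gist of why standardizing on an execution context helps, sketched here with java.util.concurrent rather than Scala's actual ExecutionContext API (so the shape is mine, not the standard's): wrap the executor once so every submitted task captures the caller's thread-local context and restores it on the worker thread, and then producers and consumers of futures compose normally with no per-callsite save/restore.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of a context-propagating executor: propagation is done once,
// in the executor, instead of at every composition point.
public class PropagatingExecutor {
    static final ThreadLocal<String> CTX = new ThreadLocal<>();

    static Executor propagating(Executor delegate) {
        return task -> {
            String captured = CTX.get();           // capture at submission time
            delegate.execute(() -> {
                String previous = CTX.get();
                CTX.set(captured);                 // restore on the worker
                try {
                    task.run();
                } finally {
                    CTX.set(previous);             // leave the worker as found
                }
            });
        };
    }

    public static String demo() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Executor ec = propagating(pool);
            CTX.set("request-42");
            // No explicit save/restore here; the executor handles it.
            return CompletableFuture
                .supplyAsync(() -> CTX.get(), ec)
                .get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The catch, as the reply below this comment describes, is that this only works if every library in the stack actually runs its callbacks through the context you supply.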
This is true, and I look forward to the Typesafe guys fixing both the default implicit ExecutionContext and any contexts that Akka creates. I actually implemented an ExecutionContext that does exactly what is described above, but we ultimately had to abandon it, since pretty much any library that deals with standard Scala Futures in 2.10 has places where we couldn't supply our custom context. I can't wait for you guys to get that ported upstream, because until then I've explicitly banned the use of thread locals across our entire stack.
Citation needed.