That looks great, but it's hard to position it as a regex competitor. I should b...

YeGoblynQueenne · on Sept 9, 2019

"My thing here" is a grammar so it describes a class of strings. Perhaps this is why it looks to you unlike what it describes?

You can make a DCG rule as specific or as general as you like. As a for instance, this is a vim regex I retrieved from my recent history:

  \[13\/13,15\/12,24-24]

You could write this like so in DCG notation:

  s --> ['[',13,'/',13,,,15,'/',12,,,24,-,24,']'].

And that would match the string "[13/13,15/12,24-24]", no less, no more.

DCGs are Turing-complete, so you can go all the way from programs to finite automata when you write a pattern to match.

They're not really a regex competitor. They were invented in the '70s as a formalism for context-free grammars to be used in representing natural language. They fit right into the logic programming language Prolog that was created soon after (and by some of the same people).

andrewflnr · on Sept 9, 2019

Well, we started this subthread looking for alternatives to regex, so if DCGs aren't in the running I'm not sure what your point is. Also, I happen to believe Turing-completeness is a misfeature for this job.

YeGoblynQueenne · on Sept 9, 2019

The OP asked for "something as easy to use as regular expressions that can processes context-free grammars". You requested ease-of-use. I proposed DCGs. They are not an alternative to regexes in the sense that regexes can't represent anything beyond regular grammars, including CFGs. They can be as simple or as complicated as you want them and they are easy to use.

I'm not sure what is a "misfeature". What do you mean?

Edit: Apologies if I sound too terse. I'm confused by the terminology of "competitor", "alternative" etc. Are we in some kind of competition to find a technological solution that will take some prize? If so, I'd like to know the rules before I commit to any solution. What exactly are we trying to achieve here?

andrewflnr · on Sept 10, 2019

My interpretation of "as easy to use as regular expressions" is that the desired... formalism be usable in the same contexts and with the same casual ease as regex. Think Unix command lines, hairy ad-hoc Python parsing scripts, etc.

One of the things that makes regex so well suited to that role is that a string is a regex that matches itself, and you can iteratively add sophistication from there. At the very least, you would want to maintain (or improve, Cthulhu knows there's plenty of room) that incremental quality in any proposed replacement, while increasing its power.

It looks like with a DCG, you have to know you're writing one up front, and have to think about what you're parsing at a much more abstract level. The average sysadmin could probably not casually toss one off for a log parsing task. If there's an alternate syntax that gets around that problem, then I'm interested. OTOH...

Re "misfeature": one of the cool things about existing parsing formalisms is all the known terminating, mostly (all?) polynomial algorithms for analyzing them. If DCGs are Turing complete, they don't have those algorithms. Turing completeness is not usually a property I want in my parser. That's what I meant by saying it's a misfeature.

YeGoblynQueenne · on Sept 10, 2019

I don't quite understand what you mean by "a regex that matches itself". Could you explain? What you describe sound smore like a quine.

Regarding Turing completeness and termination- the DCG formalism (thank you) itself is expressive enough to represent anything from finite automata to UTMs, but that doesn't mean that every grammar you write using DCG notation is Turing-complete. So, for instance, if you write a right-regular grammar as a DCG, that DCG will not be Turing-complete, it'd just be a right-regular grammar.

The DCG examples I gave above are a CFG grammar for a fragment of natural English and a single-rule grammar that matches a single string. Those are definitely not Turing-complete and parsable in polynomial time.

The difference with regexes is that you can't express CFGs or above using regexes, but you _can_ represent both regexes and CFGs using DCG notation.

>> If there's an alternate syntax that gets around that problem, then I'm interested. OTOH...

May I make a personal comment? I think your insistence on competitive language, like "i'm [not] interested", "competitor to regexes" etc, is a case of Déformation professionnelle. You sound just like a software developer hyper-focused on finding tools to maximise productivity in the office, and nothing else.

You should perhaps consider the possibilty that what we are discussing here goes a bit beyond a product that you can package and sell as an alternative to a popular tool. I mean, personally, when I realised that DCGs are executable grammars that can be run as both recognisers and generators- well let's say it shifted my understanding of what is possible to do with a computer and a programming language.

In any case I don't see the complexity you seem to see in DCGs. Like I say above, they're basically BNF. I struggle to think of a sysadmin worth her salt (so, one who knows perl, eh?) who would sweat it to write and read BNF. The kind of sysadmin I have in mind, the problem would be to drag them away from the keyboard, once they started writing a DCG to parse a log file- and realised they could write one to parse _all_ log files ever.

andrewflnr · on Sept 12, 2019

"foo" is a string. It's also a regex that matches exactly the string "foo".

Regarding Turing-completeness, the entire point is about the termination of algorithms that examine parsers, not the runtime of the parsers themselves.

I also don't know how many times I need to point out that the reason I'm focusing on regexes is that that was the context for this conversation. Under any other context I'm quite interested in new parsing techniques.

Lastly, I would point out that regexes have uses far beyond a certain arbitrarily decided set of sysadmins and developers "worth their salt". Even if it's true that a meaningful subset of sysadmins are capable of writing basically sound BNF (which is not something I would count on even for CS grads, but maybe you work with smarter people than I do), there are lots of other people who could use a little more parsing power if it was offered in the right way.