Megaparsec 7
Published on August 27, 2018
For a while now I’ve been working on Megaparsec 7. Because my schedule is more saturated these days, the work hasn’t been progressing as quickly as I expected, but I have nevertheless tried to spend my rare free hours on advancing it, and I can finally say that Megaparsec 7 is close to release.
The post is about the most obvious things a user will run into when upgrading. It does not attempt to walk through all the changes; for that there is a detailed changelog available. Thus, we will talk about breaking changes and new ways of doing certain things. Finally, there is a bit of benchmarking bravura, because yes, we’re now faster than ever (sometimes a bit faster than Attoparsec).
Simple changes
The good but boring changes you need to know about are the following…
parser-combinators grows, megaparsec shrinks
Megaparsec has always contained quite a bit of code that could work with any Parsec-like library. It felt like a shame not to make it available for other packages to use. So, some time ago I started the parser-combinators package, which provides common parsing combinators that work with any instance of Applicative, Alternative, and Monad. It’s quite general and depends on virtually nothing but base. Recently I included the code for parsing permutation phrases and expressions, so we’re now able to drop Text.Megaparsec.Perm and Text.Megaparsec.Expr from Megaparsec itself:
- Text.Megaparsec.Perm → Control.Applicative.Permutations
- Text.Megaparsec.Expr → Control.Monad.Combinators.Expr
This actually means that you can use these modules with e.g. Attoparsec (I haven’t tried though). I think it’s pretty cool.
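To illustrate that generality, here is a minimal sketch that drives Control.Monad.Combinators.Expr with ReadP from base instead of Megaparsec. It assumes the parser-combinators package is installed; the names number and expr are made up for the example, and nothing here is taken from the package's own documentation.

```haskell
import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch1, readP_to_S)

-- A run of decimal digits, read as an Int.
number :: ReadP Int
number = read <$> munch1 isDigit

-- Arithmetic with the usual precedence: the first row of the operator
-- table binds tightest. makeExprParser only asks for MonadPlus, which
-- ReadP provides, so no Megaparsec import is needed.
expr :: ReadP Int
expr =
  makeExprParser
    number
    [ [InfixL ((*) <$ char '*')]
    , [InfixL ((+) <$ char '+')]
    ]

main :: IO ()
main = print (readP_to_S (expr <* eof) "1+2*3")
```

ReadP explores alternatives in parallel, so the trailing eof is what narrows the result down to the parse that consumes the whole input.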
General combinators have been moved
There were a few combinators in Text.Megaparsec.Char and Text.Megaparsec.Byte that are actually not specific to the input stream type and should live in the Text.Megaparsec module. So they have been moved. And renamed.
- Now there is the single combinator that is a generalization of char for arbitrary streams. Text.Megaparsec.Char and Text.Megaparsec.Byte still contain char as type-constrained versions of single.
- Similarly, now there is the chunk combinator that is a generalization of string for arbitrary streams. The string combinator is still re-exported from Text.Megaparsec.Char and Text.Megaparsec.Byte for compatibility.
- satisfy does not depend on the type of token, and so it now lives in Text.Megaparsec.
- anyChar was renamed to anySingle and moved to Text.Megaparsec.
- notChar was renamed to anySingleBut and moved to Text.Megaparsec.
- oneOf and noneOf were moved to Text.Megaparsec.
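The renames above can be tried out with a short sketch like the following. It assumes Megaparsec 7 or later and imports only Text.Megaparsec; the parsers letBinding, quoted, and digitThenAny are hypothetical examples, not part of the library.

```haskell
import Data.Char (isDigit)
import Data.Void (Void)
import Text.Megaparsec

type Parser = Parsec Void String

-- Parses input such as "let x;" and returns the variable name.
letBinding :: Parser Char
letBinding = do
  _ <- chunk "let"            -- previously: string "let"
  _ <- single ' '             -- previously: char ' '
  name <- oneOf ['a' .. 'z']  -- now exported from Text.Megaparsec
  _ <- single ';'
  pure name

-- A double-quoted string without escapes.
quoted :: Parser String
quoted = single '"' *> many (anySingleBut '"') <* single '"'  -- previously: notChar

-- A digit followed by any one token, e.g. "7!".
digitThenAny :: Parser (Char, Char)
digitThenAny = (,) <$> satisfy isDigit <*> anySingle  -- previously: anyChar

main :: IO ()
main = do
  print (parseMaybe letBinding "let x;")
  print (parseMaybe quoted "\"hi\"")
  print (parseMaybe digitThenAny "7!")
```

Because none of these combinators mention Char in their types, the same definitions should carry over to other stream types once the literals are adjusted.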
Parse errors story
Megaparsec 6 added the ability to display the offending line from the original input stream when pretty-printing parse errors. That’s good, but the design has always felt like an afterthought to me:
- There are three functions to pretty-print a ParseError: parseErrorPretty, parseErrorPretty', and parseErrorPretty_. The last was added because parseErrorPretty' actually doesn’t allow specifying tab width, which is necessary to know for proper displaying of lines with tabs.
- The functions that try to display the relevant line from the input stream require the input stream to be passed to them. Having to keep the input stream around just to be able to display nice error messages is a bit inconvenient. In one package I even had to define a product of ParseError and Text to work around this.
- I think mmark is a nice example of what Megaparsec can do. But it also showed the limitations of the parsing library. mmark can report several ParseErrors at once, and when they are pretty-printed, we display an offending line per error from the original input stream. If we just use the functions that are provided out-of-the-box, we’ll be traversing the input stream N times, where N is the number of ParseErrors we want to display. Not nice at all!
It looks like we want:
- A bundle type ParseErrorBundle that functions like parse will return. The type should include everything that is necessary to pretty-print a parse error: tab width, input stream to use, etc.
- There will be only one function to pretty-print such a bundle, let’s call it errorBundlePretty.
- The bundle should be able to contain several ParseErrors which are sorted. During pretty-printing it should traverse the input stream only once.
So here we go:
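A minimal sketch of what such a bundle could look like, modeled on the ParseErrorBundle record in released Megaparsec 7. The field names bundleErrors and bundlePosState come from that release and are assumptions as far as this post goes; ParseError and PosState are cut down to stand-ins so that the snippet compiles with base alone.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Stand-ins so the sketch is self-contained; the real types are richer.
data ParseError s e = ParseError Int String -- offset and message only
newtype PosState s = PosState s             -- placeholder for position state

-- A non-empty collection of errors plus what is needed to print them.
data ParseErrorBundle s e = ParseErrorBundle
  { bundleErrors :: NonEmpty (ParseError s e) -- kept sorted by offset
  , bundlePosState :: PosState s              -- input, tab width, etc.
  }

-- Hypothetical bundle with a single error at offset 2 of "12a4".
example :: ParseErrorBundle String ()
example =
  ParseErrorBundle (ParseError 2 "unexpected 'a'" :| []) (PosState "12a4")

main :: IO ()
main = print (length (bundleErrors example))
```

Using NonEmpty rules out an empty bundle by construction, which matches the idea that a failed parse always has at least one error to report.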
PosState is defined like so:
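As a rough sketch, assuming the field names of the PosState record in released Megaparsec 7 (they may differ from what this post had in mind), the type could be modeled as below. SourcePos is simplified to plain Ints and initialPosState is a hypothetical helper, so this is a stand-alone approximation rather than the library's definition.

```haskell
-- Simplified stand-in for Megaparsec's SourcePos.
data SourcePos = SourcePos
  { sourceName :: FilePath
  , sourceLine :: Int
  , sourceColumn :: Int
  }
  deriving (Show, Eq)

data PosState s = PosState
  { pstateInput :: s              -- rest of the input still to traverse
  , pstateOffset :: !Int          -- offset of the start of pstateInput
  , pstateSourcePos :: !SourcePos -- position of the start of pstateInput
  , pstateTabWidth :: Int         -- tab width for column calculation
  , pstateLinePrefix :: String    -- already-seen part of the current line
  }
  deriving (Show, Eq)

-- Hypothetical helper: position state at the very start of the input,
-- with a tab width of 8 assumed.
initialPosState :: FilePath -> s -> PosState s
initialPosState name input =
  PosState input 0 (SourcePos name 1 1) 8 ""

main :: IO ()
main = print (initialPosState "input.txt" "abc")
```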
This is a helper data type that allows us to pretty-print several ParseErrors in one pass. Functions like runParser or parse always return only one ParseError in a bundle, but we can add more ourselves, which is what I think mmark will be doing.
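From the user's side this could look roughly as follows, assuming the names parse, bundleErrors, and errorBundlePretty as they exist in released Megaparsec 7. The digits parser and the file name are invented for the example.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (digitChar)

type Parser = Parsec Void String

-- One or more digits, then end of input.
digits :: Parser String
digits = some digitChar <* eof

main :: IO ()
main =
  case parse digits "example.txt" "12a4" of
    Left bundle -> do
      -- The bundle already carries the input, so nothing else is passed in.
      putStr (errorBundlePretty bundle)
      print (length (bundleErrors bundle))
    Right ds -> putStrLn ds
```

The input stream is no longer threaded through to the pretty-printer by hand, which is the inconvenience the bundle is meant to remove.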
There is a bit more about PosState though, and it has to do with the performance improvements in Megaparsec 7.
Performance improvements
I was thinking about how to make Megaparsec 7 faster and simpler. One thing I did was dropping stacks of source positions, which felt good, but not enough. So I figured: updating SourcePos in State is expensive, but pretty much a useless thing to do if a parser doesn’t fail.
Why is it useless?
- We only care about SourcePos when we want to present ParseErrors to humans. For everything else a simple Int offset as the number of consumed tokens so far is perfect.
- Given the input stream and things like tab width, an offset uniquely determines the corresponding SourcePos anyway, so keeping stateTokensProcessed and statePos at the same time is a waste.
- We already traverse the input stream when we pretty-print parse errors. We could at the same time calculate SourcePos from offsets while doing that.
So that’s the idea:
- Store Int offset instead of SourcePos position in ParseErrors.
- Infer SourcePos when necessary on pretty-printing.
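The second step can be pictured with a small base-only sketch: given offsets in ascending order, their line and column positions are recovered in a single traversal of the input. This is an illustration of the idea and not Megaparsec's code; the names step and resolve are hypothetical, and positions are plain (line, column) pairs.

```haskell
import Data.List (mapAccumL)

-- (line, column), both 1-based.
type Pos = (Int, Int)

-- Advance a position over one character. A tab jumps to the next tab
-- stop for the given tab width.
step :: Int -> Pos -> Char -> Pos
step _ (l, _) '\n' = (l + 1, 1)
step w (l, c) '\t' = (l, c + w - ((c - 1) `rem` w))
step _ (l, c) _ = (l, c + 1)

-- Resolve ascending offsets to positions. The accumulator remembers the
-- offset, position, and remaining input reached so far, so no part of
-- the input is scanned twice.
resolve :: Int -> String -> [Int] -> [Pos]
resolve w input = snd . mapAccumL go (0, (1, 1), input)
  where
    go (off, pos, rest) target =
      let (skipped, rest') = splitAt (target - off) rest
          pos' = foldl (step w) pos skipped
       in ((off + length skipped, pos', rest'), pos')

main :: IO ()
main = print (resolve 8 "ab\ncd\tef" [1, 4, 6])
```

Sorting the errors by offset is what makes the single pass possible, which is why the bundle keeps them sorted.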
Guess what, this gives a speed-up of about 100% on microbenchmarks (not on all of them, but on many, and that’s impressive), and this does translate into performance improvements for real parsers too.
Here is the older benchmark comparing Attoparsec and Megaparsec. I used it to compare Attoparsec vs Megaparsec 6 vs Megaparsec 7. Here is a table which shows simplified results (run on my laptop):
Benchmark | Attoparsec 0.13.2.2 | Megaparsec 6.5.0 | Megaparsec 7.0.0 |
---|---|---|---|
CSV (40) | 99.62 μs | 137.2 μs | 82.75 μs |
Log (40) | 429.4 μs | 577.4 μs | 453.8 μs |
JSON (40) | 27.01 μs | 48.81 μs | 33.68 μs |
Notably, Megaparsec 7 beats Attoparsec on the CSV benchmark now. It’s written quite naively of course (if I remember correctly, I stole it from some Attoparsec or Parsec tutorial), but still it demonstrates that the machinery in the foundation of the library is getting quite speedy.
Memory (showing allocations because max residency is constant and quite low in all cases):
Benchmark | Attoparsec 0.13.2.2 | Megaparsec 6.5.0 | Megaparsec 7.0.0 |
---|---|---|---|
CSV (40) | 397,952 | 557,312 | 357,208 |
Log (40) | 1,181,120 | 1,485,776 | 1,246,496 |
JSON (40) | 132,488 | 233,328 | 203,824 |
Now you probably understand the temptation. But there was also the conservative part of me which said: “but hey, people are going to want to get source position from a working parser to attach it to AST or something, and what about indentation-sensitive parsing which needs to know column numbers…”.
Hell, that’s right. But we’re not going to let that spoil the party, are we?
We could always calculate SourcePos incrementally and on demand. Re-using PosState, we plug it into parser State:
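A sketch of how that could look, assuming the field names stateInput, stateOffset, and statePosState from released Megaparsec 7 (stateOffset taking the place of the stateTokensProcessed field mentioned earlier). PosState is reduced to a placeholder and initialState is a hypothetical helper, so that the snippet stands alone.

```haskell
-- Placeholder; the real PosState also tracks offset, position, and
-- tab width.
newtype PosState s = PosState s

data State s = State
  { stateInput :: s             -- input that is left to parse
  , stateOffset :: !Int         -- number of tokens consumed so far
  , statePosState :: PosState s -- advanced only when a position is needed
  }

-- Hypothetical helper: parser state before anything has been consumed.
initialState :: s -> State s
initialState input = State input 0 (PosState input)

main :: IO ()
main = print (stateOffset (initialState "abc"))
```

The hot path of the parser then only bumps stateOffset, while statePosState can lag behind until someone asks for a position.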
Exploiting the fact that we can only move forward in the input stream, we can write:
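The following is a simplified, pure model of that idea rather than the library's code. Here reachOffset is an ordinary function over String that ignores tabs and does not return the line text, and getSourcePos threads the state explicitly instead of running in a parser monad; all record fields are assumptions made for the sketch.

```haskell
data PosState = PosState
  { pstateInput :: String -- input not yet traversed for positions
  , pstateOffset :: !Int  -- offset at which pstateInput starts
  , pstateLine :: !Int
  , pstateColumn :: !Int
  }
  deriving (Show, Eq)

data State = State
  { stateOffset :: !Int -- tokens consumed by the parser so far
  , statePosState :: PosState
  }
  deriving (Show, Eq)

-- Walk forward from the remembered offset to the target offset.
reachOffset :: Int -> PosState -> PosState
reachOffset target (PosState input off line col) =
  let (skipped, rest) = splitAt (target - off) input
      (line', col') = foldl step (line, col) skipped
   in PosState rest (off + length skipped) line' col'
  where
    step (l, _) '\n' = (l + 1, 1)
    step (l, c) _ = (l, c + 1)

-- Position at the current offset, plus a state whose PosState has been
-- advanced so that the next call resumes from here.
getSourcePos :: State -> ((Int, Int), State)
getSourcePos st =
  let pst = reachOffset (stateOffset st) (statePosState st)
   in ((pstateLine pst, pstateColumn pst), st {statePosState = pst})

main :: IO ()
main = do
  let st0 = State 4 (PosState "ab\ncd\nef" 0 1 1)
      (pos1, st1) = getSourcePos st0
      (pos2, _) = getSourcePos st1 {stateOffset = 7}
  print pos1
  print pos2
```

The second call starts from offset 4 rather than from the beginning of the input, which is the incremental behaviour described here.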
Where reachOffset is a new method of Stream that replaces all the old methods that had to do with keeping track of source position. At the same time reachOffset fetches the String representation of the right line in the input to show in parse errors. And it’s tuned to be incremental, so only the not-previously-traversed part of the input will be processed. I have confirmed on projects like mmark that even if you use getSourcePos, there are no performance regressions; performance stays the same in that case (that’s if you don’t call getSourcePos on every token, which is a bad idea).
Conclusion
I think that these two changes (parse error bundles and using offsets) complement each other rather well and make the library a lot nicer.
Let me know what you think. It’ll take some time to finish up the whole thing, so if you have a concern about the changes I described, please tell me about it. Once again, the full changelog (so far) is here.