Megaparsec 7

haskell

Published on August 27, 2018

For a while now I’ve been working on Megaparsec 7. Because my schedule is busier these days, the work hasn’t progressed as quickly as I expected. Nevertheless, I kept spending my rare free hours on it, and I can finally say that Megaparsec 7 is close to release.

The post is about the most obvious things a user will run into when upgrading. It does not attempt to walk through all the changes; for that there is a detailed changelog. So we will talk about breaking changes and new ways of doing certain things. Finally, there is a bit of benchmarking bravura, because yes, we’re now faster than ever (sometimes a bit faster than Attoparsec).

Simple changes

The good but boring changes you need to know about are the following…

`parser-combinators` grows, `megaparsec` shrinks

Megaparsec always contained quite a bit of code that could work with any Parsec-like library. It felt like a shame not to make it available for other packages to use. So, some time ago I started the parser-combinators package, which provides common parsing combinators that work with any instance of Applicative, Alternative, and Monad. It’s quite general and depends on virtually nothing but base. Recently I added code for parsing permutation phrases and expressions to it, so we’re now able to drop Text.Megaparsec.Perm and Text.Megaparsec.Expr from Megaparsec itself:

Text.Megaparsec.Perm → Control.Applicative.Permutations
Text.Megaparsec.Expr → Control.Monad.Combinators.Expr
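A minimal sketch of permutation parsing from its new home, assuming megaparsec 7 and parser-combinators 1.0. The parser flags is a hypothetical example. It accepts the letters a and b in any order, plus an optional c, using runPermutation, toPermutation and toPermutationWithDefault from Control.Applicative.Permutations.

```haskell
import Control.Applicative.Permutations
  (runPermutation, toPermutation, toPermutationWithDefault)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char (char)

type Parser = Parsec Void String

-- Hypothetical example: 'a' and 'b' must both appear, 'c' is optional,
-- and the three may come in any order.
flags :: Parser (Char, Char, Maybe Char)
flags = runPermutation $
  (,,) <$> toPermutation (char 'a')
       <*> toPermutation (char 'b')
       <*> toPermutationWithDefault Nothing (Just <$> char 'c')

-- parseMaybe succeeds only if the whole input is consumed.
parseFlags :: String -> Maybe (Char, Char, Maybe Char)
parseFlags = parseMaybe flags
```

For example, parseFlags "cba" yields Just ('a', 'b', Just 'c'), while parseFlags "a" fails because the mandatory b is missing.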

This actually means that you can use these modules with, for example, Attoparsec (I haven’t tried it, though). I think it’s pretty cool.
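The sketch below does not use Attoparsec either. It uses ReadP from base to illustrate the same point: ReadP is not Megaparsec, but it is a MonadPlus, so makeExprParser accepts it. This assumes parser-combinators 1.0. The evaluator, with +, * and unary minus, is a hypothetical example. ReadP’s choice is symmetric and returns every parse, so the sketch keeps only the parses that consume the whole input.

```haskell
import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Integer literal followed by optional whitespace.
number :: ReadP Integer
number = read <$> munch1 isDigit <* skipSpaces

-- A single-character symbol followed by optional whitespace.
symbol :: Char -> ReadP Char
symbol c = char c <* skipSpaces

-- A term is a literal or a parenthesized expression.
term :: ReadP Integer
term = number +++ between (symbol '(') (symbol ')') expr

-- Operator table, ordered from highest to lowest precedence.
table :: [[Operator ReadP Integer]]
table =
  [ [Prefix (negate <$ symbol '-')]
  , [InfixL ((*) <$ symbol '*')]
  , [InfixL ((+) <$ symbol '+')]
  ]

expr :: ReadP Integer
expr = makeExprParser term table

-- Keep only complete parses; an ambiguous or failed parse gives Nothing.
evalExpr :: String -> Maybe Integer
evalExpr s = case [v | (v, "") <- readP_to_S (skipSpaces *> expr <* eof) s] of
  [v] -> Just v
  _   -> Nothing
```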

General combinators have been moved

There were a few combinators in Text.Megaparsec.Char and Text.Megaparsec.Byte that are not actually specific to the input stream type and should live in the Text.Megaparsec module. So they have been moved. And renamed.

Now there is the single combinator, which generalizes char to arbitrary streams. Text.Megaparsec.Char and Text.Megaparsec.Byte still contain char as a type-constrained version of single.
Similarly, there is now the chunk combinator, which generalizes string to arbitrary streams. The string combinator is still re-exported from Text.Megaparsec.Char and Text.Megaparsec.Byte for compatibility.
satisfy does not depend on the type of token, so it now lives in Text.Megaparsec.
anyChar was renamed to anySingle and moved to Text.Megaparsec.
notChar was renamed to anySingleBut and moved to Text.Megaparsec.
oneOf and noneOf were moved to Text.Megaparsec.
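Here is a small sketch that uses only the combinators listed above, imported from Text.Megaparsec, assuming megaparsec 7 with String input. The format is hypothetical: a "#v<digit>" header line, then ";" comment lines and key=value lines. The names header, keyValue, comment, line and config are also made up for the example.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isDigit)
import Data.Maybe (catMaybes)
import Data.Void (Void)
import Text.Megaparsec

type Parser = Parsec Void String

-- "#v" followed by a version digit and a newline.
header :: Parser Char
header = chunk "#v" *> satisfy isDigit <* single '\n'  -- chunk: was string

-- ';' followed by anything up to the end of the line.
comment :: Parser ()
comment = () <$ (single ';' *> manyTill anySingle (single '\n'))  -- anySingle: was anyChar

-- A lowercase key, '=', and the rest of the line as the value.
keyValue :: Parser (String, String)
keyValue = do
  key <- some (oneOf ['a' .. 'z'])   -- oneOf now lives in Text.Megaparsec
  _   <- single '='                  -- single: was char
  val <- many (anySingleBut '\n')    -- anySingleBut: was notChar
  _   <- single '\n'
  pure (key, val)

-- Comments are dropped, key/value pairs are kept.
line :: Parser (Maybe (String, String))
line = (Nothing <$ comment) <|> (Just <$> keyValue)

config :: Parser (Char, [(String, String)])
config = (,) <$> header <*> (catMaybes <$> many line)
```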

Parse errors story

Megaparsec 6 added the ability to display the offending line from the original input stream when pretty-printing parse errors. That’s good, but the design has always felt like an afterthought to me:

There are three functions to pretty-print a ParseError: parseErrorPretty, parseErrorPretty', and parseErrorPretty_. The last one was added because parseErrorPretty' doesn’t allow specifying the tab width, which is necessary to display lines containing tabs correctly.
The functions that try to display the relevant line from the input stream require the input stream to be passed to them. Having to keep the input stream around just to be able to display nice error messages is a bit inconvenient. In one package I even had to define a product of ParseError and Text to work around this.
I think mmark is a nice example of what Megaparsec can do. But it also showed the limitations of the parsing library. mmark can report several ParseErrors at once, and when they are pretty-printed, we display one offending line per error from the original input stream. If we just use the functions that are provided out of the box, we’ll traverse the input stream N times, where N is the number of ParseErrors we want to display. Not nice at all!

It looks like we want:

A bundle type ParseErrorBundle that functions like parse will return.
The type should include everything that is necessary to pretty-print a parse error: tab width, input stream to use, etc.
There should be only one function to pretty-print such a bundle; let’s call it errorBundlePretty.
The bundle should be able to contain several ParseErrors, which are sorted. During pretty-printing it should traverse the input stream only once.

So here we go:

PosState is defined like so:

This is a helper data type that allows us to pretty-print several ParseErrors in one pass. Functions like runParser or parse always return only one ParseError in a bundle, but we can add more ourselves, which is what I think mmark will be doing.
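The following is a hedged sketch of working with a bundle, assuming the megaparsec 7 API (parse, ParseErrorBundle with its bundleErrors field, FancyError, ErrorFail, errorOffset, errorBundlePretty) and the containers package. parse returns a single error. The sketch then appends a second, hypothetical error at a later offset, so the errors stay sorted, in the way mmark might. errorBundlePretty renders both errors from one bundle. The names threeAs, failing and withExtra are made up.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.Set as Set
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void String

-- Hypothetical parser: exactly three 'a's and nothing else.
threeAs :: Parser String
threeAs = count 3 (char 'a') <* eof

-- Fails at offset 2 (the 'b'); the bundle holds one ParseError.
failing :: Either (ParseErrorBundle String Void) String
failing = parse threeAs "input.txt" "aab\nccc"

-- Append a second, made-up error at offset 4 (start of line 2).
-- Errors in a bundle are kept sorted by offset, so it goes last.
withExtra :: ParseErrorBundle String Void -> ParseErrorBundle String Void
withExtra b = b { bundleErrors = bundleErrors b <> (extra :| []) }
  where
    extra = FancyError 4 (Set.singleton (ErrorFail "second problem"))
```

To display the result, something like either (putStr . errorBundlePretty . withExtra) print failing should work.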

There is a bit more to PosState, though, and it has to do with the performance improvements in Megaparsec 7.

Performance improvements

I was thinking about how to make Megaparsec 7 faster and simpler. One thing I did was drop the stacks of source positions, which felt good, but not enough. So I figured: updating SourcePos in State is expensive, yet pretty much useless if a parser doesn’t fail.

Why is it useless?

We only care about SourcePos when we want to present ParseErrors to humans. For everything else, a simple Int offset (the number of tokens consumed so far) is perfect.
Given the input stream and things like tab width, an offset uniquely determines the corresponding SourcePos anyway, so keeping stateTokensProcessed and statePos at the same time is a waste.
We already traverse the input stream when we pretty-print parse errors. We could calculate SourcePos from offsets at the same time.

So that’s the idea:

Store Int offset instead of SourcePos position in ParseErrors.
Infer SourcePos when necessary on pretty-printing.
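The idea above can be shown with a base-only sketch. Pos, advance, offsetToPos and offsetsToPositions are hypothetical names, not Megaparsec’s API, and the input is assumed to be a String. An offset plus the input and the tab width determine a position. Several offsets, if sorted, can be resolved in a single left-to-right pass.

```haskell
import Data.List (foldl')

-- Hypothetical simplified source position: 1-based line and column.
data Pos = Pos { posLine :: !Int, posCol :: !Int }
  deriving (Eq, Show)

-- Move past one character; a tab jumps to the next tab stop of width w.
advance :: Int -> Pos -> Char -> Pos
advance _ (Pos l _) '\n' = Pos (l + 1) 1
advance w (Pos l c) '\t' = Pos l (c + w - ((c - 1) `rem` w))
advance _ (Pos l c) _    = Pos l (c + 1)

-- The position is a pure function of tab width, input and offset.
offsetToPos :: Int -> String -> Int -> Pos
offsetToPos w input off = foldl' (advance w) (Pos 1 1) (take off input)

-- Resolve offsets sorted in ascending order with one traversal of the
-- input, instead of one traversal per offset.
offsetsToPositions :: Int -> String -> [Int] -> [Pos]
offsetsToPositions w = go 0 (Pos 1 1)
  where
    go _ _ _ [] = []
    go cur pos s (o : os) =
      let (consumed, rest) = splitAt (o - cur) s
          pos' = foldl' (advance w) pos consumed
       in pos' : go o pos' rest os
```

For example, with tab width 4, offset 4 in "ab\n\tc" lands on line 2, column 5, just after the tab.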

Guess what: this gives about a 100% speed-up on microbenchmarks (not on all of them, but on many, and that’s impressive), and it does translate into performance improvements for real parsers too.

Here is the older benchmark comparing Attoparsec and Megaparsec. I used it to compare Attoparsec vs Megaparsec 6 vs Megaparsec 7. The table below shows simplified results (run on my laptop):

Benchmark	Attoparsec 0.13.2.2	Megaparsec 6.5.0	Megaparsec 7.0.0
CSV (40)	99.62 μs	137.2 μs	82.75 μs
Log (40)	429.4 μs	577.4 μs	453.8 μs
JSON (40)	27.01 μs	48.81 μs	33.68 μs

Notably, Megaparsec 7 now beats Attoparsec on the CSV benchmark. The benchmark is written quite naively, of course (if I remember correctly, I stole it from some Attoparsec or Parsec tutorial), but it still demonstrates that the machinery at the foundation of the library is getting quite speedy.

Memory (showing allocations, because maximum residency is constant and quite low in all cases):

Benchmark	Attoparsec 0.13.2.2	Megaparsec 6.5.0	Megaparsec 7.0.0
CSV (40)	397,952	557,312	357,208
Log (40)	1,181,120	1,485,776	1,246,496
JSON (40)	132,488	233,328	203,824

Now you probably understand the temptation. But there was also the conservative part of me, which said: “but hey, people are going to want to get source positions from a working parser to attach them to an AST or something, and what about indentation-sensitive parsing, which needs to know column numbers…”.

Hell, that’s right. But we’re not going to let that spoil the party, are we?
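For concreteness, here is a minimal sketch of the use case raised above, assuming the megaparsec 7 API (getSourcePos, SourcePos with sourceLine and sourceColumn, and unPos). Located, located, identifiers and lineCol are hypothetical names. The position is requested once per AST node, not once per token.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (letterChar, space)

type Parser = Parsec Void String

-- Hypothetical AST wrapper: a value plus the position where it starts.
data Located a = Located { locPos :: SourcePos, locValue :: a }
  deriving (Show)

-- Ask for the source position only at the start of each node.
located :: Parser a -> Parser (Located a)
located p = Located <$> getSourcePos <*> p

-- Whitespace-separated identifiers, each annotated with its position.
identifiers :: Parser [Located String]
identifiers = space *> many (located (some letterChar) <* space)

-- Line and column as plain Ints, e.g. for indentation checks.
lineCol :: Located a -> (Int, Int)
lineCol (Located pos _) = (unPos (sourceLine pos), unPos (sourceColumn pos))
```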

We could always calculate SourcePos incrementally and on demand. Re-using PosState, we plug it into the parser State:

Exploiting the fact that we can only move forward in the input stream, we can write:

Here reachOffset is a new method of Stream that replaces all the old methods for keeping track of source position. It also fetches the String representation of the relevant line of input to show in parse errors. And it’s tuned to be incremental, so only the part of the input that has not been traversed before is processed. I have confirmed on projects like mmark that using getSourcePos causes no performance regression: performance stays the same, as long as you don’t call getSourcePos on every token, which is a bad idea.
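The incremental idea can be sketched in base-only Haskell with String input. Cache, initCache and reach are hypothetical names, loosely modeled on the idea of PosState. They are not Megaparsec’s actual reachOffset or PosState. The cache remembers the last offset and position, so each call only walks the input between the cached offset and the requested one. It also returns the full text of the line the offset falls on.

```haskell
import Data.List (foldl')

-- Hypothetical simplified source position: 1-based line and column.
data Pos = Pos { posLine :: !Int, posCol :: !Int }
  deriving (Eq, Show)

-- Hypothetical cache: how far we have already traversed the input.
data Cache = Cache
  { cOffset  :: !Int    -- offset reached so far
  , cPos     :: !Pos    -- position at that offset
  , cRest    :: String  -- input not traversed yet
  , cLineRev :: String  -- current line up to cOffset, reversed
  }

initCache :: String -> Cache
initCache input = Cache 0 (Pos 1 1) input ""

-- Move past one character; a tab jumps to the next tab stop of width w.
advance :: Int -> Pos -> Char -> Pos
advance _ (Pos l _) '\n' = Pos (l + 1) 1
advance w (Pos l c) '\t' = Pos l (c + w - ((c - 1) `rem` w))
advance _ (Pos l c) _    = Pos l (c + 1)

-- Move forward to offset off, traversing only the not-yet-seen input.
-- An offset behind the cached one just yields the cached position,
-- because we only ever move forward.
reach :: Int -> Int -> Cache -> (Pos, String, Cache)
reach w off (Cache o0 p0 rest0 lineRev0) =
  (pos, lineText, Cache o1 pos rest lineRev)
  where
    (consumed, rest) = splitAt (max 0 (off - o0)) rest0
    o1 = o0 + length consumed
    -- Track the position and the current line's characters (reversed).
    step (p, lr) ch = (advance w p ch, if ch == '\n' then "" else ch : lr)
    (pos, lineRev) = foldl' step (p0, lineRev0) consumed
    -- Full text of the current line, for error messages.
    lineText = reverse lineRev ++ takeWhile (/= '\n') rest
```

A parser would thread the returned Cache through its state, so repeated position requests at increasing offsets cost only the distance travelled since the previous request.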

Conclusion

I think that these two changes (parse error bundles and using offsets) complement each other rather well and make the library a lot nicer.

Let me know what you think. It’ll take some time to finish the whole thing, so if you have a concern about the changes I described, please tell me about it. Once again, the full changelog (so far) is here.
