Introducing Pfarah
This document is a quick overview of the most important features of Pfarah. You can also get this page as an F# script file from GitHub and run the samples interactively. Type annotations are used in the examples to ease understanding.
There are two ways to parse data. The first is to parse a specific file, and the second is to parse a given string. Parsing a file should be the norm, but for the sake of the tutorial the examples parse strings.
This library takes inspiration from FSharp.Data JSON, Chessie, Chiron, Fleece, Haskell, and json4s.
Quickstart
We're going to start by querying parsed data. As an example, the data will represent a ship named bessie with 22 men onboard.
The next step is extracting the information. There are many ways to accomplish this and each one will be demonstrated. Which one you choose is a matter of personal preference. As the demonstrations become more concise they introduce more advanced concepts, so the initial examples use simple constructs.
ParaValue is a discriminated union, and the first example queries it directly for more information.
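As a rough sketch of what querying the union directly can look like, the block below declares a local stand-in with the same eight cases as ParaValue and matches on every one of them. The stand-in type, the `ship` value, and the `describe` function are assumptions made for illustration so the snippet runs without referencing the library; they are not Pfarah's own code.

```fsharp
open System

// Local stand-in mirroring the cases of Pfarah's ParaValue union (assumption).
type ParaValue =
  | Bool of bool
  | Number of float
  | Hsv of float * float * float
  | Rgb of byte * byte * byte
  | Date of DateTime
  | String of string
  | Array of ParaValue[]
  | Record of (string * ParaValue)[]

// Assumed shape of the parsed data for the ship named bessie.
let ship =
  Record [| ("name", String "bessie"); ("strength", Number 22.0) |]

// Describe a value by enumerating every case of the union.
let describe (value: ParaValue) : string =
  match value with
  | Bool b -> sprintf "bool %b" b
  | Number n -> sprintf "number %g" n
  | Hsv (h, s, v) -> sprintf "hsv %g %g %g" h s v
  | Rgb (r, g, b) -> sprintf "rgb %d %d %d" r g b
  | Date d -> sprintf "date %s" (d.ToString("yyyy-MM-dd"))
  | String s -> sprintf "string %s" s
  | Array items -> sprintf "array of %d elements" items.Length
  | Record props ->
      props |> Seq.map fst |> String.concat ", " |> sprintf "record with keys: %s"

printfn "%s" (describe ship)
```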
Whoa! Those are a lot of data types. To see each of the data types in use, see the Data Format page.
Instead of exhaustively enumerating all the cases every time we query the data, the next example will use F#'s wildcard (default) case and print the name of the ship.
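A minimal sketch of the wildcard approach, again against a trimmed local stand-in rather than the library type (the real union has more cases). The `shipName` function and its error messages are hypothetical.

```fsharp
// Trimmed local stand-in for ParaValue (assumption; the real union has more cases).
type ParaValue =
  | Number of float
  | String of string
  | Array of ParaValue[]
  | Record of (string * ParaValue)[]

// Return the ship's name; the wildcard case covers every shape we do not expect.
let shipName (value: ParaValue) : string =
  match value with
  | Record props ->
      match props |> Array.tryFind (fun (key, _) -> key = "name") with
      | Some (_, String name) -> name
      | Some _ -> failwith "name is not a string"
      | None -> failwith "no property called name"
  | _ -> failwith "expected a record"

let ship = Record [| ("name", String "bessie"); ("strength", Number 22.0) |]
printfn "%s" (shipName ship)
```

Every failure here is an exception, which is part of what makes this style cumbersome to compose.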
By now, it is OK to feel flustered by how cumbersome and verbose the examples are. This will be fixed by introducing some additional concepts and types. First comes the previous example rewritten, followed by an explanation.
Already we've gone from nine lines down to five, and it's all possible because the values returned by the functions are now wrapped in ParaResult<T>, which is a very simple type encapsulating either a value or an error. ParaValue.get either returns the value of the field with a key of "name" or an error. This error, which is not an exception, could be anything from obj not being a record to there not being a single field with a key of "name".
For similar data types (to name a few) see:
- Rust's Result
- Scala's Either
- Haskell's Either
- Go's multiple return values with the error type
- F#'s Choice
In fact, ParaResult is defined as an alias for Choice<'a,string>, so any libraries or utilities that work with choices, like ExtCore, can interface seamlessly.
There is one possible question remaining for the unaccustomed, and that is ParaResult.bind:
Bind checks whether the result passed in (nameVal) is an error or a value. If nameVal is an error (get failed earlier), the error is propagated. Otherwise, a function is applied to the contained value (in this case ParaValue.String "bessie"). The function being applied is ParaValue.asString, which unwraps the value so that just "bessie" is exposed.
The implementation of bind is quite concise and may prove illustrative.
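One way such a bind can be written over Choice<'a,string> is sketched here. The `get` and `asString` functions are simplified stand-ins for ParaValue.get and ParaValue.asString, with argument order and error messages chosen for illustration; none of this is copied from the library.

```fsharp
// Trimmed local stand-in for ParaValue (assumption).
type ParaValue =
  | Number of float
  | String of string
  | Record of (string * ParaValue)[]

// Same shape as the alias described above.
type ParaResult<'a> = Choice<'a, string>

// Stand-in lookup: succeeds only when exactly one field has the given key.
let get (key: string) (value: ParaValue) : ParaResult<ParaValue> =
  match value with
  | Record props ->
      match props |> Array.filter (fun (k, _) -> k = key) with
      | [| (_, found) |] -> Choice1Of2 found
      | _ -> Choice2Of2 (sprintf "expected exactly one field named %s" key)
  | _ -> Choice2Of2 "expected a record"

// Stand-in unwrapping of a string value.
let asString (value: ParaValue) : ParaResult<string> =
  match value with
  | String s -> Choice1Of2 s
  | _ -> Choice2Of2 "expected a string"

// Apply fn to a successful value; pass an error through untouched.
let bind (fn: 'a -> ParaResult<'b>) (result: ParaResult<'a>) : ParaResult<'b> =
  match result with
  | Choice1Of2 value -> fn value
  | Choice2Of2 error -> Choice2Of2 error

let ship = Record [| ("name", String "bessie"); ("strength", Number 22.0) |]
let name = ship |> get "name" |> bind asString
let missing = ship |> get "flag" |> bind asString
printfn "%A" name
printfn "%A" missing
```

In the second query asString never runs, because the failed lookup short-circuits the rest of the pipeline.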
Bind allows Pfarah to define a custom computation expression, which enables syntactic sugar using let! and return! so that one doesn't have to deal with ParaResult explicitly.
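To make the link between bind and the computation expression concrete, here is a minimal builder over Choice<'a,string>. The `ParaBuilder` below is a hypothetical re-creation for illustration, and `parseStrength` and `total` are made-up helpers; the library's actual builder may define more members.

```fsharp
type ParaResult<'a> = Choice<'a, string>

// A minimal builder: let! maps to Bind, return to Return, return! to ReturnFrom.
type ParaBuilder() =
  member x.Bind(result: ParaResult<'a>, fn: 'a -> ParaResult<'b>) : ParaResult<'b> =
    match result with
    | Choice1Of2 value -> fn value
    | Choice2Of2 error -> Choice2Of2 error
  member x.Return(value: 'a) : ParaResult<'a> = Choice1Of2 value
  member x.ReturnFrom(result: ParaResult<'a>) : ParaResult<'a> = result

let para = ParaBuilder()

// Hypothetical helper that can fail without throwing.
let parseStrength (text: string) : ParaResult<int> =
  match System.Int32.TryParse(text) with
  | true, n -> Choice1Of2 n
  | false, _ -> Choice2Of2 (sprintf "%s is not an integer" text)

// Errors short-circuit: the second let! never runs if the first one fails.
let total (a: string) (b: string) : ParaResult<int> =
  para {
    let! x = parseStrength a
    let! y = parseStrength b
    return x + y
  }

// return! hands back an existing result unchanged.
let same (a: string) : ParaResult<int> = para { return! parseStrength a }

printfn "%A" (total "22" "20")
printfn "%A" (total "22" "x")
```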
While the number of lines of code in the example grew, computation expressions start to shine when the queries become complex. Instead of extracting just the name, the next query extracts the number of men on the ship (strength).
The potential is starting to show. Still some cruft is getting in the way, which can be solved by defining custom operators:
- ? is aliased to ParaValue.get
- >>= is aliased to ParaResult.bind
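The aliases above can be sketched with F#'s dynamic lookup operator, where `value?name` passes the string "name" as the key. The stand-in `get`, `asString`, `asInteger`, and `bind` below are simplified assumptions written for this example, not the library's definitions.

```fsharp
// Trimmed local stand-in for ParaValue (assumption).
type ParaValue =
  | Number of float
  | String of string
  | Record of (string * ParaValue)[]

type ParaResult<'a> = Choice<'a, string>

let get (key: string) (value: ParaValue) : ParaResult<ParaValue> =
  match value with
  | Record props ->
      match props |> Array.tryFind (fun (k, _) -> k = key) with
      | Some (_, found) -> Choice1Of2 found
      | None -> Choice2Of2 (sprintf "missing field %s" key)
  | _ -> Choice2Of2 "expected a record"

let asString (value: ParaValue) : ParaResult<string> =
  match value with
  | String s -> Choice1Of2 s
  | _ -> Choice2Of2 "expected a string"

let asInteger (value: ParaValue) : ParaResult<int> =
  match value with
  | Number n -> Choice1Of2 (int n)
  | _ -> Choice2Of2 "expected a number"

let bind (fn: 'a -> ParaResult<'b>) (result: ParaResult<'a>) : ParaResult<'b> =
  match result with
  | Choice1Of2 value -> fn value
  | Choice2Of2 error -> Choice2Of2 error

// value?key is rewritten by the compiler to (?) value "key".
let (?) (value: ParaValue) (key: string) : ParaResult<ParaValue> = get key value

// Infix bind with the result on the left and the function on the right.
let (>>=) (result: ParaResult<'a>) (fn: 'a -> ParaResult<'b>) : ParaResult<'b> =
  bind fn result

let ship = Record [| ("name", String "bessie"); ("strength", Number 22.0) |]
let name = ship?name >>= asString
let strength = ship?strength >>= asInteger
printfn "%A %A" name strength
```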
Dealing with Multiple Ships
Bessie isn't the only ship in the world. Our data is about to get more complex, but don't worry, Pfarah will be there every step of the way.
Those who like the computation builder don't have to miss out!
But why the different functions?
collect vs getAll: Both accept a ParaValue and look for properties with a certain key, but collect will also work on ParaValue.Array by iterating over each element looking for the key. collect returns a ParaValue.Array, which allows subsequent calls to be chained together:
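The chaining behaviour described above can be approximated as follows. This `collect` is a sketch written from the description (match properties on a record, recurse into arrays, always return an array); it is not Pfarah's implementation, and the `fleet` data is made up.

```fsharp
// Trimmed local stand-in for ParaValue (assumption).
type ParaValue =
  | Number of float
  | String of string
  | Array of ParaValue[]
  | Record of (string * ParaValue)[]

// Gather every value stored under `key`, looking inside arrays element by element.
let rec collect (key: string) (value: ParaValue) : ParaValue =
  match value with
  | Record props ->
      props
      |> Array.filter (fun (k, _) -> k = key)
      |> Array.map snd
      |> ParaValue.Array
  | Array items ->
      items
      |> Array.collect (fun item ->
          match collect key item with
          | Array found -> found
          | other -> [| other |])
      |> ParaValue.Array
  | _ -> ParaValue.Array [||]

// Hypothetical data: two ships under the same key.
let fleet =
  Record
    [| ("ship", Record [| ("name", String "bessie"); ("strength", Number 22.0) |])
       ("ship", Record [| ("name", String "douglas"); ("strength", Number 35.0) |]) |]

// Because each call returns an array, the next call can run over its elements.
let names = fleet |> collect "ship" |> collect "name"
printfn "%A" names
```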
Defined in the operator module is the / operator, which delegates to ParaValue.collect. For those who are familiar with XPath, this should look similar.
Going back to the computation builder vs. the pipeline method, another difference is the function that parseShip is passed to. The computation expression only works with arrays of ParaValue, whereas the pipeline method, like collect, operates on the values of Records and can map singular values like ParaValue.String, etc.
Knowing which one to use is sometimes only a matter of taste.
Optional Data
Not all objects of a given kind will have exactly the same property keys. Some may have only a limited subset of the properties wanted.
In our ship example, we'll have an optional property, patrol, that denotes if a ship is on patrol. If absent, the ship is assumed to not be on patrol.
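A small sketch of the defaulting rule just described: a missing patrol property means the ship is not on patrol. The `tryGet` here is a simplified stand-in returning an option, and `onPatrol` is a hypothetical helper; the library's own functions for optional data may have different signatures.

```fsharp
// Trimmed local stand-in for ParaValue (assumption).
type ParaValue =
  | Bool of bool
  | Number of float
  | String of string
  | Record of (string * ParaValue)[]

// Stand-in optional lookup: None when the key is absent or the value is not a record.
let tryGet (key: string) (value: ParaValue) : ParaValue option =
  match value with
  | Record props ->
      props |> Array.tryFind (fun (k, _) -> k = key) |> Option.map snd
  | _ -> None

// A ship without a patrol property is assumed not to be on patrol.
// For brevity, a non-boolean patrol value is also treated as false.
let onPatrol (ship: ParaValue) : bool =
  match tryGet "patrol" ship with
  | Some (Bool flag) -> flag
  | _ -> false

let bessie = Record [| ("name", String "bessie"); ("strength", Number 22.0) |]
let douglas = Record [| ("name", String "douglas"); ("patrol", Bool true) |]
printfn "%b %b" (onPatrol bessie) (onPatrol douglas)
```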
Finding Optional Data
Knowing the data is the first step to any type of analysis. This is made difficult when there can be thousands of objects, each one having a subset of the properties available. findOptional fixes this problem by dissecting a list of supposedly similar objects and returning the properties that it knows are always present and the ones that are optional.
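The idea can be illustrated with plain sets: keys present in every object are required, and any other key that appears is optional. The function name `requiredAndOptional`, its signature, and the sample data are hypothetical; this shows the technique rather than how findOptional is written or what it returns.

```fsharp
// Split property keys into those present in every object and those present in only some.
let requiredAndOptional (objects: (string * 'v)[] list) : Set<string> * Set<string> =
  let keySets =
    objects |> List.map (fun props -> props |> Array.map fst |> Set.ofArray)
  match keySets with
  | [] -> Set.empty, Set.empty
  | _ ->
      // Required keys survive the intersection of all key sets.
      let required = Set.intersectMany keySets
      // Optional keys appear somewhere but not everywhere.
      let optional = Set.difference (Set.unionMany keySets) required
      required, optional

// Hypothetical ships; only one of them carries a patrol property.
let ships =
  [ [| ("name", "bessie"); ("strength", "22") |]
    [| ("name", "douglas"); ("strength", "35"); ("patrol", "yes") |] ]

let required, optional = requiredAndOptional ships
printfn "required: %A, optional: %A" required optional
```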
Deserialization
Working with primitives like strings and ints is fine, but programs become much more powerful when composite data types come into play. While the previous methods allow for manual deserialization, Pfarah offers another step of convenience.
Full stop. There's a lot of magic in the previous example, including a couple of unseen operators.
First, the type annotation on ships is critical; without it the compiler won't know what type to deserialize to and will raise a compiler error. Pfarah knows how to deserialize an array, so it then proceeds to look at the element type. Primitives like strings and ints are no problem, but Ship is new. As long as Ship implements a function FromPara, Pfarah can deserialize it. This relies on statically resolved type parameters, a very dark corner of F#.
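A stripped-down sketch of the mechanism: an inline function whose statically resolved type parameter must provide a static FromPara member, so the annotation on the result picks the implementation. The FromPara signature used here is deliberately simpler than the one Pfarah expects, and `field`, `deserialize`, and the Ship definition are assumptions written for this example.

```fsharp
// Trimmed local stand-in for ParaValue (assumption).
type ParaValue =
  | Number of float
  | String of string
  | Record of (string * ParaValue)[]

// Hypothetical helper: look up a single field or report which one is missing.
let field (key: string) (value: ParaValue) : Choice<ParaValue, string> =
  match value with
  | Record props ->
      match props |> Array.tryFind (fun (k, _) -> k = key) with
      | Some (_, found) -> Choice1Of2 found
      | None -> Choice2Of2 (sprintf "missing field %s" key)
  | _ -> Choice2Of2 "expected a record"

type Ship =
  { Name: string
    Strength: int }
  // Simplified signature; the real FromPara in Pfarah has a different shape.
  static member FromPara (value: ParaValue) : Choice<Ship, string> =
    match field "name" value, field "strength" value with
    | Choice1Of2 (String name), Choice1Of2 (Number strength) ->
        Choice1Of2 { Name = name; Strength = int strength }
    | _ -> Choice2Of2 "not a ship"

// ^T is resolved at compile time to whichever type the call site asks for,
// and that type must expose a matching static FromPara member.
let inline deserialize (value: ParaValue) : Choice< ^T, string> =
  (^T : (static member FromPara : ParaValue -> Choice< ^T, string>) value)

// Without this annotation the compiler cannot tell which FromPara to call.
let ship : Choice<Ship, string> =
  deserialize (Record [| ("name", String "bessie"); ("strength", Number 22.0) |])
printfn "%A" ship
```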
But let's take a step back, because we can use this magic in baby steps.
That probably looks and feels a lot better for the uninitiated. We can rewrite the magic parts with our new function.
Pretty neat, right? We can throw in our optional patrol pretty easily if we know the right function!
Binary Data
The examples that we have been working with have been plain text, but Clausewitz files can come compressed and in binary form. To parse these files, we'll need a few things:
- The file path.
- The header if it is a binary file. For instance, for EU4, the header is "EU4bin".
- The header if it is a plain text file. For EU4, the header is "EU4txt".
- Since binary files use two-byte tokens instead of strings for identifiers, we'll need a dictionary of two-byte tokens to strings so that the binary file can be queried exactly like a plain text file. There are many types of tokens that can be encountered, so as not to impose a memory tax unnecessarily if it is a plain text file, the dictionary is lazy.
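The lazy dictionary from the last item can be sketched on its own. The token values below are invented for illustration; a real lookup would hold the game's actual two-byte tokens. In Pfarah the lazy dictionary is handed to the load call together with the file path and the two headers.

```fsharp
open System.Collections.Generic

// Hypothetical token ids; nothing is allocated until the value is forced.
let tokens : Lazy<IDictionary<int16, string>> =
  lazy (dict [ (0x0001s, "name"); (0x0002s, "strength") ])

// A plain text file would never touch the dictionary, so it is never built.
printfn "built before use: %b" tokens.IsValueCreated

// A binary file needs the lookup, which forces the dictionary exactly once.
let lookup = tokens.Force()
printfn "token 0x0001 is %s" lookup.[0x0001s]
printfn "built after use: %b" tokens.IsValueCreated
```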
The following code sample will work for a file that is in any format (plain text/binary and compressed/uncompressed).