Remote execution and map-reduce: distributed computing for Transient https://github.com/transient-haskell/transient-universe
| Source | Version |
|---|---|
| Version on this page | 0.3.5.1 |
| LTS Haskell 11.10 | 0.4.6.1 |
| Stackage Nightly 2018-03-12 | 0.4.6.1 |
| Latest on Hackage | 0.5.0.0 |
See the Wiki
transient-universe is the distributed computing extension of transient and uses transient primitives heavily for parsing, threading, event handling, exception handling, messaging etc. It supports moving computations between Haskell closures on different computers in the network, even across different architectures: Linux nodes can work with Windows nodes and with browser nodes running Haskell compiled with ghcjs.
The primitives that move computations are called wormhole and teleport; the names express the semantics, hence the name of the package.
All the nodes run the same program compiled for different architectures. It defines a Cloud computation (monad). It is a thin layer on top of transient with additional primitives and services that run a single program in one or many nodes.
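
    import Control.Applicative ((<|>))
    import Control.Monad (guard)
    import Transient.Base
    import Transient.Move

    main = keep . initNode $ inputNodes <|> myProgram

    myProgram :: Cloud ()
    myProgram = do
        nodes <- local getNodes                        -- nodes known to this node
        guard $ length nodes > 1                       -- continue only if there is a second node
        let node2 = nodes !! 1
        r <- runAt node2 . local $ waitEvents getLine  -- read console lines on node 2
        localIO $ print r                              -- print each line on the calling node
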
This program will stream and print any text that you type into the console of node 2.
To learn how to initialize the nodes, see the corresponding section of the Tutorial.
Browser nodes, running transient programs compiled with ghcjs, are integrated with server nodes using websockets for communication. Just compile the program with ghcjs and point the browser to http://server:port. The server nodes have an HTTP server that will send the compiled program to the browser.
Distributed Browser/server Widgets
Browser nodes can integrate a reactive client-side library based on transient (package axiom). With it, they can create widgets with HTML form elements and control the server nodes. A computation can move from browser to server and back despite the different architectures.
Widgets with code running in the browser and in servers can compose with other widgets. A browser node can gain access to many server nodes through the server that delivered the web application.
These features can make transient ideal for client-driven as well as server-driven applications, whenever distribution and push-driven reactivity are necessary either in the servers or in the browser clients.
The last release adds:
- Hooks for secure communications: with the transient-universe-tls package, a node can use TLS to connect with other nodes, including web nodes. If the connection of a web node is initiated with "https", the websocket connection uses secure communications (wss). The only primitive added is
- Client websocket connections to connect with nodes within firewalled servers: a server node can connect with another node situated behind an HTTP server. The whole process is transparent and adds no new primitive. First, connect tries a TCP socket connection; if it receives a message other than "OK", it tries a connection as a websocket client. This is important for P2P connections where a central server acts as coordinator. Websocket connections can use TLS communications too.
- No network traffic when a node invokes itself
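The connection fallback and the self-invocation shortcut described in the list above can be modeled with a small pure sketch. This is not the transient-universe API: the names `Transport`, `Route`, `NodeId`, `chooseTransport` and `route` are hypothetical, and the sketch assumes the reply to the initial TCP attempt is available as a plain `String`.

```haskell
module Main where

-- Hypothetical model only; none of these names belong to transient-universe.

-- | Transports a node may end up using.
data Transport = TcpSocket | WebSocketClient
  deriving (Eq, Show)

-- | A node identified by host and port (an assumption for illustration).
type NodeId = (String, Int)

-- | How a call reaches its target.
data Route = LocalCall | Remote Transport
  deriving (Eq, Show)

-- | Pick the transport from the reply to the first TCP attempt:
-- "OK" keeps the TCP socket, anything else falls back to a websocket client.
chooseTransport :: String -> Transport
chooseTransport reply
  | reply == "OK" = TcpSocket
  | otherwise     = WebSocketClient

-- | A node that invokes itself needs no network traffic at all.
route :: NodeId -> NodeId -> String -> Route
route self target reply
  | self == target = LocalCall
  | otherwise      = Remote (chooseTransport reply)

main :: IO ()
main = do
  let me = ("localhost", 8000)
  print (route me me "")
  print (route me ("peer", 8000) "OK")
  print (route me ("peer", 80) "HTTP/1.1 400 Bad Request")
```

In the real package the decision happens inside the connection code, so application code stays the same whichever transport is chosen.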
transient-universe implements map-reduce in the style of Spark as a particular case. It is at the same time a hard test of the distributed primitives, since it involves a complex choreography of movement of computations. It supports in-memory operations and caching. Resilience (restart from the last checkpoint in case of failure) is not implemented yet, but it is planned.
Look at this article
There is a runnable example, DistrbDataSets.hs, that you can execute with:
It uses a number of simulated nodes to calculate the frequency of words in a long text.
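The shape of that word-frequency computation can be sketched in plain Haskell, without any distribution. This is only an illustration of the map and reduce steps, not the code of DistrbDataSets.hs and not the transient-universe map-reduce API: each chunk stands in for the partition held by one simulated node, and all function names are hypothetical.

```haskell
module Main where

import Data.Char (isAlpha, toLower)
import qualified Data.Map.Strict as M

-- | Split a list into chunks of at most n elements (one chunk per simulated node).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0 || null xs = []
  | otherwise         = let (h, t) = splitAt n xs in h : chunksOf n t

-- | Lower-case the text and split it into alphabetic words.
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlpha c then toLower c else ' ')

-- | Map step: count the words of one partition locally.
countWords :: [String] -> M.Map String Int
countWords ws = M.fromListWith (+) [(w, 1) | w <- ws]

-- | Reduce step: merge the partial counts of all partitions.
mergeCounts :: [M.Map String Int] -> M.Map String Int
mergeCounts = M.unionsWith (+)

-- | Word frequencies computed over the given number of simulated nodes.
wordFrequency :: Int -> String -> M.Map String Int
wordFrequency nodes text =
  let ws   = tokenize text
      n    = max 1 nodes
      size = max 1 ((length ws + n - 1) `div` n)
  in  mergeCounts (map countWords (chunksOf size ws))

main :: IO ()
main = print (M.toList (wordFrequency 3 "the cat and the hat and the bat"))
```

Because merging with `(+)` is associative and commutative, the result does not depend on how many partitions are used, which is what lets the real implementation spread the map step over several nodes.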
Services connect two different transient applications. This allows the running application to be divided into independent tiers. No documentation is available yet. Sorry.
General distributed primitives
teleport is a primitive that moves computations back and forth, reusing an already opened connection.
The connection with another node is initiated by wormhole. This can be done anywhere in a computation without breaking composability. As always, everything is composable.
Both primitives also support efficient streaming among nodes. This means that a remote call can return not just a single response, but many of them.
All the other distributed primitives (clustered etc.) are written in terms of these two.
How to run the ghcjs example:
See the distributed examples in the transient-examples repository
Watch this video to see this example running:
The test program runs, among other things, two copies of a widget that starts, stops and displays a counter that runs in the server.
The Wiki is more user oriented.
My video sessions on livecoding.tv are not intended as tutorials or presentations, but they show some of the latest features running.
The articles are more technical:
- Philosophy, async, parallelism, thread control, events, Session state
- Backtracking and undoing IO transactions
- Non-deterministic list like processing, multithreading
- Distributed computing
- Publish-Subscribe variables
- Distributed streaming, map-reduce
These articles contain executable examples (not now, since the site no longer supports the execution of Haskell snippets).
The only way to improve it is by using it. Please send me bug reports and requests for additional functionality!
- I plan to improve map-reduce to create a viable platform for serious data analysis and machine learning using Haskell. It will have a web notebook running in the browser.
- Create services and examples for general web applications with distributed servers.