Parallel distributed discrete event simulation module for the Aivika library http://www.aivikasoft.com
Latest on Hackage: 1.4
This package extends the aivika-transformers package and allows running parallel distributed simulations. It uses an optimistic strategy known as the Time Warp method. To synchronize the global virtual time, it uses Samadi's algorithm.
Moreover, this package uses the author's modification that allows the distributed simulation to recover from temporary connection errors whenever possible. For that, you have to explicitly enable the recovering mode and enable the monitoring of all logical processes, including the specialized Time Server process, as shown in one of the test examples included in the distribution.
With the recovering mode enabled, you can try to build a distributed simulation using ordinary computers connected via an ordinary network. For example, such a distributed model could even consist of computers located on different continents and connected through the Internet. The most exciting thing here is that this is an optimistic distributed simulation with possible rollbacks. Optimistic methods are assumed to better support the parallelism inherent in the models.
You can test the distributed simulation on your own laptop, although the package is intended to be used with a multi-core computer or with computers connected in a distributed cluster.
There are additional packages that allow you to run distributed simulation experiments using the Monte Carlo method. They allow you to save the simulation results in SQL databases and then generate a report, or a set of reports, consisting of HTML pages with charts, histograms, links to CSV tables, summary statistics, etc. Please consult the AivikaSoft website for more details.
Regarding the speed of simulation, a recent rough estimate is as follows. This estimate may change from version to version. For example, in version 1.0 the rollback log was rewritten, which had a significant effect.
When simulating sequential models, the speed of a single logical process of the distributed module, compared with the sequential aivika module, varies and depends essentially on the number of simultaneously processed discrete events or, which is very close, the number of simultaneously running discontinuous processes. If there are many simultaneous events, the distributed module can be only 4-5 times slower. The more simultaneous events the model defines, the smaller the gap in speed between the modules. But if simultaneous events are rare, the distributed module can be as much as 15 times slower, a case in which the sequential module can be exceptionally fast. At the same time, message passing between the logical processes can dramatically decrease the speed of the distributed simulation, especially if the messages cause rollbacks. In that case it makes sense to define the time horizon parameter. Thus, much depends on the distributed model itself.
When placing logical processes on a computer with a multi-core processor, you should follow these recommendations. Reserve at least 1 core for each logical process, or even 2 cores if the logical process sends and receives messages extensively. In addition, reserve at least 1 or 2 cores for each computational node. These additional processor cores will be used by the GHC run-time system, which includes the garbage collector. The Aivika distributed module creates a huge number of short-lived small objects. Therefore, the garbage collector needs at least one core to collect these objects efficiently.
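The core budget described above can be worked out with a few lines of plain Haskell. This is only a sketch of the arithmetic: the `Traffic` type and the functions `coresForProcess` and `coresForNode` are hypothetical names introduced for illustration and are not part of aivika-distributed.

```haskell
-- Hypothetical helper for sizing a node; not part of aivika-distributed.
module Main where

-- | How heavily a logical process exchanges messages (illustrative type).
data Traffic = Light | Heavy
  deriving (Eq, Show)

-- | Cores to reserve for one logical process: 1, or 2 if it sends and
-- receives messages extensively.
coresForProcess :: Traffic -> Int
coresForProcess Light = 1
coresForProcess Heavy = 2

-- | Lower bound on cores for one computational node: the cores set aside
-- for the GHC run-time system (the text recommends 1 or 2) plus the sum
-- over the logical processes placed on that node.
coresForNode :: Int -> [Traffic] -> Int
coresForNode rtsCores lps = rtsCores + sum (map coresForProcess lps)

main :: IO ()
main = do
  -- Three logical processes on one node, one of them message-heavy,
  -- with 2 cores reserved for the run-time system and garbage collector.
  print (coresForNode 2 [Light, Light, Heavy])  -- prints 6
```

The result is a lower bound. A model whose messages cause frequent rollbacks may still need more headroom than this estimate suggests.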
You should compile your code with the options -O2 and -threaded, and then launch it with the run-time options +RTS -N.
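In a Cabal project, these options could be set in the executable stanza, roughly as in the fragment below. The executable name `my-simulation` and the dependency list are placeholders. The `-rtsopts` flag is an addition not mentioned above: GHC normally accepts most `+RTS` flags on the command line only if the program was linked with it.

```cabal
executable my-simulation
  main-is:          Main.hs
  build-depends:    base, aivika, aivika-transformers, aivika-distributed
  default-language: Haskell2010
  -- -O2 and -threaded as recommended; -rtsopts lets the binary accept
  -- run-time options such as -N on the command line.
  ghc-options:      -O2 -threaded -rtsopts
```

The resulting program would then be started as `my-simulation +RTS -N`, which lets the run-time system use all available cores.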
Added the leaveSimulation function, which allows a logical process to leave the simulation prematurely. This complements the ability of new logical processes to enter an already running simulation.
Added the dioProcessDisconnectingEnabled flag, which allows logical processes to disconnect when monitoring is enabled and the time server strategy implies unregistering disconnected logical processes. In that case, however, no process can continue sending messages to the disconnected process.
- Fixed the implementation of the fault-tolerant mode.
- Fixed a leak of monitor references that are used in the fault tolerant mode only.
- Updated the main page documentation with new recommendations.
- Fixed the documentation.
Added the time horizon parameter.
Increased the frequency of the global virtual time synchronization.
Optimized the rollback log.
Increased the default rollback log threshold.
Restored the size threshold for the output message queue.
- No more restriction on the number of output messages, which would lead to throttling.
- Provided a more precise estimate of the simulation speed.
- Updated the estimate of speed in the description after recent changes in the sequential module.
- A more graceful termination of the time server in case of self-destruction by time-out.
- Updated so that external software tools could monitor the distributed simulation.
- Improved the stopping of the logical processes in case of shutting the cluster down.
- Added time server and logical process strategies to shut down the cluster on failure after the specified timeout intervals.
- Fixed the use of the LP abbreviation.
- Using the mwc-random package for generating random numbers by default.
Added functions expectEvent and expectProcess.
Added the Guard module.
Added the ability to restore the distributed simulation after temporary connection errors.
Better finalisation of the distributed simulation.
Implemented lazy references.
Started using Samadi’s algorithm to synchronize the global virtual time.
The logical processes must call registerDIO to connect to the time server.
Increased the default synchronization time-out and delay.
Increased the default log size threshold.