Distributed Computing Model

Unprecedented scalability and the ability to write and analyze huge data volumes.

Designed from scratch for very fast writing and retrieval.

DataSonar’s storage engine can replicate the classical relational entity-relationship model and implement typical database operations (e.g. inserting, indexing, caching, retrieving, filtering, sorting), but it intentionally sets that model aside in favor of a lean, actor-based storage strategy. In benchmarks, DataSonar has outperformed every storage engine it was tested against, including relational, NoSQL (MongoDB and BerkeleyDB), and NewSQL engines.

To achieve the highest efficiency, DataSonar:

  • brings computing to data rather than data to computing whenever possible.
    DataSonar retrieves large amounts of data quickly, but it is bound by the resources of the local computing platform and cannot scale beyond them. Distributing both data and computation is therefore the key to scalability: each node takes care of its own stored data. This strategy, usually known as data sharding, balances storage and computing load across the system (see the first sketch after this list).
  • delays a computation until its result is actually needed, rather than computing eagerly and finding out later that the result wasn’t used.
    DataSonar’s lazy stream management avoids this wasted work (see the second sketch after this list).
  • stores a computation’s result so it may be reused.
    Computation results are stored under a key, either in the memory of processing nodes or persistently in the database. If a computation is needed again, its result is retrieved rather than recomputed. DataSonar provides mechanisms to store large amounts of data under keys that identify specific computations (see the third sketch after this list).
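
As a rough illustration of the first principle, the Python sketch below hashes each key to a node and runs an aggregate where the data lives, so only small partial results travel back to the caller. The names (Node, ShardedCluster) are hypothetical and not part of DataSonar’s API.

    import hashlib

    class Node:
        """A storage/compute node that owns one shard of the data."""
        def __init__(self, name):
            self.name = name
            self.store = {}

        def put(self, key, value):
            self.store[key] = value

        def sum_local(self):
            # The computation runs where the data lives; only this
            # small partial result travels back to the caller.
            return sum(self.store.values())

    class ShardedCluster:
        """Routes each key to a node by hashing, so data and the
        computation over it stay together."""
        def __init__(self, nodes):
            self.nodes = nodes

        def node_for(self, key):
            h = int(hashlib.md5(key.encode()).hexdigest(), 16)
            return self.nodes[h % len(self.nodes)]

        def put(self, key, value):
            self.node_for(key).put(key, value)

        def total(self):
            # Aggregate the per-node partial results.
            return sum(node.sum_local() for node in self.nodes)

    cluster = ShardedCluster([Node("n1"), Node("n2"), Node("n3")])
    for i in range(100):
        cluster.put(f"reading-{i}", i)
    print(cluster.total())  # 4950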
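
The second principle is essentially lazy evaluation. A minimal, purely illustrative sketch using Python generators: a pipeline that parses and filters a stream does no work until a consumer pulls a value, and does only as much work as that value requires.

    def parse(lines):
        # Generators defer work: nothing is parsed until a
        # downstream consumer actually pulls a value.
        for line in lines:
            yield int(line)

    def over_threshold(values, threshold):
        for v in values:
            if v > threshold:
                yield v

    lines = (str(n) for n in range(1_000_000))
    pipeline = over_threshold(parse(lines), 999_997)

    # Only now does any computation happen, and only as much as
    # is needed to produce this one value; nothing is materialized.
    print(next(pipeline))  # 999998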
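
The third principle is memoization keyed by the identity of the computation. A minimal sketch, assuming a persistent key-value store (Python’s shelve module stands in for DataSonar’s node memory or database; computation_key and cached are hypothetical helpers):

    import hashlib, json, shelve

    def computation_key(name, params):
        """Derive a stable key that identifies a specific computation."""
        blob = json.dumps({"name": name, "params": params}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def cached(cache_path, name, params, compute):
        # Look the result up under its key; recompute only on a miss.
        key = computation_key(name, params)
        with shelve.open(cache_path) as cache:  # persistent store
            if key not in cache:
                cache[key] = compute(**params)
            return cache[key]

    result = cached("results.db", "sum_of_squares",
                    {"n": 10_000},
                    lambda n: sum(i * i for i in range(n)))
    print(result)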

Distributed computation

A common strategy for distributing computation over many nodes is MapReduce: computations are mapped onto individual nodes, and the partial results are then reduced in a second step on master nodes, whose output may in turn feed a further mapping step.
More often than not, this is the easy approach rather than the best one. A serious shortcoming is that MapReduce suits only incremental algorithms; it cannot express derivatives, integrals, Fourier transforms, and other mathematical techniques heavily used in statistics and machine learning. This is where DataSonar is different.
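
For concreteness, the canonical MapReduce example is word counting. The single-process Python sketch below mimics the two phases; it illustrates the pattern only, not any particular framework.

    from collections import defaultdict

    def map_phase(shard):
        # Each node emits (key, value) pairs from its local shard.
        for line in shard:
            for word in line.split():
                yield word, 1

    def reduce_phase(pairs):
        # A master node merges the partial results per key.
        totals = defaultdict(int)
        for key, value in pairs:
            totals[key] += value
        return dict(totals)

    shards = [["to be or not to be"], ["be that as it may"]]
    partials = [pair for shard in shards for pair in map_phase(shard)]
    print(reduce_phase(partials))  # {'to': 2, 'be': 3, ...}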

Complex Event Processor

DataSonar’s Complex Event Processor models the system as a directed graph whose vertices are computation nodes and whose edges carry intermediate results between them. Vertices may all perform the same task (in which case they play the role of MapReduce’s mappers) or may be specialized for specific tasks. The Complex Event Processor engine keeps track of state, so that each node is able to forward its result to the correct follow-up node.
Events may be complex; that is, their handling depends on the input at hand, the global system state, and the local state of the computing node.
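
DataSonar’s actual API is not described here; as a minimal sketch of the idea, the following Python models vertices that transform an event and use state to route the result to the correct follow-up node (ComputeNode and the three-vertex graph are hypothetical):

    class ComputeNode:
        """A vertex in the processing graph. Each node transforms its
        input and decides, from state, where to route the result."""
        def __init__(self, name, fn, route):
            self.name = name
            self.fn = fn          # the node's computation
            self.route = route    # picks the follow-up node
            self.state = {}       # local node state

        def process(self, event, graph, global_state):
            result = self.fn(event, self.state, global_state)
            next_name = self.route(result, self.state, global_state)
            if next_name is None:
                return result     # sink node: computation is done
            return graph[next_name].process(result, graph, global_state)

    # A three-vertex graph: normalize -> enrich -> emit
    graph = {
        "normalize": ComputeNode("normalize",
            lambda e, s, g: {**e, "value": float(e["value"])},
            lambda r, s, g: "enrich"),
        "enrich": ComputeNode("enrich",
            lambda e, s, g: {**e, "unit": g["unit"]},
            lambda r, s, g: "emit"),
        "emit": ComputeNode("emit",
            lambda e, s, g: e,
            lambda r, s, g: None),
    }
    print(graph["normalize"].process({"value": "3.5"}, graph, {"unit": "°C"}))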


See DataSonar Machine Learning features.

© DataSonar 2016