Mulgara Semantic Store

The amount of data held in electronic form is growing at a rate that makes it hard to manage. Organizations often have so much electronic information that it can be hard to find, access, share and reuse. Mulgara can form an important part of a solution to this problem.

Metadata is information about data. For example, metadata for a word-processing document or an electronic mail message might include the author, the recipients, the subject, keywords, concepts addressed, people named, and dates or places mentioned. Mulgara stores this metadata and records the relationships between items of metadata.

Mulgara implements many of the World Wide Web Consortium's (W3C) Semantic Web concepts (see http://www.w3c.org/2001/sw). Mulgara databases hold metadata in the form of short subject-predicate-object statements, known as triples, in accordance with the W3C's Resource Description Framework (RDF) standard.
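As a rough illustration, the e-mail metadata mentioned above can be written as subject-predicate-object statements. This Java sketch is not Mulgara's API: the `EmailMetadata` class, its `Triple` record, the `http://example.org/...` URIs and the sample values are all hypothetical; only the Dublin Core property URIs (`http://purl.org/dc/elements/1.1/...`) are real vocabulary terms.

```java
import java.util.List;

// Illustration only: a hypothetical in-memory representation of RDF
// statements, not a Mulgara class. Each fact is one
// subject-predicate-object triple.
final class EmailMetadata {
    record Triple(String subject, String predicate, String object) {}

    // Hypothetical URI identifying one e-mail message.
    static final String MSG = "http://example.org/mail/msg-42";

    // Metadata about the message: author, subject and one recipient.
    static List<Triple> statements() {
        return List.of(
            new Triple(MSG, "http://purl.org/dc/elements/1.1/creator", "Alice"),
            new Triple(MSG, "http://purl.org/dc/elements/1.1/subject", "Quarterly report"),
            new Triple(MSG, "http://example.org/terms/recipient", "Bob"));
    }

    public static void main(String[] args) {
        for (Triple t : statements()) {
            System.out.println(t.subject() + " " + t.predicate() + " " + t.object());
        }
    }
}
```

Every statement shares the same subject, so a store can relate all of these facts back to the one message.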

Using SPARQL or TQL (Tucana Query Language) commands, you can query Mulgara databases and receive results that match the query. SPARQL and TQL are similar to the Structured Query Language (SQL) used to query relational databases, with some significant differences due to the way data is stored in Mulgara. Like a relational database, Mulgara can be used as the underlying data repository for software applications. TQL predates SPARQL by several years and offers different functionality, while SPARQL has the advantage of being a W3C standard.
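At their core, both query languages match patterns against stored statements. The sketch below is a deliberate simplification using a hypothetical `PatternMatch` class and an in-memory list: a pattern fixes some positions of a triple and leaves others as variables (`null` here), and the result is every stored triple that fits. A real query engine would use indexes and join the results of several patterns; none of that is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of triple-pattern matching, not Mulgara code.
final class PatternMatch {
    record Triple(String s, String p, String o) {}

    // A null position acts as a variable that matches any value.
    static boolean matches(Triple t, String s, String p, String o) {
        return (s == null || s.equals(t.s()))
            && (p == null || p.equals(t.p()))
            && (o == null || o.equals(t.o()));
    }

    // Linear scan over the store; returns every matching triple.
    static List<Triple> query(List<Triple> store, String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : store) {
            if (matches(t, s, p, o)) result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> store = List.of(
            new Triple("msg1", "author", "Alice"),
            new Triple("msg2", "author", "Bob"),
            new Triple("msg1", "subject", "Budget"));
        // "Which messages did Alice write?" as one pattern: (?msg, author, "Alice")
        System.out.println(query(store, null, "author", "Alice"));
    }
}
```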

Overview

This is a brief list of important features.

General

  • Native RDF support
  • Multiple databases (models) per server
  • Simple SQL-like query language
  • Small footprint
  • Full text search functionality
  • Datatype support
  • Supports and tracks W3C specifications and guidelines

Performance and Scalability

  • Large storage capacity
  • Optimized for metadata storage and retrieval
  • Multi-processor support
  • Independently tuned for both 64-bit and 32-bit architectures
  • Low memory requirements
  • On-disk joins
  • Streamed query results

More information is available in the Scalability section below.

Reliability

  • Full transaction support
  • Clustering and store level fail-over
  • Permanent integrity

Connectivity

  • Sesame SAIL API
  • JRDF
  • SOAP
  • HTTP
  • Software Developers Kit (SDK)

Manageability

  • Near zero administration
  • Web-based configuration and monitoring tools

Cross OS/Platform Support

  • Microsoft® Windows NT®, Windows® 2000 and XP
  • UNIX® and Linux®
  • Solaris (TM)
  • Mac OS® X
  • IRIX®

Scalability

The storage engine of Mulgara is a transactional triplestore known as the XA Triplestore. Much of the scalability of Mulgara is due to the following features of the XA Triplestore.

64-bit Data Structures

All relevant fields of in-memory and on-disk data structures are 64 bits wide, so Mulgara can store very large amounts of data, up to the limits imposed by the host operating system.
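The arithmetic behind the 64-bit design is easy to check. The hypothetical helper below compares the largest byte offset that a signed 32-bit field and a signed 64-bit field can hold; Java's own `FileChannel` uses `long` (64-bit) file positions for the same reason.

```java
// Illustrative arithmetic, not Mulgara code: the reach of a signed
// 32-bit offset compared with a signed 64-bit offset.
final class OffsetLimits {
    // 2^31 - 1: the largest value of a Java int.
    static long maxOffset32() { return Integer.MAX_VALUE; }

    // 2^63 - 1: the largest value of a Java long.
    static long maxOffset64() { return Long.MAX_VALUE; }

    public static void main(String[] args) {
        // Offsets 0 .. 2^31 - 1 cover 2 GiB; offsets 0 .. 2^63 - 1 cover 8 EiB.
        System.out.printf("largest 32-bit offset: %,d%n", maxOffset32());
        System.out.printf("largest 64-bit offset: %,d%n", maxOffset64());
        System.out.printf("ratio: %,d%n", maxOffset64() / maxOffset32());
    }
}
```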

Multiple Sessions with no Lock Contention

A single writing session and multiple reading sessions can access the triplestore concurrently, and the reading sessions do not need to hold a global lock while processing a query. Because a reading session holds the global lock only briefly, while it takes a snapshot as described below, lock contention is effectively avoided. In general, each session executes in its own thread. As a result, the maximum number of active reading sessions is limited only by the concurrency of the host operating system and I/O subsystem.

When a session initiates a query, which may involve multiple requests to the triplestore, it first takes a snapshot of the entire database. This ensures that all requests to the triplestore during the processing of the query see the database in a consistent state.

The triplestore is designed such that obtaining a snapshot is a very quick operation and does not cause any I/O to be performed. It should take less than a millisecond on current hardware, regardless of the size of the database.

The session must hold a global lock only during this brief period while it obtains the snapshot. Once the snapshot is obtained, no further locking is required regardless of the number of triplestore operations that must be performed or the amount of time required to execute the query.
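The snapshot protocol described above can be sketched in a few lines of Java, assuming the committed database state is an immutable value that a writer replaces as a whole. `SnapshotStore`, `snapshot()` and `publish()` are hypothetical names, and the real XA Triplestore snapshots on-disk structures rather than an in-memory list. A reader holds the global lock only long enough to copy one reference, then runs its query against that snapshot with no further locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of snapshot-based reads over immutable state.
final class SnapshotStore {
    private final ReentrantLock globalLock = new ReentrantLock();
    private List<String> current = List.of(); // committed state, never mutated in place

    // Held only while one reference is copied: constant time, no I/O.
    List<String> snapshot() {
        globalLock.lock();
        try {
            return current;
        } finally {
            globalLock.unlock();
        }
    }

    // A writer publishes a new immutable state; existing snapshots are unaffected.
    void publish(List<String> next) {
        globalLock.lock();
        try {
            current = List.copyOf(next);
        } finally {
            globalLock.unlock();
        }
    }

    public static void main(String[] args) {
        SnapshotStore store = new SnapshotStore();
        store.publish(List.of("t1", "t2"));
        List<String> snap = store.snapshot();   // a query starts here
        List<String> next = new ArrayList<>(snap);
        next.add("t3");
        store.publish(next);                    // a concurrent commit
        // The query still sees a consistent state without further locking.
        System.out.println(snap.size() + " vs " + store.snapshot().size());
    }
}
```

In Java a `volatile` field would give the same visibility guarantee without an explicit lock; the lock is kept here only to mirror the brief global lock described in the text.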

The existence of a snapshot does not by itself consume any additional storage, but it causes any subsequent modifications to use copy-on-write semantics. The on-disk data structures of the triplestore are designed to minimize the amount of copying required to perform a modification, which improves performance and maximizes the amount of storage shared between snapshots.
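Copy-on-write with shared storage is commonly implemented by path copying. The sketch below uses a persistent binary search tree as a stand-in for the triplestore's on-disk structures, which is an assumption made for illustration; `CowTree` is a hypothetical name. An insert copies only the nodes on the path from the root to the change, so the old version stays intact and every untouched subtree is shared between the two versions.

```java
// Hypothetical sketch of copy-on-write by path copying.
final class CowTree {
    record Node(long key, Node left, Node right) {}

    // Returns a new root; nodes off the insertion path are reused, not copied.
    static Node insert(Node n, long key) {
        if (n == null) return new Node(key, null, null);
        if (key < n.key()) return new Node(n.key(), insert(n.left(), key), n.right());
        if (key > n.key()) return new Node(n.key(), n.left(), insert(n.right(), key));
        return n; // key already present: nothing to copy
    }

    static boolean contains(Node n, long key) {
        while (n != null) {
            if (key == n.key()) return true;
            n = key < n.key() ? n.left() : n.right();
        }
        return false;
    }

    public static void main(String[] args) {
        Node v1 = null;
        for (long k : new long[] {50, 25, 75, 10, 30}) v1 = insert(v1, k);
        Node v2 = insert(v1, 80);                    // modification while v1 is in use
        System.out.println(contains(v1, 80));        // old snapshot is unchanged
        System.out.println(v1.left() == v2.left());  // left subtree is shared
    }
}
```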

A snapshot is released once the query processing is complete. Any disk storage used by the snapshot and not shared with any other snapshot is immediately available for reuse. Releasing a snapshot is just as quick as obtaining a snapshot but the session does not even need to hold the global lock during this operation.

A separate global lock (the write lock) is used to ensure that there is only one writing session at any given time. The write lock is released after the writer either commits or rolls back the current transaction.
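A single-writer rule of this kind can be sketched with a one-permit semaphore. `WriterGate` and its methods are hypothetical, not Mulgara's API. A `Semaphore` is used instead of a `ReentrantLock` so that the same thread cannot enter twice and the permit may be released by whichever thread ends the transaction; both `commit()` and `rollback()` release it.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: at most one write transaction at a time.
final class WriterGate {
    private final Semaphore writeLock = new Semaphore(1); // one permit = one writer
    private volatile boolean active;                      // is a write transaction open?

    // Blocks until no other writer is active.
    void beginWrite() throws InterruptedException {
        writeLock.acquire();
        active = true;
    }

    // Non-blocking variant: returns false if another writer is active.
    boolean tryBeginWrite() {
        if (!writeLock.tryAcquire()) return false;
        active = true;
        return true;
    }

    void commit() {
        // ... make the transaction's changes durable here ...
        end();
    }

    void rollback() {
        // ... discard the transaction's uncommitted changes here ...
        end();
    }

    // Both outcomes release the write lock so the next writer can start.
    private void end() {
        if (!active) throw new IllegalStateException("no active write transaction");
        active = false;
        writeLock.release();
    }
}
```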

On-Line Backups

The XA Triplestore allows modifications and queries to proceed concurrently with a backup operation. The session performing the backup acquires a snapshot of the entire database, just as it would if it were performing a query, so the backup reflects a single consistent state of the database.
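The effect of running a backup from a snapshot can be demonstrated with a standard Java class, as an analogy rather than Mulgara's implementation: every iterator of `java.util.concurrent.CopyOnWriteArrayList` works on a snapshot of the list taken when the iterator was created, so a backup loop sees a consistent copy while writes continue. `OnlineBackupDemo` is a hypothetical name. Unlike the XA Triplestore, `CopyOnWriteArrayList` copies its whole array on every modification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Analogy only: CopyOnWriteArrayList iterators are snapshot iterators.
final class OnlineBackupDemo {
    // Copies every statement visible in the snapshot, running a write
    // after each step to simulate concurrent modification.
    static List<String> backup(CopyOnWriteArrayList<String> db, Runnable concurrentWrite) {
        List<String> copy = new ArrayList<>();
        for (String statement : db) {   // snapshot taken when the loop starts
            copy.add(statement);
            concurrentWrite.run();      // allowed, and invisible to this iterator
        }
        return copy;
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<String> db = new CopyOnWriteArrayList<>(List.of("t1", "t2", "t3"));
        List<String> copy = backup(db, () -> db.add("new"));
        System.out.println("backup: " + copy + ", live: " + db.size() + " statements");
    }
}
```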

Permanent Integrity

System crashes caused by power failures and some types of hardware faults will not cause data corruption.

The on-disk data structures of the triplestore are designed to be kept in a consistent state at all times while minimizing the overhead required to achieve this. Disk writes during a write transaction are unordered, which preserves good write performance. Write ordering is imposed only during a commit operation.
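The "unordered writes, ordered commit" idea can be sketched with NIO file channels. The file layout is hypothetical and chosen for illustration: the first 8 bytes hold a header that points at the committed root block, and data blocks are written elsewhere, never over blocks reachable from the committed root (the copy-on-write rule). During a transaction blocks may be written in any order; at commit the data is forced to disk first, then the header is switched and forced. A crash before the header reaches disk leaves the old header, which still points at the old, intact data. This is a common shadow-paging pattern, not a description of the XA Triplestore's actual file format.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical layout: bytes 0-7 = header (committed root block number);
// data blocks of BLOCK bytes start at block 1.
final class OrderedCommit {
    static final int BLOCK = 4096;

    // Positioned write of a whole buffer; loops because a write may be partial.
    static void writeFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            pos += ch.write(buf, pos);
        }
    }

    // Data blocks: may be written in any order during the transaction.
    static void writeBlock(FileChannel ch, long blockNo, ByteBuffer data) throws IOException {
        if (blockNo < 1) throw new IllegalArgumentException("block 0 holds the header");
        writeFully(ch, data, blockNo * BLOCK);
    }

    // Commit: the only point where write ordering is imposed.
    static void commit(FileChannel ch, long newRootBlock) throws IOException {
        ch.force(true);                                   // 1. data blocks reach disk
        ByteBuffer header = ByteBuffer.allocate(Long.BYTES).putLong(newRootBlock).flip();
        writeFully(ch, header, 0);                        // 2. switch the root pointer
        ch.force(true);                                   // 3. header reaches disk
    }

    static long committedRoot(FileChannel ch) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(Long.BYTES);
        while (header.hasRemaining()) {
            if (ch.read(header, header.position()) < 0) throw new EOFException("no header");
        }
        return header.flip().getLong();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("xa-sketch", ".db");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            writeBlock(ch, 2, ByteBuffer.wrap("new root".getBytes()));
            writeBlock(ch, 1, ByteBuffer.wrap("new leaf".getBytes()));
            commit(ch, 2);
            System.out.println("committed root block: " + committedRoot(ch));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```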

Use of Java NIO

The XA Triplestore uses the Java (TM) NIO (new I/O) API, which was introduced in Java 2 SDK Version 1.4. The NIO API provides access to advanced I/O facilities that were previously available only to native C programs. The use of NIO allows the XA Triplestore to provide transactions, permanent integrity and good performance while remaining a pure Java implementation.

Some of the features of NIO that are used by the triplestore include:

  • Positioned reads and writes
    NIO file channels allow multiple threads to concurrently read and write different parts of the same file without having to use thread synchronization to protect the current file position.
  • Forcing out dirty buffers to physical storage
    The NIO force operation can be used to ensure that all written data has been forced out to physical storage and can also be used to impose write ordering. This is an essential feature for providing permanent integrity and implementing transaction support.
  • Memory mapped file I/O
    The NIO API can be used to map files into virtual memory. Once a file has been mapped its content is accessed through a NIO ByteBuffer as if it had been loaded into memory. This form of I/O can be much more efficient than I/O that uses explicit read and write calls because it uses the virtual memory paging hardware to eliminate some system call overhead and the overhead of copying data between the operating system buffer cache and the application's buffers.

    On 32-bit platforms the amount of virtual memory available for mapping files is usually limited to less than 2 GB. Because this would restrict the maximum size of database that Mulgara can use on 32-bit platforms, the XA Triplestore has an I/O abstraction layer that allows the I/O mechanism for each file to be selected when the file is opened.

    Mulgara can be started in one of three modes: all files mapped, index files mapped, or no files mapped. Each mode in this sequence allows Mulgara to use larger databases than the one before it. By trading performance for database size in this way, Mulgara can use very large databases on 32-bit platforms while still retaining maximum performance for smaller databases.
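An I/O abstraction layer of this kind can be sketched as an interface with two implementations, chosen when a file is opened. Everything named here (`BlockFile`, `IoMode`, `StartupMode`) is hypothetical and not Mulgara's actual API. The sketch only shows how the same positioned reads and writes can be served either through a memory mapping or through explicit `FileChannel` calls, and how the three startup modes could map onto that per-file choice.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical I/O abstraction: callers read and write longs at absolute
// positions without knowing which mechanism serves the file.
interface BlockFile extends AutoCloseable {
    long readLong(long position) throws IOException;
    void writeLong(long position, long value) throws IOException;
    void force() throws IOException;
    @Override void close() throws IOException;

    enum IoMode { MAPPED, EXPLICIT }

    // Hypothetical mapping of the three startup modes onto a per-file choice.
    enum StartupMode {
        ALL_FILES_MAPPED, INDEX_FILES_MAPPED, NO_FILES_MAPPED;

        public IoMode ioModeFor(boolean indexFile) {
            return switch (this) {
                case ALL_FILES_MAPPED -> IoMode.MAPPED;
                case INDEX_FILES_MAPPED -> indexFile ? IoMode.MAPPED : IoMode.EXPLICIT;
                case NO_FILES_MAPPED -> IoMode.EXPLICIT;
            };
        }
    }

    // The mechanism is fixed when the file is opened.
    static BlockFile open(Path path, long size, IoMode mode) throws IOException {
        FileChannel ch = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        try {
            return mode == IoMode.MAPPED ? new Mapped(ch, size) : new Explicit(ch);
        } catch (IOException | RuntimeException e) {
            ch.close();
            throw e;
        }
    }

    // Memory-mapped access: no system call per access, but one mapping is
    // limited to Integer.MAX_VALUE bytes and consumes virtual address space.
    final class Mapped implements BlockFile {
        private final FileChannel ch;
        private final MappedByteBuffer buf;

        Mapped(FileChannel ch, long size) throws IOException {
            this.ch = ch;
            this.buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size); // grows the file if needed
        }
        public long readLong(long pos) { return buf.getLong(Math.toIntExact(pos)); }
        public void writeLong(long pos, long v) { buf.putLong(Math.toIntExact(pos), v); }
        public void force() { buf.force(); }
        public void close() throws IOException { ch.close(); }
    }

    // Explicit positioned reads and writes: a system call per access,
    // but no address-space limit on the file size.
    final class Explicit implements BlockFile {
        private final FileChannel ch;

        Explicit(FileChannel ch) { this.ch = ch; }

        public long readLong(long pos) throws IOException {
            ByteBuffer b = ByteBuffer.allocate(Long.BYTES);
            while (b.hasRemaining()) {
                if (ch.read(b, pos + b.position()) < 0) throw new EOFException();
            }
            return b.flip().getLong();
        }
        public void writeLong(long pos, long v) throws IOException {
            ByteBuffer b = ByteBuffer.allocate(Long.BYTES).putLong(v).flip();
            while (b.hasRemaining()) ch.write(b, pos + b.position());
        }
        public void force() throws IOException { ch.force(true); }
        public void close() throws IOException { ch.close(); }
    }
}
```

Because the choice is made per file, a mixed mode such as "index files mapped" simply opens index files with `MAPPED` and all other files with `EXPLICIT`.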