At last we have a candidate for the DB. After the Xesam dude (Mikkel Kamstrup Erlandsen) joined the team, he took my last proposal for a new DB design and did his voodoo on it to end up with this candidate.

Again, please forgive me for using UML class diagram tools to design the DB.
Here you can read more and get into details. I will try to finish the implementation by Monday, leaving the current interface untouched.
Xesam, as described on its page, is:
short for eXtEnsible Search And Metadata and is an umbrella project with the purpose of providing unified APIs and specs for desktop search- and metadata services. We are collaborating with several projects such as Tracker, Strigi, Beagle, Pinot, Recoll, and Nepomuk-KDE.
This will allow us to use annotations such as tags, bookmarks, and custom comments. It’s also good preparation for an optional Tracker backend, as well as for co-operation with other FLOSS projects involved with RDF semantics, while still being a project of our own. I am worried, however, about how it is going to scale. But I guess it’s nothing that some indexing and tweaking won’t fix.
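To make the idea of "annotations on an item" a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (item, annotation, kind, value) are invented for this example and are not the actual Zeitgeist schema; the index at the end is only meant to show the kind of indexing that could help with scaling.

```python
import sqlite3

# Hypothetical schema, for illustration only (not the real Zeitgeist design).
SCHEMA = """
CREATE TABLE item (
    id  INTEGER PRIMARY KEY,
    uri TEXT UNIQUE NOT NULL
);
CREATE TABLE annotation (
    id      INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES item(id),
    kind    TEXT NOT NULL,   -- 'tag', 'bookmark' or 'comment'
    value   TEXT
);
-- Index so that lookups by item and kind do not scan the whole table.
CREATE INDEX annotation_item_kind ON annotation (item_id, kind);
"""


def open_db(path=":memory:"):
    """Open a database and create the example tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def annotate(conn, uri, kind, value=None):
    """Attach one annotation to the item with the given URI."""
    conn.execute("INSERT OR IGNORE INTO item (uri) VALUES (?)", (uri,))
    (item_id,) = conn.execute(
        "SELECT id FROM item WHERE uri = ?", (uri,)
    ).fetchone()
    conn.execute(
        "INSERT INTO annotation (item_id, kind, value) VALUES (?, ?, ?)",
        (item_id, kind, value),
    )
    conn.commit()


def annotations(conn, uri, kind):
    """Return the values of all annotations of one kind, oldest first."""
    rows = conn.execute(
        "SELECT a.value FROM annotation a JOIN item i ON i.id = a.item_id "
        "WHERE i.uri = ? AND a.kind = ? ORDER BY a.id",
        (uri, kind),
    )
    return [row[0] for row in rows]
```

With this layout a tag, a bookmark, and a comment are all rows in the same table and differ only in their kind, so adding a new annotation type would not need a schema change.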
What’s the reason for creating your own database instead of using, for example, Tracker’s? I think tight integration with Tracker would be a big plus.
Some people don’t use Tracker! How should they use Zeitgeist? This DB module allows us to use Tracker as an OPTIONAL backend!
Note that in the tracker-store branch we are working on separating the storage and query engine from the indexing, crawling and monitoring. This means that the tracker-store service will be separately packageable and won’t even have any inotify things running. It’s merely a service that serves SPARQL+Nepomuk and SPARQL UPDATE+Nepomuk, is activatable over DBus, and can even be taken down during inactivity.
Which means that you can see tracker-store as a database. Except that instead of SQL and a normalized schema, it talks SPARQL, RDF and Nepomuk. It means that it’s designed to cope with truly large amounts of data within a mobile setting (we decomposed, or denormalized, our storage for better performance: we opted for fast queries at the cost of somewhat slower storage).
Meanwhile, it’s being designed specifically for mobile purposes. This of course doesn’t mean that Tracker ain’t useful for a desktop. It just means that our (Nokia) team’s focus is a mobile one.
For GNOME I’m not sure why you’d still want your own store. I’d just opt for Tracker, personally. That way you can focus on your creativity and ideas for Zeitgeist instead of on the boring tasks of storing things yourself. Zeitgeist should for example start using SPARQL internally too. You probably don’t want to implement SPARQL yourself.
As for raw throughput speed, I have been experimenting with a light unix-socket IPC mechanism which allows you to INSERT data at a throughput rate of 100,000 items per 1.3 seconds (it scales linearly). Ensured storage (with the COMMIT) is of course a bit slower than this throughput, but we’ll enqueue your store requests in a queue (and we call you back when final storage and COMMIT have succeeded). Take a look at the branch tracker-store-ipc for that experimental stuff.
Again, at this point we have some dependents, so until October we have to stick to what we know. Depending on Tracker at the moment, even just for storage, is too risky, especially since we have a GSoC project running and some deadlines. The current DB will help with a merge later. But please understand that full dependencies are not in our interest at the moment. We do intend to work with you guys; otherwise we would not have Natan working on the optional Tracker backend.
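For scale, 100,000 items per 1.3 seconds works out to roughly 77,000 items per second. The enqueue-then-call-back pattern described above can be sketched in a few lines of Python. This is not Tracker's code and does not use its unix-socket IPC: the QueuedStore class, its method names, and the use of sqlite3 as the store are all assumptions made purely to illustrate the idea.

```python
import queue
import sqlite3
import threading


class QueuedStore:
    """Hypothetical store: insert() returns at once, a worker commits in batches."""

    def __init__(self, path=":memory:", batch_size=1000):
        self._path = path
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def insert(self, uri, on_stored):
        """Enqueue a store request; on_stored(uri) is called after the COMMIT."""
        self._queue.put((uri, on_stored))

    def close(self):
        """Flush everything that is still queued and stop the worker."""
        self._queue.put(None)  # sentinel telling the worker to finish
        self._worker.join()

    def _run(self):
        conn = sqlite3.connect(self._path)  # used only from this thread
        conn.execute("CREATE TABLE IF NOT EXISTS item (uri TEXT)")
        done = False
        while not done:
            batch = [self._queue.get()]           # block until there is work
            while len(batch) < self._batch_size:  # then take what is already waiting
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            if None in batch:
                done = True
                batch = [entry for entry in batch if entry is not None]
            if batch:
                conn.executemany(
                    "INSERT INTO item (uri) VALUES (?)",
                    [(uri,) for uri, _ in batch],
                )
                conn.commit()  # storage is only "ensured" from here on
                for uri, on_stored in batch:
                    on_stored(uri)  # runs in the worker thread
        conn.close()
```

The caller never waits for the disk: it hands over requests as fast as it can produce them, and one COMMIT covers a whole batch, which is why raw throughput can be higher than the rate of ensured storage.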
Hi Philip,
As Seif said, we’re very excited by the work that’s been done with Tracker lately, but we don’t want to lock ourselves into one backend. (Especially not before Tracker 0.7 is released.)
In another two weeks, as soon as my exams are over, I’m going to get back to working on the Tracker backend. We’ll definitely consider using it by default on computers where Tracker is already installed.
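A rough sketch of what "use Tracker by default where it is already installed" could look like, in Python. Everything here is hypothetical: the class names, the choose_backend function, and in particular the detection rule, which simply assumes that a "tracker-store" executable on the PATH means Tracker is available.

```python
import shutil


class SqliteBackend:
    """Built-in store; always available."""
    name = "sqlite"


class TrackerBackend:
    """Placeholder for the optional Tracker-backed store."""
    name = "tracker"


def tracker_installed(which=shutil.which):
    # Assumption: a "tracker-store" executable on PATH means Tracker is installed.
    return which("tracker-store") is not None


def choose_backend(preferred=None, which=shutil.which):
    """Honour an explicit choice; otherwise use Tracker only if it is present."""
    if preferred == "sqlite":
        return SqliteBackend()
    if preferred == "tracker" or (preferred is None and tracker_installed(which)):
        return TrackerBackend()
    return SqliteBackend()
```

The point of the design is that the built-in store is always the fallback, so nobody is forced to install Tracker, while people who already have it get it without any configuration.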
Seif, Natan: that’s great to hear. Thanks.
hey, fyi from the “freedesktop” perspective beyond the current task at hand:
the situation here is similar to what happened in KDE 4.0 in 2007/2008. There, the NEPOMUK backend was added as a metadata/search store with similar goals as Tracker has. Also the technology is very similar (nepomuk’s store is also rdf+sparql+nepomuk, as is Tracker’s).
The result is that on KDE the performance of the store is “ok” and is currently being improved to “better”, and many applications are now moving to the new metadata store. so thinking about this early can help two years later, when a whole desktop/mobile platform can be integrated.
also, if the db is too slow, it’s possible to combine it with a relational db, as openlinksw’s open source Virtuoso store does (hardcore combination of sparql and sql in one query language and store implementation)