Friday, October 30, 2009

Mashing up data from many, many site collections in SharePoint

One large architectural hurdle our team faced recently was displaying real-time data from across thousands of site collections. A normal query across that many site collections (or even sites) is unacceptably slow: it can easily take more than 30 seconds.

Aggregating data like this is useful for a variety of reasons, for example when it feeds dynamic and summary displays on a user's homepage (dashboard).

Part of the solution for us was to query the search index for the data. The data had already been 'queried' and indexed as part of the search engine's regular crawl operations, so it was there and available to query via the object model or a web service. This won't, however, give you a real-time view of the data, only the data as of the crawler's last crawl (crawls could run every 5 minutes, or much less often).

So the second part of our solution involved creating a cache of the list items from across the site collections. Each list got an event receiver that updates the cache whenever an item is added, changed or deleted.
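The receiver logic can be sketched in a language-neutral way. The Python below is only an in-memory model, not SharePoint's object model: `MasterCache`, `CachedItem`, the handler names and the field names are all hypothetical, and a real receiver would write to a SharePoint list rather than a dictionary. Keeping a "deleted" marker instead of removing the row is also an assumption, made so that a deletion can later override a stale search result.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CachedItem:
    site_collection: str   # URL of the source site collection (hypothetical field)
    item_id: int           # ID of the item within its source list
    title: str
    modified: datetime     # when the change happened
    deleted: bool = False  # marker row for deletions (an assumption, see above)


class MasterCache:
    """In-memory stand-in for the cache that the event receivers maintain."""

    def __init__(self):
        self._rows = {}

    def on_added_or_updated(self, site_collection, item_id, title, modified):
        # Added and changed items are handled the same way: overwrite the row.
        key = (site_collection, item_id)
        self._rows[key] = CachedItem(site_collection, item_id, title, modified)

    def on_deleted(self, site_collection, item_id, deleted_at):
        # Keep a marker row rather than dropping the entry.
        key = (site_collection, item_id)
        self._rows[key] = CachedItem(
            site_collection, item_id, "", deleted_at, deleted=True
        )

    def rows(self):
        # Return a copy so callers cannot mutate the cache by accident.
        return dict(self._rows)


# Example: one item is added, edited, then deleted; another is only added.
cache = MasterCache()
t = lambda minute: datetime(2009, 10, 30, 12, minute, tzinfo=timezone.utc)
cache.on_added_or_updated("http://sites/a", 1, "Draft", t(0))
cache.on_added_or_updated("http://sites/a", 1, "Final", t(5))
cache.on_added_or_updated("http://sites/b", 7, "Budget", t(6))
cache.on_deleted("http://sites/a", 1, t(9))
```

After these calls the cache holds two rows: a deletion marker for item 1 in site collection "a" and a live row for item 7 in site collection "b".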

The cache lives in a master cache list on a top-level site collection. Now we can query both that list and the search index, take whichever result is newer, and get real-time mashups of data from across thousands of site collections.
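The "take whichever is newer" step can be sketched as a simple merge. This is a minimal illustration in Python, not our production code and not a SharePoint API: the function name `merge_newest`, the row layout (dictionaries with `title`, `modified` and an optional `deleted` flag) and the tie-breaking rule in favour of the cache are all assumptions made for the example.

```python
from datetime import datetime, timezone


def merge_newest(search_rows, cache_rows):
    """Merge two {key: row} mappings, keeping the most recently modified row per key.

    Rows flagged as deleted are dropped from the final result.
    On a tie the cache row wins (an assumption for this sketch).
    """
    merged = dict(search_rows)
    for key, row in cache_rows.items():
        current = merged.get(key)
        if current is None or row["modified"] >= current["modified"]:
            merged[key] = row
    return {k: r for k, r in merged.items() if not r.get("deleted", False)}


def at(minute):
    return datetime(2009, 10, 30, 12, minute, tzinfo=timezone.utc)


# Results from the search index, as of the last crawl.
search_rows = {
    ("http://sites/a", 1): {"title": "Draft", "modified": at(0)},
    ("http://sites/b", 7): {"title": "Budget", "modified": at(1)},
    ("http://sites/c", 3): {"title": "Old news", "modified": at(2)},
}

# Rows from the master cache list, written after the last crawl.
cache_rows = {
    ("http://sites/a", 1): {"title": "Final", "modified": at(5)},
    ("http://sites/c", 3): {"title": "", "modified": at(6), "deleted": True},
    ("http://sites/d", 9): {"title": "Brand new", "modified": at(7)},
}

result = merge_newest(search_rows, cache_rows)
```

Here the edited item comes from the cache, the untouched item comes from search, the deleted item disappears, and the item created since the last crawl shows up even though search has not indexed it yet.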