Lars Fastrup, Independent Consultant

SharePoint 2010 Search…What’s Next for SharePoint Pros

Sponsored by BA-Insight www.ba-insight.net March 2010

SharePoint 2010 Search - What’s next for IT-Pros

SharePoint 2010 Enterprise Search includes many relevant improvements for IT professionals. The biggest change is without doubt the new and more scalable deployment architecture. Beyond that, SharePoint 2010 offers many other useful changes, including an improved administration dashboard, built-in administration reports for monitoring the performance of the search engine over time, and complete PowerShell support for scriptable administration. Last but not least, SharePoint 2010 ships with an improved connector framework that lets IT-Pros easily configure indexing of remote repositories. The following sections introduce these improvements in more technical detail.

New Scale-Out Architecture

The search engine in Microsoft Office SharePoint Server 2007 suffered from a number of scalability problems that have all been addressed in SharePoint 2010 by the introduction of a new scale-out architecture. The scalability issues found in MOSS 2007 and resolved in SP2010 include the following:

·         High query latency and slow crawls when the search index grows to millions of items. The official limit of 50 million items per index does not perform well in practice.

·         A non-redundant index server role, making it a single point of failure and a performance bottleneck with respect to crawl speed.

·         A non-redundant property database, making it a single point of failure and a performance bottleneck with respect to crawl speed as well as query latency.

The SharePoint 2010 search engine introduces a new and highly componentized deployment architecture to resolve these scalability issues. The components that IT-Pros must learn how to deploy are: one Administration Component, one Administration Database, one or more Query Components, one or more Crawl Components, one or more Property Databases and one or more Crawl Databases. This componentization of the search engine offers the following features and benefits:

·         Index Partitioning enabling a search index to be partitioned across multiple query servers, which will in turn work in parallel on each query. This enables deployment architectures with sub-second query latency up to about 100 million indexed items.

·         Index Mirroring enabling query failover by cross mirroring the search index on the query servers (passive mirroring) or mirroring it to a parallel set of query servers (active mirroring).

·         Multiple Stateless Crawlers offering improved crawl performance and high availability of crawls. Stateless refers to the fact that the crawlers are redundant and do not keep a copy of the index on the server, as was the case with the index server in MOSS 2007. Consequently, crawlers have a low disk space requirement.

·         Multiple Crawl Databases for improved crawl performance. Supports native SQL mirroring for failover.

·         Multiple Property Databases for improved query performance. Supports native SQL mirroring for failover.

Figure 1 shows a sample deployment with a partitioned and mirrored search index, multiple property databases, multiple crawlers and multiple crawl databases.

Figure 1: Sample SharePoint 2010 Search deployment.
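A partitioned and cross-mirrored index of the kind described above can also be set up from the SharePoint 2010 Management Shell. The following is a minimal sketch, not a complete deployment script. It assumes a farm with a single Search Service Application and two query servers whose names, QUERY1 and QUERY2, are hypothetical, and it assumes the search service instance is already started on both. Verify each Cmdlet and parameter with Help before use, since details may differ between builds.

```powershell
# Sketch only. QUERY1 and QUERY2 are hypothetical server names.
# Assumes exactly one Search Service Application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Create a new (inactive) query topology with two index partitions.
$topology = New-SPEnterpriseSearchQueryTopology -SearchApplication $ssa -Partitions 2
$partitions = @(Get-SPEnterpriseSearchIndexPartition -QueryTopology $topology)

$query1 = Get-SPEnterpriseSearchServiceInstance -Identity "QUERY1"
$query2 = Get-SPEnterpriseSearchServiceInstance -Identity "QUERY2"

# Primary query components: one partition per query server.
New-SPEnterpriseSearchQueryComponent -QueryTopology $topology `
    -IndexPartition $partitions[0] -SearchServiceInstance $query1
New-SPEnterpriseSearchQueryComponent -QueryTopology $topology `
    -IndexPartition $partitions[1] -SearchServiceInstance $query2

# Cross mirrors: each server also holds a failover-only copy of the
# other server's partition.
New-SPEnterpriseSearchQueryComponent -QueryTopology $topology `
    -IndexPartition $partitions[0] -SearchServiceInstance $query2 -FailoverOnly
New-SPEnterpriseSearchQueryComponent -QueryTopology $topology `
    -IndexPartition $partitions[1] -SearchServiceInstance $query1 -FailoverOnly

# Make the new topology the active one.
Set-SPEnterpriseSearchQueryTopology -Identity $topology -Active
```

Building the topology in an inactive state and activating it as the last step means the query servers keep serving from the existing topology until the new one is complete.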

Improved Administration Experience

The consolidated administration dashboard introduced with the MOSS 2007 infrastructure update has been carried over and improved in SharePoint 2010. The search administration experience will therefore feel familiar to administrators who have worked with MOSS 2007. The dashboard provides IT administrators with a quick overview of the state of the search engine and easy access to its configuration. Significant improvements include:

·         Topology editor for adding, updating and removing search components in a deployment.

·         Support for managing custom content sources directly in the Web UI (this required custom code in MOSS 2007).

·         Support for regular expressions in Crawl Rules.

·         Ability to prioritize Content Sources.

·         Improved Web analytics reports for monitoring search usage.

·         New administration reports to monitor the performance of query components and crawl components in a deployment.

·         Web part based dashboard page allowing for easy customization with custom Web parts.

·         Advanced monitoring through Microsoft System Center Operations Manager (SCOM).
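Two of the items in this list, regular expressions in Crawl Rules and Content Source priorities, can also be configured from PowerShell. The sketch below is illustrative only: the URL pattern and the Content Source name "Team Sites" are hypothetical, and it assumes a single Search Service Application in the farm.

```powershell
# Sketch only. The URL pattern and content source name are hypothetical.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Exclude every URL under any "archive" folder, using a regular expression.
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa `
    -Path "http://intranet/.*/archive/.*" `
    -Type ExclusionRule `
    -IsAdvancedRegularExpression $true

# Raise the priority of an existing content source (1 = Normal, 2 = High).
Set-SPEnterpriseSearchCrawlContentSource -Identity "Team Sites" `
    -SearchApplication $ssa -CrawlPriority 2
```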

The screenshot in Figure 2 illustrates the look and feel of the dashboard, and Figure 3 shows a sample report on the crawl rate over time.

Figure 2: Consolidated Administration Dashboard


 

Figure 3: Report on crawl-rate over time

 

PowerShell Support

Say goodbye to STSADM and hello to Windows PowerShell - virtually every administrative operation in SharePoint 2010 is now scriptable through a rich palette of PowerShell Cmdlets[1]. Enterprise Search is no exception here – it ships with 100+ PowerShell Cmdlets enabling scripted administration of search artifacts such as:

·         Search Service Application

·         Crawl, Query and Database components

·         Content sources

·         Crawl rules

·         Crawled metadata properties

·         Managed metadata properties

·         Search scopes

·         Ranking model

·         And much more…

Executing PowerShell commands is easy: simply log in to the server, launch the SharePoint 2010 Management Shell from the Windows Start menu and type the PowerShell command or script to execute. Figure 4 below shows how to add a new Content Source for crawling a file share using the Cmdlet named New-SPEnterpriseSearchCrawlContentSource.

Figure 4: Sample command executed from the SharePoint 2010 Management Shell
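For readers without access to the figure, a command of this kind might look like the following sketch. It is not a transcription of the figure: the Content Source name and the file share address are hypothetical, and a single Search Service Application is assumed.

```powershell
# Sketch only. The name and start address below are hypothetical.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Add a content source that crawls a file share.
New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Type File `
    -Name "Engineering File Share" `
    -StartAddresses "file://fileserver01/engineering"

# List the content sources to confirm the new one was created.
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa |
    Format-Table Name, Type
```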

To list all SharePoint 2010 Cmdlets, type:

Get-Command -PSSnapin "Microsoft.SharePoint.PowerShell" | Format-Table Name

To view the usage of a Cmdlet, type:

Help <Name of Cmdlet>

To view the detailed usage of a Cmdlet, type:

Help <Name of Cmdlet> -full
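To narrow the list to the Enterprise Search Cmdlets, one option is to filter on the Cmdlet noun. The sketch below assumes that the search Cmdlets share the noun prefix SPEnterpriseSearch, which holds for the Cmdlet named above but may not cover every search-related command.

```powershell
# List the Enterprise Search Cmdlets, sorted by name.
Get-Command -Noun "SPEnterpriseSearch*" | Sort-Object Name | Format-Table Name

# Count them.
Get-Command -Noun "SPEnterpriseSearch*" | Measure-Object
```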

 

Improved Connector Framework

The SharePoint 2010 Enterprise Search engine also ships with a new connector framework that leverages the new Business Connectivity Services (BCS)[2] to index external content. Together with improved tool support in SharePoint Designer 2010, the framework enables administrators to configure the indexing of external content through the following generic connectors:

·         Database connector

·         Windows Communication Foundation (WCF) / Web Services connector

·         .NET connector with callouts to custom code

Developers can additionally develop custom connectors in managed code (.NET) to efficiently index any custom repository not supported by the BCS. The connector framework supports indexing of structured content (rows and columns) and unstructured content (documents) along with security descriptors (ACLs) on each item. The latter enables automatic security trimming of search results at query time. This is a big improvement over the Business Data Catalog in MOSS 2007, which can only index structured data without associated security descriptors.

These improvements over the BDC eliminate the need to develop complex Protocol Handlers to index documents and security information from custom repositories. The Protocol Handler connectivity framework is still present, however, and SharePoint 2010 uses it to index SharePoint content, file shares, Web sites and people profiles. The new connector framework is used when indexing content from Lotus Notes, Exchange Public Folders and Documentum.

Figure 5 outlines the overall architecture of the new connector framework.

Figure 5: Connector Framework Architecture

 

 



[1] A cmdlet is a lightweight command that is used in the Windows PowerShell environment.

[2] Business Connectivity Services (BCS) is the evolution of the Business Data Catalog (BDC) in Microsoft Office SharePoint Server 2007.