Welcome to Part 2 of the Sitecore Search series. In the first blog, we covered an introduction to Sitecore Search: why Sitecore Search, its benefits, and more. In this blog, we will get started with the Sitecore Search CEC portal and explore all of its features.
To get started, you need login credentials for the CEC portal: https://cec.sitecorecloud.io/.
Navigating the CEC Portal
- Site performance — The CEC homepage: a quick overview of site performance analytics and health.
- Pages — Manage your search pages and their associated widgets that render search experiences.
- Widgets — Create and configure reusable UI components (widgets) that display search results and features.
- Analytics — Access reports and metrics that show how your search implementation is performing.
- Global resources — Configure system-level resources and global widget settings used across pages.
- Content collection — Browse and validate all indexed content and visitor data in your domain.
- Sources — Define and configure content sources (feeds, crawlers, uploads) and their extractors for indexing.
- Developer resources — Tools and API references for developers to test and work with Sitecore Search services.
- Administration — Manage domains, users, roles, and other administrative settings for your tenant.
Types of Sources & Connectors in Sitecore Search
In Sitecore Search, a Source tells the platform where and how to pull in content. Each source type is designed for different content scenarios:
- Web crawler — simple pull crawler that follows links (or a sitemap) and indexes HTML/PDF/Office content; good for single-locale public websites.
- Advanced web crawler — enhanced crawler with domain restrictions, parallel workers, JS rendering, auth, multiple-entity extraction and locale handling (use when site is complex or gated).
- Feed crawler — ingest structured files (CSV/JSON/delimited) or SFTP uploads and transform rows/objects into index documents (use for batch product/catalog feeds).
- API crawler — pulls JSON from API endpoints and uses JSONPath/JS extractors to create documents (use for headless CMS APIs, XM/Edge, or product APIs).
- API Push — push-only source that creates an empty index you populate via the Ingestion API (ideal for real-time/event-driven content or frequent small updates); see the sketch after this list.
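For the API Push option, ingestion happens entirely from your side via the Ingestion API. Below is a minimal sketch of pushing a single document, assuming a Node 18+/TypeScript environment; the domain ID, source ID, document ID, API key, and payload shape are placeholders to replace with the real values from the Developer resources section in CEC.

```
// Minimal sketch: pushing one document to an API Push source.
// <DOMAIN_ID>, <SOURCE_ID>, <DOC_ID> and <API_KEY> are placeholders;
// the endpoint path and payload shape are assumptions - confirm them
// under Developer resources in CEC before use.
const INGESTION_URL =
  "https://discover.sitecorecloud.io/ingestion/v1" +
  "/domains/<DOMAIN_ID>/sources/<SOURCE_ID>/entities/content/documents/<DOC_ID>";

async function pushDocument(): Promise<void> {
  const res = await fetch(INGESTION_URL, {
    method: "PUT", // create or update the document with this ID
    headers: {
      "Content-Type": "application/json",
      Authorization: "<API_KEY>",
    },
    body: JSON.stringify({
      document: {
        fields: {
          name: "Getting started with Sitecore Search",
          url: "https://www.example.com/blog/getting-started",
          description: "Part 2 of the Sitecore Search series.",
          type: "blog",
        },
      },
    }),
  });
  console.log(res.status); // a 2xx status means the document was accepted
}
```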
Connectors and document extractors
A Connector in CEC is the selected ingestion mechanism (the “type” you pick when creating a Source) — e.g., Web Crawler, Feed Crawler, API Crawler, or API Push. It determines how Search reaches your content (pull vs. push) and which extraction/transform features are available. A document extractor, in turn, defines how attribute values are pulled out of each fetched document. Sitecore Search supports several extractor types:
- XPath — HTML/XML extraction (Web crawler / Advanced web crawler).
- CSS selector — HTML/CSS extraction (Advanced web crawler).
- JavaScript function / Cheerio-style JS — complex/custom extraction or transformations (Advanced web crawler, API crawler, some feed scenarios); see the sketch after this list.
- JSONPath — parse API/JSON payloads (API crawler; also used when extracting from JSON feeds).
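To make this concrete, here is a sketch of a Cheerio-style JavaScript extractor of the kind the Advanced web crawler accepts. The extract(request, response) signature follows the pattern shown in the CEC editor, but treat it as an assumption and verify it there; the attribute names mirror the crawler defaults.

```
// Sketch of a Cheerio-style document extractor (Advanced web crawler).
// The extract(request, response) signature is the pattern shown in the
// CEC editor; verify it there before relying on it.
function extract(request, response) {
  const $ = response.body; // Cheerio-like handle over the fetched HTML
  return [
    {
      name: $("meta[property='og:title']").attr("content"),
      url: $("meta[property='og:url']").attr("content"),
      image_url: $("meta[property='og:image']").attr("content"),
      description: $("meta[property='og:description']").attr("content"),
      type: "blog", // a constant value instead of an extracted one
    },
  ];
}
```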
Now that we understand sources, connectors, and document extractors, let’s walk through creating a Web Crawler source step by step.
Create and configure your first source
For this blog post, I’ll show you how to create a Web Crawler source, as that is the most common source you’ll use.
Go to Sources and click Add Source.
Provide a source name, a connector, and a short description, then save it.
The source will be created and you can see some basic information about it: the Source ID, the status, and the last run status and time (empty, since the source has never been run). The source is still in Draft status and the Publish button is disabled, because some required configuration must be completed before we can publish it.
The source comes with pre-configured attribute extraction rules. By default, these map the name, url, image_url, description, and type attributes to the Open Graph tags in the HTML markup. If needed, you can remove, add, or update rules to extract values from your own specific HTML tags by clicking the Edit button. I have modified mine to add some additional attributes and remove a few of the defaults.
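For reference, the default Open Graph mapping (and a couple of custom additions like mine) can be expressed as XPath rules along these lines; the attribute names in the last two rows are examples, not tenant defaults:

```
// Illustrative XPath expressions for attribute extraction rules.
// The first three mirror the defaults; the last two show the kind of
// custom additions you can make in the Edit dialog (names are examples).
const xpathRules: Record<string, string> = {
  name: "//meta[@property='og:title']/@content",              // default
  description: "//meta[@property='og:description']/@content", // default
  image_url: "//meta[@property='og:image']/@content",         // default
  author: "//meta[@name='author']/@content",                  // custom
  headline: "normalize-space(//h1)",                          // custom
};
```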
Next, in the Web Crawler Settings section, click Edit and enter the URL of your website in the URL field. Additional settings, such as the maximum allowed URL depth and the maximum number of URLs, can also be configured here.
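Expressed as a config sketch, the settings in this section look roughly like the following; the property names are my own shorthand for the CEC form fields, not an exported format:

```
// Shorthand for the Web Crawler Settings form fields (the names are
// mine, not an official export format).
const webCrawlerSettings = {
  triggerUrl: "https://www.example.com/", // starting point of the crawl
  maxDepth: 3,   // how many link hops from the trigger URL to follow
  maxUrls: 5000, // upper bound on pages crawled per run
};
```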
Next, we can also set the scan frequency. This automatically re-scans the website periodically, so that any content updates are re-crawled and reflected in our search index.
As the last step, click the ‘Publish’ button in the top-right corner to publish our new web crawler source.

Once published, your crawled content will be available in the content collection. Navigate to Content collection, expand the entity under which you created your source, then filter by your source in the top navigation (by default you will see content from all sources). You will see the content of all your crawled pages mapped to attribute fields.
This confirms that our source is ready to be queried via the Search and Recommendation API, so we can start integrating it with our website.
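As a preview of that integration, here is a minimal sketch of a keyphrase query against the Search and Recommendation API. The endpoint URL, API key, and widget ID are placeholders, and the payload shape is an assumption based on the common pattern; copy the exact values and request format from Developer resources in CEC.

```
// Minimal sketch of a keyphrase query against the Search and
// Recommendation API. <DOMAIN_ID>, <API_KEY> and the widget rfk_id are
// placeholders; confirm the endpoint and payload in Developer resources.
const SEARCH_URL = "https://discover.sitecorecloud.io/discover/v2/<DOMAIN_ID>";

async function search(keyphrase: string) {
  const res = await fetch(SEARCH_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "<API_KEY>",
    },
    body: JSON.stringify({
      context: { locale: { country: "us", language: "en" } },
      widget: {
        items: [
          {
            rfk_id: "rfkid_7", // the widget ID from the Widgets section
            entity: "content",
            search: { content: {}, query: { keyphrase } },
          },
        ],
      },
    }),
  });
  return res.json();
}
```

The returned documents carry the attribute fields we just validated in the content collection, ready to render in a widget.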