Welcome to Part 2 of the Sitecore Search series. In the first blog, we covered an introduction to Sitecore Search: why Sitecore Search, its benefits, and more. In this blog, we will get started with the Sitecore Search CEC portal and explore all of its features.
To get started, you need login credentials for the CEC portal: https://cec.sitecorecloud.io/.
Navigating the CEC Portal
- Site performance — The CEC homepage: a quick overview of site performance analytics and health.
- Pages — Manage your search pages and their associated widgets that render search experiences.
- Widgets — Create and configure reusable UI components (widgets) that display search results and features.
- Analytics — Access reports and metrics that show how your search implementation is performing.
- Global resources — Configure system-level resources and global widget settings used across pages.
- Content collection — Browse and validate all indexed content and visitor data in your domain.
- Sources — Define and configure content sources (feeds, crawlers, uploads) and their extractors for indexing.
- Developer resources — Tools and API references for developers to test and work with Sitecore Search services.
- Administration — Manage domains, users, roles, and other administrative settings for your tenant.
Types of Sources & Connectors in Sitecore Search
In Sitecore Search, a Source tells the platform where and how to pull in content. Each source type is designed for different content scenarios:
- Web crawler — simple pull crawler that follows links (or a sitemap) and indexes HTML/PDF/Office content; good for single-locale public websites.
- Advanced web crawler — enhanced crawler with domain restrictions, parallel workers, JS rendering, auth, multiple-entity extraction and locale handling (use when site is complex or gated).
- Feed crawler — ingest structured files (CSV/JSON/delimited) or SFTP uploads and transform rows/objects into index documents (use for batch product/catalog feeds).
- API crawler — pulls JSON from API endpoints and uses JSONPath/JS extractors to create documents (use for headless CMS APIs, XM/Edge, or product APIs).
- API Push — push-only source that creates an empty index you populate via the Ingestion API (ideal for real-time/event-driven content or frequent small updates); see the sketch after this list.
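For the API Push option, ingestion happens entirely from your side via the Ingestion API. Below is a minimal sketch of pushing a single document, assuming a Node 18+/TypeScript environment; the domain ID, source ID, document ID, API key, and payload shape are placeholders to replace with the real values from the Developer resources section in CEC.

```
// Minimal sketch: pushing one document to an API Push source.
// <DOMAIN_ID>, <SOURCE_ID>, <DOC_ID> and <API_KEY> are placeholders;
// the endpoint path and payload shape are assumptions - confirm them
// under Developer resources in CEC before use.
const INGESTION_URL =
  "https://discover.sitecorecloud.io/ingestion/v1" +
  "/domains/<DOMAIN_ID>/sources/<SOURCE_ID>/entities/content/documents/<DOC_ID>";

async function pushDocument(): Promise<void> {
  const res = await fetch(INGESTION_URL, {
    method: "PUT", // create or update the document with this ID
    headers: {
      "Content-Type": "application/json",
      Authorization: "<API_KEY>",
    },
    body: JSON.stringify({
      document: {
        fields: {
          name: "Getting started with Sitecore Search",
          url: "https://www.example.com/blog/getting-started",
          description: "Part 2 of the Sitecore Search series.",
          type: "blog",
        },
      },
    }),
  });
  console.log(res.status); // a 2xx status means the document was accepted
}
```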
Connectors and document extractors
A Connector in CEC is the selected ingestion mechanism (the “type” you pick when creating a Source) — e.g., Web Crawler, Feed Crawler, API Crawler, or API Push. It determines how Search reaches your content (pull vs. push) and which extraction/transform features are available. A document extractor, in turn, defines how attribute values are pulled out of each fetched document. Sitecore Search supports several extractor types:
- XPath — HTML/XML extraction (Web crawler / Advanced web crawler).
- CSS selector — HTML/CSS extraction (Advanced web crawler).
- JavaScript function / Cheerio-style JS — complex/custom extraction or transformations (Advanced web crawler, API crawler, some feed scenarios); see the sketch after this list.
- JSONPath — parse API/JSON payloads (API crawler; also used when extracting from JSON feeds).
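To make this concrete, here is a sketch of a Cheerio-style JavaScript extractor of the kind the Advanced web crawler accepts. The extract(request, response) signature follows the pattern shown in the CEC editor, but treat it as an assumption and verify it there; the attribute names mirror the crawler defaults.

```
// Sketch of a Cheerio-style document extractor (Advanced web crawler).
// The extract(request, response) signature is the pattern shown in the
// CEC editor; verify it there before relying on it.
function extract(request, response) {
  const $ = response.body; // Cheerio-like handle over the fetched HTML
  return [
    {
      name: $("meta[property='og:title']").attr("content"),
      url: $("meta[property='og:url']").attr("content"),
      image_url: $("meta[property='og:image']").attr("content"),
      description: $("meta[property='og:description']").attr("content"),
      type: "blog", // a constant value instead of an extracted one
    },
  ];
}
```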
Now that we understand sources, connectors, and document extractors, let’s walk through creating a Web Crawler source step by step.
Create and configure your first source
For this blog post, I’ll show you how to create a Web Crawler source, as that is the most common source you’ll use.
Go to Sources and click Add Source.
Provide a source name, a connector, and a short description, then save it.
The source will be created and you can see some basic information about it: the Source ID, the status, and the last run status and time (empty, since the source has never been run). The source is still in Draft status and the Publish button is disabled, because some required configuration must be completed before we can publish it.
The source comes with pre-configured attribute extraction rules. By default, these map the name, url, image_url, description, and type attributes to the Open Graph tags in the HTML markup. If needed, you can remove, add, or update rules to extract values from your own specific HTML tags by clicking the Edit button. I have modified mine to add some additional attributes and remove a few of the defaults.
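For reference, the default Open Graph mapping (and a couple of custom additions like mine) can be expressed as XPath rules along these lines; the attribute names in the last two rows are examples, not tenant defaults:

```
// Illustrative XPath expressions for attribute extraction rules.
// The first three mirror the defaults; the last two show the kind of
// custom additions you can make in the Edit dialog (names are examples).
const xpathRules: Record<string, string> = {
  name: "//meta[@property='og:title']/@content",              // default
  description: "//meta[@property='og:description']/@content", // default
  image_url: "//meta[@property='og:image']/@content",         // default
  author: "//meta[@name='author']/@content",                  // custom
  headline: "normalize-space(//h1)",                          // custom
};
```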
Next, in the Web Crawler Settings section, click Edit and enter the URL of your website in the URL field. Additional settings, such as the maximum allowed URL depth and the maximum number of URLs, can also be configured here.
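Expressed as a config sketch, the settings in this section look roughly like the following; the property names are my own shorthand for the CEC form fields, not an exported format:

```
// Shorthand for the Web Crawler Settings form fields (the names are
// mine, not an official export format).
const webCrawlerSettings = {
  triggerUrl: "https://www.example.com/", // starting point of the crawl
  maxDepth: 3,   // how many link hops from the trigger URL to follow
  maxUrls: 5000, // upper bound on pages crawled per run
};
```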
Next, we can also set the scan frequency. This automatically re-scans the website periodically, so that any content updates are re-crawled and reflected in our search index.
As the last step, click the ‘Publish’ button in the top-right corner to publish our new web crawler source.

Once published, your crawled content will be available in the content collection. Navigate to Content collection, expand the entity under which you created your source, then filter by your source in the top navigation (by default you will see content from all sources). You will see the content of all your crawled pages mapped to attribute fields.
This confirms that our source is ready to be queried via the Search and Recommendation API, so we can start integrating it with our website.
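As a preview of that integration, here is a minimal sketch of a keyphrase query against the Search and Recommendation API. The endpoint URL, API key, and widget ID are placeholders, and the payload shape is an assumption based on the common pattern; copy the exact values and request format from Developer resources in CEC.

```
// Minimal sketch of a keyphrase query against the Search and
// Recommendation API. <DOMAIN_ID>, <API_KEY> and the widget rfk_id are
// placeholders; confirm the endpoint and payload in Developer resources.
const SEARCH_URL = "https://discover.sitecorecloud.io/discover/v2/<DOMAIN_ID>";

async function search(keyphrase: string) {
  const res = await fetch(SEARCH_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "<API_KEY>",
    },
    body: JSON.stringify({
      context: { locale: { country: "us", language: "en" } },
      widget: {
        items: [
          {
            rfk_id: "rfkid_7", // the widget ID from the Widgets section
            entity: "content",
            search: { content: {}, query: { keyphrase } },
          },
        ],
      },
    }),
  });
  return res.json();
}
```

The returned documents carry the attribute fields we just validated in the content collection, ready to render in a widget.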