Key features
- Glean captures Confluence pages (including their hierarchical parent/child structure), blog posts, attachment metadata, footer comments, and more.
- Glean respects all user access permissions, so users only see search results for documents they can access. When a user clicks a search result, they are taken to the Confluence web application, which also enforces its own permissions.
- The connector provides comprehensive data coverage, including metadata, identity data, permissions data, and activity data. Webhooks provide near real-time synchronization, so updates and permission changes are reflected in search results shortly after they happen.
- All data is stored in the cloud project within the customer’s cloud account (Glean or customer hosted), ensuring no data leaves the customer’s environment.
- Glean uses Atlassian’s standard REST API for Confluence to ingest all data.
- Near real-time freshness via webhooks; the Glean plugin provides view activity signals that improve ranking.
Versions supported
- Supports Confluence Data Center/Server versions 7.4 and above. To get the version and other information about your instance, use the Applinks manifest endpoint /rest/applinks/1.0/manifest.
- Glean also supports Confluence Cloud. For more information, see the Confluence Cloud Connector.
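The version check can be scripted. The sketch below is a minimal illustration, assuming the Applinks manifest endpoint (/rest/applinks/1.0/manifest) returns XML with a top-level `<version>` element; if your instance responds with JSON or a namespaced document, adapt the parser. The function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

MIN_SUPPORTED = (7, 4)  # Confluence Data Center/Server 7.4 and above


def parse_manifest_version(xml_text: str) -> tuple:
    """Extract the version from an Applinks manifest (assumed: XML with a <version> element)."""
    root = ET.fromstring(xml_text)
    version = root.findtext("version")
    if version is None:
        raise ValueError("manifest has no <version> element")
    # Keep the leading numeric components, e.g. "7.19.2" -> (7, 19, 2).
    parts = []
    for piece in version.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_supported(version: tuple) -> bool:
    """Compare major.minor against the minimum supported release."""
    return version[:2] >= MIN_SUPPORTED


def fetch_manifest(base_url: str, timeout: float = 10.0) -> str:
    """GET the manifest from a live instance (network call, not exercised below)."""
    url = base_url.rstrip("/") + "/rest/applinks/1.0/manifest"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


sample = "<manifest><typeId>confluence</typeId><version>7.19.2</version></manifest>"
print(parse_manifest_version(sample), is_supported(parse_manifest_version(sample)))
# (7, 19, 2) True
```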
Indexed content and data
The Glean Confluence connector crawls three distinct types of data (Content, Identity, and Activity) to keep the index fast, comprehensive, and securely managed.
Content
- Pages, blog posts, spaces, attachments (metadata), footer comments.
- Blog posts have no hierarchy; crawled via standard content listing APIs.
- Restricted pages can be indexed if the connector is granted access; permissions remain enforced in results.
- Archived pages do not apply to Confluence Data Center. Archived spaces are crawled by default; this behavior is configurable.
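Pages and blog posts are listed per space through the standard content listing endpoints (space/%s/content/page and space/%s/content/blogpost in the API Endpoints table). The sketch below shows start/limit pagination over such an endpoint. It assumes each response is JSON with a `results` array and a `_links.next` entry while more results remain; `fetch` is a hypothetical transport function you would back with an authenticated HTTP client.

```python
from typing import Callable, Iterator

# fetch(path, params) -> parsed JSON body; injected so the sketch stays transport-agnostic.
Fetch = Callable[[str, dict], dict]


def iter_space_content(fetch: Fetch, space_key: str, content_type: str = "page",
                       limit: int = 25) -> Iterator[dict]:
    """Yield every page (or blog post) in a space using start/limit pagination."""
    path = f"space/{space_key}/content/{content_type}"
    start = 0
    while True:
        body = fetch(path, {"start": start, "limit": limit})
        results = body.get("results", [])
        yield from results
        # Stop when the server returns nothing or no longer advertises a next page.
        if not results or "next" not in body.get("_links", {}):
            return
        start += len(results)
```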
Activity data and webhooks
- Processes create, update, delete, move/restore, and permission change events for fast updates.
- Plugin surfaces view activity to improve ranking signals.
- Permission and restriction changes received via webhook propagate to child documents in the document tree with a minimum delay of 20 minutes (the cache refresh interval).
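The event handling described in this section can be modeled as a small dispatch table. This is a sketch, not Glean's implementation: the event names are hypothetical placeholders (check the event list in your Confluence webhook configuration), and the 20-minute constant stands in for the cache refresh interval that delays permission changes on child documents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Minimum delay before permission changes reach descendants (cache refresh interval).
PERMISSION_CACHE_DELAY = timedelta(minutes=20)

# Hypothetical event names; real Confluence webhook identifiers may differ.
ACTIONS = {
    "page_created": "reindex",
    "page_updated": "reindex",
    "page_removed": "delete",
    "page_moved": "reindex",
    "page_restored": "reindex",
    "content_permissions_updated": "refresh_permissions",
}


@dataclass
class Task:
    action: str
    content_id: str
    not_before: datetime  # earliest time the change can be reflected


def route(event: str, content_id: str, received_at: datetime) -> list:
    """Turn one webhook event into follow-up tasks."""
    action = ACTIONS.get(event)
    if action is None:
        return []  # unknown events are ignored; the next scheduled crawl catches up
    tasks = [Task(action, content_id, received_at)]
    if action == "refresh_permissions":
        # Restrictions trickle down to child documents only after the cache refreshes.
        tasks.append(Task("refresh_descendant_permissions", content_id,
                          received_at + PERMISSION_CACHE_DELAY))
    return tasks
```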
Identity data
- Crawls users, emails, groups, and memberships; visibility is limited to configured product access groups.
- Known Data Center issue: the group members API is unstable in some versions; as a workaround, the Glean plugin can serve group memberships.
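The plugin workaround amounts to a try-then-fallback lookup. This sketch assumes the instability surfaces as HTTP or network errors (`OSError`, which covers `urllib.error.HTTPError`); both member-lookup callables are hypothetical stand-ins for a client of group/%s/member and for the Glean plugin.

```python
from typing import Callable, List

# Hypothetical callables: one wraps the REST endpoint group/%s/member,
# the other wraps the group-membership data served by the Glean plugin.
MembersFn = Callable[[str], List[str]]


def group_members(group: str, rest_api: MembersFn, plugin: MembersFn) -> List[str]:
    """Return a group's members, falling back to the plugin if the REST call fails."""
    try:
        return rest_api(group)
    except OSError:
        # urllib.error.HTTPError and URLError are OSError subclasses, so server
        # errors from an unstable group members API land here.
        return plugin(group)
```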
Limitations
- Only footer comments are indexed (not inline). If indexing of inline comments is critical to your workflow, consider copying the content of important inline comments into a page comment to ensure it is indexed by Glean.
- Confluence mutator crawls do not work.
- Image attachment content isn’t indexed; attachment metadata is captured.
- Blog posts have no hierarchy.
- Content restrictions read API in Server/DC returns all restrictions in one response (no pagination).
- The users listing API on Server can return 5xx errors when it encounters invalid users; fixing the product access groups resolves this.
- New comments on pages older than 1 day aren’t crawled or indexed via webhooks or incremental crawls. They’re only updated in full crawls.
Rate limits
Queries per Second (QPS): QPS depends on customer server capacity and is configurable; a common default is ~16 aggregate QPS (admin: 3, content: 7, identity: 6).
Update frequency
- Identity: full every 10 minutes; no incremental.
- Content (pages/blog posts): full every 7 days; incremental hourly (updates since start of day).
- Space permissions: full every 3 hours.
- Webhooks apply changes in near real time between scheduled crawls.
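The crawl schedule can be summarized as a table of intervals. The sketch below is a hypothetical scheduler that reports which crawls are due at a given time; the crawl names are illustrative, and webhooks (which apply changes between these runs) are not modeled.

```python
from datetime import datetime, timedelta

# Intervals taken from the documented update frequencies; names are illustrative.
SCHEDULE = {
    "identity_full": timedelta(minutes=10),
    "content_full": timedelta(days=7),
    "content_incremental": timedelta(hours=1),
    "space_permissions_full": timedelta(hours=3),
}


def due_crawls(last_run: dict, now: datetime) -> list:
    """Return the crawls whose interval has elapsed; crawls that never ran are due."""
    due = []
    for name, interval in SCHEDULE.items():
        previous = last_run.get(name)
        if previous is None or now - previous >= interval:
            due.append(name)
    return due
```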
How the crawl works
The crawler follows a traditional crawler strategy, using the Confluence REST API and the following mechanisms to fetch and update data:
- Identity crawl: adds and updates people data, including users, groups, and other information.
- Webhooks: messages sent by Confluence to notify Glean of changes in near real time; Glean then either initiates a crawl or picks up the change on the next crawl.
- Content crawls: a full crawl covers the entire defined scope of the application, whereas an incremental crawl captures only the changes since the previous full or incremental crawl.
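An incremental crawl that picks up content modified since the start of the day can be expressed as a CQL query against the CQL-based content search endpoint (content/search in the API Endpoints table). The sketch below assumes the endpoint is served at /rest/api/content/search and that CQL accepts `lastmodified` compared with a `yyyy-MM-dd` date; the helper names are hypothetical.

```python
from datetime import datetime
from typing import List, Optional
from urllib.parse import urlencode


def incremental_cql(now: datetime, space_keys: Optional[List[str]] = None) -> str:
    """CQL for pages and blog posts modified since the start of `now`'s day."""
    day_start = now.strftime("%Y-%m-%d")  # the date alone means midnight of that day
    clauses = ["type in (page, blogpost)", f'lastmodified >= "{day_start}"']
    if space_keys:
        keys = ", ".join(f'"{key}"' for key in space_keys)
        clauses.append(f"space in ({keys})")
    return " and ".join(clauses)


def search_url(base_url: str, cql: str, start: int = 0, limit: int = 25) -> str:
    """Build a paginated request URL (assumed path: /rest/api/content/search)."""
    query = urlencode({"cql": cql, "start": start, "limit": limit})
    return f"{base_url.rstrip('/')}/rest/api/content/search?{query}"
```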
Required permissions
The user setting up this data source must have administrator permissions. You can reach out to Glean Support for any network configuration requirements.
Setup instructions
Perform the following steps to connect Confluence Data Center with Glean.
1. Create a service account for Glean
- Sign into Confluence as an admin.
- Go to User Management.
- Create a user with any name, email, and password.
- Click Edit Groups.
- Add the service account to confluence-administrators. Alternatively, ensure the user is a space administrator for all spaces that should be crawled.
2. Provide basic information about your Confluence instance
- Enter the server’s base URL on the Glean setup page. For example, https://confluence.mydomain.com.
- If requests are routed through a network proxy (contact Glean Support if you are not sure), enter the Confluence server host or IP in the Server Host or Server IP input field.
- If your Confluence instance uses multiple domains, enter every URL except the base URL in the Additional domains field, separated by commas with no spaces.
- Enter the product access group(s). This should be the group(s) containing all Confluence users, often confluence-users. Separate multiple groups with commas and no spaces.
- Enter the service account details created earlier into Glean.
- Enter the number of API calls per second supported by your Confluence instance.
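The QPS value entered here caps the request rate against your instance. A token bucket is one common way to enforce such a cap. The sketch below illustrates that technique, not Glean's actual limiter, and splits the budget using the example defaults from Rate limits (admin 3, content 7, identity 6); the class and category names are hypothetical.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity if capacity is not None else rate
        self.tokens = self.capacity  # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Example split of a ~16 QPS budget across crawl categories.
LIMITERS = {
    "admin": TokenBucket(3),
    "content": TokenBucket(7),
    "identity": TokenBucket(6),
}
```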
3. Configure Webhook / Plugin activity
- Check the Admin-privileged service account checkbox if the service account is a member of the confluence-administrators group. Glean then sets up the webhook and configures the Glean plugin automatically after installation.
- Install the Glean activity plugin which is available on the Atlassian Marketplace. The marketplace page will provide the installation instructions.
Note: The following sections can be skipped if the service account has admin privileges. If it does not, navigate to your newly created instance on the Admin Apps Setup page and follow the remaining instructions there.
3a. Configuring the Glean activity plugin
- Configure the Glean activity plugin so that it sends events to the correct endpoint.
- Go to Manage Apps in Confluence Admin UI.
- Open the glean_search app and click on Configure.

- Copy the URL from the Plugin Target URL box in the Glean UI, enter it in the plugin configuration, and click Submit. The URL must be a valid URL in the format:
https://domain-be.glean.com/instance/CONFLUENCE_ABC1234/scio_event
- The activity plugin should now be configured successfully.
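To catch copy-paste mistakes before clicking Submit, the URL can be checked against the documented format. The pattern below is inferred from the single example URL (https://domain-be.glean.com/instance/CONFLUENCE_ABC1234/scio_event); the allowed characters for the domain and instance ID are assumptions, and the function name is hypothetical.

```python
import re

# Pattern inferred from the documented example URL; character classes are assumptions.
PLUGIN_TARGET = re.compile(
    r"^https://(?P<domain>[a-z0-9-]+)-be\.glean\.com"
    r"/instance/(?P<instance>[A-Za-z0-9_]+)/scio_event$"
)


def parse_plugin_target(url: str) -> dict:
    """Return the domain and instance parts, or raise ValueError if the URL is malformed."""
    match = PLUGIN_TARGET.match(url.strip())
    if match is None:
        raise ValueError(f"not a valid plugin target URL: {url!r}")
    return match.groupdict()
```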
3b. Connect the webhook
- Go to General Configuration in Confluence Admin UI.
- Click Create webhook.
- Configure as follows:
| Config | Value |
|---|---|
| Name | Glean Search |
| URL | Copy the URL from Webhook URL box |
| Webhook Shared Secret | Any value. Enter the same value on the Glean setup page and click Save. |
| Events | Select all |
| Status | active |
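The shared secret lets the receiver confirm that a webhook payload came from your Confluence instance. The sketch below assumes the sender signs the raw request body with HMAC-SHA256 keyed by the secret and sends the result as `sha256=<hex digest>` in a request header; the header name and exact scheme are assumptions to confirm for your Confluence version.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Expected header value: 'sha256=' + hex HMAC-SHA256 of the raw body (assumed scheme)."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify(secret: str, body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the received header against the expected signature."""
    return hmac.compare_digest(sign(secret, body), header_value)
```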
API Endpoints
| Purpose | DC Endpoint | DC Method | DC Permission |
|---|---|---|---|
| List users | search/user | GET | READ |
| List groups | group | GET | READ |
| List group members | group/%s/member | GET | READ |
| List groups of user | user/memberof | GET | READ |
| Get current user | N/A | | |
| Get email of users | user/non-system | GET | ADMIN |
| List spaces | space | GET | SPACE_ADMIN |
| CQL based list spaces | search | GET | READ |
| List pages in space | space/%s/content/page | GET | READ |
| List blogposts in space | space/%s/content/blogpost | GET | READ |
| Get space permissions | spaces/spacepermissions.action | GET | Confluence Administrator |
| List content | content | GET | READ |
| Get content | content/%s | GET | READ |
| CQL based list content | content/search | GET | READ |
| List children of page | pages/%s/children | GET | READ |
| Fetch applinks | rest/applinks/1.0/listApplicationlinks | GET | Confluence Administrator |
| Create webhook | rest/api/webhooks | POST | Confluence Administrator |
| Get content restrictions | content/%s/restriction/byOperation/read | GET | READ |
| Update content restriction | N/A | | |
| Configure plugin | scio_search/1.0/configure | POST | |
| Get installed plugin version | scio_search/1.0/version | GET | |
| Get space permissions via plugin | scio_search/1.0/space_permissions | GET | |
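The DC Endpoint paths in the table are relative templates with `%s` placeholders. The sketch below expands them into absolute URLs. It assumes paths that do not already start with `rest/` live under `/rest/api/` on the Confluence base URL; the `spaces/spacepermissions.action` and `scio_search` plugin rows follow different conventions and are not covered. The function name is hypothetical.

```python
from urllib.parse import quote


def endpoint_url(base_url: str, path_template: str, *args: str) -> str:
    """Expand a %s path template into an absolute URL.

    Assumption: templates not starting with "rest/" are relative to /rest/api/.
    """
    # Percent-encode each argument so keys or group names with spaces stay one segment.
    path = path_template % tuple(quote(arg, safe="") for arg in args)
    if not path.startswith("rest/"):
        path = "rest/api/" + path
    return f"{base_url.rstrip('/')}/{path}"
```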
Content configuration
Note: If Inclusion (Green-listing) options are enabled, only content that matches the inclusion rules is indexed. If Exclusion (Red-listing) options are enabled, all content that matches the exclusion rules is removed. If both rules apply to the same content, the content is NOT indexed, because exclusion rules take priority. Use the rules below MINIMALLY to preserve the enterprise search experience, as most end users expect to find all content. Most customers apply no rules, or apply exclusion rules sparingly to sensitive folders. Exclusion rules take effect after the next full crawl, whose duration varies with corpus size. If a recrawl is needed, reach out to your Glean representative.
Exclusion (Red-listing) options
Glean provides several options for excluding content from the crawl, which also excludes it from search and chat results.
- Space: Exclude certain Confluence spaces from being crawled by Glean by specifying space keys.
- Pages with specific labels: Exclude pages and blog posts with specific labels from being crawled by Glean
- Pages with content matching specific regex: Exclude pages and blog posts with content matching specific regex from being crawled by Glean
- Creators: Exclude content created by certain creators from being crawled by Glean.

Inclusion (Green-listing) options
Glean provides several options for limiting the crawl to specific content, which limits search and chat results to that content.
- Spaces: Only allow Glean to crawl certain Confluence spaces. If no spaces are specified, Glean crawls all spaces except those excluded by Exclusion rules.
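The precedence rules in the content configuration note can be written as a single predicate. This is an illustrative model with hypothetical field names, not Glean's implementation: any matching exclusion rule wins, and inclusion spaces only restrict the crawl when at least one is configured.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Content:
    space_key: str
    creator: str
    labels: set = field(default_factory=set)
    body: str = ""


@dataclass
class Rules:
    include_spaces: set = field(default_factory=set)
    exclude_spaces: set = field(default_factory=set)
    exclude_labels: set = field(default_factory=set)
    exclude_regexes: list = field(default_factory=list)
    exclude_creators: set = field(default_factory=set)


def should_index(c: Content, r: Rules) -> bool:
    """Apply exclusion first (it takes priority), then the optional space allow-list."""
    excluded = (
        c.space_key in r.exclude_spaces
        or bool(c.labels & r.exclude_labels)
        or c.creator in r.exclude_creators
        or any(re.search(pattern, c.body) for pattern in r.exclude_regexes)
    )
    if excluded:
        return False
    # With no inclusion spaces configured, every non-excluded space is crawled.
    return not r.include_spaces or c.space_key in r.include_spaces
```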
