BrightData Recruit Intelligence (Company Data Analysis + Candidate Search)

Powered by Google, Bing, DuckDuckGo + n8n + Google Gemini

This is a submission for the AI Agents Challenge powered by n8n and Bright Data

Download Workflow

BrightData Recruit Intelligence with Google, Bing, DuckDuckGo + n8n + Google Gemini Workflow

1. Introduction

The BrightData Recruit Intelligence workflow is designed to streamline talent sourcing and company research using search engine X-Ray queries, Bright Data’s scraping infrastructure, and Google Gemini AI reasoning.

It allows recruiters, analysts, and growth teams to:

Perform candidate searches (LinkedIn, GitHub, StackOverflow) using Boolean/X-Ray search queries.
Extract company insights (LinkedIn company profiles, website data, metadata).
Automate the entire data collection + interpretation pipeline within n8n.

At its core, the workflow leverages:

Bright Data Web Unlocker & Scraper → reliable data extraction from company or candidate pages.
Google Gemini (LLM) → reasoning engine for query building, intent detection, and human-readable output.
Google, Bing, DuckDuckGo → external search providers for wide coverage of candidate profiles.
n8n orchestration → workflow automation, branching logic, retries, and structured output formatting.

The Core Concept

The „Recruit Intelligence“ system is an automated pipeline designed to go beyond basic data collection for recruitment. It combines web scraping, data enrichment, and AI-powered analysis to provide recruiters with a comprehensive dossier of potential candidates. The key is to transform raw, unstructured data from the web into actionable insights, such as skills, experience, personality traits, and work styles.

The Role of Each Component

Bright Data: This is the foundation of the system for data collection.

Web Unlocker & Proxy Network: Bright Data provides a massive network of proxies and a Web Unlocker to bypass anti-bot measures, CAPTCHAs, and other technical challenges. This ensures reliable and scalable scraping from various websites.

SERP API & Search Engine Data Extractor: Bright Data’s tools allow you to programmatically scrape Search Engine Results Pages (SERPs) from multiple engines like Google, Bing, and DuckDuckGo. This is crucial for „X-Ray searching“ to find public profiles and information across the web.

Pre-built Scrapers/Datasets: Bright Data offers specific, pre-built scrapers for platforms like LinkedIn which provide structured data directly in JSON or CSV format, saving significant development time.

Google, Bing, DuckDuckGo: These are the primary data sources for candidate information.

The system uses these search engines to perform „X-Ray searches,“ which are specific search queries designed to find public profiles and mentions of a candidate on different websites.

Bright Data Tools – This workflow leverages the verified Bright Data node for the automation of scraping of search results and scraping of company information.

n8n: This is the orchestration and automation platform.

n8n provides a visual, low-code/no-code interface to connect all the different services.
It acts as the central hub, defining the workflow from start to finish. A typical workflow would be:
A trigger (e.g., a recruiter adds a LinkedIn URL to a Google Sheet).
A node that calls the Bright Data API to scrape the profile.
Another node that passes the scraped data to Google Gemini for analysis.
A final node that pushes the processed, structured data to a destination like Google Sheets or a CRM/ATS.
n8n’s community-built nodes for Bright Data and Google Gemini simplify the integration process.

Google Gemini: This is the intelligence layer for data analysis and reasoning.

Natural Language Processing (NLP): Gemini can take unstructured text (e.g., a LinkedIn profile or a resume) and parse it into a structured format, such as a JSON resume.

Advanced Analysis: Gemini can be prompted to perform sophisticated analysis, such as:

Data Enrichment: Combining information from multiple sources (e.g., a LinkedIn profile and a Google Search result) to create a more complete picture of the candidate.
Query Building: Gemini can also be used to convert a recruiter’s natural language query (e.g., „Find me a senior software engineer with Java experience in California“) into a structured Boolean search query suitable for search engines.

2. Real-World Use Cases

Talent Sourcing (Recruiting)

Automate Boolean/X-Ray queries to search for candidates with specific skills, experience, and locations.
Example: Find Python developers with 5+ years of experience in Bangalore → workflow generates Google/Bing/DuckDuckGo X-Ray queries → scrapes StackOverflow, GitHub, LinkedIn profiles.
Outputs structured candidate profiles recruiters can filter, store, or enrich further.

Company Intelligence (Sales/Business Development)

Automate company profile lookups (e.g., Extract company details from IBM’s LinkedIn page).
Retrieve key details like industry, employee size, headquarters, recent activities.
Useful for account-based marketing, B2B sales targeting, or competitor research.

Research Automation

Analysts can run ad-hoc queries: Find startups in AI space hiring in San Francisco.
Workflow identifies candidate companies + extracts structured insights.

Benefits in Real Life

Recruiters → faster talent mapping and shortlisting.
Sales teams → enriched account data for outreach.
Analysts → scalable research pipelines without manual Googling.

3. Workflow Breakdown

Here’s how each stage of the workflow functions:

Step 1: Chat Input Trigger

Node: When chat message received
Purpose: Captures natural language queries like:
- “Find Python developers in Bangalore with 5+ years of experience.”
- “Extract details of IBM from LinkedIn.”

Step 2: Search Type Analysis (Intent Detection)

Nodes: Search Type Analysis + Google Gemini Chat Model for AI Agent for Search Type + Structured Output Parser
Purpose: Classifies user request as:
- candidate_search (→ triggers candidate sourcing workflow)
- company_lookup (→ triggers company insights workflow)
Example output JSON:

{
  "type": "candidate_search",
  "search": "Python developers in Bangalore with 5+ years of experience"
}

Step 3A: Candidate Search Path

X-Ray Query Builder (Gemini) → Converts natural language into Boolean queries for Google, Bing, DuckDuckGo.
- Example input: “Python developers in Bangalore”
- Output:
```
site:linkedin.com/in ("Python" OR "Developer") "Bangalore" -jobs -careers -recruiter
```
Candidate Search Agent → Decides which search engine to use (Google, Bing, DuckDuckGo), constructs Bright Data query, and fetches results.
Bright Data URL Fetch → Executes scraping on search result pages and parses them.
Respond to Candidate Search → Returns structured profiles back into chat.

Step 3B: Company Lookup Path

Set Input Fields → Bright Data Company Data Extraction → Triggers a Bright Data scraper configured for LinkedIn company pages.
Check for Snapshot Status + Wait Nodes → Ensures scraping job completes.
Download Snapshot → Fetches structured JSON of company profile data.
Respond to Company Chat → Returns insights like industry, HQ, employees, specialties.

Step 4: Branching & Orchestration

If Nodes control flow between candidate search vs company lookup.
Wait Nodes handle asynchronous scraping delays.
Gemini models enrich, normalize, and produce human-readable recruiter insights.

4. Technical Highlights

Dual AI Roles with Gemini:
- Query Builder (generates Boolean/X-Ray search strings).
- Reasoning Agent (interprets candidate/company data, summarizes findings).
Bright Data Integration:
- Web Unlocker → bypasses anti-bot protection.
- Scrapers → structured output from LinkedIn, GitHub, StackOverflow, or websites.
Multi-Search Engine Support:
- Google → broad coverage.
- Bing → regional queries.
- DuckDuckGo → privacy-focused & alternative index.
Structured Outputs:
- JSON schemas enforce consistency.
- Search results → candidate profiles → stored or exported to ATS/CRM.

5. Example End-to-End Run

Input (User Query via Chat)

“Find Python developers with 5+ years experience in Bangalore. Also, extract company details for IBM.”

Workflow Execution

Detects two intents: candidate_search + company_lookup.
Candidate Search:
- Builds X-Ray search → runs via Bright Data on Google → extracts LinkedIn/StackOverflow/GitHub profiles.
- Returns candidate shortlist with structured details.
Company Lookup:
- Uses Bright Data scraper → pulls LinkedIn company page.
- Outputs structured company data (industry, employees, HQ).

Output

Candidates JSON:

[
  {
    "name": "Rahul S.",
    "profile": "linkedin.com/in/rahuls-dev",
    "skills": ["Python", "Django", "API Development"],
    "location": "Bangalore",
    "experience": "6 years"
  }
]

Company JSON:

{
  "company": "IBM",
  "industry": "Information Technology",
  "employees": "10,000+",
  "hq": "Armonk, New York",
  "linkedin_url": "linkedin.com/company/ibm"
}

6. Real-World Impact

Recruiters → Automate talent pipelines without manual Boolean building.
Sales teams → Quickly enrich company data for outreach.
Researchers → Run scalable talent & company intelligence pipelines.

7. Major Challenges and Solutions

Challenge 1: Heterogeneous Data Sources

Problem: Candidate and company data comes from multiple platforms (LinkedIn, GitHub, StackOverflow, Google/Bing/DuckDuckGo search results) with different structures, formats, and metadata.

Solution: Implemented a normalization layer using n8n’s Switch and Function nodes. Each data source is parsed into a unified schema (name, profile URL, skills, location, experience, company details), enabling Gemini to interpret results consistently.

Challenge 2: Boolean/X-Ray Query Complexity

Problem: Recruiters often struggle to construct accurate Boolean/X-Ray queries for Google, Bing, or DuckDuckGo, leading to incomplete or irrelevant results.

Solution: Leveraged Google Gemini as a Query Builder Agent. It automatically transforms natural language recruiter prompts (e.g., “Python developers in Bangalore with 5+ years experience”) into optimized Boolean search strings for multiple engines.

Challenge 3: Data Reliability & Anti-Bot Barriers

Problem: Direct scraping of platforms like LinkedIn is prone to blocking, incomplete data, and inconsistency.

Solution: Integrated Bright Data Web Unlocker & Search Engine APIs to bypass anti-bot systems, ensuring stable, compliant, and high-fidelity data extraction.

Challenge 4: Asynchronous Scraping Delays

Problem: Bright Data scrapers often take time to finish jobs, and synchronous workflows risk breaking or returning incomplete data.

Solution: Added snapshot polling with Wait nodes in n8n to monitor job completion. Only after scraping results are ready, the workflow fetches and processes structured output.

Challenge 5: Candidate vs. Company Intent Detection

Problem: A single recruiter query might request both candidate search and company insights (e.g., “Find Python developers and also extract IBM company details”). Without correct routing, workflows break.

Solution: Built an AI-powered Intent Classifier (Gemini + Structured Output Parser) that detects query type(s) → triggers parallel paths:

Challenge 6: Data Interpretation & Enrichment

Problem: Raw scraped data is messy and not recruiter-friendly (e.g., JSON dumps, incomplete profiles).

Solution: Used Gemini as an AI Reasoning Agent to enrich, summarize, and present results in human-readable insights. Example: converting “linkedin.com/in/johndoe + skills: python, django” into → “John Doe — Senior Python Developer, 6 years experience, Bangalore”.

Challenge 7: Multi-Search Engine Coordination

Problem: Google, Bing, and DuckDuckGo return different search results, and duplication/noise made it difficult to merge insights.

Solution: Implemented a multi-engine orchestration layer in n8n. Results from all engines are deduplicated, ranked, and merged before passing to Gemini for enrichment.

8. Download Workflow