Build Your First Scraper with FMiner Basic: Step-by-Step Tutorial


What is FMiner Basic?

FMiner Basic is a visual web scraping tool designed for users who want to extract website data without writing code. It uses a point-and-click interface to build extraction workflows (also called “scrapers” or “agents”), lets you schedule and run tasks, and exports results in common formats such as CSV and Excel.

Key highlights:

  • Visual, template-driven scraping — select page elements directly in a browser-like view.
  • No-code learning curve — suitable for beginners.
  • Export to CSV/XLSX — easy integration with spreadsheets and BI tools.
  • Simple scheduling — run scrapers at set intervals (features vary by edition).

Who should use FMiner Basic?

FMiner Basic is best for:

  • Non-developers who need structured web data (marketers, analysts, students).
  • Small businesses monitoring competitors’ prices, product listings, or job postings.
  • Researchers collecting datasets from news sites, public records, or directories.
  • Anyone who wants a straightforward visual tool before moving to more advanced scraping solutions.

Core concepts and terminology

  • Scraper/Agent: a configured task that navigates pages and extracts data.
  • Selector: a rule that identifies which page element(s) to extract (text, attribute, link, image); see the short example after this list.
  • Pagination: following “next” links or page-numbered lists to scrape multiple pages.
  • Loop/Repeat: iterating through lists of similar elements (e.g., search results).
  • Export: saving extracted data to a file or database.
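
To make the selector concept concrete, here is one hypothetical product title targeted two ways, first as a CSS selector and then as the equivalent XPath (the element and class names are invented for illustration):

    CSS:    div.product-card h2.product-title
    XPath:  //div[contains(@class, "product-card")]//h2[contains(@class, "product-title")]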

Getting started: installation and first run

  1. Download and install FMiner Basic from the official FMiner site (choose the Basic edition).
  2. Launch FMiner — you’ll see a built-in browser and a workspace for building agents.
  3. Open the target website inside FMiner’s browser tab.
  4. Create a new agent (scraper). Name it clearly (e.g., “Product List — ExampleStore”).
  5. Use the point-and-click selector: hover over elements (titles, prices, images) and click to capture them.
  6. Add fields for each piece of data you want (product name, price, URL, image link).
  7. Configure pagination if the data spans multiple pages (click the “Next” button in the site and set it as the next page action).
  8. Run the agent in preview mode to confirm the extracted rows.
  9. Export results to CSV or Excel.

Example: scraping an e-commerce category

  • Field 1: Product title — selector: h2.product-title (or click the title in the browser).
  • Field 2: Price — selector: span.price.
  • Field 3: Product URL — selector: a.product-link (extract href attribute).
  • Pagination: click “Next” and set it as the agent’s pagination action.
  • Run and export.
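
FMiner builds all of this visually, but if you later want to reproduce the same extraction in code, a rough Python equivalent might look like the sketch below. The URL, the container selector, and the library choice (requests plus BeautifulSoup) are assumptions for illustration, not anything FMiner generates.

    # Rough code equivalent of the visual agent above.
    # The URL and all selectors are placeholders; adjust them to the real site.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://example.com/category/widgets", timeout=30)
    soup = BeautifulSoup(resp.text, "html.parser")

    rows = []
    for card in soup.select("div.product-card"):      # assumed product container
        title = card.select_one("h2.product-title")
        price = card.select_one("span.price")
        link = card.select_one("a.product-link")
        rows.append({
            "title": title.get_text(strip=True) if title else "",
            "price": price.get_text(strip=True) if price else "",
            "url": link["href"] if link and link.has_attr("href") else "",
        })

    print(rows[:3])   # inspect the first few extracted rows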

Working with selectors and patterns

FMiner’s visual selectors generate underlying XPath/CSS-like patterns. To get reliable results:

  • Prefer selecting the smallest unique element (e.g., the title within a product card) rather than a broad container.
  • Use “select next similar” or “select all similar” features to capture lists.
  • Inspect the generated selector and refine it if the tool picks inconsistent elements.
  • Combine multiple selectors or use relative selection (e.g., price relative to the product container) to keep fields aligned.
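
To see why relative selection keeps fields aligned, compare two independent global selections with the per-container approach (this continues the earlier sketch and reuses its soup object; the selectors remain hypothetical):

    # Fragile: two independent global selections can drift out of alignment
    # if some product cards lack a price element.
    titles = soup.select("h2.product-title")
    prices = soup.select("span.price")        # may be shorter than titles

    # Robust: select each field relative to its product container, so a
    # missing price simply yields an empty cell in the same row.
    for card in soup.select("div.product-card"):
        title = card.select_one("h2.product-title")
        price = card.select_one("span.price")
        row = (title.get_text(strip=True) if title else "",
               price.get_text(strip=True) if price else "")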

Pagination and multi-page scraping

Most real-world tasks require iterating across pages:

  • Identify the pagination control (“Next”, page numbers).
  • Use FMiner’s pagination action to follow links until there is no next page.
  • For infinite-scroll pages, use the built-in scrolling action or a “load more” button click loop.
  • For sites that use JavaScript to fetch content, ensure FMiner waits for content to load (use wait/delay settings).
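
In code terms, following "Next" links until there is no next page boils down to a loop like this minimal sketch (the start URL and selectors are assumptions):

    # Minimal pagination sketch: follow the "Next" link until it disappears.
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    url = "https://example.com/category/widgets?page=1"   # placeholder start page
    titles = []
    while url:
        soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
        titles.extend(h.get_text(strip=True) for h in soup.select("h2.product-title"))
        next_link = soup.select_one("a.next")             # assumed "Next" control
        url = urljoin(url, next_link["href"]) if next_link else None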

Handling dynamic content and JavaScript

Some sites render content client-side (via AJAX or similar techniques). FMiner Basic can handle many JavaScript-driven pages through its embedded browser and wait mechanisms:

  • Add a wait time or wait-for-element action after page load.
  • If content is loaded via API calls, you may be able to capture the underlying JSON endpoint instead of scraping rendered HTML; this is often more robust when available (see the sketch after this list).
  • For very complex dynamic sites, a more advanced edition or a code-based scraper may be needed.
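
As an illustration of the JSON-endpoint approach: if the browser's network tab shows the page fetching something like /api/products, you can often request that endpoint directly. The path, parameters, and response fields below are invented for the example:

    # Querying a hypothetical JSON API instead of scraping rendered HTML.
    import requests

    resp = requests.get(
        "https://example.com/api/products",   # placeholder endpoint
        params={"page": 1},
        timeout=30,
    )
    resp.raise_for_status()
    for item in resp.json().get("products", []):   # assumed response shape
        print(item.get("title"), item.get("price"))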

Scheduling and automation

FMiner Basic typically offers basic scheduling to run agents at intervals (daily/weekly). Use scheduling to:

  • Keep datasets current (price trackers, inventory monitoring).
  • Automate repetitive data-collection tasks.
  • Combine scheduled runs with export-to-cloud folders or email delivery (check the Basic edition’s available integrations).
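
If the Basic edition's scheduler doesn't fit your workflow, a small external script can provide the same cadence. Here is a sketch using the third-party schedule package around a placeholder function; run_scraper stands in for whatever launches your collection and export:

    # External scheduling sketch using the "schedule" package
    # (pip install schedule). run_scraper is a placeholder.
    import time
    import schedule

    def run_scraper():
        print("collecting data...")   # replace with your actual scrape/export

    schedule.every().day.at("06:00").do(run_scraper)   # daily at 06:00

    while True:
        schedule.run_pending()
        time.sleep(60)   # check the schedule once a minute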

Exporting data and post-processing

Common export formats:

  • CSV — universal, spreadsheet-friendly.
  • XLSX — preserves formatting and is ready for Excel.
  • Database export — available in higher editions; in Basic you’ll likely export files and then import them into a DB or analysis tool.

Post-processing tips:

  • Clean price fields (remove currency symbols) before numeric analysis.
  • Normalize date formats.
  • Deduplicate rows by product ID or URL.
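
All three tips can be applied in one pass after export. Here is a sketch with pandas, assuming an exported products.csv whose columns are named price, date, and url (adjust the names to your own export):

    # Post-export cleanup: numeric prices, normalized dates, deduplicated rows.
    import pandas as pd

    df = pd.read_csv("products.csv")

    # Strip currency symbols and thousands separators so prices become numeric.
    df["price"] = (
        df["price"].astype(str)
        .str.replace(r"[^\d.]", "", regex=True)
        .pipe(pd.to_numeric, errors="coerce")
    )

    # Normalize mixed date formats to ISO dates; unparseable values become NaT.
    df["date"] = pd.to_datetime(df["date"], errors="coerce").dt.date

    # Deduplicate by product URL, keeping the first occurrence.
    df = df.drop_duplicates(subset="url")

    df.to_csv("products_clean.csv", index=False)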

Troubleshooting common issues

  • Missing or inconsistent fields: refine selectors or use relative selection inside the product container.
  • Pagination stops prematurely: verify the “Next” selector and that the pagination control appears on all pages.
  • Blocked or CAPTCHA-protected pages: the Basic edition may not include advanced anti-blocking features; try adding delays, lowering concurrency, using public APIs, or obtaining the site's permission.
  • Rate limits and IP blocking: respect the target site’s robots.txt and rate limits; run with slower intervals and random delays.
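
For the last two points, randomized pacing is easy to add if you ever script around your exports or prototype a scraper in code; a minimal sketch with placeholder URLs:

    # Polite pacing: randomized delays reduce load on the target site
    # and the chance of being rate-limited or blocked.
    import random
    import time

    import requests

    urls = ["https://example.com/page/1", "https://example.com/page/2"]   # placeholders

    for url in urls:
        requests.get(url, timeout=30)
        time.sleep(random.uniform(2.0, 5.0))   # wait 2-5 seconds between requests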

Legal and ethical considerations

  • Check the Terms of Service: some sites prohibit scraping; always review and respect site terms.
  • Treat robots.txt as minimum guidance (it is not itself legal permission, but honoring it is good practice).
  • Avoid excessive request rates that harm a website’s operation.
  • For commercial use, consider obtaining explicit permission or using official APIs where available.

When to upgrade or switch tools

Consider moving beyond FMiner Basic if you need:

  • Large-scale scraping with IP rotation and proxy management.
  • Complex login handling, form submission, or CAPTCHA solving.
  • Database integrations, cloud execution, or team collaboration features.
  • Programmatic control (writing custom scripts in Python/Node.js) for bespoke transformations.

Practical example: step-by-step mini project

Goal: Extract article titles and publication dates from a news category.

Steps:

  1. Open the news category page in FMiner.
  2. Create a new agent “News — Latest”.
  3. Click the first article title → add field “title”.
  4. Click the date element → add field “date”.
  5. Use “select all similar” to capture all articles on the page.
  6. Set pagination to click “Next” until the end.
  7. Run preview and examine extracted rows.
  8. Export to CSV and open in Excel for sorting.
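
For reference, here is the same mini project expressed as a short script; the site URL, the article/title/date selectors, and the output file name are all assumptions:

    # Mini-project sketch: titles and dates from one news category page,
    # written straight to CSV. All selectors are hypothetical.
    import csv
    import requests
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(
        requests.get("https://example.com/news/latest", timeout=30).text,
        "html.parser",
    )

    with open("news_latest.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "date"])
        for article in soup.select("article"):       # assumed article container
            title = article.select_one("h2 a")
            date = article.select_one("time")
            writer.writerow([
                title.get_text(strip=True) if title else "",
                date.get_text(strip=True) if date else "",
            ])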

Final tips for beginners

  • Start small: build an agent for a single page and expand to pagination later.
  • Test thoroughly — run previews and inspect results before large exports.
  • Document your selectors and schedules so you can reproduce runs months later.
  • Learn basic XPath/CSS gradually — it makes selector refinement faster.
  • Use official APIs whenever they meet your needs; scraping should be a fallback when APIs don’t exist or lack required fields.

FMiner Basic lowers the barrier to entry for web data extraction by combining a visual interface with practical features like pagination, scheduling, and common export formats. For beginners, it’s a solid starting point to collect structured data from the web quickly and with minimal technical overhead.
