/extract-to-csv

by Aera · Data

Extract structured data from pages or open tabs into a clean, CSV-ready dataset.

data extraction, csv, tables, scraping, open tabs

SKILL.md

# Extract to CSV

Use this skill to turn visible web data into a clean dataset. The goal is usable structured data, not a prose summary.

## Inputs

Use the current page by default. If the user refers to open tabs, inspect every relevant open tab. If the user requests a specific schema, follow it exactly. If no schema is provided, infer one from the fields that repeat across the entities on the page.
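As a rough illustration of schema inference, the sketch below assumes the repeated entities have already been collected as a list of dictionaries. The function name `infer_schema` and the `min_share` threshold are hypothetical choices for this example, not part of the skill.

```python
from collections import Counter


def infer_schema(records, min_share=0.5):
    """Return field names present in at least min_share of the records.

    Columns are ordered by first appearance so the schema stays stable.
    The 0.5 default is an assumption, not a rule from the skill.
    """
    counts = Counter()
    order = []
    for record in records:
        for key in record:
            if key not in counts:
                order.append(key)  # remember first-seen order
            counts[key] += 1
    threshold = min_share * len(records)
    return [key for key in order if counts[key] >= threshold]


records = [
    {"name": "A", "price": "USD 1.00"},
    {"name": "B", "price": "USD 2.00", "badge": "new"},
    {"name": "C", "url": "/c"},
]
columns = infer_schema(records)
```

Fields that appear on only one or two items are left out here. Depending on the request, it may be better to keep them as sparse columns and mention them in the data quality notes.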

## Process

1. Identify the entity being extracted, such as product, company, job, person, listing, event, article, price, or review.
2. Infer a schema with stable column names. Prefer clear names like name, company, price, url, date, rating, location, description, source_url.
3. Extract every visible relevant item. If pagination or lazy loading is obvious, ask before crawling many pages unless the user already requested full extraction.
4. Normalize values: dates in ISO 8601 format (YYYY-MM-DD) when possible, prices with their currency, URLs as absolute URLs, and missing fields as empty cells.
5. Add a source_url column when data comes from more than one page or tab.
6. Validate row consistency. Every row must have the same columns in the same order.
7. If file creation is available, create a CSV file. If not, provide a fenced CSV block.
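Steps 4 through 7 could be sketched roughly as follows, using only the Python standard library. The column list, the accepted date formats, and the helper names `normalize_date`, `normalize_row`, and `to_csv` are assumptions made for this example. Prices are passed through unchanged on the assumption that the extracted text already carries its currency.

```python
import csv
import io
from datetime import datetime
from urllib.parse import urljoin

# Hypothetical schema for this example.
COLUMNS = ["name", "price", "date", "url", "source_url"]

# Assumed input formats; unrecognized dates are kept as written.
DATE_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%B %d, %Y")


def normalize_date(text):
    """Return an ISO 8601 date, or the original text if no format matches."""
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return text


def normalize_row(raw, page_url):
    """Map a raw record onto COLUMNS; missing fields become empty strings."""
    row = {col: (raw.get(col) or "").strip() for col in COLUMNS}
    if row["url"]:
        row["url"] = urljoin(page_url, row["url"])  # make relative links absolute
    if row["date"]:
        row["date"] = normalize_date(row["date"])
    row["source_url"] = page_url
    return row


def to_csv(rows):
    """Check that every row has the same columns in the same order, then serialize."""
    for index, row in enumerate(rows):
        if list(row.keys()) != COLUMNS:
            raise ValueError(f"row {index} does not match the schema")
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = normalize_row(
    {"name": " Widget ", "price": "USD 9.99", "date": "March 5, 2024", "url": "/item/1"},
    "https://example.com/shop/list",
)
csv_text = to_csv([row])
```

Going through the `csv` module rather than joining strings by hand means commas, quotes, and line breaks inside values are escaped correctly.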

## Output Format

## Dataset Summary
State entity type, row count, columns, pages or tabs used, and notable missing fields.

## CSV
Provide the CSV or a link to the created file.

## Data Quality Notes
List duplicates removed, fields that may be incomplete, assumptions, and pages not accessed.
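The numbers reported in these notes can be computed rather than estimated. The following is a minimal sketch, assuming rows are dictionaries and that a duplicate means an identical value in every key column. The helper names `drop_duplicates` and `missing_counts` are hypothetical.

```python
def drop_duplicates(rows, key_columns):
    """Keep the first row for each key; return (unique_rows, removed_count)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(col, "") for col in key_columns)
        if key in seen:
            continue  # a later copy of an item already kept
        seen.add(key)
        unique.append(row)
    return unique, len(rows) - len(unique)


def missing_counts(rows, columns):
    """Count empty cells per column, for the notable missing fields line."""
    return {col: sum(1 for row in rows if not row.get(col)) for col in columns}


rows = [
    {"name": "A", "url": "https://example.com/a", "price": ""},
    {"name": "A", "url": "https://example.com/a", "price": "USD 5.00"},
    {"name": "B", "url": "https://example.com/b", "price": "USD 7.00"},
]
unique, removed = drop_duplicates(rows, ["name", "url"])
gaps = missing_counts(unique, ["name", "url", "price"])
```

Keeping the first occurrence is a simplification. In the sample above it discards the copy that has a price, so merging duplicate rows field by field may be a better fit for some datasets.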

## Rules

- Do not mix prose inside CSV output.
- Do not fabricate missing values.
- Preserve source URLs.
- Ask before extracting behind login, paywalls, or sensitive personal data.
- Do not use em dashes.