Scrape the web as a REST endpoint

With FaasPlus, you can scrape a web page and expose the extracted data as a REST endpoint. The function fetches the page with the built-in fetch API and pulls out the fields it needs with native JavaScript regular expressions, so no external HTML-parsing library is required.
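As a minimal illustration of the regex technique on its own, the sketch below runs a global regular expression over a static HTML string with String.prototype.matchAll and turns each match into an object. The extractLinks name and the sample markup are hypothetical, chosen only for this example; they are not part of FaasPlus.

```javascript
// Hypothetical helper: collect the target and text of every <a href="..."> link
// in an HTML string using only built-in JavaScript regular expressions.
function extractLinks(html) {
    // The g flag is required by matchAll; i makes tag names case-insensitive
    const linkPattern = /<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
    const links = [];
    for (const m of html.matchAll(linkPattern)) {
        // Strip nested tags (e.g. <b>) and collapse whitespace in the link text
        const text = m[2].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
        links.push({ href: m[1], text });
    }
    return links;
}

// Usage with a static HTML snippet (no network call)
const sample = '<p>See <a href="/wiki/Physics">physics</a> and ' +
    '<a class="x" href="/wiki/Relativity"><b>relativity</b></a>.</p>';
console.log(extractLinks(sample));
```

matchAll yields one match per occurrence of a pattern that has the g flag; the exec loop in the Wikipedia example walks the same matches in an equivalent way.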


Example: Scraping a Wikipedia Biography

This function takes the firstName and lastName parameters, fetches the matching Wikipedia article, parses the HTML with native regular expressions, and returns the label/value pairs from the biography infobox as a JSON object.

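export async function handler(event) {
    const firstName = String(event.params?.firstName ?? '').trim();
    const lastName = String(event.params?.lastName ?? '').trim();
    if (!firstName || !lastName) {
        return { error: 'Both firstName and lastName parameters are required' };
    }

    // Wikipedia article titles join words with underscores; encode each part for the URL
    const fullName = encodeURIComponent(firstName) + '_' + encodeURIComponent(lastName);
    const url = 'https://en.wikipedia.org/wiki/' + fullName;

    // Fetch Wikipedia page and stop early on 404 or other HTTP errors
    const response = await fetch(url);
    if (!response.ok) {
        return { error: 'Wikipedia returned HTTP ' + response.status + ' for ' + url };
    }
    const html = await response.text();

    // Turn a cell's HTML into plain text: drop inline <style> blocks, citation
    // markers and remaining tags, decode common entities, collapse whitespace
    const toText = (s) => s
        .replace(/<style[\s\S]*?<\/style>/gi, '')
        .replace(/<sup[^>]*class="[^"]*reference[^"]*"[^>]*>[\s\S]*?<\/sup>/gi, '')
        .replace(/<[^>]+>/g, '')
        .replace(/&nbsp;/g, ' ')
        .replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number(code)))
        .replace(/&quot;/g, '"')
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&amp;/g, '&')
        .replace(/\s+/g, ' ')
        .trim();

    // Parse HTML using regex (no external libraries in V8):
    // each infobox row is a label <th> immediately followed by a data <td>
    const rowPattern = /<th[^>]*class="[^"]*infobox-label[^"]*"[^>]*>([\s\S]*?)<\/th>\s*<td[^>]*class="[^"]*infobox-data[^"]*"[^>]*>([\s\S]*?)<\/td>/gi;

    const res = {};
    let match;
    while ((match = rowPattern.exec(html)) !== null) {
        const key = toText(match[1]);
        const value = toText(match[2]);
        if (key && value) {
            res[key] = value;
        }
    }

    return res;
}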
Example response for firstName=Albert and lastName=Einstein:
{
    "Born": "March 14, 1879",
    "Died": "April 18, 1955",
    "Alma mater": "Swiss Federal Polytechnic (Diploma)",
    "Known for": "Theory of relativity, E=mc², Einstein field equations",
    ...
}
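The handler mixes network access with parsing, so one way to check the regex logic is to run it against a fixed HTML string. The sketch below is a hypothetical refactor: parseInfobox and the trimmed sample markup are illustrative assumptions modeled on the infobox-label and infobox-data classes the pattern targets, and only a few HTML entities are decoded.

```javascript
// Hypothetical refactor: a variant of the handler's infobox-parsing loop as a
// pure function, so it can be exercised without any network access.
function parseInfobox(html) {
    const rowPattern = /<th[^>]*class="[^"]*infobox-label[^"]*"[^>]*>([\s\S]*?)<\/th>\s*<td[^>]*class="[^"]*infobox-data[^"]*"[^>]*>([\s\S]*?)<\/td>/gi;
    const clean = (s) => s
        .replace(/<style[\s\S]*?<\/style>/gi, '') // drop inline CSS blocks
        .replace(/<[^>]+>/g, '')                  // drop remaining tags
        .replace(/&#160;|&nbsp;/g, ' ')           // non-breaking spaces
        .replace(/&amp;/g, '&')                   // decode &amp; last
        .replace(/\s+/g, ' ')
        .trim();
    const res = {};
    for (const m of html.matchAll(rowPattern)) {
        const key = clean(m[1]);
        const value = clean(m[2]);
        if (key && value) res[key] = value;
    }
    return res;
}

// Fixed sample modeled on an infobox row layout (hypothetical, trimmed HTML)
const sampleHtml =
    '<table class="infobox"><tbody>' +
    '<tr><th scope="row" class="infobox-label">Born</th>' +
    '<td class="infobox-data">March 14, 1879</td></tr>' +
    '<tr><th scope="row" class="infobox-label">Alma&#160;mater</th>' +
    '<td class="infobox-data"><a href="/wiki/ETH_Zurich">Swiss Federal Polytechnic</a> (Diploma)</td></tr>' +
    '</tbody></table>';

console.log(parseInfobox(sampleHtml));
```

Keeping the parser pure in this way lets you adjust the pattern when the page markup changes and rerun the check without fetching a live page.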