Scrape the web as REST
With FaasPlus, web scraping is simple and efficient: a function can retrieve a web page and parse its data quickly. Using native JavaScript regex patterns, you can extract information from HTML content without third-party libraries, which makes this approach well suited to collecting data from external sources.
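As a minimal sketch of the regex approach, the snippet below pulls the text of the `<title>` element out of an HTML string. The `extractTitle` helper and the sample markup are illustrative assumptions, not part of FaasPlus. Regex parsing of this kind suits simple, predictable markup and can break on arbitrary HTML.

```javascript
// Hypothetical helper (not a FaasPlus API): returns the text of the
// first <title> element in an HTML string, or null if there is none.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  if (!match) return null;
  // Collapse runs of whitespace and trim the ends
  return match[1].replace(/\s+/g, ' ').trim();
}

// Made-up sample markup for illustration
const sampleHtml =
  '<html><head><title>\n  Albert Einstein - Wikipedia\n</title></head><body></body></html>';

console.log(extractTitle(sampleHtml)); // prints: Albert Einstein - Wikipedia
```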
Example: Scraping Wikipedia Biography
This function will access the Wikipedia page for the person specified by the firstName and
lastName parameters, parse the content using native regex patterns, and return key details found in the
biography infobox.
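export async function handler(event) {
  const { firstName, lastName } = event.params ?? {};
  if (!firstName || !lastName) {
    throw new Error('Both firstName and lastName parameters are required');
  }

  // Encode the name so special characters are safe in the URL path
  const title = encodeURIComponent(`${firstName}_${lastName}`);
  const url = 'https://en.wikipedia.org/wiki/' + title;

  // Fetch the Wikipedia page and fail early on a non-2xx status
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Wikipedia request failed with status ${response.status}`);
  }
  const html = await response.text();

  // Parse the HTML with regex (no external libraries in V8)
  const res = {};
  // Match each infobox row: a label cell (infobox-label) followed by a data cell (infobox-data).
  // The label may itself contain tags (for example a link), so it is captured lazily.
  const rowPattern = /<th[^>]*class="[^"]*infobox-label[^"]*"[^>]*>([\s\S]*?)<\/th>[\s\S]*?<td[^>]*class="[^"]*infobox-data[^"]*"[^>]*>([\s\S]*?)<\/td>/gi;
  // Remove HTML tags and collapse whitespace
  const clean = (s) => s.replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();

  let match;
  while ((match = rowPattern.exec(html)) !== null) {
    const key = clean(match[1]);
    const value = clean(match[2]);
    if (key && value) {
      res[key] = value;
    }
  }
  return res;
}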
>response
{
"Born": "March 14, 1879",
"Died": "April 18, 1955",
"Alma mater": "Swiss Federal Polytechnic (Diploma)",
"Known for": "Theory of relativity, E=mc², Einstein field equations",
...
}
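The parsing step can also be tried offline, without a network call. The sketch below applies the same row-matching idea as the handler to a small, made-up HTML fragment, with label cells allowed to contain nested tags such as links. The `parseInfobox` name and the sample markup are assumptions for illustration only; real Wikipedia markup may differ and can change over time.

```javascript
// Hypothetical pure helper: extracts label/value pairs from infobox-style rows.
function parseInfobox(html) {
  const rowPattern = /<th[^>]*class="[^"]*infobox-label[^"]*"[^>]*>([\s\S]*?)<\/th>[\s\S]*?<td[^>]*class="[^"]*infobox-data[^"]*"[^>]*>([\s\S]*?)<\/td>/gi;
  // Remove HTML tags and collapse whitespace
  const clean = (s) => s.replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();

  const result = {};
  for (const match of html.matchAll(rowPattern)) {
    const key = clean(match[1]);
    const value = clean(match[2]);
    if (key && value) result[key] = value;
  }
  return result;
}

// Made-up fragment imitating two infobox rows
const sample = `
<table class="infobox">
  <tr><th scope="row" class="infobox-label">Born</th>
      <td class="infobox-data">March 14, 1879<br></td></tr>
  <tr><th scope="row" class="infobox-label"><a href="/wiki/Alma_mater">Alma mater</a></th>
      <td class="infobox-data"><a href="#">Swiss Federal Polytechnic</a> (Diploma)</td></tr>
</table>`;

console.log(parseInfobox(sample));
```

Keeping the parsing in a pure function like this makes it easy to check the pattern against saved HTML before deploying the handler.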