Open Graph: Preview Web Pages


Have you ever wondered how posting a link in a social media app gets a preview card rendered? It is not so hard. A standard called Open Graph does just that. The idea is easy to understand: a website that wants to be previewable by other sites embeds certain HTML meta tags in its pages to describe their content, and the reader site simply consumes those standard elements to build the preview.
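
To make that concrete, here is a minimal sketch of the tags a previewable page would embed. The helper names and the sample values are made up for illustration; og:title, og:site_name and og:description are property names defined by the Open Graph protocol.

```javascript
// Escapes characters that would break out of an HTML attribute value.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: turns a plain object into Open Graph <meta> tags,
// one tag per key, with the key prefixed by "og:".
function renderOpenGraphTags(fields) {
  return Object.entries(fields)
    .map(([name, value]) => `<meta property="og:${name}" content="${escapeAttribute(value)}" />`)
    .join('\n');
}

console.log(renderOpenGraphTags({
  title: 'Open Graph: Preview Web Pages',
  site_name: 'Example Blog',
  description: 'How link previews work'
}));
```

A site would place the printed tags inside the <head> of its page, and a reader such as the one built below would pick them up from there.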

For reference, https://metatags.io/ is a website that does what we are trying to achieve here. We will not be concerned with styling or other complications; my aim is to get the basics across to you, dear reader.

Technology-wise, I will be using Node.js with separate backend and frontend projects, built with the popular express and react packages respectively. Keeping the projects separate will surface the CORS issues you may face if you decide to serve them from different web domains.

The entire source code can be found at https://github.com/Morr0/open-graph-tutorial

Writing the backend

What we need is an API that takes the URL of the website we want to extract Open Graph info from and responds in a fixed format. Inside this API we fetch that website and parse its HTML.

For libraries, I will use express, a well-known Node.js library, for the API; cors to handle CORS; axios for making HTTP requests (we could do this with just a little plain Node.js code); and jsdom to parse an HTML document.
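
As an aside on the remark that axios is optional, here is roughly what the fetch could look like with only Node's built-in http and https modules. This is a sketch rather than the tutorial's code: the fetchHtml name is made up, and unlike axios it does not follow redirects.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: downloads a page and resolves with its body as text.
function fetchHtml(url) {
  return new Promise((resolve, reject) => {
    // Pick the built-in client that matches the URL scheme.
    const client = url.startsWith('https:') ? https : http;
    client.get(url, (res) => {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Request failed with status ${res.statusCode}`));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Possible use:
// fetchHtml('https://metatags.io/').then((html) => console.log(html.length));
```

The rest of this post sticks with axios, which handles redirects and error cases for us.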

To initialize and install:

npm init -y

npm i express cors axios jsdom

Then create an index.js file:

const express = require('express');
const axios = require('axios');
const jsdom = require('jsdom');
const cors = require('cors');

Then create an API server:

const app = express();

app.use(cors({
 origin: '*'
}));

const PORT = 3500;
app.listen(PORT, () => {
 console.log('LISTENING');
});

The server will listen on port 3500.
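
Setting origin to '*' lets any site call the API, which is convenient for a tutorial. As a rough illustration of what a stricter CORS check boils down to, the sketch below decides which headers to send for a given request origin. The function name and the allow-list are assumptions, not part of the tutorial's code; http://localhost:3000 is used because it is the default Create React App dev address.

```javascript
// Hypothetical allow-list of frontends that may call the API.
const ALLOWED_ORIGINS = ['http://localhost:3000'];

// Returns the CORS headers to send for a request coming from `origin`.
// An empty object means the browser will block the cross-origin response.
function corsHeadersFor(origin, allowedOrigins = ALLOWED_ORIGINS) {
  if (!origin || !allowedOrigins.includes(origin)) return {};
  return {
    'Access-Control-Allow-Origin': origin,
    // Tell caches that the response differs per requesting origin.
    'Vary': 'Origin'
  };
}

console.log(corsHeadersFor('http://localhost:3000'));
console.log(corsHeadersFor('https://evil.example'));
```

With the cors package, a similar effect can be had by passing origin: 'http://localhost:3000' instead of '*'.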

Now we define the endpoint handler. I want to listen for GET requests at /meta. Put the following below the cors middleware registered above:

app.get('/meta', async (req, res) => {
 console.log('ENDPOINT HIT');
 
 res.statusCode = 200;
 res.end();
});

We can then run the app to test it:

node index.js

I will use Postman to test the API.

Now that everything works, let’s fill in the handler, starting with the validation code:

const { url } = req.query || {};
// Accept only a single string value that starts with http:// or https://
if (typeof url !== 'string' || !/^https?:\/\//i.test(url)){
  res.statusCode = 400;
  res.json({
    error: "Please provide a valid url"
  });
  return;
}
console.log('Received url', url);
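
For a stricter check, one option is to let Node's built-in URL class do the parsing and then look at the protocol. A minimal sketch; the helper name isHttpUrl is made up for illustration:

```javascript
// Returns true only for absolute http:// or https:// URLs.
function isHttpUrl(value) {
  if (typeof value !== 'string') return false;
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    // new URL() throws when it cannot parse the input
    return false;
  }
}

console.log(isHttpUrl('https://example.com/post?id=1')); // true
console.log(isHttpUrl('ftp://example.com'));             // false
console.log(isHttpUrl('not http at all'));               // false
```

The handler could then reject the request with a 400 whenever isHttpUrl(url) is false.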

Then we can create a new function to handle the logic of our app:

const getMetaTags = async (url) => {
  const returnable = {};

  // Fetch the page as plain text, the same way a browser would request it
  const response = await axios.get(url, {
    responseType: 'text'
  });
  console.log('Received status', response.status);

  // Build an in-memory DOM and collect every <meta> element
  const dom = new jsdom.JSDOM(response.data);
  const elms = dom.window.document.querySelectorAll('meta');
  console.log('length', elms.length);
  elms.forEach((elm) => {
    const {content} = elm;
    if (elm.getAttribute('property') === 'og:title') returnable.title = content;
    if (elm.getAttribute('property') === 'og:site_name') returnable.site_name = content;
    if (elm.getAttribute('name') === 'description') returnable.description = content;
  });

  console.log('returnable', returnable);

  return returnable;
};

It takes a URL string and calls the website, the same way a browser requests a webpage. One caveat: this is just a fetch, so no scripts on the page are run. Next we initialize an in-memory document from the response. Then we query all meta tags, because all Open Graph metadata is held within <meta /> tags. JSDOM exposes pretty much the same DOM API you would use in frontend JS. We loop through all the elements and check for certain tags. I only check for a few, as I am not interested in all the existing ones.
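
If you did want every Open Graph property instead of a hand-picked few, the loop could be generalized along these lines. This is a sketch under assumptions: collectOpenGraph and fakeMeta are made-up names, and the stand-in elements only exist so the example runs without jsdom.

```javascript
// Collects every og:* property from a list of <meta>-like elements.
// Works with anything iterable whose items expose getAttribute(),
// such as the NodeList returned by querySelectorAll('meta').
function collectOpenGraph(metaElements) {
  const tags = {};
  for (const elm of metaElements) {
    const property = elm.getAttribute('property');
    const content = elm.getAttribute('content');
    if (property && property.startsWith('og:') && content !== null) {
      // 'og:site_name' becomes the key 'site_name'
      tags[property.slice(3)] = content;
    }
  }
  return tags;
}

// Stand-in elements that mimic getAttribute() on a real <meta> element.
const fakeMeta = (attrs) => ({
  getAttribute: (name) => (name in attrs ? attrs[name] : null)
});

console.log(collectOpenGraph([
  fakeMeta({ property: 'og:title', content: 'Hello' }),
  fakeMeta({ property: 'og:image', content: 'https://example.com/a.png' }),
  fakeMeta({ name: 'description', content: 'skipped, not an og: property' })
]));
```

Inside getMetaTags, the same function could be called with the result of querySelectorAll('meta').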

Going back to our endpoint handler, after the validation code we add:

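try {
  const tags = await getMetaTags(url);
  res.statusCode = 200;
  res.json(tags);
} catch (e){
  console.log(e);
  res.statusCode = 500;
  // Send a short message instead of the raw error object,
  // which may serialize to {} or expose request internals
  res.json({
    error: "Could not fetch or parse the requested page"
  });
}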

As a safety mechanism, we don’t want to leave the connection hanging if an error occurs. So we wrap the call in a try-catch block and return 500, the standard status code for an internal server error.
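
A single 500 is fine for a tutorial, but the failure usually lies with the target site rather than with our server. One possible refinement, sketched below, picks a status based on the documented shape of axios errors. The toErrorResponse name and the choice of 502 and 504 are my own assumptions, not something the tutorial's code does.

```javascript
// Picks an HTTP status for our own response based on how the upstream
// fetch failed. Relies on the axios error shape: `error.response` is set
// when the target site answered with a non-2xx status, and `error.request`
// is set when the request was sent but no response arrived.
function toErrorResponse(error) {
  if (error && error.response) {
    return {
      status: 502,
      body: { error: `Target site responded with ${error.response.status}` }
    };
  }
  if (error && error.request) {
    return { status: 504, body: { error: 'Target site did not respond' } };
  }
  // Anything else (for example a parsing bug) is our own fault.
  return { status: 500, body: { error: 'Internal server error' } };
}

console.log(toErrorResponse({ response: { status: 404 } }));
console.log(toErrorResponse(new Error('unexpected')));
```

In the catch block, the returned status and body would replace the hardcoded 500 response.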

That’s it for the backend; we can try it out.

One thing to note is that this does not handle JavaScript-rendered pages, such as Single Page Applications built with React or a similar technology. For those we could use a library like Puppeteer to run the JavaScript, since JSDOM does not do that by default. For SEO purposes, most websites put their Open Graph tags in the initial HTML, which needs no JavaScript to render. But the tags can also be changed or generated dynamically by JavaScript. This is hard to tackle because you don’t know which JavaScript code will render or change the Open Graph metadata; there is no easy fix. Performance also becomes a consideration at scale, since using Puppeteer basically means running a Chrome browser.
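
For the curious, a Puppeteer-based variant might look roughly like the sketch below. It assumes you have run npm i puppeteer, which this tutorial does not do, and the function name getRenderedMetaTags is made up. Waiting for 'networkidle2' is only a heuristic, so there is no guarantee the page has finished updating its tags by then.

```javascript
// Sketch only: needs the puppeteer package, which is not installed above.
async function getRenderedMetaTags(url) {
  // Loaded lazily so the rest of the app still starts without Puppeteer.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until the network is mostly idle so client-side code can run.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Runs inside the page: read every og:* meta tag from the live DOM.
    const pairs = await page.$$eval('meta[property^="og:"]', (elms) =>
      elms.map((elm) => [
        elm.getAttribute('property').slice(3),
        elm.getAttribute('content')
      ])
    );
    return Object.fromEntries(pairs);
  } finally {
    // Always shut the browser down, even if navigation fails.
    await browser.close();
  }
}
```

Each call starts and stops a whole browser, which is exactly the performance cost mentioned above.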

Writing the frontend

We start by creating a new React app:

npx create-react-app frontend

Then just run npm run start to test that it works.

Cool. I will delete most of the default files to reduce clutter, then write the React component that renders the data:

import React, {useState} from 'react';

function App() {
  const [previews, setPreviews] = useState([]);
  const [currentUrl, setCurrentUrl] = useState('');
  const [loadingPreview, setLoadingPreview] = useState(false);

  const addClick = (e) => {
    e.preventDefault();

    setLoadingPreview(true);
    fetch(`http://localhost:3500/meta?url=${encodeURIComponent(currentUrl)}`)
      .then(async (response) => {
        const preview = await response.json();
        // fetch resolves on 4xx/5xx too, so turn those into errors
        if (!response.ok) throw new Error(preview.error || response.statusText);
        setPreviews(x => {
          return [
            ...x,
            preview
          ];
        });
      })
      // catch comes after then, so a failure skips the success handler
      .catch((error) => alert(`Error. ${error}`))
      .finally(() => setLoadingPreview(false));
  };

  return (
    <div>
      <header>
        <input type="url" value={currentUrl} onChange={(e) => setCurrentUrl(e.target.value)} placeholder="Enter URL" />
        <button type="button" onClick={addClick} disabled={loadingPreview}>Preview</button>
      </header>
      <main>
        <ul>
          {previews.map((preview, index) => (
            <li key={index} style={{
              display: 'flex',
              flexDirection: 'column',
              borderBottom: '1px solid black'
            }}>
              <p>Site: {preview.site_name}</p>
              <p>Title: {preview.title}</p>
              <p>Description: {preview.description}</p>
            </li>
          ))}
        </ul>
      </main>
    </div>
  );
}

export default App;

I start by declaring the states. One state holds the data returned by the backend as an array, since I want to allow multiple previews. The other two states are for the URL currently being searched and a loading flag that disables the search button until data is received.

Let’s skip to the UI and come back to the button click handler afterwards. The header renders a text field and a button, and the body holds a dynamic list of site previews.

The addClick event handler is what calls our backend. I use the browser’s native fetch function, which makes HTTP calls. Beware that it returns a Promise: then handles the success case, where we set the state and React re-renders the component, and catch handles the error case. Either way, finally gets called afterwards and sets the loadingPreview state to false so we can use the button again.
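
The same flow can also be written with async/await, which some readers find easier to follow. The sketch below pulls the network call into a helper; fetchPreview and its parameters are hypothetical names, and the fetch function is passed in only so the helper can be exercised without a running backend.

```javascript
// Hypothetical helper: asks the backend for one preview and returns the
// parsed JSON, or throws if the backend answered with an error status.
// `fetchImpl` defaults to the global fetch and can be swapped out in tests.
async function fetchPreview(targetUrl, fetchImpl = fetch, apiBase = 'http://localhost:3500') {
  const response = await fetchImpl(`${apiBase}/meta?url=${encodeURIComponent(targetUrl)}`);
  const body = await response.json();
  if (!response.ok) {
    // fetch only rejects on network failures, so 4xx/5xx are checked by hand
    throw new Error(body.error || `Request failed with status ${response.status}`);
  }
  return body;
}

// Possible use inside addClick (declared as an async function):
// try {
//   const preview = await fetchPreview(currentUrl);
//   setPreviews((x) => [...x, preview]);
// } catch (error) {
//   alert(`Error. ${error}`);
// } finally {
//   setLoadingPreview(false);
// }
```

The try, catch and finally blocks line up one to one with then, catch and finally in the Promise chain.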

That’s it, you should be able to use it.

Remarks

We could just as well have designed the frontend first; it is a choice you will have to make. A good practice is to design the backend so that it is not too dependent on the UI. Sometimes this may not be feasible when working on dynamic pages, but for a case like the one above we can expect the API to be called from anywhere, not just from our app.

Another note is to consider making the backend host configurable; here we have hardcoded the address localhost:3500. When you want to start switching hosts dynamically, you may want some bundling or parameter management system in place.
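
One lightweight way to do that is an environment variable. Create React App exposes variables whose names start with REACT_APP_ to the frontend at build time; the specific name REACT_APP_API_HOST and the helper functions below are assumptions made for this sketch.

```javascript
// REACT_APP_API_HOST is a hypothetical variable name. Falls back to the
// hardcoded development address when the variable is not set.
function resolveApiHost(env = process.env) {
  return env.REACT_APP_API_HOST || 'http://localhost:3500';
}

// Builds the full /meta endpoint for a target page.
function metaEndpoint(targetUrl, env = process.env) {
  return `${resolveApiHost(env)}/meta?url=${encodeURIComponent(targetUrl)}`;
}

console.log(metaEndpoint('https://example.com', {}));
// http://localhost:3500/meta?url=https%3A%2F%2Fexample.com
console.log(metaEndpoint('https://example.com', { REACT_APP_API_HOST: 'https://api.example.com' }));
// https://api.example.com/meta?url=https%3A%2F%2Fexample.com
```

The fetch call in addClick would then use metaEndpoint(currentUrl) instead of the hardcoded string.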

It is also worth noting that, in my eyes, having separate backend and frontend projects for something this simple is a bad practice. We could have used a framework like Next.js, which has server-side rendering capabilities and hosts both backend and frontend. That would also eliminate the need for parameter management, and we wouldn’t have to worry about hardcoding the API host.