A Beginner's Guide to Making a Node.js CLI That Converts Medium Posts into Markdown

Van Nguyen Nguyen -

This is a preview of what we are going to make

If you found my blog by accident: I use a note-taking app called Joplin, which requires me to write everything in markdown, and I have gotten used to that. One day I was bored and came up with the idea of turning Medium posts (premium posts not included, of course) into markdown. Medium is a big platform for bloggers, so this makes note-taking easier for me: I can just put everything into Joplin.


Prerequisites

If you have never built a Node.js CLI before, there is a really good tutorial from Twilio on the Internet. Please watch at least the first 5 minutes; it is not that long, and I highly recommend watching the whole video and trying things out yourself.


We need to install a few packages:
  • esm: required, since it adds support for ES module (ESM) features such as import/export and dynamic import
  • node-fetch: required, since it lets us use the Fetch API in Node.js
  • turndown: required, since it translates HTML markup into markdown, for example:
markdown
<h1>The title</h1> --> # The title
  • ora: optional, since it only shows a spinner animation while we fetch the API

Big Problem

There are multiple approaches to this project. One of them is web scraping to get the main content of the page. Since this is just a beginner guide, I will cover the web scraping method in another post.


The other method is to use the Medium API. Unfortunately, the Medium API does not provide the content of a post. But Medium does publish one thing that we can take advantage of: its RSS feeds. (There is a big drawback to RSS feeds, though: Medium only provides the latest 10 posts.)


We then use an API called rss2json to convert the RSS format into JSON, which is much easier to work with.
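
To make the two URLs concrete, here is a minimal sketch that builds a Medium feed URL and wraps it in an rss2json request URL. The helper names mediumFeedUrl and rss2jsonUrl and the user name @someone are made up for illustration; the endpoint is the one used later in this post. Encoding the feed URL is my own precaution so that it survives as a query parameter.

```javascript
// Hypothetical helpers, for illustration only.

// Build the RSS feed URL for a Medium user name such as "@someone".
function mediumFeedUrl(userName) {
  return `https://medium.com/feed/${userName}`;
}

// Wrap a feed URL in an rss2json request URL.
// encodeURIComponent keeps ":" and "/" from being misread inside the query string.
function rss2jsonUrl(feedUrl) {
  return `https://api.rss2json.com/v1/api.json?rss_url=${encodeURIComponent(feedUrl)}`;
}

const exampleFeedUrl = mediumFeedUrl("@someone");
console.log(exampleFeedUrl);
console.log(rss2jsonUrl(exampleFeedUrl));
```

Pasting the second URL into a browser is a quick way to look at the JSON before writing any more code.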

The main logic

1. First, we get the URL from args.
2. Then we take the right parts of the URL to construct the RSS URL https://medium.com/feed/<User name>.
3. We give the feed URL to the rss2json API, which fetches the feed for us.
4. We get the JSON data back from rss2json (now you can do whatever you want, since JSON data is really easy to work with).
5. We use turndown to convert the main content into markdown format.
6. We put the data somewhere.
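
The steps above can be sketched as one small pipeline. This is only an outline under assumptions: it handles medium.com links only, and the network call, the HTML conversion and the saving are passed in as functions with made-up names (fetchFeedAsJson, htmlToMarkdown, save), so the flow can be tried with stand-ins before the real packages are wired in.

```javascript
// A sketch of the six steps; all helper names are hypothetical.
async function mediumToMarkdown(argv, deps) {
  const postUrl = argv[2];                               // 1. URL from args
  const parts = postUrl.split("/");                      // 2. user name and post ID
  const feedUrl = `https://medium.com/feed/${parts[3]}`;
  const postId = parts[4];
  const feed = await deps.fetchFeedAsJson(feedUrl);      // 3 and 4. feed as JSON
  const post = feed.items.find((item) => item.link.includes(postId));
  if (!post) throw new Error("Post not found in the feed");
  const markdown = deps.htmlToMarkdown(post.content);    // 5. HTML to markdown
  return deps.save(`${postId}.md`, markdown);            // 6. put it somewhere
}

// Stand-ins that fake the real work, so nothing touches the network or the disk.
const fakeDeps = {
  fetchFeedAsJson: async () => ({
    items: [
      { link: "https://medium.com/@someone/my-post-123?source=rss", content: "<h1>Hi</h1>" },
    ],
  }),
  htmlToMarkdown: (html) => html.replace(/<h1>(.*?)<\/h1>/, "# $1"),
  save: (name, text) => ({ name, text }),
};

mediumToMarkdown(["node", "m2m", "https://medium.com/@someone/my-post-123"], fakeDeps)
  .then((result) => console.log(result));
```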

Let's code

Let's call the project m2m (which stands for medium to markdown).


Before you start, make sure you have watched the Twilio tutorial and installed the necessary packages from the Prerequisites section.


A few files are important for the initial setup. In package.json, the "bin" field defines the command names that call the script.

JSON

package.json

{
  "name": "@<Put your name here>/m2m",
  "version": "1.0.0",
  "description": "A CLI to translate medium post into markdown format",
  "main": "src/cli.js",
  "bin": {
    "@<Put your name here>/m2m": "bin/m2m",
    "m2m": "bin/m2m"
  },
  "publishConfig": {
    "access": "public"
  },
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [
    "cli",
    "m2m"
  ],
  "author": "Van Nguyen Nguyen",
  "license": "ISC",
  "dependencies": {
    "esm": "^3.2.25",
    "node-fetch": "^2.6.7",
    "ora": "^5.4.1",
    "turndown": "^7.1.1"
  }
}
javascript

bin/m2m

#!/usr/bin/env node
// The first line, starting with #!, is called a shebang. It tells the system which
// interpreter runs the code below; here /usr/bin/env looks up node on your PATH
require = require("esm")(module /* , options */);
require("../src/cli.js").cli(process.argv);
javascript

src/cli.js

export function cli(args) {
  // args is process.argv. When you type "m2m", it always starts with 2 entries:
  // the first is the path to Node on your computer,
  // the second is the entry point of your code (which is bin/m2m),
  // and the rest is everything you typed after the command
  console.log(args);
}
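
To see what those entries look like, here is a small sketch with a made-up args array. The helper name userArgs and both paths are assumptions for illustration; the real paths depend on your machine.

```javascript
// Hypothetical helper: keep only what the user typed after the command name.
function userArgs(argv) {
  // argv[0] is the path to the node executable, argv[1] the path to the script
  return argv.slice(2);
}

// Roughly what the array could look like for:
//   m2m https://medium.com/@someone/my-post
const exampleArgv = [
  "/usr/local/bin/node",
  "/usr/local/bin/m2m",
  "https://medium.com/@someone/my-post",
];

console.log(userArgs(exampleArgv));
```

Slicing off the first two entries is the same thing the CLI does in the next step.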

Do not forget to run npm link, which creates a symlink so that the m2m command points to our code. See the npm documentation for more on npm link.

First Step: Construct the RSS URL and get the JSON data

There is one thing that we have to worry about:

The same post from the same person can live under two totally different links, one on medium.com and one on a separate domain. Why is that? It turns out Medium allows you to create a separate blog (for a community, a company, an organisation, ...). So there are two different forms of RSS link, respectively: https://medium.com/feed/<User name> for a post on medium.com, and https://<blog domain>/feed for a separate blog.

javascript

src/cli.js

import fetch from "node-fetch";

export function cli(args) {
  // Keep only what the user typed (the URL)
  const myArgs = args.slice(2);
  // This will contain the ID of the post (the last part of the post URL)
  let mainID;

  // Split everything by "/"
  const processedURL = (URL) => {
    return URL.split("/");
  };

  const fetchURL = async (URL) => {
    const currURL = processedURL(URL);
    // Check whether the link is on medium.com or on a separate blog domain
    const found = currURL.findIndex((el) => el === "medium.com");

    // Create the appropriate RSS feed link
    const feedURL =
      found > -1
        ? `${currURL[0]}//${currURL[2]}/feed/${currURL[3]}`
        : `${currURL[0]}//${currURL[2]}/feed`;

    // Assign the mainID of the post
    mainID = found > -1 ? currURL[4] : currURL[3];

    // After having the RSS feed link, use rss2json to fetch the JSON data
    // (the feed link is encoded because it is passed as a query parameter)
    const getJSONFormat = await fetch(
      `https://api.rss2json.com/v1/api.json?rss_url=${encodeURIComponent(feedURL)}`
    );

    // Parse the response as JSON and return it
    return getJSONFormat.json();
  };
}
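
Splitting on "/" works, but it depends on counting positions in the array. As an alternative sketch (not what the CLI in this post uses), the built-in URL class can do the parsing. The function name toFeedInfo and the example links are made up for illustration.

```javascript
// Hypothetical alternative to splitting on "/": let the URL class do the parsing.
function toFeedInfo(postUrl) {
  const url = new URL(postUrl);
  // Path segments without empty strings, e.g. ["@someone", "my-post-123"]
  const segments = url.pathname.split("/").filter(Boolean);

  if (url.hostname === "medium.com") {
    // Post on medium.com: the first segment is the user (or publication) name
    return { feedUrl: `${url.origin}/feed/${segments[0]}`, postId: segments[1] };
  }
  // Blog on its own domain: the feed sits directly under /feed
  return { feedUrl: `${url.origin}/feed`, postId: segments[0] };
}

console.log(toFeedInfo("https://medium.com/@someone/my-post-123?source=friends"));
console.log(toFeedInfo("https://blog.example.com/my-post-123"));
```

A side benefit is that the query string never ends up in the post ID, because pathname excludes it.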

Second Step: Turn everything into Markdown

I highly recommend taking one of the RSS feed links from the previous step and passing it to rss2json to see the structure of the data the API returns.


Currently, the structure looks similar to this:

API-structure
{
  // We can ignore these attributes
  "status": "...",
  "feed": {
    ...
  },

  // This is what we want: "items" is an array of objects, each holding the information of one post
  // We basically want to take out the "items" value
  "items": [
    // This is one post
    {
      "title": "...",
      "pubDate": "...",
      "link": "...",
      "author": "...",
      ...
    },
    // And other posts
    ...
  ]
}
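
Given that structure, picking one post out of "items" could look like the sketch below. The sample data is made up and trimmed to the fields that matter, and findPost is a hypothetical helper. It compares only the last part of each link, which is slightly stricter than the loop used in this post.

```javascript
// Made-up sample shaped like the structure above (all values are placeholders).
const sampleFeed = {
  status: "ok",
  feed: {},
  items: [
    { title: "First post", link: "https://medium.com/@someone/first-post-111?source=rss" },
    { title: "Second post", link: "https://medium.com/@someone/second-post-222?source=rss" },
  ],
};

// Hypothetical helper: return the item whose link ends with the post ID,
// ignoring the query string, or undefined when it is not in the feed.
function findPost(feed, postId) {
  return feed.items.find((item) => {
    const path = item.link.split("?")[0];
    return path.split("/").pop() === postId;
  });
}

console.log(findPost(sampleFeed, "second-post-222").title);
console.log(findPost(sampleFeed, "not-in-feed"));
```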
javascript

src/cli.js

// Also needed at the top of src/cli.js:
// import TurndownService from "turndown";

  // Async function because fetchURL above fetches data
  const convertToMD = async () => {
    // Fetch the JSON content
    const feed = await fetchURL(myArgs[0]);
    let mainContent;
    let flag = false;

    // Check every post in the RSS feed for an ID that matches the URL we were given
    for (const item of feed.items) {
      // The link in the RSS feed carries a query string, so remove it to get the actual ID
      const processLink = processedURL(item.link.split("?")[0]);
      for (const li of processLink) {
        // Find the matching ID
        if (li === mainID) {
          mainContent = item;
          flag = true;
          break;
        }
      }
      // Stop looking once the post is found
      if (flag) break;
    }

    // Return a new Promise
    return new Promise((resolve, reject) => {
      // No match: that's probably a premium post or a post from a long time ago (see the Big Problem section)
      if (!flag) {
        reject("Post not found: it is a premium post or not among the latest 10 posts!");
      } else {
        // If there is a match
        const turndownService = new TurndownService();
        // Convert the whole HTML content into markdown by using turndown
        const markdownContent = turndownService.turndown(
          mainContent.content
        );
        // Create the frontmatter (it has to start on the very first line of the file)
        const frontmatter = `---
firstPublishedAt: ${mainContent.pubDate}
slug: ${mainContent.link}
thumbnail: ${mainContent.thumbnail}
author: ${mainContent.author}
title: ${mainContent.title}
tags: [${mainContent.categories}]
---\n\n`;
        // Merge them together
        resolve(frontmatter + markdownContent);
      }
    });
  };
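
The frontmatter part can also be pulled out into its own function, which makes it easy to try without fetching anything. This is a sketch under assumptions: buildFrontmatter is a hypothetical helper, the sample item is made up, and quoting the title with JSON.stringify is my own addition so that a ":" inside a title does not confuse a YAML parser.

```javascript
// Hypothetical helper: build the frontmatter block from one feed item.
function buildFrontmatter(item) {
  return [
    "---",
    `firstPublishedAt: ${item.pubDate}`,
    `slug: ${item.link}`,
    `thumbnail: ${item.thumbnail}`,
    `author: ${item.author}`,
    // JSON.stringify wraps the title in double quotes
    `title: ${JSON.stringify(item.title)}`,
    `tags: [${item.categories.join(", ")}]`,
    "---",
    "",
    "",
  ].join("\n");
}

// Made-up item with the same fields the CLI reads from the feed.
const sampleItem = {
  pubDate: "2022-01-01 10:00:00",
  link: "https://medium.com/@someone/notes-123",
  thumbnail: "",
  author: "Someone",
  title: "Notes: part one",
  categories: ["nodejs", "cli"],
};

console.log(buildFrontmatter(sampleItem) + "# Notes: part one");
```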

Final Step: Find somewhere to put the content

The final step is an entry point to bring everything together

javascript

src/cli.js

// Also needed at the top of src/cli.js:
// import fs from "fs";
// import ora from "ora";

  // The main function is also the entry point of our CLI
  const main = async () => {
    // With no argument (only "m2m") or more than 2 arguments, print the usage of our CLI
    if (myArgs.length < 1 || myArgs.length > 2) {
      console.log("Usage: m2m <Medium_URL>");
      process.exit(1);
    }
    // Create a spinner with ora
    const spinners = ora("Converting File\n").start();
    // convertToMD returns a promise; if it gets rejected, print the error and stop
    const content = await convertToMD().catch((error) => {
      spinners.fail(String(error));
      process.exit(1);
    });
    // If the promise resolves, all files go into a /posts folder
    const currDir = process.cwd() + "/posts";

    try {
      // Create the /posts folder in the current directory if it is missing
      if (!fs.existsSync(currDir)) {
        fs.mkdirSync(currDir);
      }
      // Write the file (writeFileSync takes no callback; it throws on failure)
      const fileName = mainID + ".md";
      fs.writeFileSync(`${currDir}/${fileName}`, content);
    } catch (error) {
      spinners.fail(String(error));
      process.exit(1);
    }

    spinners.succeed("Successfully created the .md file");
  };

  // Run the main function of our CLI
  main();
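
One optional hardening step: the file name comes straight from the URL, so it may be worth cleaning it before writing. The helper below is a hypothetical sketch and not part of the CLI above; the set of allowed characters is my own assumption.

```javascript
// Hypothetical helper: turn a post ID into a file name that is safe on most file systems.
function toFileName(postId) {
  const safe = postId
    .replace(/[^A-Za-z0-9._-]+/g, "-") // anything unusual becomes "-"
    .replace(/^-+|-+$/g, "");          // no leading or trailing "-"
  // Fall back to a fixed name if nothing usable is left
  return `${safe || "post"}.md`;
}

console.log(toFileName("my-post-123"));
console.log(toFileName("my post?draft=1"));
```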

I hope you achieved what you were looking for. If you run into any problem, please refer to my example repo for more information.

© 2022 Van Nguyen Nguyen. All Rights Reserved.


Contact me: nguyenvannguyen.oc@gmail.com