A summary after reading many StackOverflow posts, now with code!

Image by author: a retro chart generated in R.

Uncovering the company’s business plan throughout the Covid-19 pandemic with Data Science

Photo by zhang kaiyv on Unsplash

This article is part of an NLP series where I use text mining techniques to analyze earnings calls.

In today’s article, I will be analyzing Apple Inc.’s earnings calls from Financial Year 2020 with keyword extraction and frequency analysis techniques in R.

Preliminary dataset exploration and cleaning

Earnings call transcripts from Quarters 1 to 4 of Financial Year 2020, released by Apple Inc., were used for the analysis. After obtaining the dataset, I used Microsoft Excel and RPA tools to pre-process it.
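The frequency-analysis step can be sketched with the tidytext package. The data frame `transcripts` and its `quarter` and `text` columns here are assumptions for illustration, not the exact objects used in the article:

```r
library(dplyr)
library(tidytext)

# Assumed input: `transcripts`, a data frame with one row per transcript,
# holding a `quarter` label and the cleaned call `text`.
word_counts <- transcripts %>%
  unnest_tokens(word, text) %>%           # one row per word
  anti_join(stop_words, by = "word") %>%  # drop common stop words
  count(quarter, word, sort = TRUE)       # word frequency per quarter
```

The same `word_counts` table can then feed a keyword chart, e.g. the top ten words per quarter.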

Looking at the debate with Data Science

Photo by Charles Deluvio on Unsplash

Thanks to the internet, the whole world now knows about the 2020 Presidential Debate that went out of control. All of the major news stations reported on how the participants were interrupting and sniping at one another.

I decided to put together an article that analyzes the words used in the event to see if there are any hidden insights.

This article focuses on the most frequently used words, broken down by speaker, and on sentiment analysis of the speeches.
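A per-speaker sentiment tally can be sketched with tidytext’s lexicon helpers. The `debate` data frame and its column names are assumptions for illustration:

```r
library(dplyr)
library(tidytext)

# Assumed input: `debate`, a data frame with a `speaker` column and
# a `text` column holding what each speaker said.
sentiment_by_speaker <- debate %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%  # positive/negative lexicon
  count(speaker, sentiment)                            # tally per speaker
```

Swapping `"bing"` for `"nrc"` would break the tally into finer emotions (anger, fear, trust, and so on).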

Detecting fraud from the text of Enron’s earnings call

Photo by Joshua Hoehne on Unsplash


Natural Language Processing (NLP) has been gaining traction in recent years, allowing us to understand unstructured text data in a way that was never possible before. One of the promises of NLP is to use relevant techniques to detect fraud in companies and shed light on potential violations at an early stage.

About the dataset

I’ve only managed to find two earnings call transcripts online, and only one of them is readable when converted from PDF to text. You can find the original document here.

The earnings call transcript used in this article is from Enron’s conference call held on November 14, 2001. …

This article uses Natural Language Processing techniques to detect fraud in the accounting scandal-plagued Chinese coffee chain.

Photo by Ashkan Forouzani on Unsplash

In early 2020, Luckin Coffee was delisted from the NASDAQ stock exchange after the CEO admitted to inflating accounting figures in the company’s 2019 earnings reports.

Luckin Coffee, once acclaimed as Starbucks’ biggest rival in the Chinese coffee market, was charged with fabricating sales revenues in 2019. Even though the scandal took some time to blow up, it inspired me to start thinking about the possibility of detecting fraud through words.

This article focuses on applying Natural Language Processing techniques to Luckin Coffee’s earnings calls for Quarters 2 and 3 of 2019. …

A step-by-step guide to cleaning and analyzing tweets in R

Photo by Darren Halstead on Unsplash

As we get closer to the next U.S. presidential election, I wanted to know what people think of the nominees. Will the current President continue his stay at the White House, or will we see a new U.S. President with less angry Twitter rants?

Getting the dataset

I used the R package rtweet to download tweets with the hashtag #WhenTrumpIsOutOfOffice tweeted in March 2020. As a result, I was able to find more than 6000 tweets with the hashtag.

library(rtweet)

# create token named "twitter_token"
twitter_token <- create_token(
  app = appname,
  consumer_key = consumer_key,
  consumer_secret = consumer_secret,
  access_token = access_token,
  access_secret = access_secret)
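With the token in place, the hashtag download itself might look like the sketch below; the query and the result cap are illustrative, not the exact call used in the article:

```r
library(rtweet)

# Search recent tweets for the hashtag; the article reports
# that roughly 6,000 matching tweets were returned.
tweets <- search_tweets(
  "#WhenTrumpIsOutOfOffice",
  n = 10000,            # upper bound on tweets to fetch
  include_rts = FALSE,  # drop retweets to avoid duplicates
  token = twitter_token
)
```

Note that the standard Twitter search API only reaches back a few days, so the collection would have been run during March 2020 itself.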

Uncover Netflix’s expansion strategy in India against Disney Plus with popular NLP and text mining techniques.

Photo by Thibault Penin on Unsplash

Applying Natural Language Processing techniques to earnings call transcripts

Earnings calls are conference calls between a company’s chief executives and the public. The discussion sessions give the public a glimpse into what is happening in the company and within the industry.

In these conferences, investors tend to read the executives’ choice of words as a signal of the company’s future performance. If you have ever listened to an earnings call, you may have noticed that executives are cautious with the words they use.

Natural Language Processing (NLP) has been gaining popularity in recent years. The intersection of technology and linguistics gives us a new way to look past the hype of a social media post, a review, or, in our case, a typical earnings call transcript. …

Understanding the billionaire’s words and thoughts through Data Science.

Photo by Sharon McCutcheon on Unsplash

Keyword extraction is one of the most popular text mining techniques in the Natural Language Processing (NLP) field. The idea behind keyword extraction is to automatically capture the important words in a text. The technique is very effective when we want to gain insights quickly from a large body of text.

In this article, I will attempt to apply keyword extraction techniques to the shareholder letters penned by Warren Buffett between 1977 and 2019. There are many keyword extraction techniques available, but we will focus on three: frequency analysis, RAKE, and POS-tagging on the letter texts.
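The RAKE and POS-tagging steps can be sketched with the udpipe package, which both POS-tags text and implements RAKE scoring. The input vector `letters_text` is an assumption for illustration:

```r
library(udpipe)

# Assumed input: `letters_text`, a character vector with one
# element per shareholder letter.
dl    <- udpipe_download_model(language = "english")
model <- udpipe_load_model(dl$file_model)

# POS-tag the letters: one row per token, with lemma and upos columns
ud <- as.data.frame(udpipe_annotate(model, x = letters_text))

# RAKE keyword scores, restricted to nouns and adjectives
rake <- keywords_rake(ud, term = "lemma", group = "doc_id",
                      relevant = ud$upos %in% c("NOUN", "ADJ"))
```

Sorting `rake` by its `rake` score column surfaces the candidate key phrases; frequency analysis can reuse the same tagged tokens.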

Warren Buffett, aka the Oracle of Omaha, is famously known for writing insightful annual letters that are widely anticipated by the shareholders of Berkshire Hathaway and the investment community. In this article, we aim to extract interesting insights from the letters without any heavy reading. …

Learn to analyze movie characters with popular text mining techniques!

Photo by Elia Pellegrini on Unsplash

Joker is a film about one of the most iconic supervillains. The film shows us the journey of a wronged man, Arthur, from a sad child to an anguished, murderous man who would later be known for his freakish smile.

In this article, we seek to understand Arthur’s complicated personality by answering the following questions using text mining techniques in R:

  1. Arthur’s choice of words
  2. Arthur’s relationship with the other characters
  3. Arthur’s emotion development in the film

Dataset disclaimer

The movie script can be obtained here. In this article, I extracted lines spoken by Arthur for analysis.
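Question 3, Arthur’s emotional development, can be sketched by scoring his dialogue against an emotion lexicon in chunks. The `arthur_lines` data frame and its columns are assumptions for illustration:

```r
library(dplyr)
library(tidytext)

# Assumed input: `arthur_lines`, a data frame with `line_number`
# (Arthur's lines in script order) and `text` columns.
arthur_emotions <- arthur_lines %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%  # emotion lexicon
  count(index = line_number %/% 20, sentiment)        # emotions per ~20 lines
```

Plotting each emotion’s count against `index` then gives a rough emotional arc across the film.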

A simple guide to map visualization in R on the diverse island country, Singapore!

A watercolor map of Singapore island

Singapore is easily one of the top travel destinations in Asia. Even though the city is smaller than NYC, it has just about everything you could want to experience: a vast selection of food, street (and luxury) shopping, a green city that embraces biodiversity, you name it!

In this article, we will use R to visualize spatial data on top of the map of Singapore.

API setup and getting the map

First, we need to get a map background using the get_map function from the ggmap package.

# Load the R packages
library(ggmap)

# Set your API key
ggmap::register_google(key = "your google api key")

# Get the Singapore map as the background
map <- get_map("singapore", maptype = "roadmap", zoom = 11,
               source = "google", color = "color")
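Once the background map is fetched, spatial data can be layered on top of it with ggplot2 geoms. The `points` data frame below is an assumption for illustration:

```r
library(ggmap)

# Assumed input: `points`, a data frame of locations with
# `lon` and `lat` columns (e.g. hawker centres or attractions).
ggmap(map) +
  geom_point(data = points, aes(x = lon, y = lat),
             colour = "red", alpha = 0.6)
```

`ggmap(map)` draws the Google roadmap tile as the plot background, so any standard geom (points, density, heatmap) composes on top of it.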


Feng Lim

Changing the world with data points, one word at a time. #naturalLanguageProcessing #textMining #sentimentAnalysis
