Looking at the debate with Data Science

Photo by Charles Deluvio on Unsplash

Thanks to the internet, the whole world now knows about the 2020 Presidential Debate that went out of control. All the major news stations reported on how the participants kept interrupting and sniping at one another.

I decided to put together an article that focuses on analyzing the words used in the event and see if there are any hidden insights.

This article focuses on finding the most-used words, broken down by speaker, and on sentiment analysis of the speeches.
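As a rough illustration of the "most-used words by speaker" idea, here is a minimal base R sketch. The toy transcript, the speaker labels `A` and `B`, and the short stop-word list are all made up for this example; it is not the article's actual pipeline, which may well rely on dedicated text mining packages.

```r
# Toy transcript: speakers and lines are invented for illustration only.
debate <- data.frame(
  speaker = c("A", "A", "B", "B"),
  text = c("The people want jobs and the people want answers",
           "We will bring jobs back",
           "People deserve a plan for health care",
           "The plan will protect people"),
  stringsAsFactors = FALSE
)

# A tiny hand-picked stop-word list (an assumption; real analyses use larger lists).
stop_words <- c("the", "and", "we", "will", "a", "for", "to", "of")

# Lowercase, strip punctuation, split on whitespace, drop stop words.
tokenize <- function(x) {
  words <- strsplit(gsub("[[:punct:]]", "", tolower(x)), "\\s+")[[1]]
  words[nzchar(words) & !(words %in% stop_words)]
}

# Count words per speaker, sorted from most to least frequent.
top_words <- lapply(split(debate$text, debate$speaker), function(lines) {
  tokens <- unlist(lapply(lines, tokenize))
  sort(table(tokens), decreasing = TRUE)
})

print(top_words)
```

Splitting the text by speaker before counting keeps each speaker's vocabulary separate, so the same word can rank differently for each of them.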

Image by Author: The most used word in the first Presidential Debate 2020 — ‘People’

The first 2020 Presidential Debate overview

- Chris Wallace


Detecting fraud from the text of Enron’s earnings call

Photo by Joshua Hoehne on Unsplash


About the dataset

The earnings call transcript used in this article is from Enron’s conference call held on November 14, 2001. …

This article uses Natural Language Processing techniques to detect fraud in the accounting scandal-plagued Chinese coffee chain.

Photo by Ashkan Forouzani on Unsplash

In early 2020, Luckin Coffee was delisted from the NASDAQ stock exchange after the CEO admitted to inflating accounting figures in the company’s 2019 earnings reports.

Luckin Coffee, once acclaimed as Starbucks’ biggest rival in the Chinese coffee market, was charged with fabricating sales revenues in 2019. Even though the scandal took some time to blow up, it inspired me to start thinking about the possibility of detecting fraud through words.

This article focuses on applying Natural Language Processing techniques to the Luckin Coffee earnings calls for the second and third quarters of 2019. …

A step-by-step guide to cleaning and analyzing tweets in R

Photo by Darren Halstead on Unsplash

As we get closer to the next U.S. presidential election, I wanted to know what people think of the nominees. Will the current President continue his stay at the White House, or will we see a new U.S. President with fewer angry Twitter rants?

Getting the dataset

library(rtweet)

# create token named "twitter_token"
twitter_token <- create_token(
  app = appname,
  consumer_key = consumer_key,
  consumer_secret = consumer_secret,
  access_token = access_token,
  access_secret =…

Uncover Netflix’s expansion strategy in India against Disney Plus with popular NLP and text mining techniques.

Photo by Thibault Penin on Unsplash

Applying Natural Language Processing techniques on Earning Call transcripts

In these conference calls, investors tend to read the chief executives’ language as a signal of the company’s future performance. If you have ever listened to an earnings call, you may have already noticed that executives are cautious with the words they use.

Natural Language Processing (NLP) has been gaining popularity in recent years. The intersection of technology and linguistics gives us a new way to look past the hype of a social media post, a review, or, in our case, a typical earnings call transcript. …

Understanding the billionaire’s words and thoughts through Data Science.

Photo by Sharon McCutcheon on Unsplash

Keyword extraction is one of the most popular text mining techniques in the Natural Language Processing (NLP) field. The idea behind keyword extraction is to capture the important words in a text automatically. The technique is very effective when we want to gain insights from a big chunk of text data quickly.

In this article, I will attempt to apply keyword extraction techniques to the shareholder letters penned by Warren Buffett between 1977 and 2019. There are many keyword extraction techniques available, but we will focus on three: frequency analysis, RAKE, and POS-tagging.

Warren Buffett, aka the Oracle of Omaha, is famous for writing insightful annual letters that are widely anticipated by the shareholders of Berkshire Hathaway and the investment community. In this article, we aim to extract interesting insights from the letters without any heavy reading. …
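To give a feel for how RAKE (Rapid Automatic Keyword Extraction) scores phrases, here is a minimal base R sketch of the usual degree/frequency scoring. The sample text and the stop-word list are invented for illustration and are not taken from the letters; the article itself may use a dedicated R package rather than hand-rolled code like this.

```r
# Illustrative text and stop-word list (both assumptions, not the actual letters).
text <- paste("Our insurance business delivered strong underwriting profit.",
              "Strong underwriting discipline protects the insurance business.")
stop_words <- c("our", "the", "a", "an", "of", "and", "delivered", "protects")

# 1. Split the text into fragments at punctuation.
fragments <- strsplit(tolower(text), "[[:punct:]]+")[[1]]

# 2. Build candidate phrases: maximal runs of consecutive non-stop words.
candidates <- list()
for (frag in fragments) {
  words <- strsplit(trimws(frag), "\\s+")[[1]]
  words <- words[nzchar(words)]
  current <- character(0)
  for (w in words) {
    if (w %in% stop_words) {
      if (length(current) > 0) candidates[[length(candidates) + 1]] <- current
      current <- character(0)
    } else {
      current <- c(current, w)
    }
  }
  if (length(current) > 0) candidates[[length(candidates) + 1]] <- current
}

# 3. Word score = degree / frequency, where degree sums the lengths of
#    all candidate phrases in which the word occurs.
all_words <- unlist(candidates)
freq <- table(all_words)
degree <- tapply(rep(lengths(candidates), lengths(candidates)), all_words, sum)
word_score <- setNames(as.numeric(degree[names(freq)]) / as.numeric(freq),
                       names(freq))

# 4. Phrase score = sum of its word scores; keep each distinct phrase once.
phrases <- vapply(candidates, paste, character(1), collapse = " ")
phrase_score <- vapply(candidates, function(p) sum(word_score[p]), numeric(1))
rake <- sort(setNames(phrase_score, phrases)[!duplicated(phrases)],
             decreasing = TRUE)

print(rake)
```

Because the score rewards words that appear inside longer phrases, multi-word expressions tend to outrank single frequent words, which is the main difference from plain frequency analysis.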

Learn to analyze movie characters with popular text mining techniques!

Photo by Elia Pellegrini on Unsplash

Joker is a film about one of the most iconic supervillains. The film shows us the journey of a wronged man (Arthur), from a sad child to an anguished, murderous man who will later be known for his freakish smile scar.

In this article, we seek to understand Arthur’s complicated personality by answering the following questions using text mining techniques in R:

  1. Arthur’s choice of words
  2. Arthur’s relationship with the other characters
  3. Arthur’s emotional development in the film

Dataset disclaimer

A snapshot of Arthur’s lines in the movie Joker.

The R libraries used in this article…

A simple guide to map visualization in R on the diverse island country, Singapore!

A watercolor map of Singapore island

Singapore is easily one of the top travel destinations in Asia. Even though the city is smaller than NYC, it has just about everything you need or want to experience: a vast selection of food, street (and luxury) shopping, a green city that embraces biodiversity, you name it!

In this article, we will use R to visualize spatial data on top of the map of Singapore.

API setup and getting the map

#Load the R packages
#Set your API Key
ggmap::register_google(key = "your google api key")
#Get Singapore map as background
map <- get_map("singapore", maptype = "roadmap", zoom = 11, source = "google", color =…

An exploratory and sentiment analysis of what people are doing and how they feel during the Coronavirus lockdown

Photo by Sharon McCutcheon on Unsplash

As more countries declare a nationwide shutdown, most people are being asked to stay at home and quarantine. I wanted to know how people are spending their time and how they are feeling during this “closedown” period, so I analyzed some tweets in this article, hoping that the data would give me some insights.

Importing and pre-processing the dataset

After importing the data into R, we need to pre-process the tweets and tokenize them into words (tokens) for analysis:

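tweet_words <- tweets %>%
  # Assumption: the start of this step is cut off in the source; select() on the two columns used below is a guess
  select(text, created) %>%
  # parse the timestamp (month/day/year hour)
  mutate(created_date = as.POSIXct(created, format="%m/%d/%Y %H")) %>%
  # drop non-ASCII characters such as emoji
  mutate(text = replace_non_ascii(text, replacement = "", remove.nonconverted = TRUE)) %>%
  # remove @mentions, punctuation, and links
  mutate(text = str_replace_all(text, regex("@\\w+"),"" )) %>%
  mutate(text = str_replace_all(text, regex("[[:punct:]]"),"" )) %>%
  mutate(text = str_replace_all(text, regex("http\\w+"),"" )) %>%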
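The same cleaning idea can be sketched in base R on a couple of made-up tweets, which makes it easy to see what each step does. The sample tweets and the `clean_tweet` helper are hypothetical, and this version removes links before punctuation, which differs slightly from the order in the snippet above.

```r
# Made-up tweets for illustration only.
tweets <- c("@friend Working from home again!!! https://t.co/abc123",
            "Baking bread, day 12 of #lockdown")

# Hypothetical helper that mirrors the cleaning steps with base R functions.
clean_tweet <- function(x) {
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")  # drop non-ASCII characters
  x <- gsub("@\\w+", "", x)                               # remove @mentions
  x <- gsub("http\\S+", "", x)                            # remove links
  x <- gsub("[[:punct:]]", "", x)                         # remove punctuation (also strips '#')
  trimws(gsub("\\s+", " ", x))                            # collapse repeated whitespace
}

cleaned <- clean_tweet(tweets)

# Tokenize: lowercase each tweet and split it into words.
tokens <- strsplit(tolower(cleaned), " ")

print(tokens)
```

Removing links while the `://` and dots are still present means the whole URL can be matched in one pass, instead of relying on it collapsing into a single word after punctuation is stripped.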

An exploratory and sentiment analysis of the COVID-19 pandemic in R

Photo by Markus Spiske on Unsplash

The Coronavirus (COVID-19) outbreak has recently been declared a global emergency. As we start practicing social distancing and working from home to control the spread of the virus, I decided to use my spare time to look into what people are saying about the pandemic online.

For this article, 15,000 tweets with #Coronavirus and #COVID19, posted between January 30 and March 15, 2020, were extracted for analysis.

Extracting the tweets into R


Feng Lim

Changing the world with data points, one word at a time. #naturalLanguageProcessing #textMining #sentimentAnalysis
