Thanks to the internet, the whole world now knows about the 2020 Presidential Debate that went out of control. All of the major news stations reported on how the participants interrupted and sniped at one another.
I decided to put together an article that focuses on analyzing the words used in the event, to see if there are any hidden insights.
This article focuses on finding the most frequently used words, categorized by speaker, and on sentiment analysis of the speeches; a rough sketch of the approach follows the list of participants below.
- Incumbent President Donald Trump
- Former Vice President Joe Biden (Democratic nominee)
- Chris Wallace (moderator)
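As an illustration only, here is a minimal sketch of both steps with the tidytext package; the debate data frame with speaker and text columns is a hypothetical stand-in for the parsed transcript, not the article's actual data:

library(dplyr)
library(tidytext)

# Hypothetical transcript: one row per spoken line
debate <- tibble::tibble(
  speaker = c("Trump", "Biden", "Wallace"),
  text = c("...", "...", "...")
)

# Most-used words per speaker, after removing stop words
debate %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(speaker, word, sort = TRUE)

# Sentiment tally per speaker with the Bing lexicon
debate %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(speaker, sentiment)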
Natural Language Processing (NLP) has been gaining traction in recent years, allowing us to understand unstructured text data in ways that were never possible before. One of the promises of NLP is to use its techniques to detect corporate fraud and shed light on potential violations at an early stage.
I’ve only managed to find two earnings call transcripts online, and only one of them is readable when converted from PDF to text. You can find the original …
The earnings call transcript used in this article is from Enron’s conference call held on November 14, 2001. …
In early 2020, Luckin Coffee was delisted from the NASDAQ stock exchange after the CEO admitted to inflating accounting figures in the company’s 2019 earnings reports.
Luckin Coffee, once hailed as Starbucks’ biggest rival in the Chinese coffee market, was charged with fabricating its 2019 sales revenue. Even though the scandal took some time to blow up, it inspired me to start thinking about the possibility of detecting fraud through words.
This article focuses on applying Natural Language Processing techniques to Luckin Coffee’s earnings calls for the second and third quarters of 2019. …
As we get closer to the next U.S. presidential election, I wanted to know what people think of the nominees. Will the current President continue his stay in the White House, or will we see a new U.S. President with less angry Twitter rants?
I used the R package rtweet to download tweets with the hashtag #WhenTrumpIsOutOfOffice posted in March 2020. As a result, I was able to find more than 6,000 tweets with the hashtag.
library(rtweet)

# Create a token named "twitter_token"
twitter_token <- create_token(
  app = appname,
  consumer_key = consumer_key,
  consumer_secret = consumer_secret,
  access_token = access_token,
  access_secret = access_secret
)
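The download step itself is not shown in this excerpt; a minimal sketch with rtweet's search_tweets, where the argument values are my assumptions rather than the original call, could look like this:

# Download recent tweets carrying the hashtag (sketch; arguments assumed)
tweets <- search_tweets(
  "#WhenTrumpIsOutOfOffice",
  n = 10000,
  include_rts = FALSE,
  token = twitter_token
)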
Earnings calls are conference calls between a company’s chief executives and the public. These discussion sessions give the public a glimpse into what is happening inside the company and across the industry.
On these calls, investors tend to read the executives’ language as a signal of the company’s future performance. If you have ever listened to an earnings call, you may have noticed that executives are cautious with the words they use.
Natural Language Processing (NLP) has been gaining popularity in recent years. The intersection of technology and linguistics gives us a new way to look past the hype of a social media post, a review, or, in our case, a typical earnings call transcript. …
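To make this concrete, here is a minimal sketch that scores a transcript with the finance-oriented Loughran-McDonald lexicon via tidytext; transcript_text is a hypothetical character vector holding the call text, and the lexicon itself is fetched through the textdata package:

library(dplyr)
library(tidytext)

# Hypothetical transcript held in a one-column data frame
calls <- tibble::tibble(text = transcript_text)

calls %>%
  unnest_tokens(word, text) %>%  # split the transcript into words
  inner_join(get_sentiments("loughran"), by = "word") %>%  # finance-specific labels
  count(sentiment, sort = TRUE)  # counts of negative, uncertainty, litigious words, etc.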
Keyword extraction is one of the most popular text mining techniques in the Natural Language Processing (NLP) field. The idea behind keyword extraction is to automatically capture the important words in a text. The technique is very effective when we want to gain insights from a big chunk of text data quickly.
In this article, I will apply keyword extraction techniques to the shareholder letters penned by Warren Buffett between 1977 and 2019. Many keyword extraction techniques are available, but we will focus on three: frequency analysis, RAKE, and POS tagging on the letter texts.
Warren Buffett, aka the Oracle of Omaha, is famous for writing insightful annual letters that are widely anticipated by Berkshire Hathaway shareholders and the investment community. In this article, we aim to extract interesting insights from the letters without any heavy reading. …
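As a taste of the simplest of the three techniques, here is a sketch of frequency analysis with tidytext; the letters_df data frame is a hypothetical stand-in for the scraped letter texts:

library(dplyr)
library(tidytext)

# Hypothetical corpus: one row per annual letter
letters_df <- tibble::tibble(year = 1977:2019, text = "...")

# Frequency analysis: count words after removing stop words
letters_df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE) %>%
  head(20)  # top candidate keywords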
Joker is a film about one of the most iconic supervillains. It shows us the journey of a wronged man, Arthur, from a sad child to a murderous man full of agony, who would later be known for his freakish smile.
In this article, we seek to understand Arthur’s complicated personality by answering the following questions using text mining techniques in R: …
The movie script can be obtained here. In this article, I extracted lines spoken by Arthur for analysis.
The R libraries used in this article…
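For context, filtering one character's dialogue out of a parsed script might look like the sketch below; the script data frame with speaker and line columns is a hypothetical stand-in for the parsed screenplay:

library(dplyr)

# Hypothetical parsed script: one row per spoken line
script <- tibble::tibble(
  speaker = c("ARTHUR", "SOCIAL WORKER", "ARTHUR"),
  line = c("...", "...", "...")
)

# Keep only Arthur's dialogue for the analysis
arthur_lines <- script %>% filter(speaker == "ARTHUR")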
Singapore is easily one of the top travel destinations in Asia. Even though the city is smaller than NYC, it has just about everything you need or want to experience: a vast selection of food, street (and luxury) shopping, a green city that embraces biodiversity, you name it!
In this article, we will use R to visualize spatial data on top of the map of Singapore.
First, we need to get a map background using ggmap’s get_map function.
# Load the R packages
library(ggmap)

# Set your API key
ggmap::register_google(key = "your google api key")

# Get the Singapore map as the background
map <- get_map("singapore", maptype = "roadmap", zoom = 11, source = "google", color = …
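Once the background map is ready, spatial points can be layered on top of it. The sketch below assumes a hypothetical places data frame with lon and lat columns; ggmap draws the background and geom_point adds the markers:

# Hypothetical points of interest to overlay (lon/lat in decimal degrees)
places <- data.frame(lon = c(103.85, 103.82), lat = c(1.29, 1.35))

# Draw the background map and layer the points on top
ggmap(map) +
  geom_point(data = places, aes(x = lon, y = lat), color = "red", size = 2)

Note that registering a Google API key is required before get_map can fetch tiles with source = "google".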
As more countries declare nationwide shutdowns, most people are asked to stay at home and quarantine. I wanted to know how people are spending their time and how they are feeling during this “lockdown” period, so I analyzed some tweets in this article, hoping the data would give me some insights.
For the dataset, I extracted 20,000 tweets with the “#quarantine” and “#stayhome” hashtags from Twitter using the twitteR library.
After importing the data into R, we need to pre-process the tweets and tokenize them into words (tokens) for analysis:
# Assumes the dplyr, stringr, and textclean packages are loaded
tweet_words <- tweets %>%
  mutate(created_date = as.POSIXct(created, format = "%m/%d/%Y %H")) %>%  # parse tweet timestamps
  mutate(text = replace_non_ascii(text, replacement = "", remove.nonconverted = TRUE)) %>%  # drop non-ASCII characters
  mutate(text = str_replace_all(text, regex("@\\w+"), "")) %>%  # remove @mentions
  mutate(text = str_replace_all(text, regex("[[:punct:]]"), "")) %>%  # strip punctuation
  mutate(text = str_replace_all(text, regex("http\\w+"), "")) %>%  # remove URL remnants (punctuation already stripped)
The recent Coronavirus/COVID-19 outbreak has been declared a global emergency. As we start practicing social distancing and working from home to control the spread of the virus, I decided to use my spare time to look into what people are saying about the pandemic online.
For this article, 15,000 tweets with #Coronavirus and #COVID19 posted between January 30 and March 15, 2020, were extracted for analysis.
There are many R packages that let you extract tweets into usable data types for analysis. Before you use one in R, make sure you have a Twitter API account that allows you to extract the tweets. …
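As one concrete option, a sketch of the extraction step with the twitteR package, using placeholder credential variables and argument values that are my assumptions, might look like this:

library(twitteR)

# Authenticate with your Twitter API credentials (placeholder variables)
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

# Search tweets for the hashtag over the study window and convert to a data frame
covid_tweets <- searchTwitter("#COVID19", n = 15000,
                              since = "2020-01-30", until = "2020-03-15")
covid_df <- twListToDF(covid_tweets)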