Alessandro N. Vargas
Twitter containing Covid-related posts
Studying people's sentiment towards the Covid-19 pandemic has attracted attention because the pandemic has had striking effects on people's emotional state. That state has been shaped not only by the deadly consequences of contracting the coronavirus but also by the confinement imposed by stay-at-home orders. To make matters worse, many individuals face unemployment, a hard situation to deal with.
Studying the public conversation around Covid-19 allows us to infer how people's wellbeing has been affected during the pandemic.
This project aims to monitor the Covid-related Twitter posts and extract from them the sentiment and emotions people express. For instance, a person can express his or her opinion about Covid-19 and its consequences in a positive, neutral, or negative mood. How to assess and categorize this mood is a challenge not only from the computational viewpoint but also from the cognitive viewpoint. The same post can read as negative to some individuals and as positive to others, so classifying it is a challenge.
This project deals with Natural Language Processing (NLP), an important research topic. NLP includes automatic text parsing and understanding (e.g., sentiment analysis), speech recognition, machine translation between human languages, text generation, text summarization, and question answering. Here, NLP is associated with sentiment analysis, since we wish to categorize people's sentiment from Twitter posts. The source code that implements this project was written in Python with NLTK (Natural Language Toolkit). NLTK provides users with a highly configurable environment for different types of natural language analysis and classification techniques.
Sentiment analysis is a subtopic of NLP. Given a piece of text, the researcher wishes to determine whether it conveys positive, negative, or neutral sentiment. Sentiment analysis usually relies on one of two approaches: the lexicon-based approach and the machine learning approach.
The lexicon-based approach involves identifying the sentiment (or opinion) of words or phrases. The researcher looks up each word in a table that associates words with a sentiment polarity. For instance, 'happy' is a strongly positive word, whereas 'hate' is a strongly negative word, so 'happy' and 'hate' are associated with the scalar values +1 and -1, respectively. To extract the sentiment of a whole phrase, the researcher sums the sentiment values of its words.
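The summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the toy lexicon, its values other than +1 for 'happy' and -1 for 'hate', and the function names `phrase_sentiment` and `polarity_label` are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicon; a real lexicon holds thousands of scored words.
LEXICON = {"happy": 1.0, "hate": -1.0, "good": 0.5, "bad": -0.5}


def phrase_sentiment(phrase, lexicon=LEXICON):
    """Sum the sentiment values of the words in a phrase.

    Words missing from the lexicon contribute 0 to the total.
    """
    words = re.findall(r"[a-z']+", phrase.lower())
    return sum(lexicon.get(word, 0.0) for word in words)


def polarity_label(score):
    """Map a numeric score to a positive, negative, or neutral label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["I hate being stuck at home", "Happy to see my family again"]:
        score = phrase_sentiment(post)
        print(post, "->", score, polarity_label(score))
```

A plain sum ignores negation ("not happy") and context, which is one reason the same post can be scored differently from how a human reader perceives it.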
The machine learning approach involves either supervised or unsupervised learning. In both cases an algorithm learns from input data. In supervised learning, the input data has been labeled under human supervision (for example, posts tagged by hand as positive, neutral, or negative), and the algorithm learns to reproduce those labels. In unsupervised learning, the input data carries no labels, and the algorithm finds structure in it and adjusts itself with no human intervention.
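To make the supervised case concrete, here is a minimal sketch that trains a small naive Bayes classifier on hand-labeled posts. It is only an illustration under stated assumptions: the training posts are invented, the functions `train_naive_bayes` and `classify` are hypothetical names, and the classifier is written from scratch with the standard library rather than taken from the project's code.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_posts):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of posts
    for text, label in labeled_posts:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_posts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_posts)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented, hand-labeled examples standing in for human supervision.
TRAINING_POSTS = [
    ("grateful for the nurses and doctors", "positive"),
    ("vaccine news makes me hopeful", "positive"),
    ("lost my job because of lockdown", "negative"),
    ("so tired of being stuck at home", "negative"),
]

model = train_naive_bayes(TRAINING_POSTS)

if __name__ == "__main__":
    print(classify("hopeful about the vaccine", model))
    print(classify("stuck at home and lost my job", model))
```

The human contribution is entirely in the labels of `TRAINING_POSTS`; an unsupervised method would receive the same texts without the second element of each pair.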
The Covid-related Twitter posts are available in free databases, for instance here. The user of such a database must be proficient in Python and its associated packages. For instance, the user must "hydrate" the data before using it. "Hydrate" is a technical term: the database stores only Twitter IDs, so the user must download the full content of each post directly from the Twitter server, one ID at a time or in batches, which is a tedious procedure.
Recent progress on this project is available as a GitHub repository (click here). Another important Covid resource is COVID-Twitter-BERT.
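The hydration loop can be sketched as follows. This is a hedged outline only: `fetch_batch` is a hypothetical callable that stands in for whatever Twitter client the user chooses, the batch size of 100 is an assumption about the lookup limit, and `fake_fetch` merely simulates a server so that the example runs offline.

```python
def chunked(ids, size):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydrate(tweet_ids, fetch_batch, batch_size=100):
    """Turn a list of tweet IDs into full post records.

    `fetch_batch` is a hypothetical function supplied by the caller: it
    takes a list of IDs and returns a list of dicts for the posts that
    the server still makes available.
    """
    hydrated = []
    for batch in chunked(list(tweet_ids), batch_size):
        hydrated.extend(fetch_batch(batch))
    return hydrated


def fake_fetch(batch):
    """Offline stand-in for a real API call (assumption for this demo)."""
    return [{"id": tweet_id, "text": f"post {tweet_id}"} for tweet_id in batch]


if __name__ == "__main__":
    posts = hydrate(range(1, 251), fake_fetch)
    print(len(posts), posts[0])
```

Passing the fetch function as a parameter keeps the batching logic independent of any particular client library, so the same loop can be tested offline and then pointed at a real service.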