Dissecting Scientific Papers with NLP

GitHub repo here.

"Language is the road map of a culture. It tells you where its people come from and where they are going." - Rita Mae Brown

Since starting my PhD in 2019, I’ve used text analysis to help me with various aspects of my research, including gearing up for my comprehensive exam. During this period, I delved into over a hundred research papers centered on two pivotal themes: "Novelty Reception," which explores how new ideas gain acceptance, and "Network and Gender," which examines how social networks impact the careers of men and women differently. These two streams of research are foundational to my work on the systemic barriers women face in creative fields.

I believe Rita Mae Brown's insight applies to scientific research as well: the language employed in scientific papers not only reflects current understanding but also signals the directions a field might take. Here, I decode this language using NLP.

Phases of analysis

In this GitHub repository, you'll find a series of notebooks that employ different NLP techniques to dissect and understand the thematic structures embedded within these research papers. You can also find them here:

The dataset analyzed is available via a Google Sheets link provided in the first notebook. Thus, anyone can replicate the study by simply running the provided code. To save you time and computational resources, I've included the models trained in the second notebook as well.
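For readers who want to pull the data outside the notebooks, here is a minimal sketch of one common way to read a publicly shared Google Sheet into pandas through its CSV export URL. It is an illustration, not necessarily how the first notebook loads the data: the function names `sheet_csv_url` and `load_papers` are hypothetical, and the sheet ID has to be taken from the link in the first notebook. It also assumes the sheet is shared so that anyone with the link can view it.

```python
from urllib.parse import urlencode


def sheet_csv_url(sheet_id: str, gid: int = 0) -> str:
    """Build the CSV export URL for a publicly shared Google Sheet.

    sheet_id is the long identifier in the sheet's link; gid selects the tab
    (0 is the first tab). Both must come from the real link in the notebook.
    """
    query = urlencode({"format": "csv", "gid": gid})
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?{query}"


def load_papers(sheet_id: str, gid: int = 0):
    """Read the sheet into a pandas DataFrame (hypothetical helper)."""
    # Imported here so the URL helper above works even without pandas installed.
    import pandas as pd

    return pd.read_csv(sheet_csv_url(sheet_id, gid))
```

Calling `load_papers` with the sheet ID from the first notebook's link should return one row per sheet row, provided the sheet is publicly viewable.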