Leveraging Social Networks for Career Longevity using Big Data Analytics

GitHub repo here.

"No man is an island, entire of itself; every man is a piece of the continent, a part of the main."—John Donne

I love John Donne's beautiful meditation on human interconnectedness (the era's sexist pronouns aside). It resonates with me, now more than ever as I explore how networks shape career trajectories in my PhD dissertation. Using IMDb data spanning from 2000 to 2023, I trace how collaboration networks change over time and examine how these changes influence people's career longevity and productivity. I specifically focus on the different impacts networks have on men and women, offering strategies for building more beneficial and equitable professional networks.

Navigating the complexity of large-scale networks

Imagine a network comprising about 100,000 individuals. In any given year, this network could contain nearly 5 billion potential connections, which adds up to roughly 100 billion across two decades. This astounding complexity arises from two inherent characteristics of networks. First is rapid growth: every newcomer can in theory connect with all existing members, so the number of potential connections grows with the square of the network's size. Second is constant flux: like the ebb and flow of the ocean, networks are never static; old ties dissipate while new ties form continually.
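The arithmetic behind these figures can be checked in a few lines. The sketch below assumes undirected ties, so n people have n(n-1)/2 possible pairs, and it assumes one snapshot per year over 20 years to get the two-decade total; the function name and the snapshot count are illustrative choices, not taken from the project code.

```python
def potential_ties(n: int) -> int:
    """Number of distinct undirected pairs among n people: n * (n - 1) / 2."""
    return n * (n - 1) // 2


network_size = 100_000  # illustrative network size used in the text
snapshots = 20          # assumption: one snapshot per year for two decades

per_snapshot = potential_ties(network_size)
total = per_snapshot * snapshots

print(f"{per_snapshot:,} potential ties in one snapshot")
print(f"{total:,} potential ties across {snapshots} snapshots")

# A newcomer joining a network of n people adds exactly n potential ties,
# which is why the total grows with the square of the network's size.
added_by_newcomer = potential_ties(network_size + 1) - per_snapshot
print(f"{added_by_newcomer:,} potential ties added by one newcomer")
```

These are upper bounds on possible ties, not counts of ties that actually form; real collaboration networks are far sparser.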

To capture these shifts, I've constructed and analyzed 21 sequential network graphs. Each graph spans a three-year window that advances one year at a time, from 2000-2002 through 2020-2022. This method allows me to track the evolution of social capital and its impact on career trajectories up to the year 2023. Indeed, in our interconnected world, no one is an island.
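As a rough illustration of the windowing, the sketch below generates the 21 three-year windows and collects the ties that fall inside each one. It assumes the windows overlap and advance by one year (the reading that yields 21 graphs between 2000-2002 and 2020-2022) and that a tie links two people credited on the same title. The record layout, the toy data, and the function names are hypothetical and are not the repository's actual code.

```python
from itertools import combinations


def rolling_windows(first_year, last_year, width=3):
    """Return (start, end) year pairs for windows that advance one year at a time."""
    return [(start, start + width - 1)
            for start in range(first_year, last_year - width + 2)]


def window_edges(credits, window):
    """Collect undirected co-credit ties for titles released inside the window."""
    start, end = window
    cast_by_title = {}
    for person, title, year in credits:
        if start <= year <= end:
            cast_by_title.setdefault(title, set()).add(person)
    edges = set()
    for cast in cast_by_title.values():
        # Sorting gives each pair one canonical orientation.
        edges.update(combinations(sorted(cast), 2))
    return edges


# Hypothetical toy records: (person, title, release year).
credits = [
    ("A", "T1", 2000), ("B", "T1", 2000),
    ("B", "T2", 2003), ("C", "T2", 2003),
]

windows = rolling_windows(2000, 2022)
edges_per_window = {w: window_edges(credits, w) for w in windows}

print(len(windows), windows[0], windows[-1])
for w in windows[:5]:
    print(w, sorted(edges_per_window[w]))
```

Because each window shares two years with its neighbor, a tie persists across up to three consecutive graphs before it drops out, which smooths the year-to-year churn described above.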

Phases of analysis

The Python code for all the analyses is provided in the project's GitHub repository. You can also see it through these direct links:

This work uses publicly available data from IMDb and the U.S. Social Security Administration. The analyses can be recreated in full by following the provided Python code. Keep in mind, though, that results may vary depending on when you access the data, which is updated regularly.