Inspiration

When confronted with Problem Statement 4 (Structuring NLx Job Posting Data), we realized a fundamental truth: a list of extracted keywords is just a sterile dataset, but a map of how those skills interact is true economic intelligence. If a displaced worker in Colorado needs to pivot careers, policymakers don’t just need to know what skills exist—they need to know the mathematical paths between them. We were inspired to move beyond simply dumping text into a flat spreadsheet. We wanted to uncover the topological "DNA" of the workforce and visualize the hidden structures of career mobility, workforce inequality, and emerging industries.

What it does

The Workforce Graph Intelligence (WGI) platform is a fully automated, end-to-end pipeline that transforms raw, unstructured National Labor Exchange (NLx) strings into a structured, relational graph database. Our application features:

  1. The Skill Ecosystems Map: A topography of the labor market that automatically clusters thousands of skills into colored industry domains (like IT, Healthcare, or Finance) without any human labeling, using Louvain Community Detection.
  2. Job Description Explorer: A transparent tool where researchers can search for any skill and view the raw job description side-by-side with our model's extracted taxonomy, ensuring complete methodological transparency.
  3. Market Overview Dashboard: A macro-level tracker highlighting 10,482 processed jobs, 165,664 extracted skills, and custom metrics like the "Ghost Job Rate" to detect artificial labor inflation.

How we built it

We engineered a robust data-engineering pipeline in Python:

  1. Graph Construction & Pruning: We transformed job postings into a massive bipartite graph (Jobs <-> Skills) and collapsed it into a pruned skill-to-skill co-occurrence matrix, applying a minimum edge-weight threshold to eliminate noise.
  2. Semantic NLP & Graph Message Passing: Rather than relying on basic Node2Vec random walks, we took a hybrid NLP+graph approach. We used Sentence-BERT (all-MiniLM-L6-v2) to encode the literal semantic description of every skill into a 384-dimensional vector, then applied a symmetrically normalized adjacency matrix to propagate embeddings across two hops. This ensured our model understood that "Python" and "Data Analysis" are related both semantically and structurally.
  3. Centrality & Gini Mathematics: We ran Betweenness and Eigenvector centrality algorithms to identify "Gatekeeper Skills"—nodes that act as vital bridges connecting otherwise disparate industries. To measure workforce inequality and detect "locked" career paths, we calculated the Gini coefficient on the degree distribution of each cluster.
  4. The Frontend API: We serialized our analytics and deployed a lightweight Flask backend whose APIs are served through a dynamic, interactive dashboard on Vercel.
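The graph-construction-and-pruning step can be sketched in plain Python. The toy postings and the minimum edge weight of 2 below are illustrative assumptions, not our production data or threshold:

```python
from itertools import combinations
from collections import Counter

def build_cooccurrence(postings, min_weight=2):
    """Collapse a bipartite job -> skills mapping into a skill-skill
    co-occurrence edge list, dropping edges below min_weight."""
    edges = Counter()
    for skills in postings.values():
        # Every pair of skills appearing in the same posting co-occurs once.
        for a, b in combinations(sorted(set(skills)), 2):
            edges[(a, b)] += 1
    # Prune: keep only edges at or above the minimum weight threshold.
    return {pair: w for pair, w in edges.items() if w >= min_weight}

# Toy postings (illustrative only).
postings = {
    "job1": ["Python", "SQL", "Data Analysis"],
    "job2": ["Python", "SQL"],
    "job3": ["Python", "Data Analysis"],
}
graph = build_cooccurrence(postings, min_weight=2)
# Only pairs seen at least twice survive the pruning.
```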
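The message-passing step smooths each skill's embedding with its neighbors' via \(\hat{A} = D^{-1/2}(A + I)D^{-1/2}\), applied once per hop. This minimal sketch uses random vectors as stand-ins for the Sentence-BERT embeddings, and the added self-loops (the \(+\,I\) term, so a skill retains its own semantics) are a common convention we assume here:

```python
import numpy as np

def propagate(adj, emb, hops=2):
    """Smooth node embeddings over the graph using a symmetrically
    normalized adjacency matrix, applied `hops` times."""
    a_hat = adj + np.eye(adj.shape[0])          # self-loops (assumed convention)
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt      # D^-1/2 (A + I) D^-1/2
    for _ in range(hops):
        emb = norm @ emb
    return emb

# Stand-ins for 384-dim Sentence-BERT vectors over a 4-skill toy graph.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
emb = rng.normal(size=(4, 384))
smoothed = propagate(adj, emb, hops=2)
```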
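The unsupervised clustering behind the Skill Ecosystems Map can be sketched with networkx's built-in Louvain routine (a stand-in for whatever Louvain implementation a pipeline actually uses); the two toy skill groups below are illustrative:

```python
import networkx as nx

# Two tightly knit skill groups (toy stand-ins for IT and Healthcare),
# joined by a single weak bridging edge.
G = nx.Graph()
G.add_edges_from([
    ("Python", "SQL"), ("Python", "Git"), ("SQL", "Git"),
    ("Nursing", "Phlebotomy"), ("Nursing", "Triage"), ("Phlebotomy", "Triage"),
    ("SQL", "Triage"),  # bridge between the two domains
])

# Louvain maximizes modularity; no human labels are involved.
communities = nx.community.louvain_communities(G, seed=42)
```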
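The Gini calculation on a cluster's degree distribution uses the standard formula: for degrees sorted ascending \(x_1 \le \dots \le x_n\), \(G = \frac{\sum_{i=1}^{n}(2i - n - 1)\,x_i}{n \sum_{i=1}^{n} x_i}\). A minimal sketch (the toy degree sequence is illustrative):

```python
def gini(values):
    """Gini coefficient of a degree distribution:
    0 = perfectly equal connectivity; near 1 = a few hub skills dominate."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) x_i / (n * sum x), with 1-based sorted index i.
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, 1)) / (n * total)

equal_cluster = [5, 5, 5, 5]       # every skill equally connected
locked_cluster = [1, 1, 1, 1, 16]  # one dominant "gatekeeper" skill
```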
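The Flask backend serves precomputed analytics as JSON. A minimal sketch; the `/api/stats` route name is an assumption, while the figures come from the dashboard described above:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Analytics are serialized ahead of time; these are the dashboard's headline
# figures (the route name /api/stats is illustrative, not our actual API).
STATS = {"jobs": 10482, "skills": 165664}

@app.get("/api/stats")
def stats():
    return jsonify(STATS)
```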

Challenges we ran into

The primary challenge was managing the sheer computational weight of an \(N \times N\) graph matrix built from over 165,000 raw skill extractions. Visualizing every micro-connection produced massive "hairballs" of indistinguishable noise. We overcame this by writing rigorous spectral-smoothing scripts to drop weak edges and by selectively rendering only the top 120 highest-frequency "Gatekeeper" nodes in the user interface, keeping the data narrative clear and performant.
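Selecting the highest-frequency nodes for rendering can be sketched as a weighted-degree ranking over the pruned edge list (the toy edges and `k=2` are illustrative; the production cut-off was 120):

```python
def top_nodes(edge_weights, k=120):
    """Rank skills by weighted degree and keep only the top-k for rendering."""
    degree = {}
    for (a, b), w in edge_weights.items():
        degree[a] = degree.get(a, 0) + w
        degree[b] = degree.get(b, 0) + w
    ranked = sorted(degree, key=degree.get, reverse=True)
    return set(ranked[:k])

# Toy co-occurrence edges (illustrative only).
edges = {("Python", "SQL"): 9, ("Python", "Excel"): 4, ("Excel", "Typing"): 1}
keep = top_nodes(edges, k=2)
```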

Accomplishments that we're proud of

We are incredibly proud that we didn't just build a dataset; we built a completely automated mathematical model of the economy. Seeing the Louvain algorithm perfectly cluster Healthcare skills together, completely independent of human labeling, proved that our NLP models were actually "understanding" the structural relationships between jobs. Furthermore, building a production-ready dashboard that renders this massive graph natively in the browser without crashing is a huge technical win.

What we learned

We learned that text extraction is only the first 10% of natural language processing. The real power of unstructured data lies in modeling the geometry of the words. By combining NLP sentence transformers with pure network science, we unlocked macro-economic trends that a standard regex script could never detect.

What's next for Workforce Graph Intelligence

We plan to scale our pipeline to ingest the entire national footprint of the NLx feed, moving beyond Colorado. Additionally, we want to integrate a frontend "Career Simulator" where a user can input their resume and our graph engine will run Dijkstra's shortest-path algorithm to recommend the precise 3 skills they need to learn to cross the "Gatekeeper" threshold into a higher-paying ecosystem.
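The planned path-finding could be sketched with a standard heap-based Dijkstra over the skill graph. The toy graph and the idea of edge weights encoding "learning effort" are assumptions for illustration:

```python
import heapq

def dijkstra_path(graph, start, goal):
    """Cheapest path between two skills; graph maps skill -> {neighbor: cost}."""
    pq = [(0, start, [start])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in graph.get(node, {}).items():
            if nbr not in seen:
                heapq.heappush(pq, (cost + w, nbr, path + [nbr]))
    return None

# Toy skill graph; weights stand in for "learning effort" (an assumption).
skill_graph = {
    "Excel": {"SQL": 1, "Python": 4},
    "SQL": {"Python": 1, "Excel": 1},
    "Python": {"Machine Learning": 2, "SQL": 1},
    "Machine Learning": {},
}
path = dijkstra_path(skill_graph, "Excel", "Machine Learning")
```

The skills on the returned path (minus the starting point) would become the upskilling recommendation shown to the user.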
