Open-Source Repository Research
This project studies repository visibility, lifecycle, documentation, and community health using public GitHub metadata. It combines a normalized SQL schema, cleaned analytics data, an interactive web dashboard, and a companion Power BI report.
Overview
The project evaluates how stars and forks are distributed across repositories and whether a small number of projects dominate attention.
It studies active lifespan, commit activity, and repository longevity to understand how maintenance patterns vary.
It compares README coverage and community health metrics to see how well repositories are documented and structured.
Research Questions
Are stars and forks spread broadly across repositories, or concentrated in a small set of highly visible projects and owners?
Do most repositories remain early stage, or do they show sustained activity and stronger signs of long-term maintenance?
Does README coverage remain high even among lower-impact repositories, and how does that compare with community health measures?
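The first question, whether attention is concentrated, can be made concrete with two common measures: the share of all stars held by the top slice of repositories, and a Gini coefficient. The following is a minimal sketch only, not the project's actual method. The function names and the sample star counts are hypothetical and are not taken from the project data.

```python
from typing import Sequence


def top_share(values: Sequence[float], fraction: float = 0.01) -> float:
    """Share of the total held by the top `fraction` of items (at least one item)."""
    total = sum(values)
    if not values or total == 0:
        return 0.0
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / total


def gini(values: Sequence[float]) -> float:
    """Gini coefficient: 0 is perfectly even; values near 1 are highly concentrated."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    # Formula for ascending data with 1-based rank i:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Made-up star counts for illustration only.
    stars = [1200, 40, 12, 8, 5, 3, 2, 1, 0, 0]
    print(f"top 10% share of stars: {top_share(stars, 0.10):.2f}")
    print(f"gini coefficient: {gini(stars):.2f}")
```

The same two functions could be applied to fork counts, or to totals grouped by owner, to check whether concentration holds at the owner level as well.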
Data
The main analysis uses the full repository metadata file, which covers 14,644 repositories and provides stars, forks, commits, README status, and lifecycle fields.
A separate GitHub users dataset supports the relational schema and helps illustrate joins, normalization, and cross-source entity matching.
Only about 200 repositories (roughly 1.4% of the 14,644) match across the two sources, so the public-facing analysis emphasizes repository-level patterns instead of full profile-level integration.
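A match rate like the one above can be checked with a small cross-source comparison. The sketch below assumes, purely for illustration, that repositories are linked to users by owner login and that logins are compared case-insensitively. The actual join key, the normalization rule, and the sample values are not specified in the project and are hypothetical here.

```python
from typing import Dict, Iterable, Union


def normalize_login(login: str) -> str:
    """Normalize a login for matching (assumed rule: trim whitespace, lowercase)."""
    return login.strip().lower()


def match_report(
    repo_owners: Iterable[str], user_logins: Iterable[str]
) -> Dict[str, Union[int, float]]:
    """Count how many repositories have an owner present in the users dataset."""
    users = {normalize_login(u) for u in user_logins}
    owners = list(repo_owners)
    matched = sum(1 for o in owners if normalize_login(o) in users)
    rate = matched / len(owners) if owners else 0.0
    return {"repositories": len(owners), "matched": matched, "match_rate": rate}


if __name__ == "__main__":
    # Hypothetical sample values, not drawn from either dataset.
    owners = ["Alice", "bob ", "carol", "dave"]
    logins = ["alice", "BOB", "erin"]
    print(match_report(owners, logins))
```

Reporting the match rate alongside the raw count makes it easier to judge whether a joined analysis would be representative of the full repository file.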
Why This Project Matters
The project moves beyond counting stars and asks how documentation, sustained activity, and reuse signals interact with visibility.
The analytical framing emphasizes averages, adoption rates, concentration, lifecycle, and impact-tier share rather than raw aggregates alone.
The final output is positioned as a decision-support artifact for understanding repository quality, adoption, and long-term maintenance signals.
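As an illustration of the framing described above, impact-tier share and a per-tier README adoption rate could be computed along the following lines. This is a sketch under stated assumptions: the tier names, the star thresholds, and the `stars` and `has_readme` field names are hypothetical, since the project's actual tier definitions are not given here.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical thresholds (minimum stars, tier name), checked from highest to lowest.
TIERS = [(1000, "high"), (100, "medium"), (0, "low")]


def impact_tier(stars: int) -> str:
    """Assign a repository to the first tier whose minimum it meets."""
    for minimum, name in TIERS:
        if stars >= minimum:
            return name
    return "low"


def tier_summary(repos: List[dict]) -> Dict[str, Dict[str, float]]:
    """Per tier: share of all repositories, and fraction of that tier with a README."""
    counts: Dict[str, int] = defaultdict(int)
    with_readme: Dict[str, int] = defaultdict(int)
    for repo in repos:
        tier = impact_tier(repo["stars"])
        counts[tier] += 1
        if repo["has_readme"]:
            with_readme[tier] += 1
    total = len(repos)
    return {
        tier: {
            "share": counts[tier] / total,
            "readme_rate": with_readme[tier] / counts[tier],
        }
        for tier in counts
    }


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = [
        {"stars": 5000, "has_readme": True},
        {"stars": 150, "has_readme": True},
        {"stars": 50, "has_readme": True},
        {"stars": 3, "has_readme": False},
    ]
    for tier, stats in tier_summary(sample).items():
        print(tier, stats)
```

Expressing each tier as a share and a rate, rather than a raw count, keeps tiers of very different sizes comparable.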
Site Map
Interactive charts, filters, KPI cards, and repository spotlight views built from the cleaned full dataset.
Schema design, source integration strategy, SQL view definitions, and methodology notes from the project.
Main conclusions, data-quality challenges, and future work drawn directly from the final presentation.
SQL scripts, prepared BI datasets, published Power BI link, and supporting documents for presentation or review.