Analysis of Diabetes Health Indicators using SQL
In this project, I'm analyzing the Diabetes Health Indicators dataset from Kaggle to understand which factors are most strongly associated with diabetes risk across different demographics. Using SQL aggregation and conditional logic, I aim to uncover patterns that could inform public health policy and healthcare resource allocation. The dataset contains over 250,000 survey responses from the CDC's Behavioral Risk Factor Surveillance System, providing a robust foundation for meaningful insights.
The original dataset can be found here: Kaggle - Diabetes Health Indicators Dataset
You can learn more about the original data source here: CDC Behavioral Risk Factor Surveillance System
The Healthcare Context
Diabetes affects over 37 million Americans, with an estimated annual cost of $327 billion in medical care and lost productivity. For healthcare organizations, understanding the demographic and lifestyle patterns associated with diabetes isn't just about patient care; it's about strategic resource planning, preventive care programs, and cost management.
Hospitals and health systems need to identify high-risk populations to:
Optimize screening programs by targeting resources where they'll have maximum impact
Design prevention interventions based on modifiable risk factors
Predict healthcare utilization to ensure adequate staffing and capacity
Address health equity by identifying disparities in care access and outcomes
Early identification of prediabetic populations represents a critical intervention opportunity, as lifestyle modifications can delay or prevent progression to type 2 diabetes.
Database Structure
I'm using Oracle SQL functions to analyze the DIABETES_HEALTH_INDICATORS table, which contains comprehensive health and demographic data, including diabetes status, BMI, physical activity levels, age groups, gender, healthcare access, and lifestyle factors like alcohol consumption.
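To make the structure concrete, here is a minimal sketch of how the table might be defined. The column names mirror the headers of the Kaggle CSV (for example `Diabetes_012`, `PhysActivity`, `HvyAlcoholConsump`), and the data types are assumptions rather than the actual schema. Only the columns used in this analysis are shown.

```sql
-- Hypothetical DDL: column names follow the Kaggle CSV headers; types are assumed.
CREATE TABLE diabetes_health_indicators (
    diabetes_012       NUMBER(1) NOT NULL,  -- 0 = no diabetes, 1 = prediabetes, 2 = diabetes
    bmi                NUMBER(4,1),         -- body mass index
    physactivity       NUMBER(1),           -- 1 = physically active in past 30 days, 0 = not
    hvyalcoholconsump  NUMBER(1),           -- 1 = heavy alcohol consumption, 0 = not
    anyhealthcare      NUMBER(1),           -- 1 = has some form of health coverage, 0 = none
    sex                NUMBER(1),           -- 0 = female, 1 = male
    age                NUMBER(2)            -- ordered age bands: 1 = 18-24 ... 13 = 80 or older
);
```

Because the flags are stored as 0/1 numbers, averaging a flag (or a CASE expression that yields 0/1) gives a proportion directly, which is the pattern most of the queries in this write-up would rely on.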
Analysis 1: Diabetes Category Breakdown
The Foundation Query
First, let's understand the overall distribution of diabetes status in our population. This baseline analysis will inform all subsequent investigations.
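A query along these lines would produce that baseline. It is a sketch, not the exact query behind the results: it assumes the status column is named `diabetes_012` (as in the Kaggle CSV) with 0 = no diabetes, 1 = prediabetes, and 2 = diabetes.

```sql
-- Count respondents per diabetes category and express each as a share of the total.
-- Assumed column: diabetes_012 (0 = none, 1 = prediabetes, 2 = diabetes).
SELECT
    CASE diabetes_012
        WHEN 0 THEN 'No Diabetes'
        WHEN 1 THEN 'Prediabetes'
        WHEN 2 THEN 'Diabetes'
    END                                                    AS diabetes_category,
    COUNT(*)                                               AS respondents,
    -- SUM(COUNT(*)) OVER () is the grand total across all groups
    ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 2)       AS pct_of_total
FROM diabetes_health_indicators
GROUP BY diabetes_012
ORDER BY diabetes_012;
```

The analytic `SUM(...) OVER ()` avoids a second pass over the table just to get the denominator.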
This query sizes the critical intervention opportunity: the prediabetic group, whose members could benefit from targeted prevention programs before progressing to diabetes.
Analysis 2: BMI Patterns Across Diabetes Groups
Understanding the Weight-Diabetes Connection
Body Mass Index is a well-established risk factor for diabetes, but how does average BMI progress across our three categories?
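One way to answer this is to average BMI within each diabetes category. The sketch below assumes columns named `bmi` and `diabetes_012`, following the Kaggle CSV headers; the actual column names in the table may differ.

```sql
-- Average BMI per diabetes category.
-- Assumed columns: bmi, diabetes_012 (0 = none, 1 = prediabetes, 2 = diabetes).
SELECT
    CASE diabetes_012
        WHEN 0 THEN 'No Diabetes'
        WHEN 1 THEN 'Prediabetes'
        WHEN 2 THEN 'Diabetes'
    END                  AS diabetes_category,
    COUNT(*)             AS respondents,
    ROUND(AVG(bmi), 2)   AS avg_bmi
FROM diabetes_health_indicators
GROUP BY diabetes_012
ORDER BY diabetes_012;
```

Ordering by the numeric code rather than the label keeps the rows in clinical order (none, prediabetes, diabetes) so any progression is easy to read.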
The progression of BMI across categories provides clear evidence for weight management as a prevention strategy. This data supports the business case for employer wellness programs and preventive care investments.
Analysis 3: Physical Activity Impact
The Exercise Paradox
Does physical activity truly correlate with lower diabetes rates across our large population sample?
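A sketch of this comparison, assuming a 0/1 flag named `physactivity` (1 = physically active in the past 30 days) and the `diabetes_012` status column from the Kaggle CSV:

```sql
-- Diabetes and prediabetes prevalence among active vs. inactive respondents.
-- Assumed columns: physactivity (0/1), diabetes_012 (0/1/2).
SELECT
    CASE physactivity WHEN 1 THEN 'Active' WHEN 0 THEN 'Inactive' END   AS activity_level,
    COUNT(*)                                                            AS respondents,
    -- AVG over a 0/1 expression yields the proportion of rows where it is 1
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 1 THEN 1 ELSE 0 END), 2)   AS prediabetes_pct,
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 2 THEN 1 ELSE 0 END), 2)   AS diabetes_pct
FROM diabetes_health_indicators
GROUP BY physactivity
ORDER BY physactivity DESC;
```

Comparing percentages rather than raw counts matters here, since the active and inactive groups are unlikely to be the same size.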
This analysis quantifies the association between physical activity and diabetes prevalence, providing evidence for community-based exercise programs and workplace wellness initiatives.
Analysis 4: Gender Disparities
Uncovering the Gender Gap
Are there significant differences in diabetes prevalence between men and women that could inform targeted screening programs?
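The comparison could be sketched as follows. The column name `sex` and its coding (0 = female, 1 = male) are assumptions based on the Kaggle CSV, as is `diabetes_012`.

```sql
-- Case counts and diabetes prevalence by gender.
-- Assumed columns: sex (0 = female, 1 = male), diabetes_012 (0/1/2).
SELECT
    CASE sex WHEN 0 THEN 'Female' WHEN 1 THEN 'Male' END                AS gender,
    COUNT(*)                                                            AS respondents,
    SUM(CASE WHEN diabetes_012 = 1 THEN 1 ELSE 0 END)                   AS prediabetes_cases,
    SUM(CASE WHEN diabetes_012 = 2 THEN 1 ELSE 0 END)                   AS diabetes_cases,
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 2 THEN 1 ELSE 0 END), 2)   AS diabetes_pct
FROM diabetes_health_indicators
GROUP BY sex
ORDER BY sex;
```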
Gender-specific patterns could indicate hormonal factors, healthcare-seeking behaviors, or screening disparities that warrant further investigation and targeted interventions.
Analysis 5: Age Group Analysis
The Aging Effect
How does diabetes prevalence change across age groups, and where are the critical intervention windows?
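A sketch of an age-stratified query is shown below. It assumes the `age` column holds the survey's ordered age-band codes (1 = 18-24, then five-year bands up to 13 = 80 or older) and collapses them into wider groups. The band boundaries chosen here are illustrative, not necessarily the ones used in the original analysis.

```sql
-- Diabetes prevalence by broad age band.
-- Assumed columns: age (band codes 1-13), diabetes_012 (0/1/2).
WITH banded AS (
    SELECT
        CASE
            WHEN age BETWEEN 1  AND 3  THEN '18-34'
            WHEN age BETWEEN 4  AND 5  THEN '35-44'
            WHEN age BETWEEN 6  AND 7  THEN '45-54'
            WHEN age BETWEEN 8  AND 9  THEN '55-64'
            WHEN age BETWEEN 10 AND 13 THEN '65+'
        END AS age_band,
        diabetes_012
    FROM diabetes_health_indicators
)
SELECT
    age_band,
    COUNT(*)                                                            AS respondents,
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 1 THEN 1 ELSE 0 END), 2)   AS prediabetes_pct,
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 2 THEN 1 ELSE 0 END), 2)   AS diabetes_pct
FROM banded
GROUP BY age_band
ORDER BY age_band;
```

Computing the band in a `WITH` clause keeps the CASE expression in one place, so it does not have to be repeated in the `GROUP BY`.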
Age-stratified analysis reveals when diabetes risk accelerates, informing the timing of screening programs and preventive interventions for maximum cost-effectiveness.
Analysis 6: Alcohol Consumption Patterns
The Complex Relationship with Alcohol
Heavy alcohol consumption can affect diabetes risk through multiple pathways. Let's examine this relationship:
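As a sketch, assuming a 0/1 flag named `hvyalcoholconsump` (1 = heavy drinker) and the `diabetes_012` status column from the Kaggle CSV:

```sql
-- Diabetes prevalence among heavy drinkers vs. everyone else.
-- Assumed columns: hvyalcoholconsump (0/1), diabetes_012 (0/1/2).
SELECT
    CASE hvyalcoholconsump
        WHEN 1 THEN 'Heavy drinker'
        WHEN 0 THEN 'Not a heavy drinker'
    END                                                                 AS alcohol_group,
    COUNT(*)                                                            AS respondents,
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 1 THEN 1 ELSE 0 END), 2)   AS prediabetes_pct,
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 2 THEN 1 ELSE 0 END), 2)   AS diabetes_pct
FROM diabetes_health_indicators
GROUP BY hvyalcoholconsump
ORDER BY hvyalcoholconsump;
```

Including the respondent count alongside the percentages helps judge how much weight the smaller group's rate can bear.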
Understanding alcohol's role in diabetes risk helps healthcare providers address lifestyle counseling and identify populations needing substance abuse screening alongside diabetes care.
Analysis 7: Healthcare Access Analysis
The Access-Outcome Connection
Healthcare access is a critical social determinant of health. How does access to care correlate with diabetes outcomes?
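One possible form of this query is sketched below. It assumes access is captured by a 0/1 flag named `anyhealthcare` (1 = has some form of health coverage), as in the Kaggle CSV; a different access measure would need a different column.

```sql
-- Distribution of diabetes status within each healthcare-access group.
-- Assumed columns: anyhealthcare (0/1), diabetes_012 (0/1/2).
SELECT
    CASE anyhealthcare WHEN 1 THEN 'Has coverage' WHEN 0 THEN 'No coverage' END   AS access_group,
    COUNT(*)                                                                      AS respondents,
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 0 THEN 1 ELSE 0 END), 2)             AS no_diabetes_pct,
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 1 THEN 1 ELSE 0 END), 2)             AS prediabetes_pct,
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 2 THEN 1 ELSE 0 END), 2)             AS diabetes_pct
FROM diabetes_health_indicators
GROUP BY anyhealthcare
ORDER BY anyhealthcare DESC;
```

Since the survey records diagnosed diabetes, respondents without coverage may be under-diagnosed, which is worth keeping in mind when reading the two rows side by side.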
This analysis reveals the relationship between healthcare access and diabetes outcomes, providing evidence for policy discussions about healthcare expansion and early intervention programs.
Key Findings Summary
From this comprehensive SQL analysis, several critical insights emerge:
Population Distribution:
The majority of the population falls into the "No Diabetes" category
A significant prediabetic population represents intervention opportunities
Full diabetes cases show concerning demographic clustering
BMI Relationships:
Clear BMI progression across diabetes categories
Weight management emerges as a critical prevention strategy
Early intervention in overweight populations could prevent progression
Lifestyle Factor Impact:
Physical activity shows protective effects against diabetes
Age-related risk acceleration occurs at predictable intervals
Healthcare access correlates with better diabetes outcomes
Demographic Patterns:
Gender differences suggest the need for targeted screening approaches
Age-stratified risk reveals optimal intervention timing
Alcohol consumption patterns indicate the need for integrated care approaches
Healthcare Policy Implications:
Healthcare access disparities affect diabetes outcomes
Prevention programs could be more cost-effective than treatment
Multi-factor risk assessment enables precision public health approaches
Technical Implementation Notes
This analysis leveraged several advanced SQL techniques:
CASE WHEN statements for categorical analysis and risk stratification
GROUP BY with multiple dimensions for comprehensive demographic analysis
Aggregate functions (COUNT, AVG) for population-level insights
ORDER BY for logical result presentation
Single-table design that can be extended through JOIN operations with additional datasets
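The techniques listed above can be combined in a single query. The sketch below groups by two dimensions at once, gender and physical activity. The column names `sex`, `physactivity`, `bmi`, and `diabetes_012` are assumptions taken from the Kaggle CSV headers.

```sql
-- CASE WHEN labels + multi-dimension GROUP BY + COUNT/AVG + ORDER BY in one query.
-- Assumed columns: sex (0 = female, 1 = male), physactivity (0/1), bmi, diabetes_012 (0/1/2).
SELECT
    CASE sex WHEN 0 THEN 'Female' WHEN 1 THEN 'Male' END                AS gender,
    CASE physactivity WHEN 1 THEN 'Active' WHEN 0 THEN 'Inactive' END   AS activity_level,
    COUNT(*)                                                            AS respondents,
    ROUND(AVG(bmi), 2)                                                  AS avg_bmi,
    ROUND(100 * AVG(CASE WHEN diabetes_012 = 2 THEN 1 ELSE 0 END), 2)   AS diabetes_pct
FROM diabetes_health_indicators
GROUP BY sex, physactivity
ORDER BY sex, physactivity DESC;
```

Each output row is one gender and activity combination, so the activity gap can be compared within each gender rather than across the pooled population.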
Business Impact and Next Steps
This analysis provides healthcare organizations with:
Immediate Applications:
Risk stratification models for patient populations
Resource allocation guidance for prevention programs
Evidence base for policy advocacy and program funding
Baseline metrics for intervention effectiveness tracking
Strategic Implications:
Prevention programs targeting prediabetic populations could reduce long-term costs
Age and demographic-specific screening protocols would improve early detection
Lifestyle intervention programs have quantifiable impact potential
Healthcare access improvements could significantly affect population health outcomes
Future Analysis Opportunities:
Predictive modeling using machine learning techniques
Cost-benefit analysis of intervention programs
Geographic analysis with additional location data
Longitudinal analysis with time-series data
Conclusion
Using advanced SQL analysis on this comprehensive diabetes dataset, we've uncovered actionable insights that can inform healthcare strategy at multiple levels. The clear patterns in demographic risk factors, lifestyle influences, and healthcare access effects provide a data-driven foundation for improving diabetes prevention and care.
This analysis demonstrates how powerful SQL queries can transform raw health data into strategic intelligence, enabling healthcare organizations to make evidence-based decisions about resource allocation, program development, and patient care protocols.
The prediabetic population identified through this analysis represents the greatest opportunity for intervention. This finding could reshape how healthcare systems approach diabetes prevention and ultimately improve outcomes while controlling costs.
I hope you enjoyed reading about this project. If you'd like to see more of my work, please connect with me on LinkedIn.





















