# Load required packages (hint: you need tidycensus, tidyverse, and knitr)
library(tidyverse)
library(tidycensus)
library(scales)
library(RColorBrewer)
library(knitr)
# Set your Census API key
# Choose your state for analysis - assign it to a variable called my_state
my_state <- "LA"
Lab 1: Census Data Quality for Policy Decisions
Evaluating Data Reliability for Algorithmic Decision-Making
Assignment Overview
Scenario
You are a data analyst for the [Your State] Department of Human Services. The department is considering implementing an algorithmic system to identify communities that should receive priority for social service funding and outreach programs. Your supervisor has asked you to evaluate the quality and reliability of available census data to inform this decision.
Drawing on our Week 2 discussion of algorithmic bias, you need to assess not just what the data shows, but how reliable it is and what communities might be affected by data quality issues.
Learning Objectives
- Apply dplyr functions to real census data for policy analysis
- Evaluate data quality using margins of error
- Connect technical analysis to algorithmic decision-making
- Identify potential equity implications of data reliability issues
- Create professional documentation for policy stakeholders
Submission Instructions
Submit by posting your updated portfolio link on Canvas. Your assignment should be accessible at your-portfolio-url/labs/lab_1/
Make sure to update your _quarto.yml navigation to include this assignment under a “Labs” menu.
Part 1: Portfolio Integration
Create this assignment in your portfolio repository under a labs/lab_1/ folder structure. Update your navigation menu to include:
- text: Assignments
  menu:
    - href: labs/lab_1/your_file_name.qmd
      text: "Lab 1: Census Data Exploration"
If the text contains a special character such as a colon, wrap it in double quotes so that Quarto reads it as text.
Setup
State Selection: I have chosen Louisiana for this analysis because:
Louisiana’s rural parishes show the highest income estimate uncertainty, with low-confidence counties concentrated in areas with smaller, more dispersed populations. These unreliable income estimates pose a direct risk to algorithmic systems — parishes with high MOE could be systematically misclassified for funding priority, meaning communities with genuine need may be overlooked or incorrectly ranked simply due to data limitations rather than actual economic conditions.
Part 2: County-Level Resource Assessment
2.1 Data Retrieval
Your Task: Use get_acs() to retrieve county-level data for your chosen state.
Requirements:
- Geography: county level
- Variables: median household income (B19013_001) and total population (B01003_001)
- Year: 2022
- Survey: acs5
- Output format: wide
Hint: Remember to give your variables descriptive names using the variables = c(name = "code") syntax.
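As a small offline illustration of the naming hint (no API call involved): with output = "wide", tidycensus appends E to each estimate column and M to each margin-of-error column, so the names supplied in the variables vector determine the column names used in the rest of the analysis. The object names acs_vars and expected_cols are made up for this sketch.

```r
# Named vector in the form name = "code", as the hint describes
acs_vars <- c(med_hh_inc = "B19013_001", tot_pop = "B01003_001")

# Build the column names expected in wide output:
# each name followed by E (estimate) and M (margin of error)
expected_cols <- paste0(rep(names(acs_vars), each = 2), c("E", "M"))

print(expected_cols)
```

These are the same column names that appear in the county-level tibble, which is why later steps refer to med_hh_incE and med_hh_incM.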
# Write your get_acs() code here
county_level <- get_acs( state = my_state, geography = "county",
variables = c( med_hh_inc = "B19013_001", tot_pop = "B01003_001"), year = 2022, survey = "acs5", output = "wide")
# Clean the county names to remove state name and "County"
county_level <- county_level %>%
mutate(county = str_remove(NAME, ", Louisiana"),
county = str_remove( county, " Parish") )
# Hint: use mutate() with str_remove()
# Display the first few rows
county_level
# A tibble: 64 × 7
GEOID NAME med_hh_incE med_hh_incM tot_popE tot_popM county
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 22001 Acadia Parish, Louisi… 44977 2841 57674 NA Acadia
2 22003 Allen Parish, Louisia… 52755 6411 22798 NA Allen
3 22005 Ascension Parish, Lou… 93800 3514 126973 NA Ascen…
4 22007 Assumption Parish, Lo… 47023 3781 21067 NA Assum…
5 22009 Avoyelles Parish, Lou… 38696 5175 39529 NA Avoye…
6 22011 Beauregard Parish, Lo… 68525 6225 36553 NA Beaur…
7 22013 Bienville Parish, Lou… 34268 3054 12958 NA Bienv…
8 22015 Bossier Parish, Louis… 64598 3122 128877 NA Bossi…
9 22017 Caddo Parish, Louisia… 47572 2032 236259 NA Caddo
10 22019 Calcasieu Parish, Lou… 64370 3249 210770 NA Calca…
# ℹ 54 more rows
2.2 Data Quality Assessment
Your Task: Calculate margin of error percentages and create reliability categories.
Requirements:
- Calculate MOE percentage: (margin of error / estimate) * 100
- Create reliability categories:
  - High Confidence: MOE < 5%
  - Moderate Confidence: MOE 5-10%
  - Low Confidence: MOE > 10%
- Create a flag for unreliable estimates (MOE > 10%)
Hint: Use mutate() with case_when() for the categories.
# Calculate MOE percentage and reliability categories using mutate()
county_level <- county_level %>%
  mutate(MOE_perc = (med_hh_incM / med_hh_incE) * 100) %>%
  mutate(confidence_level = case_when(
    MOE_perc < 5 ~ "High Confidence",
    # 5-10% inclusive, so a value of exactly 10 is not left as NA
    MOE_perc <= 10 ~ "Moderate Confidence",
    MOE_perc > 10 ~ "Low Confidence"
  ))
# Create a summary showing count of counties in each reliability category
county_level %>%
count(confidence_level, name = "county") %>%
mutate( share = county / sum(county) * 100)
# A tibble: 3 × 3
confidence_level county share
<chr> <int> <dbl>
1 High Confidence 15 23.4
2 Low Confidence 23 35.9
3 Moderate Confidence 26 40.6
# Hint: use count() and mutate() to add percentages
2.3 High Uncertainty Counties
Your Task: Identify the 5 counties with the highest MOE percentages.
Requirements:
- Sort by MOE percentage (highest first)
- Select the top 5 counties
- Display: county name, median income, margin of error, MOE percentage, reliability category
- Format as a professional table using kable()
Hint: Use arrange(), slice(), and select() functions.
library(knitr)
library(kableExtra)
# Create table of top 5 counties by MOE percentage
reliability_table <- county_level %>%
arrange( desc(MOE_perc)) %>%
slice(1:5) %>%
select( County = county,
Median_Income= med_hh_incE,
MOE = med_hh_incM,
MOE_perc,
Reliability = confidence_level)
reliability_table
# A tibble: 5 × 5
County Median_Income MOE MOE_perc Reliability
<chr> <dbl> <dbl> <dbl> <chr>
1 Cameron 69847 16919 24.2 Low Confidence
2 Red River 43821 8891 20.3 Low Confidence
3 Madison 34508 6612 19.2 Low Confidence
4 East Carroll 30856 4931 16.0 Low Confidence
5 St. James 62946 9793 15.6 Low Confidence
# Format as table with kable() - include appropriate column names and caption
reliability <- county_level %>%
arrange(desc(MOE_perc)) %>%
slice(1:5) %>%
select(
county,
med_hh_incE,
med_hh_incM,
MOE_perc,
confidence_level
) %>%
kable(
col.names = c(
"County",
"Median Household Income",
"Margin of Error",
"MOE (%)",
"Reliability"
),
digits = 2,
caption = "Top 5 Counties with the Highest Median Household Income MOE Percentages",
format.args = list(big.mark = ",")
) %>%
kable_styling(
bootstrap_options = c("striped", "hover"),
full_width = FALSE
) %>%
footnote(
general = c(
"Margin of Error (MOE) expressed as a percentage of the estimate.",
"🟢 High Confidence: MOE < 5%",
"🟡 Moderate Confidence: MOE 5–10%",
"🔴 Low Confidence: MOE > 10%",
"Source: American Community Survey 5-Year Estimates"
),
general_title = "Notes:"
)
reliability
| County | Median Household Income | Margin of Error | MOE (%) | Reliability |
|---|---|---|---|---|
| Cameron | 69,847 | 16,919 | 24.22 | Low Confidence |
| Red River | 43,821 | 8,891 | 20.29 | Low Confidence |
| Madison | 34,508 | 6,612 | 19.16 | Low Confidence |
| East Carroll | 30,856 | 4,931 | 15.98 | Low Confidence |
| St. James | 62,946 | 9,793 | 15.56 | Low Confidence |
Notes:
Margin of Error (MOE) expressed as a percentage of the estimate.
🟢 High Confidence: MOE < 5%
🟡 Moderate Confidence: MOE 5–10%
🔴 Low Confidence: MOE > 10%
Source: American Community Survey 5-Year Estimates
Data Quality Commentary:
All five parishes with the highest income uncertainty fall into the Low Confidence category. Cameron Parish shows the most extreme MOE at 24.22%: at the 90% confidence level the ACS uses for its published margins of error, the true median household income could be nearly $17,000 above or below the reported estimate of $69,847. An algorithmic system relying on this income data would be most likely to misclassify Cameron, Red River, Madison, East Carroll, and St. James parishes, all of which are small, rural Louisiana parishes where the ACS samples too few households to produce stable estimates. Small population sizes and geographic isolation in these parishes drive up uncertainty, so communities with genuine economic need could be systematically ranked incorrectly by any automated funding allocation system.
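To make the size of that uncertainty concrete, the sketch below turns the published estimates and margins of error into interval bounds. It is only an illustration: the values are copied from the top-5 table, the object names (top_moe, intervals) are made up here, and the 1.645 divisor assumes the standard 90% confidence level of published ACS margins of error.

```r
library(dplyr)

# Estimates and margins of error copied from the top-5 table above
top_moe <- tibble::tribble(
  ~parish,     ~estimate, ~moe,
  "Cameron",       69847, 16919,
  "Red River",     43821,  8891,
  "Madison",       34508,  6612
)

intervals <- top_moe %>%
  mutate(
    lower   = estimate - moe,        # lower bound of the 90% interval
    upper   = estimate + moe,        # upper bound of the 90% interval
    std_err = moe / 1.645,           # approximate standard error (90% MOE assumed)
    moe_pct = moe / estimate * 100   # same formula as MOE_perc
  )

print(intervals)
```

For Cameron the interval runs from $52,928 to $86,766, so a ranking that treats $69,847 as exact could place the parish far from where its true median would put it.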
Part 3: Neighborhood-Level Analysis
3.1 Focus Area Selection
Your Task: Select 2-3 counties from your reliability analysis for detailed tract-level study.
Strategy: Choose counties that represent different reliability levels (e.g., 1 high confidence, 1 moderate, 1 low confidence) to compare how data quality varies.
# Use filter() to select 2-3 counties from your county_reliability data
selected_counties <- county_level %>%
filter(confidence_level %in% c(
"High Confidence",
"Moderate Confidence",
"Low Confidence"
)) %>%
group_by(confidence_level) %>%
slice_max(MOE_perc, n = 1, with_ties = FALSE) %>%
ungroup()
selected_counties
# A tibble: 3 × 9
GEOID NAME med_hh_incE med_hh_incM tot_popE tot_popM county MOE_perc
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 22069 Natchitoches … 41310 2062 37478 NA Natch… 4.99
2 22023 Cameron Paris… 69847 16919 5447 NA Camer… 24.2
3 22053 Jefferson Dav… 52470 5233 32277 NA Jeffe… 9.97
# ℹ 1 more variable: confidence_level <chr>
# Store the selected counties in a variable called selected_counties
# Display the selected counties with their key characteristics
selected_counties %>%
select(
county,
med_hh_incE,
MOE_perc,
confidence_level
) %>%
kable(
col.names = c(
"County",
"Median Household Income",
"MOE (%)",
"Reliability"
),
digits = 2,
caption = "Selected Counties to Run a tract level analysis",
format.args = list(big.mark = ",")
) %>%
kable_styling(
bootstrap_options = c("striped", "hover"),
full_width = FALSE
) %>%
footnote(
general = c(
"Margin of Error (MOE) expressed as a percentage of the estimate.",
"🟢 High Confidence: MOE < 5%",
"🟡 Moderate Confidence: MOE 5–10%",
"🔴 Low Confidence: MOE > 10%",
"Source: American Community Survey 5-Year Estimates"
),
general_title = "Notes:"
)
| County | Median Household Income | MOE (%) | Reliability |
|---|---|---|---|
| Natchitoches | 41,310 | 4.99 | High Confidence |
| Cameron | 69,847 | 24.22 | Low Confidence |
| Jefferson Davis | 52,470 | 9.97 | Moderate Confidence |
Notes:
Margin of Error (MOE) expressed as a percentage of the estimate.
🟢 High Confidence: MOE < 5%
🟡 Moderate Confidence: MOE 5–10%
🔴 Low Confidence: MOE > 10%
Source: American Community Survey 5-Year Estimates
# Show: county name, median income, MOE percentage, reliability category
Comment on the output:
The three selected parishes, Natchitoches (High Confidence, MOE 4.99%), Jefferson Davis (Moderate Confidence, MOE 9.97%), and Cameron (Low Confidence, MOE 24.22%), represent distinct reliability tiers, allowing for a meaningful comparison of how data quality varies across Louisiana’s rural landscape. Cameron Parish, with its small and geographically dispersed coastal population (5,447 residents), shows the highest income uncertainty. Because the selection takes the highest-MOE parish within each tier, Natchitoches and Jefferson Davis both sit just below their tier's upper cutoff. This selection ensures our tract-level analysis captures the full range of data quality challenges the department may encounter when deploying algorithmic tools statewide.
3.2 Tract-Level Demographics
Your Task: Get demographic data for census tracts in your selected counties.
Requirements:
- Geography: tract level
- Variables: white alone (B03002_003), Black/African American (B03002_004), Hispanic/Latino (B03002_012), total population (B03002_001)
- Use the same state and year as before
- Output format: wide
- Challenge: You’ll need county codes, not names. Look at the GEOID patterns in your county data for hints.
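One way to approach the county-code challenge, sketched here rather than prescribed: a five-digit county GEOID is the two-digit state FIPS code followed by the three-digit county FIPS code, so the county codes can be cut out of the GEOID column instead of typed by hand. The GEOIDs are the ones shown for the three selected parishes; the object names are illustrative.

```r
library(stringr)

# GEOIDs of the three selected parishes, copied from the county-level output
selected_geoids <- c("22069", "22023", "22053")

state_code   <- str_sub(selected_geoids, 1, 2)  # first two digits: state FIPS
county_codes <- str_sub(selected_geoids, 3, 5)  # last three digits: county FIPS

print(county_codes)
```

A vector built this way can be passed to the county argument of get_acs(), which keeps the tract query in step with whatever parishes end up selected.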
# Define your race/ethnicity variables with descriptive names
tract_level <- get_acs(
state = my_state,
geography = "tract",
county = c("069", "023", "053"),
variables = c(
med_hh_inc = "B19013_001",
tot_pop = "B03002_001",
white = "B03002_003",
black = "B03002_004",
hisp = "B03002_012"
),
year = 2022,
survey = "acs5",
output = "wide"
)
# Add readable tract and county name columns using str_extract() or similar
tract_level_clean <- tract_level %>%
mutate(
tract_number = str_extract(NAME, "\\d+\\.?\\d*"),
county_name = str_extract(NAME, "(?<=; )[^;]+(?= Parish)")
)
# Use get_acs() to retrieve tract-level data
# Hint: You may need to specify county codes in the county parameter
# Calculate percentage of each group using mutate()
# Create percentages for white, Black, and Hispanic populations
tract_level_clean <- tract_level_clean %>%
mutate( perc_white = ((whiteE/ tot_popE)*100),
perc_black = ((blackE/tot_popE)*100),
perc_hisp = ((hispE/tot_popE)*100))
3.3 Demographic Analysis
Your Task: Analyze the demographic patterns in your selected areas.
# Find the tract with the highest percentage of Hispanic/Latino residents
# Hint: use arrange() and slice() to get the top tract
highest_hisp_tract <- tract_level_clean %>%
mutate(perc_hisp = (hispE / tot_popE) * 100) %>%
arrange(desc(perc_hisp)) %>%
slice(1) %>%
select(tract_number, county_name, tot_popE, hispE, perc_hisp)
highest_hisp_tract
# A tibble: 1 × 5
tract_number county_name tot_popE hispE perc_hisp
<chr> <chr> <dbl> <dbl> <dbl>
1 4.01 Jefferson Davis 2365 159 6.72
# Calculate average demographics by county using group_by() and summarize()
# Show: number of tracts, average percentage for each racial/ethnic group
county_summary <- tract_level_clean %>%
mutate(
perc_white = (whiteE / tot_popE) * 100,
perc_black = (blackE / tot_popE) * 100,
perc_hisp = (hispE / tot_popE) * 100
) %>%
group_by(county_name) %>%
summarize(
num_tracts = n(),
avg_perc_white = mean(perc_white, na.rm = TRUE),
avg_perc_black = mean(perc_black, na.rm = TRUE),
avg_perc_hisp = mean(perc_hisp, na.rm = TRUE),
.groups = "drop"
)
county_summary
# A tibble: 3 × 5
county_name num_tracts avg_perc_white avg_perc_black avg_perc_hisp
<chr> <int> <dbl> <dbl> <dbl>
1 Cameron 5 92.4 1.08 1.60
2 Jefferson Davis 8 75.5 18.2 2.69
3 Natchitoches 11 51.4 39.6 2.22
# Create a nicely formatted table of your results using kable()
county_summary %>%
kable(
col.names = c(
"Parish",
"Number of Tracts",
"Avg % White",
"Avg % Black",
"Avg % Hispanic/Latino"
),
digits = 2,
caption = "Average Demographics by Parish",
format.args = list(big.mark = ",")
) %>%
kable_styling(
bootstrap_options = c("striped", "hover"),
full_width = FALSE
) %>%
footnote(
general = c(
"Percentages represent the average across all census tracts within each parish.",
"Source: American Community Survey 5-Year Estimates (2022)"
),
general_title = "Notes:"
)
| Parish | Number of Tracts | Avg % White | Avg % Black | Avg % Hispanic/Latino |
|---|---|---|---|---|
| Cameron | 5 | 92.38 | 1.08 | 1.60 |
| Jefferson Davis | 8 | 75.48 | 18.17 | 2.69 |
| Natchitoches | 11 | 51.38 | 39.58 | 2.22 |
Notes:
Percentages represent the average across all census tracts within each parish.
Source: American Community Survey 5-Year Estimates (2022)
Part 4: Comprehensive Data Quality Evaluation
4.1 MOE Analysis for Demographic Variables
Your Task: Examine margins of error for demographic variables to see if some communities have less reliable data.
Requirements:
- Calculate MOE percentages for each demographic variable
- Flag tracts where any demographic variable has MOE > 15%
- Create summary statistics
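One edge case worth keeping in mind for these requirements is a tract whose estimate is zero, where margin / estimate is a division by zero. The sketch below shows one possible way to handle it explicitly. The tract values are hypothetical (not ACS output), and treating an undefined ratio as unreliable is an assumption of this example, not a rule from the assignment.

```r
library(dplyr)

# Hypothetical tract values for illustration only (not ACS output)
demo_tracts <- tibble::tribble(
  ~tract, ~hispE, ~hispM,
  "A",       150,     90,
  "B",         0,     13,
  "C",       400,     40
)

demo_flags <- demo_tracts %>%
  mutate(
    # Leave the ratio undefined when the estimate is zero
    MOE_hisp = if_else(hispE > 0, hispM / hispE * 100, NA_real_),
    # Assumption: an undefined ratio counts as unreliable, like a ratio above 15%
    unreliable = is.na(MOE_hisp) | MOE_hisp > 15
  )

print(demo_flags)
```

Dividing directly gives Inf for a zero estimate in R, and Inf > 15 is TRUE, so such a tract is flagged either way. The explicit version only makes that choice visible.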
# Calculate MOE percentages for white, Black, and Hispanic variables
# Hint: use the same formula as before (margin/estimate * 100)
tract_level_clean <- tract_level_clean %>%
mutate ( MOE_white = ((whiteM/whiteE)*100),
MOE_black = (( blackM/ blackE) *100),
MOE_hisp = ((hispM/hispE)*100))
# Create a flag for tracts with high MOE on any demographic variable
tract_level_clean <- tract_level_clean %>%
mutate(
high_MOE_flag = ifelse(
replace_na(MOE_white > 15, FALSE) |
replace_na(MOE_black > 15, FALSE) |
replace_na(MOE_hisp > 15, FALSE),
"🔴 High MOE (>15%)",
"Acceptable MOE"
)
)
# Use logical operators (| for OR) in an ifelse() statement
# Create summary statistics showing how many tracts have data quality issues
tract_level_clean %>%
group_by(county_name, high_MOE_flag) %>%
summarize(num_tracts = n(), .groups = "drop") %>%
kable(
col.names = c("Parish", "Data Quality", "Number of Tracts"),
digits = 2,
caption = "Data Quality Summary by Parish",
format.args = list(big.mark = ",")
) %>%
kable_styling(
bootstrap_options = c("striped", "hover"),
full_width = FALSE
) %>%
footnote(
general = c(
"A tract is flagged if any demographic variable (White, Black, or Hispanic) has MOE > 15%.",
"🔴 High MOE (>15%): Estimates may be unreliable.",
"Acceptable MOE: Estimates are sufficiently reliable for analysis.",
"Source: American Community Survey 5-Year Estimates (2022)"
),
general_title = "Notes:"
)
| Parish | Data Quality | Number of Tracts |
|---|---|---|
| Cameron | 🔴 High MOE (>15%) | 5 |
| Jefferson Davis | 🔴 High MOE (>15%) | 8 |
| Natchitoches | 🔴 High MOE (>15%) | 11 |
Notes:
A tract is flagged if any demographic variable (White, Black, or Hispanic) has MOE > 15%.
🔴 High MOE (>15%): Estimates may be unreliable.
Acceptable MOE: Estimates are sufficiently reliable for analysis.
Source: American Community Survey 5-Year Estimates (2022)
4.2 Pattern Analysis
Your Task: Investigate whether data quality problems are randomly distributed or concentrated in certain types of communities.
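# Every tract is flagged as high MOE in 4.1, so grouping by the flag would give a single group;
# compare parishes instead and report the share of unreliable tracts for each demographic variable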
moe_comparison <- tract_level_clean %>%
group_by(county_name) %>%
summarize(
num_tracts = n(),
avg_population = mean(tot_popE, na.rm = TRUE),
avg_perc_white = mean(perc_white, na.rm = TRUE),
avg_perc_black = mean(perc_black, na.rm = TRUE),
avg_perc_hisp = mean(perc_hisp, na.rm = TRUE),
perc_unreliable_white = mean(replace_na(MOE_white > 15, FALSE)) * 100,
perc_unreliable_black = mean(replace_na(MOE_black > 15, FALSE)) * 100,
perc_unreliable_hisp = mean(replace_na(MOE_hisp > 15, FALSE)) * 100,
.groups = "drop"
)
# Calculate average characteristics for each group:
# - population size, demographic percentages
# Use group_by() and summarize() to create this comparison
# Create a professional table showing the patterns
moe_comparison %>%
kable(
col.names = c(
"Parish",
"Tracts",
"Avg Population",
"Avg % White",
"Avg % Black",
"Avg % Hispanic",
"% Tracts Unreliable (White)",
"% Tracts Unreliable (Black)",
"% Tracts Unreliable (Hispanic)"
),
digits = 2,
caption = "Demographic Estimates and Data Reliability by Parish",
format.args = list(big.mark = ",")
) %>%
kable_styling(
bootstrap_options = c("striped", "hover"),
full_width = FALSE
) %>%
footnote(
general = c(
"% Tracts Unreliable = share of tracts where MOE exceeds 15% of the estimate.",
"High unreliability for minority groups reflects small subgroup populations in these parishes.",
"🔴 Estimates with high MOE should be interpreted with caution.",
"Source: American Community Survey 5-Year Estimates (2022)"
),
general_title = "Notes:"
)
| Parish | Tracts | Avg Population | Avg % White | Avg % Black | Avg % Hispanic | % Tracts Unreliable (White) | % Tracts Unreliable (Black) | % Tracts Unreliable (Hispanic) |
|---|---|---|---|---|---|---|---|---|
| Cameron | 5 | 1,089.40 | 92.38 | 1.08 | 1.60 | 80.00 | 100.00 | 100 |
| Jefferson Davis | 8 | 4,034.62 | 75.48 | 18.17 | 2.69 | 50.00 | 100.00 | 100 |
| Natchitoches | 11 | 3,407.09 | 51.38 | 39.58 | 2.22 | 81.82 | 90.91 | 100 |
Notes:
% Tracts Unreliable = share of tracts where MOE exceeds 15% of the estimate.
High unreliability for minority groups reflects small subgroup populations in these parishes.
🔴 Estimates with high MOE should be interpreted with caution.
Source: American Community Survey 5-Year Estimates (2022)
Pattern Analysis:
The analysis shows that data quality problems are not randomly distributed: they are most severe for minority demographic groups in all three parishes. Hispanic estimates exceed the 15% MOE threshold in every tract, and Black estimates do so in 91-100% of tracts, compared with 50-82% of tracts for white estimates. This is not a data collection failure. These groups make up smaller shares of the population in these parishes, so fewer of their households are sampled and statistical uncertainty is greater. White estimates are also unreliable in at least half of the tracts, which shows that small tract populations (roughly 1,100 to 4,000 residents on average) limit precision for every group. The pattern holds across Cameron, Jefferson Davis, and Natchitoches parishes, suggesting a structural limitation of the ACS in small rural tracts. The practical implication is serious: an algorithm relying on demographic estimates for these communities would have the least reliable data precisely for the minority groups it may be intended to serve.
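The link between a small population share and a large relative margin of error can be sketched with a deliberately simplified model. The function below assumes simple random sampling with a binomial count, which is not how the ACS is actually designed, and n_sampled = 200 is a hypothetical sample size. Only the shares are taken from the Cameron Parish averages reported above.

```r
# Relative 90% MOE (in percent) for a subgroup count under simple random sampling.
# Assumption: a binomial model; the ACS itself uses a more complex design.
relative_moe_pct <- function(n_sampled, share) {
  1.645 * sqrt((1 - share) / (n_sampled * share)) * 100
}

# Average tract shares for Cameron Parish from the table above, as proportions
shares <- c(white = 0.9238, black = 0.0108, hispanic = 0.0160)

# n_sampled = 200 is a hypothetical number of sampled residents per tract
rel_moe <- relative_moe_pct(n_sampled = 200, share = shares)

print(round(rel_moe, 1))
```

Under these assumptions the relative MOE rises steeply as the share falls, which matches the direction of the pattern in the table even though the real ACS margins depend on the survey's actual design and weighting.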
Part 5: Policy Recommendations
5.1 Analysis Integration and Professional Summary
Your Task: Write an executive summary that integrates findings from all four analyses.
Executive Summary Requirements:
1. Overall Pattern Identification: What are the systematic patterns across all your analyses?
2. Equity Assessment: Which communities face the greatest risk of algorithmic bias based on your findings?
3. Root Cause Analysis: What underlying factors drive both data quality issues and bias risk?
4. Strategic Recommendations: What should the Department implement to address these systematic issues?
Executive Summary:
Overall Pattern Identification: Across all four analyses, a consistent pattern emerges: data reliability in Louisiana’s rural parishes declines as total population and subgroup population shrink. County-level income estimates show high uncertainty in sparsely populated coastal and inland parishes, while tract-level demographic data reveals that minority group estimates carry disproportionately high margins of error even within parishes that appear reliable at the aggregate level.
Equity Assessment: Black and Hispanic residents across Cameron, Jefferson Davis, and Natchitoches parishes face the greatest risk of algorithmic harm. Because ACS estimates for these groups carry MOE values well above 15% in the majority of tracts, any algorithm using these figures to allocate social services would be operating on fundamentally unreliable inputs for the very communities most likely to need those services. This creates a feedback loop where data scarcity compounds existing inequity.
Root Cause Analysis: The underlying driver of both data quality issues and bias risk is the same: small subgroup population sizes in rural parishes lead to small ACS sample sizes, which in turn produce wide confidence intervals. This is not a data entry or methodology failure but a structural limitation of survey-based estimation in low-density, low-diversity areas. Small rural minority communities contribute very few households to the sample, which makes their estimates the least stable.
Strategic Recommendations: The Department of Human Services should implement a tiered approach to algorithmic decision-making that explicitly accounts for data reliability. High-confidence parishes can proceed with automated prioritization, while moderate-confidence parishes should require human review of algorithmic outputs before funding decisions are finalized. In low-confidence parishes, and for any tract where minority demographic estimates carry MOE above 15%, the department should suspend automated classification entirely and instead conduct targeted supplemental surveys or partner with local organizations to gather ground-truth data before making resource allocation decisions.
5.2 Specific Recommendations
Your Task: Create a decision framework for algorithm implementation.
# Create a summary table using your county reliability data
# Include: county name, median income, MOE percentage, reliability category
# Add a new column with algorithm recommendations using case_when():
# - High Confidence: "Safe for algorithmic decisions"
# - Moderate Confidence: "Use with caution - monitor outcomes"
# - Low Confidence: "Requires manual review or additional data"
# Format as a professional table with kable()
county_level %>%
select(county, med_hh_incE, med_hh_incM, MOE_perc, confidence_level) %>%
mutate(
recommendation = case_when(
confidence_level == "High Confidence" ~ "🟢 Safe for algorithmic decisions",
confidence_level == "Moderate Confidence" ~ "🟡 Use with caution - monitor outcomes",
confidence_level == "Low Confidence" ~ "🔴 Requires manual review or additional data"
)
) %>%
arrange(desc(MOE_perc)) %>%
kable(
col.names = c(
"Parish",
"Median Household Income",
"Margin of Error",
"MOE (%)",
"Reliability",
"Recommendation"
),
digits = 2,
caption = "Algorithm Implementation Decision Framework by Parish",
format.args = list(big.mark = ",")
) %>%
kable_styling(
bootstrap_options = c("striped", "hover"),
full_width = FALSE
) %>%
footnote(
general = c(
"Recommendations based on Margin of Error (MOE) as a percentage of the estimate.",
"🟢 High Confidence (MOE < 5%): Estimates are reliable for automated decisions.",
"🟡 Moderate Confidence (MOE 5–10%): Use with human oversight and outcome monitoring.",
"🔴 Low Confidence (MOE > 10%): Data unreliability poses risk of algorithmic bias.",
"Source: American Community Survey 5-Year Estimates (2022)"
),
general_title = "Notes:"
)
| Parish | Median Household Income | Margin of Error | MOE (%) | Reliability | Recommendation |
|---|---|---|---|---|---|
| Cameron | 69,847 | 16,919 | 24.22 | Low Confidence | 🔴 Requires manual review or additional data |
| Red River | 43,821 | 8,891 | 20.29 | Low Confidence | 🔴 Requires manual review or additional data |
| Madison | 34,508 | 6,612 | 19.16 | Low Confidence | 🔴 Requires manual review or additional data |
| East Carroll | 30,856 | 4,931 | 15.98 | Low Confidence | 🔴 Requires manual review or additional data |
| St. James | 62,946 | 9,793 | 15.56 | Low Confidence | 🔴 Requires manual review or additional data |
| Caldwell | 45,707 | 7,075 | 15.48 | Low Confidence | 🔴 Requires manual review or additional data |
| St. Helena | 46,402 | 7,065 | 15.23 | Low Confidence | 🔴 Requires manual review or additional data |
| LaSalle | 59,926 | 9,117 | 15.21 | Low Confidence | 🔴 Requires manual review or additional data |
| Concordia | 38,929 | 5,562 | 14.29 | Low Confidence | 🔴 Requires manual review or additional data |
| Catahoula | 48,259 | 6,502 | 13.47 | Low Confidence | 🔴 Requires manual review or additional data |
| Avoyelles | 38,696 | 5,175 | 13.37 | Low Confidence | 🔴 Requires manual review or additional data |
| Union | 47,068 | 6,133 | 13.03 | Low Confidence | 🔴 Requires manual review or additional data |
| Tensas | 35,653 | 4,539 | 12.73 | Low Confidence | 🔴 Requires manual review or additional data |
| West Feliciana | 71,985 | 9,132 | 12.69 | Low Confidence | 🔴 Requires manual review or additional data |
| West Carroll | 45,035 | 5,686 | 12.63 | Low Confidence | 🔴 Requires manual review or additional data |
| Pointe Coupee | 53,045 | 6,662 | 12.56 | Low Confidence | 🔴 Requires manual review or additional data |
| Allen | 52,755 | 6,411 | 12.15 | Low Confidence | 🔴 Requires manual review or additional data |
| East Feliciana | 64,709 | 7,398 | 11.43 | Low Confidence | 🔴 Requires manual review or additional data |
| Morehouse | 37,875 | 4,038 | 10.66 | Low Confidence | 🔴 Requires manual review or additional data |
| Iberville | 59,410 | 6,240 | 10.50 | Low Confidence | 🔴 Requires manual review or additional data |
| Evangeline | 34,526 | 3,582 | 10.37 | Low Confidence | 🔴 Requires manual review or additional data |
| De Soto | 49,807 | 5,118 | 10.28 | Low Confidence | 🔴 Requires manual review or additional data |
| Sabine | 40,777 | 4,130 | 10.13 | Low Confidence | 🔴 Requires manual review or additional data |
| Jefferson Davis | 52,470 | 5,233 | 9.97 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Franklin | 41,129 | 4,063 | 9.88 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Grant | 57,362 | 5,569 | 9.71 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Richland | 48,125 | 4,550 | 9.45 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Winn | 44,922 | 4,240 | 9.44 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| West Baton Rouge | 80,510 | 7,340 | 9.12 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Beauregard | 68,525 | 6,225 | 9.08 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Lincoln | 37,001 | 3,309 | 8.94 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Bienville | 34,268 | 3,054 | 8.91 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| St. John the Baptist | 65,114 | 5,713 | 8.77 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Lafourche | 61,381 | 4,953 | 8.07 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Assumption | 47,023 | 3,781 | 8.04 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| St. Charles | 79,191 | 6,237 | 7.88 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Washington | 41,803 | 3,193 | 7.64 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Jackson | 40,406 | 2,801 | 6.93 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Acadia | 44,977 | 2,841 | 6.32 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Vermilion | 56,194 | 3,429 | 6.10 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Livingston | 77,978 | 4,335 | 5.56 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Terrebonne | 63,088 | 3,476 | 5.51 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| St. Bernard | 55,857 | 3,072 | 5.50 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Webster | 34,263 | 1,859 | 5.43 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Ouachita | 49,261 | 2,573 | 5.22 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Tangipahoa | 55,274 | 2,869 | 5.19 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Calcasieu | 64,370 | 3,249 | 5.05 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Claiborne | 32,034 | 1,616 | 5.04 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Plaquemines | 77,996 | 3,933 | 5.04 | Moderate Confidence | 🟡 Use with caution - monitor outcomes |
| Natchitoches | 41,310 | 2,062 | 4.99 | High Confidence | 🟢 Safe for algorithmic decisions |
| St. Mary | 47,322 | 2,292 | 4.84 | High Confidence | 🟢 Safe for algorithmic decisions |
| St. Landry | 44,478 | 2,152 | 4.84 | High Confidence | 🟢 Safe for algorithmic decisions |
| Bossier | 64,598 | 3,122 | 4.83 | High Confidence | 🟢 Safe for algorithmic decisions |
| St. Martin | 50,806 | 2,405 | 4.73 | High Confidence | 🟢 Safe for algorithmic decisions |
| Caddo | 47,572 | 2,032 | 4.27 | High Confidence | 🟢 Safe for algorithmic decisions |
| Iberia | 55,190 | 2,319 | 4.20 | High Confidence | 🟢 Safe for algorithmic decisions |
| Vernon | 56,547 | 2,244 | 3.97 | High Confidence | 🟢 Safe for algorithmic decisions |
| Ascension | 93,800 | 3,514 | 3.75 | High Confidence | 🟢 Safe for algorithmic decisions |
| Rapides | 55,407 | 2,017 | 3.64 | High Confidence | 🟢 Safe for algorithmic decisions |
| Orleans | 51,116 | 1,705 | 3.34 | High Confidence | 🟢 Safe for algorithmic decisions |
| St. Tammany | 76,914 | 2,541 | 3.30 | High Confidence | 🟢 Safe for algorithmic decisions |
| Lafayette | 66,617 | 2,192 | 3.29 | High Confidence | 🟢 Safe for algorithmic decisions |
| East Baton Rouge | 62,083 | 1,760 | 2.83 | High Confidence | 🟢 Safe for algorithmic decisions |
| Jefferson | 63,257 | 1,592 | 2.52 | High Confidence | 🟢 Safe for algorithmic decisions |
Notes:
Recommendations based on Margin of Error (MOE) as a percentage of the estimate.
🟢 High Confidence (MOE < 5%): Estimates are reliable for automated decisions.
🟡 Moderate Confidence (MOE 5–10%): Use with human oversight and outcome monitoring.
🔴 Low Confidence (MOE > 10%): Data unreliability poses risk of algorithmic bias.
Source: American Community Survey 5-Year Estimates (2022)
Key Recommendations:
Your Task: Use your analysis results to provide specific guidance to the department.
Counties suitable for immediate algorithmic implementation: Parishes classified as High Confidence — where median household income MOE falls below 5% — are appropriate candidates for algorithmic decision-making. In these areas, the ACS estimates are stable enough that automated prioritization is unlikely to produce systematic misclassification. The department should still monitor outcomes over time to detect any emergent disparities.
Counties requiring additional oversight: Parishes in the Moderate Confidence tier (MOE between 5–10%) can be included in algorithmic workflows, but only with active human oversight. Program officers should review algorithmic outputs for these parishes before finalizing funding decisions, and outcome data should be tracked quarterly to identify whether any communities are being consistently under- or over-served relative to their actual need.
Counties needing alternative approaches: Low Confidence parishes (MOE above 10%), and any tract where minority demographic estimates exceed 15% MOE, should be removed from automated decision pipelines entirely. For these areas, the department should pursue one or more of the following: targeted supplemental ACS surveys, partnerships with local nonprofits and service providers to gather administrative data, or manual needs assessments conducted by field staff with community knowledge.
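The three tiers above, combined with the tract-level 15% rule, can be sketched as a single routing step. This is an illustration rather than a finished policy tool: the parish MOE values are taken from the decision framework table, while the tract identifiers and tract-level flags are hypothetical, and the route labels are wording chosen for this example.

```r
library(dplyr)

# Hypothetical tracts: parish MOE values come from the framework table,
# but the tract IDs and tract-level flags are made up for illustration
demo_routing <- tibble::tribble(
  ~tract, ~parish,     ~parish_moe_pct, ~tract_flagged,
  "T1",   "Jefferson",            2.52, FALSE,
  "T2",   "Jefferson",            2.52, TRUE,
  "T3",   "Acadia",               6.32, FALSE,
  "T4",   "Cameron",             24.22, FALSE
) %>%
  mutate(
    route = case_when(
      # Low Confidence parish or flagged tract: keep out of the automated pipeline
      parish_moe_pct > 10 | tract_flagged ~ "Manual review or alternative data",
      # Moderate Confidence parish: algorithmic output reviewed by program officers
      parish_moe_pct >= 5                 ~ "Algorithm with human oversight",
      # High Confidence parish with no flagged demographic estimates
      TRUE                                ~ "Automated prioritization"
    )
  )

print(demo_routing)
```

Checking the tract flag first means a flagged tract is routed to manual review even when its parish is High Confidence, which mirrors the recommendation that the 15% rule applies regardless of the parish tier.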
Questions for Further Investigation
- Do data reliability patterns persist over time? That is, have the same parishes consistently shown high MOE across multiple ACS waves (2015, 2017, 2019, 2022), or do reliability issues fluctuate with population shifts and post-disaster displacement (particularly relevant for Cameron Parish post-Hurricane Laura)?
- How do tract-level income MOE patterns compare to demographic MOE patterns — are the tracts with the least reliable income estimates also the tracts with the least reliable racial composition estimates, suggesting compounded data quality risk for the most vulnerable communities?
- Would switching from ACS 5-year estimates to administrative data sources (SNAP enrollment, Medicaid claims, school lunch eligibility) produce more reliable proxies for economic need in low-confidence parishes, and how would that change which communities are prioritized for services?
Technical Notes
Data Sources:
- U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates
- Retrieved via tidycensus R package on [date]
Reproducibility:
- All analysis conducted in R version [your version]
- Census API key required for replication
- Complete code and documentation available at: [your portfolio URL]
Methodology Notes: [Describe any decisions you made about data processing, county selection, or analytical choices that might affect reproducibility]
Limitations: [Note any limitations in your analysis - sample size issues, geographic scope, temporal factors, etc.]
Submission Checklist
Before submitting your portfolio link on Canvas:
Remember: Submit your portfolio URL on Canvas, not the file itself. Your assignment should be accessible at your-portfolio-url/labs/lab_1/your_file_name.html