Foster Care Trends · Data Analytics
Forecasting Children in 24-Hour Foster Care Placements across NYC Community Districts
Using four forecasting models across 59 NYC neighborhoods, this project predicts foster care placements through 2027. It finds that the Bronx bears a disproportionate burden while citywide numbers trend toward ~4,900 by 2027.
New York City's foster care system serves thousands of children across five boroughs, with placement rates that vary dramatically by neighborhood. Data published annually by the Administration for Children's Services reveals a citywide decline of over 20% between 2020 and 2024 (from 6,686 children to 5,326), yet this aggregate trend obscures sharp geographic disparities. Districts like BX12 Williamsbridge and QN12 Jamaica consistently account for disproportionate shares of placements, while others have seen reductions of 30% or more over the same period. Understanding where placements are concentrated, and where they are heading, has direct implications for how the city allocates caseworkers, residential facilities, and support services.
This project parses five years of ACS Excel reports into a unified dataset spanning 59 community districts and applies four forecasting models — Linear Trend, Exponential Smoothing, ARIMA, and an Ensemble average — to project placement counts through 2027. Models were validated using a leave-one-out approach, training on 2020–2023 data and evaluating against known 2024 outcomes. Exponential Smoothing performed best with a mean absolute error of approximately 8 children per district. The Ensemble model projects a citywide total of roughly 4,900 placements by 2027, assuming current trends continue without major policy intervention.
Research Question
Sub-questions:
- Which boroughs and community districts carry the highest and most persistent placement burdens?
- Do placement trends differ meaningfully across boroughs, or is the citywide decline uniform?
- Which forecasting model best captures year-over-year change given limited annual data points?
Materials/Datasets
The data was sourced from the NYC Administration for Children's Services (ACS), which publishes annual point-in-time snapshots of children in 24-hour foster care placements. Each report captures the number of children placed as of December 31 of the given year, broken down by the community district of the placement location.
| Field | Value |
| --- | --- |
| Source | NYC Administration for Children's Services (ACS) |
| Dataset | Children in 24-Hour Foster Care by Borough/CD of Foster Care Placement |
| Years covered | 2020, 2021, 2022, 2023, 2024 |
| Format | Annual Excel (.xlsx) snapshots, one file per year |
| Geographic unit | NYC Community District (59 districts across 5 boroughs) |
| Measure | Count of children in 24-hour placement as of December 31 |
| Excluded | Children in Close to Home placements; children with unknown CD |
| Size after parsing | 295 rows (59 CDs × 5 years) |
| Prepared by | ACS / DPPM / ORA. Data Source: CCRS |
Methods
Data Parsing & Cleaning
Each Excel file contained non-standard formatting: borough names were interspersed as unlabeled rows, district names included alphanumeric CD codes (e.g. BX01, QN12), totals and footnotes were embedded inline, and column headers varied slightly across years. A Python parser using pandas and regex was written to:
- Detect borough-level section headers and carry them forward as a grouping key
- Extract CD codes and neighborhood names using a consistent regex pattern
- Skip subtotal rows, unknown-CD rows, and footnote lines
- Concatenate all five years into a single tidy long-format DataFrame (year, borough, cd_code, cd_name, count)
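The parsing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the borough prefixes other than BX and QN, the skip markers, the `parse_rows` helper, and the sample rows (with made-up counts) are all assumptions.

```python
import re

# Assumed label layout: a CD code (borough prefix + two digits) followed by the
# neighborhood name, e.g. "BX12 Williamsbridge/Baychester".
CD_PATTERN = re.compile(r"^(?P<code>(?:BX|BK|MN|QN|SI)\d{2})\s+(?P<name>.+)$")
BOROUGHS = {"Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"}
# Assumed markers for rows to drop: subtotals, unknown-CD rows, footnotes.
SKIP_PATTERN = re.compile(r"total|unknown|^\*|^note", re.IGNORECASE)


def parse_rows(rows, year):
    """Turn raw (label, count) rows from one year's sheet into tidy records."""
    records = []
    borough = None
    for label, count in rows:
        label = str(label).strip()
        if label in BOROUGHS:
            borough = label  # carry the section header forward
            continue
        if SKIP_PATTERN.search(label):
            continue  # subtotal, unknown-CD, or footnote row
        match = CD_PATTERN.match(label)
        if match is None:
            continue  # anything that does not look like a CD row
        records.append({
            "year": year,
            "borough": borough,
            "cd_code": match.group("code"),
            "cd_name": match.group("name").strip(),
            "count": int(count),
        })
    return records


# Hypothetical rows with made-up counts, for illustration only.
SAMPLE_ROWS = [
    ("Bronx", None),
    ("BX01 Mott Haven", 120),
    ("BX12 Williamsbridge/Baychester", 310),
    ("Bronx Total", 430),
    ("Queens", None),
    ("QN12 Jamaica/Hollis", 280),
    ("Unknown CD", 15),
    ("* Excludes Close to Home placements", None),
]

records = parse_rows(SAMPLE_ROWS, 2024)
```

In a pandas workflow, the records from each year could then be wrapped in `pd.DataFrame(records)` and combined with `pd.concat` to form the long-format table.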
Exploratory Analysis
Before forecasting, the data was examined at multiple levels of aggregation: citywide totals, borough totals, and individual community district trends. A consistent downward trend was observed citywide from 2020–2023, with a near-plateau between 2023 and 2024 (4,994 to 5,005 — a marginal increase of 11 children). This leveling off informs model selection, as pure linear extrapolation may overstate future declines.
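The three aggregation levels described above map naturally onto pandas group-bys. The sketch below assumes the tidy column names from the parsing step and uses a tiny hypothetical DataFrame with made-up counts; it is not the project's data.

```python
import pandas as pd

# Hypothetical long-format data (made-up counts), mirroring the tidy schema.
df = pd.DataFrame({
    "year":    [2022, 2023, 2024, 2022, 2023, 2024],
    "borough": ["Bronx", "Bronx", "Bronx", "Queens", "Queens", "Queens"],
    "cd_code": ["BX01", "BX01", "BX01", "QN12", "QN12", "QN12"],
    "count":   [100, 90, 91, 80, 70, 70],
})

# Citywide total per year.
citywide = df.groupby("year")["count"].sum()

# Borough totals per year, one column per borough.
by_borough = df.groupby(["year", "borough"])["count"].sum().unstack("borough")

# Year-over-year percentage change; a value near zero flags a plateau.
yoy_pct = citywide.pct_change() * 100
```

A year-over-year change close to zero after several negative values is the kind of signal that would suggest a straight-line extrapolation is likely to overshoot.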
Model Selection Rationale
With only five annual observations per district, model selection prioritized parsimony and interpretability. Four models were implemented:
| Model | MAE (children per district) | MAPE | Notes |
| --- | --- | --- | --- |
| Exp Smoothing | 8.4 | — | Adapts to recent levels; robust with flat/leveling trends |
| ARIMA (1,1,0) | 9.4 | — | Differencing handles non-stationarity; limited by 5 obs |
| Ensemble (avg) | 9.6 | — | Average of all three; most conservative projection |
| Linear Trend | 13.7 | — | OLS extrapolation; overprojects decline after 2023 plateau |
Validation — Leave-One-Out
Given the small sample size, a leave-one-out approach was used: models were trained on 2020–2023 data for each of the 59 community districts and then evaluated against the known 2024 values. Because the held-out point is always the most recent year, this is in effect a single temporal holdout rather than a rotating leave-one-out scheme. It produced 59 test cases per model, one per district. Exponential Smoothing achieved the lowest mean absolute error, 8.4 children per district, meaning its 2024 predictions were off by an average of roughly 8 children against a citywide mean of ~89 children per district (about 9%).
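The validation loop can be illustrated with a small sketch. To stay dependency-light it uses only two hand-rolled stand-ins, a NumPy linear trend and a simple exponential smoothing recursion, rather than the statsmodels models used in the project. The smoothing parameter `alpha=0.8`, the function names, and the two example districts (made-up counts) are assumptions.

```python
import numpy as np


def linear_trend_forecast(train):
    """Fit a straight line to the training years and extrapolate one step."""
    x = np.arange(len(train))
    slope, intercept = np.polyfit(x, train, deg=1)
    return slope * len(train) + intercept


def ses_forecast(train, alpha=0.8):
    """Simple exponential smoothing; the final level is the one-step forecast."""
    level = float(train[0])
    for value in train[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def holdout_mae(series_by_cd, forecaster):
    """Train on all but the last year of each district, score on the last year."""
    errors = []
    for counts in series_by_cd.values():
        train, actual = counts[:-1], counts[-1]
        errors.append(abs(forecaster(train) - actual))
    return float(np.mean(errors))


# Hypothetical districts with made-up 2020-2024 counts that flatten in the last year.
series_by_cd = {
    "CD_A": [100, 90, 80, 70, 69],
    "CD_B": [50, 48, 46, 44, 44],
}

mae_linear = holdout_mae(series_by_cd, linear_trend_forecast)
mae_ses = holdout_mae(series_by_cd, ses_forecast)
```

On series that level off in the final year, the linear trend keeps projecting the earlier decline and lands further from the held-out value than the smoothed level does, which is the same qualitative pattern the validation table reports.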
Forecasting Horizon
All four models were then retrained on the full 2020–2024 dataset and used to generate projections for 2025, 2026, and 2027. The Ensemble forecast (average of all three models) is treated as the primary projection. A ±12% uncertainty band is applied to the Ensemble line in the community district spotlight charts, reflecting the typical spread observed during validation.
Methodology
Technology Stack
- Python 3 — core scripting language
- pandas — data loading, parsing, and long-format reshaping
- NumPy — array operations and manual feature construction
- statsmodels — ExponentialSmoothing and ARIMA model fitting
- scikit-learn — LinearRegression, StandardScaler, MAE metric
- matplotlib — all visualizations (dark-theme dashboard + individual PNGs)
Forecasting Pipeline
- Load & parse: regex-based extraction from raw Excel into tidy DataFrame
- Validate: leave-one-out test across all 59 CDs, evaluate MAE per model
- Retrain: fit all models on full 2020–2024 history
- Forecast: project 2025–2027 per district and citywide
- Export: per-district CSV and PNG dashboard with 5 individual charts
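The retrain and forecast steps above, together with the Ensemble average and its ±12% band, might look like the following for a single district. This is a simplified sketch with only two ensemble members (the project averages three, including ARIMA); the helper names, `alpha=0.8`, the clipping at zero, and the made-up history are assumptions.

```python
import numpy as np


def linear_trend_path(history, horizon):
    """Extrapolate an OLS line fitted to the full history."""
    x = np.arange(len(history))
    slope, intercept = np.polyfit(x, history, deg=1)
    future_x = np.arange(len(history), len(history) + horizon)
    # Counts cannot be negative, so clip the extrapolation at zero (assumption).
    return np.maximum(slope * future_x + intercept, 0.0)


def ses_path(history, horizon, alpha=0.8):
    """Simple exponential smoothing produces a flat multi-step forecast."""
    level = float(history[0])
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return np.full(horizon, level)


def ensemble_with_band(paths, band=0.12):
    """Average the member forecasts and attach a symmetric percentage band."""
    mean = np.mean(np.vstack(paths), axis=0)
    return mean, mean * (1 - band), mean * (1 + band)


# Hypothetical 2020-2024 history for one district (made-up counts).
history = [100, 90, 80, 70, 70]
horizon = 3  # 2025, 2026, 2027

members = [linear_trend_path(history, horizon), ses_path(history, horizon)]
ensemble, lower, upper = ensemble_with_band(members)
```

Averaging pulls the projection between the steep linear decline and the flat smoothed level, which is why an ensemble of this kind tends to give a more conservative path than the linear trend alone.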
Limitations
- Five annual data points per district is a small sample — forecasts should be treated as directional projections, not precise predictions
- No external covariates (poverty rates, housing instability, policy changes) are incorporated
- The 2023–2024 plateau is not explained by the models — it may reflect policy interventions, demographic shifts, or reporting changes
- Placement location ≠ child's home district; the data does not track individual children across years
- Future policy changes — such as expanded preventive services or immigration enforcement effects — could significantly alter trends
Key Findings
- NYC citywide placements fell 20% from 2020 to 2024 (6,686 → 5,326), with a near-plateau between 2023 and 2024
- The Bronx consistently accounts for the largest share of placements; Manhattan the smallest
- BX12 Williamsbridge/Baychester (Bronx) and QN12 Jamaica/Hollis (Queens) are the highest-volume districts in every year
- The Ensemble model projects approximately 4,934 citywide placements by 2027 if current trends hold
- Geographic concentration is persistent — the top 10 districts account for a majority of all placements across every year
- Exponential Smoothing outperformed Linear Trend, ARIMA, and Ensemble in leave-one-out validation (MAE: 8.4 children/district)
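The concentration finding can be checked with a short calculation of the share of placements held by the top k districts in each year. The sketch below uses a hypothetical four-district table with made-up counts and k = 2 purely for illustration; the `top_k_share` helper is an assumption, not part of the project code.

```python
import pandas as pd


def top_k_share(df, k):
    """Share of each year's total held by that year's k largest districts."""
    def share(counts):
        return counts.nlargest(k).sum() / counts.sum()
    return df.groupby("year")["count"].apply(share)


# Hypothetical data (made-up counts): four districts across two years.
df = pd.DataFrame({
    "year":    [2023] * 4 + [2024] * 4,
    "cd_code": ["A", "B", "C", "D"] * 2,
    "count":   [40, 30, 20, 10, 50, 30, 10, 10],
})

shares = top_k_share(df, k=2)
```

With the real 59-district table, calling `top_k_share(df, k=10)` would give the per-year figure behind the "majority of all placements" statement.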
Preview of future development
A natural extension of this work is mapping forecast outputs directly onto NYC's community district boundaries. An interactive choropleth map would make geographic concentration immediately legible, revealing not just which districts carry the highest burden, but whether high-placement districts cluster spatially or operate as isolated hotspots. This opens the door to neighborhood-effects analysis: do adjacent districts exhibit correlated trends, suggesting shared structural drivers like concentrated poverty or housing instability? Integrating auxiliary datasets (DYCD program locations, ACS preventive service coverage, or census socioeconomic indicators) onto the same map layer could help explain why some districts have declined sharply while neighboring ones have not.
References
Primary Data Source
NYC Administration for Children's Services (ACS). Children in 24-Hour Foster Care by Borough/CD of Foster Care Placement. Annual point-in-time reports: 2020, 2021, 2022, 2023, 2024. Prepared by ACS/DPPM/ORA. Data Source: CCRS. Available via NYC Open Data / ACS public reports.
Forecasting Methods
- Hyndman, R.J. & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts. Available at: otexts.com/fpp3 (covers Exponential Smoothing and ARIMA methodology)
- Box, G.E.P., Jenkins, G.M., Reinsel, G.C., & Ljung, G.M. (2015). Time Series Analysis: Forecasting and Control (5th ed.). Wiley. (Foundational ARIMA reference)
NYC Open Data. (2025). Community asset location datasets. https://opendata.cityofnewyork.us
U.S. Census Bureau. (2025). American Community Survey 5-year estimates. https://www.census.gov/programs-surveys/acs
U.S. Department of Health and Human Services. (2025). Adoption and Foster Care Analysis and Reporting System (AFCARS) 2025 updates. https://www.acf.hhs.gov
Grogan-Kaylor, A., et al. (2025). Multilevel thinking: Discovering variation, universals, and particulars in cross-cultural research.