<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:posse="https://posseparty.com/2024/Feed"><title>Data | Jared Knowles</title><link href="https://jaredknowles.com/data/" rel="alternate" type="text/html"/><link href="https://jaredknowles.com/data/feed.xml" rel="self" type="application/atom+xml"/><id>https://jaredknowles.com/data/</id><updated>2026-06-11T18:58:22Z</updated><subtitle>Data analysis, photography, and the occasional thought.</subtitle><author><name>Jared E. Knowles</name><email>jared@fastmail.us</email></author><entry><title>CRDC School Arrest Rates — Bayesian Estimates</title><link href="https://jaredknowles.com/data/crdc-school-arrest-rates/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/data/crdc-school-arrest-rates/</id><published>2026-06-02T00:00:00Z</published><updated>2026-06-02T00:00:00Z</updated><category term="education"/><category term="civil-rights"/><category term="school-discipline"/><category term="bayesian"/><category term="crdc"/><summary>Model-based estimates of school-based arrest rates for U.S. school districts and states, by race and sex, derived from the Civil Rights Data Collection.</summary><content type="html"><![CDATA[<p>School-based arrests are one of the sharpest edges of school discipline data, and
they are also some of the sparsest. Most district-by-race-by-sex cells in the
Civil Rights Data Collection contain very small counts, so a raw rate like &#8220;2
arrests out of 41 students&#8221; is too noisy to compare against that of a district ten times
its size. This dataset addresses that by replacing raw rates with <strong>Bayesian
hierarchical estimates</strong> that partially pool across districts: small or noisy
cells are pulled toward the broader pattern, and every estimate carries an
explicit credible interval rather than a falsely precise point estimate.</p>
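<p>To make the pooling idea concrete, here is a minimal sketch of shrinkage using a
simple beta-binomial stand-in. It is not the hierarchical model behind this release:
the prior pseudo-counts (1 arrest per 100 students), the second district, and every
name in the query are illustrative assumptions.</p>
<figure class="code-block" data-lang="sql"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Illustrative beta-binomial shrinkage; NOT the model used for this release.
-- The Beta(1, 99) prior and both districts are made-up assumptions.
WITH cells AS (
  SELECT * FROM (VALUES
    (&#39;small&#39;, 2, 41),
    (&#39;large&#39;, 20, 410)
  ) AS t(district, arrests, enrolled)
),
prior AS (
  SELECT 1.0 AS a, 99.0 AS b  -- prior pseudo-counts: arrests, non-arrests
)
SELECT
  district,
  CAST(arrests AS DOUBLE) / enrolled AS raw_rate,
  (arrests + a) / (enrolled + a + b) AS pooled_rate  -- posterior mean
FROM cells CROSS JOIN prior;</code></pre></div>
</figure>
<p>Both toy districts share a raw rate of about 4.9%, but under this assumed prior the
41-student cell moves to about 2.1% while the 410-student cell moves only to about
4.1%: the less data a cell has, the harder it is pulled toward the prior.</p>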
<p>The release covers <strong>eight demographic cells</strong>: race ∈ {AM, BL, HI, WH}
(the CRDC codes for American Indian or Alaska Native, Black, Hispanic, and White)
crossed with sex ∈ {F, M}. It spans three CRDC collection years (2015–16, 2017–18, and
2021–22), for U.S. school districts (LEAs) and states.</p>
<h2 id="whats-in-the-release">
  <span class="heading-mark">What&#8217;s in the release</span>
  <a class="heading-anchor" href="#whats-in-the-release" aria-label="Link to this section">#</a>
</h2>
<p>Two artifacts are published on Hugging Face under release
<code>civilytics-crdc-arrests-2025.1</code>:</p>
<ul>
<li><strong><code>summary.duckdb</code></strong> (~260 MB): a compact DuckDB database file containing the
tables that power the live API: <code>arrest_summary</code> (LEA grain, ~2.27M rows),
<code>state_summary</code>, <code>district_dim</code> (names + geography), and a <code>meta</code> table.
This is the file to grab if you want point estimates and intervals.</li>
<li><strong><code>parquet/</code></strong> — the full raw posterior draws (500 per group), Hive-partitioned
by <code>model_id / YEAR / LEA_STATE</code> across ~1,387 shards. Reach for this if you need
to propagate uncertainty through your own downstream computation.</li>
</ul>
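<p>As a rough sketch of what propagating uncertainty through the draws can look like,
the query below computes a rate ratio within each draw and only then summarizes across
draws. It runs on a tiny inline toy table rather than the real shards, and the column
names <code>draw</code>, <code>race</code>, and <code>rate</code> are placeholders, not the release schema; check
the data dictionary for the actual names.</p>
<figure class="code-block" data-lang="sql"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Toy posterior draws for one district; real draws live in the parquet/ shards.
-- Column names (draw, race, rate) are illustrative assumptions.
WITH draws AS (
  SELECT * FROM (VALUES
    (1, &#39;BL&#39;, 0.030), (1, &#39;WH&#39;, 0.010),
    (2, &#39;BL&#39;, 0.024), (2, &#39;WH&#39;, 0.012),
    (3, &#39;BL&#39;, 0.036), (3, &#39;WH&#39;, 0.009),
    (4, &#39;BL&#39;, 0.028), (4, &#39;WH&#39;, 0.014)
  ) AS t(draw, race, rate)
),
ratios AS (
  -- Step 1: compute the quantity of interest inside each draw.
  SELECT
    draw,
    max(CAST(rate AS DOUBLE)) FILTER (WHERE race = &#39;BL&#39;)
      / max(CAST(rate AS DOUBLE)) FILTER (WHERE race = &#39;WH&#39;) AS rate_ratio
  FROM draws
  GROUP BY draw
)
-- Step 2: summarize the per-draw results into a median and a 90% interval.
SELECT
  median(rate_ratio)              AS ratio_median,
  quantile_cont(rate_ratio, 0.05) AS ratio_p05,
  quantile_cont(rate_ratio, 0.95) AS ratio_p95
FROM ratios;</code></pre></div>
</figure>
<p>Dividing two already-summarized point estimates would discard the spread in the
draws; working draw by draw keeps it, which is the reason to download the raw
posterior at all.</p>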
<h2 id="quick-start">
  <span class="heading-mark">Quick start</span>
  <a class="heading-anchor" href="#quick-start" aria-label="Link to this section">#</a>
</h2>
<p>You can query a single slice straight from Hugging Face with DuckDB — no full
download required:</p>
<figure class="code-block" data-lang="sql"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="n">INSTALL</span><span class="w"> </span><span class="n">httpfs</span><span class="p">;</span><span class="w"> </span><span class="k">LOAD</span><span class="w"> </span><span class="n">httpfs</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">-- Raw draws for TX, Black males, default model, 2021-22:
</span></span></span><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">FROM</span><span class="w"> </span><span class="n">read_parquet</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="s1">&#39;hf://datasets/civilytics/crdc-school-arrest-rates/parquet/model_id=nat_m2_mod/YEAR=21-22/LEA_STATE=TX/*.parquet&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">WHERE</span><span class="w"> </span><span class="n">RACE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;BL&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">SEX</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;M&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">LIMIT</span><span class="w"> </span><span class="mi">20</span><span class="p">;</span></span></span></code></pre></div>
</figure>
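<p>If point estimates and intervals are enough, the compact database can be explored
locally instead. The sketch below assumes <code>summary.duckdb</code> has already been
downloaded into the working directory; the alias <code>crdc</code> is arbitrary, and the
queries deliberately inspect the schema rather than guess at column names.</p>
<figure class="code-block" data-lang="sql"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Assumption: summary.duckdb sits in the current directory.
ATTACH &#39;summary.duckdb&#39; AS crdc (READ_ONLY);

-- List the tables, then look at the columns before writing real queries.
SHOW ALL TABLES;
DESCRIBE crdc.arrest_summary;

-- Sanity check against the documented LEA-grain row count (~2.27M).
SELECT count(*) AS n_rows FROM crdc.arrest_summary;</code></pre></div>
</figure>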
<p>A <strong>live API</strong> serves the summary estimates at
<a href="https://crdc-api.civilytics.org">crdc-api.civilytics.org</a>
(OpenAPI/Swagger at <code>/api/v1/__docs__/</code>), and the full methodology and data
dictionary are documented at
<a href="https://pages.civilytics.org/crdc-arrests">pages.civilytics.org/crdc-arrests</a>.</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Model-based estimates of school-based arrest rates for U.S. school districts and states, by race and sex, derived from the Civil Rights Data Collection.",
  "og_image": "https://jaredknowles.com/og/data/crdc-school-arrest-rates.png",
  "og_title": "CRDC School Arrest Rates — Bayesian Estimates"
}
</posse:post></entry></feed>