Mission

Why don’t we pay attention to our representatives between elections?

Legislative data is hard to parse, track, and organize. Activists, concerned citizens, and the curious may not have the time, resources, or expertise to build out duplicative tech stacks. Existing solutions may be limited by the willingness of organizations and companies to continue to run and host them - such as in the case of Google’s Civic Information API, which was shut down earlier this year. What would a decentralized, open-source legislative data solution look like?

The Govbot team’s goal is to bridge this gap - building the framework for creating and using federated, open-source, non-profit legislative data. Built as a Chi Hack Night Breakout Group, the project includes an open-source, simplified, and expanded version of OpenStates’ data on state and federal legislation, as well as example applications.

What We Offer

The main Govbot dataset currently includes legislative updates from bills in the U.S. House & Senate, all 50 states, territories like Guam, and the city of Chicago, as .json files organized using the Project Open Data catalog format. The Govbot scrapers update regularly, appending new logs and then running them through Claude to provide topic-based tagging and summaries. This data can then be analyzed using SQL, via an interface built with DuckDB, or plugged into applications like our example website, WindyCivi, and a test BlueSky bot built in collaboration with U.S. Representative Hoan Huynh (https://bsky.app/profile/test-hoan-huynh.bsky.social).

How Do I Use It?

1. Install

sh -c "$(curl -fsSL https://raw.githubusercontent.com/chihacknight/govbot/main/actions/govbot/scripts/install-nightly.sh)"

2. Run govbot

govbot

That’s it. If no govbot.yml exists, an interactive wizard walks you through setup:

  1. Sources - Choose all 47 states or pick specific ones
  2. Tags - Start with an example tag, or get an AI prompt you can copy-paste to create your own
  3. Publishing - RSS feeds configured automatically

The wizard creates govbot.yml, .gitignore, and a GitHub Actions workflow.

3. Run the pipeline

Once set up, running govbot again executes the full pipeline:

  1. Clones/updates legislation repositories
  2. Tags bills based on your tag definitions
  3. Generates RSS feeds in the docs/ directory

Other Commands

govbot clone all           # download all state legislation datasets
govbot clone il ca ny      # download specific states
govbot logs                # stream legislative activity as JSON Lines
govbot logs | govbot tag   # process and tag data
govbot build               # generate RSS feeds
govbot load                # load bill metadata into DuckDB database
govbot delete all          # remove all downloaded data
govbot update              # update govbot to latest version
govbot --help              # see all commands and options
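
Because govbot logs streams JSON Lines, any tool that reads one JSON object per line can consume its output. The following is a minimal Python sketch of such a reader. It assumes nothing about the fields inside each entry, and the sample records are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def read_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Hypothetical sample; real entries come from `govbot logs` and their
# field names may differ.
sample = [
    '{"locale": "il", "event": "action_added"}',
    '',
    '{"locale": "ca", "event": "vote_finalized"}',
]
entries = list(read_json_lines(sample))
print(len(entries), "entries parsed")
```

To read the live stream instead of the sample, pass sys.stdin to read_json_lines and pipe govbot logs into the script.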

Dataset Key:

  • 🆕: the locale’s data received updates since your last cloning
  • ✅: the data you’ve cloned is up-to-date with the most current version
  • 🔄: the data is currently being updated
  • ❌: the data is not currently accessible

Querying in SQL using DuckDB

You can query the data using SQL, via DuckDB, which creates a simulated database from the .json log files. See DUCKDB.md for more details.

Running Queries in the Command Line

-- Load JSON extension
INSTALL json;
LOAD json;

-- Query all bill metadata
SELECT * 
FROM read_json_auto('~/.govbot/repos/**/bills/*/metadata.json')
LIMIT 10;
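
For readers who prefer a script to SQL, the same files can be walked with the Python standard library. This is only a sketch: it reuses the glob pattern and the ~/.govbot/repos location from the query above, and makes no assumption about the keys inside each metadata.json.

```python
import json
from pathlib import Path


def load_bill_metadata(repos_root: Path, limit: int = 10) -> list[dict]:
    """Load up to `limit` metadata.json files found under any bills/ folder."""
    if not repos_root.is_dir():
        return []  # nothing cloned yet
    records = []
    # Mirrors the glob used in the SQL query: **/bills/*/metadata.json
    for path in sorted(repos_root.glob("**/bills/*/metadata.json")):
        with path.open(encoding="utf-8") as handle:
            records.append(json.load(handle))
        if len(records) >= limit:
            break
    return records


# Default location taken from the query above.
default_root = Path.home() / ".govbot" / "repos"
print(len(load_bill_metadata(default_root)), "metadata files loaded")
```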

Additional Commands, and Querying via the Web UI

Additional examples of commands, and setup for the web UI, can be found below:

# Load all data into a database (default: govbot.duckdb)
govbot load

# Or specify a custom database file
govbot load --database my-bills.duckdb

# With memory limit and thread settings
govbot load --memory-limit 32GB --threads 8

# Open in DuckDB UI (opens in your browser)
duckdb --ui govbot.duckdb

Helper Scripts

# Run example queries
./duckdb-query.sh examples/duckdb-example.sql

Contributing & Testing

Prerequisites

Folks looking to contribute should have knowledge of Rust and the just task runner. Run just setup to start, and then use just govbot ... to develop the CLI.

The following should also be installed:

  1. Rust & Cargo: Install the Rust Toolchain
  2. Just: Install the task runner: cargo install just

Development Workflow

Use just govbot ... as your cli “dev” environment.

Other Useful Commands

  • just - See all available tasks
  • just test - Run all tests
  • just review - Review snapshot test changes
  • just mocks [LOCALES...] - Update mock data for testing

We build snapshots off examples. Add examples to make a test.

Advanced

GOVBOT_REPO_URL_TEMPLATE="https://gitsite.com/org/{locale}.git" govbot ...

Project History

The Govbot project began in 2022, with a vision to create a destination for simplified, summarized updates on legislative action, with the ability to follow or filter for certain legislative topics. The result was the initial Windy Civi app, and website, launched in beta in 2024.

While building the solution, the team began to consider the limitations of a centrally-managed data source and platform, versus one that could be decentralized, that was open-source, and that allowed for exploration and use of the data in ways beyond initial designs.

Our vision has now pivoted to building that data set, as well as building sample applications and solutions to ensure that government accountability can be accessible to all.

FAQs

Can I See The Repo?

Yes! Our main repo can be found here. The repo that is being used to run and store the data - the ‘toolkit’ repo - can be found here.

How Is The Data Structured?

You can find the file format structure and .json schema in the readme.md located here.

How Do I Clone This Data?

Each locale is scraped using a GitHub Actions template that is defined and explained in detail here. You can follow this template to create a new repository of locale data.

To help manage multiple pipelines or locales, look at our pipeline manager documentation.

How Can I Stay Updated, Or Get In Touch?

You can stay updated by following our work at Chi Hack Night, as well as on the related Slack (see below). You can also follow our commits and updates on GitHub and this Docs page.

You can message us on the Chi Hack Night Slack - we have our own channel.

Let The People Take Back Their Government

Our 2024 efforts taught us a few things:

  • People don’t want to download another app or use another portal.
  • Getting bill data is really hard, and usually involves using private APIs.
  • Creating AI summaries and topics should be controlled by organizations/individuals.

As such, our effort has become 2-fold

Contributors

2025 Deck – Democratic Infrastructure



Govbot

Federated, open-source legislative data for everyone


Overview

  • The Problem
  • Our Solution
  • What We Offer
  • Features
  • Setup + Core Functions


The Problem

Why don’t we pay attention to our representatives between elections?

Legislative data is hard to parse, track, and organize. Activists, concerned citizens, and the curious may not have the time, resources, or expertise to build out duplicative tech stacks.


The Problem (cont.)

Existing solutions may be limited by the willingness of organizations and companies to continue to run and host them - such as in the case of Google’s Civic Information API, which was shut down earlier this year.

What would a decentralized, open-source legislative data solution look like?


Our Solution

The Govbot team’s goal is to bridge this gap - building the framework for federated, open-source, non-profit legislative data.

Built as a Chi Hack Night Breakout Group, this project offers frameworks and tools built on top of OpenStates’ data on state and federal legislation.


What We Offer

The main Govbot dataset currently includes legislative updates from:

  • the U.S. House & Senate
  • Legislatures from all 50 states
  • Legislatures from U.S. territories

Data is organized as .json files using the Project Open Data catalog format, scraped and appended regularly.


Features

  • A decentralized, regularly updating, legislative data catalog
  • AI-powered, topic-based tagging and summaries, customized using .yml
  • SQL querying via DuckDB interface
  • Example applications, like custom websites (see our demo WindyCivi site), and social media bots (see our BlueSky bot, made in collaboration with U.S. Representative Hoan Huynh)

Setup

You can download the setup script via one-line install, from our GitHub repository:

sh -c "$(curl -fsSL https://raw.githubusercontent.com/chihacknight/govbot/main/actions/govbot/scripts/install-nightly.sh)"

Core Functions

Once installed, you can:

  • Clone the entire dataset
  • Clone specific items (state, session, or bill)
  • Load metadata into a SQL-accessible DuckDB database

Project History


2022: socratic.center

The Govbot project began in 2022 at socratic.center with a vision to create a destination for simplified, summarized updates on legislative action.
The initial hypothesis: *What if citizens could easily track and understand the bills being voted on?*

civi.social

We built civi.social, exploring how to make legislative information accessible and shareable on social platforms.
This experiment helped us understand how citizens wanted to engage with civic data in their existing communities.

myChicago + Jarvis

We created a prototype for what integration with the myChicago platform would look like.
The goal of Jarvis, our AI-powered assistant, would have been to help users understand legislation through:
- Simplified bill summaries
- Contextual information
- Guided engagement tools

Windy Civi: Full Launch

After reflecting on previous concepts, the Windy Civi app and website launched in beta in 2024.
The goal was to enable citizens to:
- Track bills by topic
- Receive personalized updates
- Connect directly with representatives

Rethinking Our Approach

While building these solutions, we began to ask a critical question:

What are the limitations of a centrally-managed platform?

  • Can it scale to serve all communities?
  • What happens if we stop maintaining it?
  • How can others build on this work?

Our New Vision

Our vision has now pivoted to building the infrastructure itself:

  1. A decentralized legislative data catalog
  2. Reusable frameworks for communities to build their own tools
  3. Sample applications demonstrating use cases

Our goal: Ensure that government accountability is accessible to all.


Live Demos

  • Basic Setup + Commands
  • Querying via DuckDB
  • Creating Social Media Bots


Basic Setup + Commands

Install via:

sh -c "$(curl -fsSL https://raw.githubusercontent.com/chihacknight/govbot/main/actions/govbot/scripts/install-nightly.sh)"

Once installed, you can download and set up the data using the following commands

govbot # to see help
govbot clone # to show available datasets
govbot clone {{locale}} {{locale}} # download specific items
govbot delete {{locale}} # delete specific items
govbot delete all # delete everything
govbot load # load bill metadata into DuckDB

Querying with DuckDB

First, set up DuckDB, which creates a simulated database from the .json log files:

govbot load #Load all data into a database
govbot load --database my-bills.duckdb #Specify a custom database file
govbot load --memory-limit 32GB --threads 8 #With memory limit and thread settings

duckdb --ui govbot.duckdb #Open in DuckDB UI (opens in browser)

Once the DuckDB database is created, you can query as normal

-- Load JSON extension
INSTALL json;
LOAD json;
-- Query all bill metadata
SELECT *
FROM read_json_auto('~/.govbot/repos/**/bills/*/metadata.json')
LIMIT 10;

Creating Social Media Bots


Technical Details


Our Open Civic Data Proposal


Democratizing government data

  • What does it mean to democratize government data?
  • Today: legislation (with room to expand to courts, agencies, and more)
  • To understand the solution, it helps to first understand the problem

The problem

Legislative data is commonly distributed through APIs or large database dumps. These approaches work well for transactional access, but they introduce real limitations when the goal is long-term analysis and accountability.

They make it harder to:

  • Perform bulk or historical analysis
  • Track changes over time
  • Analyze data without running a database server

They also introduce fragility:

  • APIs change or disappear
  • Long-term access and verification become difficult

API-based access breaks over time


Why this matters

  • Civic trust
  • Research
  • Accountability
  • Anyone can verify, not just institutions
  • A shared source of truth without interpretation baked in

State-based systems vs append-only logs


What we built (and why Git)

  • File-based structure
  • Bills, events, logs
  • Deterministic paths to find things
  • Built on Git for history, distribution, cheap branching, and broad accessibility
  • Aligned with Open States data and Open Civic Data (OCD) identifiers
  • Formalized through an Open Civic Data proposal

This design treats the filesystem as the primary interface for civic data.
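
One way to picture "deterministic paths" is a pure function that maps a bill, a timestamp, and an event type to exactly one file location. The Python sketch below assumes the bills/<bill>/logs/ layout and the timestamped filename pattern shown in the Open Civic Data proposal. The function name and the lower-casing of the bill ID are illustrative choices, not the project's confirmed rules.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def log_entry_path(bill_id: str, occurred_at: datetime, event_type: str) -> PurePosixPath:
    """Build bills/<bill>/logs/<UTC timestamp>_<event_type>.json."""
    stamp = occurred_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return PurePosixPath("bills") / bill_id.lower() / "logs" / f"{stamp}_{event_type}.json"


when = datetime(2024, 1, 30, 15, 22, 47, tzinfo=timezone.utc)
print(log_entry_path("SB1234", when, "action_added"))
# bills/sb1234/logs/20240130T152247Z_action_added.json
```

Because the path depends only on its inputs, two scrapes of the same event write to the same file, and a reader can locate an entry without an index.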


Log Query Example


The OCD proposal (why this matters upstream)

  • Makes the model reusable beyond Windy Civi
  • Provides shared vocabulary and structure
  • Enables other projects to adopt or adapt the approach

OCD proposal screenshot


Technical challenges and triumphs

  • Making transformations deterministic so Git diffs remain meaningful
  • Interpreting and triaging state-by-state scraper errors
  • Passing data cleanly between CI steps (artifacts, environment variables, Docker parity)
  • Designing self-contained log entries that remain analyzable outside their folder context
  • Building a “last seen” mechanism when upstream sources return full snapshots
  • Identifying hard limits: PDF redlines and crossouts remain an open problem
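
Two of these challenges, deterministic output and the "last seen" mechanism, can be sketched together. The Python below is an illustration under stated assumptions, not the project's implementation: it assumes each snapshot is a mapping from a record ID to a JSON-compatible dictionary, and the entry fields (id, event, data) are hypothetical.

```python
import json


def canonical(record: dict) -> str:
    """Serialize with sorted keys so identical data always yields identical text."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False, separators=(",", ":"))


def diff_snapshot(last_seen: dict[str, str], snapshot: dict[str, dict]) -> list[dict]:
    """Return log entries only for records that are new or changed.

    `last_seen` maps record id -> canonical text from the previous scrape
    and is updated in place.
    """
    entries = []
    for record_id in sorted(snapshot):  # sorted for a stable entry order
        text = canonical(snapshot[record_id])
        if last_seen.get(record_id) == text:
            continue  # unchanged since the previous scrape
        kind = "updated" if record_id in last_seen else "created"
        entries.append({"id": record_id, "event": kind, "data": snapshot[record_id]})
        last_seen[record_id] = text
    return entries


state: dict[str, str] = {}
first = diff_snapshot(state, {"sb1234": {"title": "Parks"}, "hb0789": {"title": "Roads"}})
second = diff_snapshot(state, {"sb1234": {"title": "Parks Act"}, "hb0789": {"title": "Roads"}})
print([e["event"] for e in first], [e["event"] for e in second])
```

The first full snapshot produces two "created" entries. The second produces a single "updated" entry, because only one record changed, even though the upstream source returned everything again.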

A Dive Into Local AI Tagging


Use two models for two very different roles

  • Smart LLM (ChatGPT / Claude / Cursor)
    • Human-in-the-loop
    • Used during development
    • Produces tag configuration
  • Small embedding model
    • Fully automated
    • Used in production
    • Categorizes every update

The smart LLM helps write the rules. The small model runs them.


Step 1: Tag Authoring (Developer Workflow)

A developer sits down with:

  • Sample legislative updates
  • Court rulings
  • Regulatory notices

Using ChatGPT / Claude / Cursor, they prompt:

“Create a tag config for legislative bill introductions. Include examples, negative examples, and keywords.”

The output is reviewed, edited, and committed like code.


Important Clarification

The “smart” LLM is not part of production.

It is used the same way you’d use:

  • A code editor
  • A linter
  • A schema generator

Think of ChatGPT / Claude / Cursor as a tag authoring tool.


What the Smart LLM Actually Does

The smart LLM is used interactively by a developer to:

  • Define new tags
  • Refine descriptions
  • Generate examples and edge cases
  • Identify negative examples
  • Propose include / exclude keywords

It replaces manual taxonomy writing — not runtime logic.
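
To make the idea of a reviewed tag definition concrete, here is a small Python sketch of how include / exclude keywords from such a config might be applied. The TagConfig field names are hypothetical and do not reflect the actual govbot.yml schema. The sketch covers only the keyword rules and does not model the embedding-based categorization described above.

```python
from dataclasses import dataclass, field


@dataclass
class TagConfig:
    """Hypothetical shape for a reviewed tag definition."""
    name: str
    description: str
    include_keywords: list[str] = field(default_factory=list)
    exclude_keywords: list[str] = field(default_factory=list)


def keyword_match(tag: TagConfig, text: str) -> bool:
    """True if any include keyword appears and no exclude keyword does."""
    lowered = text.lower()
    if any(word.lower() in lowered for word in tag.exclude_keywords):
        return False
    return any(word.lower() in lowered for word in tag.include_keywords)


education = TagConfig(
    name="education",
    description="Bills about schools, teachers, and student funding",
    include_keywords=["school", "teacher", "tuition"],
    exclude_keywords=["driving school"],
)
print(keyword_match(education, "Amends the School Code to raise teacher pay"))
print(keyword_match(education, "Licensing rules for a commercial driving school"))
```

The second call returns False because the exclude keyword takes precedence, which is the kind of edge case a negative example in the config is meant to capture.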


AI-tagging-example


What’s next for the project

  • Building relationships with activists and journalists
  • Creating + designing customizable tagging templates + a system to share them
  • Incorporating Executive Orders, judicial opinions, and other relevant non-legislative documents
  • Exploring use cases for the data, such as automated content pipelines
  • Add donation data for analysis of legislative priorities and campaign promises

Special thanks to the following contributors:

Sartaj Chowdhury Tamara Dowis Edwin Chalas Cuevas Andrew Dauphinais Emme Kari Douglass Marissa Heffler Zach Schoneman Brian Burns


Thank You!

  • Chi Hack Night
  • Open States
  • Open Civic Data community

Building government accountability tools accessible to all



Appendix

  • Contributing & Testing
  • FAQs


Contributing & Testing

Prerequisites

Knowledge of Rust and the just task runner required.

  1. Rust & Cargo: Install the Rust Toolchain
  2. Just: Install the task runner: cargo install just

Development Workflow

Use just govbot ... as your CLI “dev” environment.

Useful Commands:

  • just - See all available tasks
  • just test - Run all tests
  • just review - Review snapshot test changes
  • just mocks [LOCALES...] - Update mock data for testing

Dataset Status Key

  • 🆕 The locale’s data received updates since your last cloning
  • ✅ Your data is up-to-date with the most current version
  • 🔄 The data is currently being updated
  • ❌ The data is not currently accessible

FAQs: Repositories

Can I See The Repo?


FAQs: Data Structure

How Is The Data Structured?

Find the file format structure and .json schema in the readme.md: DATA_STRUCTURES.md


FAQs: Cloning Data

How Do I Clone This Data?

Each locale is scraped using a GitHub Actions template explained here: README_TEMPLATE.md

To manage multiple pipelines or locales, see our pipeline manager documentation


Stay Connected

How Can I Stay Updated, Or Get In Touch?

2025 Bill Blockchain

Open Civic Data Blockchain Proposal

This proposal outlines a decentralized, peer-to-peer system for managing and publishing civic data using a blockchain-like append-only log. Built on the Open Civic Data schema and powered by Git, this architecture enables transparency, tamper-resistance, and flexibility in how public information is stored, shared, and consumed. By treating government data as a series of verifiable, timestamped events, we create an ecosystem where organizations and individuals can build custom civic feeds, automate updates, and uncover hidden dynamics in governance—all without relying on centralized servers.

Why Use a Hashed Append-Only Log?

  • 🔐 Truly Peer-to-Peer
    Everyone keeps their own copy of the data—no central server needed, no extra cost.

  • 📜 The Constitution Is Basically a Blockchain
    Government changes through amendments. Our log reflects this: permanent, append-only, and transparent.

  • 💻 Highly Tailored Custom Feeds Built With Code + AI
    Composable event logs will be easy to filter, tag, and summarize. Orgs can compose those feeds too in order to make highly tailored feeds for publishing.

  • 🤖 Publish Everywhere with Bots
    Organizations can easily automate updates to any number of platforms (think Reddit replies or Bluesky posts), layering bots on top of each other. In addition, we can build tooling for public RSS feeds that news organizations can then import.

  • ⛓️ Blockchain without the Cringe or Cost
    Blockchain hashes + public key signatures let users verify data themselves without expensive proof algorithms. For IDs, Decentralized Identifiers (DIDs) are an emerging standard and interoperate with Bluesky.

  • ☎️ Network Agnostic
    Supports everything: peer-to-peer, pub-sub, polling, WebRTC, email, RSS, push notifications, etc. They will all work naturally.

  • 📱 Our App Becomes A Glorified P2P Feed Reader With Civic Tendencies
    By being a P2P feed reader with special features around civic data, we simplify the app itself, and allow others to make their own client apps.

  • 🛜 RSS Feeds Just Work
    Feed-based design lets us easily pull in existing sources like Executive Orders or court decisions via RSS, and allows organizations to pull news website feeds.

  • Bonus: Reveal Power Dynamics
    Replay legislative logs to uncover hidden patterns—who votes when, with whom, and under whose influence.
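
The core of a hashed append-only log can be shown in a few lines. The Python sketch below is a generic hash chain for illustration only: the entry fields (payload, prev, hash) and the choice of SHA-256 are assumptions, public key signatures are left out, and the proposal itself relies on Git for history rather than on this exact scheme.

```python
import hashlib
import json


def entry_hash(previous_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + body).encode("utf-8")).hexdigest()


def append(log: list[dict], payload: dict) -> None:
    """Append a payload, linking it to the hash of the entry before it."""
    previous = log[-1]["hash"] if log else ""
    log.append({"payload": payload, "prev": previous, "hash": entry_hash(previous, payload)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    previous = ""
    for entry in log:
        if entry["prev"] != previous or entry["hash"] != entry_hash(previous, entry["payload"]):
            return False
        previous = entry["hash"]
    return True


log: list[dict] = []
append(log, {"event": "metadata_created", "bill": "sb1234"})
append(log, {"event": "sponsor_added", "bill": "sb1234"})
print(verify(log))  # chain is intact
log[0]["payload"]["bill"] = "sb9999"
print(verify(log))  # tampering is detected
```

Anyone holding a copy of the log can run the verification locally, which is what lets users check the data themselves without a central server.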

Why Open Civic Data as the Base Schema?

  • 🤝 Plug Into the Civic Tech Ecosystem
    Uses familiar Open Civic Data formats, making it easy to integrate with existing tools and scrapers.

  • 🔄 Reuse Existing Data
    Works with platforms like OpenStates and Councilmatic, giving us access to many data sources.

Why Git for Data Storage?

  • 📁 Folders + Files = Maximum Portability
    The most universal data structure—easy to read, edit, and share across tools and platforms.

  • 🔄 Git Is Already Peer-to-Peer
    Git is built on a distributed log. git pull works seamlessly in our app and AI workflows.

  • 🌐 GitHub = Easy Browsing
    Markdown rendering and file previews make GitHub a friendly UI for exploring without needing to clone. We can also expose RSS feeds via GitHub Pages.

  • 🧩 Submodules Keep Repos Lean
    Git submodules let us split large datasets across repos, so no single repo gets bloated.

Folder Structure + Filename Convention

/open-civic-data-blockchain/
├── country:us/                                 # United States
│   ├── state:il/                               # Illinois state
│   │   ├── sessions/                           # Legislative sessions
│   │   │   ├── ocd-session/country:us/state:il/2023-2024/  # Full OCD session ID
│   │   │   │   ├── bills/                      # Bills in this session
│   │   │   │   │   ├── sb1234/                 # Senate Bill 1234
│   │   │   │   │   │   ├── logs/               # Event logs folder
│   │   │   │   │   │   │   ├── 20240115T123045Z_session_bill_created.json  # Initial bill creation in session
│   │   │   │   │   │   │   ├── 20240115T123045Z_metadata_created.json      # Initial metadata creation
│   │   │   │   │   │   │   ├── 20240117T143022Z_metadata_updated.json      # Metadata update with field mask
│   │   │   │   │   │   │   ├── 20240117T143156Z_sponsor_added.json         # Sponsors added
│   │   │   │   │   │   │   ├── 20240120T092133Z_version_added.json         # Version document added
│   │   │   │   │   │   │   ├── 20240130T152247Z_action_added.json          # Action recorded
│   │   │   │   │   │   │   ├── 20240215T103045Z_doc_added.json             # Supporting document added
│   │   │   │   │   │   │   ├── 20240315T140011Z_vote_initiated.json        # Vote started
│   │   │   │   │   │   │   ├── 20240315T143022Z_vote_updated.json          # Vote partial results
│   │   │   │   │   │   │   └── 20240315T150537Z_vote_finalized.json        # Vote complete
│   │   │   │   │   │   └── files/              # Raw file storage
│   │   │   │   │   │       ├── bill_introduced.pdf      # Original version document
│   │   │   │   │   │       ├── bill_amended.pdf         # Amended version document
│   │   │   │   │   │       └── fiscal_note.pdf          # Supporting document
│   │   │   │   │   ├── hb0789/                 # House Bill 789
│   │   │   │   │   │   ├── logs/               # Event logs folder
│   │   │   │   │   │   │   ├── 20240118T090023Z_session_bill_created.json  # Initial bill creation in session
│   │   │   │   │   │   │   ├── 20240118T090023Z_metadata_created.json      # Initial metadata creation
│   │   │   │   │   │   │   └── ...
│   │   │   │   │   │   └── files/              # Raw file storage
│   │   │   │   │   │       └── ...
│   │   │   │   │   └── ...
│   │   │   │   └── events/                     # Events for this session
│   │   │   │       ├── 2024-04-15-senate-appropriations-hearing.json  # Senate committee hearing
│   │   │   │       ├── 2024-02-22-house-floor-session.json            # House floor session
│   │   │   │       └── ...
│   │   │   ├── ocd-session/country:us/state:il/2021-2022/  # Previous session
│   │   │   │   └── ...
│   │   │   └── ...
│   │   └── events/                            # Events not tied to a specific session
│   │       ├── 2024-07-15-joint-commission-meeting.json  # Joint commission meeting
│   │       ├── 2024-08-20-special-task-force.json        # Special task force meeting
│   │       └── ...
│   ├── state:ca/                               # California state
│   │   └── ...
│   └── state:ny/                               # New York state
│       └── ...
└── country:ca/                                 # Canada
    └── ...
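
The log filenames in the tree above follow a <timestamp>_<event_type>.json pattern, so a consumer can recover event order from names alone. A short Python sketch, assuming that pattern holds for every log file and that timestamps are UTC (the trailing Z):

```python
import re
from datetime import datetime, timezone

LOG_NAME = re.compile(r"^(?P<stamp>\d{8}T\d{6}Z)_(?P<event>[a-z_]+)\.json$")


def parse_log_name(filename: str) -> tuple[datetime, str]:
    """Split '<timestamp>_<event_type>.json' into its two parts."""
    match = LOG_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected log filename: {filename}")
    stamp = datetime.strptime(match["stamp"], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return stamp, match["event"]


names = [
    "20240315T150537Z_vote_finalized.json",
    "20240115T123045Z_metadata_created.json",
    "20240130T152247Z_action_added.json",
]
# Sorting the parsed pairs replays the events in chronological order.
for stamp, event in sorted(parse_log_name(name) for name in names):
    print(stamp.isoformat(), event)
```

Since the timestamp prefix is fixed-width, a plain alphabetical sort of the filenames gives the same order, which keeps directory listings readable.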

Git Architecture

We plan to auto-generate many git repos.

Session Git Repo

This repo should be a blockchain-like append only log, making syncing data as easy as git pull.

Question: what about files like PDFs? It feels right to keep a copy in here, but they would balloon the size of these repos. Maybe yet another submodule for session files?

/
├── README.md                  # Session-specific information
├── bills/                     # Bills in this session
│   ├── sb1234/                # Senate Bill 1234
│   │   ├── logs/              # Event logs folder
│   │   │   ├── 20240115T123045Z_session_bill_created.json
│   │   │   ├── 20240115T123045Z_metadata_created.json
│   │   │   ├── 20240117T143022Z_metadata_updated.json
│   │   │   └── ...
│   │   └── files/             # Raw file storage
│   │       ├── bill_introduced.pdf
│   │       ├── bill_amended.pdf
│   │       └── fiscal_note.pdf
│   ├── hb0789/                # House Bill 789
│   │   ├── logs/
│   │   │   └── ...
│   │   └── files/
│   │       └── ...
│   └── ...
└── events/                    # Events for this session
    ├── 2024-04-15-senate-appropriations-hearing.json
    ├── 2024-02-22-house-floor-session.json
    └── ...

Locale Git Repo

Overall locale repo (also generated). Contains links to git submodules that have event logs for different sessions/events. Will also contain scripts to rebuild data into Open Civic Data formats.

ocd-blockchain-illinois/
├── .gitmodules
├── README.md
├── scripts/
│   ├── scrape.py # Shortcut to directly scrape for this locale
│   └── rebuild.py # To rebuild OCD data from blockchain logs
├── sessions/
│   ├── ocd-blockchain-illinois/ocd-session/country:us/state:il/2023-2024/
│   ├── ocd-blockchain-illinois/ocd-session/country:us/state:il/2021-2022/
│   └── ocd-blockchain-illinois/ocd-session/country:us/state:il/2019-2020/
└── events/
    ├── 2022-2026/
    ├── 2018-2022/
    └── 2014-2018/

Main Repo

The primary repo (also generated) that people can clone to get all civic data easily via the submodules.

open-civic-data-blockchain/
├── .gitmodules
├── README.md
├── scripts/
│   ├── update_all.sh
│   ├── integrity_check.py
│   └── generate_cross_jurisdictional_report.py
└── jurisdictions/
    ├── country:us/
    │   ├── state:il/                           # Illinois submodule
    │   ├── state:ca/                           # California submodule
    │   ├── state:ny/                           # New York submodule
    │   ├── district:dc/                        # Washington DC submodule
    │   ├── county:us/state:va/fairfax/         # Fairfax County submodule
    │   └── place:us/state:tx/austin/           # City of Austin submodule
    ├── country:ca/
    │   ├── province:on/                        # Ontario province submodule
    │   └── province:bc/                        # British Columbia submodule
    └── country:uk/
        ├── england/                            # England submodule
        └── scotland/                           # Scotland submodule

TODO List

  • Timestamps: Scrape-Oriented vs. Gov-Oriented
    Are log timestamps the time we scraped the data, or the time of the actual government update?
    What if a specific event doesn’t have a timestamp?
    Open Civic Data also discussed this
  • Unique IDs
    OpenStates uses a lot of generated UUIDs. Ideally, our folder/file structure and naming conventions should follow official legislative data.
    • Jurisdiction ID: Follows OCD naming convention — country:us/state:fl/government
    • Session ID: TODO
    • Bill ID: jurisdiction_id/sessions/:session_id/bill.identifier — use official ID like HB250
    • Vote Event ID: TODO
    • Person ID: TODO
    • Event ID: TODO
  • Bill Folder + Filename Convention
    • bill.metadata: bill_id/log/metadata_update_{TODO}.json
    • bill.actions: bill_id/log/action_{TODO}.json
    • bill.votes: bill_id/log/vote_{TODO}.json
    • bill.sponsors: bill_id/log/sponsor_update_{TODO}.json
    • bill.versions:
      • File: bill_id/files/version_{TODO}.pdf
      • Log: bill_id/log/version_add_{TODO}.json (we can extract PDF content to JSON)
    • bill.documents:
      • File: bill_id/files/documents_{TODO}.pdf
      • Log: bill_id/log/document_add_{TODO}.json (we can extract PDF content to JSON)
  • Event Folder Convention
    Events tied to sessions should live inside the session folder.
    Out-of-session events: can we define a reliable alternate time span for organization?
  • How to Handle Metadata Changes
    Metadata (such as a bill’s fields) may change from scrape to scrape.
    Use fieldMask for lightweight updates, or consider JSON Patch.
    ➤ https://jsonpatch.com
    // bill.metadata_events
    {
      "fieldMask": ["from_organization"],
      "bill": {
        "from_organization": ""
      }
    }
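
A fieldMask update like the one above could be applied as follows. This Python sketch assumes the mask lists top-level field names only and that the event carries the new values under "bill". Nested masks and JSON Patch are not handled, and the sample values are hypothetical.

```python
import copy


def apply_field_mask(current: dict, event: dict) -> dict:
    """Return a copy of `current` with only the masked fields overwritten."""
    updated = copy.deepcopy(current)
    for name in event["fieldMask"]:
        updated[name] = copy.deepcopy(event["bill"][name])
    return updated


bill = {"identifier": "HB250", "title": "Example Act", "from_organization": "lower"}
event = {"fieldMask": ["from_organization"], "bill": {"from_organization": "upper"}}
print(apply_field_mask(bill, event))
```

Fields outside the mask are left untouched, so a log entry stays small even when the full metadata record is large.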
    

Environment Setup

For now, we aren’t doing any coding that touches the previous code. All code/decisions should be in this scraper_next folder as an isolated experiment. If you don’t have git access, message @sartaj.

Easy: Download Data and Explore With SQL Explorers

Advanced: Running Scrapers / Importing PG Dumps

  • Open States
    • via Scraper. We are using this for v1. Running the scrapers directly keeps the data much more up to date, since it is pulled straight from the source. It also allows us to run certain scrapers, like USA, multiple times a day.
    • via SQL Dump, which updates every few days, and has bill full text, in addition to a lot of other content like maps data.
  • Chicago SQL Dump. This updates every night and is managed by Datamade, who we have already been collaborating with on Chicago data. They also do stuff like AI summaries that we can pre-pull.

Prior Art

Communications

Bill Bot Designer

Overview

The Bill Bot Designer is a tool for creating automated bots that monitor and publish legislative updates from the Open Civic Data Blockchain. These bots can be configured to watch specific bills, jurisdictions, or events and automatically post updates to various platforms like Bluesky, Twitter, RSS feeds, or custom webhooks.

Why Bots?

  • 🤖 Automated Monitoring: Bots can continuously watch for legislative changes without human intervention
  • 📢 Multi-Platform Publishing: Single bot configuration can publish to multiple platforms simultaneously
  • 🎯 Targeted Alerts: Organizations can create highly specific feeds for their constituents
  • Real-Time Updates: Instant notifications when important legislative events occur
  • 🔄 Consistent Formatting: Standardized message formats across all platforms

Bot Architecture

Event-Driven Design

Bots operate on an event-driven architecture, listening to the append-only log of legislative events:

Legislative Event → Blockchain Log → Bot Filter → Message Generation → Platform Publishing

Bot Components

  1. Event Listener: Monitors the blockchain log for new events
  2. Filter Engine: Applies rules to determine if an event should trigger the bot
  3. Message Generator: Creates platform-specific messages from event data
  4. Publisher: Sends messages to configured platforms
  5. Rate Limiter: Ensures compliance with platform API limits
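
The first four components can be wired together as plain functions. The Python sketch below is illustrative only: the event fields (jurisdiction, event_type, title) are a hypothetical shape, the filter treats every rule as required, and rate limiting is left out to keep it short.

```python
from typing import Callable, Iterable

Event = dict  # hypothetical event shape: jurisdiction, event_type, title


def make_filter(jurisdiction: str, event_types: set[str], keywords: set[str]) -> Callable[[Event], bool]:
    """Build a predicate; an event must satisfy every configured rule."""
    def matches(event: Event) -> bool:
        title = event.get("title", "").lower()
        return (
            event.get("jurisdiction") == jurisdiction
            and event.get("event_type") in event_types
            and any(word in title for word in keywords)
        )
    return matches


def run_bot(events: Iterable[Event], keep: Callable[[Event], bool],
            render: Callable[[Event], str], publish: Callable[[str], None]) -> int:
    """Listener -> filter -> message generator -> publisher. Returns count sent."""
    sent = 0
    for event in events:
        if keep(event):
            publish(render(event))
            sent += 1
    return sent


events = [
    {"jurisdiction": "country:us/state:il", "event_type": "bill_passed", "title": "Clean Environment Act"},
    {"jurisdiction": "country:us/state:il", "event_type": "bill_referred", "title": "Education Funding"},
    {"jurisdiction": "country:us/state:ca", "event_type": "bill_passed", "title": "Healthcare Access"},
]
keep = make_filter("country:us/state:il", {"bill_introduced", "bill_passed", "bill_vetoed"},
                   {"environment", "education", "healthcare"})
outbox: list[str] = []
run_bot(events, keep, lambda e: f"{e['event_type']}: {e['title']}", outbox.append)
print(outbox)
```

Only the first event is published: the second has an event type outside the filter, and the third belongs to a different jurisdiction.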

Configuration Examples

Basic Bill Monitor Bot

name: "Illinois Bill Monitor"
description: "Monitors all Illinois bills for key actions"

# Event filtering
filters:
  - jurisdiction: "country:us/state:il"
  - event_types: ["bill_introduced", "bill_passed", "bill_vetoed"]
  - keywords: ["environment", "education", "healthcare"]

# Message template
message_template: |
  📋 {bill.identifier}: {bill.title}
  🏛️ {action.description}
  📅 {action.date}
  🔗 {bill.url}

# Publishing platforms
platforms:
  - type: "bluesky"
    account: "@legislative-alerts.bsky.social"
    rate_limit: "10/hour"
  
  - type: "rss"
    feed_url: "https://example.com/il-bills.xml"
    update_frequency: "immediate"
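
The message_template above uses dotted placeholders such as {bill.identifier}. One way a message generator might fill them is sketched below in Python. It assumes event data arrives as nested dictionaries, and it leaves unknown placeholders in place (a choice made for this sketch) so that missing data is visible rather than silently dropped.

```python
import re

PLACEHOLDER = re.compile(r"\{([a-z_]+(?:\.[a-z_]+)*)\}")


def render(template: str, context: dict) -> str:
    """Replace {a.b.c} placeholders by walking nested dictionaries."""
    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return match.group(0)  # leave unknown placeholders visible
            value = value[key]
        return str(value)
    return PLACEHOLDER.sub(lookup, template)


template = "{bill.identifier}: {bill.title}\n{action.description}\n{action.date}\n{bill.url}"
context = {
    "bill": {"identifier": "SB1234", "title": "Example Act"},
    "action": {"description": "Passed Senate", "date": "2024-03-15"},
}
print(render(template, context))
```

Here {bill.url} stays unrendered in the output because the sample context has no url field.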

Specialized Committee Bot

name: "Senate Appropriations Monitor"
description: "Tracks all bills going through Senate Appropriations"

filters:
  - jurisdiction: "country:us/state:il"
  - committee: "Senate Appropriations"
  - event_types: ["bill_referred", "bill_hearing_scheduled", "bill_vote"]

message_template: |
  💰 Senate Appropriations Update
  📋 {bill.identifier}: {bill.title}
  📊 Fiscal Impact: {bill.fiscal_note.summary}
  📅 Next Action: {next_action.description}
  🗓️ Date: {next_action.date}

platforms:
  - type: "webhook"
    url: "https://api.example.com/appropriations-webhook"
    headers:
      Authorization: "Bearer {webhook_token}"
  
  - type: "email"
    recipients: ["budget@example.org", "finance@example.org"]
    subject: "Senate Appropriations Alert: {bill.identifier}"

Constituent Alert Bot

name: "District 5 Constituent Alerts"
description: "Alerts constituents about bills affecting their district"

filters:
  - jurisdiction: "country:us/state:il"
  - sponsor_district: "5"
  - event_types: ["bill_introduced", "bill_passed", "bill_signed"]

message_template: |
  🏠 District 5 Update
  📋 {bill.identifier}: {bill.title}
  👤 Sponsored by: {sponsor.name}
  📝 Summary: {bill.summary}
  📅 Status: {bill.status}
  🔗 Learn more: {bill.url}

platforms:
  - type: "sms"
    phone_numbers: ["+15551234567", "+15559876543"]
    provider: "twilio"
  
  - type: "slack"
    channel: "#district-5-alerts"
    workspace: "example-org"

Platform Integrations

Bluesky

  • Rate Limit: 10 posts per hour
  • Character Limit: 300 characters
  • Features: Rich text, links, images
  • Authentication: App password required

Twitter/X

  • Rate Limit: 300 tweets per 3 hours (a historical figure; current limits depend on the API access tier, so check before deploying)
  • Character Limit: 280 characters
  • Features: Text, images, polls
  • Authentication: OAuth 2.0

RSS Feeds

  • Format: RSS 2.0 or Atom
  • Update Frequency: Configurable
  • Features: Full text, categories, enclosures
  • Hosting: GitHub Pages, custom server

Webhooks

  • Method: POST
  • Content-Type: application/json
  • Authentication: Bearer token or API key
  • Retry Logic: Exponential backoff
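
The exponential backoff mentioned for webhooks can be sketched as follows. This is an illustrative Python sketch rather than the project's retry code: `backoff_delays`, `send_with_retry`, and the default values (1 second base, doubling, 60 second cap) are assumptions, and `send` stands in for whatever function performs the POST.

```python
import time
from typing import Callable, List


def backoff_delays(retries: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 60.0) -> List[float]:
    """Delays of base * factor**i seconds, capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(retries)]


def send_with_retry(send: Callable[[], bool], retries: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `send` once, then retry after each backoff delay until it succeeds."""
    if send():
        return True
    for delay in backoff_delays(retries):
        sleep(delay)  # wait longer after each failure
        if send():
            return True
    return False  # all attempts failed; the caller decides on a fallback
```

With the defaults, the waits between attempts are 1, 2, 4, 8, and 16 seconds. Real deployments often add random jitter to each delay so that many bots do not retry at the same moment; that is left out here for clarity.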

Email

  • Providers: SMTP, SendGrid, Mailgun
  • Templates: HTML and plain text
  • Attachments: PDF bills, documents
  • Rate Limits: Varies by provider

SMS

  • Providers: Twilio, AWS SNS
  • Character Limit: 160 characters per segment (GSM-7 encoding); 70 when the message contains emoji or other non-GSM characters
  • Features: Text only
  • Cost: Per message

Advanced Features

Conditional Logic

filters:
  - jurisdiction: "country:us/state:il"
  - conditions:
      - if: "bill.fiscal_impact > 1000000"
        then: "priority = high"
      - if: "bill.sponsor.party == 'Republican'"
        then: "include_opposition_analysis = true"
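
One way such `if`/`then` rules could be evaluated without resorting to `eval` is shown below. This is a sketch under assumptions: the source does not define the expression language, so the `path operator literal` grammar, the `apply_conditions` name, and the treatment of `then` values as plain strings are all hypothetical.

```python
import ast
import operator

# Comparison operators supported by this sketch.
OPS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}


def lookup(data: dict, path: str):
    """Resolve a dotted path such as 'bill.sponsor.party'."""
    for key in path.split("."):
        data = data[key]
    return data


def apply_conditions(event: dict, conditions: list) -> dict:
    """Return the flags set by every rule whose `if` clause holds."""
    flags = {}
    for rule in conditions:
        # Assumed grammar: "<dotted.path> <operator> <Python literal>".
        left, op, right = rule["if"].split(maxsplit=2)
        if OPS[op](lookup(event, left), ast.literal_eval(right)):
            name, value = (part.strip() for part in rule["then"].split("=", 1))
            flags[name] = value  # values stay strings, e.g. "high" or "true"
    return flags
```

Using `ast.literal_eval` only for the right-hand literal keeps arbitrary code out of the configuration file, which matters if bot configs are contributed by third parties.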

Message Templates with Variables

message_template: |
  {#if bill.fiscal_impact > 1000000}💰 HIGH COST BILL {/if}
  📋 {bill.identifier}: {bill.title}
  👤 Sponsor: {bill.sponsors[0].name} ({bill.sponsors[0].party})
  📊 Fiscal Impact: ${bill.fiscal_impact:,.0f}
  📅 {action.date | date_format: "%B %d, %Y"}
  🔗 {bill.url}
  
  {#if bill.summary}
  📝 {bill.summary | truncate: 200}
  {/if}

Scheduled Publishing

publishing:
  schedule:
    - time: "09:00"
      timezone: "America/Chicago"
      days: ["monday", "tuesday", "wednesday", "thursday", "friday"]
    - time: "17:00"
      timezone: "America/Chicago"
      days: ["monday", "tuesday", "wednesday", "thursday", "friday"]
  
  batch_size: 5
  delay_between_posts: "30s"
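
A scheduler reading this block needs two small pieces of logic: deciding whether the current moment is a publishing slot, and splitting the queue into batches. The Python sketch below illustrates both. The function names are hypothetical, and it assumes `local_now` has already been converted to the slot's timezone (for example with `zoneinfo.ZoneInfo("America/Chicago")`).

```python
from datetime import datetime
from typing import List

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def is_publish_slot(local_now: datetime, schedule: List[dict]) -> bool:
    """True when local_now falls on a configured day at a configured HH:MM."""
    day = DAYS[local_now.weekday()]        # weekday() returns 0 for Monday
    hhmm = local_now.strftime("%H:%M")
    return any(day in slot["days"] and hhmm == slot["time"] for slot in schedule)


def batches(queue: list, batch_size: int) -> List[list]:
    """Split queued messages into chunks of at most batch_size."""
    return [queue[i:i + batch_size] for i in range(0, len(queue), batch_size)]
```

With `batch_size: 5`, a queue of 12 messages would go out as batches of 5, 5, and 2, with `delay_between_posts` applied between individual posts.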

Analytics and Monitoring

analytics:
  track_engagement: true
  platforms:
    - bluesky
    - twitter
    - webhook
  
  metrics:
    - posts_sent
    - engagement_rate
    - error_rate
    - response_time
  
  alerts:
    - condition: "error_rate > 0.05"
      action: "email_admin"
    - condition: "no_posts_24h"
      action: "slack_alert"
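
The two alert conditions above could be evaluated along these lines. This is a hedged Python sketch: the source does not define `error_rate`, so treating it as failed attempts divided by all attempts is an assumption, as are the function names and the reading of `no_posts_24h` as "24 or more hours since the last post".

```python
from typing import List


def error_rate(posts_sent: int, errors: int) -> float:
    """Assumed definition: failed attempts / all attempts."""
    attempts = posts_sent + errors
    return errors / attempts if attempts else 0.0


def triggered_alerts(posts_sent: int, errors: int,
                     hours_since_last_post: float,
                     threshold: float = 0.05) -> List[str]:
    """Return the actions whose conditions currently hold."""
    actions = []
    if error_rate(posts_sent, errors) > threshold:  # condition: error_rate > 0.05
        actions.append("email_admin")
    if hours_since_last_post >= 24:                 # condition: no_posts_24h
        actions.append("slack_alert")
    return actions
```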

Best Practices

Content Guidelines

  1. Be Accurate: Always verify data before publishing
  2. Stay Neutral: Present information without bias
  3. Include Context: Provide background information when relevant
  4. Use Clear Language: Avoid jargon and technical terms
  5. Include Sources: Always link to official sources

Technical Guidelines

  1. Rate Limiting: Respect platform API limits
  2. Error Handling: Implement retry logic and fallbacks
  3. Monitoring: Track bot performance and errors
  4. Testing: Test configurations before going live
  5. Documentation: Document bot purposes and configurations

Legal Guidelines

  1. Copyright: Respect copyright on bill text and documents
  2. Attribution: Always credit original sources
  3. Disclaimers: Include appropriate disclaimers
  4. Compliance: Follow platform terms of service
  5. Privacy: Don’t collect or store personal information

Getting Started

1. Choose Your Use Case

  • General Monitoring: Track all bills in a jurisdiction
  • Committee Focus: Monitor specific committees
  • Issue-Based: Track bills by topic or keywords
  • Constituent Service: Alert constituents about relevant bills

2. Design Your Filters

  • Jurisdiction: Which government body to monitor
  • Event Types: What actions to track
  • Keywords: Specific topics or terms
  • Sponsors: Bills from specific legislators

3. Create Your Message Template

  • Platform Limits: Consider character limits
  • Required Information: Bill ID, title, action, date
  • Optional Details: Sponsor, summary, fiscal impact
  • Call to Action: Links to learn more or take action
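
Checking a template against a platform's character limit is easy to automate before going live. The Python sketch below renders simple `{dotted.path}` placeholders and tests the result against a limit. It is illustrative only: `render` and `fits` are hypothetical helpers, and the sketch does not handle the `{#if ...}` blocks or `|` filters shown under Advanced Features.

```python
import re

# Matches placeholders such as {bill.identifier} or {action.date}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")


def render(template: str, context: dict) -> str:
    """Replace each {dotted.path} with the matching value from context."""
    def resolve(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return PLACEHOLDER.sub(resolve, template)


def fits(message: str, limit: int) -> bool:
    """True when the message is within the platform's character limit."""
    # len() counts Unicode code points. Some platforms count differently
    # (Bluesky counts graphemes), so treat this as an approximation.
    return len(message) <= limit
```

A quick usage example: rendering `"{bill.identifier}: {bill.title}"` with a bill whose identifier is `HB 1234` and title is `Clean Water Act` gives `HB 1234: Clean Water Act`, which `fits(..., 300)` accepts.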

4. Configure Platforms

  • Primary Platform: Choose your main publishing platform
  • Secondary Platforms: Add additional platforms for reach
  • Testing: Test with a small audience first
  • Monitoring: Set up alerts and analytics

5. Deploy and Monitor

  • Gradual Rollout: Start with limited scope
  • Monitor Performance: Track engagement and errors
  • Iterate: Refine based on feedback and data
  • Scale: Expand to additional jurisdictions or topics

Examples in Action

Bluesky Legislative Alerts

The Bluesky LGBTQ+ Legislation Alerts bot demonstrates how effective automated legislative monitoring can be. It:

  • Monitors bills across multiple states
  • Filters for LGBTQ+ related legislation
  • Posts concise, informative updates
  • Builds a community around legislative transparency

Chicago Councilmatic

The Chicago Councilmatic system shows how bots can enhance existing civic data platforms:

  • Integrates with existing Open Civic Data sources
  • Provides real-time updates on city council activities
  • Maintains historical records of all legislative actions
  • Enables custom feeds for different stakeholders

Future Enhancements

AI-Powered Features

  • Smart Summaries: AI-generated bill summaries
  • Impact Analysis: Automated analysis of bill effects
  • Sentiment Analysis: Track public opinion on bills
  • Predictive Modeling: Forecast bill outcomes

Advanced Integrations

  • Calendar Integration: Add events to personal calendars
  • CRM Integration: Track constituent interactions
  • Newsletter Integration: Compile weekly summaries
  • API Access: Allow third-party integrations

Enhanced Analytics

  • Engagement Tracking: Measure bot effectiveness
  • A/B Testing: Test different message formats
  • Audience Insights: Understand who’s following bots
  • Performance Optimization: Improve delivery rates

Windy Civi

A unified portal for Chicago residents that shows local, state, and federal bills with AI-generated summaries and topic tags, and lets users subscribe to notifications.

Contributors

Additional Contributors

December 2024 Presentation

The following is our presentation from December 2024.

Slides

Slides 1 through 34 (images)

Civi Social + MyChicago

Allows Chicago residents to interact directly with their elected officials.

Contributors

Additional Contributors

History

How Did We Get Here

Civi Social Site

Defined Vision

Socratic.Center

Easily find who represents you.

During this era, we built www.socratic.center, a site for easily finding your representative. The project then merged into Chi Hack Night as a breakout group.

Contributors

History

Find your rep