Track 3 (Technical)


As of January 24, 2025, we are NOT actively accepting contributions to our internal forms. Please continue reading for information on how to submit records to existing ongoing projects.

This track focuses on the actual capture of at-risk data in a variety of formats. Because these tasks require the most technical knowledge, skills, and equipment, volunteers are encouraged to take this track when they can dedicate more time.

Tech Skill Level: Advanced

Time Commitment: ~2-3 hours

Tools Required (vary across tasks):

  • Web capture tools (e.g., Conifer, Archive-It, Webrecorder, wget)

  • Data quality check system (e.g., a checksum tool)

  • Spreadsheet editor (e.g., Excel, Google Sheets)

  • Web monitoring tool

  • Storage (available internal memory, external hard drive)

Tasks Include:

  1. Set up website monitoring systems

  2. Capture website content

  3. Harvest public datasets

  4. Review data authenticity and quality

  5. Program or conduct a comprehensive data/website crawl

Breakdown of Task Sections

  • 🚁 (helicopter emoji) gives a summary of the task

  • 🗂️ (index dividers emoji) outlines the specific steps needed to complete the task

  • 🛠️ (hammer & wrench emoji) details the skills & tools needed for the task

TASKS BREAKDOWN

1. Set up a monitoring API tracker to document changes to government websites

🚁Summary: Given the previous removal of content from, and subtle revisions to, federal government environmental websites, many websites need to be continually crawled to document and track changes.

🗂️Workflow

  1. Read or skim the following report on website monitoring by EDGI: https://envirodatagov.org/publication/changing-digital-climate/

  2. Download a monitoring tool like:

    • EDGI's HTTP API tracker: https://github.com/edgi-govdata-archiving/web-monitoring-db

    • A comprehensive list of other tools: https://github.com/edgi-govdata-archiving/awesome-website-change-monitoring

  3. Identify a website to track using the Data Tracking List

  4. Submit information about the tracked website to the Data Tracking form

  5. Deploy the tracker for the selected website

🛠️Skills Needed: Advanced understanding of software deployment, APIs, and git repositories.
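To make the monitoring step concrete, here is a minimal Python sketch of what a change tracker does at its core: fetch a page, hash the response, and compare the hash against the previous run. This is an illustration only, not EDGI's web-monitoring-db (which does far more); the URL and state-file name are placeholders.

```python
"""Minimal page-change monitor (illustrative sketch only)."""
import hashlib
import pathlib

import requests

URL = "https://www.example.gov/some-page"  # placeholder: pick a page from the Data Tracking List
STATE = pathlib.Path("last_hash.txt")      # placeholder: stores the previous run's hash

def page_changed(url: str) -> bool:
    """Fetch the page, hash its body, and compare with the stored hash."""
    body = requests.get(url, timeout=30).content
    new_hash = hashlib.sha256(body).hexdigest()
    old_hash = STATE.read_text().strip() if STATE.exists() else None
    STATE.write_text(new_hash)
    return old_hash is not None and old_hash != new_hash

if __name__ == "__main__":
    if page_changed(URL):
        print("Page changed since the last check; review and document it.")
    else:
        print("No change detected (or first run).")
```

Note that real trackers diff meaningful page content rather than raw bytes, because dynamic elements (timestamps, session tokens) would make a naive hash report false changes.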

2. Capture web files/data

🚁Summary: Collecting web archives (meaning webpages and the content within them) can be complex, but it is necessary. Using more user-friendly software, volunteers who are not digital preservationists can help capture selected content from websites without worrying about collecting the entire structure of a website.

🗂️Workflow

  1. Identify a web file ready to be captured

  2. Update the "Status" cell on that row so that others will know you are working on that web file

  3. Using web capture software (like Conifer), capture the at-risk website content that includes at-risk data

  4. Change the status in the same "Status" cell to show that the web file/data has been archived, so others avoid redundant work

🛠️Skills Needed: Intermediate understanding of software deployment and website navigation.
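For volunteers comfortable on the command line, the sketch below shows one way to capture a single page and its supporting files with wget, one of the capture tools listed above. It assumes wget is installed; the target URL and output folder are placeholders. Point-and-click tools like Conifer produce replayable, higher-fidelity captures and are usually preferable for complex sites.

```python
"""Capture one page plus its assets by calling wget (illustrative sketch only)."""
import subprocess

TARGET = "https://www.example.gov/at-risk-page/"  # placeholder URL from the tracking list
OUT_DIR = "capture"                               # placeholder local output folder

subprocess.run(
    [
        "wget",
        "--page-requisites",    # also fetch the images, CSS, and JS the page needs
        "--convert-links",      # rewrite links so the local copy browses offline
        "--adjust-extension",   # save pages with proper .html extensions
        "--no-parent",          # do not wander above the target path
        "--wait=1",             # be polite: pause one second between requests
        "--directory-prefix", OUT_DIR,
        TARGET,
    ],
    check=True,  # raise if wget exits with an error
)
```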

3. Harvest public datasets available online

🚁Summary: Some state and federal agencies are required by law to publish data, publications, and basic information about publicly funded projects (think grants and contracts). Given changes in agency personnel, system updates, and shifts in the financial support that pays for database services and storage, the data stored in these repositories may not always remain available to the public. Saving copies can help ensure future access, as well as preserve information on past government activities and areas of interest.

🗂️Workflow

  1. Search for publicly funded project repositories (examples include NIH RePORTER, USASpending for US Government Awards, the Federal Audit Clearinghouse (FAC), the National Water Information System (NWIS), and many others)

  2. Verify that downloadable datasets contain enough descriptive information (data files, interactive maps, etc.)

  3. Capture the dataset(s) to internal storage (a temporary place)

  4. Submit and upload the dataset(s) via one of these options:

    • FOR UW affiliates ONLY: submit a URL to Google Drive or UW OneDrive via this form: https://docs.google.com/forms/d/e/1FAIpQLSfk0pfq4NTxlxAy2cmA3RYVLatn-tMwzv5NljayYvXNv8dp6Q/viewform?usp=sharing

    • FOR Non-UW Affiliates:

      • Files up to 2 GB: send via https://wetransfer.com/ to snguye@uw.edu

      • OR submit the URL of a downloadable folder via the exit tix: https://bit.ly/datarescue-bye

  5. You can delete the dataset after successful transfer to Data Rescue coordinators

🛠️Skills Needed: Intermediate understanding of different dataset types and file formats. Comfort with downloading and saving larger files.
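Public datasets can run to many gigabytes, so it helps to stream downloads to disk instead of loading them into memory. Below is a minimal Python sketch using the requests library; the URL and output filename are placeholders for whatever repository link you are harvesting.

```python
"""Stream a large public dataset to disk (illustrative sketch only)."""
import requests

URL = "https://www.example.gov/files/big_dataset.csv"  # placeholder download link
DEST = "big_dataset.csv"                               # placeholder local filename

with requests.get(URL, stream=True, timeout=60) as resp:
    resp.raise_for_status()  # stop early on HTTP errors
    with open(DEST, "wb") as fh:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            fh.write(chunk)

print(f"Saved {DEST}")
```

Streaming in fixed-size chunks keeps memory use flat no matter how large the file is.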

4. Create checksums for captured files

🚁Summary: Checksums help short- and long-term preservation efforts verify the integrity (fixity) of stored files and datasets. Creating or reviewing checksums helps detect transfer or creation errors, as well as signs of tampering by external forces.

🗂️Workflow

  • Read through the chapter on fixity and checksums in the Digital Preservation Handbook by the Digital Preservation Coalition

  • Download a fixity or checksum verification tool like:

    • Md5summer: an application for Windows machines that will generate and verify md5 checksums.

    • checksum: a file hashing application for Windows that generates and verifies BLAKE2, SHA1, and MD5 hashes (aka "MD5 Sums" or "digital fingerprints") of a file, a folder, or folders recursively.

    • There are a number of other tools; the ones above are examples (see the Digital Preservation Coalition's Digital Preservation Handbook above).

  • Ask the "data titan" coordinator to gain access to captured files

  • Use the dataset tracking list to check the details needed to create the checksum

  • Run a check on the selected data to create the supplemental checksum value

  • Upload the checksum file using one of the following options:

    • FOR UW affiliates ONLY: submit a URL to Google Drive or UW OneDrive via this form: https://docs.google.com/forms/d/e/1FAIpQLSfk0pfq4NTxlxAy2cmA3RYVLatn-tMwzv5NljayYvXNv8dp6Q/viewform?usp=sharing

    • FOR Non-UW Affiliates: submit and upload via one of these options:

      • Files up to 2 GB: send via https://wetransfer.com/ to snguye@uw.edu

      • OR submit the URL of a downloadable folder via the exit tix: https://bit.ly/datarescue-bye

🛠️Skills Needed: Best for those who have strong tech skills, attention to detail, and a willingness to read the docs.
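For readers curious what tools like Md5summer do under the hood, here is a minimal Python sketch that computes an MD5 checksum for a captured file. The filename is a placeholder; hashing in chunks keeps memory use low for large datasets.

```python
"""Generate an MD5 checksum for a captured file (illustrative sketch only)."""
import hashlib

def md5_of_file(path: str) -> str:
    """Hash the file in fixed-size chunks so large files never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB reads
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    value = md5_of_file("big_dataset.csv")  # placeholder filename
    print(value)
    # Save the value alongside the data (e.g., big_dataset.csv.md5) so it
    # can be re-verified after every transfer.
```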
