UPDB is an open database of unexplained phenomena (UFO/UAP) reports.
Data previously buried away in disparate documents, dead websites, spreadsheets and PDFs can now be viewed together in a unified format.
- UPDB does not contain any exclusive, private, or classified data
- UPDB does not vouch for the accuracy or truthfulness of any reports
- The only criterion for a report to be included in UPDB is that it has been listed in a public UAP/UFO database
- Enable comparative scientific investigation of unexplained phenomena by offering open data in a unified format
- Preserve and honor the decades of research and data collection on this topic
- Develop tools and techniques for individuals with ongoing repeated experiences
- Document any potential crimes against the universal human right of individual sovereignty, for current and future generations
UPDB data comes from the following sources. Thank you to these organizations and their staff for helping to build an open and public record of unexplained phenomena.
Scraped and parsed by Publius from nuforc.org.
Scraped and parsed by Publius from mufoncms.com.
Scraped and parsed by Publius from the Internet Archive mirror of ufodna.com.
List imported from https://www.nicap.org/NSID/NSID_DBListingbyDate.pdf. Individual case details parsed from a mirror of nicap.org scraped by Publius.
- Locations and dates are parsed and converted into a common schema before being added to the database
- Locations are parsed into: city, district, country, water body, other
- Dates are always assumed to be in the local time of the report location and are stored without timezone
- Reports without dates, or with unparseable dates, are not included in the database
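The normalization rules above can be illustrated with a minimal Python sketch. It is not the actual updb-importers code: the `Report` and `Location` classes, the `normalize` function, the raw record keys, and the list of accepted date formats are all assumptions made for illustration. Only the five location fields, the timezone-free local dates, and the rule of skipping undated reports come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical list of date layouts; real sources likely need many more.
DATE_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d", "%m/%d/%Y %H:%M", "%m/%d/%Y")


@dataclass
class Location:
    # The five location fields named in the list above.
    city: Optional[str] = None
    district: Optional[str] = None
    country: Optional[str] = None
    water_body: Optional[str] = None
    other: Optional[str] = None


@dataclass
class Report:
    source: str
    date: datetime  # naive: local time at the report location, no timezone
    location: Location
    body: str


def parse_local_date(raw: Optional[str]) -> Optional[datetime]:
    """Return a naive datetime, or None if no known format matches."""
    text = (raw or "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def normalize(source: str, raw: dict) -> Optional[Report]:
    """Convert one scraped record; return None when the date is missing or unparseable."""
    date = parse_local_date(raw.get("date"))
    if date is None:
        return None  # such reports are not included in the database
    location = Location(
        city=raw.get("city"),
        district=raw.get("district"),
        country=raw.get("country"),
        water_body=raw.get("water_body"),
        other=raw.get("other"),
    )
    return Report(source=source, date=date, location=location, body=raw.get("body", ""))
```

Under these assumptions, a record with the date `"2001-07-04 21:15"` becomes a `Report` whose `date` has no `tzinfo`, while a record dated `"sometime in spring"` is dropped.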
- updb-scrapers: CLI scripts to crawl and download reports from sources
- updb-importers: Script for reading downloaded/OCRed data, cleaning/parsing, then inserting into the database
- updb-app: Vue SSR frontend for the website
- PostgREST: Serves a RESTful API to the app from the databases
- OCR for the document database is a custom ML-backed solution that produces better results than industry-standard Tesseract.
- Multi-language OCR of over 500,000 document pages took ~6 CPU-months on a 3.2 GHz 8-core Intel Xeon W (~3 weeks wall time).
- Report bodies and document OCR results are indexed using PostgreSQL's full-text search (FTS) capabilities, which power the report and document searches
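As a rough illustration of how a PostgREST API and PostgreSQL full-text search can fit together, the Python sketch below builds a search URL using PostgREST's `plfts` filter operator, which maps to PostgreSQL's `plainto_tsquery`. Whether UPDB exposes its search this way is not stated here. The base URL, the `reports` table, and the `body_tsv` column are hypothetical placeholders, not the real UPDB endpoint or schema.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder; not the real UPDB API address.
API_BASE = "https://example.org/api"


def search_url(term: str, table: str = "reports", column: str = "body_tsv", limit: int = 20) -> str:
    """Build a PostgREST request that filters `column` with the plfts operator."""
    params = {
        column: f"plfts.{term}",  # plfts = plainto_tsquery full-text match
        "limit": str(limit),      # PostgREST's standard row-limit parameter
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{API_BASE}/{table}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(search_url("silver disc"))
```

PostgREST also offers `fts`, `phfts`, and `wfts` for other query syntaxes; `plfts` is used in this sketch because it accepts plain user input without operator syntax.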