Animal Finder Tech
From Katrina Help Info
This Animal Finder article is a stub. You can help by joining the Animal Finder Project and expanding it.
Work-In-Progress, for review and comment only
What we are doing
There are dozens of web sites that help people find lost pets along the Gulf Coast. The problem is that none of the sites talk to one another. We are solving this problem by building automated data interchange systems and scraping data sets. We need your help!
Here are the goals of this project
- Implement automated data interchange systems around the AFIF spec
- Scrape and merge data from sets that will not implement AFIF
- Minimize duplicate records
- Make the central database available to be searched
Coordination & Leadership
AFIF spec leaders:
- Pending
Scraping Effort Leader:
- Pending
AFIF implementation coordinator:
- Pending
see PeopleFinderTech#Coordination & Leadership
Discussions
- PeopleFinder Discussions:
- PeopleFinderTech#Discussions.
- Meetings, and
- Katrina-Dev Mail Archive (http://orwant.com/katrina/)
- PetFinder Discussions:
- Join the forum/mailing list: KatrinaDev-PetFinder (http://groups.google.com/group/katrinadev-petfinder)
- If you are going to help scrape pet data sets join the petscraper's mailinglist (Pending)
Master Database
- Pending
Data interchange spec
See also PeopleFinderTech#Data interchange spec
- AFIF Specification is pending
How to get involved
Scrape data sets
- Sign up on the pet-scrapers mailinglist (Pending)
- Choose a set from the list #Sites_that_need_to_be_scraped
- Move it under the "Sites that are currently being scraped" heading
- Update its status on PetFinderTechStructuredDataSets
- Let people know you are scraping the set on the PetScrapers mailinglist
- When you are done scraping, validate the data by
- uploading a single record of the data to:
- Pending
- model on http://www.katrinalist.net/uploadPFIF/
- run the set through the PFIF validator:
- Link to your data on the password-protected wiki (Pending)
- Move your wiki listing under the PetFinderTech#Validated section after it is validated
- Let the PetScrapers mailinglist know you have successfully scraped and validated a data set
If you have trouble getting your data to validate, feel free to ask AFIF questions on the KatrinaDev-PetFinder mailinglist.
Validate data sets
- Choose a set and move it under the PetFinderTech#Being_validated section and add your email address and name below it on the listing
- Notify the PetScrapers mailinglist as to which data set you are validating
- Get access to the file on the password protected wiki (pending)
- Validate the data by
- Uploading a single record of the data to:
- Pending
- Model on: http://www.katrinalist.net/uploadPFIF/
- run the set through the AFIF validator: http://www.w3.org/2001/03/webdata/xsv
- If you have problems with file size and this interface, a *NIX command-line utility has been recommended:
- Get xmllint (included with most Unix distributions and Cygwin; go to http://xmlsoft.org/downloads.html for source, binaries, etc.)
- Download the XSD file at
- Pending
- Model on http://zesty.ca/pfif/1.1/pfif-1.1.xsd
- Invoke xmllint on your XML file (assume we call it afif.xml):
xmllint --noout --schema afif-1.1.xsd afif.xml
- If the feed is valid, move it under the PetFinderTech#Validated section. If it is invalid, move it under the PetFinderTech#Invalid_sets heading, then contact the data set scraper and help them fix their set
- Notify the PetScrapers of your results
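The validation steps above (test a single record, then check the whole file against the schema) can be sketched in Python. This is only an illustration: the AFIF spec is still pending, so the sketch assumes PFIF 1.1 conventions (the `http://zesty.ca/pfif/1.1` namespace and `person` records). The helper names `first_record_document`, `xmllint_command` and `validate_with_xmllint` are hypothetical, and the schema file name is whatever XSD you downloaded.

```python
import subprocess
import xml.etree.ElementTree as ET

# Assumption: records are <person> elements in the PFIF 1.1 namespace.
# The AFIF spec is pending; swap in its namespace and record tag later.
PFIF_NS = "http://zesty.ca/pfif/1.1"


def first_record_document(xml_text, record_tag="person", ns=PFIF_NS):
    """Return a new XML document that contains only the first record."""
    root = ET.fromstring(xml_text)
    records = root.findall(f"{{{ns}}}{record_tag}")
    if not records:
        raise ValueError("no records found")
    single = ET.Element(root.tag)
    single.append(records[0])
    # Keep the conventional "pfif" prefix in the serialized output.
    ET.register_namespace("pfif", ns)
    return ET.tostring(single, encoding="unicode")


def xmllint_command(schema_path, xml_path):
    """Build the xmllint invocation given in the steps above."""
    return ["xmllint", "--noout", "--schema", schema_path, xml_path]


def validate_with_xmllint(schema_path, xml_path):
    """Run xmllint; return True when it exits with status 0."""
    result = subprocess.run(
        xmllint_command(schema_path, xml_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

The single-record document can be saved to a small file and used for the upload test, while `validate_with_xmllint("afif-1.1.xsd", "afif.xml")` checks the full set locally and needs xmllint on the PATH.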
Helping site admins implement AFIF feeds
- Choose a site from this list PetFinderTech#Sites_that_need_help_implementing_AFIF_feeds
- Contact the site admin and offer assistance
- Move the listing under the heading "Sites currently implementing AFIF feeds"
- When the site is putting out a validated AFIF feed send a note to the KatrinaDev-PetFinder mailinglist
Also, we have a task list accessible here: Pet Task List
Data Sets
A list of structured data sets and contact information for the owners is up on PeopleFinderTechStructuredDataSets
PFIF Feeds
PFIF/RDF TRANSFORM
Courtesy of Peter Mika pmika at cs.vu.nl (http://prauw.cs.vu.nl:8080/pfif/)
Feedback Welcome
Sites that have PFIF feeds
- http://katrina.earthlink.net/people/list 2,861 records
- PeopleFinderTechStructuredDataSets#Hurricane_Help (more information)
- http://www.hurricanerefugee.com/ 3,500+ records
- I have a realtime PFIF feed available - Please email me at -- content [at] hurricanerefugee.com for access
- PeopleFinderTechStructuredDataSets#Hurricane_Refugees (more information)
Sites currently implementing PFIF feeds
- http://www.houmashelters.com/ 2,800 records
- PeopleFinderTechStructuredDataSets#Houma_Shelters (more information)
- http://www.familymessages.org/index.php 2,110 records
- http://www.theinfozone.net/NOLAmissing2.html/ 1790 records
- http://www.katrinatracker.com/ 3,031 records
- PeopleFinderTechStructuredDataSets#Katrina_Tracker (more information)
Sites that need help implementing AFIF feeds
- http://www.hurricanesurvivors.org/database.html 595 records
- PeopleFinderTechStructuredDataSets#Hurrican_Survivors.org (more information)
Sites that agreed to implement PFIF but have unknown status
- http://www.katrinadataproject.com/index.aspx 33,743 records
- PeopleFinderTechStructuredDataSets#Katrina_Data_Project (more information)
- http://www.katrinafinder.us/ 4,223 records
- PeopleFinderTechStructuredDataSets#Katrina_Finder (more information)
- "Implementing RSS spec" - not sure if that means PFIF
- http://www.gwid.com/katrina.php 1,654 records
- http://www.theinfozone.net/NOLAmissing2.html/ 1265 records
PFIF Implementation Volunteers
If you are available to help site admins implement PFIF, please add your name and email address to the list below
- Tony Chang: tony [at] ponderer.org - email me if you want help implementing PFIF
- Andy Schmitz: andy.schmitz [at] gmail.com - at school most of the day, but can help in the evening.
- Gordon E. Amond: Gordon [at] amonds.net - I would be proud to help my american neighbors.
- Geoff Webb: geofflwebb [at] yahoo.com - I have time in the evenings and weekends.
Scraping
- Mark sets that have been scraped.
- Mark sets that have been uploaded to the salesforce.com repository with the date/time of the scrape and the date/time of the upload.
- Uploads MUST conform to PFIF.
- Source Name MUST be clear, unique, explicit, and the same across all records from a single source, and MUST include the time OF THE SCRAPE (for example: Scrape-gulfcoastnews-bycilibrar-9/5/2005-10am).
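One way to keep Source Names consistent across a scrape is to generate them from a helper. The sketch below infers the pattern from the single example above (site, then "by" plus the scraper's handle, then date and hour). That reading of the example is an assumption, and `scrape_source_name` is a hypothetical name.

```python
from datetime import datetime


def scrape_source_name(site, scraper, scraped_at):
    """Build a Source Name such as Scrape-gulfcoastnews-bycilibrar-9/5/2005-10am."""
    # Convert the 24-hour clock to a 12-hour value with an am/pm suffix.
    hour12 = scraped_at.hour % 12 or 12
    suffix = "am" if scraped_at.hour < 12 else "pm"
    # Month and day are written without leading zeros, as in the example.
    date_part = f"{scraped_at.month}/{scraped_at.day}/{scraped_at.year}"
    return f"Scrape-{site}-by{scraper}-{date_part}-{hour12}{suffix}"


name = scrape_source_name("gulfcoastnews", "cilibrar", datetime(2005, 9, 5, 10, 0))
print(name)
```

Computing the name once per scrape and reusing it for every record keeps it identical across the whole source, as the rule requires.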
Sites that have been scraped
Imported
- http://www.msnbc.msn.com/id/9159961/ 143,000+ records
- Data has been scraped, converted to PFIF format and uploaded - Brent (brent [at] bjohnson.net)
- PFIF File URL:
- Uploaded by Andy Schmitz, using data fetched at 17:38, 13 Sep 2005 (EDT).
- PeopleFinderTechStructuredDataSets#MSNBC_.22Looking_for.22_and_.22Safe.22_lists (more information)
- http://www.familylinks.icrc.org/katrina/people 135,222 records
- removed 100% scraped
- Scraped by Brent L Johnson <brent at bjohnson.net>
- PeopleFinderTechStructuredDataSets#Red_Cross_.28ICRC.29 (more information)
- Imported 135,222 records into Katrinalist.net by Steve Fisher 09-11-05 8:17AM PST
- http://wx.gulfcoastnews.com/katrina/status.aspx 42,477 records
- [Removed (Download OLD DATA, see below)]
- Scraped by Rudi Cilibrasi <cilibrar at gmail.com>
- Rescraped by NacreData <devin at nacredata.com> and Wendy <mrscake at gmail.com>
- [Removed DOWNLOAD: (starting data, JavaScript Hack, PFIF XML output, Perl code)]
- PeopleFinderTechStructuredDataSets#Gulf_Coast_News_Survivor_Connector (more information)
- Validated by devin at nacredata . com
- Record One validates successfully at http://www.katrinalist.net/uploadPFIF/
- Entire file validates with libxml2 and Xerces
- Imported into Katrinalist.net by Jon Plax 09-09-05 8:35PM PST
- 61927 records imported, 0 errors
- http://www.publicpeoplelocator.com/ 37,259 records
- [Removed Download] (Be warned: 3 Megabytes over a DSL line)
- Scraped by Andy Schmitz <andy.schmitz at gmail.com>
- Data should be valid (validated by xmllint, uploaded two test records).
- PeopleFinderTechStructuredDataSets#Public_People_Locator (more information)
- Imported by Andy Schmitz on 09-11-05 at 9:10 PM CDT.
- http://www.katrina-survivor.com/ 9,071 records
- scraped by Zach Berke <zktb at twotacos.com>
- PeopleFinderTechStructuredDataSets#Hurricane_Katrina_Survivor_Registry (more information)
- [Removed Download .tgz]
- Notes:
- - ignored 5499 records that have links to gulfcoastnews.com for details -- assuming they were scraped from there
- - Set validated with xmllint but was too big to send to w3.org. A single record did validate at katrinalist.net. Please let me know if there are any PFIF validation errors on the set.
- Imported into Katrinalist.net by Jon Plax 09-09-05 6:35 PM PST
- 95 data validation errors due to blank or whitespace-only first_name field
- http://www.lnha.org/katrina/default.asp 4,500 records (roughly)
- scraped by Zach Berke <zktb at twotacos.com>
- PeopleFinderTechStructuredDataSets#LANH_Katrina_Evacuee_Directory (more information)
- [Removed Download .tgz]
- Imported into Katrinalist.net by Jon Plax 09-09-05 5:52PM PST
- 61 data validation errors due to blank or whitespace-only first_name
- Manually removed non-XML header and footer to allow parsing
- http://connect.castpost.com/fulllist.php 2,871 records
- [Removed Download] (not validated)
- Scraped by Andy Schmitz <andy.schmitz at gmail.com>
- PeopleFinderTechStructuredDataSets#Hurricane_Katrina_Persons_DB (more information)
- [Removed Download] (Validated after manual corrections with XMLSpy by Darci Hanning <darci.hanning at gmail.com>)
- Imported into Katrinalist.net by Jon Plax 09-09-05 6:15 PM PST
- 61 data validation errors, details lost due to user error. Some records on the source site had no first name or no last name.
- http://www.findkatrina.com 2,474 records
- PeopleFinderTechStructuredDataSets#Find_Katrina (more information)
- Scraped by David Dwiggins <david at dwiggins.net> approx. 3 AM CST, 09-11-2005.
- Validated using xmllint and single record upload by Dmdwiggi 06:13, 11 Sep 2005 (EDT)
- Current XML at: Removed
- [PHP scraping script (http://felix.dwiggins.net/katrina/findkatrina_php.txt)], [Perl export script (http://felix.dwiggins.net/katrina/findkatrina_xml_pl.txt)].
- Uploaded by Andy Schmitz on 09-12-05 at 6:00 AM CDT. Must have been uploaded previously though, because everything was reported as already imported.
- Update: This was probably me -- I had attempted to upload it, but the connection timed out while processing. I guess it made it after all. Sorry for the duplicated work. -- Dmdwiggi 13:45, 12 Sep 2005 (EDT)
- Not a problem. I needed to test my PFIF file uploader anyway (see the scrapers/utils list at the bottom for a link to the source). --Aschmitz 20:35, 12 Sep 2005 (EDT)
- http://www.katrinasurvivor.net/find.cfm?PageNum_GetAll=1&sort=name 2,400 records
- Scraped by Leonard Lin <lhl at usc.edu> @ 2005-09-08 05:43 PDT
- [Removed/ DOWNLOAD: Perl code and valid PFIF XML output]
- PeopleFinderTechStructuredDataSets#Katrina_Survivor (more information)
- Validation completed <darci.hanning @ gmail.com>
- First record uploaded without error.
- XMLSpy validation completed (no errors / no corrections).
- Imported into Katrinalist.net by Jon Plax 09-09-05 5:34 PM PST
- 35 data validation errors due to blank or whitespace-only last_name.
- theinfozone.net (http://www.theinfozone.net/NOLAmissing2.html) 1,300 records
- The site owner provided a CSV file (http://www.theinfozone.net/NOLAmissing.csv)
- I converted to PFIF which can be found here: [removed PFIF XML]
- I tried a single record and it validates <tony at ponderer.org>
- Conversion source code (http://ponderer.org/cvs/index.pl/python/katrina/src/)
- Imported into katrinalist.net by Jon Plax 09-09-05 5:08 PM PST
- http://www.cnn.com/SPECIALS/2005/hurricanes/list 1,120 records
- [removed Download] (not validated)
- Scraped by Nick Easler <easlern at gmail.com> 09-07-05 5:35am EST
- PeopleFinderTechStructuredDataSets#CNN_Safe_List (more information)
- Beginning validation process <darci.hanning @ gmail.com>
- Data uploader being fixed with relaxed secondary validation; will retry when re-released.
- Validation successful with both XMLSpy and W3 site after minor fix.
- [removed Download] (Validated)
- Imported into katrinalist.net by Jon Plax 09-09-05 5:02 PM PST
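Several of the imports above reported data validation errors caused by blank or whitespace-only first_name or last_name fields. A scraper could look for these before uploading. The following is a minimal sketch, assuming PFIF 1.1 element names and namespace; `find_blank_names` is a hypothetical helper, not part of any tool listed on this page.

```python
import xml.etree.ElementTree as ET

# Assumption: PFIF 1.1 namespace with <person> records.
PFIF_NS = "http://zesty.ca/pfif/1.1"


def find_blank_names(xml_text, fields=("first_name", "last_name"), ns=PFIF_NS):
    """Return (record_index, field) pairs for missing or blank name fields."""
    root = ET.fromstring(xml_text)
    problems = []
    for index, person in enumerate(root.findall(f"{{{ns}}}person")):
        for field in fields:
            # findtext returns None when the element is absent and ""
            # when it is present but empty.
            value = person.findtext(f"{{{ns}}}{field}")
            if value is None or not value.strip():
                problems.append((index, field))
    return problems
```

The returned indexes are zero-based positions in the file, which makes it easy to go back to the source site and check whether the name was missing there too.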
Being Uploaded to SalesForce
- Family messages - 20,000 records
- PeopleFinderTechStructuredDataSets#Family_Messages (more information)
- Being uploaded --Aschmitz 21:13, 14 Sep 2005 (EDT). Status is here (http://lardbucket.org/projects/katrina/status_sul.txt)
- Validation complete <darci.hanning @ gmail.com> (xmllint, XMLSpy and one record uploaded successfully) with the following outstanding questions by Dan <chaney @ dcre-labs.com>:
- Q1: Zipcodes. The first unresolved error involves the zipcode field. It demands an integer (which I suspect will change in PFIF 1.2), so for now, is it appropriate to put in 00000 when the zipcode is unavailable (and strip out +4 zip codes for now)?
- A1: Yes.
- Q2: Null date fields. Null date fields aren't allowed, nor is the unsightly "unknown", so given that I have no date for the source or entry dates, is the preferred action to not list the tagset at all?
- A2: Source date should be the current date(?), entry date should either be provided or an old date(?). I'm not sure about this, if Ping could take a look and give an authoritative answer, that would help.
- Q3: In general, if I do not have data for a field, should I just not print a tagset for it?
- A3: It's not clear. I would add it with blank data, otherwise SalesForce may choke on it.
- http://www.wecaretexas.com/ >200,000 records
- PeopleFinderTechStructuredDataSets#wecaretexas (more information)
- http://www.scribedesigns.com/tulane/ 1,933 records
- PeopleFinderTechStructuredDataSets#Tulane_Safe_Registry (more information)
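The zipcode answer (A1) in the Family messages entry above can be applied mechanically while converting records. This sketch follows that answer only: it drops the +4 extension and falls back to 00000 when no usable zipcode is present. `normalize_zip` is a hypothetical helper, and returning a string rather than an integer is a choice made here to keep leading zeros.

```python
import re


def normalize_zip(raw):
    """Return a five-digit ZIP string, or "00000" when none is available."""
    if raw is None:
        return "00000"
    # Accept "12345" or "12345-6789" with optional surrounding whitespace.
    match = re.match(r"^\s*(\d{5})(?:-\d{4})?\s*$", raw)
    return match.group(1) if match else "00000"
```

Anything that does not look like a five-digit zipcode, including free text such as "unknown", is treated as unavailable.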
Validated
- None that aren't being/haven't been uploaded
Invalid sets
- None
Being validated
- None
Need to be validated
- None
Sites that are currently being scraped
- http://www.searchformissing.org/ 184 records
- PeopleFinderTechStructuredDataSets#Search_for_Missing_People (more information)
- Currently being scraped by Mmondok 20:48, 8 Sep 2005 (EDT)
- http://www.kare.arkansas.gov/ 23,000+ records
- PeopleFinderTechStructuredDataSets#Operation_Kare (more information)
- Excel spreadsheet being converted to PFIF < darci.hanning @ gmail.com >
- http://www.safe.textamerica.com/ 69 records
- Currently being scraped by --Joe 20:59, 14 Sep 2005 (EDT)
Sites that need to be scraped
- http://co.harrison.ms.us/assistance/missing/ 1132 records
- http://callhome.textamerica.com/ 643 records
- PeopleFinderTechStructuredDataSets#Missing_Katrina (more information)
- http://www.missingkids.com/missingkids/servlet/PageServlet?LanguageCountry=en_US&PageId=2077 333 records (plus a few dozen records on photo pages)
- PeopleFinderTechStructuredDataSets#NCMEC_Hurricane_Katrina_Children (more information)
- http://www.emergency-database.com/guide/ 200(?) records
- PeopleFinderTechStructuredDataSets#emergency-database (more information)
- http://www.survivorregistry.com/cgi-bin/show_all.pl 193 records
- PeopleFinderTechStructuredDataSets#Survivor_Registry (more information)
Sites that can't be scraped
- http://katrina.streetlampsoftware.com/ 456 records asked to take down 9/8/05 and redirect...
- PeopleFinderTechStructuredDataSets#Katrina_Survivor_Database (more information)
- Scraping looked at by Gabe Wachob (gwachob@wachob.com)
- Not very scrapable - there is no field for missing person data, for example -- it appears to get stuck in the freeform notes section
- http://findourfamily.com/ record count unknown
- PeopleFinderTechStructuredDataSets#Find_Our_Family (more information)
- Not scrapable - freeform Invision BB. (log in with user+pass "zlvypjkmku" to view)
Scraping volunteers
Please sign up on the Katrina Scrapers mailinglist: mailto:katrinascrapers-subscribe@civicspacelabs.org and introduce yourself
Tools
PFIF XML Generators
- PFIF XML Generation (http://ponderer.org/cvs/index.pl/python/katrina/src/) (Python) - objects that can easily be serialized into PFIF XML.
- PFIF XML Generation (http://katrina.internet2.edu/~cilibrar/pfifmake.rb) (Ruby) - This is based on a single function call with an array of Person objects. Based on Josh's script sent to the list
- Perl XML::PFIF module (http://nessie.mcc.ac.uk/~ianb/projects/pfif/) (Perl) - problems to ianb [at] nessie.mcc.ac.uk.
- PFIF XML exporter (http://www.hurricanerefugee.com/pfif_asp_code/) (ASP) - sample code for generating PFIF from SQL Server <egvandell at hotmail.com>
Scrapers
- ICRC scraper (http://www.billglover.com/software/katrina/scrape_ICRC) (Perl) - This is deprecated, Brent has a new Java version with fixes for some problems.
- CNN scraping code (http://www.summertime-software.com/CNNScrape.090705.0526.zip) (Python)
- Gulf Coast Scraper (http://homepages.cwi.nl/~cilibrar/projects/a/gulfcoast/process.rb) (Ruby)
- Gulf Coast Scraper (http://nacredata.org/katrina/perl_gulf_format.pl) (Perl)
- connect.castpost.com Scraper (http://lardbucket.org/projects/katrina/scrape_ccc.phps) (PHP) (andy.schmitz [at] gmail.com)
- publicpeoplelocator.com Scraper (http://lardbucket.org/projects/katrina/scrape_ppl.phps) (PHP) (andy.schmitz [at] gmail.com)
- OO PHP Scraping Tool hacked together by Jonathan Lambert (PHP). The main scraper script (http://workhabit.com/framework/scraper.phps), which in this case was used to scrape http://www.publicpeoplelocator.com, and the http class that does the work (http://workhabit.com/framework/class_http.phps). This should be really easy to adjust to scrape virtually any site. It automatically rips tables to arrays, generates the header and footer, cleans up, etc., in a couple of lines of code. Does not appear to support notes.
Misc
- PFIF Uploader (http://lardbucket.org/projects/katrina/split_upload.phps) (PHP) Splits a large XML file into smaller (30 people) chunks and uploads them. Requires libCurl for PHP and write access to the current directory. Edit the second line to refer to your PFIF file. Andy Schmitz <andy.schmitz at gmail.com>
- [removed Combined ICRC and Gulf Coast aggregated SQL]
- Makefile for conversion (http://www.katrinahelp.info/~cilibrar/pfifproj/Makefile)
- Ruby script to convert from PFIF to simple SQL (http://www.katrinahelp.info/~cilibrar/pfifproj/conv.rb)
- Database schema (http://www.katrinahelp.info/~cilibrar/pfifproj/create.sql)
- FindAPlace Application (Drupal)
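The chunking idea behind the PFIF Uploader listed above (split a large file into groups of 30 people) can be sketched in Python as follows. This is not a port of the PHP script: it assumes PFIF 1.1 `person` records, leaves the upload step out entirely, and uses a hypothetical function name, `split_into_chunks`.

```python
import xml.etree.ElementTree as ET

# Assumption: PFIF 1.1 namespace with <person> records under the root.
PFIF_NS = "http://zesty.ca/pfif/1.1"


def split_into_chunks(xml_text, chunk_size=30, ns=PFIF_NS):
    """Split one PFIF document into documents of at most chunk_size people."""
    ET.register_namespace("pfif", ns)
    root = ET.fromstring(xml_text)
    people = root.findall(f"{{{ns}}}person")
    chunks = []
    for start in range(0, len(people), chunk_size):
        # Each chunk reuses the original root tag so it stays a full document.
        chunk_root = ET.Element(root.tag)
        chunk_root.extend(people[start:start + chunk_size])
        chunks.append(ET.tostring(chunk_root, encoding="unicode"))
    return chunks
```

Each returned string is a standalone document that can be written to its own file and uploaded separately, so a timeout affects only one chunk.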
Animal Finder Directory
- Animal Rescue Resources Wiki in use by the public
- Animal Finder Project Wiki in development
- Animal Finder Assignments ++
- Animal Finder Contributors Pages ++
- Animal Finder Coordination ++ Provides a high-level view of the status and activities for the various projects under the Animal Finder Project umbrella
- Animal Finder Data Entry Tips ++
- Animal Finder Funding Plan ++ A plan to explain the value of this project and pursue funding to build out these tools and services.
- Animal Finder Future Developments
- Animal Finder Future Users Page
- Animal Finder Interchange Format or AFIF
- Animal Finder Mission ++
- Animal Finder Outreach ++ Outreach/marketing and press information about the project
- Animal Finder Outreach Lists ++
- Animal Finder Privacy Statement ++
- Animal Finder Task List ++
- Animal Finder Tech ++ automate data interchange between animal shelter databases
- Animal Finder Tech Structured Data Sets
- Animal Finder Volunteer ++ volunteer effort to input unstructured data by hand
- Animal Finder Volunteer Instructions ++
- Animal Shelter Finder ++ project gathering shelters and shelter contact info
- Animal Shelter Finder Usability Issues ++
- Animal Shelter Local Coordination ++
- Animal Finder Database Project Application in development
- Animal Finder Database Project Main Wiki Page
- Animal Finder Database Project Forums
- Technical Forum (http://groups.google.com/group/katrinadev-petfinder)
- General Forum (http://groups.google.com/group/DataTrackers-AF) - for community involvement
- Animal Finder Templates:
Notes:
- ++ Major Revisions from PeopleFinder to AnimalFinder are pending
- Redirect Info (http://en.wikipedia.org/wiki/Wikipedia:Redirect)
- Stub categories (http://www.katrinahelp.info/wiki/index.php/Category:Stub_categories)
- Source Katrina PeopleFinder Project

