Animal Finder Tech

From Katrina Help Info

 This Animal Finder article is a stub. You can help by joining the Animal Finder Project and expanding it.

       Work-In-Progress, for review and comment only


What we are doing

There are dozens of web sites that help people find lost pets along the Gulf Coast. The problem is that none of these sites talk to one another. We are solving this problem by building automated data interchange systems and by scraping data sets. We need your help!

Here are the goals of this project:

  1. Implement automated data interchange systems around the AFIF spec
  2. Scrape and merge data from sets that will not implement AFIF
  3. Minimize duplicate records
  4. Make the central database available to be searched
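
One simple way to approach goal 3 is to build a normalized key for each record and keep only the first record seen for each key. The sketch below is illustrative only: the AFIF spec is still pending, so the field names (`name`, `species`, `location`) and the function names are assumptions, and real matching would likely need fuzzier rules.

```python
def normalize(value):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(str(value or "").lower().split())


def dedupe(records, key_fields=("name", "species", "location")):
    """Return records with exact duplicates (after normalization) removed.

    key_fields are hypothetical; substitute whatever AFIF ends up defining.
    """
    seen = set()
    unique = []
    for record in records:
        key = tuple(normalize(record.get(field)) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

The first occurrence wins here, so merging sources in order of trustworthiness would keep the better copy of each record.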

Coordination & Leadership

AFIF spec leaders:

  • Pending

Scraping Effort Leader:

  • Pending

AFIF implementation coordinator:

  • Pending

see PeopleFinderTech#Coordination & Leadership

Discussions

  • PetFinder Discussions:
    • Join the forum/mailing list: KatrinaDev-PetFinder (http://groups.google.com/group/katrinadev-petfinder)
    • If you are going to help scrape pet data sets, join the pet-scrapers mailinglist (Pending)

Master Database

  • Pending

Data interchange spec

See also PeopleFinderTech#Data interchange spec

  • AFIF Specification is pending

How to get involved

Scrape data sets

  1. Sign up on the pet-scrapers mailinglist (Pending)
  2. Choose a set from the list #Sites_that_need_to_be_scraped
  3. Move it under the "Sites that are currently being scraped" heading
  4. Update its status on PetFinderTechStructuredDataSets
  5. Let people know you are scraping the set on the PetScrapers mailinglist
  6. When you are done scraping, validate the data by
    1. uploading a single record of the data to:
      1. Pending
      2. model on http://www.katrinalist.net/uploadPFIF/
    2. run the set through the AFIF validator:
      1. http://www.w3.org/2001/03/webdata/xsv
  7. Link to your data on the password-protected wiki (Pending)
  8. Move your wiki listing under the PetFinderTech#Validated section after it is validated
  9. Let the PetScrapers mailinglist know you have successfully scraped and validated a data set

If you have trouble getting your data to validate, feel free to ask AFIF questions on the KatrinaDev-PetFinder mailinglist.

Validate data sets

  1. Choose a set and move it under the PetFinderTech#Being_validated section and add your email address and name below it on the listing
  2. Notify the PetScrapers mailinglist as to which data set you are validating
  3. Get access to the file on the password protected wiki (pending)
  4. Validate the data by
    1. Uploading a single record of the data to:
      1. Pending
      2. Model on: http://www.katrinalist.net/uploadPFIF/
    2. run the set through the AFIF validator: http://www.w3.org/2001/03/webdata/xsv
    3. If you have problems with file size when using this web interface, a *NIX command-line utility has been recommended:
      1. Get xmllint (comes with most unix distros and cygwin - go to http://xmlsoft.org/downloads.html for source, binaries, etc)
      2. Download the XSD file at
        1. Pending
        2. Model on http://zesty.ca/pfif/1.1/pfif-1.1.xsd
      3. Invoke xmllint on your XML file (assume we call it afif.xml):
        xmllint --noout --schema afif-1.1.xsd afif.xml
  5. If the feed is valid, move it under the PetFinderTech#Validated section. If it is invalid, move it under the PetFinderTech#Invalid_sets heading, then contact the data set scraper and help them fix their set
  6. Notify the PetScrapers of your results
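
When several sets need checking, the xmllint step above can be scripted. The following is a minimal sketch, assuming xmllint is installed and on the PATH; the file names are the placeholders used above (the AFIF schema itself is still pending), and the helper names are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Cheap pre-check: is the text parseable XML at all?"""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def xmllint_command(schema_path, xml_path):
    """Build the same command line shown in the step above."""
    return ["xmllint", "--noout", "--schema", schema_path, xml_path]


def validate(schema_path, xml_path):
    """Run xmllint and return (ok, messages). Requires xmllint to be installed."""
    result = subprocess.run(
        xmllint_command(schema_path, xml_path),
        capture_output=True,
        text=True,
    )
    # xmllint exits with 0 when the file validates and reports on stderr.
    return result.returncode == 0, result.stderr
```

For example, `validate("afif-1.1.xsd", "afif.xml")` would run the check and hand back any error messages for forwarding to the scraper of the set.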

Helping site admins implement AFIF feeds

  1. Choose a site from this list PetFinderTech#Sites_that_need_help_implementing_AFIF_feeds
  2. Contact the site admin and offer assistance
  3. Move the listing under the heading "Sites currently implementing AFIF feeds"
  4. When the site is putting out a validated AFIF feed send a note to the KatrinaDev-PetFinder mailinglist

Also, we have a task list accessible here: Pet Task List

Data Sets

A list of structured data sets and contact information for the owners is up on PeopleFinderTechStructuredDataSets

PFIF Feeds

PFIF/RDF TRANSFORM


Courtesy of Peter Mika pmika at cs.vu.nl (http://prauw.cs.vu.nl:8080/pfif/) Feedback Welcome

Sites that have PFIF feeds

Sites currently implementing PFIF feeds

Sites that need help implementing AFIF feeds

Sites that agreed to implement PFIF but have unknown status

PFIF Implementation Volunteers

If you are available to help site admins implement PFIF, please add your name and email address to the list below

  • Tony Chang: tony [at] ponderer.org - email me if you want help implementing PFIF
  • Andy Schmitz: andy.schmitz [at] gmail.com - at school most of the day, but can help in the evening.
  • Gordon E. Amond: Gordon [at] amonds.net - I would be proud to help my American neighbors.
  • Geoff Webb: geofflwebb [at] yahoo.com - I have time in the evenings and weekends.

Scraping

  • Mark sets that have been scraped.
  • Mark sets that have been uploaded to the salesforce.com repository with the date/time of the scrape and the date/time of the upload.
    • Uploads MUST conform to PFIF.
    • Source Name MUST be clear, unique, explicit, and identical across all records from a single source, and it MUST include the time OF THE SCRAPE (For example: Scrape-gulfcoastnews-bycilibrar-9/5/2005-10am).
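
To keep Source Names consistent across a scrape, they can be generated once rather than typed by hand. This is a small sketch that reproduces the pattern of the example above; the function and parameter names are hypothetical, and the exact format is inferred from that single example.

```python
from datetime import datetime


def source_name(site, scraper, scraped_at):
    """Build a Source Name like Scrape-gulfcoastnews-bycilibrar-9/5/2005-10am."""
    hour = scraped_at.hour % 12 or 12          # 12-hour clock, no leading zero
    suffix = "am" if scraped_at.hour < 12 else "pm"
    date = f"{scraped_at.month}/{scraped_at.day}/{scraped_at.year}"
    return f"Scrape-{site}-by{scraper}-{date}-{hour}{suffix}"
```

Calling `source_name("gulfcoastnews", "cilibrar", datetime(2005, 9, 5, 10))` gives the example name; compute it once per scrape and reuse the same string on every record.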

Sites that have been scraped

Imported

Being Uploaded to SalesForce

  • Family messages - 20,000 records
    • PeopleFinderTechStructuredDataSets#Family_Messages (more information)
    • Being uploaded --Aschmitz 21:13, 14 Sep 2005 (EDT). Status is here (http://lardbucket.org/projects/katrina/status_sul.txt)
    • Validation complete <darci.hanning @ gmail.com> (xmllint, XMLSpy and one record uploaded successfully) with the following outstanding questions by Dan <chaney @ dcre-labs.com>:
      • Q1: Zipcodes. The first unresolved error involves the zipcode field. It demands an integer (which I suspect will change in PFIF 1.2), so for now, is it appropriate to put in 00000 when the zipcode is unavailable (and to strip out +4 zip codes)?
      • A1: Yes.
      • Q2: Null date fields. Null date fields aren't allowed, nor is the unsightly "unknown". Given that I have no value for the source or entry dates, is the preferred action to not list the tagset at all?
      • A2: Source date should be the current date(?); entry date should either be provided or be an old date(?). I'm not sure about this. If Ping could take a look and give an authoritative answer, that would help.
      • Q3: In general, if I do not have data for a field, should I just not print a tagset for it?
      • A3: It's not clear. I would add it with blank data, otherwise SalesForce may choke on it.
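
The zipcode workaround agreed in Q1/A1 (use 00000 when the zipcode is unavailable, strip +4 extensions) can be applied uniformly with a small helper. This is only a sketch of that interim rule; the function name is hypothetical.

```python
import re


def normalize_zip(raw):
    """Return a 5-digit zipcode string, or "00000" when none is available.

    "70112-1234" becomes "70112"; empty, missing, or malformed values
    become the "00000" placeholder from A1.
    """
    match = re.match(r"\s*(\d{5})(?:-\d{4})?\s*$", str(raw or ""))
    return match.group(1) if match else "00000"
```

Anything that is not a plain 5-digit or ZIP+4 value falls back to the placeholder, so values such as "unknown" never reach the validator.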


Validated

  • None that aren't being/haven't been uploaded

Invalid sets

  • None

Being validated

  • None

Need to be validated

  • None

Sites that are currently being scraped

Sites that need to be scraped

Sites that can't be scraped

Scraping volunteers

Please sign up on the Katrina Scrapers mailinglist: mailto:katrinascrapers-subscribe@civicspacelabs.org and introduce yourself

Tools


PFIF XML Generators

  • PFIF XML Generation (http://ponderer.org/cvs/index.pl/python/katrina/src/) (Python) - objects that can easily be serialized into PFIF XML.
  • PFIF XML Generation (http://katrina.internet2.edu/~cilibrar/pfifmake.rb) (Ruby) - This is based on a single function call with an array of Person objects. Based on Josh's script sent to the list.
  • Perl XML::PFIF module (http://nessie.mcc.ac.uk/~ianb/projects/pfif/) (Perl) - problems to ianb [at] nessie.mcc.ac.uk.
  • PFIF XML exporter (http://www.hurricanerefugee.com/pfif_asp_code/) (ASP) - sample code for generating PFIF from SQL Server <egvandell at hotmail.com>
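
For orientation, the generators above all do roughly the same job: turn a record object into XML elements. The sketch below shows the idea with Python's standard library only. It is not taken from any of the listed tools, and because the AFIF spec is pending, the element names (`animal`, `name`, `species`, `breed`, `last_seen_location`) are purely hypothetical. Missing fields are written as empty elements, following the tentative advice in A3 on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; the real AFIF element names are not yet specified.
FIELDS = ("name", "species", "breed", "last_seen_location")


def record_to_xml(record, fields=FIELDS):
    """Serialize one record (a dict) to an XML string.

    Every field gets an element, even when the record has no value for it,
    so downstream importers always see the same set of tags.
    """
    animal = ET.Element("animal")
    for field in fields:
        child = ET.SubElement(animal, field)
        child.text = str(record.get(field) or "")
    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(animal, encoding="unicode")
```

Using a real XML library rather than string concatenation avoids the escaping mistakes that most often make a scraped set fail validation.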

Scrapers

  • ICRC scraper (http://www.billglover.com/software/katrina/scrape_ICRC) (Perl) - This is deprecated; Brent has a new Java version with fixes for some problems.
  • CNN scraping code (http://www.summertime-software.com/CNNScrape.090705.0526.zip) (Python)
  • Gulf Coast Scraper (http://homepages.cwi.nl/~cilibrar/projects/a/gulfcoast/process.rb) (Ruby)
  • OO PHP Scraping Tool hacked together by Jonathan Lambert (PHP). It consists of the main scraper script (http://workhabit.com/framework/scraper.phps), which in this case was used to hack http://www.publicpeoplelocator.com, and the http class that does the work (http://workhabit.com/framework/class_http.phps). This should be really easy to adjust to scrape virtually any site. It automatically rips tables to arrays, generates header and footer, cleans up, etc., in a couple of lines of code. Does not appear to support notes.
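
Most of these scrapers come down to ripping an HTML table into rows of cells. A minimal Python sketch of that idea, using only the standard library, follows. It is not a port of any tool listed above, the class and function names are hypothetical, and it assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class TableRipper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rip_table(html):
    """Return the table rows in html as a list of lists of strings."""
    parser = TableRipper()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The first row returned is usually the header row, which can then be zipped with each later row to produce one dictionary per record.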

Misc

  • PFIF Uploader (http://lardbucket.org/projects/katrina/split_upload.phps) (PHP) Splits a large XML file into smaller (30 people) chunks and uploads them. Requires libCurl for PHP and write access to the current directory. Edit the second line to refer to your PFIF file. Andy Schmitz <andy.schmitz at gmail.com>
  • FindAPlace Application (Drupal)
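
The splitting step that the PFIF Uploader performs can be sketched in Python as follows. This is not a port of the PHP script: it only shows the chunking idea (the upload itself is omitted), the function name is hypothetical, and it assumes every direct child of the root element is one record.

```python
import xml.etree.ElementTree as ET


def split_records(xml_text, chunk_size=30):
    """Split one large XML document into smaller documents.

    Each returned string has the same root tag and attributes as the input
    and holds at most chunk_size of the root's child elements.
    """
    root = ET.fromstring(xml_text)
    children = list(root)
    chunks = []
    for start in range(0, len(children), chunk_size):
        part = ET.Element(root.tag, root.attrib)
        part.extend(children[start:start + chunk_size])
        chunks.append(ET.tostring(part, encoding="unicode"))
    return chunks
```

Each chunk is a complete document in its own right, so it can be validated or uploaded independently of the others.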

Animal Finder Directory

  1. Animal Rescue Resources Wiki in use by the public
  2. Animal Finder Project Wiki in development
    1. Animal Finder Assignments ++
    2. Animal Finder Contributors Pages ++
    3. Animal Finder Coordination ++ Provides a high-level view of the status and activities for the various projects under the Animal Finder Project umbrella
    4. Animal Finder Data Entry Tips ++
    5. Animal Finder Funding Plan ++ A plan to explain the value of this project and pursue funding to build out these tools and services.
    6. Animal Finder Future Developments
    7. Animal Finder Future Users Page
    8. Animal Finder Interchange Format or AFIF
    9. Animal Finder Mission ++
    10. Animal Finder Outreach ++ Outreach/marketing and press information about the project
    11. Animal Finder Outreach Lists ++
    12. Animal Finder Privacy Statement ++
    13. Animal Finder Task List ++
    14. Animal Finder Tech ++ automate data interchange between animal shelter databases
    15. Animal Finder Tech Structured Data Sets
    16. Animal Finder Volunteer ++ volunteer effort to input unstructured data by hand
    17. Animal Finder Volunteer Instructions ++
    18. Animal Shelter Finder ++ project gathering shelters and shelter contact info
    19. Animal Shelter Finder Usability Issues ++
    20. Animal Shelter Local Coordination ++
  3. Animal Finder Database Project Application in development
    1. Animal Finder Database Project Main Wiki Page
      1. Animal Finder Database Project Charter
    2. Animal Finder Database Project Forums
      1. Technical Forum (http://groups.google.com/group/katrinadev-petfinder)
      2. General Forum (http://groups.google.com/group/DataTrackers-AF) - for community involvement
  4. Animal Finder Templates:
    1. Animal Finder Directory (this template)
    2. Animal Finder Menu ++
    3. AnimalFinder-stub
    4. Animal Finder Project Participation

