TODOs and Progress

Spam Progress And Notes

  • Tag blur computation by known-good URLs
    • To do this, must:
      1. Identify known-good URLs from DMOZ [done]
      2. Compute similarities based on these URLs' tags [in progress]
      3. Use these similarities to compute tag blur for Givealink posts [code written but not run]
  • Tag spam computation by known-spam URLs
    • To do this, must:
      1. Compute Pr(t) for known-spam URLs in Bibsonomy [done]
      2. Compute tag_spam for Givealink posts [in progress]
  • Both of the above must be inserted into the Givealink database
    • To do this, must:
      1. Get migrations working [Ben]
      2. Create another migration to add tables / columns where appropriate [started]
      3. Modify above scripts to output to db [not started]
  • Some manual labeling of spam in Givealink
    • Identify likely spam candidates using the previous two measures (tag blur and tag spam)
    • Create a page in the administrator controller that allows manual spam classification
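
A minimal sketch of the tag spam item above, in Python. These notes do not define Pr(t) or tag_spam, so the sketch makes two assumptions: Pr(t) is the fraction of labeled posts carrying tag t that are marked spam, and tag_spam for a post is the mean of Pr(t) over its tags. The function names, the (tags, is_spam) input shape, and the default for unseen tags are all hypothetical, not taken from the existing scripts.

```python
from collections import Counter


def estimate_tag_spam_prob(labeled_posts):
    """Estimate Pr(t) for each tag t from labeled posts.

    labeled_posts: iterable of (tags, is_spam) pairs (assumed input shape).
    Returns a dict mapping tag -> fraction of posts with that tag that are spam.
    """
    total = Counter()
    spam = Counter()
    for tags, is_spam in labeled_posts:
        # Count each tag once per post, even if it is repeated.
        for tag in set(tags):
            total[tag] += 1
            if is_spam:
                spam[tag] += 1
    return {tag: spam[tag] / total[tag] for tag in total}


def tag_spam(tags, pr, default=0.0):
    """Score a post as the mean Pr(t) over its distinct tags (assumed definition).

    Tags never seen in the labeled data contribute `default`.
    A post with no tags also scores `default`.
    """
    distinct = set(tags)
    if not distinct:
        return default
    return sum(pr.get(tag, default) for tag in distinct) / len(distinct)
```

Under these assumptions, the per-post score could be written to the database column added by the migration and used to rank candidates for manual labeling.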