Methods Description published in the online document
Other Explanation
1. Url.avgTagBlur
Function: returns the average tag_blur for a given url. Parameters: (1) api_key, (2) url. The basic method is in the Url model:
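def self.avg_tag_blur(url_string)
  # Look up the Url record for the given url string.
  url = Url.find(:first, :conditions => ["url = ?", url_string])
  # Unknown url: return nil instead of raising on url.id.
  return nil if url.nil?
  # Average tag_blur over this url's PostSpamScore rows, grouped by url_id.
  avg_tag_blur = PostSpamScore.average(:tag_blur, :group => 'url_id', :conditions => ["url_id = ?", url.id])
  avg_tag_blur[url.id]
end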
2. Advanced tag.getUrls
The new method tag.getUrls can return a list of urls which have any/all of the tags that the user provides.
It has the following parameters:
(1) api_key
(2) tag = a space-separated list of tags
(3) mode = any | all. If "any", the function returns a list of urls that have at least one of the specified tags. This result set SHOULD BE ranked so that the urls with all the tags come first, then the others. I understand that implementing this functionality is not trivial; if you have a simple idea, let me know. Otherwise we could just put all the urls together and order them randomly. If "all", the function returns the list of urls that have ALL the specified tags.
(4) page = the entire result set is split into pages of "limit" elements, and the requested page is returned
(5) limit = number of elements returned in a page (i.e., the limit is related to the whole tag set)
However, this is hard to implement, and it has not actually been implemented yet.
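One simple idea for the ranking in "any" mode is to count, for each url, how many of the requested tags it carries and sort by that count in descending order, so that urls carrying all the tags come first. The sketch below works on an in-memory hash rather than on the database. The names `urls_for_tags` and `url_tags` are hypothetical and not part of the existing code, and pages are assumed to be numbered from 1.

```ruby
# Hypothetical helper: not part of the existing givealink code.
# url_tags maps each url string to the array of tags attached to it.
def urls_for_tags(url_tags, tag_string, mode = "any", page = 1, limit = 10)
  wanted = tag_string.split(" ").uniq
  return [] if wanted.empty?

  # Count how many of the requested tags each url carries.
  scored = url_tags.map { |url, tags| [url, (tags & wanted).size] }

  matches =
    if mode == "all"
      scored.select { |_url, hits| hits == wanted.size }
    else
      scored.select { |_url, hits| hits > 0 }
    end

  # Most matched tags first; ties are broken by url so the order is stable.
  ranked = matches.sort_by { |url, hits| [-hits, url] }.map { |url, _hits| url }

  # Pages are assumed to start at 1; an out-of-range page gives an empty list.
  ranked.slice((page - 1) * limit, limit) || []
end

url_tags = {
  "http://a.example" => ["ruby", "rails"],
  "http://b.example" => ["ruby"],
  "http://c.example" => ["python"]
}
urls_for_tags(url_tags, "ruby rails", "any")  # a.example first, then b.example
urls_for_tags(url_tags, "ruby rails", "all")  # only a.example
```

In a real implementation the counting would more likely be done in SQL with a GROUP BY on url_id, but the ordering rule would stay the same.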
3. Tag.getTags
Parameters: mode = top | random (default top), spam = yes | no (default yes), api_key
4. Tag.getSimilar
Parameters: tag, spam = yes | no (default yes), limit, api_key. If spam is set to "no", the call is quite slow, although it works. If any of the top 10 users of a tag is "known_good" (labelled either manually or by the classifier), the tag is regarded as good; otherwise it is a spam tag. The same rule applies to urls.
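The good/spam rule above can be sketched as follows. This is only an illustration: `tag_good?`, the `top_users` argument and the `:label` key are hypothetical names, and the real check would read the labels from the database.

```ruby
# Hypothetical sketch of the rule: a tag (or url) counts as good when any of
# its top 10 users is labelled "known_good", by hand or by the classifier.
#
# top_users: user records ordered from most to least active for the tag,
# each a hash with a :label key (an assumed representation).
def tag_good?(top_users, top_n = 10)
  top_users.first(top_n).any? { |user| user[:label] == "known_good" }
end

users = [{ :label => "unknown" }, { :label => "known_good" }]
tag_good?(users)                       # => true
tag_good?([{ :label => "unknown" }])   # => false
```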
5. Url.getSimilar/Tag.getUrls/...
These methods now filter out spam urls.
6. Url.generate
Parameters: api_key, min_tag_num=5, min_user_num=3 (non-spammers by human_label), min_page_num=5 (minimum number of similar pages). The url should be used by at least min_user_num users who are not labelled "known_spammer" by a human. Once the spam classifier is working, we can switch to the user classifier. The url should also have at least min_page_num similar pages.
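The thresholds above can be read as a single eligibility check, sketched below. The method name `eligible_url?` and the shape of the `stats` hash are hypothetical, and min_tag_num is assumed to mean the minimum number of tags attached to the url, which the description does not spell out.

```ruby
# Hypothetical sketch; the defaults mirror the parameters listed above.
#
# stats: assumed hash with :tag_count, :user_labels (one human label per user
# of the url) and :similar_page_count.
def eligible_url?(stats, min_tag_num = 5, min_user_num = 3, min_page_num = 5)
  # Only users not labelled "known_spammer" by a human count toward min_user_num.
  non_spammers = stats[:user_labels].count { |label| label != "known_spammer" }

  stats[:tag_count] >= min_tag_num &&
    non_spammers >= min_user_num &&
    stats[:similar_page_count] >= min_page_num
end

stats = {
  :tag_count => 6,
  :user_labels => ["known_good", "unknown", "unknown", "known_spammer"],
  :similar_page_count => 5
}
eligible_url?(stats)  # => true (6 tags, 3 non-spammer users, 5 similar pages)
```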
7. Url.getTitle
Parameters: api_key, url. In the Url model there is a self.gettitle() function. It detects whether any title field is empty for a given url_id in the user_url_datas table. If so, givealink sends an HTTP request to that website to get the title and fills in the empty title fields. If the request fails, it returns the existing title in the database, or "No title".
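The fallback order described above can be sketched like this. It is not the actual self.gettitle() code: `fetch_title` and `resolve_title` are hypothetical names, the title fields are passed in as a plain array, and writing the fetched title back to user_url_datas is left out.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: fetch a page and pull out its <title>, or nil on failure.
def fetch_title(url_string)
  body = Net::HTTP.get(URI.parse(url_string))
  match = body.match(/<title[^>]*>(.*?)<\/title>/im)
  match && match[1].strip
rescue StandardError
  nil
end

# titles: the title fields stored for one url_id (nil or "" when empty).
# fetcher: any callable returning a title or nil; defaults to fetch_title.
def resolve_title(url_string, titles, fetcher = method(:fetch_title))
  existing = titles.find { |t| !t.to_s.empty? }
  needs_fill = titles.empty? || titles.any? { |t| t.to_s.empty? }
  return existing unless needs_fill

  # Some field is empty: try the website first.
  fetched = fetcher.call(url_string).to_s
  return fetched unless fetched.empty?

  # Request failed: fall back to a stored title, then to "No title".
  existing || "No title"
end
```

Passing the fetcher as an argument keeps the fallback logic testable without a network connection, for example `resolve_title(url, titles, lambda { |_u| nil })` to simulate a failed request.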