In September 2020, we are introducing a major upgrade for Botometer. This post explains the changes and motivations behind them.
Introducing Botometer V4
We are replacing the Botometer V3 model that has been running for the past two years with a new V4 model, hoping to provide more accurate and transparent results for our users.
Botometer is a supervised machine learning tool, which means it learns the characteristics of likely automated and human accounts from annotated datasets. As time goes by, bots on Twitter evolve and novel behaviors emerge, which requires us to retrain the model with new annotated datasets periodically. This is the major motivation for this upgrade.
In addition to adding new training datasets, the V4 model also implements a brand new architecture, called Ensemble of Specialized Classifiers (ESC). ESC is inspired by the observation that human accounts share similar behaviors while bots have many different types and each type exhibits distinct characteristics. ESC trains specialized classifiers for various classes of bots and combines their decisions based on their confidence.
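To make the idea of combining specialists more concrete, here is a minimal sketch in Python. It assumes each specialized classifier has already produced a probability that the account belongs to its bot class, and it combines them with a simple "max rule": the account is treated as being as bot-like as the bot type it most resembles. The function name `combine_max_rule` and the 0.5 threshold are hypothetical choices for illustration, not the exact combination procedure described in the CIKM paper.

```python
def combine_max_rule(specialist_scores: dict[str, float], threshold: float = 0.5) -> tuple[str, float]:
    """Combine hypothetical specialist outputs by taking the most bot-like one.

    specialist_scores maps a bot class (e.g. "spammer") to the probability,
    estimated by that class's specialized classifier, that the account is a
    bot of that class. This max rule is an illustrative assumption.
    """
    if not specialist_scores:
        raise ValueError("need at least one specialist score")
    # The specialist that finds the account most similar to its bot class.
    best_class = max(specialist_scores, key=specialist_scores.get)
    best_score = specialist_scores[best_class]
    # Below the (assumed) threshold, no specialist is convinced: call it human.
    label = best_class if best_score >= threshold else "human"
    return label, best_score


print(combine_max_rule({"astroturf": 0.1, "spammer": 0.8, "fake_follower": 0.3}))
```

A max rule fits the intuition above: humans look alike, so an account only needs to closely match one bot type to be suspicious.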
In our extensive experiments, ESC not only achieves exceptional cross-validation accuracy on the training datasets, but also generalizes to unseen accounts better than our previous model. Moreover, ESC can expose the results from the specialized classifiers, allowing us to better understand the decision-making process within the black box. We are excited about the new V4 model. For more information, please refer to our paper Detection of Novel Social Bots by Ensembles of Specialized Classifiers, forthcoming in the proceedings of CIKM 2020.
What’s new on the Botometer website
Although you can use the Botometer website (botometer.org) the same way as before, the transition to V4 brings several changes.
All the bot scores come from the V4 model. In the score details, the old sub-scores based on subsets of features are replaced by bot type scores (see illustration). These scores come from the specialized bot classifiers and estimate how similar an account is to different types of bots. Currently we have:
- Astroturf: manually labeled political bots and accounts involved in follow trains that systematically delete content
- Fake follower: bots purchased to increase follower counts
- Financial: bots that post using cashtags
- Self declared: bots from botwiki.org
- Spammer: accounts labeled as spambots from several datasets
- Other: miscellaneous other bots obtained from manual annotation, user feedback, etc.
The all-features overall score is based on all available features (including linguistic features that assume English content) while the language-independent score excludes the language-related features. Botometer infers the language from the majority of a sample of tweets and selects the appropriate overall score accordingly.
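The language-based selection of the overall score can be sketched in a few lines of Python. The sample response below is hypothetical: the key names (`user.majority_lang`, `raw_scores`, `english`, `universal`, `cap`) are assumptions about how the V4 output might be shaped, and the values are made up. Please check the Botometer Pro documentation on RapidAPI for the actual response format.

```python
# Hypothetical V4-style response; key names and values are illustrative assumptions.
SAMPLE_RESPONSE = {
    "user": {"majority_lang": "es"},
    "cap": {"english": 0.81, "universal": 0.56},
    "raw_scores": {
        "english": {"astroturf": 0.05, "fake_follower": 0.62, "financial": 0.10,
                    "other": 0.48, "overall": 0.70, "self_declared": 0.08, "spammer": 0.31},
        "universal": {"astroturf": 0.04, "fake_follower": 0.55, "financial": 0.12,
                      "other": 0.40, "overall": 0.58, "self_declared": 0.06, "spammer": 0.35},
    },
}


def summarize(response: dict) -> dict:
    """Pick the score set matching the account's majority language and summarize it."""
    lang = response["user"].get("majority_lang")
    # English accounts get the all-features scores; all others the language-independent ones.
    score_set = "english" if lang == "en" else "universal"
    scores = response["raw_scores"][score_set]
    # Every entry except "overall" is a bot type score from a specialized classifier.
    bot_types = {name: value for name, value in scores.items() if name != "overall"}
    top_type = max(bot_types, key=bot_types.get)
    return {
        "score_set": score_set,
        "overall": scores["overall"],
        "cap": response["cap"][score_set],
        "top_bot_type": top_type,
    }


print(summarize(SAMPLE_RESPONSE))
```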
Another important change is that the Complete Automation Probability (CAP) score is now a cumulative probability: it represents the probability that an account with the given score or higher is automated.
What’s new in the Botometer Pro API
As with the website, the Botometer Pro API is changing too.
First, the old /2/check_account endpoint is being deprecated. A new /4/check_account endpoint with the same input schema and functionality will be added. At the same time, the old RapidAPI plan you subscribed to will become obsolete.
Second, the API response of the /4/check_account endpoint is restructured to reflect the changes in scores described above for the website. It is not compatible with the old endpoint. For details, please refer to our Botometer Pro documentation on RapidAPI.
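Because the input schema is unchanged, migrating a request mostly means pointing it at the new path. The Python sketch below only builds (and does not send) a POST request using the standard library. The host name `botometer-pro.p.rapidapi.com`, the `x-rapidapi-key` / `x-rapidapi-host` headers, and the `user` / `timeline` / `mentions` body fields are assumptions based on common RapidAPI conventions and the botometer-python package; confirm them against the Botometer Pro documentation on RapidAPI.

```python
import json
import urllib.request

# Assumed RapidAPI host for Botometer Pro; verify in the RapidAPI documentation.
API_HOST = "botometer-pro.p.rapidapi.com"


def build_check_account_request(rapidapi_key: str, user: dict,
                                 timeline: list, mentions: list) -> urllib.request.Request:
    """Build (but do not send) a POST request to the V4 check_account endpoint."""
    url = f"https://{API_HOST}/4/check_account"  # previously /2/check_account
    # Assumed body layout: the same input schema as the deprecated endpoint.
    body = json.dumps({"user": user, "timeline": timeline, "mentions": mentions}).encode("utf-8")
    headers = {
        "content-type": "application/json",
        "x-rapidapi-key": rapidapi_key,
        "x-rapidapi-host": API_HOST,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_check_account_request("YOUR_RAPIDAPI_KEY", {"id_str": "1"}, [], [])
print(req.get_method(), req.full_url)
```

Sending it would be a matter of `urllib.request.urlopen(req)`; the JSON that comes back has the new V4 structure, so any code that parses the old response needs updating as well.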
What do users need to do?
- Please make sure you re-subscribe to a new RapidAPI plan (details below).
- Change your code accordingly to adapt to the new endpoint and response.
- If you are using our botometer-python package, you need to upgrade the package in your local environment to the latest version; you will still need to adapt your code to handle the new response.
Despite the popularity of Botometer, scalability has been a challenge for many of our users. To mitigate this issue, we are introducing a completely new model called BotometerLite in this upgrade.
BotometerLite only needs the information in the user profile to perform bot detection, allowing for bulk analysis of Twitter accounts. This is useful for high-volume users. Based on a novel training data selection method, BotometerLite achieves high accuracy both in cross-validation and on unseen accounts. For details, please refer to our paper Scalable and Generalizable Social Bot Detection through Data Selection, published in the proceedings of AAAI 2020.
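To illustrate why profile-only detection scales well, the Python sketch below derives a few features from nothing but a Twitter API v1.1 user object (fields such as `created_at`, `statuses_count`, `followers_count`, `friends_count`, `screen_name`, `description`). No timeline or mentions are needed, so many accounts can be processed cheaply. The specific features and the helper name `profile_features` are hypothetical examples, not the actual BotometerLite feature set described in the AAAI paper.

```python
from datetime import datetime, timezone

# Timestamp format used by created_at in Twitter API v1.1 user objects.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def profile_features(user: dict, probe_time: datetime) -> dict:
    """Compute hypothetical profile-only features for one account.

    probe_time is the (timezone-aware) moment the profile was observed,
    which lets the same code run on historical profile snapshots.
    """
    created = datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT)
    # Account age in days, floored at one day to avoid dividing by tiny ages.
    age_days = max((probe_time - created).total_seconds() / 86400, 1.0)
    return {
        "age_days": age_days,
        "tweet_rate": user["statuses_count"] / age_days,
        "follower_friend_ratio": user["followers_count"] / max(user["friends_count"], 1),
        "screen_name_digits": sum(c.isdigit() for c in user["screen_name"]),
        "has_description": bool(user.get("description")),
    }


example_user = {
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "statuses_count": 500, "followers_count": 100, "friends_count": 50,
    "screen_name": "bot12345", "description": "",
}
print(profile_features(example_user, datetime(2018, 10, 20, 20, 19, 24, tzinfo=timezone.utc)))
```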
Note that although the scores returned by BotometerLite are correlated with the ones from Botometer, there are differences. This is because the features, algorithms, and training datasets used by the two models are different. Users have to decide for themselves which model is more appropriate for their applications.
The new BotometerLite endpoint
We are adding a new endpoint, /litev1/check_accounts_in_bulk, to our Botometer Pro API, making BotometerLite available to the public via RapidAPI.
Unlike the /4/check_account endpoint, which can only evaluate one account in each request, /litev1/check_accounts_in_bulk can evaluate up to 100 accounts in each request. BotometerLite works on historical data too.
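Since each bulk request accepts at most 100 accounts, a larger list has to be split into batches on the client side. A minimal Python sketch of that chunking follows; it deliberately leaves out the request payload format, which is described in the RapidAPI documentation. The names `bulk_batches` and `bulk_requests_needed` are hypothetical helpers.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Per-request limit stated for /litev1/check_accounts_in_bulk.
MAX_ACCOUNTS_PER_BULK_REQUEST = 100


def bulk_batches(accounts: Sequence[T],
                 batch_size: int = MAX_ACCOUNTS_PER_BULK_REQUEST) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size accounts, one per bulk request."""
    if not 1 <= batch_size <= MAX_ACCOUNTS_PER_BULK_REQUEST:
        raise ValueError(f"batch_size must be between 1 and {MAX_ACCOUNTS_PER_BULK_REQUEST}")
    for start in range(0, len(accounts), batch_size):
        yield accounts[start:start + batch_size]


def bulk_requests_needed(n_accounts: int) -> int:
    """Number of bulk requests needed to cover n_accounts (ceiling division)."""
    return -(-n_accounts // MAX_ACCOUNTS_PER_BULK_REQUEST)


account_ids = list(range(250))  # placeholder account IDs
print([len(batch) for batch in bulk_batches(account_ids)])
print(bulk_requests_needed(len(account_ids)))
```

Counting bulk requests this way is also handy for checking usage against the daily bulk-request limits of the pricing plans.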
Please refer to our Botometer Pro documentation on RapidAPI for how to interact with the new endpoint. A new class is also added to our botometer-python package to help our users.
New RapidAPI pricing plans for Botometer Pro
In addition to the upgrades in the models and endpoints, we are also changing the RapidAPI pricing plans. You can go to the pricing plan page for an overview and select the most appropriate one. Here we summarize the features of the new plans:
- Our BASIC plan has a rate limit of 500 Botometer V4 requests per day, per user. This plan is mainly for testing.
- The PRO plan has no subscription fee. It provides a rate limit of 2,000 Botometer V4 requests per day; requests beyond the rate limit are charged at $1 per 1,000 requests.
- The ULTRA plan has a subscription fee of $50 per month. In return, it provides a rate limit of 17,280 Botometer V4 requests per day, with an overage charge of $1 per 1,000 requests. Moreover, it allows users to access the BotometerLite endpoint with a rate limit of 200 bulk requests per day, with an overage charge of $1 per 100 bulk requests.
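To compare the PRO and ULTRA plans for a given workload, the Python sketch below estimates costs from the figures listed above. It assumes that overage is billed linearly per request above the daily rate limit (i.e., $0.001 per extra V4 request and $0.01 per extra bulk request) and that days are billed independently; the actual billing granularity and rounding are determined by RapidAPI. The BASIC plan is left out because no overage price is given for it. `Plan`, `daily_overage`, and `monthly_cost` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_fee: Decimal
    v4_daily_limit: int
    v4_overage_per_request: Decimal
    includes_lite: bool = False
    lite_daily_limit: int = 0
    lite_overage_per_request: Decimal = Decimal("0")


# Figures from the plan list; per-request overage assumes linear billing.
PRO = Plan("PRO", Decimal("0"), 2_000, Decimal("0.001"))
ULTRA = Plan("ULTRA", Decimal("50"), 17_280, Decimal("0.001"),
             includes_lite=True, lite_daily_limit=200, lite_overage_per_request=Decimal("0.01"))


def daily_overage(plan: Plan, v4_requests: int, lite_requests: int = 0) -> Decimal:
    """Estimated overage charge (USD) for one day of usage."""
    if lite_requests and not plan.includes_lite:
        raise ValueError(f"the {plan.name} plan does not include the BotometerLite endpoint")
    v4_extra = max(v4_requests - plan.v4_daily_limit, 0)
    lite_extra = max(lite_requests - plan.lite_daily_limit, 0)
    return v4_extra * plan.v4_overage_per_request + lite_extra * plan.lite_overage_per_request


def monthly_cost(plan: Plan, daily_usage: list[tuple[int, int]]) -> Decimal:
    """Subscription fee plus estimated overage; daily_usage holds (v4, lite) pairs per day."""
    overage = sum((daily_overage(plan, v4, lite) for v4, lite in daily_usage), Decimal("0"))
    return plan.monthly_fee + overage


# 3,000 V4 requests every day for 30 days on each plan.
print(monthly_cost(PRO, [(3_000, 0)] * 30), monthly_cost(ULTRA, [(3_000, 0)] * 30))
```

Using `Decimal` keeps the dollar amounts exact, which avoids the rounding noise that binary floats introduce for values such as $0.001.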
A few notes:
- The BotometerLite bulk endpoint is only available through the ULTRA plan.
- Subscribing to the PRO and ULTRA plans requires a valid credit card. For PRO plan users, there will be no charges if they stay within the rate limit.
- Users who already subscribed to one of the plans before the upgrade will need to re-subscribe for the new endpoints to work.
We understand that the changes and charges represent hurdles for some of our API users. However, maintaining the availability of our service has significant costs, including physical, storage, network, and computing infrastructure (e.g., a new server) as well as human resources for software development and maintenance, data annotation, and system administration. We are a non-profit research lab supported by grants and volunteers. The API charges go to Indiana University and represent a tiny fraction of our costs.
The new pricing plans are also designed to minimize abusive behaviors. From time to time, we have users creating multiple accounts to abuse the BASIC plan, overloading our server and degrading the experience of other users. We hope the new pricing plans can help us maintain a high-quality service for all legitimate users.