SOK Update 1
Season Of KDE 1
Hello!!! Last month I started working on “Automating Social Media data collection” as my Season of KDE project, and I want to share my progress so far. The following work has been done:
A data format has been established, and the following data is collected in it every week:
- Twitter data
- Public metrics like retweets, likes, and replies, and private metrics like engagement and clicks.
- Get all mentions of any Twitter account. Since most of the data on Twitter is public, I am designing my application so that it can retrieve data from multiple accounts.
- Searching for keywords across tweets, so that we can analyse what people are saying about KDE products.
- Instagram and Facebook
- Getting data from Instagram and Facebook requires analyst or admin access, so we cannot simply swap the account name to get data about other accounts.
- LinkedIn also requires the analyst role or above to get the data.
- Mastodon
- Public data
- Reddit
- Get stats for different subreddits.
- Search for words across different subreddits.
- Stats for all posts by contributors.
- Other than subreddit traffic (which is available only to moderators), most data is public, so we can get most of it.
- Linux package stats
- Getting the popularity of different KDE apps and packages using popcon data provided by different distros.
- YouTube
- Channel data (subscribers, etc.)
- Content data (views, comments, likes, …)
- Basic analytics
- Sentiment analysis of the comments, so we can visualise positive/negative feedback to a post.
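To give an idea of what sentiment analysis on comments can look like, here is a minimal sketch. It is not the code used in the project: it assumes a simple word-list approach, and the `POSITIVE` and `NEGATIVE` sets, the function names, and the sample comments are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumptions); a real lexicon would be far larger.
POSITIVE = {"love", "great", "awesome", "fast", "beautiful", "thanks"}
NEGATIVE = {"crash", "crashes", "slow", "broken", "bug", "hate"}


def classify_comment(text):
    """Label one comment by counting positive and negative word hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarise(comments):
    """Count how many comments on a post fall under each label."""
    return Counter(classify_comment(c) for c in comments)


# Made-up sample comments for a single post.
comments = [
    "I love the new Plasma release, great work!",
    "Dolphin crashes when I open large folders.",
    "When is the next release?",
]
summary = summarise(comments)
print(summary)
```

The per-label counts in `summary` are the kind of numbers that can then be plotted per post. A word-list approach misses sarcasm and negation, so a trained model would likely do better on real comments.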
TODO
I am currently working on “Text classification using machine learning and natural language processing” to better understand which types of posts do well on social media.
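As a rough illustration of text classification, here is a small sketch of a multinomial Naive Bayes classifier with add-one smoothing, written with only the Python standard library. This is an assumption about one possible approach, not the model I will necessarily end up using. The `NaiveBayes` class, the `release` and `event` labels, and the training posts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of posts
        self.vocab = set()

    def fit(self, posts, labels):
        for text, label in zip(posts, labels):
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.label_counts[label] += 1
            self.vocab.update(words)

    def predict(self, text):
        words = tokenize(text)
        total_posts = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_posts in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of each word.
            score = math.log(n_posts / total_posts)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training posts with hypothetical post types.
posts = [
    "New Plasma version released with many fixes",
    "Krita released a new version today",
    "Join us at the sprint this weekend",
    "Register now for the Akademy conference",
]
labels = ["release", "release", "event", "event"]

model = NaiveBayes()
model.fit(posts, labels)
print(model.predict("Kdenlive released a new version"))
```

Once each post has a predicted type, the engagement numbers collected above can be grouped by type to see which kinds of posts perform best.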