Twitter Processing Software


Twitter was founded in March 2006 and launched in July 2006. The website allows users to share short bursts of information in the form of tweets, each limited to 140 characters. Researchers often collect tweets that are of particular interest to their work. Collecting and categorising these tweets manually can be a long and laborious process, so a software system to aid this process is required.


Our project group has been tasked with creating a piece of software that will assist in collecting, categorising and analysing tweets in an academic research environment. The user will first manually categorise and sub-categorise tweets from an initial user-defined sample. Our software will then perform linguistic analysis on these user-defined categories before automatically finding a set number of relevant examples. These examples will be selected by matching the linguistic features and content of the analysed data set against the data returned by a query of Twitter. The user then has the option of screening the results the program returns; the software will use this feedback, together with machine learning techniques, to refine the search parameters. This process is iterative and can be repeated as many times as the user desires. Once the user is confident in the program's ability to find relevant tweets, a bulk search is used to return tens of thousands of tweets, all matching the search criteria. This is the advantage of our software: in previous research environments, data sets have been limited by the massive time investment required to find and sort tweets manually.
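The screen-and-refine loop described above can be illustrated with a minimal sketch. It assumes a simple bag-of-words cosine similarity in place of the project's actual linguistic analysis and machine learning, which are not specified here. The function names (`rank_candidates`, `refine`), the `accept` callback standing in for the user's screening, and the in-memory `candidates` list standing in for the results of a Twitter query are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into word, hashtag and mention tokens."""
    return re.findall(r"[#@]?\w+", text.lower())


def build_profile(tweets):
    """Sum token counts over all tweets the user placed in one category."""
    profile = Counter()
    for tweet in tweets:
        profile.update(tokenize(tweet))
    return profile


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(labelled, candidates, top_n=3):
    """Return up to top_n candidates most similar to the labelled examples."""
    profile = build_profile(labelled)
    scored = [(cosine(profile, Counter(tokenize(c))), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Candidates sharing no tokens with the category are never suggested.
    return [c for score, c in scored[:top_n] if score > 0]


def refine(labelled, candidates, accept, rounds=2, top_n=3):
    """Repeat: suggest tweets, let the user screen them, grow the sample."""
    labelled = list(labelled)
    remaining = list(candidates)
    for _ in range(rounds):
        suggestions = rank_candidates(labelled, remaining, top_n)
        if not suggestions:
            break
        for tweet in suggestions:
            remaining.remove(tweet)
            if accept(tweet):  # hypothetical stand-in for user screening
                labelled.append(tweet)
    return labelled
```

Each accepted suggestion joins the labelled sample, so the category profile used for the next round reflects the user's screening decisions. A real system would replace the similarity measure with the project's own linguistic features and classifier.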

Once a substantial data set has been acquired through the bulk search, further linguistic analysis can be carried out. The software will be capable of producing tables and graphical representations of the data, showing the linguistic features of each category. Being able to compare categories side by side should help researchers recognise key linguistic patterns.
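As a rough illustration of a per-category feature table, the sketch below averages three surface features over the tweets in each category and prints them as plain text. The chosen features (word count, hashtags, mentions) and the function names are assumptions made for the example, not the feature set the project uses.

```python
def features(tweet):
    """Count a few surface features of a single tweet."""
    tokens = tweet.split()
    return {
        "words": len(tokens),
        "hashtags": sum(t.startswith("#") for t in tokens),
        "mentions": sum(t.startswith("@") for t in tokens),
    }


def summarise(categories):
    """Average each feature over the tweets in every non-empty category."""
    table = {}
    for name, tweets in categories.items():
        if not tweets:
            continue
        rows = [features(t) for t in tweets]
        table[name] = {key: sum(r[key] for r in rows) / len(rows) for key in rows[0]}
    return table


def format_table(table):
    """Render the averages as a fixed-width text table, one row per category."""
    columns = ["words", "hashtags", "mentions"]
    lines = ["category".ljust(12) + "".join(c.rjust(10) for c in columns)]
    for name, row in table.items():
        lines.append(name.ljust(12) + "".join(f"{row[c]:10.2f}" for c in columns))
    return "\n".join(lines)
```

Calling `format_table(summarise(categories))` on a dictionary that maps category names to lists of tweets gives one row per category, which is the kind of side-by-side view the paragraph above describes.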

All of the above constitutes a single project. The program supports the creation of multiple projects, all of which can be exported and imported for use in academic research. This feature gives the user the ability to easily share and verify findings with colleagues.
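Export and import could be as simple as serialising a project to a file that a colleague can load. The sketch below assumes a JSON file and a project represented as a plain dictionary; both the file format and the field names are hypothetical, since the project's real export format is not described here.

```python
import json


def export_project(project, path):
    """Write a project (a plain dict in this sketch) to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(project, handle, ensure_ascii=False, indent=2)


def import_project(path):
    """Load a project previously written by export_project."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Because the file is plain text, a colleague can inspect the categorised tweets directly as well as load them back into the program to verify the findings.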


  • Matthew Rzepka - Project Leader
  • Steven Frost - Technical Lead, Site Manager/Repository Master
  • James Goode - Editor
  • Lawrence Reed - UI Designer, Open Day "Producer"
  • Jie Gao - Quality Assurance Lead
  • John Wright-Dodd - Lead Researcher



  • Group Project Site - November 1 2013 4:30 pm
  • Interim reports - December 9 2013 4:30 pm
  • Final reports - April 4 2014 4:30 pm
  • Software - April 4 2014 4:30 pm
  • Open Day Stand - April 9 2014
  • Presentation - April 11 2014