Disease Outbreak Detection and Tracking Using Social Media with Machine Learning (ML) and Artificial Intelligence (AI){Abstract}

The present invention relates to the use of Machine Learning (ML) and Artificial Intelligence (AI) to identify and track disease outbreaks in all states of the USA using social media platforms. The invention would help the general public, governments, and healthcare institutions and organizations detect and prevent outbreaks in a timely and efficient manner. Extensive research was carried out to identify the major problems and algorithms used by researchers worldwide for detecting and tracking disease outbreaks. This work supports outbreak analytics on data from the Twitter platform. The major points to consider in outbreak detection are the following:
• Removing retweets
• Cleansing false information
• Classifying tweets as “Related” or “Not Related”
• Identifying an outbreak based on the collected information
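
As an illustration of the first point, retweet removal could be sketched as follows. This is a minimal sketch, not the invention's actual method: the tweet structure (a dict with a "text" field and an optional "is_retweet" flag) and the function name remove_retweets are assumptions made for the example.

```python
def remove_retweets(tweets):
    """Drop retweets and exact repeats, keeping the first copy of each text.

    Assumption: each tweet is a dict with a "text" field and, optionally,
    an "is_retweet" flag. These field names are hypothetical.
    """
    seen = set()
    kept = []
    for tweet in tweets:
        text = tweet["text"].strip()
        # Skip tweets flagged as retweets or written in the "RT @user" style.
        if tweet.get("is_retweet") or text.startswith("RT @"):
            continue
        # Normalize case and whitespace so trivial variants count as repeats.
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


sample = [
    {"text": "Flu is spreading in Ohio"},
    {"text": "RT @user: Flu is spreading in Ohio"},
    {"text": "flu is  spreading in ohio"},
    {"text": "Feeling fine", "is_retweet": True},
    {"text": "Bad cough today"},
]
kept_texts = [t["text"] for t in remove_retweets(sample)]
```

Here kept_texts retains only the first original tweet and the unrelated "Bad cough today" tweet.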

The first component, the Data Collector, is responsible for collecting data from external sources such as Twitter. It also performs text pre-processing (removing stop words and stemming). Once the refined data is extracted, the component hands control to its sub-component, which extracts useful information.
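
The pre-processing step described above might look like the following sketch. The stop-word list and the suffix-stripping rule are deliberately tiny stand-ins chosen for this example; a real system would more likely use a full stop-word list and an established stemmer such as the Porter stemmer. The names STOP_WORDS, naive_stem, and preprocess are hypothetical.

```python
import re

# Assumption: a very small illustrative stop-word list, not a complete one.
STOP_WORDS = {"a", "an", "the", "i", "is", "am", "are", "have", "has",
              "and", "or", "of", "in", "to", "my", "this"}

# Assumption: crude suffix stripping as a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, remove stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("I have the flu and coughing")
```

For the sample sentence, the stop words are dropped and "coughing" is reduced to "cough", leaving two tokens.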
The second component is the Data Processor. It processes the collected data at weekly intervals and extracts meaningful tweets related to an outbreak.
The third and most important component of our invention is the Predictor. It maintains the average percentage of tweets about each disease in every state, ignoring outliers.
We have deduced that, from week to week, this percentage remains constant for a given state. This component is also responsible for updating the average using the number of weeks that have passed. Outliers are ignored throughout the process. After every week, an intensity of disease activity is assigned to each state depending on the percentage of related tweets for a specific disease in that state.
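
One way to update an average from the number of weeks passed is an incremental mean, sketched below. The text does not say how outliers are identified, so the sketch leaves that decision to the caller through a hypothetical is_outlier flag; the function name update_average is likewise an assumption.

```python
def update_average(average, weeks, new_pct, is_outlier=False):
    """Fold one week's related-tweet percentage into a running average.

    average: current average percentage for one (disease, state) pair
    weeks:   number of weeks already included in that average
    new_pct: this week's percentage of related tweets
    is_outlier: hypothetical flag; how outliers are detected is not
                specified in the text, so the caller decides.
    """
    if is_outlier:
        # Outliers leave both the average and the week count unchanged.
        return average, weeks
    weeks += 1
    average += (new_pct - average) / weeks
    return average, weeks


avg, n = 0.0, 0
avg, n = update_average(avg, n, 2.0)
avg, n = update_average(avg, n, 4.0)
avg, n = update_average(avg, n, 50.0, is_outlier=True)  # ignored
avg, n = update_average(avg, n, 6.0)
```

After these four calls the average is 4.0 over 3 counted weeks, since the flagged week is skipped. Storing only the average and the week count avoids keeping the full weekly history for every state.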
The standard deviation for each disease, broken down by state, is maintained in the same way as the average. If the percentage of related tweets for a specific disease in a specific state is three standard deviations above the average, High intensity is assigned. Similarly, if it is two standard deviations above the average, Moderate intensity is assigned, and at one standard deviation above the average, Low intensity is assigned. Minimal intensity is assigned if the percentage is less than or equal to the average.
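
The thresholds above can be sketched as a small classification function. Two points are assumptions of this sketch rather than statements from the text: each level is treated as "at least that many standard deviations above the average", and values between the average and one standard deviation above it, which the text does not cover, are assigned Minimal. The function name classify_intensity is hypothetical.

```python
def classify_intensity(pct, average, std_dev):
    """Map a weekly related-tweet percentage to an intensity level.

    Assumption: thresholds are inclusive lower bounds.
    Assumption: values between the average and one standard deviation
    above it fall back to "Minimal", since the text leaves them undefined.
    """
    if pct >= average + 3 * std_dev:
        return "High"
    if pct >= average + 2 * std_dev:
        return "Moderate"
    if pct >= average + 1 * std_dev:
        return "Low"
    return "Minimal"


# Example baseline for one disease in one state (illustrative numbers only).
average, std_dev = 2.0, 0.5
levels = [classify_intensity(p, average, std_dev)
          for p in (3.6, 3.0, 2.5, 2.2, 2.0)]
```

With this baseline the cut-offs are 3.5, 3.0, and 2.5, so the five sample percentages map to High, Moderate, Low, Minimal, and Minimal.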

