The media play an important role in disseminating facts and knowledge to the public at critical times, and the COVID-19 pandemic is a good example of such a period. This study presents a comparative analysis of how pandemic-related topics are represented in the internet media of Kazakhstan and the Russian Federation. The main goal is to propose a method for analyzing the correlation between dynamic indicators of the mass media and World Health Organization COVID-19 data. To this end, three approaches to representing mass media dynamics in numerical form (automatically obtained topics, average sentiment, and dynamic indicators) were proposed and applied according to a manually selected list of search queries.
The results of the analysis indicate similarities and differences in the ways in which the epidemiological situation is reflected in publications in Russia and in Kazakhstan. In particular, publication activity in both countries correlates with absolute indicators, such as the daily number of new infections and the daily number of deaths. However, mass media tend to ignore the positive rate of confirmed cases and the virus reproduction rate.
With respect to the strictness of quarantine measures, mass media in Russia show a rather high correlation, while in Kazakhstan the correlation is much lower. Analysis of search queries revealed that in Kazakhstan the problem of fake news and disinformation is more acute during periods of deterioration of the epidemiological situation, when the levels of crime and poverty increase. The novelty of this work is the proposal and implementation of a method for the comparative analysis of objective COVID-19 statistics and several mass media indicators. In addition, this is the first time that such a cross-country comparative analysis has been performed on a corpus in a language other than English.
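The correlation analysis summarized above can be illustrated with a minimal sketch. It assumes that a media indicator (here, a daily publication count) and an epidemiological indicator (daily new cases) are available as aligned daily series, and it uses the Pearson coefficient as the correlation measure. The choice of Pearson, the function name `pearson`, and the variable names are assumptions made for illustration only. The numbers are synthetic placeholders, not data from this study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Synthetic placeholder values (hypothetical, NOT taken from the study).
daily_publications = [12, 15, 21, 30, 28, 35, 40]
daily_new_cases = [100, 130, 190, 260, 250, 310, 380]

r = pearson(daily_publications, daily_new_cases)
print(f"Pearson r = {r:.3f}")
```

In practice, the same function could be applied to each pair of media indicator and epidemiological indicator per country, which would give a table of coefficients that can be compared between Kazakhstan and Russia.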
COVID-19 has highlighted the relative inefficiency and low productivity of the health sector, which in turn have contributed to increased social tension and a steady decline in economic growth in most countries during the pandemic. The healthcare system can be considered one of the main factors determining the sustainable growth of welfare in many countries, including Kazakhstan.
However, healthcare systems in Kazakhstan and throughout the world face multiple problems, which cause an increased demand for health services, high public expectations, and higher expenses. Not only economic but also social and medical efficiency is important in the healthcare system; “medical measures of therapeutic and preventive nature may be economically unprofitable, but medical and social effects require them”. According to the authors of, a fundamental transformation of healthcare systems, based on Artificial Intelligence (AI) technology, is necessary. The economic impact of AI on healthcare in Europe is estimated at 200 billion euros. The effect is associated with savings in time and an increase in the number of lives saved.
One of the technologies related to AI is Natural Language Processing (NLP), which uses machine learning techniques to process natural language texts and speech; it is used in healthcare to extract information from clinical records, to process speech messages, and to create question answering systems. NLP methods can be used not only to address direct healthcare objectives but also to assess how the mass media (hereafter, the media) reflect the public health situation during the pandemic. Mass media and social networks have a substantial influence on the informational environment of society. Nowadays, the media not only act as a source of information on current events, but often shape the information agenda and form the discourse on socially important topics. Inadequate presentation of health authorities in the media may contribute to the spread of rumors and misinformation and affect the mental health of the population. Topic modeling in combination with sentiment analysis is often used to evaluate media texts.