All HSE students are taught data analysis. What does this mean?

The Data Culture project at HSE is celebrating its first anniversary! Its key concept is that all students should possess at least basic competencies in data analysis, since data skills are increasingly becoming an entry-level requirement for professionals in almost every field. More than half of all HSE programmes were involved in the project in its first year. In the following academic year, the project is being expanded to cover all programmes and all students. Let’s have a closer look at the project’s profile and try to see the ways students can benefit.

What is data analysis?

Data analysis is a large cross-disciplinary field, somewhere between mathematics and computer science, that focuses on deriving knowledge from datasets. Such datasets may include bank transactions, large collections of images, social media posts, news stories from the last 10 years, etc. The information derived from this data also varies. For example, transaction histories can help us understand which banking product (bank card, credit or insurance plan) most clients are likely to find attractive, while datasets of news stories can help us pinpoint the most popular topics of discussion over a certain period or visualize how their popularity changed over time. These results can be obtained via standard data processing (e.g., calculating average values, deviations and correlations, and drawing various diagrams), as well as via advanced machine learning methods, which can help identify consistent patterns in datasets.

What does the project cover?

From the earliest stages of the project’s design, HSE has proceeded on the assumption that in the 21st century, data skills are becoming basic competencies for anyone with a quality academic background. The popularity of the Minor in Data Mining among HSE students speaks in favour of this idea: this year, there were 5 applicants per place. It is important to remember that, at HSE, students cannot sign up for a Minor in the same field as their degree programme. The popularity of this Minor therefore shows that data skills are in high demand among students pursuing programmes in other fields.

In the project’s first year, 26 out of 39 programmes were involved, including programmes in the humanities, economics and international relations.

The set of courses offered under the Data Culture project, as well as their complexity and duration, depends on the “level” of each specific degree programme. Each programme was assigned a complexity level based on the data literacy students are expected to possess once they complete the required pool of Data Culture courses. There are four complexity levels: elementary, basic, advanced and professional.

Students of the Faculty of Humanities, for instance, are expected to attain an elementary level in data skills, i.e., they are only required to pass one compulsory course, “Digital Literacy”, which lasts one semester. Students pursuing programmes in economics, by contrast, are expected to possess a basic level in data science by the time they graduate. Therefore, during their first two years at HSE, such students take courses in mathematical analysis, linear algebra, probability theory and mathematical statistics, econometrics and introduction to Python programming. If an economics student wishes to add more complex data science courses to their curriculum, they can do so by signing up for a Minor, elective courses or Bachelor+ programmes, or by taking an online course.

Programmes offered by the Faculty of Computer Science and MIEM, as well as BAs in Business Informatics and Fundamental and Computational Linguistics, are a special case. These programmes, for the most part, don’t have additional specialized courses in data analysis, since their students are expected to attain an advanced or even professional level in data science and machine learning as per their actual curriculum.

Let’s have a closer look at three cases and three levels of data skills.

Undergraduate programme in International Relations, compulsory course 'Basics of Data Analysis in International Relations', complexity level — basic

Margarita Burova, Lecturer

The Internet and other advanced technologies have given us access to huge amounts of information. Today, more than 60% of research projects in international relations rely on data analysis, and I believe that this share will continue to grow in the future.

Data analysis is essential not only in research, but also in consulting, foreign policy-making, and the development of cross-regional business strategies. Recent research findings suggest that data science can help us come up with more accurate forecasts for important political events such as revolts, violent outbreaks, and mass demonstrations, if we study entries in search engines, posts in social media, or online shopping data. So, if a graduate in international relations is planning to work in international political and/or economic analysis, or make forecasts, they are very likely to need the skills acquired from their data culture courses at HSE.

Upon completion of this course, a student in political science should at least have a general understanding of the key approaches to data analysis and its possible applications. Even if their future job has little to do with actual datasets, graduates should be able to communicate with analysts more effectively, because they will be capable of clearly formulating a given task in a language that analysts would understand. On top of that, they must possess at least basic skills in standard data analysis.

Regina Mustafina, second-year student

To be honest, this course feels different from all our other courses, which mostly concern the humanities. Here, we learn to look at information from a completely different perspective, in a more technical way, if I may say so. We now know the basics of Python programming, which, as it turns out, makes things much easier and is very helpful in systematizing information and automatically calculating certain indicators such as the average or median value. Finding such values manually for a 2000-line table, for instance, is quite a complex and time-consuming process, whereas with Python, it’s fast and easy. Another useful feature of Python is that it allows us to track correlations between indicators, identify factors that can influence specific parameters, and visualize correlation matrices and regression models.
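As a rough illustration of the kind of calculation Regina describes, the sketch below computes an average, a median and a correlation for a small table. The rows and column meanings are invented for the example (they are not course data), and the helper names `column` and `pearson` are hypothetical.

```python
from statistics import mean, median

# Invented example table: (country label, indicator 1, indicator 2).
# A real table might have 2000 rows; the code would stay the same.
ROWS = [
    ("A", 1.0, 10.0),
    ("B", 2.0, 20.0),
    ("C", 3.0, 30.0),
    ("D", 6.0, 60.0),
]


def column(rows, index):
    """Return one column of the table as a list of numbers."""
    return [row[index] for row in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5


first = column(ROWS, 1)
second = column(ROWS, 2)

print("average:", mean(first))
print("median:", median(first))
print("correlation:", pearson(first, second))
```

In this made-up table the second indicator is exactly ten times the first, so the correlation comes out at (approximately) 1; real indicators would rarely line up so neatly.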

In addition to Python-based data analysis, we have also learned different ways of visualizing our research results: tables, diagrams, bar charts, box plots, etc. A lot of attention was paid to text analysis, and now we can quickly carry out a high-quality analysis of speeches by political figures and determine whether their wording signals aggression, or cooperation and development, etc.

One of our course projects was to study how the political affiliations of US states have evolved. We built a separate diagram for each state showing all the trends we needed to study. Without the knowledge and skills we acquired during the course, it would have taken us ages just to process the numerous rows and columns of data.

To summarize, in my opinion, this course is extremely useful. Unlike mathematics, which is of course ‘good for your brain’ but still a bit theoretical, data analysis and programming can be easily applied in real-life situations.

What is Python? Python is a programming language, which has become a standard tool for performing data analysis. It’s relatively easy to use, and, therefore, it is normally the first language that entry-level programmers learn. Python is quite popular in data analysis, since it offers user-friendly tools for extracting data from the Internet (web crawling and parsing), processing data (tables and visualization) and machine learning (from simple models to up-to-date neural networks).

Undergraduate programme 'Economics and Statistics', elective course 'Web Data Extraction and Analysis'. Complexity level — advanced

Ekaterina Denike, Lecturer

It is my strong belief that literacy in Python or another programming language, as well as the ability to work with big datasets, is becoming a common prerequisite, just like speaking English, for example. I had two key goals in mind while preparing the course syllabus:

1) providing students with insights into the latest and most ‘fashionable’ trends in data analysis;

2) making sure that all students who signed up for the course, even those who are a bit scared of the very idea of programming and reluctant to engage with it (as I once was myself), would not only dig deeper into problem solving and coding, but also grow to like it.

This is why our classes mostly rely on hands-on assignments and direct interaction. Firstly, we study the main tools used in programming. Then, we cover the instruments used in data analysis and web data extraction. Along the way, we also reflect on the available resources for data analysis and machine learning, as well as consider essential libraries and interesting articles.

Polina Kazinina, second-year student

Before signing up for the course ‘Web Data Extraction and Analysis’, we were required to study the fundamentals of Python programming on our own. One of the options was to take an online course offered by HSE on Coursera. This course covers a lot of ground, but it’s rather easy to follow. Moreover, it is a good starting point for further studies.

We continued to study programming in more detail when taking the course in web data extraction and analysis. We learned about numerous new functions, which allow for the retrieval and analysis of all kinds of data.

We had to do some quite unusual assignments for this course. For example, we had to analyze prices for co-working space in Moscow. Our task was not only to extract data from co-working websites, but also to perform ‘parsing’, i.e. to break down all the textual information we had collected into smaller blocks. Then, we studied how various characteristics of a co-working space (e.g., location and total square metres) correlated with its price. Another project was to analyze the popularity of an Instagram post, which I found quite amusing. It was interesting not only in terms of the overall idea, but also in terms of technology, as we learned about many new functions for retrieving social media data. For example, in addition to extracting all information from a specific profile, we could also analyze the cities and countries where, and the times of day when, people hit the ‘like’ button most often.
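The ‘parsing’ step Polina mentions can be sketched roughly as follows. This is only an assumption about what such an exercise might look like: the listing strings, their format and the names `LISTING_PATTERN` and `parse_listing` are all invented for illustration, and no real website is queried.

```python
import re

# Invented listing strings, standing in for text collected from
# co-working websites. The format is an assumption for this sketch.
RAW_LISTINGS = [
    "Location: Centre; Area: 100 sq m; Price: 50000 RUB",
    "Location: North; Area: 200 sq m; Price: 80000 RUB",
    "Location: South; Area: 300 sq m; Price: 110000 RUB",
]

# One named group per block of information we want to pull out.
LISTING_PATTERN = re.compile(
    r"Location:\s*(?P<location>[^;]+);\s*"
    r"Area:\s*(?P<area>\d+)\s*sq m;\s*"
    r"Price:\s*(?P<price>\d+)\s*RUB"
)


def parse_listing(text):
    """Split one listing string into named fields, or return None."""
    match = LISTING_PATTERN.search(text)
    if match is None:
        return None
    return {
        "location": match.group("location").strip(),
        "area": int(match.group("area")),
        "price": int(match.group("price")),
    }


parsed = [parse_listing(text) for text in RAW_LISTINGS]
for row in parsed:
    # Price per square metre is one simple derived characteristic.
    print(row["location"], row["price"] / row["area"])
```

Once the text is broken into numeric fields such as area and price, those columns can be compared with each other, for example by computing a correlation.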

Faculty of Humanities, compulsory course 'Digital Literacy'. Complexity level — elementary

Anastasia Bonch-Osmolovskaya, Course Author and Lecturer

Entry-level courses do not require any prior knowledge in mathematics or programming. We do realize that our students opted for academic paths other than data science when choosing a degree programme. So, we use a lot of simple and down-to-earth examples, as well as focus on very specific instruments for solving problems that students may encounter in their research projects.

Furthermore, we try to give students an in-depth understanding of the basics: what machine learning is, what its possible applications and limitations are, how to define ‘open data’, etc. We would like them to be capable of using a range of readily available and powerful instruments (the list can vary depending on a programme’s profile) for corpus and network analysis, visualization, table data processing (e.g., additional Excel features), etc.

Students in cultural studies and art, for example, learn how images work and how to edit video files, i.e. what the latest software has to offer in terms of sound editing, cutting, overlay, and various other effects. Their hands-on projects include creating their own videos out of existing content. Students will definitely need those skills to edit their field trip materials. They also have an opportunity to see what it’s like to be a website developer: they learn what a website interface is and what its stages of development are, and then analyze the user experience of popular museums’ websites.

Students in history have an opportunity to study optical character recognition, which is a part of computer vision and is used for processing manuscripts. Philologists, on the other hand, get to focus on corpora of poetry and learn how to compare an author’s style across different periods, etc.

Alisa Uryupina, first-year student

Our course in digital literacy included both lectures and seminars. During the lectures, we were given rather basic information, which was nonetheless very interesting, since it was our first encounter with digital tools. It’s true that historians have always needed to work with tables and big sets of data, but it’s good that today we can rely on various digital tools. You can use them even if you are working with birch-bark manuscripts. For example, if parts of the text in a birch-bark letter are illegible and you can’t figure out the words, you can do a birch-bark glossary search and get a list of possible words.
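The idea behind such a glossary search can be sketched in a few lines of Python. This is not how any actual birch-bark glossary tool works; it is a minimal illustration that assumes illegible letters are marked with `?`, and the short transliterated word list and the function name `candidates` are made up for the example.

```python
import re

# A tiny stand-in glossary of transliterated words (an assumption for
# this sketch; a real glossary would be far larger).
GLOSSARY = ["gorod", "gospodin", "gramota", "grivna", "poklon"]


def candidates(damaged_word, glossary, unknown="?"):
    """Return glossary words that fit a partly illegible word.

    Each `unknown` character stands for exactly one unreadable letter.
    """
    pattern = "".join(
        "." if ch == unknown else re.escape(ch) for ch in damaged_word
    )
    regex = re.compile(pattern)
    return [word for word in glossary if regex.fullmatch(word)]


# Two letters of a five-letter word cannot be read.
print(candidates("g?r?d", GLOSSARY))
# Only the first two letters of a seven-letter word survive.
print(candidates("gr?????", GLOSSARY))
```

The search narrows the possibilities by word length and by the letters that are still legible; choosing between the remaining candidates is left to the historian.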

Anastasia Bonch-Osmolovskaya told us about projects related to Leo Tolstoy (“Living Pages” (Zhivye Stranitsy) and Tolstoy Digital) and the Russian language corpus. Some students, including myself, have known about this corpus since high school, but others only heard about it for the first time while attending the course, which is good, as it’s a very useful instrument. Our lectures inspired me to join Professor Bonch-Osmolovskaya’s project in the digital humanities.

In fact, we didn’t have any classes in actual programming. We were only taught how to use the existing tools and the methods applicable for our field. We got hands-on experience in these methods during the seminars. For example, we learned how to use GitHub and how to go about visualizing data, i.e. create tag clouds and graphs.

I think this course is invaluable, since it helped bring us to a new level in digital literacy. We are not old-school historians anymore: we can go beyond scrutinizing books in libraries, as we can effectively employ the data skills we’ve acquired. Moreover, my belief is that data skills are now a prerequisite in any field.

Within one to two months of a course’s start date, the project team asks students to take part in a survey to identify any flaws in the syllabus. Students complete a feedback form and can also meet with the organizers to discuss what they like about the course and what could be improved.

With respect to the course in digital literacy, such surveys have shown that students lack a clear understanding of its key goals and how it can fit with courses in their curriculum. These surveys also uncovered several technical issues. For instance, the software required for doing homework was sometimes not so easy to set up on a home computer. All of these issues have been given due consideration, and now the course syllabus is being upgraded, along with its goals and the list of necessary competencies. In the future, acquired competencies will be brought strictly in line with a student’s main academic field.

In the following academic year, the Data Culture project will be expanded to cover new undergraduate programmes offered by the HSE Faculty of Humanities: ‘Biblical Studies and History of Ancient Israel’, ‘Indian Languages and Literature’, and ‘Language and Literature of Iran’.

Mikhail Seleznev, Academic Supervisor of the undergraduate programme 'Biblical Studies and History of Ancient Israel'

My colleagues and I are interested in IT mostly because of the role it plays in language processing and language corpora. Since we focus on Biblical Studies, the key points of our agenda include the languages of classical texts, which is also true for many other fields in Classical Asian Studies. Our students primarily learn to read and analyze classical texts and prepare commentaries. Here, we can’t do without corpus analysis, especially when we work with languages that are no longer actively spoken, like Ancient Hebrew and Ancient Greek. It’s much easier with modern languages. If you have a question about the usage of a word or a phrase, you can always contact a native speaker. But what will you do if all the native speakers of a given language died a long time ago? In such cases, the unusual aspects of a word’s meaning and usage can be uncovered through contextual analysis: you may find that a certain word is only used in poetry, whereas another word might be more typical of translated texts...
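A very simplified version of this kind of contextual analysis might look like the sketch below. It assumes a corpus in which every passage is already labelled with a genre; the English sample sentences, the genre labels and the function names `genre_distribution` and `exclusive_to` are invented for illustration and do not come from any real corpus.

```python
from collections import Counter, defaultdict

# Invented mini-corpus: (genre label, passage text).
CORPUS = [
    ("poetry", "the light shines upon the waters"),
    ("poetry", "upon the hills the light rests"),
    ("prose", "the king went to the city"),
    ("prose", "the people went up to the hills"),
]


def genre_distribution(corpus):
    """Count, for every word, how often it occurs in each genre."""
    counts = defaultdict(Counter)
    for genre, text in corpus:
        for token in text.lower().split():
            counts[token][genre] += 1
    return counts


def exclusive_to(counts, genre, min_count=2):
    """Words seen at least `min_count` times and only in `genre`."""
    return sorted(
        word
        for word, by_genre in counts.items()
        if by_genre[genre] >= min_count and set(by_genre) == {genre}
    )


counts = genre_distribution(CORPUS)
print(exclusive_to(counts, "poetry"))
print(exclusive_to(counts, "prose"))
```

With a real corpus, the minimum count would need to be much higher before a word could reasonably be called typical of one genre, and the passages would need proper tokenization rather than a simple split on spaces.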

The importance of IT in humanities goes far beyond linguistics, of course. For instance, both historians and archaeologists rely on computer modelling. Today, we can’t even guess where the paths of humanities and data analysis will cross in 5 years’ time.

What’s next?

According to the head of the project, Olga Podolskaya, the team now plans to improve the content of the data culture courses based on the experience gained so far, as well as to formulate clear prerequisites for more complex courses. For example, literacy in Python programming and mathematics will be required in order to sign up for certain courses. Another plan is to develop and launch new courses aimed at enhancing students’ data skills.

New learning formats will be added, such as project work (a new undergraduate programme, ‘HSE and Kyung Hee University Double Degree Programme in Economics and Politics in Asia’, has already incorporated this format in its curriculum) and online sessions in place of in-class lectures, so that students have more time for seminars and tutorials. Each degree programme will also be assigned a consultant, i.e. a specialist in data analysis in a specific field of study, who can provide students with professional guidance and advice on their research and project work.