Data Science and AI: Trends Analysis

The primary goal of this study is to analyse the growth of data science through the main search trends. We first define, at a high level, the concept of data science and its main components. Based on those elements, we identified the main trends. We mainly used data from Google Trends to determine the evolution of searches by topic, research area, or simple expression. This allowed us to observe that interest in artificial intelligence (AI) declined until 2012 and that the field has become increasingly popular since 2014. We attribute this to the progress of machine learning and data science. Results show a cumulative increase in searches for data science since 2012.


I. INTRODUCTION
Data science is one of the main topics in computer science. The increasing amount of available data has demanded the use of statistics from a new perspective. Data analysis is supported not only by statistical techniques but also by new computing power [1]. In this sense, machine learning has also grown in importance, leading to a resurgence of widespread interest in artificial intelligence (AI). This leads to several questions: What is the future of data science? What subjects may be related to it? What is expected for artificial intelligence in the future? Since these questions cannot be answered directly, we reformulated them as the following research question: What are the main trends related to Artificial Intelligence and Data Science?
To conduct this study, we analysed the main search trends of concepts related to data science and artificial intelligence, such as mathematics, computer science and management. Afterwards, it was also relevant to analyse work developed in the areas of machine learning and software development. Specifically, concerning software development, the study investigates the context of usage and the technologies that have been used.
It was possible to observe, on the one hand, the growth of languages related to data science and, on the other, the diminishing relevance of languages used only for software development.

II. DEFINING DATA SCIENCE
According to Provost & Fawcett [2], data science is a set of fundamental principles that support and guide the extraction of information and knowledge from data. According to Dhar [3, p. 64], data science implies a "focus involving data and, by extension, statistics, or the systematic study of the organisation, properties, and analysis of data and its role in inference, including our confidence in the inference." Data science comprises the intersection of several fields of knowledge: data science = {statistics ∩ informatics ∩ computing ∩ communication ∩ sociology ∩ management | data ∩ domain ∩ thinking} [4]. Other researchers emphasise the importance of machine learning and its impact on business [1], [3], [4]. Granville [5, p. 73] argues that data scientists are "not statisticians, nor data analysts, nor computer scientists, nor software engineers, nor business analysts. They have some knowledge in each of these areas but also some outside of these areas." This is because the work draws on several tools and bodies of knowledge that are not purely software development, mathematics, or statistics. Data science comprises the implementation and use of algorithms, as well as other robust techniques for producing predictions to be applied in organisational or societal contexts. Chatfield et al. [6] present an overview of the most common attributes of data scientists, as presented in Table 1.
According to Table 1, many of the authors agree that business domain knowledge is an important attribute that a data scientist should have [2], [7]-[13]. The ability to derive valuable insights, scientific computing skills, and effective communication skills are also among the most important attributes in data science [3], [5]-[9], [11], [12], [14], [15]. Other attributes, such as statistical modelling knowledge, data visualisation, mathematics, data management, artificial intelligence knowledge, machine learning, analytical traits, and curiosity, are also frequently mentioned in the literature.
Table 1. Data Scientists attributes according to several studies [6]

As a starting point, we may define data science as the intersection of three basic fields: Computer Science and IT (CS), Domain/Business Knowledge (BK) and Mathematics and Statistics (MS). In fact, it is possible to identify Traditional Research = {BK ∩ MS}, Software Development = {CS ∩ BK}, Machine Learning = {CS ∩ MS} and, obviously (but simplifying), Data Science = {BK ∩ MS ∩ CS} (Figure 1).

Figure 1. Data Science = {BK∩MS∩CS}

Computer science is a field that includes hardware, programming, artificial intelligence, databases, and networks. The main application of computer science is software development [16], [17].
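The set definitions of the basic fields given above can be illustrated with ordinary set operations. The following is a minimal sketch, not part of the original study: the activity names assigned to BK, MS and CS are hypothetical placeholders chosen only to make the intersections non-empty.

```python
# Hypothetical activities assigned to each basic field (illustrative only,
# not taken from the study).
BK = {"survey analysis", "billing system", "churn prediction"}      # Domain/Business Knowledge
MS = {"survey analysis", "regression solver", "churn prediction"}   # Mathematics and Statistics
CS = {"billing system", "regression solver", "churn prediction"}    # Computer Science and IT

# Pairwise intersections, following the definitions in the text.
traditional_research = BK & MS
software_development = CS & BK
machine_learning = CS & MS

# Data science is the intersection of all three fields.
data_science = BK & MS & CS

print("Traditional research:", sorted(traditional_research))
print("Software development:", sorted(software_development))
print("Machine learning:", sorted(machine_learning))
print("Data science:", sorted(data_science))
```

Under these definitions, data science is necessarily a subset of each pairwise intersection, which is the simplification the text acknowledges.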

Mathematics is a very relevant field of knowledge, supporting commerce, engineering and science. Statistics, specifically, is a field of increasing usage. From less computational areas, such as data visualisation and descriptive statistics, to the more sophisticated areas of data analysis, the abundance of data opens new possibilities [3], [15], [18].
Machine learning is a subset of artificial intelligence. It is an approach in which computer systems perform a specific task without using explicit instructions, relying instead on patterns and inference. To obtain this result, it draws on the scientific study of algorithms and statistical models. In practice, it is a connection between computing and statistics [19], [20].
Domain knowledge includes a broad context of usage of computing, and this usage is becoming broader and broader. Although it was initially related mainly to scientific computing, computing became popular when it entered business, supporting accountancy and statistics. Since then, computing has been used in more and more fields [7], [8].
Data science comprises several roles centred on data; those roles entail data analysis and analytics capabilities and competencies [4], [21]. Cao [21] defines two data analytics Eras. Era 1 is characterised by the use of explicit analytics with descriptive purposes, such as reporting, alerting, and forecasting. In this first Era, the primary objective is to frame what we know that we do not know (the known unknowns) by processing data with a low degree of complexity and intelligence, providing at the same time moderate value creation. Era 2 is characterised by the practice of implicit analytics, providing predictive and prescriptive data analytics. In Era 2, the primary objective is to extract knowledge for a better understanding of why things happen, how they happen, and whether they will happen. In this Era the level of visibility is low, and therefore processes and tools provide a higher level of intelligence and support smart decisions.
AI is powered by the rise of data science, as autonomous and intelligent systems need quality data to train and develop continuously. However, we still have a long way to go in perfecting AI, and people need to understand what systems can and cannot do, and what the consequences are for organisations and society [20]. AI poses several challenges to humankind; one of them is recognising that thinking machines often know more than we do, which is a momentous encounter [20]. Ethical issues also arise with the use of AI in several areas: in human resources decision processes; when an algorithm decides the bank credit rate of a house loan based on gender or race; and in autonomous vehicles, when redundant systems fail and the algorithm decides who is going to be killed [22]. Brooks [23] describes seven deadly sins of AI predictions:
1. Overestimating the consequences of technology in the near future and underestimating them in the long run;
2. Imagining magic, as some arguments may state that future technology is magic when it is science;
3. Performance vs competence: today's robots are usually very narrow in what they can do compared to humans;
4. Suitcase words: people tend to think robots think the way humans do;
5. Exponentials: people often think that the autonomous learning process is exponential, but exponentials can collapse when a physical limit is reached;
6. Hollywood scenarios: changes in autonomous and intelligent systems in the real world may not appear as seen in science fiction movies;
7. The speed of deployment: it is important to notice that transformations may not occur as fast as we expect them to.
In this context, the goal of the present study is to understand the widespread interest in AI and to identify the dimensions related to AI. In the next section, our goal is to answer the following question: What are the main trends related to Artificial Intelligence and Data Science? To answer this question, we began by defining, at a high level, the concept of data science and its main components. Based on those elements, we identified the main trends. We mainly used data from Google Trends to determine the evolution of searches by topic, research area, or simple expression. A score of 0 means there was not enough data for the term. Figure 2 shows the statistics related to the fields of study of mathematics and computer science and the academic discipline of management. As shown there, interest in mathematics has increased.

IV. EVOLUTION OF INTEREST IN DATA SCIENCE
In the last 15 years, artificial intelligence first decreased and then increased in popularity. If we analyse searches by term on Google, we obtain a U-shaped curve: its prevalence fell until 2012, and its importance has increased mainly since 2014. We may try to find the reason by examining searches for other topics, such as expert systems, data science or machine learning. There is a reduction of interest in topics like "expert system", but a substantial increase in topics like data science and machine learning. It is also interesting to note that the relative importance changes geographically. A specific analysis of several technologies confirms this idea. JavaScript stabilised, while other languages and technologies more closely related to the server side declined. Analysing only search trends for languages, we verify that searches for Java and C# have declined. R has a stable trend. JavaScript shows a U shape, while Python [24] shows an increase in searches.
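The U-shaped pattern described above can be checked mechanically. The following is a minimal sketch, assuming yearly interest scores are already available as a mapping from year to value. The numbers are made-up placeholders, not Google Trends data, and `find_trough`, `is_u_shaped` and `min_drop` are hypothetical names introduced only for illustration.

```python
from typing import Mapping


def find_trough(interest_by_year: Mapping[int, float]) -> int:
    """Return the year with the lowest interest score (first one on ties)."""
    return min(interest_by_year, key=lambda year: interest_by_year[year])


def is_u_shaped(interest_by_year: Mapping[int, float], min_drop: float = 10.0) -> bool:
    """Rough U-shape test: the trough lies strictly inside the period, and both
    the first and the last observation exceed it by at least `min_drop` points."""
    years = sorted(interest_by_year)
    trough_year = find_trough(interest_by_year)
    low = interest_by_year[trough_year]
    trough_is_interior = years[0] < trough_year < years[-1]
    starts_high = interest_by_year[years[0]] - low >= min_drop
    ends_high = interest_by_year[years[-1]] - low >= min_drop
    return trough_is_interior and starts_high and ends_high


# Placeholder scores on a 0-100 scale (assumed values, for illustration only).
example = {2004: 100, 2006: 70, 2008: 50, 2010: 40, 2012: 35, 2014: 40, 2016: 60, 2018: 90}

print("Trough year:", find_trough(example))
print("U-shaped:", is_u_shaped(example))
```

The threshold `min_drop` is an arbitrary choice; a real analysis would probably smooth the series first, since monthly search data is noisy.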

VII. CONCLUSIONS
According to the literature review, it was possible to define data science as the intersection of three large fields: computer science and artificial intelligence, mathematics and statistics, and domain knowledge. It was possible to identify that this area is growing, although interest in its foundational areas seems to be decreasing. For example, interest in computer science is apparently declining. However, it was also possible to identify that programming languages related to data science are increasingly used, as has already been stated in several other papers. Meanwhile, languages and technologies related only to software development are becoming less significant. It is also important to note that mobile development is becoming more important. This suggests that those devices will become either sensors, obtaining data from users, or actuators, communicating with users. Servers tend to be mainly repositories and, eventually, additional processing capacity.
2019 14th Iberian Conference on Information Systems and Technologies (CISTI), 19-22 June 2019, Coimbra, Portugal. ISBN: 978-989-98434-9-3

TO MATHEMATICS, COMPUTER SCIENCE AND DOMAIN KNOWLEDGE
To analyse the trends, we began by identifying the fields of mathematics, computer science and domain knowledge. Domain knowledge posed a difficulty. Since management is becoming a transversal concept and probably represents what we call domain knowledge, it was considered as a possible concept to analyse. The following graph shows the data produced by Google Trends. Numbers represent search interest. Values are relative to the highest point on the chart, worldwide. A value of 100 is the peak popularity for the term. A value of 25 means that the term is 25% as popular as at its peak.
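The relative scale described above can be reproduced with a simple rescaling. The following is a minimal sketch, assuming hypothetical raw search counts per period. It only mirrors the scaling stated in the text (peak = 100) and does not claim to reproduce how Google Trends computes its values internally; `to_relative_interest` and the sample counts are illustrative assumptions.

```python
from typing import Dict, Mapping


def to_relative_interest(raw_counts: Mapping[str, float]) -> Dict[str, int]:
    """Rescale raw counts so that the highest value becomes 100 and every
    other value is expressed as a rounded percentage of that peak."""
    peak = max(raw_counts.values())
    if peak <= 0:
        raise ValueError("at least one count must be positive")
    return {period: round(100 * count / peak) for period, count in raw_counts.items()}


# Hypothetical raw counts per year (assumed values, not real search volumes).
raw = {"2016": 200, "2017": 400, "2018": 100}

relative = to_relative_interest(raw)
print(relative)  # {'2016': 50, '2017': 100, '2018': 25}
```

Here 2017 is the peak and receives 100, while 2018 receives 25 because its count is a quarter of the peak.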

Figure 2. Management, mathematics and computer science

Figure 3. Importance in geographic terms

Figure 6. Search trends comparing Python with Java and C#

Figure 7. Server-side field vs front-end related searches

Figure 8. Geographic importance of search on Google

Figure 9. Technologies related to software development

Figure 10. Technologies related to software development, geographic perspective

Figure 11. Technologies related to software development

Figure 12. Search trends comparing web development and mobile app development

Table 2 presents the main key terms in data science, based on Cao's definition and classification [21].

Table 2. Data Science key terms