M2 Choice – Statistics and Econometrics

Current Student – Joseph Agossa

Which aspects of your chosen program were the most challenging?

Among the courses I chose this year, the most challenging for me was Mathematics of deep learning algorithms. Deep learning knowledge can be described in terms of four distinct aspects:

  • Knowledge of multiple models and multiple viewpoints of the domain.
  • Knowledge about the relations between different models and viewpoints.
  • Knowledge of reasoning procedures to solve quantitative and qualitative problems.
  • Knowledge of first principles and knowledge to reason on their basis in order to solve novel or unfamiliar problems.

Deep learning algorithms can be successfully applied to big data for knowledge discovery, knowledge application, and knowledge-based prediction. In other words, deep learning can be a powerful engine for producing actionable results.

Which were your favourite courses, and why?

My favourite courses were Survey Sampling and Time Series because they are very useful and applicable to real-life cases. My favourite part of being in a master in Statistics and Econometrics was being challenged by professors with interesting problems, especially the real applications of Time Series and the Survey Sampling projects.

What do you plan to do next?

I will start my internship on April 6, 2020 at the international company IQVIA France in Paris.

I will work as an Economic Statistician in the Real-World Solutions (RWS) Department of IQVIA France, which brings together a team of 100 multidisciplinary and highly qualified consultants in market access, real-life studies, health economics and epidemiology. My plan after graduation is to find a job as a Data Scientist in Paris or Washington.

TSE Alumni Article – Maguelonne Jarczak, Data Analyst at Airbus

What is your position today?

I am working as a Data Analyst at Airbus. In October I joined the Airframe Data Analytics (ADA) team, which is in charge of supporting the deployment of data analytics solutions for Airframe engineering. The airframe is the mechanical structure of an aircraft.

I am part of a self-organised transnational team of six people. Our mission is to build the transverse reference framework of data analytics methods and data models for the Airframe community. The ADA team delivers transverse activities and projects in four fields: data exposure, data semantics, data services and data analytics products.

For instance, I have projects on composite materials. Structural materials used on the airframe are tested to ensure the right level of performance and the compliance of the raw material with the product specification. I use a data analytics approach to identify any possibility of reducing the level of testing while keeping the same level of material quality. I also have a big project to predict gaps and overlaps when a nacelle is assembled on an engine.

We are the reference team in data analytics for Airframe engineering. This implies a high level of involvement in the analytics network: leading the network, sharing best practices, and taking part in marketplaces and communication events.

As part of a self-organised team, we are responsible for organising the team ourselves. I have recently been involved in a recruitment process. It was fun to be on the other side!

I recently had the opportunity to become a focal point for eSelf, a community aiming to develop empowerment in Airframe engineering. The term empowerment refers to measures designed to increase the degree of autonomy and self-determination of people and teams, in order to enable them to represent their interests in a responsible and self-determined way, acting on their own authority. My work has great variety!

 

What was your path from your Master’s graduation to this current post?

I was enrolled in the Master 2 in Econometrics and Statistics while doing an apprenticeship as a data analyst in Quality Procurement at Airbus. I was looking for a data analyst position, with a preference for the aeronautics sector. I found this opportunity through the Alumni website and applied for it. I had two interviews with HR and two with the team. I was also involved in a recruitment process with Air France for a position in Marketing.

Today I am very happy to be part of this team. I am delighted by the self-organisation of the team. It is great, particularly in a big firm!

 

According to your professional experience, what are the most useful skills obtained during your degree?

The most useful skill is my ability to learn quickly and to adapt to different environments. During the Master’s we worked on applied and theoretical projects on a large variety of topics (marketing, banking, social networks). With a background in Economics, I quickly adapted to an engineering environment.

Moreover, I learnt a lot during my apprenticeship. One year of experience in a big firm allows you to be more efficient when you start. I already had very useful skills when I began to work for Airbus.

The machine learning and programming courses are very useful for working in a firm. I have used Python, R, RShiny and Dataiku. RShiny is a very useful tool; you can build very sophisticated things with it!

 

What advice would you like to give to TSE or to the school?

Learn by yourself and be curious about different methods and ways of working! The world is changing all the time. I am always learning in the team and we have to be open-minded; everything can be useful for your career.

The involvement of professionals during the master was very useful. Having a strong relationship between the academic and professional worlds is key! For example, in the marketing course taught by P.Bizarri from Avisia, we used Dataiku DSS, an analytics platform. It can make a difference on a CV!

One last piece of advice: if you have the opportunity, go abroad!

 

 

Statistics: a libero in sports

Looking back at the 2014 World Cup, the dramatic match between Brazil and Germany immediately comes to mind. Within minutes, Brazil conceded three goals. Some fans might wonder if Germany had had a winning streak or a “hot hand” that brought them an enormous amount of luck. For a long time, the uncontested view among scientists was that a hot hand was nothing but a probabilistic coincidence. Economists and psychologists argued that the idea of a winning streak was due to the human predisposition to detect patterns in randomness.

However, leading scientists from Berkeley have cast doubt on this alleged cognitive bias. Considering a coin tossed a hundred times, they looked at how many trials it would take until the expected proportion of successes actually converges to the probability of success. They repeated the experiment with different numbers of consecutive successes. They found that the more consecutive successes occurred, the longer it took the expected proportion to converge to the true probability. This means that a bias indeed exists. However, that bias is not cognitive, but rather a selection bias arising from the sequential nature of the data.
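The selection effect described above can be illustrated with a short Python sketch. It assumes one particular reading of the experiment, which the text does not spell out: in each finite sequence of fair coin flips, take the share of heads among the flips that directly follow k heads in a row, then average that share over many sequences. The function names and the number of simulated sequences are illustrative choices, not taken from the study.

```python
import itertools
import random
from fractions import Fraction


def proportion_after_streak(flips, k):
    """Share of heads among flips that directly follow k heads in a row.

    Returns None when the sequence contains no such flip.
    """
    followers = [flips[i] for i in range(k, len(flips)) if all(flips[i - k:i])]
    if not followers:
        return None
    return Fraction(sum(followers), len(followers))


def exact_expected_proportion(n_flips, k):
    """Average the per-sequence share over all equally likely sequences."""
    shares = [
        proportion_after_streak(seq, k)
        for seq in itertools.product((0, 1), repeat=n_flips)
    ]
    shares = [s for s in shares if s is not None]  # drop sequences with no streak
    return sum(shares) / len(shares)


def simulated_expected_proportion(n_flips, k, n_sequences=5000, seed=0):
    """Monte Carlo version of the same average, usable for long sequences."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_sequences):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        share = proportion_after_streak(flips, k)
        if share is not None:
            shares.append(float(share))
    return sum(shares) / len(shares)


if __name__ == "__main__":
    # Three flips, streak length one: the average share is 5/12, not 1/2.
    print(exact_expected_proportion(3, 1))
    # A hundred flips, compared across streak lengths.
    for k in (1, 2, 3):
        print(k, simulated_expected_proportion(100, k))
```

Every single flip is still fair. The shortfall below one half comes only from selecting flips by what preceded them within a finite sequence, which is why it is a selection bias rather than a cognitive one.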

MIT researchers found that basketball players who have performed well – whether expected to or not – tend to take more difficult shots. Moreover, “hot” players are much more likely to take the team’s next shot and thus are not choosing shots independently. This challenges the common view that shot selection is independent of a player’s own perception of hot-or-coldness. Thus, it might not be a cognitive bias for the audience, but it definitely is for the players. A player who performed well at an earlier stage of the game and exceeded his own expectations is willing to take more risks. Therefore, he shoots from significantly further away, tackles tighter defence, and attempts more challenging shots.


In the case of the 2014 World Cup, the question of a hot hand is difficult to answer because football provides less data than sports like basketball and baseball due to its lower point yield. Additionally, while baseball and basketball are rather democratic sports (meaning everyone theoretically has an equal chance to score), football has a relatively high number of players with different positions and thus different probabilities of scoring. Whereas statistics in score-based sports like baseball have captured public interest through movies such as Moneyball, featuring Brad Pitt, the quantitative aspects of races such as NASCAR or horse racing have been of greater academic interest.

Horse races are particularly interesting due to their abundance of data. It is a common phenomenon for gamblers to underestimate favourites and overbet longshots in order to receive a higher reward in case the longshot wins. This favourite-longshot bias, however, has been shown to vary across cultures. The existence of this bias depends on the average pool size, meaning the total amount of bets paid. In the Western world, horse betting is more of a pastime with a relatively low betting pool, whereas in Asia, notably in Hong Kong, betting is a business to be taken seriously. Analysis by researchers from Berkeley found that the bias in favour of longshots is more prominent in Western countries, where bookmakers bet relatively low amounts (an average pool size of $218,000 at the Yonkers race in the United States), than in Asia (an average pool size of $1.1 M at the Happy Valley race in Hong Kong). As the average pool size is much higher in Hong Kong, bookmakers assess their bets more carefully and attempt to predict the outcome of races more accurately.

Another classic application of statistics to horse races is Markov Chain Monte Carlo (MCMC) simulation. A Markov chain is a stochastic process in which the future depends only on the present, not on the past. MCMC can be thought of as carrying out many experiments, each time altering the variables in a model and observing the response. The goal of MCMC is to draw samples from some probability distribution without needing to know its exact height at any point. MCMC achieves this by “wandering around” on that distribution such that the amount of time spent in each location is proportional to the height of the distribution. For example, suppose one has eight horses and wants to predict which one is going to win. The individual winning probability of each horse is calculated sequentially based on the prior odds assigned to the animal. The probabilities are then cumulated so that horse number eight has the value 100%. A value is then drawn at random between 0% and 100%. If it is higher than the first horse’s cumulative value, one moves up the line until a horse’s cumulative probability is strictly higher than the drawn value; that horse is recorded as the winner of the draw. In a sufficiently large sample, the proportion of draws assigned to each horse will reflect its probability. Thus, even if the true probability is unknown, it is still possible to achieve an accurate model.

Horse                    1       2       3       4       5       6       7       8
Cumulative probability   13.04%  30.42%  40.85%  48.87%  69.73%  72.62%  79.14%  100%
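The lookup in the cumulative table can be sketched in a few lines of Python. Note that this sketch is plain inverse-transform Monte Carlo sampling from the table, which is the step the example describes; it is not a full MCMC sampler with a Markov transition rule. The uniform draw, the function names and the number of simulated races are assumptions made for illustration.

```python
import bisect
import random
from collections import Counter

# Cumulative probabilities for horses 1 to 8, copied from the table above.
CUMULATIVE = [0.1304, 0.3042, 0.4085, 0.4887, 0.6973, 0.7262, 0.7914, 1.0]


def pick_horse(u, cumulative=CUMULATIVE):
    """Return the number of the first horse whose cumulative value is strictly above u."""
    # bisect_right gives the index of the first entry strictly greater than u.
    return bisect.bisect_right(cumulative, u) + 1


def simulate_wins(n_races, seed=0):
    """Draw n_races uniform values in [0, 1) and return each horse's share of wins."""
    rng = random.Random(seed)
    wins = Counter(pick_horse(rng.random()) for _ in range(n_races))
    return {horse: wins[horse] / n_races for horse in range(1, len(CUMULATIVE) + 1)}


if __name__ == "__main__":
    for horse, share in simulate_wins(50000).items():
        print(horse, round(share, 4))
```

With enough draws, each horse's share of wins approaches the gap between its cumulative value and the previous one, for instance 69.73% minus 48.87% for horse 5.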

As horse races usually depend largely on prior knowledge rather than being completely independent of the past, the Bayesian approach has enjoyed significant popularity, as it takes prior beliefs into account. Different factors, including the days since the last run, the time of the year, or the characteristics of the running ground, impact the probability of winning. A Bayesian statistician not only distinguishes between winning and losing, but also takes the other factors into consideration. Using the example of two seasons, spring and autumn, one can distinguish four cases: a win or a loss for a horse in spring, and a win or a loss in autumn. This allows the bookmaker to make a more informed decision about the horse’s performance based on the season, and he can thus decide which season would be the best to bet on the horse. This model can be extended to the multivariate case, so that the bookmaker can assess all participating horses based on their performance in different seasons. This and other approaches can set the foundation for more elaborate machine learning methods whose discussion would be a horse of a different colour.
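One simple way to make the four-case example concrete is a Beta-Binomial update per season, sketched below in Python. This is only one possible Bayesian model and is not necessarily the one the author has in mind. The race record is hypothetical, invented purely for illustration, and the uniform Beta(1, 1) prior is likewise an assumption.

```python
def posterior_win_probability(wins, losses, prior_wins=1.0, prior_losses=1.0):
    """Posterior mean of the win probability under a Beta(prior_wins, prior_losses) prior."""
    return (wins + prior_wins) / (wins + losses + prior_wins + prior_losses)


# Hypothetical record for one horse: (wins, losses) per season. Not real data.
HYPOTHETICAL_RECORD = {"spring": (6, 14), "autumn": (3, 17)}


def best_season(record):
    """Return the season with the highest posterior win probability, plus all estimates."""
    estimates = {
        season: posterior_win_probability(wins, losses)
        for season, (wins, losses) in record.items()
    }
    return max(estimates, key=estimates.get), estimates


if __name__ == "__main__":
    season, estimates = best_season(HYPOTHETICAL_RECORD)
    print(season, estimates)
```

The prior acts like a small number of imaginary past races, so a horse with only a few runs in a season is not judged on those runs alone. Extending the dictionary to one record per horse gives the multivariate comparison mentioned above.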

Statistics can also be used to predict the winners of international championships such as the Olympic Games. A team of data miners attempted to identify the factors that determine success during the Winter Games in Sochi 2014. Unsurprisingly, they found that the geography of the participating country is notably important for Winter Games. About 90 percent of countries have never won a single Winter Olympics medal, including Middle Eastern, South American, African, and Caribbean countries. Additionally, the researchers used GDP per capita as an explanatory variable, since nations whose people are affluent can afford to spend time pursuing excellence in sports. Moreover, history has shown that the nation hosting the games often overperforms. Both Italy and Canada overperformed in 2006 and 2010 with five and 14 gold medals, respectively, as the Winter Games were hosted in Torino and Vancouver. The trend continued during the Winter Games in Sochi 2014 with Russia scoring eleven gold medals. However, South Korea’s performance in 2018 remained at an average level.

The reason for the overperformance at home could be twofold. On the one hand, the host may allocate more money to winning in order to increase its prestige when the world’s spotlight is on it. On the other hand, hosting could have the same effect as the home advantage in other sports such as soccer. Past analysis has also shown that countries with a socialist background generally overperform. One reason is that in a command economy it is easier to direct funding to the training of athletes. Another reason is that in an authoritarian system, the elites appreciate medals as a demonstration of power and thus push harder in order to achieve their prestige-bringing goals.

GDP also reveals another characteristic of the winning scheme: high-income countries diversify more in terms of sports, while low-income states usually focus on a few sports as a safe bet for medals, such as Ethiopia in athletics. For the Winter Games, this phenomenon is not as common, as most low-income countries have limited access to the more resource-intensive winter sports.

In conclusion, whether in ball sports, races, or a combination of both in the form of the Olympics, statistics has found its way into the world of sports. With the World Cup in mind, we should not forget: the passion for the numbers in the field of football should not outweigh the passion for the jersey numbers on the football field.

by Jacqueline Seufert

Photo credit: anderstoon.com