I mainly worked on several projects in Fall 2016. I felt passive about not taking any classes; overall, however, it was a great learning opportunity, and I believe it was a good decision for me.
This project-based class was such a mess that it is difficult to tell what I learned from it, even though it felt overwhelming and I gained a significant set of skills. So here I'm going to organize my experience to understand it better and make it usable in the future. First I'll briefly describe the background of the project, then the main learnings, and finally what I want to learn next.
Note that this project is under NDA.
Background
We were given a series of datasets, several business questions, and four months of time. Our team consisted of the (super smart and curious) professor and seven of us (let's say rattatas). The goal was to answer the questions within the limited time and to deliver the results as a presentation.
Learning
- Analytical skill
Nothing matters more than this!!
It starts with a deep understanding of the datasets: understanding the data generating process and the variables, and observing what the descriptive statistics tell us. This process always goes back and forth, since each time you find something quirky you need to go back to the very beginning, the data generation. Other keywords are truncation and censoring.
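To make the difference between those two keywords concrete, here is a minimal sketch in Python. The values and the limit of 10 are made up for illustration and have nothing to do with the project data.

```python
# Hypothetical observations; the limit of 10 is an assumption for illustration.
values = [3, 7, 12, 15, 8, 20]
LIMIT = 10


def mean(xs):
    """Plain arithmetic mean."""
    return sum(xs) / len(xs)


# Censoring: every observation stays in the data,
# but anything above the limit is recorded as the limit itself.
censored = [min(v, LIMIT) for v in values]

# Truncation: observations above the limit never enter the data at all.
truncated = [v for v in values if v <= LIMIT]

print(len(values), mean(values))        # full data: 6 rows, mean about 10.83
print(len(censored), mean(censored))    # 6 rows, mean 8.0
print(len(truncated), mean(truncated))  # 3 rows, mean 6.0
```

Both versions pull the mean down, but in different ways: censoring keeps the row count and distorts the values, while truncation silently drops rows. This is why the data generating process has to be understood before trusting the descriptive statistics.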
Model specification comes next. To specify the model, you need finely defined questions and a deep understanding of the models. Read papers and do thought experiments about the outcome. Also, pay close attention to heterogeneity, exogeneity, and endogeneity.
Your coding skill shows the most at the data cleaning step. HELL YEAH.
Once you have gone through that amazing eye-and-finger-hurting process, you can run the model and examine the results. It is always important to ask yourself why you got the results you got, especially when the numbers are unexpected. This might send you back to the very beginning of the analysis too.
Discussing the results with teammates is another hell, but it was the most interesting part for me. You will realize your own bias about the numbers. The most important things here are Sign, Size, and Significance. Don't get attached to your dearest results; be objective, and find the story the data tells you.
Specifically, we covered inference with and application of the difference-in-differences model, propensity score matching, gradient boosting, and the logistic model. In a bit more detail, we also covered the marginal effect in the logistic model (with quadratic and cubic terms, plus the bootstrap and delta methods, though we did not apply those this time).
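As a reminder of what the simplest difference-in-differences estimate looks like, here is a minimal sketch in Python. The four group means are hypothetical (the real project is under NDA), and the function name `diff_in_diff` is my own.

```python
# Hypothetical group means of some outcome; none of these numbers come from the project.
means = {
    ("treated", "before"): 10.0,
    ("treated", "after"): 16.0,
    ("control", "before"): 9.0,
    ("control", "after"): 12.0,
}


def diff_in_diff(m):
    """2x2 difference-in-differences: change in treated minus change in control."""
    treated_change = m[("treated", "after")] - m[("treated", "before")]
    control_change = m[("control", "after")] - m[("control", "before")]
    return treated_change - control_change


estimate = diff_in_diff(means)
print(estimate)  # 3.0
```

The estimate only has a causal reading if the treated group would have followed the same trend as the control group without the treatment (the parallel trends assumption). A regression version with an interaction term gives the same number plus a standard error, which is not shown here.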
Coding was mainly done in SAS, sometimes in R. Now I'm catching up with Stata for another project evaluation.
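For the marginal effect of a logistic model with a quadratic term mentioned above, here is a minimal sketch. It is written in Python rather than SAS or R purely for illustration, and the coefficients are hypothetical. With p = 1 / (1 + exp(-(b0 + b1*x + b2*x^2))), the marginal effect of x is dp/dx = p * (1 - p) * (b1 + 2*b2*x).

```python
import math


def predicted_prob(x, b0, b1, b2):
    """Logistic probability with a linear and a quadratic term in x."""
    z = b0 + b1 * x + b2 * x ** 2
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(x, b0, b1, b2):
    """Analytic derivative dp/dx = p * (1 - p) * (b1 + 2 * b2 * x)."""
    p = predicted_prob(x, b0, b1, b2)
    return p * (1.0 - p) * (b1 + 2.0 * b2 * x)


# Hypothetical coefficients, not estimates from the project.
b0, b1, b2 = -1.0, 0.8, -0.1

for x in (0.0, 2.0, 4.0, 6.0):
    print(x, marginal_effect(x, b0, b1, b2))
```

With these made-up coefficients the marginal effect is positive below x = 4, zero at x = 4, and negative above it, so the Sign cannot be read off a single coefficient once a quadratic term is in the model. Standard errors for the marginal effect (by the delta method or the bootstrap) are not shown here.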
- Presentation skill
Showing statistical results in a business setting requires a different skill than an idea-based business presentation or an academic presentation. Also, creating slides with seven people often results in a lack of coherence and improper weighting of the contents.
Story - The order does not follow the academic way, which follows the exact steps of the analysis. Rather, the answer and insights should come first and the modeling later. Scientific articles are often written in this order: What are the main findings? Why are they important? Then, how were they found?
You have to capture the audience's attention to make your presentation worth listening to. You may hold back a problem you could not solve until the end of the presentation, so that people do not get stuck on it and stop listening before you show your illuminating findings.
Also, the final goal of the presentation is to motivate the organization or person to make a firm 'decision'. Nothing vague is worth listening to. The unit of measure is the 'decision': how many decisions you can lead them to with this presentation defines the success of your time.
Design - Even if you use a complicated statistical method, it is important to show it as a simple mechanism, using intuitive graphs and charts, preferably without equations and sentences. You need to be creative for this work, and a good sense of color helps. The most difficult thing is the trade-off between 'understandable simplicity' and 'statistical detail'. It should be discussed carefully with your teammates. You may add some explanation in a footnote or in the verbal description.
Script - Again, a strong skill in explaining jargon in general terms is required.
One thing I learned from this experience is that once you capture the audience's attention, the presentation becomes an open discussion with a rhythmic conversation between audience and presenter (for a while). The audience is probably experiencing an aha moment in those situations, and the more your information follows the way they are getting it, the more they will focus on every single word coming out of your mouth.
Another thing I learned is that close attention to word choice has a significant impact on the audience. You should know their language and what will hit home for them; in other words, what their buzzwords are.
- Time management skill
Being curious is much better than being indifferent; however, too much curiosity sometimes steals a considerable amount of time and energy. (Actually, the professor was the biggest curiosity-driven monster, which made me think 'I can't be a professor!!!')
The only thing I can say about time management is this: no matter how curious you are, whatever interesting modeling you want to try, whatever results you expect from that modeling, a deadline is a deadline. Stop coding there and start summarizing. If the manager of the research team is also one of the analysts, it is painfully tough to stop in front of a bunch of seeds of excitement.
Future learning
- Study of probability and statistics
I did not have enough grounding in theory to understand and apply more advanced models. It's the biggest regret of my life. (If only we could have mined the distribution of one variable and categorized on that basis instead of using simple statistics as a proxy...)
- Readings on specification
Learning modeling will be a never-ending, fundamental part of analysis, since prominent researchers come up with new and refined models all the time. It was great that I could learn propensity score analysis in this project while learning Heckman's model and the Stata code for treatment effects for my own research.
Machine learning is such a popular term these days; however, econometric modeling is as important as ML, or more so, since ML is mainly for prediction while 'Econometrics explains and predicts', which I ought to be able to answer whenever and whoever wakes me up from deep sleep!!
- Learning other languages for coding
SAS and Stata are becoming old products. Despite what I said about ML above, we need it to deal with large data, which has a much larger presence than it did a decade, or even just a few years, ago. This relates to the study of stats too.
Conclusion
The coolest thing my professor kept telling us as the research went further was 'There is only 8 of us who know these facts'. It kept up my motivation and my pride in my skills to get through the tough training, while part of my mind always whispered that there are much smarter students in computer science, or that it's too late to start learning stats. Whatever. I have solid skills that I didn't have 15 months ago.
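This is how I chew over my experience, record it in my own way, and put a lid on it, so that it takes a shape I can use next time.
Back when I was a ronin (spending a year preparing to retake the university entrance exams), there was a time when I engraved every single word of my teacher into my brain. That is how valuable the teacher's words were. This project was the same. Everything I saw at the professor's side was something to learn from: the professor's actions, the amount of analysis and coding the professor brought in, and the way the professor guided us. On top of that, the syllabus I just reread holds irreplaceable value for me. Ah, I'm so moved! I want to study more!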
Suppose it's 4 am and you are in the middle of sleep.
What does Econometrics do?