Author Archives: Subhadeep Mukhopadhyay

What’s The Point of Doing Fundamental Science?

What’s the point of doing fundamental science? Nima Arkani-Hamed presented his own fascinating perspective in a recent public lecture at Cornell, which everyone should listen to. These are some of my top picks:

What does it take to do fundamental science?

Nima: Doing fundamental science means working on hard problems for very long periods of time, and it is not obvious how to go about attacking a problem that seems impossible to solve and might take decades to make progress on….You don’t sit around and read in a book “how to go about solving an impossibly hard problem.” You need a small group of people who have actually done it, and you learn from them by osmosis. The people who work on these things are driven by a cause that is bigger than themselves, and to earn it they have to work their asses off.

How can we make funding possible for these kinds of people?

Nima: If you somehow got into doing fundamental science with the idea that it is some sort of safe, nice thing to do with your life, you are crazy, you are wrong. It’s not safe, it’s risky. You are risking the most important things you have, your life and your time, to pursue these questions. So it’s a big risk, and you should know it….If it pays off, you will get all the joys, and any other kind of more practical reward you get out of it will be similarly bigger.

Should you work on more ambitious, slightly crazier ideas, or on the things that most other people are working on?

Nima: There is no clean answer to this question. But it’s true that if you go a little bit further than other people and something works, even a little bit, poof, you are lifted out of the masses and you will have a wonderful academic life after that. In academics, the greatest thing that can happen is that a crazy idea actually ends up being somewhat right.

How do you approach solving problems? 

Nima: Don’t be a monkey who is only interested in the calculation. Zoom in (detailed calculations) and zoom out (the grand vision that draws you in) constantly when solving any problem. They are not orthogonal to each other, but they are not the same either. They are the two different ends of one beast that is known as a scientist.

What is the number one thing that the current academic system needs?

Nima: The number one thing that academics need is utter and complete freedom to do anything the hell they want. Freedom, freedom, freedom: the most important commodity in any academic pursuit.

Is there any systemic thing that can be done to ensure the rapid development of fundamental science research? 

Nima: I think we could be doing a lot better at increasing the number of weird people that we actually have in academics. We need many, many more of them. So, actually, what I personally want to fight for is enlarging the group of strange people that we allow into our mix. I find too much of a certain kind of homogeneity.


Two sides of Theoretical Data Science: Analysis and Synthesis

This discussion is motivated by a simple question (that was posed to me by a theoretical computer scientist): “What is the role of Statistics in the development of Theoretical Data Science?” The answer lies in understanding the big picture.

Theory of [Efficient] Computing: A branch of Theoretical Computer Science that deals with how quickly one can solve (compute) a given problem. The critical task is to analyze algorithms carefully, based on their performance characteristics, to make them computationally efficient.

Theory of Unified Algorithms: An emerging branch of Theoretical Statistics that deals with how efficiently one can represent a large class of diverse algorithms using a single unified semantics. The critical task is to put together different “mini-algorithms” into a coherent master algorithm.

For the overall development of Data Science, we need both ANALYSIS + SYNTHESIS. However, it is also important to bear in mind the distinction between the two.
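To make the distinction concrete, here is a minimal sketch of my own (the function names are hypothetical, not an established framework): two “mini-algorithms” for centering data, and one “master algorithm” that expresses both through a single unified semantics, a location functional. Analysis asks how fast each routine runs; synthesis asks how many routines collapse into one.

```python
# Two standalone "mini-algorithms" (illustrative names of my own choosing).

def center_by_mean(xs):
    """Subtract the sample mean from every observation."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def center_by_median(xs):
    """Subtract the sample median from every observation."""
    s = sorted(xs)
    n = len(s)
    med = (s[n // 2] + s[(n - 1) // 2]) / 2
    return [x - med for x in xs]

# Synthesis: one "master algorithm" parameterized by a location functional.
# Both mini-algorithms above become special cases of a single semantics.
def center(xs, location):
    loc = location(xs)
    return [x - loc for x in xs]
```

For example, `center(xs, lambda v: sum(v) / len(v))` reproduces `center_by_mean(xs)` exactly; a unified representation shrinks the catalog of separate routines without losing any of them.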


The “Science” and “Management” of Data Analysis

Hierarchy and branches of Statistical Science

The phrases “Science” and “Management” of data analysis were introduced by Manny Parzen (2001) while discussing Leo Breiman’s paper “Statistical Modeling: The Two Cultures,” where he pointed out:

Management seeks profit, practical answers (predictions) useful for decision making in the short run. Science seeks truth, fundamental knowledge about nature which provides understanding and control in the long run.

Management = algorithms: prediction and inference are undoubtedly the most useful and “sexy” parts of Statistics. Over the past two decades, tremendous advances have been made on this front, leading to a growing body of literature and excellent textbooks such as Hastie, Tibshirani, and Friedman (2009) and, more recently, Efron and Hastie (2016).

Nevertheless, we surely all agree that algorithms do not arise in a vacuum, and our job as statistical scientists should be more than just finding another “gut” algorithm. It has long been observed that elegant statistical learning methods can often be derived from something more fundamental. This forces us to think about the guiding principles for designing (wholesale) algorithms. The “Science” of data analysis = algorithm discovery engine (the Algorithm of Algorithms). Finding such a consistent framework of Statistical Science (from which one might be able to systematically derive a wide range of working algorithms) promises not to be trivial.
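As a toy illustration of what “deriving algorithms from something more fundamental” can mean (my own sketch, not a framework from the literature): empirical risk minimization is one such fundamental principle, and the choice of loss function “derives” the estimator. Squared loss yields the sample mean; absolute loss yields a sample median.

```python
# A toy sketch (my own illustration): one fundamental principle,
# empirical risk minimization, from which concrete estimators are
# derived by choosing a loss, rather than invented one by one.

def empirical_risk(theta, data, loss):
    """Average loss of the candidate parameter theta over the data."""
    return sum(loss(x, theta) for x in data) / len(data)

def derive_estimate(data, loss, lo=-100.0, hi=100.0, steps=2000):
    """Crude grid-search minimizer; enough to demonstrate the principle."""
    best, best_risk = lo, float("inf")
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        risk = empirical_risk(theta, data, loss)
        if risk < best_risk:
            best, best_risk = theta, risk
    return best

data = [1.0, 2.0, 3.0, 10.0]
# Squared loss "derives" the sample mean (here 4.0) ...
mean_like = derive_estimate(data, lambda x, t: (x - t) ** 2)
# ... absolute loss "derives" a sample median (any value in [2, 3]).
median_like = derive_estimate(data, lambda x, t: abs(x - t))
```

The point is not the (deliberately naive) optimizer but the division of labor: the principle is stated once, and individual working algorithms fall out as special cases of it.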


Above all, I strongly believe the time has come to switch our focus from “management” to the heart of the matter: how can we create an inclusive and coherent framework of data analysis (to accelerate the innovation of new, versatile algorithms), one that encodes the fundamental laws of numbers: “A place for everything, and everything in its place.” In this (difficult yet rewarding) journey, we have to constantly remind ourselves of the enlightening advice of Murray Gell-Mann (2005):

We have to get rid of the idea that careful study of a problem in some NARROW range of issues is the only kind of work to be taken seriously, while INTEGRATIVE thinking is relegated to cocktail party conversation.


Edge Exchangeability

Harry Crane writes

“One thing that’s kept me busy is that I’ve had some people who took my work on edge exchangeability and are claiming credit for it. I’ve tried to resolve the issue with them and the conference (NIPS) but none of that was successful. Since traditional channels failed, I made this video to get the facts out there.”

If this is true, then it is a very ALARMING trend.

Confirmatory Culture: Time To Reform or Conform?

Confirmatory culture is deeply rooted within Statistics.

THEORY

Culture 1: Algorithm + Theory: the role of theory is to justify or confirm.

Culture 2: Theory + Algorithm: From confirmatory to constructive theory, explaining the statistical origin of the algorithm(s)–an explanation of where they came from. Culture 2 views “Algorithms” as the derived product, not the fundamental starting point [this point of view separates statistical science from machine learning].

PRACTICE 

Culture 1: Science + Data: The job of a statistician is to confirm scientific guesses; thus, to happily play in everyone’s backyard as a confirmatist.

Culture 2: Data + Science: Exploratory nonparametric attitude. The statistician plays in the front yard, as a key player, guiding scientists to ask the “right questions.”

TEACHING 

Culture 1: It proceeds in the following sequence:

for (i in 1:B) {
  Teach Algorithm-i;
  Teach Inference-i;
  Teach Computation-i
}

By construction, it requires extensive bookkeeping and memorization of a long list of disconnected algorithms.

Culture 2: The pedagogical effort emphasizes the underlying fundamental principles and statistical logic whose consequences are the algorithms. This “short-cut” approach substantially accelerates learning by making it less mechanical and less intimidating.

Should we continue to conform to the confirmatory culture, or is it time to reform? The choice is ours, and the consequences are ours as well.

Data Scientist and Data Mechanic

Beware of the “Kaggle Syndrome.” Refuse to jump on the bandwagon of the brainless task of combining random algorithms to win a competition. In any case, this will make NO impact (other than fifteen minutes of fame), as happened with the Netflix competition.

As datasets get BIG and COMPLEX, the most difficult challenge for the statistical scientist is to figure out “where the information is hidden.” It is an interactive process of investigation rather than a passive application of algorithms and calculation of error rates. Two critical skills: (1) “look at the data,” which is missing in the mechanical push-the-button culture; and (2) learn “how to question the data,” rather than only answering a specific question. Together they allow data scientists to discover the unexpected, in addition to the usual verification of the expected.

This raises the question of whether

  • the Data Science training curriculum should look like a long manual of specialized methods and (a series of cookbook) algorithms;
  • or whether it should train students (and industry professionals) in Scientific Data Exploration (Sci-Dx): a systematic and pragmatic approach to data modeling that addresses the “Monkey and banana problem” [the pigeon’s approach] for practitioners. [I believe Wolfgang Kohler‘s “insight learning” idea can guide us in developing such a curriculum.]

The first path will produce DataRobots, not Data Scientists. The latter goal looks out of reach unless we figure out how to design the “LEGO bricks” of Statistical Science (the fundamental building blocks of statistical learning), which would help us understand disparate statistical procedures from a common perspective (thus reducing the size of the manual) and which could be appropriately combined to build versatile data products, brick by brick.


Follow Your Star

The article “Follow your star” by theoretical physicist Ulf Leonhardt (thanks to Prof. Martin Vetterli’s tweet) is remarkable in many ways. I will quote a few inspiring portions, which are highly significant for early-career researchers like me.

What qualities are necessary to succeed as a scientist?

“Be stubborn. Believe in yourself. Don’t do what others are saying. Also very important is to stand up again and again. You will fall all the time. There will be disasters, small and great. Usually the first attempt fails, but you learn from it. Maybe you don’t even learn all the time, but from the beginning, you should not be afraid of failure.”

Advice for young scientists?

“What I have observed in some very good young scientists is that they think too much about their careers. This is all wrong: Build yourself, not your CV. Otherwise, you succeed in the short term, but then you may burn out and lose the fun in science. Young scientists should first think about what they want to do.”

“Know how science and the scientific establishment works, but don’t take it too seriously. Listen to what other people are saying, but don’t apply it automatically. Other people may see some aspects of your situation, but they don’t have the knowledge of it all. Only you have that.”

Have people tried to dissuade you from following your ideas?

“Of course, all the time. Whether it affects me or not depends on the people and the style of the discussion. If people criticize me in a nonscientific way, I completely ignore them because it’s not an argument. If it’s a scientific attack I take it seriously, and then I respond and I learn from it.”

Deep Association with Manny Parzen: 2009-2016

“Each mind, to achieve its full potential, needs a SPARK. The spark of enquiry, excitement, and passion. Often that spark comes from a teacher. There was a teacher behind every great artist, every great philosopher, every great scientist. However difficult life can be, teachers have always been there, behind the scenes, showing us the way forward.” — Stephen Hawking, 2016.

Manny Parzen, my greatest mentor and my hero, was that SPARK in my life. He changed my whole outlook by opening a new window for viewing the landscape of Statistical Science: the “Parzen window.” I was hooked by discovering the joy of “connecting the dots.” It was the turning point in my career.

A lot of what I do, how I do it, and why I do it is heavily influenced by those wonderful years I spent with Manny. I salute my real master, Manny Parzen, who infused in me a sense of purpose that drives me to do meaningful research; who taught me the art of asking the right question, the one whose solution really matters. THANK YOU for igniting my passion for the study of the fundamental laws of numbers. I hope someday we’ll be able to fulfill your dream and vision of a “United Statistical Theory and Algorithms” to reboot the fragmented Statistics of the 20th century.

To assess the scientific impact of a researcher, I often run a quick thought experiment: what would happen if this person were removed from the history of the discipline? If we dare to do that experiment for Manny, there would be a multidisciplinary collapse: Statistics, Econometrics, Machine Learning, Signal Processing (and Data Science, as will soon become clear). His innovations were pillars of modern data analysis.

In my last telephone conversation with him, on Jan 7th, I told him we should catch up soon, and his reply was “I will send you a note.” Feb 6th will be remembered as the saddest day in my professional life. I will miss my HERO dearly, but the friendship we had will never be forgotten, from this day until we meet again. You will always remain Deep in my heart.

I am still recovering from the shock. I am now more focused and determined than ever to run the show; I’m looking forward to the day when you will welcome me saying: “It was a good show, my boy. Let’s raise a toast together.”

In eternal gratitude and love,
Deep

IAS: The Role Model Institute

A truly Ideal Seductive Atmosphere (IAS), where research is not reduced to commercial profit-and-loss statements but is a pursuit of big ideas and intellectual curiosity, a pursuit that often appears risky and is thus discouraged by current norms. As reiterated in this recent NATURE article, “Attempts to hit the publishable ‘sweet spot’ by avoiding both the prosaic and the risky are likely to reduce the efficiency of scientific discovery.” Rzhetsky and colleagues [link] (drawing on millions of papers and patents published over 30 years) further observed that “Successful research that goes against the crowd is more likely to garner high citations and prizes.” I feel IAS provides a unique intellectual freedom in this regard.

However, I do feel that the presence of students and a little teaching (on a modern, advanced topic that simplifies and unifies the current paradigm by giving a comprehensive, connected view of the subject) might add more excitement to this intellectual environment. Training (motivating and mentoring) the next generation of students is part of one’s academic professional duty, and it is often immensely rewarding.

 

The Scientific Core of Data Analysis

My observation is motivated by Richard Courant‘s view:

However, the difficulty that challenges the inventive skill of the applied mathematician is to find suitable coordinate functions.

He also noted that

If these functions are chosen without proper regard for the individuality of the problem the task of computation will become hopeless.

This leads me to the following conjecture: an efficient nonparametric data transformation or representation scheme is the basis of almost all successful learning algorithms (the Scientific Core of Data Analysis), and it should be emphasized in the research, teaching, and practice of 21st-century Statistical Science in order to develop a systematic and unified theory of data analysis (a foundation of Data Science).
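A small numerical illustration of the conjecture (my own sketch; the target function and basis choices are illustrative, not from Courant): the fitting machinery below is identical in both cases, ordinary least squares, yet the quality of the fit is decided entirely by the choice of coordinate functions, i.e., by the representation.

```python
# A hedged sketch: same least-squares solver, two different
# representations ("coordinate functions") of the same inputs.
import numpy as np

x = np.linspace(-1.0, 1.0, 50)
y = x ** 2  # a target that a raw linear basis cannot express

def fit_rmse(design, y):
    """Least-squares fit on a given design matrix; return the RMS error."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = design @ coef - y
    return float(np.sqrt(np.mean(resid ** 2)))

linear_basis = np.column_stack([np.ones_like(x), x])             # {1, x}
quadratic_basis = np.column_stack([np.ones_like(x), x, x ** 2])  # {1, x, x^2}

err_linear = fit_rmse(linear_basis, y)        # large: wrong representation
err_quadratic = fit_rmse(quadratic_basis, y)  # essentially zero
```

In Courant’s terms, the quadratic basis respects “the individuality of the problem,” while the linear basis makes the computation hopeless no matter how good the solver is.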