From physicist to machine learning engineer

Justin Chen earned his Ph.D. in Physics from Rice University before deciding to move from academia into machine learning. He spent the last few years as a machine learning engineer at Manifold, building and deploying end-to-end ML and data pipelines from scratch. He recently made the transition to Google, where he is currently part of the Keyword Team (“Ok Google”) developing technology behind speaker identification and audio speech processing. Here, he talks about use cases, best practices, and what he’s learned throughout his ML journey.

How did you go from university to machine learning?

It was not easy. I think the challenge was figuring out how to present what I’d done in a way that’s useful to industry. So I reframed my resume, shifting the focus from the research problems I solved in my PhD program to the mathematical methods I had to implement, the coding work I did, and the groups I had to organize to do it. That approach got much better traction, and my success rate in job interviews was much higher afterwards.

What are some of the key skills you needed to develop to become a machine learning engineer?

I would say that being able to write code and then communicate about it with other people is one of the most crucial skills for being successful as a machine learning engineer. Anyone with a PhD in computer science has the math and programming knowledge to thrive, but it’s more about navigating the different ways of working in a company. For me, the hardest thing to learn was having to produce code between meetings, and more importantly, having to produce code with other people. It was a huge struggle for me, and I think a lot of people in academia might struggle with it as well. I was very used to coding by myself. I had never been subject to a code review – and certainly not one where anyone cared about readability. At first it seemed like a lot of finicky suggestions, but I quickly realized that working in this industry is a very collaborative process. You can’t do all the coding by yourself.

Your first role in machine learning was at Manifold AI. Tell us a bit about your responsibilities there and some of the most interesting cases you’ve worked on.

Manifold does AI consulting, in partnership with other companies in various fields. They implement the entire AI workflow, from data preparation and ingestion to model deployment and monitoring. For my work, I wore many hats and was everything from a data engineer to a front-end engineer doing dashboards (which I wasn’t very good at) to a machine learning engineer.

The most interesting problems I worked on at Manifold were the ones I had to solve end-to-end, because with all the frameworks out there it’s actually pretty rare to get to do that. It was great to work with data that wasn’t already organized in a way that could easily be ingested into models, and then design a system that could ingest the data, process it, build features and train the model, and then learn how to monitor it.

It was also nice to work in healthcare, where there are some interesting challenges around data security. You regularly deal with PII (personally identifiable information) and PHI (protected health information), and if you can’t even dive into the data to see it for yourself, how can you solve the problem? Manifold has done a great job of partitioning and protecting that data and addressing these issues.

What are your tips for starting to build a model once you’ve completed the experimentation phase and how do you ensure it’s ready for production?

People outside the field are often eager to jump into the really cool thing they’ve heard about, like image recognition or neural networks, and they want to know if you can build a brain that does everything for them. That might not even be the right place to start, because if you’ve never tried basic ML models – linear regression, SVMs (support vector machines) or random forest algorithms – you don’t even know whether neural networks will help you.

Often it’s best to get the data, build out the overall end-to-end pipeline, then start with a basic model and see what the baseline is. See what you can achieve with pure linear regression or pure decision trees. Then you can start figuring out where the model is underperforming and how to fix it before progressing to more complicated models.
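
To make that concrete, here is a minimal baseline-first sketch using scikit-learn. The synthetic dataset, model choices, and split ratio are illustrative assumptions rather than details from the interview.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical stand-in for real project data.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Start with the simplest models and record their scores: these are the baselines
# that anything more complicated must beat.
for name, model in [("linear regression", LinearRegression()),
                    ("decision tree", DecisionTreeRegressor(max_depth=5, random_state=0))]:
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae:.2f}")
```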

It’s hard to know when something is really ready for production. I think the biggest mistake I’ve made is spending too much time trying to perfect, refine, and retrain models. What I’ve found to be a good strategy is to identify one or two metrics relevant to your problem, and when you run that model, compare those metrics against a baseline model. If it doesn’t do better on those one or two metrics, you should go back to the drawing board. I’ve been in situations where we tried to track five or 10 metrics, but quickly realized that it’s impossible to make a call when you have too many metrics to look at. The model will always do better on some and worse on others. So it’s probably when you’ve found the most relevant ones that you’re ready for launch.
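
The launch gate described here can be captured in a few lines. The metric names and numbers below are hypothetical; the point is simply that a candidate model must beat the baseline on the one or two metrics you chose.

```python
def ready_for_launch(candidate, baseline, metrics=("mae", "p95_latency_ms")):
    """Promote the candidate only if it beats the baseline on every chosen metric.
    Both example metrics here are lower-is-better."""
    return all(candidate[m] < baseline[m] for m in metrics)

# Illustrative scores, not real project numbers.
baseline_scores = {"mae": 4.2, "p95_latency_ms": 120}
candidate_scores = {"mae": 3.7, "p95_latency_ms": 95}
print(ready_for_launch(candidate_scores, baseline_scores))  # True -> ship; False -> back to the drawing board
```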

Can you tell us about your new position at Google and the differences in responsibilities compared to your job at Manifold?

My role now at Google is much more about the ML side of things. Manifold was constantly building models for different partners and clients and I had to wear many hats, whereas at Google I’m working on one particular problem, and much of the DevOps and baseline work already has a clear path.

What excites you the most about your new role and what kind of problems do you hope to solve?

Without going into too much detail, I’m working on active speech identification and solving the interesting problem of doing it efficiently on small devices. For me, the most exciting thing with this particular problem is learning NLP (natural language processing) and audio speech processing and being able to work on these more advanced methods in an environment where I can really focus on this issue.

Can you give us an example of a challenge you faced in your ML experience and how you went about solving it?

A general problem we might face is knowing where to start, because you receive so much data. You can’t test hundreds of features, and building a model that way is very difficult. What works best is finding the subject matter experts in the business who already have an idea of what makes users happy and what makes them unhappy. Find out the patterns they’ve noticed about when people do and don’t buy. Once you have these subject matter experts, you can focus on a smaller set of features, and from there the problem becomes easier to solve.

We talked about putting models into production and not being able to predict what will happen. How did you approach situations where the inference data or the model started to drift?

My biggest lesson from releasing models is to release them on a Monday, because there’s no way to predict what kind of data is going to be fed to your model. The best part about testing a model in a test environment that replicates reality as closely as possible is that downstream consumers aren’t affected if anything goes wrong, and if any weird upstream data comes in, you can catch it. But eventually you put the model into production and monitor it for a while. Inevitably, some assumptions are going to be wrong and your model could break. The main thing is therefore to monitor regularly and have performance metrics on the data, so that any aberration triggers an alarm.
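
A minimal sketch of that kind of monitoring is shown below: it flags input features whose live distribution has shifted away from the training distribution. The drift score, the three-standard-deviation threshold, and the print-based alert are all simplifying assumptions; a production setup would feed a real alerting system.

```python
import numpy as np

def drift_score(train_col: np.ndarray, live_col: np.ndarray) -> float:
    """Crude drift measure: shift of the live mean, in units of the training std."""
    std = train_col.std() or 1.0
    return abs(live_col.mean() - train_col.mean()) / std

def check_for_drift(train_X: np.ndarray, live_X: np.ndarray, threshold: float = 3.0) -> None:
    for i in range(train_X.shape[1]):
        score = drift_score(train_X[:, i], live_X[:, i])
        if score > threshold:
            # In a real pipeline this would page someone or fire a monitoring alert.
            print(f"ALERT: feature {i} looks like it has drifted (score={score:.1f})")

# Toy demonstration: the second feature shifts between "training" and "live" data.
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 3))
live = rng.normal(size=(200, 3))
live[:, 1] += 5.0
check_for_drift(train, live)
```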

Can you tell us about your approach to AI fairness and bias tracking?

I don’t think there’s a well-established way to prevent bias in your model, and this idea of throwing more data at it doesn’t always work. I think you need to focus on explainability. I worked on a project where I implemented SHAP values, but no method will be perfect. Some are at least useful for finding big biases and slowly reducing them.
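
For readers unfamiliar with SHAP, here is a hedged sketch of how it is typically used with the open-source shap package. The synthetic data and random-forest model are stand-ins, not the project described in the interview.

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data standing in for a real project; in practice one column might
# be (or correlate with) a sensitive attribute you want to audit.
X, y = make_regression(n_samples=500, n_features=8, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# SHAP attributes each prediction to the input features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global summary: features with large attributions that track a sensitive attribute
# are a starting point for a bias investigation, not proof on their own.
shap.summary_plot(shap_values, X)
```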

To really tackle it, you have to have a human in the loop who is actively looking for possible biases and actively trying methods to monitor them, because bias is something that fails absolutely silently. You won’t have a clue what’s going on unless you go looking for it, and even then you’ll miss things. And because bias itself is so subjective, it’s important to have a diverse group of humans in the loop, because they’ll all see the problem in unique ways and catch different things. Having that kind of environment will always outperform having one person or a small effort trying to identify biases.

How do you feel about working for a startup versus a larger, more established company?

There’s a certain excitement that comes with working at a startup and having to do things like code by the seat of your pants that you just don’t get at a bigger company. I wanted to try that, but I also wanted to grow in other directions. For me now, it’s great to be able to focus on a single issue that’s particularly interesting rather than trying to do tons of other things.

Sherry J. Basler