The Power of Optimization: Lessons from the Academic-Industry Divide
As I reflect on my own journey as a researcher, I am reminded of how much progress came from simply trying to optimize models that others considered intractable. The notion that there are "hard problems" out there waiting to be solved is not unique to me or my community. When I was a graduate student, friends in the optimization community told me there was no way to solve these problems: neural networks are highly non-convex, and the theory behind optimizing them was not understood. In our lab we never cared that much about that objection. We just asked how we could optimize these models and whether we could get interesting results, and that effectively was driving the community. We were not scared, maybe to some extent because we lacked the theory behind optimization. I would encourage people to just try, and not be afraid to tackle hard problems.
One of my personal philosophies is not to learn to code only in high-level deep learning frameworks, but to actually understand what they do. In my deep learning class, one of the homeworks asks students to code the backpropagation algorithm for convolutional neural networks. It is painful, but if you do it once, you really understand how these systems operate and how they can be implemented efficiently on a GPU. When you go into research or industry, having that solid understanding of what these systems are doing is crucial.
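To give a flavor of what hand-coding backpropagation for a convolutional layer involves, here is a minimal sketch. It is deliberately reduced to one dimension, a single channel, and plain Python lists, and it is not the actual course homework. The function names (`conv1d_forward`, `conv1d_backward`, `numerical_grad_w`) and the toy squared-output loss are assumptions made for illustration. The finite-difference check at the end is a common way to confirm that a hand-written gradient is right.

```python
def conv1d_forward(x, w, b):
    """Valid cross-correlation: y[i] = b + sum_k w[k] * x[i + k]."""
    n_out = len(x) - len(w) + 1
    return [b + sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n_out)]


def conv1d_backward(x, w, grad_y):
    """Backpropagate dL/dy through the layer; returns (dL/dx, dL/dw, dL/db)."""
    grad_x = [0.0] * len(x)
    grad_w = [0.0] * len(w)
    for i, g in enumerate(grad_y):
        for k in range(len(w)):
            grad_w[k] += g * x[i + k]  # y[i] depends on w[k] via x[i + k]
            grad_x[i + k] += g * w[k]  # y[i] depends on x[i + k] via w[k]
    grad_b = sum(grad_y)               # the bias is added to every output
    return grad_x, grad_w, grad_b


def loss(x, w, b):
    """Toy loss (an assumption): half the sum of squared outputs, so dL/dy = y."""
    return 0.5 * sum(y * y for y in conv1d_forward(x, w, b))


def numerical_grad_w(x, w, b, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grads = []
    for k in range(len(w)):
        w_plus = w[:k] + [w[k] + eps] + w[k + 1:]
        w_minus = w[:k] + [w[k] - eps] + w[k + 1:]
        grads.append((loss(x, w_plus, b) - loss(x, w_minus, b)) / (2 * eps))
    return grads


# Hypothetical toy inputs for the check.
x = [1.0, 2.0, -1.0, 0.5, 3.0]
w = [0.2, -0.5, 0.1]
b = 0.3

y = conv1d_forward(x, w, b)
grad_x, grad_w, grad_b = conv1d_backward(x, w, grad_y=y)
max_error = max(abs(a - n) for a, n in zip(grad_w, numerical_grad_w(x, w, b)))
print("max |analytic - numerical| for dL/dw:", max_error)
```

A real convolutional layer adds channels, two spatial dimensions, stride, and padding, but the structure of the backward pass is the same: each output's gradient is scattered back to the weights and inputs that produced it.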
The question of whether to pursue a PhD versus joining a company is also one that many students in our community are grappling with. My own lab has a mix of students who want to take an academic route and those who want to join the corporate world. While academia offers more freedom to work on long-term problems, I believe that industry provides its own set of challenges and opportunities for impact. Research in industry can have a profound effect on millions of users if you develop innovative AI technologies.
The landscape is changing rapidly: academics are moving to industry and, less often, people in industry are moving to academia. Some students find success by pursuing both paths, while others prefer to focus exclusively on one or the other. There are pros and cons to each approach, and the choice ultimately depends on individual preferences, because you can do amazing research in either place.
One area of research that I believe holds tremendous promise for the future is deep reinforcement learning. In just the last couple of years we have figured out how to train agents in virtual worlds, where they learn by interacting with their environment. Scaling these systems, developing new algorithms, and getting agents to communicate with each other are all active areas of research.
Another exciting frontier is reasoning and natural language understanding. Can we build dialogue-based systems that reason intelligently? Can we develop AI models that read text and answer questions about it? A lot of current research is focused on these questions.
A related sub-area is learning from fewer examples, often described as "one-shot learning" or "transfer learning": a system that has already learned something about the world is given a new task and solves it quickly, much like a human would, without requiring lots of labeled examples. Many of us in the community are working on models that can do this. The goal is to come closer to human-like learning abilities.
Ultimately, my message to anyone interested in pursuing a career in academia or industry is to take the leap and try your hand at it. Don't be afraid of tackling hard problems or trying new approaches. With persistence and dedication, you can make a meaningful impact in either field.
Andrew: Welcome, Russ. I'm really glad you could join us here today.
Russ: Thank you. Thank you, Andrew.
Andrew: So today you're the director of research at Apple, and you also have a faculty role as a professor at Carnegie Mellon University. I'd love to hear a bit about your personal story. How did you end up doing this deep learning work that you do?
Russ: Yeah, to some extent I started in deep learning by luck. I did my master's degree at Toronto, and then I took a year off. I was actually working in the financial sector, which is a bit surprising. At that time I wasn't quite sure whether I wanted to go for my PhD or not. And then something surprising happened. I was going to work one morning and I bumped into Geoff Hinton, and Geoff told me, "Hey, I have this terrific idea. Come to my office, I'll show you." And so we basically walked together, and he started telling me about these Boltzmann machines and contrastive divergence and some of the tricks. At that time I didn't quite understand what he was talking about, but it really excited me. And then basically within three months I started my PhD with Geoff. So that was kind of the beginning, because that was back in 2005-2006, and this is where some of the original deep learning algorithms, using restricted Boltzmann machines and unsupervised pre-training, were kind of popping up. So that's how I started. That one particular morning when I bumped into Geoff completely changed my future career moving forward.
Andrew: And in fact you were a co-author on one of the very early papers on restricted Boltzmann machines that really helped with this resurgence of neural networks and deep learning. Tell me a bit more about what it was like working on that.
Russ: Yeah, this was actually a really exciting year. It was my first year as a PhD student, and Geoff and I were trying to explore these ideas of using restricted Boltzmann machines and using pre-training tricks to train multiple layers. Specifically, we were trying to focus on autoencoders: how do we do a nonlinear extension of PCA, effectively? And it was very exciting, because we got these systems to work on MNIST digits. But then the next step for us was to really see whether we could extend these models to dealing with faces. I remember we had this Olivetti faces dataset. And then we started looking at whether we could do compression for documents. So we started looking at all these different kinds of data, real-valued, count, binary, throughout the year. I was a first-year PhD student, so it was a big learning experience for me. But really within six or seven months we were able to get really interesting results. We were able to train these very deep autoencoders, which is something that you couldn't do at that time using traditional optimization techniques. And it turned out to be a really exciting paper for us. So that was a super exciting year, because it was a lot of learning for me, but at the same time the results turned out to be really impressive for what we were trying to do.
Andrew: So in the early days of this resurgence of deep learning, a lot of the activity was centered on restricted Boltzmann machines and then deep Boltzmann machines. There's still a lot of exciting research being done there, including some in your group, but what's happening with Boltzmann machines?
Russ: Yeah, that's a very good question. I think that in the early days, the way we were using restricted Boltzmann machines is that you can imagine training a stack of these restricted Boltzmann machines that would allow you to learn effectively one layer at a time. And there's a good theory behind it: when you add a particular layer, it improves the variational bound and so forth, under certain conditions. So there was a theoretical justification, and these models were working well in terms of being able to pre-train these systems. And then around 2009-2010, once the compute started showing up, you know, GPUs, a lot of us started realizing that directly optimizing these deep neural networks was giving similar results or even better results.
Andrew: So just standard backprop, without the pre-training or the restricted Boltzmann machine?
Russ: That's right, that's right. And that happened over three or four years. It was exciting to the whole community, because people thought, wow, you can actually train these deep models using these pre-training mechanisms. And then with more compute, people started realizing that you can basically just do standard backpropagation, something that we couldn't do back in 2005 or 2004, because it would take us months to do it on CPUs. So that was a big change. The other thing is that I think we haven't really figured out what to do with Boltzmann machines and deep Boltzmann machines. I believe they're very powerful models, because you can think of them as generative models. They try to model complex distributions in the data. But when we start looking at learning algorithms, the learning algorithms right now require using Markov chain Monte Carlo, variational learning and such, which is not as scalable as the backpropagation algorithm. So we have yet to figure out more efficient ways of training these models. The use of convolution is also something that's fairly difficult to integrate into these models. I remember some of your work on using probabilistic max pooling for building these generative models of different objects, and using these ideas of convolution was also very exciting. But at the same time, it's still extremely hard to train these models.
Andrew: So it's unclear how well these work?
Russ: Right, and so we still have to figure that out. On the other side, there is some of the recent work using variational autoencoders, for example, which could be viewed as directed versions of Boltzmann machines. We have figured out ways of training these models. There was work by Max Welling and Diederik Kingma on using reparameterization tricks, and now we can use the backpropagation algorithm within the stochastic system, which is driving a lot of progress right now. But we haven't quite figured out how to do that in the case of Boltzmann machines.
Andrew: So that's a very interesting perspective that I actually wasn't aware of: that in an earlier era, where computers were slower, the RBM pre-training was really important, and it was only faster computation that drove the switch to standard backprop. In terms of the evolution of the community's thinking in deep learning, another topic I know you've spent a lot of time thinking about is generative, unsupervised versus supervised approaches. Could you share a bit about how your thinking about that has evolved over time?
Russ: Yeah, I feel like it's a very important topic, particularly if we think about unsupervised or semi-supervised or generative models, because to some extent a lot of the successes that we've seen recently are due to supervised learning. Back in the early days, unsupervised learning was primarily viewed as unsupervised pre-training, because we didn't know how to train these multi-layer systems. And even today, if you're working in a setting where you have lots and lots of unlabeled data and a small fraction of labeled examples, these unsupervised pre-training models, building these generative models, can help for the supervised task. The belief when I started doing my PhD was that it was all about generative models and trying to learn these stacks of Boltzmann machines, because that was the only way for us to train these systems. Today there is a lot of work on generative modeling. If you look at generative adversarial networks, if you look at variational autoencoders; energy models are something that my lab is working on right now as well. I think it's very exciting research, but perhaps we haven't quite figured it out. Again, for many of you who are thinking about getting into the deep learning field, this is one area where I think we will make a lot of progress, hopefully in the near future.
Andrew: So unsupervised learning?
Russ: Unsupervised learning, right. Or maybe you can think of it as unsupervised learning or semi-supervised learning, where I give you some hints or some examples of what different things mean, and I throw you lots and lots of unlabeled data.
Andrew: So that's a very important insight: that in an earlier era of deep learning, where computers were just slower, the restricted Boltzmann machine and deep Boltzmann machine were needed for initializing the neural network weights, but as computers got faster, straight backprop then started to work much better. One other topic that I know you've spent a lot of time thinking about is supervised learning versus generative models and unsupervised learning approaches. So tell me a bit about how your thinking on that debate has evolved over time.
Russ: I think that we all believe that we should be able to make progress there. It's just that, with all the work on Boltzmann machines, variational autoencoders, GANs, you can think of a lot of these models as generative models, but we haven't quite figured out how to really make them work and how to make use of large amounts of data. Even in the IT sector, companies have lots and lots of data, lots of unlabeled data. There's a lot of effort going into annotation, because that's the only way for us to make progress right now. And it seems like we should be able to make use of unlabeled data, because there's just an abundance of it, and we haven't quite figured out how to do that yet.
Andrew: So you mentioned, for people wanting to enter deep learning research, that unsupervised learning is an exciting area. Today there are a lot of people wanting to enter deep learning, either research or applied work. So for this global community, whether in research or applied work, what advice would you have?
Russ: Yes, I think one of the key pieces of advice I should give to people entering the field is this: I would encourage them to just try different things, and not be afraid to try new things, and not be afraid to try to innovate. I can give you one example. When I was a graduate student, we were looking at neural nets, and these are highly non-convex systems that are hard to optimize. I remember talking to my friends within the optimization community, and the feedback was always that, well, there is no way you can solve these problems, because these are non-convex. We don't understand the optimization. How could you ever even do that, compared to doing convex optimization? And it was surprising, because in our lab we never really cared that much about those specific problems. We were just thinking about how we can optimize and whether we can get interesting results, and that effectively was driving the community. So we were not scared, maybe to some extent because we were actually lacking the theory behind optimization. But I would encourage people to just try, and not be afraid to try to tackle hard problems.
Andrew: Yeah. And I remember you once said: don't learn to code just in high-level deep learning frameworks, but actually understand deep learning.
Russ: Yes, that's right. It's one of the things that I try to do when I teach a deep learning class. For one of the homeworks, I'm asking people to actually code the backpropagation algorithm for convolutional neural networks. And it's painful, but at the same time, if you do it once, you really understand how these systems operate and how they work, and how you can efficiently implement them on a GPU. I think it's important for you, when you go into research or industry, to have a really good understanding of what these systems are doing.
Andrew: Since you have both academic experience as a professor and corporate experience, I'm curious: if someone wants to enter deep learning, what are the pros and cons of doing a PhD versus joining a company?
Russ: Yeah, that's actually a very good question. In my particular lab, I have a mix of students. Some students want to take an academic route, some students want to take an industry route. And it's becoming very challenging, because you can do amazing research in industry and you can also do amazing research in academia. But in terms of pros and cons: in academia, I feel like you have more freedom to work on long-term problems, or if you think about some crazy problem, you can work on it. So you have a little bit more freedom. At the same time, the research that you're doing in industry is also very exciting, because in many cases, with your research, you can impact millions of users if you develop a core AI technology. And obviously within industry you have many more resources in terms of compute, and you are able to do really amazing things. So there are pluses and minuses; it really depends on what you want to do. Right now it's a very interesting environment, where academics move to industry, and folks from industry move to academia, but not as much. So it's very exciting times.
Andrew: It sounds like academic machine learning is great and corporate machine learning is great, and the most important thing is just jumping in, right? Either one, just jump in.
Russ: It really depends on your preferences, because you can do amazing research in either place.
Andrew: So you've mentioned unsupervised learning as one exciting frontier for research. Are there other areas that you consider exciting frontiers for research?
Russ: Yeah, absolutely. I think that what I see in the community right now, and particularly in the deep learning community, is that there are a few trends. One particular area that I think is really exciting is deep reinforcement learning, because we were able to figure out how we can train agents in virtual worlds. In just the last couple of years you see a lot of progress on how we can scale these systems, how we can develop new algorithms, how we can get agents to communicate with each other. I think that area, and generally the settings where you're interacting with the environment, is super exciting. The other area that I think is really exciting as well is reasoning and natural language understanding. So can we build dialogue-based systems? Can we build systems that can reason, that can read text and be able to answer questions intelligently? I think this is something that a lot of research is focusing on right now. And then there's another sort of sub-area, which is being able to learn from fewer examples. Typically people think of it as one-shot learning or transfer learning: a setting where you learn something about the world, and I throw a new task at you, and you can solve this task very quickly, much like humans do, without requiring lots and lots of labeled examples. This is something that a lot of us in the community are trying to figure out: how we can do that, and how we can come closer to human-like learning abilities.
Andrew: Thank you, Russ, for sharing all the comments and insights. It was especially interesting hearing the story of your early days.
Russ: Thanks, Andrew. Yeah, thanks for having me.