Detection Transformers; Nvidia Jetson Xavier NX DevKit; Google's Big Transfer (BiT); CVPR 2020

Vision Week Issue #5

Detection Transformers (DETR)

Transformer networks have gained a lot of popularity in recent years, but mostly in sequence modeling tasks such as NLP; they have not been used much in computer vision.

Recently, the Facebook AI Research team successfully used transformer networks for object detection, making the end-to-end detection pipeline much simpler. (paper | code | blog)

Nvidia Jetson Xavier NX DevKit

Nvidia announced the Jetson Xavier NX last November, but it wasn’t available to buy right away. After the wait, the devkit has finally arrived.

What’s new?

Nvidia announced the Jetson Nano last year as an entry-level edge device for AI at $99, and it was really well received in the market. Now Nvidia is upping the game with the Xavier NX for compute-heavy edge applications that the Nano can’t handle. In fact, Nvidia calls it “the World’s Smallest AI Supercomputer”. It comes in the same form factor as the Nano but with far better compute capability, at $399.

Google open-sources Big Transfer (BiT)

Researchers at Google have found that pre-training computer vision models on very large datasets (much bigger than ImageNet) helps the models learn quickly and generalize better when fine-tuned for other tasks via transfer learning, even with very small amounts of training data. This approach has proven useful in the language domain in recent times. This work in computer vision shows the need for very large open datasets (one of the datasets used in the research, with 300M images, was an internal dataset). (paper | code | blog)

AI for Healthcare Nanodegree

Following Coursera, Udacity has also launched an “AI for Healthcare” Nanodegree, a specialized course focused on applying machine learning to medical data. With everything going on currently, applying AI to healthcare has become more important than ever. Check out the curriculum to learn more.

CVPR 2020

CVPR this year was held as a virtual conference due to the pandemic. It featured two special keynotes, from Satya Nadella, CEO of Microsoft, and Charlie Bell, SVP of AWS. Many tutorial and workshop organizers have generously posted their video recordings online. Check out the individual websites for the tutorials and workshops you are interested in.

Deep learning lectures from DeepMind

DeepMind, in collaboration with UCL, has shared a series of deep learning lectures by DeepMind research scientists. 10 of the 12 planned lectures are available on YouTube. The course was probably affected by the pandemic. Nevertheless, the available lectures are definitely worth checking out.

Fei-Fei Li on Exponential View Podcast

Dr. Fei-Fei Li, creator of ImageNet, was featured on the Exponential View podcast. She discussed the early days of ImageNet, the impact it has had, her current work on healthcare and the vision of the Human-Centered AI Lab (HAI) she co-founded at Stanford. Check out the full episode here or wherever you generally listen to podcasts.

AI fun :)

YOLOv4; PyTorch 1.5; Nvidia DGX A100; Tesla at ScaledML Conference

Vision Week Issue #4

YOLO v4 - Better Speed and Accuracy

As you might have heard by now, YOLOv4 has been released, though not by the original author of YOLO, Joseph Redmon. It comes from a different group of researchers, including Alexey Bochkovskiy, who is known for his popular GitHub fork of the original DarkNet repo. He has made several improvements to DarkNet; in a way, his repo has become more popular than the original.

YOLOv4 employs more modern state-of-the-art techniques such as Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation to achieve better speed and accuracy. Check out the paper and code for more details.
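
Of the techniques listed above, Mish is the easiest to show in a few lines. As a rough sketch (not the YOLOv4 authors' code), the activation is commonly defined as x · tanh(softplus(x)). The helper names below are my own, and real frameworks apply this element-wise to tensors rather than to single floats.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable form of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish: smooth and non-monotonic; close to identity for large positive x,
    # close to zero (slightly negative) for large negative x.
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"mish({value:+.1f}) = {mish(value):+.4f}")
```

Unlike ReLU, the function lets small negative values through instead of clipping them to zero.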

Nvidia DGX A100

Nvidia CEO Jensen Huang announced new products in his “kitchen keynote” as part of GTC 2020. The highlight was the DGX A100, an absolute GPU beast aimed at datacenter use or as a server node for intensive training and inference for your research team, amazingly priced at $199K (yes, you read that right). Other announcements included real-time ray tracing and audio-to-face (A2F).

PyTorch 1.5

PyTorch 1.5 has been released with several changes and additions, along with updates for torchvision, torchaudio and torchtext. Facebook (maintainer of PyTorch) is also partnering with AWS on TorchServe, a tool that helps you deploy your PyTorch models to the cloud and use them in production through API calls over the internet. Check out the official blog post for more details.

Machine Learning Yearning

Doing online courses and assignments is one thing; applying machine learning in the real world and deploying a model to production is a completely different thing. In this book, Dr. Andrew Ng shares his practical experience building real-world products at companies like Baidu and Google.

This book is a treasure trove of practical knowledge and best practices for AI Engineers. Be sure to get the draft of the book for free here. A must read.

Tesla at ScaledML Conference

Andrej Karpathy (Director of AI at Tesla) gave a talk at the ScaledML 2020 conference on how the company is using AI to get closer to Full Self-Driving. In particular, he discussed stop sign detection, which looks like a simple problem but turns out to be very challenging in production. He also discussed how Tesla achieves these results without LiDAR, using just cameras and a few other sensors (radar, ultrasonic), which is quite difficult and impressive.

ICLR 2020

The International Conference on Learning Representations (ICLR) is one of the premier conferences in the field of ML/AI. This year it was planned to be held in Ethiopia, but due to the current coronavirus situation, the organizing committee decided to make it a fully virtual conference.

I think this is the first premier conference to go completely online. Last year NeurIPS was live streamed, but that was in addition to the physical event held in Canada. A portion of the talks (including those from Yann LeCun, Yoshua Bengio, Andrew Ng, etc.) and workshops from ICLR have been made available online. Feel free to check them out.

The age of AI

YouTube has released an original series on AI with Robert Downey Jr. as host. The series explores current state-of-the-art work in AI, the impact it can have on our lives and what the future has in store for us. An interesting watch if you have some time to kill.

AI fun :)

Social Distance Monitoring; AI for Medicine Specialization; covid-19 report

Vision Week Issue #3

Social Distance Monitoring

Prof. Andrew Ng's venture has developed a tool for monitoring social distancing. They have also shared the techniques they used to build it, so that others can build a similar tool if needed (the actual source code is not open source, though).

TensorFlow Dev Summit 2020

TensorFlow Dev Summit this year was a completely online event due to COVID-19 concerns. All the talks are available on the TensorFlow YouTube channel. In case you missed it, feel free to catch up.

AI for Medicine Specialization

A new specialization, “AI for Medicine”, has launched on Coursera, consisting of three courses taught by experts from Stanford. If you are interested in applying AI in the healthcare space, it's definitely worth checking out.

Covid-19 - a realistic look

Jeremy and Rachel have put together a realistic analysis of the current situation. As they point out, just trying to stay calm and not panic is not enough. Staying informed and preparing ourselves both mentally and physically is as important as staying sane in this difficult time.

GANs in Action

If you have been thinking about learning to work with GANs but haven’t found a good comprehensive resource, then the book “GANs in Action - Deep learning with Generative Adversarial Networks” is for you. This Manning publication is definitely one of the good resources out there on GANs, covering everything from the basic idea to state-of-the-art results.

TensorFlow without a PhD

If you are a fan of Martin Görner’s “without a PhD” series, then you should definitely star this GitHub repo. It collects the resources for all the talks he has given in the series.

AlphaGo movie

DeepMind has released the full documentary “AlphaGo” on YouTube. Previously it was available on Netflix. To give some background, AlphaGo is the computer program that beat the 18-time world champion Lee Sedol (a professional Go player from South Korea) at the board game Go.

Go is considered a very complex game for computer programs to play well, and AlphaGo beating a world champion is considered a major breakthrough in the history of AI. If you are looking for something inspiring to watch during this quarantine period, GO for it.

AI fun :)


How China tracks everyone

A sneak peek into how China does surveillance at scale.

TensorFlow World; Microsoft Azure Kinect; Google Coral out of beta

Vision Week Issue #2

O’Reilly TensorFlow World

The TensorFlow team teamed up with O’Reilly to host their first TensorFlow World conference earlier this month. If you are wondering how this differs from the TensorFlow Dev Summit, the key difference is that at the Dev Summit it is mostly people from the TensorFlow team who present their work, whereas TensorFlow World is a place for everyone in the community to learn and share what they are building with TensorFlow.

That means you can see talks and sessions from a diverse set of people, including the TensorFlow team. All the sessions from the TensorFlow team are up on the TensorFlow YouTube channel. Other talks are on the O’Reilly online learning platform; O’Reilly says all the recorded sessions will be available there three weeks after the conference. They have a 10-day free trial with no credit card required, so give it a try to watch all the sessions.

Azure Kinect

Microsoft has released a new version of Kinect called ‘Azure Kinect DK’, where DK stands for developer kit. The original Kinect was released almost a decade ago, mainly for gaming with the Xbox, but people also used it for computer vision research because of its depth-sensing capability.

This time, Azure Kinect is intended solely for developers and companies to build things. It is not aimed at regular consumers and is not meant to replace the existing Kinect for Xbox. Microsoft says they have put together their best sensors for building AI applications: a 12 MP RGB camera, a 1 MP depth camera and a microphone array. It doesn’t have an onboard processor, but it can be connected to a host computer to process the wealth of information it captures and build vision and speech applications.

Real time video gesture recognition

Researchers at MIT have developed a new technique, the “Temporal Shift Module (TSM)”, for efficient video classification on low-compute devices. Real-time video activity recognition on edge devices is generally hard because of the high compute cost: in video classification we look at a sequence of frames to predict the class, as opposed to a single frame at a time for image classification or object detection. The demo runs in real time on an Nvidia Jetson Nano at under 10 watts. (Paper | GitHub | Site)
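
To give a feel for the idea, here is a minimal sketch of a temporal shift in plain Python. It is my simplified illustration, not the authors' implementation: each frame is reduced to a flat list of channel values, a small slice of channels is copied from the next frame, another slice from the previous frame, and the rest stays in place. The function name, the `Clip` type and the `fold_div=8` default are assumptions made for the example; in a real network the shift is applied to feature tensors inside the model.

```python
from typing import List

# clip[t][c] is the (scalar, for simplicity) feature of channel c at frame t.
Clip = List[List[float]]


def temporal_shift(clip: Clip, fold_div: int = 8) -> Clip:
    """Mix information across neighbouring frames by shifting some channels in time."""
    num_frames = len(clip)
    num_channels = len(clip[0])
    fold = num_channels // fold_div  # number of channels shifted in each direction

    # Positions with no source frame (clip boundaries) stay zero.
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1  # first slice: take the value from the next frame
            elif c < 2 * fold:
                src = t - 1  # second slice: take the value from the previous frame
            else:
                src = t      # remaining channels are left untouched
            if 0 <= src < num_frames:
                out[t][c] = clip[src][c]
    return out


# Toy clip: 3 frames, 8 channels, value encodes (frame, channel) as 10 * t + c.
clip = [[float(10 * t + c) for c in range(8)] for t in range(3)]
shifted = temporal_shift(clip)
```

The shift itself is only data movement, with no multiplications or extra parameters, so a per-frame 2D model can see a little of the past and the future cheaply.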

CVPR 2019

Computer Vision and Pattern Recognition (CVPR) is one of the premier conferences in computer vision. CVPR 2019 took place earlier this year. Not all of us can afford to travel and attend the conference in person. Luckily there is this thing called ‘internet’. The Computer Vision Foundation has made a lot of the sessions available online. You can find the video recordings (if available) on YouTube and the slides under each session page on the conference website. This really helps to get a sense of what’s going on at the research frontier.

Ancient Secrets of Computer Vision

Joseph Redmon (the author of YOLO/DarkNet) teaches a computer vision course at the University of Washington. He has generously posted the video lectures on YouTube. It’s definitely one of the better introductory CV courses available online. Feel free to check it out.

Google Coral TPU graduates out of beta

Google launched its new hardware, the Coral Edge TPU, in March this year for AI at the edge. Six months later, it’s now stable and out of beta. It runs models in a specific TensorFlow Lite edgetpu format very efficiently for low-latency real-time applications. AI on the edge is off to a good start. Long way to go, though!

3Blue1Brown on Siraj Raval Podcast

You might know Grant Sanderson from his awesome YouTube channel “3Blue1Brown”. He was recently interviewed by Siraj on his podcast, where they discussed making math animations, Grant’s recent visit to India and more. Listen to the episode to learn more. It is available on Google Podcasts, Spotify and possibly wherever you listen to podcasts.

AI fun :)

XKCD Comic @xkcdComic

RL Specialization; Tesla acquires DeepScale; OpenAI at TC DisruptSF

Vision Week Issue #1

RL Specialization from UAlberta on Coursera

The University of Alberta and the Alberta Machine Intelligence Institute (AMII) have come together to launch a four-course specialization on Reinforcement Learning on Coursera. If you have been trying to learn RL, you might already know that there are not a lot of well-structured go-to courses out there. There are some really good resources, such as Sutton & Barto’s textbook, David Silver’s course lectures, the UC Berkeley RL course lectures on YouTube and OpenAI’s Spinning Up in Deep RL, but they lack well-structured assignments or a proper beginner-friendly MOOC setup.

Why this matters:

This course looks promising because it comes from people who work directly with great minds in RL like Prof. Sutton. In fact, Sutton himself is involved in creating the course. I kind of feel like these courses might be the video version of Sutton’s textbook, but they are definitely worth checking out. Give it a try and let me know your thoughts if you manage to finish any of the courses in the specialization.

OpenAI at TechCrunch Disrupt SF

Sam Altman (CEO) and Greg Brockman (CTO) from OpenAI were at TechCrunch Disrupt San Francisco, where they discussed some of the company’s earlier decisions (forming the capped-profit OpenAI LP, the partnership with Microsoft, GPT-2, etc.) and the future roadmap for OpenAI. Greg also showed a demo of their recent multi-agent RL experiment and how the agents discovered tool use.

Tesla acquires DeepScale

Tesla has acquired DeepScale (the company behind the SqueezeNet paper). SqueezeNet was one of the first attempts at creating smaller models without losing too much accuracy. This acquisition clearly shows the need for efficient models that can run on edge devices with a smaller footprint. AI on the edge is definitely booming.
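
To make “smaller models” concrete, here is a back-of-the-envelope sketch of the kind of saving SqueezeNet’s Fire module aims for: a 1x1 “squeeze” convolution reduces the channel count before a mix of 1x1 and 3x3 “expand” convolutions. The helper names and the channel sizes below are chosen for illustration, and biases are ignored.

```python
def conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    # Weight count of a kernel x kernel convolution (biases ignored).
    return in_ch * out_ch * kernel * kernel


def fire_params(in_ch: int, squeeze: int, expand1x1: int, expand3x3: int) -> int:
    # Squeeze (1x1) followed by two parallel expand branches (1x1 and 3x3).
    return (
        conv_params(in_ch, squeeze, 1)
        + conv_params(squeeze, expand1x1, 1)
        + conv_params(squeeze, expand3x3, 3)
    )


# Illustrative sizes: 96 input channels, 128 output channels in both cases.
plain = conv_params(96, 128, 3)
fire = fire_params(96, squeeze=16, expand1x1=64, expand3x3=64)

print(f"plain 3x3 conv: {plain} weights")
print(f"fire module:    {fire} weights ({plain / fire:.1f}x fewer)")
```

With these numbers the plain 3x3 layer needs 110,592 weights and the Fire-style layer 11,776, roughly a 9x reduction for the same number of output channels.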

Introduction to TensorFlow Lite on Udacity

The TensorFlow Lite team has launched a course on Udacity covering how to deploy TFLite models to mobile and edge devices. Google also released their Coral Edge TPU earlier this year to advance AI applications on the edge. The course may not be an advanced, in-depth one, but it can definitely serve as a good comprehensive introduction to TensorFlow Lite.

PyTorch Mobile

Until now, if you wanted to deploy a model to mobile, your best bet was TensorFlow Lite. PyTorch has added support for mobile (Android and iOS) with its 1.3 release. (PyTorch is upping its game!)

DeepMind Podcast

DeepMind has completely restructured the organization recently. They have released a limited-series podcast, hosted by mathematician Hannah Fry, that gives us an insider look. All eight episodes are out. You can find it on Spotify, Google Podcasts or wherever you listen to podcasts.

PyTorch official YouTube channel

PyTorch gets its official YouTube channel (finally!). I have been waiting for this. Earlier, the videos were scattered across the Facebook Developers YouTube channel and Facebook pages. Now we can watch all the content in one place. Go ahead and watch the videos (PyTorch Developer Conference, Summer Hackathon, etc.) when you are free.

AI fun :)
