Recapping the AI, Machine Learning and Computer Vision Meetup: December 12, 2024


We just wrapped up the December ‘24 AI, Machine Learning and Computer Vision Meetup, and if you missed it or want to revisit it, here’s a recap! In this blog post you’ll find the playback recordings, highlights from the presentations and Q&A, as well as the upcoming Meetup schedule so that you can join us at a future event.

How We Built CoTracker3: Simpler and Better Point Tracking by Pseudo-Labeling Real Videos

CoTracker3 is a state-of-the-art point tracking model that significantly improves how points are tracked through video sequences. Its key innovations include:

Semi-supervised training on real videos, reducing reliance on synthetic data
Pseudo-labels generated by existing tracking models acting as teachers
A simplified architecture compared to previous trackers
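To make the pseudo-labeling idea concrete, here is a minimal toy sketch in plain Python. It is not the CoTracker3 training code: the two "teachers" are hypothetical stand-in functions, and both the random choice of teacher and the Huber loss are assumptions made for illustration. The point is only to show the shape of the loop: a teacher tracker labels an unlabeled video, and the student is scored against those labels.

```python
import random

# Hypothetical stand-ins for pretrained teacher trackers (assumption).
# Each maps query points [(x, y), ...] to tracks[t][n] = (x, y).
def teacher_shift_right(queries, num_frames):
    return [[(x + t, y) for (x, y) in queries] for t in range(num_frames)]

def teacher_shift_down(queries, num_frames):
    return [[(x, y + t) for (x, y) in queries] for t in range(num_frames)]

TEACHERS = [teacher_shift_right, teacher_shift_down]

def make_pseudo_labels(queries, num_frames, rng):
    """Pick one teacher at random and use its tracks as training targets."""
    teacher = rng.choice(TEACHERS)
    return teacher(queries, num_frames)

def huber(residual, delta=1.0):
    """Huber penalty: quadratic near zero, linear for large residuals."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def tracking_loss(student_tracks, pseudo_tracks):
    """Mean Huber loss over all frames, points, and coordinates."""
    total, count = 0.0, 0
    for s_frame, p_frame in zip(student_tracks, pseudo_tracks):
        for (sx, sy), (px, py) in zip(s_frame, p_frame):
            total += huber(sx - px) + huber(sy - py)
            count += 2
    return total / count

rng = random.Random(0)
queries = [(10.0, 20.0), (30.0, 40.0)]
pseudo = make_pseudo_labels(queries, num_frames=4, rng=rng)

# An untrained "student" that predicts no motion at all.
student = [list(queries) for _ in range(4)]
loss = tracking_loss(student, pseudo)
print(f"loss against pseudo-labels: {loss:.4f}")
```

In a real setup the student would be a neural network updated by gradient descent on this loss, and the teachers would be actual pretrained trackers run on real videos.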

Speaker: Nikita Karaev is currently doing a PhD at Meta AI and Oxford, where he’s working on dynamic reconstruction and motion estimation (CoTracker) with Andrea Vedaldi and Christian Rupprecht. Before that, he did his master’s at École Polytechnique (Paris), and undergrad in cold Siberia (Novosibirsk). He was also an early employee at two startups that got acquired by Snapchat and Farfetch.

Q&A

Is processing a whole video in one go not computationally expensive?
Can you explain more about the 4D correlation?
Do you think pre-trained world models, or models explicitly trained on (or sensitive to) the laws of physics and how objects interact in 3D, could be useful for this kind of temporal tracking? Would they help in out-of-distribution (OOD) cases?
What are the evaluation metrics that are mainly tracked?
How can CoTracker’s joint tracking technology be leveraged to enhance identity verification and access control in cybersecurity frameworks, and what are the potential risks associated with spoofing or compromising such systems?

Resource Links

Hands-On with Meta AI's CoTracker3: Parsing and Visualizing Point Tracking Output

In this presentation, Harpreet Sahota explores CoTracker3, a state-of-the-art point tracking model that effectively leverages real-world videos during training. He dives into the practical aspects of running inference with CoTracker3 and parsing its output into FiftyOne, a powerful open-source tool for dataset curation, analysis, and visualization. Through a hands-on demonstration, Harpreet shows how to prepare a video for inference, run the model, examine its output, and parse the model’s output into FiftyOne’s keypoint format for seamless integration and visualization within the FiftyOne app.
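The parsing step described above can be sketched roughly as follows. This is not Harpreet's code; it is a plain-Python illustration under a few assumptions: that the tracker returns pixel coordinates plus a per-point visibility flag for every frame, and that the target keypoint format wants coordinates relative to the frame size (in [0, 1]) with frames numbered from 1. The function name `tracks_to_frame_points` and the use of `None` for occluded points are hypothetical choices for this sketch.

```python
def tracks_to_frame_points(tracks, visibility, width, height):
    """Convert pixel-space tracks to per-frame normalized points.

    tracks:     tracks[t][n] = (x_px, y_px) for frame t and point n
    visibility: visibility[t][n] = True if point n is visible in frame t
    Returns a dict keyed by 1-based frame number; each value is a list of
    (x, y) tuples in [0, 1], with None for points that are not visible.
    """
    frames = {}
    for t, (frame_tracks, frame_vis) in enumerate(zip(tracks, visibility)):
        points = []
        for (x_px, y_px), visible in zip(frame_tracks, frame_vis):
            if visible:
                points.append((x_px / width, y_px / height))
            else:
                points.append(None)
        frames[t + 1] = points  # frame numbers start at 1 (assumption)
    return frames

# Two frames, two tracked points, on a 640x480 video (made-up values).
tracks = [
    [(64.0, 48.0), (320.0, 240.0)],
    [(128.0, 96.0), (330.0, 250.0)],
]
visibility = [
    [True, True],
    [True, False],
]
frame_points = tracks_to_frame_points(tracks, visibility, width=640, height=480)
for frame_number, points in frame_points.items():
    print(frame_number, points)
```

From here, each frame's list of points would be wrapped in the visualization tool's own keypoint label and attached to the matching video frame.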

Speaker: Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.

Q&A

For the CoTracker models, are there model compression or quantization techniques you tried and/or can recommend?

Resource Links

Streamlined Retail Product Detection with YOLOv8 and FiftyOne

In the fast-paced retail environment, automation at checkout is increasingly essential to enhance operational efficiency and improve the customer experience. This talk will demonstrate a streamlined approach to retail product detection using the Retail Product Checkout (RPC) dataset, which includes 200 SKUs across 17 meta-categories such as puffed food, dried food, and drinks. By leveraging YOLOv8, renowned for its speed and accuracy in real-time object detection, and FiftyOne, an open-source toolset for computer vision, we can simplify data loading, training, evaluation, and visualization for effective product detection and classification. Attendees will gain insights into how these tools can be applied to optimize checkout automation.
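One small piece of the data-loading step can be illustrated without either library installed. YOLO-style label files describe each box as a class index followed by a normalized center and size, while FiftyOne detections use a normalized top-left corner plus width and height. The sketch below converts between the two. The helper names and the three class names are hypothetical, chosen only to echo the meta-categories mentioned above, and this is not the speaker's code.

```python
def yolo_to_top_left_box(cx, cy, w, h):
    """Convert a normalized (center x, center y, width, height) box
    to normalized [top-left x, top-left y, width, height]."""
    return [cx - w / 2, cy - h / 2, w, h]

def parse_yolo_label_line(line, class_names):
    """Parse one 'class_id cx cy w h' line from a YOLO label file."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    return {
        "label": class_names[class_id],
        "bounding_box": yolo_to_top_left_box(cx, cy, w, h),
    }

# Hypothetical class list for illustration only.
class_names = ["puffed_food", "dried_food", "drinks"]

detection = parse_yolo_label_line("2 0.5 0.5 0.2 0.4", class_names)
print(detection)
```

Each resulting dictionary carries the two fields a detection label needs, so a loader would only have to wrap it in the label type of the chosen tool.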

Speaker: Vanshika Jain is a Data Engineer Intern at UNAR Labs, a startup focused on making information accessible for the blind. She holds a Master’s degree in Machine Learning and Computer Vision from Northeastern University and is passionate about applying AI and computer vision to real-world problems, with a focus on automation and accessibility.

Q&A

In your retail use case, did you find certain types of objects get confused with each other more often than other object pairs?
Is the code you used for fine-tuning published? Or can you point to a resource you would recommend?

Resource Links

Join the AI, Machine Learning and Computer Vision Meetup!

The goal of the Meetups is to bring together communities of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of AI and complementary technologies.

Join whichever of the 12 Meetup locations is closest to your timezone.

What’s Next?

Up next on Jan 29, 2025 at 9:00 AM PT / 12:00 PM ET, we have three great speakers lined up!

Is AI Creating a Whole New Earth-Aware Geospatial Stack? Promises and Challenges - Dr. Bruno Sanchez-Andrade Nuno, Clay – AI for Earth
Evaluating the Satlas and Clay Remote Sensing Foundational Models - Steve Pousty, Voxel51
Earth Monitoring for Everyone with Earth Index - Mikel Maron, The Earth Genome

Get Involved!

There are a lot of ways to get involved in the Computer Vision Meetups. Reach out if you identify with any of these:

You’d like to speak at an upcoming Meetup
You have a physical meeting space in one of the Meetup locations and would like to make it available for a Meetup
You’d like to co-organize a Meetup
You’d like to co-sponsor a Meetup

Reach out to Meetup co-organizer Jimmy Guerrero on Meetup.com or ping me over LinkedIn to discuss how to get you plugged in.

—

These Meetups are sponsored by Voxel51, the company behind the open source FiftyOne computer vision toolset. FiftyOne enables data science teams to improve the performance of their computer vision models by helping them curate high-quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster. Getting started takes just a few minutes.