We are an interdisciplinary team of researchers working in visual computing, in particular computer graphics and computer vision. Current areas of focus include 3D and robotic vision, 3D printing and content creation, animation, AR/VR, generative AI, geometric and image-based modelling, language and 3D, machine learning, natural phenomena, and shape analysis. Our research frequently appears in top venues such as SIGGRAPH, CVPR, and ICCV (we rank #14 in the world in terms of top publications in visual computing, as of 6/2023), and we collaborate widely with industry and academia (e.g., Adobe Research, Amazon, Autodesk, Google, MSRA, Princeton, Stanford, Tel Aviv, and Washington). Our faculty and students have won numerous honours and awards, including FRSC, the SIGGRAPH Outstanding Doctoral Dissertation Award, the Alain Fournier Best Thesis Award, the CS-Can|Info-Can Researcher Award, the Google Faculty Award, the Google PhD Fellowship, the Borealis AI Fellowship, TR35@Singapore, CHCCS Achievement and Early Career Researcher Awards, NSERC Discovery Accelerator Awards, and several best paper awards from CVPR, ECCV, SCA, SGP, etc. GrUVi alumni have gone on to take up faculty positions in Canada, the US, and Asia, while others now work at companies including Amazon, Apple, EA, Facebook (Meta), Google, IBM, and Microsoft.
November 6, 2023
NeurIPS, the premier conference on machine learning, will be held in New Orleans this year (Dec 10-16). The GrUVi lab will once again have a good showing at NeurIPS, with 7 technical papers and 1 datasets and benchmarks paper! Please refer to our publication page for more details.
November 3, 2023
Click this link to see the talk replay.
Title: Co-speech Gesture Generation
Abstract: Gestures accompanying speech are essential to natural and efficient embodied human communication. The automatic generation of such co-speech gestures is a long-standing problem in computer animation. It is considered an enabling technology in film, games, virtual social spaces, and interaction with social robots. The problem is made challenging by the idiosyncratic and non-periodic nature of human co-speech gesture motion, and by the great diversity of communicative functions that gestures encompass. Gesture generation has seen surging interest recently, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep-learning-based generative models that benefit from the growing availability of data. This talk will review developments in co-speech gesture generation research, focusing on deep generative models.
Bio: Dr. Taras Kucherenko is currently a Research Scientist at Electronic Arts. He completed his Ph.D. at the KTH Royal Institute of Technology in Stockholm in 2021. His research is on machine learning models for non-verbal behavior generation, such as hand gestures and facial expressions. He received the ICMI 2020 Best Paper Award and the IVA 2020 Best Paper Award for his research papers. Taras was also the main organizer of the GENEA (Generation and Evaluation of Non-verbal Behavior for Embodied Agents) Workshop and Challenge in 2020, 2021, 2022, and 2023.
October 27, 2023
Click this link to see the talk replay.
Title: Quantum Computing for Robust Fitting
Abstract: Many computer vision applications need to recover structure from imperfect measurements of the real world. The task is often solved by robustly fitting a geometric model onto noisy and outlier-contaminated data. However, relatively recent theoretical analyses indicate that many commonly used formulations of robust fitting in computer vision are not amenable to tractable solution and approximation. In this work, we explore the use of quantum computers for robust fitting. To do so, we examine the feasibility of two types of quantum computing technology, universal gate quantum computers and quantum annealers, for solving robust fitting. Novel algorithms that are amenable to the quantum machines have been developed, and experimental results on current noisy intermediate-scale quantum (NISQ) computers will be reported. Our work thus proposes one of the first quantum treatments of robust fitting for computer vision.
Bio: Tat-Jun (TJ) Chin is SmartSat CRC Professorial Chair of Sentient Satellites at The University of Adelaide. He received his PhD in Computer Systems Engineering from Monash University in 2007, which was partly supported by the Endeavour Australia-Asia Award, and a Bachelor in Mechatronics Engineering from Universiti Teknologi Malaysia in 2004, where he won the Vice Chancellor's Award. TJ's research interest lies in computer vision and machine learning for space applications. He has published close to 200 research articles and has won several awards for his research, including a CVPR award (2015), a BMVC award (2018), Best of ECCV (2018), three DST Awards (2015, 2017, 2021), an IAPR Award (2019), and an RAL Best Paper Award (2021). TJ pioneered the AI4Space Workshop series and is an Associate Editor of the International Journal of Robotics Research (IJRR) and the Journal of Mathematical Imaging and Vision (JMIV). He was a finalist in the Academic of the Year category at the Australian Space Awards 2021.
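For readers unfamiliar with the problem the talk targets, the sketch below is a minimal, purely classical illustration of consensus maximization, a standard robust fitting objective: count the data points whose residual to a candidate model falls under a threshold, and keep the model with the largest count. This is generic background only; the function names and threshold are hypothetical, and it is not the quantum formulation presented in the talk.

```python
# Illustrative sketch: brute-force consensus maximization for 2D line fitting.
# Hypothetical names and threshold; not the talk's quantum algorithm.
import itertools
import numpy as np

def fit_line(p, q):
    """Line y = a*x + b through two points."""
    a = (q[1] - p[1]) / (q[0] - p[0])
    b = p[1] - a * p[0]
    return a, b

def max_consensus(points, tau=0.1):
    """Return the line (a, b) with the most inliers within residual tau."""
    best, best_inliers = None, -1
    for p, q in itertools.combinations(points, 2):
        if np.isclose(p[0], q[0]):
            continue  # skip (near-)vertical pairs for simplicity
        a, b = fit_line(p, q)
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = int((residuals <= tau).sum())
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best, best_inliers

# Example: noisy samples of y = 2x + 1 with a few gross outliers injected.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
y = 2 * x + 1 + rng.normal(0, 0.02, 30)
pts = np.column_stack([x, y])
pts[:5, 1] += rng.uniform(2, 5, 5)
print(max_consensus(pts))
```

As the abstract notes, even this seemingly simple objective is hard to solve or approximate tractably at scale, which is what motivates mapping it onto universal gate machines and quantum annealers.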
October 20, 2023
Click this link to see the talk replay.
Title: Efficient, Less-biased and Creative Visual Learning
Abstract: In this talk I will discuss recent methods from my group that address some of the core challenges of current visual and multi-modal cognition, including efficient learning, bias, and user-controlled generation. Centering on these larger themes, I will talk about a number of strategies (and corresponding papers) that we developed to address these challenges. I will start by discussing transfer learning techniques in the context of semi-supervised object detection and segmentation, highlighting a model that is applicable to a range of supervision: from zero to a few instance-level samples per novel class. I will then talk about our recent work on building a foundational image representation model by combining two successful strategies: masking and sequential token prediction. I will also discuss some of our work on scene graph generation which, in addition to improving overall performance, allows for scalable inference and the ability to control data bias (trading minor declines on the most common classes for major improvements on rare classes). The talk will end with some of our recent work on generative modeling, focusing on novel-view synthesis and language-conditioned, diffusion-based story generation. The core of the latter approach is a visual memory that implicitly captures the actor and background context across the generated frames. Sentence-conditioned soft attention over the memories enables effective reference resolution and learns to maintain scene and actor consistency when needed.
Bio: Prof. Leonid Sigal is a Professor at the University of British Columbia (UBC). He was appointed a CIFAR AI Chair at the Vector Institute in 2019 and an NSERC Tier 2 Canada Research Chair in Computer Vision and Machine Learning in 2018. Prior to this, he was a Senior Research Scientist, and a group lead, at Disney Research. He completed his Ph.D. at Brown University in 2008; he received his B.Sc. degrees in Computer Science and Mathematics from Boston University in 1999, his M.A. from Boston University in 1999, and his M.S. from Brown University in 2003. He was a Postdoctoral Researcher at the University of Toronto from 2007 to 2009. Leonid's research interests lie in the areas of computer vision, machine learning, and computer graphics, with an emphasis on approaches for visual and multi-modal representation learning, recognition, understanding, and generative modeling. He has won a number of prestigious research awards, including the Killam Accelerator Fellowship in 2021, and has published over 100 papers in venues such as CVPR, ICCV, ECCV, NeurIPS, ICLR, and SIGGRAPH.
October 13, 2023
Click this link to see the talk replay.
Title: Visual Human Motion Analysis
Abstract: Recent advances in imaging sensors and deep learning techniques have opened the door to many interesting applications for the visual analysis of human motion. In this talk, I will discuss our research efforts toward addressing the related tasks of 3D human motion synthesis, pose and shape estimation from images and videos, and visual action quality assessment. Looking forward, our results could be applied to everyday-life scenarios such as natural user interfaces, AR/VR, robotics, and gaming, among others.
Bio: Li Cheng is a professor at the Department of Electrical and Computer Engineering, University of Alberta. He is an associate editor of IEEE Transactions on Multimedia and the Pattern Recognition journal. Prior to joining the University of Alberta, he worked at A*STAR, Singapore; TTI-Chicago, USA; and NICTA, Australia. His current research interests are mainly in human motion analysis, mobile and robot vision, and machine learning. More details can be found at http://www.ece.ualberta.ca/~lcheng5/.
September 18, 2023
ICCV, the premier conference on computer vision, will be held in Paris this year (Oct 2-6). The GrUVi lab will once again have a good showing at ICCV, with 6 technical papers and 3 co-organized workshops! Also, Prof. Yasutaka Furukawa serves as a program chair for this year's ICCV!
For the workshops, Prof. Richard Zhang co-organizes the 3D Vision and Modeling Challenges in eCommerce workshop, and Prof. Angel Chang co-organizes the 3rd Workshop on Language for 3D Scenes and CLVL: 5th Workshop on Closing the Loop between Vision and Language. Also, Prof. Manolis Savva will give a talk at the 1st Workshop on Open-Vocabulary 3D Scene Understanding.
And here are the 6 accepted papers:
Multi3DRefer: Grounding Text Description to Multiple 3D Objects
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion
SKED: Sketch-guided Text-based 3D Editing
PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects
HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
Congrats to the authors!
June 26, 2023
The recording is available at this link.
Title: Towards Controllable 3D Content Creation by Leveraging Geometric Priors
Abstract: The growing popularity of extended reality pushes the demand for the automatic creation and synthesis of new 3D content, which would otherwise be a tedious and laborious process. A key property needed to make 3D content creation useful is user controllability, as it allows one to realize specific ideas. User control can take various forms, e.g., target scans, input images, or programmatic edits. In this talk, I will touch on works that enable user control through (i) object parts and (ii) sparse scene images by leveraging geometric priors. The former utilizes semantic object priors by proposing a novel shape-space factorization through a cross-diffusion network, enabling multiple applications in both shape generation and editing. The latter leverages models pretrained on large 2D datasets for sparse-view 3D NeRF reconstruction of scenes by learning a distribution over geometry represented as ambiguity-aware depth estimates. As an add-on, we will also briefly revisit the volume rendering equation in NeRFs and reformulate it with a piecewise linear density, which alleviates underlying issues caused by quadrature instability.
Bio: Mika is a fourth-year PhD student at Stanford advised by Leo Guibas. Her research focuses on the representation and generation of objects and scenes for user-controllable 3D content creation. She has been a research intern at Adobe, Autodesk, and now Google, and is generously supported by the Apple AI/ML PhD Fellowship and the Snap Research Fellowship.
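As context for that last point, the quadrature being revisited is the standard NeRF volume rendering approximation, which assumes a piecewise-constant density between adjacent ray samples; the talk's reformulation replaces this with a piecewise linear density. The block below shows only the standard textbook form, not the reformulation from the talk.

```latex
% Standard NeRF rendering quadrature with piecewise-constant density per segment
% (Mildenhall et al., 2020); \delta_i = t_{i+1} - t_i is the inter-sample distance.
\hat{C}(\mathbf{r}) \;=\; \sum_{i=1}^{N} T_i \,\bigl(1 - e^{-\sigma_i \delta_i}\bigr)\, \mathbf{c}_i,
\qquad
T_i \;=\; \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr).
```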
August 4, 2023
We are proud to highlight that three of Prof. Jason Peng's research papers will be presented at the upcoming SIGGRAPH 2023. These papers mark advances in physics-based character animation. Below are the titles and links to the related project pages:
Learning Physically Simulated Tennis Skills from Broadcast Videos - https://xbpeng.github.io/projects/Vid2Player3D/index.html
Synthesizing Physical Character-Scene Interactions - https://xbpeng.github.io/projects/InterPhys/index.html
CALM: Conditional Adversarial Latent Models for Directable Virtual Characters - https://xbpeng.github.io/projects/CALM/index.html
Note: The SIGGRAPH conference, short for Special Interest Group on Computer GRAPHics and Interactive Techniques, is the world's premier annual event for showcasing the latest innovations in computer graphics and interactive techniques. It brings together researchers, artists, developers, filmmakers, scientists, and business professionals from around the globe. The conference offers a unique blend of educational sessions, hands-on workshops, and exhibitions of cutting-edge technology and applications.
June 14, 2023
CVPR, the premier conference on computer vision, will be held in Vancouver this year (June 18-22). The GrUVi lab will once again have an incredible showing at CVPR, with 12 technical papers, 6 invited talks, and 4 co-organized workshops!
Conference and workshop co-organization
Former GrUVi Professor Greg Mori serves as one of the four general conference chairs for the main CVPR conference! Prof. Angel Chang, as one of the social activity chairs, is helping to organize the speed mentoring sessions. In addition, we have exciting workshops and challenges organized by GrUVi members as well:
Computer Vision in the Built Environment workshop - co-organized by Prof. Yasutaka Furukawa
Second Workshop on Structural and Compositional Learning on 3D Data (Struco3D) - co-organized by Prof. Richard Zhang
ScanNet Indoor Scene Understanding Challenge - co-organized by Prof. Angel X. Chang and Prof. Manolis Savva
Embodied AI Workshop, featuring a variety of challenges including the Multi-Object Navigation (MultiON) challenge - co-organized by Sonia Raychaudhuri, Angel Chang, and Manolis Savva
Workshop talks
Prof. Andrea Tagliasacchi is invited to give keynote talks at both Struco3D and Generative Models for Computer Vision (both on June 18th). He will also give a spotlight talk at the Area Chair workshop on Saturday. At the Women in Computer Vision workshop, Prof. Angel Chang will be giving a talk on June 19th. She is also invited to give talks at the workshops on 3D Vision and Robotics (June 18th), Compositional 3D Vision (June 18th), and Open-Domain Reasoning Under Multi-Modal Settings (June 19th).
Technical papers
Congratulations to all authors of the accepted papers! The full list of papers featured at CVPR 2023 can be accessed here.