Research

                Research Directions

                Fundamental AI Theory

                We conduct research on frontier fundamental theories of artificial intelligence, including machine learning, reinforcement learning, deep learning, knowledge computing, causal reasoning, and information security; we also pursue interdisciplinary AI research and explore new data-driven paradigms for scientific research.

                AI Open Platforms

                We build new AI platforms for big data, algorithms, and computing power to fully support fundamental and applied AI research.

                Fundamental AI Software and Hardware Systems

                We develop fundamental AI software and hardware systems and build the software and hardware foundation of the technology ecosystem, including fundamental software such as next-generation AI training frameworks, programming languages, and compilers, as well as fundamental hardware such as AI chips and sensors.

                AI Applications

                We explore applications of AI technologies in industries such as smart cities, transportation, healthcare, education, culture and tourism, finance, and manufacturing, track emerging fields, and develop common technology platforms.

                Core AI Technologies

                We develop next-generation AI technologies, including computer vision, natural language processing, speech processing, decision intelligence, intelligent robotics, urban computing, computer graphics, and digital twins.

                AI Ethics and Policy

                We study the economic, social, ethical, legal, security, privacy, and data-governance issues that AI may raise, propose solutions, and provide input for policy-making.

                Academic Achievements
                Published in: arXiv 2023

                OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

                Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of large-scale real-scanned 3D databases. To facilitate the development of 3D perception, reconstruction, and generation in the real world, we propose OmniObject3D, a large-vocabulary 3D object dataset with massive high-quality real-scanned 3D objects. OmniObject3D has several appealing properties: 1) Large Vocabulary: It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets (e.g., ImageNet and LVIS), benefiting the pursuit of generalizable 3D representations. 2) Rich Annotations: Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multi-view rendered images, and multiple real-captured videos. 3) Realistic Scans: The professional scanners support high-quality object scans with precise shapes and realistic appearances. With the vast exploration space offered by OmniObject3D, we carefully set up four evaluation tracks: a) robust 3D perception, b) novel-view synthesis, c) neural surface reconstruction, and d) 3D object generation. Extensive studies are performed on these four benchmarks, revealing new observations, challenges, and opportunities for future research in realistic 3D vision. (Project page: https://omniobject3d.github.io/)
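
                The abstract lists several data modalities captured per object. As a rough illustration only, the record below sketches what a single OmniObject3D-style entry could hold; the field names and types are assumptions for illustration, not the dataset's actual schema.

                # Hypothetical sketch of a single OmniObject3D-style record, based only on the
                # modalities named in the abstract (textured mesh, point cloud, multi-view renders,
                # real-captured videos). Field names and types are assumptions, not the real schema.
                from dataclasses import dataclass
                from pathlib import Path
                from typing import List

                @dataclass
                class ScannedObjectRecord:
                    category: str               # one of the ~190 daily-object categories
                    mesh_path: Path             # textured mesh produced by the 3D scanner
                    point_cloud_path: Path      # point cloud sampled from the scan
                    rendered_views: List[Path]  # multi-view rendered images
                    video_clips: List[Path]     # real-captured videos of the object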

                Published in: arXiv 2023

                FengWu: Pushing the Skillful Global Medium-range Weather Forecast Beyond 10 Days Lead

                We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI). Different from existing data-driven weather forecast methods, FengWu solves the medium-range forecast problem from a multi-modal and multi-task perspective. Specifically, a deep learning architecture equipped with model-specific encoder-decoders and cross-modal fusion Transformer is elaborately designed, which is learned under the supervision of an uncertainty loss to balance the optimization of different predictors in a region-adaptive manner. Besides this, a replay buffer mechanism is introduced to improve medium-range forecast performance. With 39-year data training based on the ERA5 reanalysis, FengWu is able to accurately reproduce the atmospheric dynamics and predict the future land and atmosphere states at 37 vertical levels on a 0.25° latitude-longitude resolution. Hindcasts of 6-hourly weather in 2018 based on ERA5 demonstrate that FengWu performs better than GraphCast in predicting 80% of the 880 reported predictands, e.g., reducing the root mean square error (RMSE) of 10-day lead global z500 prediction from 733 to 651 m²/s². In addition, the inference cost of each iteration is merely 600 ms on NVIDIA Tesla A100 hardware. The results suggest that FengWu can significantly improve the forecast skill and extend the skillful global medium-range weather forecast out to 10.75 days lead (with ACC of z500 > 0.6) for the first time.
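
                The abstract mentions an uncertainty loss that balances the optimization of different predictors. A common way to realize this idea weights each task's error by a learned log-variance; the sketch below is a minimal, generic PyTorch version of that technique, not FengWu's actual implementation, and all names and the per-predictand grouping are illustrative assumptions.

                # Generic uncertainty-weighted multi-task loss, in the spirit of the "uncertainty
                # loss" the FengWu abstract mentions. Class name, task grouping, and the use of
                # plain MSE are illustrative assumptions, not the paper's implementation.
                import torch
                import torch.nn as nn

                class UncertaintyWeightedLoss(nn.Module):
                    def __init__(self, num_tasks: int):
                        super().__init__()
                        # One learnable log-variance per task (e.g., per atmospheric variable).
                        self.log_var = nn.Parameter(torch.zeros(num_tasks))

                    def forward(self, preds, targets):
                        # preds, targets: lists of tensors, one pair per task.
                        total = torch.zeros((), device=self.log_var.device)
                        for i, (pred, target) in enumerate(zip(preds, targets)):
                            mse = torch.mean((pred - target) ** 2)
                            # A larger learned variance down-weights that task's error, while the
                            # +log_var term keeps the variance from growing without bound.
                            total = total + torch.exp(-self.log_var[i]) * mse + self.log_var[i]
                        return total

                Because the weights are learned jointly with the forecast model, tasks that are currently noisier or harder are automatically down-weighted instead of dominating the gradient.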

                Published in: Lancet Digit Health 2022

                Spatially aware graph neural networks and cross-level molecular profile prediction in colon cancer histopathology: a retrospective multi-cohort study

                The model was developed on 459 colon tumour whole-slide images from TCGA-COAD, and externally validated on 165 rectum tumour whole-slide images from TCGA-READ and 161 colon tumour whole-slide images from CPTAC-COAD. For TCGA cohorts, our method accurately predicted the molecular classes of the gene mutations (area under the curve [AUCs] from 82·54 [95% CI 77·41–87·14] to 87·08 [83·28–90·82] on TCGA-COAD, and AUCs from 70·46 [61·37–79·61] to 81·80 [72·20–89·70] on TCGA-READ), along with genes with copy number alterations (AUCs from 81·98 [73·34–89·68] to 90·55 [86·02–94·89] on TCGA-COAD, and AUCs from 62·05 [48·94–73·46] to 76·48 [64·78–86·71] on TCGA-READ), microsatellite instability (MSI) status classification (AUC 83·92 [77·41–87·59] on TCGA-COAD, and AUC 61·28 [53·28–67·93] on TCGA-READ), and protein expressions (AUCs from 85·57 [81·16–89·44] to 89·64 [86·29–93·19] on TCGA-COAD, and AUCs from 51·77 [42·53–61·83] to 59·79 [50·79–68·57] on TCGA-READ). For the CPTAC-COAD cohort, our model predicted a panel of gene mutations with AUC values from 63·74 (95% CI 52·92–75·37) to 82·90 (73·69–90·71), genes with copy number alterations with AUC values from 62·39 (51·37–73·76) to 86·08 (79·67–91·74), and MSI status prediction with AUC value of 73·15 (63·21–83·13).
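
                For readers unfamiliar with the setup, a "spatially aware" graph over a whole-slide image typically treats patch embeddings as nodes and connects spatially nearby patches. The sketch below shows one generic way to build such a graph and mix neighbour features; it is a hypothetical illustration under these assumptions, not the study's actual model, and all names are invented for the example.

                # Illustrative sketch: nodes are patch embeddings from a whole-slide image, edges
                # connect each patch to its k spatially nearest patches, and one simple layer mixes
                # neighbour features by mean aggregation. All names and design choices are assumed.
                import torch
                import torch.nn as nn

                def knn_adjacency(coords: torch.Tensor, k: int = 8) -> torch.Tensor:
                    # coords: (N, 2) float tensor of patch-centre coordinates on the slide.
                    dist = torch.cdist(coords, coords)                    # (N, N) pairwise distances
                    idx = dist.topk(k + 1, largest=False).indices[:, 1:]  # k nearest, excluding self
                    adj = torch.zeros(coords.shape[0], coords.shape[0])
                    adj.scatter_(1, idx, 1.0)                             # 1 where j is a neighbour of i
                    return adj

                class SpatialGraphLayer(nn.Module):
                    def __init__(self, dim: int):
                        super().__init__()
                        self.proj = nn.Linear(dim, dim)

                    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
                        # x: (N, dim) patch embeddings; adj: (N, N) binary spatial adjacency.
                        neigh = adj @ x / adj.sum(dim=1, keepdim=True).clamp(min=1.0)
                        return torch.relu(self.proj(x + neigh))           # mix each patch with its neighbours

                Slide-level outputs such as mutation or MSI status would then be predicted by pooling these spatially mixed node features into a single representation.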

                Published in: ICLR 2022

                UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

                It is a challenging task to learn rich and multi-scale spatiotemporal semantics from high-dimensional videos, due to large local redundancy and complex global dependency between video frames. The recent advances in this research have been mainly driven by 3D convolutional neural networks and vision transformers. Although 3D convolution can efficiently aggregate local context to suppress local redundancy from a small 3D neighborhood, it lacks the capability to capture global dependency because of the limited receptive field. Alternatively, vision transformers can effectively capture long-range dependency by self-attention mechanism, while having the limitation on reducing local redundancy with blind similarity comparison among all the tokens in each layer. Based on these observations, we propose a novel Unified transFormer (UniFormer) which seamlessly integrates merits of 3D convolution and spatiotemporal self-attention in a concise transformer format, and achieves a preferable balance between computation and accuracy. Different from traditional transformers, our relation aggregator can tackle both spatiotemporal redundancy and dependency, by learning local and global token affinity respectively in shallow and deep layers. We conduct extensive experiments on the popular video benchmarks, e.g., Kinetics-400, Kinetics-600, and Something-Something V1&V2. With only ImageNet-1K pretraining, our UniFormer achieves 82.9%/84.8% top-1 accuracy on Kinetics-400/Kinetics-600, while requiring 10× fewer GFLOPs than other state-of-the-art methods. For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.9% and 71.2% top-1 accuracy respectively. Code is available at https://github.com/Sense-X/UniFormer.
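
                The abstract's key idea is a relation aggregator that learns local token affinity in shallow layers and global affinity in deep layers. The sketch below illustrates that split with a depthwise 3D convolution for the local case and standard multi-head self-attention for the global case; it is a simplified assumption, not the official implementation (available at https://github.com/Sense-X/UniFormer).

                # Simplified sketch of the local-vs-global relation-aggregator idea from the
                # UniFormer abstract: shallow blocks mix tokens with a local (depthwise convolutional)
                # affinity, deep blocks with global self-attention. Assumed illustration, not the
                # official code.
                import torch
                import torch.nn as nn

                class RelationAggregator(nn.Module):
                    def __init__(self, dim: int, local: bool, num_heads: int = 8):
                        super().__init__()
                        self.local = local
                        if local:
                            # Local affinity: depthwise 3D conv over a small (T, H, W) neighbourhood.
                            self.agg = nn.Conv3d(dim, dim, kernel_size=3, padding=1, groups=dim)
                        else:
                            # Global affinity: self-attention over all spatiotemporal tokens.
                            self.agg = nn.MultiheadAttention(dim, num_heads, batch_first=True)

                    def forward(self, x: torch.Tensor) -> torch.Tensor:
                        # x: (B, C, T, H, W) video feature map.
                        if self.local:
                            return self.agg(x)
                        b, c, t, h, w = x.shape
                        tokens = x.flatten(2).transpose(1, 2)          # (B, T*H*W, C)
                        out, _ = self.agg(tokens, tokens, tokens)      # every token attends to every token
                        return out.transpose(1, 2).reshape(b, c, t, h, w)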

                comm@pjlab.org.cn

                Floors 37-38, West Bund International AI Center, 701 Yunjin Road, Xuhui District, Shanghai
