Datasets
Benchmark Data Sets for Graph Kernels
IMDB-B | IMDB-M | RDT-B | RDT-M5K | COLLAB | MUTAG | PROTEINS | PTC | NCI1 | |
U2GNN | 77.04 ± 3.45 | 53.60 ± 3.53 | 80.25 | 50.9 | 89.97 | 78.53 | 69.63 ± 3.60 | ||
MEWISPool | 82.13 | 56.23 | 79.66 ± 4.02 | 96.66 ± 1.23 | 80.71 ± 2.31 | ||||
U2GNN-Unsupervised | 96.41 | 89.20 | 84.8 | 77.25 | 95.62 | 81.34 | 80.01 | 84.59 | |
HGP-SL | 84.91 | 78.45 | |||||||
DUGNN+EXTRA TRAINING DATA | 78.7 ± 4.9 | 56.1 ± 2.3 | 84.2 ± 2.7 | 81.7 ± 2.4 | 74.7 ± 6.0 | 85.5 ± 1.2 | |||
sGIN | 77.94±4.31 | 54.52±0.39 | 80.71±1.48 | 94.14±2.74 | 78.97±3.17 | 73.56±4.27 | 83.85±1.05 | ||
GIN(GIN-0) | 75.1 | 52.3 | 92.4 | 57.5 | 80.2 | 89.4 | 76.2 | 64.6 | 82.7 |
GCAPS-CNN | 71.69 ± 3.40 | 48.50 ± 4.10 | 87.61 ± 2.51 | 50.10 ± 1.72 | 77.71 ± 2.51 | 76.40 ± 4.17 | 66.01 ± 5.91 | 82.72 ± 2.38 | |
CapsGNN | 73.10 | 50.27 | 79.62 | 86.67 | 76.28 | 78.35 | |||
LDP(Local Degree Profile) | 75.4 | 50.0 | 92.1 | 55.9 | 78.1 | 90.1 | 72.7 | 61.7 | 73.0 |
RetGK | 71.9 | 47.7 | 92.6 | 56.1 | 81.0 | 90.3 | 75.8 | 84.5 | |
PATCHY-SAN(PSCN) | 71.00 ± 2.29 | 45.23 ± 2.84 | 86.30 ± 1.58 | 49.10 ± 0.70 | 72.60 ± 2.15 | 88.95±4.37 | 75.00 ± 2.51 | 60.00 ± 4.82 | 76.34 ± 1.68 |
Invariant and Equivariant Graph Networks | 71.27±4.5 | 48.55±3.9 | 77.92±1.7 | 84.61±10 | 75.19±4.3 | 59.47±7.3 | 72.48±2.5 |
Pitfalls of Graph Neural Network Evaluation
Fixed splits
CiteSeer | Cora | PubMed | |
ACMII-Snowball-2 | 82.07 ± 1.04 | 88.95 ± 1.04 | 90.56 ± 0.39 |
ACM-Snowball-3 | 81.32 ± 0.97 | 89.59 ± 1.58 | 91.44 ± 0.59 |
ACMII-GCN | 81.79 ± 0.95 | 89.00 ± 0.72 | 90.74 ± 0.5 |
SSP | 80.52 | 90.16 | 89.36 |
Graph-Bert | 71.2 | 84.3 | 79.3 |
APPNP | 71.8 ± 0.5 | 83.3 ± 0.5 | 80.1 ± 0.2 |
SGC(Simple Graph Convolution) | 71.9 ± 0.1 | 81.0 ± 0.0 | 78.9 ± 0.0 |
Deep Graph Infomax | 71.8 ± 0.7 | 82.3 ± 0.6 | 76.8 ± 0.6 |
SEGCN(Self-Ensembling GCN) | 73.4 ± 0.7 | 83.5 ± 0.4 | 78.9 ± 0.7 |
AGNN | 71.7 ± 0.08 | 83.1 ± 0.08 | 79.9 ± 0.07 |
GraphSGAN | 73.1 ± 1.8 | 83.0 ± 1.3 | |
GAT implementation | 72.5 ± 0.7 | 83.0 ± 0.7 | 79.0 ± 0.3 |
GWNN | 71.7 | 82.8 | 79.1 |
GCN(Graph Convolutional Network) | 70.3 | 81.5 | 79.0 |
FastGCN |
Random splits
CiteSeer | Cora | PubMed | |
APPNP | 70.0 ± 1.4 | 82.2 ± 1.5 | 79.4 ± 2.2 |
GAT | 72.2 ± 0.9 | 82.6 ± 0.7 | 76.7 ± 0.5 |
SEGCN | 69.0 ± 0.9 | 80.8 ± 1.0 | 78.0 ± 1.4 |
AdaLanczosNet | 68.7 ± 1.0 | 80.4 ± 1.1 | 78.1 ± 0.4 |
GCN | 66.8 ± 0.7 | 79.6 ± 0.6 | 78.3 ± 0.7 |
LanczosNet | 66.2 ± 1.9 | 79.5 ± 1.8 | 78.3 ± 0.3 |
Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS)
mIOU | |
ResGCN-28 | 60.0 |
DGCNN | 56.1 |
- LEARNING GENERAL PURPOSE DISTRIBUTED SENTENCE REPRESENTATIONS VIA LARGE SCALE MULTITASK LEARNING
- Universal Sentence Encoder
- Skip-Thought Vectors
F1 scores on the task of paraphrase detection using the SentEval toolkit
InferSent | 83.17 |
ADNet | 81.38 |
Results of abstractive summarizers on the CNN-DM dataset
ROUGE 1 | ROUGE 2 | ROUGE L | |
RankSum | 44.5 | 24.0 | 41.0 |
MatchSum | 44.41 | 20.86 | 40.55 |
PEGASUSLARGE | 44.17 | 21.47 | 41.11 |
BertSumExt (large) | 43.85 | 20.34 | 39.90 |
UNILM | 43.47 | 20.30 | 40.63 |
Pretraining-Based Natural Language Generation for Text Summarization | 41.71 | 19.49 | 38.79 |
Bottom-Up Summarization | 41.22 | 18.68 | 38.34 |
Sentence Rewriting (Chen and Bansal, 2018) | 40.88 | 17.80 | 38.54 |
Pointer-Generator Networks + Coverage Penalty | 39.53 | 17.28 | 36.38 |
seq-to-seq + attn baseline (50k vocab) | 31.33 | 11.81 | 28.83 |
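The ROUGE scores above count n-gram overlap between a system summary and a reference. As a rough illustration only (the official ROUGE toolkit adds stemming, sentence splitting, and bootstrapping), ROUGE-1 F1 can be sketched as:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # unigram matches, clipped per word
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```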
Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs ※ shortens training time
Sequential translation:
You May Not Need Attention code
- Phrase-Based & Neural Unsupervised Machine Translation ※ unsupervised (no parallel training pairs)
IWSLT 2014 De→En | WMT2014 En→De | WMT2016 En→De | ||
Transformer Cycle | 35.14 | |||
Understanding Back-Translation at Scale | 35.0 | |||
T5-11B | 32.1 | |||
DeepL | 33.3 | |||
Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings | 31.2 | |||
MULTI-AGENT DUAL LEARNING | 35.44 | 30.67 | ||
DUAL LEARNING: THEORETICAL STUDY AND ALGORITHMIC EXTENSIONS | 29.97 | |||
DynamicConv Japanese explanation | 35.2 | 29.7(Param 213M) | ||
Transformer + QHAdam | 29.45±0.06 | |||
MULTILINGUAL NEURAL MACHINE TRANSLATION WITH KNOWLEDGE DISTILLATION | 34.02 | |||
Transformer (big) + Relative Position Representations | 29.2 | |||
Transformer+FRANGE | 33.97 | 29.11 | ||
UNIVERSAL TRANSFORMERS arxiv:Universal Transformer | 28.9 | |||
Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction | 33.81±0.03 | |||
HYPERBOLIC ATTENTION NETWORKS | 28.52 | |||
RNMT+ | 28.49 ± 0.05 | |||
Transformer-XL github | ||||
Transformer (T2T) | 28.4 |
DREAM
Model | Accuracy |
Human Ceiling Performance | 98.6 |
Human Performance | 95.5 |
ALBERT-xxlarge + HRCA+ + Multi-Task Learning | 92.6 |
ALBERT-xxlarge + DUMA + Multi-Task Learning | 91.8 |
SQuAD
Leaderboard
SQuAD1.1 EM | SQuAD1.1 F1 | SQuAD2.0 EM | SQuAD2.0 F1 | |
Retro-Reader (ensemble) | 90.578 | 92.978 | ||
ALBERT + DAAF + Verifier (ensemble) | 90.386 | 92.777 | ||
Retro-Reader on ALBERT (ensemble model) | 90.115 | 92.580 | ||
Retro-Reader on ELECTRA | 89.562 | 92.052 | ||
Megatron-LM 3.9B ensemble | 90.5 | 95.8 | 89.0 | 91.7 |
T5-11B | 90.06 | 95.64 | ||
ALBERT (ensemble model) | 90.1 | 95.5 | 89.731 | 92.215 |
ELECTRA | 88.7 | 91.4 | ||
XLNet + DAAF + Verifier (ensemble) | 88.592 | 90.859 | ||
Retro-Reader on ALBERT (single model) | 88.1 | 91.4 | ||
ALBERT (single model) | 88.107 | 90.902 | ||
XLNet + SG-Net Verifier (ensemble) | 88.174 | 90.702 | ||
XLNet + SG-Net Verifier++ (single model) | 87.238 | 90.071 | ||
BERT + DAE + AoA (ensemble) | 87.147 | 89.474 | ||
RoBERTa (single model) | 87.147 | 89.795 | ||
BERT + ConvLSTM + MTL + Verifier (ensemble) | 86.730 | 89.286 | ||
BERT + N-Gram Masking + Synthetic Self-Training (ensemble) | 86.673 | 89.147 | ||
XLNet (single model) | 89.898 | 95.080 | 86.346 | 89.133 |
SpanBERT (single model) | 88.839 | 94.635 | 85.748 | 88.709 |
BERT + DAE + AoA (single model) | 85.884 | 88.621 | ||
BERT (Ensemble + TriviaQA) | 87.433 | 93.160 | ||
UNILM | 80.5 | 83.4 | ||
BERT (single model) | 85.083 | 91.835 | 80.005 | 83.061 |
EfficientBERT++ 16.0M | 78.3 | 86.5 | 73.0 | 76.1 |
ProPara
XTREME
Avg | Sentence-pair Classification | Structured Prediction | Question Answering | Sentence Retrieval | |
Human | 93.3 | 95.1 | 97.0 | 87.8 | |
Turing ULR v5 | 84.5 | 90.3 | 81.7 | 76.3 | 93.7 |
VECO | 81.4 | 88.9 | 75.6 | 72.9 | 92.7 |
T-ULRv2 + StableTune | 80.7 | 88.8 | 75.4 | 72.9 | 89.3 |
FILTER | 77.0 | 87.5 | 71.9 | 68.5 | 84.4 |
X-STILTs | 73.5 | 83.9 | 69.4 | 67.2 | 76.5 |
XLM-R (large) | 68.2 | 82.8 | 69.0 | 62.3 | 61.6 |
mBERT | 59.6 | 73.7 | 66.3 | 53.8 | 47.7 |
CoNLL-2003 Test F1 | Ontonotes v5 | |
CNN Large + fine-tune | 93.5 | |
GCDT + BERT-L | 93.47 ± 0.03 | |
LSTM-CRF+ELMo+BERT+Flair | 93.38 | |
BERT (Ensemble + TriviaQA) | 92.8 | |
HSCRF + softdict | 92.75 | 89.94 |
20NG | R8 | R52 | Ohsumed | MR | |
SSGC(Simple Spectral Graph Convolution) | 88.6±0.1 | 97.4±0.1 | 94.5±0.2 | 68.5±0.1 | 76.7±0.0 |
NABoE-full | 88.1 | 97.9 | |||
GraphStar | 86.9 ± 0.3 | 97.4 ± 0.2 | 95.0 ± 0.3 | 64.2 ± 0.6 | 76.6 ± 0.4 |
SGC(Simple Graph Convolution) | 88.5 ± 0.1 | 97.2 ± 0.1 | 94.0 ± 0.2 | 68.5 ± 0.3 | 75.9 ± 0.3 |
Text GCN | 86.34 ± 0.09 | 97.07 ± 0.10 | 93.56 ± 0.18 | 68.36 ± 0.56 | 76.74 ± 0.20 |
SWEM (Simple Word-Embedding-based Models) | 85.16 ± 0.29 | 95.32 ± 0.26 | 92.94 ± 0.24 | 63.12 ± 0.55 | 76.65 ± 0.63 |
COCO FID | |
Imagen 2 | |
DALL-E 3 | |
Imagen | 7.27 |
DALL-E 2 | 10.39 |
GLIDE (Nichol et al., 2021) | 12.24 |
Stable Diffusion |
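The COCO FID column above is the Fréchet Inception Distance: Inception-v3 features of real and generated images are each fitted with a Gaussian, and the closed-form distance between the two Gaussians is reported. A minimal sketch, assuming the feature means and covariances have already been computed:

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet Inception Distance between two Gaussians:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)  # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny numerical imaginary parts
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean))
```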
Visual AutoRegressive Modeling: Scalable Image Generation via Next-Scale Prediction
Inception scores
COCO | CUB | Oxford-102 | |
MirrorGAN | 26.47 ± 0.41 | 4.56 ± 0.05 | |
AttnGAN | 25.89 ± .47 | 4.36 ± .03 | |
HDGAN | 11.86±.18 | 4.15±.05 | 3.45±.07 |
CWPGGAN | 4.09 ± 0.03 | 3.86 ± 0.02 | |
Recurrent C4Synth | 4.07 ± .13 | 3.52 ± .15 | |
TAC-GAN | 3.45±.05 | ||
MSGAN | |||
StackGAN-v2(StackGAN++) | 8.30 ± .10 | 3.82 ± .06 | 3.26 ± .01 |
CanvasGAN | |||
FusedGAN | |||
StackGAN | 8.45 ± .03 | 3.70 ± .04 | 3.20±.01 |
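The Inception score reported above is the exponential of the mean KL divergence between the classifier's conditional label distribution p(y|x) and its marginal p(y). A minimal sketch, assuming `probs` holds softmax outputs of an Inception classifier on generated images (real evaluations also average over splits):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ) for an (N, classes)
    array of classifier softmax outputs on generated images."""
    probs = np.asarray(probs, dtype=float)
    p_y = probs.mean(axis=0, keepdims=True)  # marginal label distribution
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```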
Voice
Mean Opinion Score | |
Ground Truth | 4.274 ± 0.1340 |
WaveGlow sample | 3.961 ± 0.1343 |
WaveNet | 3.885 ± 0.1238 |
Griffin-Lim | 3.823 ± 0.1349 |
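The ± values above are confidence intervals around the Mean Opinion Score of human raters. A minimal sketch using a normal approximation (individual papers may compute their intervals differently):

```python
import math

def mos_with_ci(scores, z=1.96):
    """Mean Opinion Score with a normal-approximation 95% confidence
    half-width, like the +/- values in the table above."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    half = z * math.sqrt(var / n)
    return mean, half
```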
- NaturalSpeech 3
- Mega-TTS 2
- StyleTTS 2
- EfficientTTS 2
- Bert-VITS2
- NaturalSpeech
- VITS
- TTS-GAN
- WaveRNN
- ClariNet
- Transformer TTS
- DeepVoice3
- Tacotron
- Wave-Tacotron
Superb leaderboard
WER test-clean | WER test-other | GFLOPs | |
XLS-R | |||
WavLM | |||
BEST-RQ | |||
Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light + Extra Data | 1.4% | 2.6% | |
HuBERT | 1.8% | 2.9% | |
wav2vec 2.0 + Libri-Light + Extra Data | 1.8% | 3.3% | |
wav2vec 2.0 | 4.1% | ||
ContextNet | 1.9% | 4.1% | |
Squeezeformer | 2.47% | 5.97% | 277.9 |
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | 2.5% | 5.8% | |
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention | 2.7% | 5.7% | |
The CAPIO 2017 Conversational Speech Recognition System | 3.19% | 7.64% | |
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks | 3.80% | 8.76% |
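The WER columns above are word error rates: the word-level Levenshtein distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the reference length. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate via two-row Levenshtein DP over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,            # deletion
                          curr[j - 1] + 1,        # insertion
                          prev[j - 1] + (r != h)) # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```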
Music source separation
Timbre transfer
Voice conversion (voice changer)
MPII test dataset
Mean | |
W48-s7 | 94.1 |
Cascade Feature Aggregation for Human Pose Estimation | 93.9 |
MSPN | 92.6 |
PASCAL VOC 2012 mAP(%) | VOC 2007 test | COCO test-dev mAP@[0.5:0.95] | ImageNet VID | |
Co-DETR | 66.0(304M) | |||
Stable DINO+ Swin-L | 63.8 (218M) | |||
Soft Teacher + Swin-L(HTC++, multi-scale) | 61.3 | |||
DyHead | 60.6 | |||
Dual-Swin-L | 60.1(453M) | |||
Swin-L(HTC++, multi scale) | 58.7(284M) | |||
CenterNet2(Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) | 56.4 | |||
YOLOv4-P7(CSP-P7, multi-scale) | 55.8(16FPS) | |||
YOLOR-D6 | 55.4(30FPS) | |||
EfficientDet-D7x | 55.1(410B) | |||
DetectoRS(ResNeXt-101-32x4d, multi-scale) | 54.7 | |||
UniverseNet-20.08d(Res2Net-101-v1b) | 54.1 | |||
YOLOv8x | 53.9(68.2M, 257.8B) | |||
EfficientDet-D7 | 53.7(325B) | |||
Cascade Mask R-CNN / Triple-ResNeXt152 | 53.3 | |||
YOLOv8l | 52.9(43.7M, 165.2B) | |||
EfficientDet-D6 | 52.6(226B) | |||
YOLOX-x | 51.5(281.9GFLOPS) | |||
EfficientDet-D5 | 51.5(136B) | |||
AmoebaNet + NAS-FPN + learned augmentation + ↑ anchors, ↑ image size | 50.7(3045B) | |||
YOLOv8m | 50.2(25.9M, 78.9B) | |||
EfficientDet-D4 | 49.7(55B) | |||
YOLOv5x | 49.2(166.4B) | |||
NAS-FPN/AmoebaNet (7 @ 384) +DropBlock | 48.3 | |||
YOLOv5l | 47.7(88.1B) | |||
EfficientDet-D3 | 47.2(24.9B) | |||
CenterNet-HG / Hourglass-104 | 45.1(1.4FPS) | |||
YOLOv8s | 44.9(28.6B) | |||
YOLOv5m | 44.3(39.4B) | |||
YOLOv4 608 / CSPDarknet-53 | 43.5(33FPS) | |||
M2Det / VGG-16 | 44.2 | |||
EfficientDet-D2 | 43.0(11B) | |||
Cascade R-CNN / ResNet-101 | 42.8 | |||
RefineDet512+ / ResNet-101 | 41.8 | |||
M2Det / VGG-16 | 41.0(11.8FPS) | |||
RetinaNet / ResNeXt-101-FPN | 40.8 | |||
EfficientDet-D1 | 40.5(6B) | |||
CenterNet-DLA / DLA-34 | 39.2(28FPS) | |||
RefineDet512+ / VGG-16 | 83.5 | 83.8 | 37.6 | |
R-FCN++ multi-sc train | 80.6 | 82.1 | 37.5 | |
YOLOv5s | 37.0(13.2B) | |||
EfficientDet-D0 | 34.6(2.5B) |
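The COCO column above, mAP@[0.5:0.95], averages average precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The underlying intersection-over-union of two axis-aligned boxes can be sketched as:

```python
def iou(box_a, box_b):
    """IoU of two (x1, y1, x2, y2) boxes; the overlap criterion behind
    the mAP thresholds in the table above."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```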
Cross-domain / Face extraction / Text extraction
Cityscapes mIoU (%) | PASCAL VOC 2012 mIoU (%) | CamVid mIoU (%) | |
Hierarchical Multi-Scale Attention for Semantic Segmentation | 85.4 | ||
HRNetV2 + OCR + SegFix | 84.5 | ||
Panoptic-DeepLab + EXTRA TRAINING | 84.2 | ||
OCNet / HRNetV2-W48 | 83.0 | 84.5 | |
Gated-SCNN | 82.8 | ||
DPC | 82.7 | 87.9 | |
DeepLab v3++JFT | 82.1 | 89.0 | |
Auto-DeepLab-L | 82.1 | 85.6 | |
OCNet | 81.7 | ||
DeepLab v3 | 81.3 | 85.7 | |
FC-HarDNet-70 | 76.0(Param 4.1M) | ||
LEDNET | 70.6(Param 0.94M) | ||
CGNet M3N21 | 64.8(Param 0.5M) | 65.6(Param 0.5M) | |
ESPNet | 60.3(Param 0.40M) |
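The mIoU metric above averages per-class intersection-over-union across classes. A minimal sketch from flat label arrays (classes absent from both prediction and ground truth are skipped):

```python
import numpy as np

def miou(pred, target, num_classes):
    """Mean IoU over classes from flat label arrays."""
    pred, target = np.asarray(pred), np.asarray(target)
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))
```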
COCO test-dev
Pose extraction / Recognition & QA
Backbone | AP | FPS | |
HTC(multi-scale) | Dual-Swin-L | 52.3 | |
HTC++(multi-scale) | Focal-L | 51.3 | |
Cascade-RCNN | ResNeSt101 | 41.56 | |
CenterMask2 | V2-99 | 41.4 | 14.4 |
BlendMask | R-101 | 41.3 | 9.52 |
Mask R-CNN | ResNeSt101 | 40.65 | |
BCNet | R-101-FPN | 39.8 | |
CenterMask | R-101-FPN | 39.6 | 11.5 |
MS R-CNN | R-101-FPN | 39.6 | 8.6 |
Mask R-CNN | R-101-FPN | 38.4 | 8.6 |
BlendMask-RT | R-101 | 36.8 | 21 |
CenterMask2-Lite | V-39 | 36.7 | 34 |
YOLACT++ | Resnet50-FPN | 34.1 | 33.0 |
YOLACT-700 | R-101-FPN | 31.2 | 23.6 |
YOLACT-550 | R-101-FPN | 29.8 | 33.0 |
CIFAR-10 | CIFAR-100 | CINIC-10 | ImageNet top-1/top-5 | ImageNet-C | ImageNet-P | ImageNetV2 | |
CoAtNet | 9.12%(Param 2.44B) | ||||||
ViT-G/14+Extra Data | 9.55±0.03(Param 1843M) | 16.67±0.03 | |||||
EfficientNet (L2)+Meta Pseudo Labels+Extra Data(300M unlabeled JFT) | 9.8/1.2(Param 480M) | ||||||
EfficientNet (L2)+ SAM+Extra Data | 0.30±0.01 | 3.92±0.06 | 11.39/(Param 480M) | ||||
BEiT | 11.4/1.34(Param 306M) | ||||||
FixEfficientNet (L2)+Extra Data(300M unlabeled images) | 11.5/1.3(Param 480M) | ||||||
EfficientNet-L2 +Noisy Student (L2) + RandAugment +Extra Data(300M unlabeled images) | 11.6/1.3(Param 480M) | 22.2% | 13.6% | ||||
BiT-L(JFT-300M Extra Data) | 0.63 | 6.40±0.18 | 12.2 | ||||
FixEfficientNet-B7+Extra Data | 12.9/1.8(Param 66M) | ||||||
VOLO-D5↑512 | 12.9(Param 296M) | 22.0(Param 296M) | |||||
EfficientNet-B7 + Noisy Student(L2)+RandAugment+Extra Data | 13.1/1.9(Param 66M) | ||||||
BEiT-base(ViT; ImageNet 1K pretrain) | 13.2/1.9(Param 87M) | ||||||
EfficientNetV2-L (21k) | 13.2(Param 120M)/53B | ||||||
FixEfficientNet-B6+Extra Data | 13.3/2.0(Param 43M) | ||||||
FixEfficientNet-B5+Extra Data | 13.6/2.1(Param 30M) | ||||||
EfficientNet-B6 + Noisy Student(L2) + RandAugment + Extra Data | 13.6/2.1(Param 43M) | ||||||
FixResNeXt-101 32×48d | 13.6/2.0(Param 829M) | ||||||
VOLO-D3↑448 | 13.7(Param 86M) | 22.3(Param 86M) | |||||
EfficientNetV2-M (21k) | 13.8(Param 54M)/24B | ||||||
EfficientNet-B5 + Noisy Student(L2) + RandAugment + Extra Data | 13.9/2.2(Param 30M) | ||||||
FixEfficientNet-B4+Extra Data | 14.1/2.3(Param 19M) | ||||||
Fix-EfficientNet-B8+MaxUp+CutMix | 14.20/(Param 87.42M) | ||||||
FixEfficientNet-B8 | 14.30/2.4(Param 87.42M) | ||||||
EfficientNet-B8+AdvProp | 14.5/2.7(Param 88M) | ||||||
ResNeXt-101 32×48d | 14.6/2.4(Param 829M) | ||||||
EfficientNet-B4 + Noisy Student(L2) + RandAugment + Extra Data | 14.7/2.5(Param 19M) | ||||||
FixEfficientNet-B7 | 14.7/2.6(Param 66M) | ||||||
EfficientNetV2-S (21k) | 15.0(Param 24M)/ 8.8B | ||||||
FixEfficientNet-B3+Extra Data | 15.0/2.6(Param 12M) | ||||||
FixEfficientNet-B6 | 15.1/2.7(Param 43M) | ||||||
EfficientNet-B6+AdvProp | 15.2(Param 43M) | ||||||
FixEfficientNet-B5 | 15.3/2.8(Param 30M) | ||||||
EfficientNet-B7 | 1.1 | 8.3 | 15.6/2.9(Param 66M) | ||||
ResNet-RS-50 | 15.6(Param 192M) | ||||||
EfficientNet-B5+AdvProp | 15.7(Param 30M) | ||||||
LambdaResNet200 | 15.7(Param 42M) | ||||||
EfficientNet-B3 + Noisy Student(L2)+ RandAugment +Extra Data | 15.9/3.1(Param 12M) | ||||||
FixEfficientNet-B4 | 16.0/3.0(Param 19M) | ||||||
LambdaResNet152 | 16.0(Param 35M) | ||||||
VAN-Large | 16.1(Param 44.8M) | ||||||
EfficientNetV2-S | 16.1(Param 22M) | ||||||
AmoebaNet-C (6,228)+ARS-Aug | 16.12/3.28 | ||||||
AmoebaNet-C (6,228)+AutoAugment | 16.46/3.52(Param 155.3M) | ||||||
RegNetY-8.0GF | 31.3±0.08(infer 113ms) | ||||||
FixEfficientNet-B3 | 17.0/3.6(Param 12M) | ||||||
EfficientNet-B4+AdvProp | 16.7(Param 19M) | ||||||
VAN-Base | 17.2(Param 26.6M) | ||||||
SENet-154 | 17.28/3.79 | ||||||
FixEfficientNet-B1+Extra Data | 17.4/3.6(Param 7.8M) | ||||||
EfficientNet-B2 + Noisy Student(L2) + RandAugment +Extra Data | 17.6/3.7(Param 9.2M) | ||||||
EfficientNet-B3+AdvProp | 18.1(Param 12M) | ||||||
EfficientNet-B1 + Noisy Student(L2)+ RandAugment + Extra Data | 18.5/4.2(Param 7.8M) | ||||||
FixEfficientNet-B1 | 18.7/4.3(Param 7.8M) | ||||||
Dual-Path-Net-131 | 18.55/4.16(Param 79.5M) | ||||||
RepVGG-B3-200epochs | 19.48(Param 110.96M) | ||||||
EfficientNet-B2+AdvProp | 19.5(Param 9.2M) | ||||||
FixEfficientNet-B0+Extra Data | 19.8/4.6(5.3M) | ||||||
EfficientNet-B2 + AutoAugment | 19.73/5.02(9.2M) | ||||||
Inception-ResNet-v2 + SENet | 19.80/4.79 | ||||||
EfficientNet-B1+AdvProp | 20.4(Param 7.8M) | ||||||
FixEfficientNet-B0 | 20.7/5.4(Param 5.3M) | ||||||
MixNet-L | 21.1/5.8(Param 7.3M) | ||||||
EfficientNet-B0 + Noisy Student(L2)+ RandAugment + Extra Data | 21.2/5.5(Param 5.3M) | ||||||
EffNetV2-B0 | 21.3(Param 7.1M) | ||||||
SE-Res2Net-50 | 21.56/5.94(Param 25M) | ||||||
MobileViT-S | 21.6(Param 5.6M) | ||||||
PyramidNet + ShakeDrop regularization + ARS-Aug | 1.26(Param 26.0M) | 10.24 (Param 26.0M) | |||||
PyramidNet + ShakeDrop regularization + AutoAugment | 1.5±0.1(Param 26.0M) | 10.7 ± 0.2 (Param 26.0M) | |||||
PyramidNet + ShakeDrop regularization + Population Based Augmentation (PBA) | 1.46 ± 0.077 | 10.94 ± 0.094 | |||||
WRN-SRS | 4.06(Param 36.5M) | 10.10(Param 106.4M) | |||||
EfficientNet-B0+AdvProp | 22.4(Param 5.3M) | ||||||
EfficientNet-B0 + AutoAugment | 22.7/6.5(Param 5.3M) | ||||||
ShuffleNetV2+ Large | 22.9/6.7(Param 6.7M) | ||||||
MixNet-M | 23.0/6.7(Param 5.0M) | ||||||
MobileNetV3 | 23.4/(Param 7.5M) or 26.7(Param 4M) | ||||||
PyramidNet-200 (α˜=240)+ CutMix + ShakeDrop | 13.81(Param 26.8 M) | ||||||
MnasNet-92 (+SE) | 23.87/7.15(Param 5.1M) | ||||||
ShuffleNetV2+ Medium | 24.3/7.4(Param 5.6M) | ||||||
VAN-Tiny | 24.6(Param 4.1M) | ||||||
SharpSepConvDARTS | 1.98±0.07(Param 3.6M) | 25.1/7.8(Param 4.9M) | |||||
MobileViT-XS | 25.2(Param 2.3M) | ||||||
NAONet + Cutout | 2.11(Param 128M) | 14.36(Param 128M) | |||||
MobileNetV2 | 25.3/7.5(Param 6.9M) or 28.0/9.0(Param 3.4M) | ||||||
PyramidNet+ ShakeDrop regularization + Cutout | 2.31(Param 26.0M) | 12.19 (Param 26.0M) | |||||
NASNet-A (7 @ 2304) + cutout+AdaNet | 2.30(Param 26.4M) | ||||||
NASNet-A (7 @ 2304) + cutout | 2.40(Param 27.6M) | ||||||
BlockQNN-Connection more filters | 2.35(Param 33.3M) | 14.83(Param 33.3M) | |||||
NAONet + Cutout | 15.67(Param 10.8M) | ||||||
AmoebaNet-B (N=6, F=128) + cutout | 2.13±0.04(Param 34.9M) | 15.80(Param 34.9M) | |||||
WideResNet-22 + AgrLearn | 2.45 | ||||||
Shake-Shake + Cutout | 2.56± 0.07 | 15.20± 0.21(Param 34.4M) | |||||
AlphaX+Cutout implementation | 2.82(Param 5.1M) | 24.5/7.8(Param 7.2M) | |||||
ProxylessNAS | 2.08(Param 5.7M) | 25.4/7.8 | |||||
GDAS(C=36,N=6)+CutOut | 2.82(Param 2.5M) | 18.13(Param 2.5M) | |||||
ASNG-NAS | 2.83±0.14(Param 3.9M) | ||||||
DARTS + Cutout | 2.83±0.06(Param 3.4M) | 26.9/9.0(Param 4.9M) | |||||
ENAS + micro search space + CutOut | 2.89(Param 4.6M) | ||||||
SNAS + cutout | 2.98(Param 2.89M) | 27.3/9.2(Param 4.3M) | |||||
AmoebaNet-A(6, 36) | 3.34±0.06(Param 3.2M) |
- Attention Augmented Convolutional Networks
- GPipe
- Label Smoothing
- Knowledge Distillation Japanese explanation
- DropBlock Japanese explanation
- CutOut / Random Erasing Data Augmentation
- AutoAugment Japanese explanation
- Fast AutoAugment
- Population Based Augmentation (PBA) ※ 1000× more efficient than AutoAugment
- RandAugment ※ even more efficient than PBA
- MentorMix
- Manifold Mixup
Top-1 Err(%) | Top-5 Err(%) | |
Baseline: PyramidNet-200(α˜=240)(# params:26.8M) | 16.45 | 3.69 |
+ Mixup(α=1.0) | 15.63 | 3.99 |
+ DropBlock + Label smoothing (ε=0.1) | 15.16 | 3.86 |
+ Cutout + Manifold Mixup (α=1.0) | 15.09 | 3.35 |
+ ShakeDrop | 15.08 | 2.79 |
+ CutMix | 14.47 | 2.97 |
+ CutMix + ShakeDrop | 13.81 | 2.29 |
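CutMix, the strongest single addition in the ablation above, pastes a random rectangle from one training image into another and mixes the labels in proportion to the pasted area. A minimal single-pair sketch (real implementations operate on shuffled batches):

```python
import numpy as np

def cutmix(x_a, x_b, alpha=1.0, rng=None):
    """CutMix sketch: paste a random box from x_b into x_a; the returned
    weight is the fraction of the image still coming from x_a."""
    rng = rng or np.random.default_rng()
    h, w = x_a.shape[:2]
    lam = rng.beta(alpha, alpha)                   # target mixing ratio
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - rh // 2, 0), min(cy + rh // 2, h)
    x1, x2 = max(cx - rw // 2, 0), min(cx + rw // 2, w)
    mixed = x_a.copy()
    mixed[y1:y2, x1:x2] = x_b[y1:y2, x1:x2]
    lam_adj = 1 - (y2 - y1) * (x2 - x1) / (h * w)  # area actually kept from x_a
    return mixed, lam_adj
```

The loss is then `lam_adj * loss(y_a) + (1 - lam_adj) * loss(y_b)`.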
Error rate (%) for CIFAR10
Methods/Labels | 40 | 250 | 500 | 1000 | 2000 | 4000 |
PiModel | 53.02±2.05 | 41.82±1.52 | 31.53±0.98 | 23.07±0.66 | 17.41±0.37 | |
PseudoLabel | 49.98±1.17 | 40.55±1.70 | 30.91±1.73 | 21.96±0.42 | 16.21±0.11 | |
Mixup | 47.43±0.92 | 36.17±1.36 | 25.72±0.66 | 18.14±1.06 | 13.15±0.20 | |
VAT | 36.03±2.82 | 26.11±1.52 | 18.68±0.40 | 14.40±0.15 | 11.05±0.31 | |
MeanTeacher | 47.32±4.71 | 42.01±5.86 | 17.32±4.00 | 12.17±0.22 | 10.36±0.25 | |
MixMatch | 47.54±11.50 | 11.08±0.87 | 9.65±0.94 | 7.75±0.32 | 7.03±0.15 | 6.24±0.06 |
iGPT-L | 26.8±1.5 | 12.4±0.6 | 5.7±0.1 | |||
UDA | 29.0 ± 5.9 | 8.8 ± 1.1 | 4.9 ± 0.2 | |||
ReMixMatch | 19.10±9.64 | 6.27±0.34 | 5.73±0.16 | 5.14±0.04 | ||
FixMatch (CTA) | 11.39±3.35 | 5.07±0.33 | 4.31±0.15 | |||
FixMatch (RandAugment) | 6.4 | 4.69 | 4.23 | |||
Meta Pseudo Labels | 3.89 ± 0.07 |
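FixMatch, near the top of the table above, trains on unlabeled data by pseudo-labeling confident predictions on weakly augmented images and applying those labels to strongly augmented versions. The confidence-masking step can be sketched as:

```python
import numpy as np

def fixmatch_mask(weak_probs, threshold=0.95):
    """FixMatch-style pseudo-labeling sketch: keep an unlabeled example
    only when the prediction on its weak augmentation is confident."""
    weak_probs = np.asarray(weak_probs)
    conf = weak_probs.max(axis=1)     # model confidence per example
    pseudo = weak_probs.argmax(axis=1)  # hard pseudo-label
    mask = conf >= threshold          # examples that contribute to the loss
    return pseudo, mask
```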
Programming languages
Paired training images required
No paired training images required
- U-GAT-IT
- Sem-GAN
- StarGAN ※ multiple domains
- GANimation
- ELEGANT ※ multiple domains
- Requires a positive set and a negative set, i.e. a "smiling" domain ←→ "non-smiling" domain
- pix2pix-starGAN ※ multiple domains
- ALICE
Car2Car: root median residual deviation from linear alignment (lower is better).
TQM: Translation quality measured by translated digit classification accuracy (%)
Car2Car | TQM:SVHN→MNIST | TQM:MNIST→SVHN | |
CrossNet | |||
NAM | 1.47 | 33.3 | 31.9 |
CycleGAN | 26.8 | 17.7 | |
DiscoGAN | 13.81 | ||
SPA-GAN | |||
AGGAN |
- SR3
- Neural Differential Equations for Single Image Super-Resolution
- KernelGAN + ZSSR
- EPSR (1st place in PIRM2018 Region 1)
- EUSR-PCL (2nd place in PIRM2018 Region 1)
- 4PP-EUSR
- ESRGAN (3rd place in PIRM2018 Region 1)
- IDN
- Super-FAN
- SRGAN
- Deep Image Prior
- "Zero-Shot" Super-Resolution using Deep Internal Learning
Clean images | Noisy image pairs | |
NAC | Not required | Not required |
Deep Image Prior | Not required | Not required |
Noise2Void code | Not required | Not required |
Noise2Noise implementation Japanese explanation | Not required | Required |
Path-Restore | Required | Not required |
Unprocessing Images for Learned Raw Denoising | Required | Not required |
GAN2GAN | Required | Not required |
KITTI Eigen split
Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 | δ < 1.25^2 | δ < 1.25^3 | |
LightedDepth NewCRFs | 0.028 | 0.077 | 1.567 | 0.049 | 0.991 | 0.999 | 1.000 |
BTS+ pre-trained on Cityscapes dataset | 0.056 | 0.169 | 1.925 | 0.087 | 0.964 | 0.994 | 0.999 |
AdaBins | 0.058 | 0.190 | 2.360 | 0.088 | 0.964 | 0.995 | 0.999 |
BTS | 0.059 | 0.241 | 2.756 | 0.096 | 0.956 | 0.993 | 0.998 |
struct2depth(Motion) | 0.1087 | 0.8250 | 4.7503 | 0.1866 | 0.8738 | 0.9577 | 0.9825 |
struct2depth | 0.1231 | 1.4367 | 5.3099 | 0.2043 | 0.8705 | 0.9514 | 0.9765 |
Unsupervised Monocular Depth Estimation with Left-Right Consistency | 0.133 | 1.158 | 5.370 | 0.208 | 0.841 | 0.949 | 0.978 |
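The columns above are the standard monocular depth metrics. A minimal sketch of their definitions (valid-pixel masking and depth capping, which KITTI evaluations apply, are omitted):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Abs Rel, Sq Rel, RMSE, RMSE log, and delta < 1.25 accuracy."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    sq_rel = np.mean((pred - gt) ** 2 / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)  # symmetric ratio
    a1 = np.mean(ratio < 1.25)                # delta < 1.25 accuracy
    return abs_rel, sq_rel, rmse, rmse_log, a1
```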
KITTI 2015 stereo D1-all | |
CSPF github | 1.74% |
Dedge-AGMNet | 1.85% |
EdgeStereo | 2.08% |
PSMNet | 2.32% |
iResNet | 2.44% |
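D1-all above counts a pixel as erroneous when its disparity error exceeds both 3 px and 5% of the ground-truth disparity. A minimal sketch:

```python
import numpy as np

def d1_all(pred_disp, gt_disp):
    """KITTI 2015 D1-all: fraction of pixels with disparity error
    > 3 px AND > 5% of the ground-truth disparity."""
    pred_disp = np.asarray(pred_disp, float)
    gt_disp = np.asarray(gt_disp, float)
    err = np.abs(pred_disp - gt_disp)
    bad = (err > 3.0) & (err > 0.05 * gt_disp)
    return float(bad.mean())
```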
Single image → 3D information (detection)
KITTI BEV(birds-eye-view) KITTI 3D(3D bounding box)
BEV Easy | BEV Moderate | BEV Hard | 3D Easy | 3D Moderate | 3D Hard | |
OFTNet/RGB | 7.16 | 5.69 | 4.61 | 1.61 | 1.32 | 1.00 |
MonoDIS/RGB | 17.23 | 13.19 | 11.12 | 10.37 | 7.94 | 6.40 |
SMOKE/RGB | 20.83 | 14.49 | 12.75 | 14.03 | 9.76 | 7.84 |
MoVi-3D/RGB | 22.76 | 17.03 | 14.85 | 15.19 | 10.90 | 9.26 |
PatchNet/Depth | 15.68 | 11.12 | 10.17 | |||
D4LCN/RGB+Depth | 22.51 | 16.02 | 12.55 | 16.65 | 11.72 | 9.51 |
AM3D/RGB+Depth | 25.03 | 17.32 | 14.91 | 16.50 | 10.74 | 9.52 |
GrooMeD-NMS | 26.19 | 18.27 | 14.05 | 18.10 | 12.32 | 9.65 |
kinematic3d | 26.69 | 17.52 | 13.10 | 19.07 | 12.72 | 9.17 |
PatchNet+3D Confidence/Depth | 23.66 | 13.25 | 11.23 |
BEV Easy | BEV Moderate | BEV Hard | 3D Easy | 3D Moderate | 3D Hard | |
Monocular Quasi-Dense 3D Object Tracking | 41.71 | 33.73 | 31.05 | 36.74 | 29.30 | 26.67 |
BEV Easy | BEV Moderate | BEV Hard | 3D Easy | 3D Moderate | 3D Hard | |
DSGN | 82.90 | 65.05 | 56.60 | 73.50 | 52.18 | 82.90 |
CG-Stereo | 85.29 | 66.44 | 58.95 | 74.39 | 53.58 | 46.50 |
SVBRDF / Face image → 3D
Average failure rate on the task of translating domain B into domain A (the probability that the output is still recognized as B)
A Style-Based Generator Architecture for Generative Adversarial Networks
Glasses | Smile | Facial Hair | |
Emerging Disentanglement in Auto-Encoder Based Unsupervised Image Content Transfer implementation | 1.1% | 5.2% | 11.9% |
Fader networks | 6.6% | 6.4% | 18.2% |
- MUNIT(Multimodal Unsupervised Image-to-Image Translation)
- Learning Linear Transformations for Fast Arbitrary Style Transfer
- FastPhotoStyle
- Deep Photo Style Transfer
- Arbitrary Style Transfer
- Deformable GANs for Pose-based Human Image Generation ※ photo + target-pose image → image in the specified pose
Techniques
- Tempered Adversarial Networks: instead of feeding the training data to the GAN directly, it is passed through a lens-like network that blurs it, yielding an effect similar to Progressive GAN. The lens is trained to minimize the adversarial and reconstruction losses, and only the adversarial term is gradually annealed away.
- Discriminator Rejection Sampling: uses the density ratio to decide whether to accept or reject each sample from the generator's output
- Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
- ExtraAdam
- Self-modulation
- D-Optimal Regularizer
- MH-GAN implementation
- Adversarial Feedback Loop
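The Discriminator Rejection Sampling entry above keeps or discards generator samples using the density ratio implied by the discriminator: for a near-optimal D, the logit estimates log p_data(x)/p_g(x). A simplified sketch (the paper adds a perturbation term for acceptance-rate control, omitted here):

```python
import numpy as np

def drs_accept(logits, rng=None):
    """DRS sketch: accept each generated sample with probability
    proportional to exp(discriminator logit), i.e. the estimated
    density ratio p_data/p_g, normalized by the batch maximum."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, float)
    ratio = np.exp(logits - logits.max())  # max acceptance probability = 1
    return rng.random(len(logits)) < ratio
```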
Image generation from noise
Method | CIFAR-10 FID | CIFAR-10 IS | CelebA 64x64 FID | STL-10 FID | STL-10 IS | LSUN-bedroom 256 x 256 FID | LSUN-bedroom IS | FFHQ | ImageNet |
real | 7.8 | 11.24 | |||||||
StyleGAN-XL | 12.24 | ||||||||
StyleGAN3 | |||||||||
Projected GAN | |||||||||
SWAGAN | |||||||||
Anycost GAN | |||||||||
InsGen | |||||||||
StyleGAN2 | 2.84 ± 0.03 | ||||||||
COCO-GAN | 6.95 | ||||||||
NCSN | 25.32 | 8.91 | |||||||
FastGAN | 12.97 | 7.76 ± .12 | |||||||
PGGAN | 8.80 | 8.04 | |||||||
AutoGAN-top1 | 12.42 | 8.55 ± .10 | 31.01 | 9.16 ± .12 | |||||
Sphere GAN-ResNet | 17.1 | 8.39 ± .08 | |||||||
MMD-rep-b implementation | 16.21 | 8.29 | 6.79 | 37.63 | 9.34 | ||||
SN-GAN | 21.7 | 8.22 | 40.1±.50 | 9.10±.04 | |||||
VGAN-GP 実装 | 18.1 | ||||||||
WGAN-CT | 8.12±.12 | ||||||||
MoLM-1536 | 18.9 | 7.90 | |||||||
WGAN-GP, ResNet | 19.9 | 7.86 ± .07 | |||||||
DCGAN | 37.11 | 6.40 |
- COCO-GAN: generates a single image in several separately processed patches, so it does not require huge amounts of memory
- GAN(Generative Adversarial Networks)
- WGAN(Wasserstein GAN)
- Cramer GAN
- MMD GAN
- DRGAN
- VAEGAN(Variational AutoEncoder GAN)
※ the gan zoo
Learning from incomplete data
- MisGAN
- Ambient-GAN ※ can learn even from noisy or partially missing data
Text generation
BLEU-2 score, 1000 sentences
Taobao | Amazon | PTB | |
VGAN | 0.969 | 0.868 | 0.695 |
SeqGAN | 0.968 | 0.856 | 0.681 |
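The BLEU-2 scores above are the geometric mean of unigram and bigram precision with a brevity penalty. A minimal single-reference sketch without smoothing (real evaluations use multiple references and corpus-level counts):

```python
from collections import Counter
import math

def bleu2(candidate, reference):
    """BLEU-2 sketch: clipped 1- and 2-gram precision, geometric mean,
    times a brevity penalty for short candidates."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in (1, 2):
        c_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        r_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        total = max(sum(c_ngrams.values()), 1)
        precisions.append(sum((c_ngrams & r_ngrams).values()) / total)
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)
```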
Class-conditional generation
ImageNet 128×128
Inception Score | FID | IS/FID | |
LOGAN | 148.2 ± 3.1 | 3.36 ± 0.14 | 43.53 |
VQ-VAE-2 | |||
BigGAN-deep | 166.5 | 7.4 | 22.5 |
BigGAN Japanese explanation generated samples | 98.8 ± 2.8 | 8.7 ± .6 | 11.35 |
Improved SAGAN with DRS | 76.08 ± 0.30 | 13.57 ± 0.13 | 5.61 |
Self-Attention Generative Adversarial Network(SAGAN) Japanese explanation | 52.52 | 18.65 | 2.82 |
SN-GAN-projection | 36.80 | 27.62 | 1.33 |
AC-GAN | 28.5 | ||
IFcVAE-GAN | |||
GAMO2pix | |||
infoGAN implementation |||
RoC-GAN(Robust Conditional GAN) | |||
CGAN(Conditional GAN) |
- BiGAN(Bidirectional GAN) & ALI ※ the discriminator also sees the latent variable
Style-conditional generation