Datasets
Benchmark Data Sets for Graph Kernels
IMDB-B | IMDB-M | RDT-B | RDT-M5K | COLLAB | MUTAG | PROTEINS | PTC | NCI1 | |
U2GNN | 77.04 ± 3.45 | 53.60 ± 3.53 | 80.25 | 50.9 | 89.97 | 78.53 | 69.63 ± 3.60 | ||
MEWISPool | 82.13 | 56.23 | 79.66 ± 4.02 | 96.66 ± 1.23 | 80.71 ± 2.31 | ||||
U2GNN-Unsupervised | 96.41 | 89.20 | 84.8 | 77.25 | 95.62 | 81.34 | 80.01 | 84.59 | |
HGP-SL | 84.91 | 78.45 | |||||||
DUGNN+EXTRA TRAINING DATA | 78.7 ± 4.9 | 56.1 ± 2.3 | 84.2 ± 2.7 | 81.7 ± 2.4 | 74.7 ± 6.0 | 85.5 ± 1.2 | |||
sGIN | 77.94±4.31 | 54.52±0.39 | 80.71±1.48 | 94.14±2.74 | 78.97±3.17 | 73.56±4.27 | 83.85±1.05 | ||
GIN(GIN-0) | 75.1 | 52.3 | 92.4 | 57.5 | 80.2 | 89.4 | 76.2 | 64.6 | 82.7 |
GCAPS-CNN | 71.69 ± 3.40 | 48.50 ± 4.10 | 87.61 ± 2.51 | 50.10 ± 1.72 | 77.71 ± 2.51 | 76.40 ± 4.17 | 66.01 ± 5.91 | 82.72 ± 2.38 | |
CapsGNN | 73.10 | 50.27 | 79.62 | 86.67 | 76.28 | 78.35 | |||
LDP(Local Degree Profile) | 75.4 | 50.0 | 92.1 | 55.9 | 78.1 | 90.1 | 72.7 | 61.7 | 73.0 |
RetGK | 71.9 | 47.7 | 92.6 | 56.1 | 81.0 | 90.3 | 75.8 | 84.5 | |
PATCHY-SAN(PSCN) | 71.00 ± 2.29 | 45.23 ± 2.84 | 86.30 ± 1.58 | 49.10 ± 0.70 | 72.60 ± 2.15 | 88.95±4.37 | 75.00 ± 2.51 | 60.00 ± 4.82 | 76.34 ± 1.68 |
Invariant and Equivariant Graph Networks | 71.27±4.5 | 48.55±3.9 | 77.92±1.7 | 84.61±10 | 75.19±4.3 | 59.47±7.3 | 72.48±2.5 |
Pitfalls of Graph Neural Network Evaluation
Fixed splits
CiteSeer | Cora | PubMed | |
ACMII-Snowball-2 | 82.07 ± 1.04 | 88.95 ± 1.04 | 90.56 ± 0.39 |
ACM-Snowball-3 | 81.32 ± 0.97 | 89.59 ± 1.58 | 91.44 ± 0.59 |
ACMII-GCN | 81.79 ± 0.95 | 89.00 ± 0.72 | 90.74 ± 0.5 |
SSP | 80.52 | 90.16 | 89.36 |
Graph-Bert | 71.2 | 84.3 | 79.3 |
APPNP | 71.8 ± 0.5 | 83.3 ± 0.5 | 80.1 ± 0.2 |
SGC(Simple Graph Convolution) | 71.9 ± 0.1 | 81.0 ± 0.0 | 78.9 ± 0.0 |
Deep Graph Infomax | 71.8 ± 0.7 | 82.3 ± 0.6 | 76.8 ± 0.6 |
SEGCN(Self-Ensembling GCN) | 73.4 ± 0.7 | 83.5 ± 0.4 | 78.9 ± 0.7 |
AGNN | 71.7 ± 0.08 | 83.1 ± 0.08 | 79.9 ± 0.07 |
GraphSGAN | 73.1 ± 1.8 | 83.0 ± 1.3 | |
GAT implementation | 72.5 ± 0.7 | 83.0 ± 0.7 | 79.0 ± 0.3 |
GWNN | 71.7 | 82.8 | 79.1 |
GCN(Graph Convolutional Network) | 70.3 | 81.5 | 79.0 |
FastGCN |
Random splits
CiteSeer | Cora | PubMed | |
APPNP | 70.0 ± 1.4 | 82.2 ± 1.5 | 79.4 ± 2.2 |
GAT | 72.2 ± 0.9 | 82.6 ± 0.7 | 76.7 ± 0.5 |
SEGCN | 69.0 ± 0.9 | 80.8 ± 1.0 | 78.0 ± 1.4 |
AdaLanczosNet | 68.7 ± 1.0 | 80.4 ± 1.1 | 78.1 ± 0.4 |
GCN | 66.8 ± 0.7 | 79.6 ± 0.6 | 78.3 ± 0.7 |
LanczosNet | 66.2 ± 1.9 | 79.5 ± 1.8 | 78.3 ± 0.3 |
Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS)
mIOU | |
ResGCN-28 | 60.0 |
DGCNN | 56.1 |
- LEARNING GENERAL PURPOSE DISTRIBUTED SENTENCE REPRESENTATIONS VIA LARGE SCALE MULTITASK LEARNING
- Universal Sentence Encoder
- Skip-Thought Vectors
F1 scores on the task of paraphrase detection using the SentEval toolkit
InferSent | 83.17 |
ADNet | 81.38 |
Results of abstractive summarizers on the CNN-DM dataset
ROUGE 1 | ROUGE 2 | ROUGE L | |
RankSum | 44.5 | 24.0 | 41.0 |
MatchSum | 44.41 | 20.86 | 40.55 |
PEGASUSLARGE | 44.17 | 21.47 | 41.11 |
BertSumExt (large) | 43.85 | 20.34 | 39.90 |
UNILM | 43.47 | 20.30 | 40.63 |
Pretraining-Based Natural Language Generation for Text Summarization | 41.71 | 19.49 | 38.79 |
Bottom-Up Summarization | 41.22 | 18.68 | 38.34 |
Sentence Rewriting (Chen and Bansal, 2018) | 40.88 | 17.80 | 38.54 |
Pointer-Generator Networks + Coverage Penalty | 39.53 | 17.28 | 36.38 |
seq-to-seq + attn baseline (50k vocab) | 31.33 | 11.81 | 28.83 |
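The ROUGE scores above count n-gram overlap between a system summary and a reference. As a rough illustration only (the official ROUGE toolkit adds stemming, sentence splitting, and bootstrapping), ROUGE-1 F1 can be sketched as:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # unigram matches, clipped per word
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```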
Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs ※ shortens training time
Sequential translation:
You May Not Need Attention code
- Phrase-Based & Neural Unsupervised Machine Translation ※ unsupervised (no parallel training pairs)
IWSLT 2014 De→En | WMT2014 En→De | WMT2016 En→De | ||
Transformer Cycle | 35.14 | |||
Understanding Back-Translation at Scale | 35.0 | |||
T5-11B | 32.1 | |||
DeepL | 33.3 | |||
Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings | 31.2 | |||
MULTI-AGENT DUAL LEARNING | 35.44 | 30.67 | ||
DUAL LEARNING: THEORETICAL STUDY AND ALGORITHMIC EXTENSIONS | 29.97 | |||
DynamicConv Japanese explanation | 35.2 | 29.7(Param 213M) | ||
Transformer + QHAdam | 29.45±0.06 | |||
MULTILINGUAL NEURAL MACHINE TRANSLATION WITH KNOWLEDGE DISTILLATION | 34.02 | |||
Transformer (big) + Relative Position Representations | 29.2 | |||
Transformer+FRANGE | 33.97 | 29.11 | ||
UNIVERSAL TRANSFORMERS arxiv:Universal Transformer | 28.9 | |||
Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction | 33.81±0.03 | |||
HYPERBOLIC ATTENTION NETWORKS | 28.52 | |||
RNMT+ | 28.49 ± 0.05 | |||
Transformer-XL github | ||||
Transformer (T2T) | 28.4 |
DREAM
Model | Accuracy |
Human Ceiling Performance | 98.6 |
Human Performance | 95.5 |
ALBERT-xxlarge + HRCA+ + Multi-Task Learning | 92.6 |
ALBERT-xxlarge + DUMA + Multi-Task Learning | 91.8 |
SQuAD
Leaderboard
SQuAD1.1 EM | SQuAD1.1 F1 | SQuAD2.0 EM | SQuAD2.0 F1 | |
Retro-Reader (ensemble) | 90.578 | 92.978 | ||
ALBERT + DAAF + Verifier (ensemble) | 90.386 | 92.777 | ||
Retro-Reader on ALBERT (ensemble model) | 90.115 | 92.580 | ||
Retro-Reader on ELECTRA | 89.562 | 92.052 | ||
Megatron-LM 3.9B ensemble | 90.5 | 95.8 | 89.0 | 91.7 |
T5-11B | 90.06 | 95.64 | ||
ALBERT (ensemble model) | 90.1 | 95.5 | 89.731 | 92.215 |
ELECTRA | 88.7 | 91.4 | ||
XLNet + DAAF + Verifier (ensemble) | 88.592 | 90.859 | ||
Retro-Reader on ALBERT (single model) | 88.1 | 91.4 | ||
ALBERT (single model) | 88.107 | 90.902 | ||
XLNet + SG-Net Verifier (ensemble) | 88.174 | 90.702 | ||
XLNet + SG-Net Verifier++ (single model) | 87.238 | 90.071 | ||
BERT + DAE + AoA (ensemble) | 87.147 | 89.474 | ||
RoBERTa (single model) | 87.147 | 89.795 | ||
BERT + ConvLSTM + MTL + Verifier (ensemble) | 86.730 | 89.286 | ||
BERT + N-Gram Masking + Synthetic Self-Training (ensemble) | 86.673 | 89.147 | ||
XLNet (single model) | 89.898 | 95.080 | 86.346 | 89.133 |
SpanBERT (single model) | 88.839 | 94.635 | 85.748 | 88.709 |
BERT + DAE + AoA (single model) | 85.884 | 88.621 | ||
BERT (Ensemble + TriviaQA) | 87.433 | 93.160 | ||
UNILM | 80.5 | 83.4 | ||
BERT (single model) | 85.083 | 91.835 | 80.005 | 83.061 |
EfficientBERT++ 16.0M | 78.3 | 86.5 | 73.0 | 76.1 |
ProPara
XTREME
Avg | Sentence-pair Classification | Structured Prediction | Question Answering | Sentence Retrieval | |
Human | 93.3 | 95.1 | 97.0 | 87.8 | |
Turing ULR v5 | 84.5 | 90.3 | 81.7 | 76.3 | 93.7 |
VECO | 81.4 | 88.9 | 75.6 | 72.9 | 92.7 |
T-ULRv2 + StableTune | 80.7 | 88.8 | 75.4 | 72.9 | 89.3 |
FILTER | 77.0 | 87.5 | 71.9 | 68.5 | 84.4 |
X-STILTs | 73.5 | 83.9 | 69.4 | 67.2 | 76.5 |
XLM-R (large) | 68.2 | 82.8 | 69.0 | 62.3 | 61.6 |
mBERT | 59.6 | 73.7 | 66.3 | 53.8 | 47.7 |
CoNLL-2003 Test F1 | Ontonotes v5 | |
CNN Large + fine-tune | 93.5 | |
GCDT + BERT-L | 93.47 ± 0.03 | |
LSTM-CRF+ELMo+BERT+Flair | 93.38 | |
BERT (Ensemble + TriviaQA) | 92.8 | |
HSCRF + softdict | 92.75 | 89.94 |
20NG | R8 | R52 | Ohsumed | MR | |
SSGC(Simple Spectral Graph Convolution) | 88.6±0.1 | 97.4±0.1 | 94.5±0.2 | 68.5±0.1 | 76.7±0.0 |
NABoE-full | 88.1 | 97.9 | |||
GraphStar | 86.9 ± 0.3 | 97.4 ± 0.2 | 95.0 ± 0.3 | 64.2 ± 0.6 | 76.6 ± 0.4 |
SGC(Simple Graph Convolution) | 88.5 ± 0.1 | 97.2 ± 0.1 | 94.0 ± 0.2 | 68.5 ± 0.3 | 75.9 ± 0.3 |
Text GCN | 86.34 ± 0.09 | 97.07 ± 0.10 | 93.56 ± 0.18 | 68.36 ± 0.56 | 76.74 ± 0.20 |
SWEM (Simple Word-Embedding-based Models) | 85.16 ± 0.29 | 95.32 ± 0.26 | 92.94 ± 0.24 | 63.12 ± 0.55 | 76.65 ± 0.63 |
COCO FID | |
Imagen 2 | |
DALL-E 3 | |
Imagen | 7.27 |
DALL-E 2 | 10.39 |
GLIDE (Nichol et al., 2021) | 12.24 |
Stable Diffusion |
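The COCO FID column above is the Fréchet Inception Distance: Inception-v3 features of real and generated images are each fitted with a Gaussian, and the closed-form distance between the two Gaussians is reported. A minimal sketch, assuming the feature means and covariances have already been computed:

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet Inception Distance between two Gaussians:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)  # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny numerical imaginary parts
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean))
```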
Visual AutoRegressive Modeling: Scalable Image Generation via Next-Scale Prediction
Inception scores
COCO | CUB | Oxford-102 | |
MirrorGAN | 26.47 ± 0.41 | 4.56 ± 0.05 | |
AttnGAN | 25.89 ± .47 | 4.36 ± .03 | |
HDGAN | 11.86±.18 | 4.15±.05 | 3.45±.07 |
CWPGGAN | 4.09 ± 0.03 | 3.86 ± 0.02 | |
Recurrent C4Synth | 4.07 ± .13 | 3.52 ± .15 | |
TAC-GAN | 3.45±.05 | ||
MSGAN | |||
StackGAN-v2(StackGAN++) | 8.30 ± .10 | 3.82 ± .06 | 3.26 ± .01 |
CanvasGAN | |||
FusedGAN | |||
StackGAN | 8.45 ± .03 | 3.70 ± .04 | 3.20±.01 |
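The Inception score reported above is the exponential of the mean KL divergence between the classifier's conditional label distribution p(y|x) and its marginal p(y). A minimal sketch, assuming `probs` holds softmax outputs of an Inception classifier on generated images (real evaluations also average over splits):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ) for an (N, classes)
    array of classifier softmax outputs on generated images."""
    probs = np.asarray(probs, dtype=float)
    p_y = probs.mean(axis=0, keepdims=True)  # marginal label distribution
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```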
Voice
Mean Opinion Score | |
Ground Truth | 4.274 ± 0.1340 |
WaveGlow sample | 3.961 ± 0.1343 |
WaveNet | 3.885 ± 0.1238 |
Griffin-Lim | 3.823 ± 0.1349 |
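The ± values above are confidence intervals around the Mean Opinion Score of human raters. A minimal sketch using a normal approximation (individual papers may compute their intervals differently):

```python
import math

def mos_with_ci(scores, z=1.96):
    """Mean Opinion Score with a normal-approximation 95% confidence
    half-width, like the +/- values in the table above."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    half = z * math.sqrt(var / n)
    return mean, half
```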
- NaturalSpeech 3
- Mega-TTS 2
- StyleTTS 2
- EfficientTTS 2
- Bert-VITS2
- NaturalSpeech
- VITS
- TTS-GAN
- WaveRNN
- ClariNet
- Transformer TTS
- DeepVoice3
- Tacotron
- Wave-Tacotron
Superb leaderboard
WER test-clean | WER test-other | GFLOPs | |
XLS-R | |||
WavLM | |||
BEST-RQ | |||
Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light + Extra Data | 1.4% | 2.6% | |
HuBERT | 1.8% | 2.9% | |
wav2vec 2.0 + Libri-Light + Extra Data | 1.8% | 3.3% | |
wav2vec 2.0 | 4.1% | ||
ContextNet | 1.9% | 4.1% | |
Squeezeformer | 2.47% | 5.97% | 277.9 |
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | 2.5% | 5.8% | |
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention | 2.7% | 5.7% | |
The CAPIO 2017 Conversational Speech Recognition System | 3.19% | 7.64% | |
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks | 3.80% | 8.76% |
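The WER columns above are word error rates: the word-level Levenshtein distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the reference length. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate via two-row Levenshtein DP over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,            # deletion
                          curr[j - 1] + 1,        # insertion
                          prev[j - 1] + (r != h)) # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```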
Music source separation
Timbre transfer
Voice conversion (voice changer)
MPII test dataset
Mean | |
W48-s7 | 94.1 |
Cascade Feature Aggregation for Human Pose Estimation | 93.9 |
MSPN | 92.6 |
PASCAL VOC 2012 mAP(%) | VOC 2007 test | COCO test-dev mAP@[0.5:0.95] | ImageNet VID | |
Co-DETR | 66.0(304M) | |||
Stable DINO+ Swin-L | 63.8 (218M) | |||
Soft Teacher + Swin-L(HTC++, multi-scale) | 61.3 | |||
DyHead | 60.6 | |||
Dual-Swin-L | 60.1(453M) | |||
Swin-L(HTC++, multi scale) | 58.7(284M) | |||
CenterNet2(Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) | 56.4 | |||
YOLOv4-P7(CSP-P7, multi-scale) | 55.8(16FPS) | |||
YOLOR-D6 | 55.4(30FPS) | |||
EfficientDet-D7x | 55.1(410B) | |||
DetectoRS(ResNeXt-101-32x4d, multi-scale) | 54.7 | |||
UniverseNet-20.08d(Res2Net-101-v1b) | 54.1 | |||
YOLOv8x | 53.9(68.2M, 257.8B) | |||
EfficientDet-D7 | 53.7(325B) | |||
Cascade Mask R-CNN / Triple-ResNeXt152 | 53.3 | |||
YOLOv8l | 52.9(43.7M, 165.2B) | |||
EfficientDet-D6 | 52.6(226B) | |||
YOLOX-x | 51.5(281.9GFLOPS) | |||
EfficientDet-D5 | 51.5(136B) | |||
AmoebaNet + NAS-FPN + learned augmentation + ↑ anchors, ↑ image size | 50.7(3045B) | |||
YOLOv8m | 50.2(25.9M, 78.9B) | |||
EfficientDet-D4 | 49.7(55B) | |||
YOLOv5x | 49.2(166.4B) | |||
NAS-FPN/AmoebaNet (7 @ 384) +DropBlock | 48.3 | |||
YOLOv5l | 47.7(88.1B) | |||
EfficientDet-D3 | 47.2(24.9B) | |||
CenterNet-HG / Hourglass-104 | 45.1(1.4FPS) | |||
YOLOv8s | 44.9(28.6B) | |||
YOLOv5m | 44.3(39.4B) | |||
YOLOv4 608 / CSPDarknet-53 | 43.5(33FPS) | |||
M2Det / VGG-16 | 44.2 | |||
EfficientDet-D2 | 43.0(11B) | |||
Cascade R-CNN / ResNet-101 | 42.8 | |||
RefineDet512+ / ResNet-101 | 41.8 | |||
M2Det / VGG-16 | 41.0(11.8FPS) | |||
RetinaNet / ResNeXt-101-FPN | 40.8 | |||
EfficientDet-D1 | 40.5(6B) | |||
CenterNet-DLA / DLA-34 | 39.2(28FPS) | |||
RefineDet512+ / VGG-16 | 83.5 | 83.8 | 37.6 | |
R-FCN++ multi-sc train | 80.6 | 82.1 | 37.5 | |
YOLOv5s | 37.0(13.2B) | |||
EfficientDet-D0 | 34.6(2.5B) |
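The COCO column above, mAP@[0.5:0.95], averages average precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The underlying intersection-over-union of two axis-aligned boxes can be sketched as:

```python
def iou(box_a, box_b):
    """IoU of two (x1, y1, x2, y2) boxes; the overlap criterion behind
    the mAP thresholds in the table above."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```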
Cross-domain / Face extraction / Text extraction
Cityscapes mIoU (%) | PASCAL VOC 2012 mIoU (%) | CamVid mIoU (%) | |
Hierarchical Multi-Scale Attention for Semantic Segmentation | 85.4 | ||
HRNetV2 + OCR + SegFix | 84.5 | ||
Panoptic-DeepLab + EXTRA TRAINING | 84.2 | ||
OCNet / HRNetV2-W48 | 83.0 | 84.5 | |
Gated-SCNN | 82.8 | ||
DPC | 82.7 | 87.9 | |
DeepLab v3++JFT | 82.1 | 89.0 | |
Auto-DeepLab-L | 82.1 | 85.6 | |
OCNet | 81.7 | ||
DeepLab v3 | 81.3 | 85.7 | |
FC-HarDNet-70 | 76.0(Param 4.1M) | ||
LEDNET | 70.6(Param 0.94M) | ||
CGNet M3N21 | 64.8(Param 0.5M) | 65.6(Param 0.5M) | |
ESPNet | 60.3(Param 0.40M) |
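The mIoU metric above averages per-class intersection-over-union across classes. A minimal sketch from flat label arrays (classes absent from both prediction and ground truth are skipped):

```python
import numpy as np

def miou(pred, target, num_classes):
    """Mean IoU over classes from flat label arrays."""
    pred, target = np.asarray(pred), np.asarray(target)
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))
```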
COCO test-dev
Pose extraction / Recognition & QA
Backbone | AP | FPS | |
HTC(multi-scale) | Dual-Swin-L | 52.3 | |
HTC++(multi-scale) | Focal-L | 51.3 | |
Cascade-RCNN | ResNeSt101 | 41.56 | |
CenterMask2 | V2-99 | 41.4 | 14.4 |
BlendMask | R-101 | 41.3 | 9.52 |
Mask R-CNN | ResNeSt101 | 40.65 | |
BCNet | R-101-FPN | 39.8 | |
CenterMask | R-101-FPN | 39.6 | 11.5 |
MS R-CNN | R-101-FPN | 39.6 | 8.6 |
Mask R-CNN | R-101-FPN | 38.4 | 8.6 |
BlendMask-RT | R-101 | 36.8 | 21 |
CenterMask2-Lite | V-39 | 36.7 | 34 |
YOLACT++ | Resnet50-FPN | 34.1 | 33.0 |
YOLACT-700 | R-101-FPN | 31.2 | 23.6 |
YOLACT-550 | R-101-FPN | 29.8 | 33.0 |
CIFAR-10 | CIFAR-100 | CINIC-10 | ImageNet top-1/top-5 | ImageNet-C | ImageNet-P | ImageNetV2 | |
CoAtNet | 9.12%(Param 2.44B) | ||||||
ViT-G/14+Extra Data | 9.55±0.03(Param 1843M) | 16.67±0.03 | |||||
EfficientNet (L2)+Meta Pseudo Labels+Extra Data(300M unlabeled JFT) | 9.8/1.2(Param 480M) | ||||||
EfficientNet (L2)+ SAM+Extra Data | 0.30±0.01 | 3.92±0.06 | 11.39/(Param 480M) | ||||
BEiT | 11.4/1.34(Param 306M) | ||||||
FixEfficientNet (L2)+Extra Data(300M unlabeled images) | 11.5/1.3(Param 480M) | ||||||
EfficientNet-L2 +Noisy Student (L2) + RandAugment +Extra Data(300M unlabeled images) | 11.6/1.3(Param 480M) | 22.2% | 13.6% | ||||
BiT-L(JFT-300M Extra Data) | 0.63 | 6.40±0.18 | 12.2 | ||||
FixEfficientNet-B7+Extra Data | 12.9/1.8(Param 66M) | ||||||
VOLO-D5↑512 | 12.9(Param 296M) | 22.0(Param 296M) | |||||
EfficientNet-B7 + Noisy Student(L2)+RandAugment+Extra Data | 13.1/1.9(Param 66M) | ||||||
BEiT-base(ViT; ImageNet 1K pretrain) | 13.2/1.9(Param 87M) | ||||||
EfficientNetV2-L (21k) | 13.2(Param 120M)/53B | ||||||
FixEfficientNet-B6+Extra Data | 13.3/2.0(Param 43M) | ||||||
FixEfficientNet-B5+Extra Data | 13.6/2.1(Param 30M) | ||||||
EfficientNet-B6 + Noisy Student(L2) + RandAugment + Extra Data | 13.6/2.1(Param 43M) | ||||||
FixResNeXt-101 32×48d | 13.6/2.0(Param 829M) | ||||||
VOLO-D3↑448 | 13.7(Param 86M) | 22.3(Param 86M) | |||||
EfficientNetV2-M (21k) | 13.8(Param 54M)/24B | ||||||
EfficientNet-B5 + Noisy Student(L2) + RandAugment + Extra Data | 13.9/2.2(Param 30M) | ||||||
FixEfficientNet-B4+Extra Data | 14.1/2.3(Param 19M) | ||||||
Fix-EfficientNet-B8+MaxUp+CutMix | 14.20/(Param 87.42M) | ||||||
FixEfficientNet-B8 | 14.30/2.4(Param 87.42M) | ||||||
EfficientNet-B8+AdvProp | 14.5/2.7(Param 88M) | ||||||
ResNeXt-101 32×48d | 14.6/2.4(Param 829M) | ||||||
EfficientNet-B4 + Noisy Student(L2) + RandAugment + Extra Data | 14.7/2.5(Param 19M) | ||||||
FixEfficientNet-B7 | 14.7/2.6(Param 66M) | ||||||
EfficientNetV2-S (21k) | 15.0(Param 24M)/ 8.8B | ||||||
FixEfficientNet-B3+Extra Data | 15.0/2.6(Param 12M) | ||||||
FixEfficientNet-B6 | 15.1/2.7(Param 43M) | ||||||
EfficientNet-B6+AdvProp | 15.2(Param 43M) | ||||||
FixEfficientNet-B5 | 15.3/2.8(Param 30M) | ||||||
EfficientNet-B7 | 1.1 | 8.3 | 15.6/2.9(Param 66M) | ||||
ResNet-RS-50 | 15.6(Param 192M) | ||||||
EfficientNet-B5+AdvProp | 15.7(Param 30M) | ||||||
LambdaResNet200 | 15.7(Param 42M) | ||||||
EfficientNet-B3 + Noisy Student(L2)+ RandAugment +Extra Data | 15.9/3.1(Param 12M) | ||||||
FixEfficientNet-B4 | 16.0/3.0(Param 19M) | ||||||
LambdaResNet152 | 16.0(Param 35M) | ||||||
VAN-Large | 16.1(Param 44.8M) | ||||||
EfficientNetV2-S | 16.1(Param 22M) | ||||||
AmoebaNet-C (6,228)+ARS-Aug | 16.12/3.28 | ||||||
AmoebaNet-C (6,228)+AutoAugment | 16.46/3.52(Param 155.3M) | ||||||
RegNetY-8.0GF | 31.3±0.08(infer 113ms) | ||||||
FixEfficientNet-B3 | 17.0/3.6(Param 12M) | ||||||
EfficientNet-B4+AdvProp | 16.7(Param 19M) | ||||||
VAN-Base | 17.2(Param 26.6M) | ||||||
SENet-154 | 17.28/3.79 | ||||||
FixEfficientNet-B1+Extra Data | 17.4/3.6(Param 7.8M) | ||||||
EfficientNet-B2 + Noisy Student(L2) + RandAugment +Extra Data | 17.6/3.7(Param 9.2M) | ||||||
EfficientNet-B3+AdvProp | 18.1(Param 12M) | ||||||
EfficientNet-B1 + Noisy Student(L2)+ RandAugment + Extra Data | 18.5/4.2(Param 7.8M) | ||||||
FixEfficientNet-B1 | 18.7/4.3(Param 7.8M) | ||||||
Dual-Path-Net-131 | 18.55/4.16(Param 79.5M) | ||||||
RepVGG-B3-200epochs | 19.48(Param 110.96M) | ||||||
EfficientNet-B2+AdvProp | 19.5(Param 9.2M) | ||||||
FixEfficientNet-B0+Extra Data | 19.8/4.6(5.3M) | ||||||
EfficientNet-B2 + AutoAugment | 19.73/5.02(9.2M) | ||||||
Inception-ResNet-v2 + SENet | 19.80/4.79 | ||||||
EfficientNet-B1+AdvProp | 20.4(Param 7.8M) | ||||||
FixEfficientNet-B0 | 20.7/5.4(Param 5.3M) | ||||||
MixNet-L | 21.1/5.8(Param 7.3M) | ||||||
EfficientNet-B0 + Noisy Student(L2)+ RandAugment + Extra Data | 21.2/5.5(Param 5.3M) | ||||||
EffNetV2-B0 | 21.3(Param 7.1M) | ||||||
SE-Res2Net-50 | 21.56/5.94(Param 25M) | ||||||
MobileViT-S | 21.6(Param 5.6M) | ||||||
PyramidNet + ShakeDrop regularization + ARS-Aug | 1.26(Param 26.0M) | 10.24 (Param 26.0M) | |||||
PyramidNet + ShakeDrop regularization + AutoAugment | 1.5±0.1(Param 26.0M) | 10.7 ± 0.2 (Param 26.0M) | |||||
PyramidNet + ShakeDrop regularization + Population Based Augmentation (PBA) | 1.46 ± 0.077 | 10.94 ± 0.094 | |||||
WRN-SRS | 4.06(Param 36.5M) | 10.10(Param 106.4M) | |||||
EfficientNet-B0+AdvProp | 22.4(Param 5.3M) | ||||||
EfficientNet-B0 + AutoAugment | 22.7/6.5(Param 5.3M) | ||||||
ShuffleNetV2+ Large | 22.9/6.7(Param 6.7M) | ||||||
MixNet-M | 23.0/6.7(Param 5.0M) | ||||||
MobileNetV3 | 23.4/(Param 7.5M) or 26.7(Param 4M) | ||||||
PyramidNet-200 (α˜=240)+ CutMix + ShakeDrop | 13.81(Param 26.8 M) | ||||||
MnasNet-92 (+SE) | 23.87/7.15(Param 5.1M) | ||||||
ShuffleNetV2+ Medium | 24.3/7.4(Param 5.6M) | ||||||
VAN-Tiny | 24.6(Param 4.1M) | ||||||
SharpSepConvDARTS | 1.98±0.07(Param 3.6M) | 25.1/7.8(Param 4.9M) | |||||
MobileViT-XS | 25.2(Param 2.3M) | ||||||
NAONet + Cutout | 2.11(Param 128M) | 14.36(Param 128M) | |||||
MobileNetV2 | 25.3/7.5(Param 6.9M) or 28.0/9.0(Param 3.4M) | ||||||
PyramidNet+ ShakeDrop regularization + Cutout | 2.31(Param 26.0M) | 12.19 (Param 26.0M) | |||||
NASNet-A (7 @ 2304) + cutout+AdaNet | 2.30(Param 26.4M) | ||||||
NASNet-A (7 @ 2304) + cutout | 2.40(Param 27.6M) | ||||||
BlockQNN-Connection more filters | 2.35(Param 33.3M) | 14.83(Param 33.3M) | |||||
NAONet + Cutout | 15.67(Param 10.8M) | ||||||
AmoebaNet-B (N=6, F=128) + cutout | 2.13±0.04(Param 34.9M) | 15.80(Param 34.9M) | |||||
WideResNet-22 + AgrLearn | 2.45 | ||||||
Shake-Shake + Cutout | 2.56± 0.07 | 15.20± 0.21(Param 34.4M) | |||||
AlphaX+Cutout implementation | 2.82(Param 5.1M) | 24.5/7.8(Param 7.2M) | |||||
ProxylessNAS | 2.08(Param 5.7M) | 25.4/7.8 | |||||
GDAS(C=36,N=6)+CutOut | 2.82(Param 2.5M) | 18.13(Param 2.5M) | |||||
ASNG-NAS | 2.83±0.14(Param 3.9M) | ||||||
DARTS + Cutout | 2.83±0.06(Param 3.4M) | 26.9/9.0(Param 4.9M) | |||||
ENAS + micro search space + CutOut | 2.89(Param 4.6M) | ||||||
SNAS + cutout | 2.98(Param 2.89M) | 27.3/9.2(Param 4.3M) | |||||
AmoebaNet-A(6, 36) | 3.34±0.06(Param 3.2M) |
- Attention Augmented Convolutional Networks
- GPipe
- Label Smoothing
- Knowledge Distillation Japanese explanation
- DropBlock Japanese explanation
- CutOut / Random Erasing Data Augmentation
- AutoAugment Japanese explanation
- Fast AutoAugment
- Population Based Augmentation (PBA) ※ 1000× more efficient than AutoAugment
- RandAugment ※ even more efficient than PBA
- MentorMix
- Manifold Mixup
Top-1 Err(%) | Top-5 Err(%) | |
Baseline: PyramidNet-200(α˜=240)(# params:26.8M) | 16.45 | 3.69 |
+ Mixup(α=1.0) | 15.63 | 3.99 |
+ DropBlock + Label smoothing (ε=0.1) | 15.16 | 3.86 |
+ Cutout + Manifold Mixup (α=1.0) | 15.09 | 3.35 |
+ ShakeDrop | 15.08 | 2.79 |
+ CutMix | 14.47 | 2.97 |
+ CutMix + ShakeDrop | 13.81 | 2.29 |
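CutMix, the strongest single addition in the ablation above, pastes a random rectangle from one training image into another and mixes the labels in proportion to the pasted area. A minimal single-pair sketch (real implementations operate on shuffled batches):

```python
import numpy as np

def cutmix(x_a, x_b, alpha=1.0, rng=None):
    """CutMix sketch: paste a random box from x_b into x_a; the returned
    weight is the fraction of the image still coming from x_a."""
    rng = rng or np.random.default_rng()
    h, w = x_a.shape[:2]
    lam = rng.beta(alpha, alpha)                   # target mixing ratio
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - rh // 2, 0), min(cy + rh // 2, h)
    x1, x2 = max(cx - rw // 2, 0), min(cx + rw // 2, w)
    mixed = x_a.copy()
    mixed[y1:y2, x1:x2] = x_b[y1:y2, x1:x2]
    lam_adj = 1 - (y2 - y1) * (x2 - x1) / (h * w)  # area actually kept from x_a
    return mixed, lam_adj
```

The loss is then `lam_adj * loss(y_a) + (1 - lam_adj) * loss(y_b)`.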
Error rate (%) for CIFAR10
Methods/Labels | 40 | 250 | 500 | 1000 | 2000 | 4000 |
PiModel | 53.02±2.05 | 41.82±1.52 | 31.53±0.98 | 23.07±0.66 | 17.41±0.37 | |
PseudoLabel | 49.98±1.17 | 40.55±1.70 | 30.91±1.73 | 21.96±0.42 | 16.21±0.11 | |
Mixup | 47.43±0.92 | 36.17±1.36 | 25.72±0.66 | 18.14±1.06 | 13.15±0.20 | |
VAT | 36.03±2.82 | 26.11±1.52 | 18.68±0.40 | 14.40±0.15 | 11.05±0.31 | |
MeanTeacher | 47.32±4.71 | 42.01±5.86 | 17.32±4.00 | 12.17±0.22 | 10.36±0.25 | |
MixMatch | 47.54±11.50 | 11.08±0.87 | 9.65±0.94 | 7.75±0.32 | 7.03±0.15 | 6.24±0.06 |
iGPT-L | 26.8±1.5 | 12.4±0.6 | 5.7±0.1 | |||
UDA | 29.0 ± 5.9 | 8.8 ± 1.1 | 4.9 ± 0.2 | |||
ReMixMatch | 19.10±9.64 | 6.27±0.34 | 5.73±0.16 | 5.14±0.04 | ||
FixMatch (CTA) | 11.39±3.35 | 5.07±0.33 | 4.31±0.15 | |||
FixMatch (RandAugment) | 6.4 | 4.69 | 4.23 | |||
Meta Pseudo Labels | 3.89 ± 0.07 |
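FixMatch, near the top of the table above, trains on unlabeled data by pseudo-labeling confident predictions on weakly augmented images and applying those labels to strongly augmented versions. The confidence-masking step can be sketched as:

```python
import numpy as np

def fixmatch_mask(weak_probs, threshold=0.95):
    """FixMatch-style pseudo-labeling sketch: keep an unlabeled example
    only when the prediction on its weak augmentation is confident."""
    weak_probs = np.asarray(weak_probs)
    conf = weak_probs.max(axis=1)     # model confidence per example
    pseudo = weak_probs.argmax(axis=1)  # hard pseudo-label
    mask = conf >= threshold          # examples that contribute to the loss
    return pseudo, mask
```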
Programming languages
Paired training images required
No paired training images required
- U-GAT-IT
- Sem-GAN
- StarGAN ※ multiple domains
- GANimation
- ELEGANT ※ multiple domains
- Requires a positive set and a negative set, i.e. a "smiling" domain ←→ "non-smiling" domain
- pix2pix-starGAN ※ multiple domains
- ALICE
Car2Car: root median residual deviation from linear alignment (lower is better).
TQM: Translation quality measured by translated digit classification accuracy (%)
Car2Car | TQM:SVHN→MNIST | TQM:MNIST→SVHN | |
CrossNet | |||
NAM | 1.47 | 33.3 | 31.9 |
CycleGAN | 26.8 | 17.7 | |
DiscoGAN | 13.81 | ||
SPA-GAN | |||
AGGAN |
- SR3
- Neural Differential Equations for Single Image Super-Resolution
- KernelGAN + ZSSR
- EPSR (1st place in PIRM2018 Region 1)
- EUSR-PCL (2nd place in PIRM2018 Region 1)
- 4PP-EUSR
- ESRGAN (3rd place in PIRM2018 Region 1)
- IDN
- Super-FAN
- SRGAN
- Deep Image Prior
- "Zero-Shot" Super-Resolution using Deep Internal Learning
Clean images | Noisy image pairs | |
NAC | Not required | Not required |
Deep Image Prior | Not required | Not required |
Noise2Void code | Not required | Not required |
Noise2Noise implementation Japanese explanation | Not required | Required |
Path-Restore | Required | Not required |
Unprocessing Images for Learned Raw Denoising | Required | Not required |
GAN2GAN | Required | Not required |
KITTI Eigen split
Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 | δ < 1.25^2 | δ < 1.25^3 | |
LightedDepth NewCRFs | 0.028 | 0.077 | 1.567 | 0.049 | 0.991 | 0.999 | 1.000 |
BTS+ pre-trained on Cityscapes dataset | 0.056 | 0.169 | 1.925 | 0.087 | 0.964 | 0.994 | 0.999 |
AdaBins | 0.058 | 0.190 | 2.360 | 0.088 | 0.964 | 0.995 | 0.999 |
BTS | 0.059 | 0.241 | 2.756 | 0.096 | 0.956 | 0.993 | 0.998 |
struct2depth(Motion) | 0.1087 | 0.8250 | 4.7503 | 0.1866 | 0.8738 | 0.9577 | 0.9825 |
struct2depth | 0.1231 | 1.4367 | 5.3099 | 0.2043 | 0.8705 | 0.9514 | 0.9765 |
Unsupervised Monocular Depth Estimation with Left-Right Consistency | 0.133 | 1.158 | 5.370 | 0.208 | 0.841 | 0.949 | 0.978 |
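The columns above are the standard monocular depth metrics. A minimal sketch of their definitions (valid-pixel masking and depth capping, which KITTI evaluations apply, are omitted):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Abs Rel, Sq Rel, RMSE, RMSE log, and delta < 1.25 accuracy."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    sq_rel = np.mean((pred - gt) ** 2 / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)  # symmetric ratio
    a1 = np.mean(ratio < 1.25)                # delta < 1.25 accuracy
    return abs_rel, sq_rel, rmse, rmse_log, a1
```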
KITTI 2015 stereo D1-all | |
CSPF github | 1.74% |
Dedge-AGMNet | 1.85% |
EdgeStereo | 2.08% |
PSMNet | 2.32% |
iResNet | 2.44% |
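D1-all above counts a pixel as erroneous when its disparity error exceeds both 3 px and 5% of the ground-truth disparity. A minimal sketch:

```python
import numpy as np

def d1_all(pred_disp, gt_disp):
    """KITTI 2015 D1-all: fraction of pixels with disparity error
    > 3 px AND > 5% of the ground-truth disparity."""
    pred_disp = np.asarray(pred_disp, float)
    gt_disp = np.asarray(gt_disp, float)
    err = np.abs(pred_disp - gt_disp)
    bad = (err > 3.0) & (err > 0.05 * gt_disp)
    return float(bad.mean())
```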
Single image → 3D information (detection)
KITTI BEV(birds-eye-view) KITTI 3D(3D bounding box)
BEV Easy | BEV Moderate | BEV Hard | 3D Easy | 3D Moderate | 3D Hard | |
OFTNet/RGB | 7.16 | 5.69 | 4.61 | 1.61 | 1.32 | 1.00 |
MonoDIS/RGB | 17.23 | 13.19 | 11.12 | 10.37 | 7.94 | 6.40 |
SMOKE/RGB | 20.83 | 14.49 | 12.75 | 14.03 | 9.76 | 7.84 |
MoVi-3D/RGB | 22.76 | 17.03 | 14.85 | 15.19 | 10.90 | 9.26 |
PatchNet/Depth | 15.68 | 11.12 | 10.17 | |||
D4LCN/RGB+Depth | 22.51 | 16.02 | 12.55 | 16.65 | 11.72 | 9.51 |
AM3D/RGB+Depth | 25.03 | 17.32 | 14.91 | 16.50 | 10.74 | 9.52 |
GrooMeD-NMS | 26.19 | 18.27 | 14.05 | 18.10 | 12.32 | 9.65 |
kinematic3d | 26.69 | 17.52 | 13.10 | 19.07 | 12.72 | 9.17 |
PatchNet+3D Confidence/Depth | 23.66 | 13.25 | 11.23 |
BEV Easy | BEV Moderate | BEV Hard | 3D Easy | 3D Moderate | 3D Hard | |
Monocular Quasi-Dense 3D Object Tracking | 41.71 | 33.73 | 31.05 | 36.74 | 29.30 | 26.67 |
BEV Easy | BEV Moderate | BEV Hard | 3D Easy | 3D Moderate | 3D Hard | |
DSGN | 82.90 | 65.05 | 56.60 | 73.50 | 52.18 | 82.90 |
CG-Stereo | 85.29 | 66.44 | 58.95 | 74.39 | 53.58 | 46.50 |
SVBRDF / Face image → 3D
Average failure rate on the task of translating domain B into domain A (the probability that the output is still recognized as B)
A Style-Based Generator Architecture for Generative Adversarial Networks
Glasses | Smile | Facial Hair | |
Emerging Disentanglement in Auto-Encoder Based Unsupervised Image Content Transfer implementation | 1.1% | 5.2% | 11.9% |
Fader networks | 6.6% | 6.4% | 18.2% |
- MUNIT(Multimodal Unsupervised Image-to-Image Translation)
- Learning Linear Transformations for Fast Arbitrary Style Transfer
- FastPhotoStyle
- Deep Photo Style Transfer
- Arbitrary Style Transfer
- Deformable GANs for Pose-based Human Image Generation ※ photo + target-pose image → image in the specified pose
Techniques
- Tempered Adversarial Networks: instead of feeding the training data to the GAN directly, it is passed through a lens-like network that blurs it, yielding an effect similar to Progressive GAN. The lens is trained to minimize the adversarial and reconstruction losses, and only the adversarial term is gradually annealed away.
- Discriminator Rejection Sampling: uses the density ratio to decide whether to accept or reject each sample from the generator's output
- Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
- ExtraAdam
- Self-modulation
- D-Optimal Regularizer
- MH-GAN implementation
- Adversarial Feedback Loop
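The Discriminator Rejection Sampling entry above keeps or discards generator samples using the density ratio implied by the discriminator: for a near-optimal D, the logit estimates log p_data(x)/p_g(x). A simplified sketch (the paper adds a perturbation term for acceptance-rate control, omitted here):

```python
import numpy as np

def drs_accept(logits, rng=None):
    """DRS sketch: accept each generated sample with probability
    proportional to exp(discriminator logit), i.e. the estimated
    density ratio p_data/p_g, normalized by the batch maximum."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, float)
    ratio = np.exp(logits - logits.max())  # max acceptance probability = 1
    return rng.random(len(logits)) < ratio
```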
Image generation from noise
Method | CIFAR-10 FID | CIFAR-10 IS | CelebA 64x64 FID | STL-10 FID | STL-10 IS | LSUN-bedroom 256 x 256 FID | LSUN-bedroom IS | FFHQ | ImageNet |
real | 7.8 | 11.24 | |||||||
StyleGAN-XL | 12.24 | ||||||||
StyleGAN3 | |||||||||
Projected GAN | |||||||||
SWAGAN | |||||||||
Anycost GAN | |||||||||
InsGen | |||||||||
StyleGAN2 | 2.84 ± 0.03 | ||||||||
COCO-GAN | 6.95 | ||||||||
NCSN | 25.32 | 8.91 | |||||||
FastGAN | 12.97 | 7.76 ± .12 | |||||||
PGGAN | 8.80 | 8.04 | |||||||
AutoGAN-top1 | 12.42 | 8.55 ± .10 | 31.01 | 9.16 ± .12 | |||||
Sphere GAN-ResNet | 17.1 | 8.39 ± .08 | |||||||
MMD-rep-b implementation | 16.21 | 8.29 | 6.79 | 37.63 | 9.34 | ||||
SN-GAN | 21.7 | 8.22 | 40.1±.50 | 9.10±.04 | |||||
VGAN-GP 実装 | 18.1 | ||||||||
WGAN-CT | 8.12±.12 | ||||||||
MoLM-1536 | 18.9 | 7.90 | |||||||
WGAN-GP, ResNet | 19.9 | 7.86 ± .07 | |||||||
DCGAN | 37.11 | 6.40 |
- COCO-GAN: generates a single image in several separately processed patches, so it does not require huge amounts of memory
- GAN(Generative Adversarial Networks)
- WGAN(Wasserstein GAN)
- Cramer GAN
- MMD GAN
- DRGAN
- VAEGAN(Variational AutoEncoder GAN)
※ the gan zoo
Learning from incomplete data
- MisGAN
- Ambient-GAN ※ can learn even from noisy or partially missing data
Text generation
BLEU-2 score, 1000 sentences
Taobao | Amazon | PTB | |
VGAN | 0.969 | 0.868 | 0.695 |
SeqGAN | 0.968 | 0.856 | 0.681 |
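The BLEU-2 scores above are the geometric mean of unigram and bigram precision with a brevity penalty. A minimal single-reference sketch without smoothing (real evaluations use multiple references and corpus-level counts):

```python
from collections import Counter
import math

def bleu2(candidate, reference):
    """BLEU-2 sketch: clipped 1- and 2-gram precision, geometric mean,
    times a brevity penalty for short candidates."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in (1, 2):
        c_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        r_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        total = max(sum(c_ngrams.values()), 1)
        precisions.append(sum((c_ngrams & r_ngrams).values()) / total)
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)
```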
Class-conditional generation
ImageNet 128×128
Inception Score | FID | IS/FID | |
LOGAN | 148.2 ± 3.1 | 3.36 ± 0.14 | 43.53 |
VQ-VAE-2 | |||
BigGAN-deep | 166.5 | 7.4 | 22.5 |
BigGAN Japanese explanation generated samples | 98.8 ± 2.8 | 8.7 ± .6 | 11.35 |
Improved SAGAN with DRS | 76.08 ± 0.30 | 13.57 ± 0.13 | 5.61 |
Self-Attention Generative Adversarial Network(SAGAN) Japanese explanation | 52.52 | 18.65 | 2.82 |
SN-GAN-projection | 36.80 | 27.62 | 1.33 |
AC-GAN | 28.5 | ||
IFcVAE-GAN | |||
GAMO2pix | |||
infoGAN implementation |||
RoC-GAN(Robust Conditional GAN) | |||
CGAN(Conditional GAN) |
- BiGAN(Bidirectional GAN) & ALI ※ the discriminator also sees the latent variable
Style-conditional generation