Non-technical terms in a review

Paper: Methods of integrating data to uncover genotype–phenotype interactions
This review was used in our field's summer seminar in August 2015; the seminar site is here. You can log in with both the username and the password set to "guestsan". Feel free to take a look.
Besides the specialized terms of the field, a precise understanding of ordinary English words is also beneficial.
The following words appear in the review above, and understanding them will be helpful.
If you can explain them in simple words to somebody unfamiliar with biology, informatics, or statistics, your understanding is fine!
The words range from those found in junior-high-school English textbooks to those that appear frequently in graduate-school textbooks. Some words appear more than once.

------

TITLE, ABSTRACT and INTRODUCTION
- integrate (integrating data)
- genotype
- phenotype
- identification (identification of effective model)
- predict (predict phenotypic traits)
- trait (phenotypic traits)
- elucidate (elucidating important biomarkers)
- biomarker
- insight (generate important insights)
- heritability
- complex trait
- harness (harness the utility of ...)
- emerge (emerging approaches for data integration)
- meta (meta-dimensional)
- dimension (meta-dimensional analyses)
- multi (multi-staged)
- stage (multi-staged analyses)
- system (biological systems)
- translational (translational bioinformatics)
- complementary (complementary analysis)
- family-based, population-based (family-based data and population-based data)
- architecture (genetic architecture)
- pathway (biological pathways)
- aetiology (genetic aetiology)
- interrogation (interrogation of genotype-phenotype associations)
- compensate (compensate for missing or unreliable information)
- principle (principles of meta-dimensional analysis and multi-staged analysis)
- quantitative, categorical (quantitative or categorical outcome)
- outcome (quantitative or categorical outcome)
- challenge (analytical challenges)
- perspective (provide our perspective on how such systems genomic analyses might develop in the future)
WHY INTEGRATE DATA?
- predictor (predictor variables)
- variable (predictor variables)
- comprehensive (comprehensive modelling)
- elaborate (the result of an elaborate interplay)
- interplay (the result of an elaborate interplay)
- informative (more informative model)
- bridge (bridge the gap)
- reflect (reflecting the complexity)
- complexity (reflecting the complexity)
- primary (primary motivation)
- explain, predict (explain or predict disease risk)
- risk (disease risk)
- modest (The success ... has been modest)
- limited (limited exploration)
- exploration (limited exploration)
- power (improved power)
- mechanism (understanding of the mechanism)
- causal (causal relationship)
- stepwise (stepwise or hierarchical analysis)
- hierarchical (hierarchical analysis)
- refer (refers to the concept)
- concept (the concept of integrating multiple different data types)
- build (build a multivariate model)
- multivariate (multivariate model)
- given (a given outcome)
- scientific (new scientific questions)
- assemble (assembling all of these data types together)
- diversity (diversity in the size of data sets)
- size (diversity in the size)
- pattern (patterns of missing data)
- noise (noise across the different data types)
- across (noise across the different data types)
- correspondence (correspondence between measurements from different technologies)
- measurements (correspondence between measurements from different technologies)
- substantial (create substantial challenges)
- single (no single analysis approach)
- optimal (be optimal for all studies)
- comprehensive (a comprehensive analysis toolbox)
- expanded (an expanded analysis toolbox)
CHALLENGES WITH INDIVIDUAL DATA SETS
- individual (individual data sets)
- unique (unique challenges)
- implement (before implementing multi-staged analyses)
- quality (data quality)
- scale (data scale)
- dimensionality
- potential (potential confounding of the data)
- confounding (potential confounding of the data)
- issue (these issues are not dealt)
- each (each individual data type)
- downstream (avoid downstream problems)
- storage (computational power and storage capabilities)
- capability (storage capabilities)
- system (computing systems)
- open-source (open-source to commercial packages)
- commercial (commercial packages)
- packages (commercial packages)
- store (store these data)
- track (track these data)
- assurance (quality assurance)
- control (quality control)
- assay (low-throughput assays)
- cluster (genotype clusters)
- sample (any samples that did not cluster well)
- rest (with the rest of the data set)
- nature (large-scale nature of high-throughput data)
- feasible (examining data individually is not feasible)
- summary statistics (rely on summary statistics)
- overview (broad overview of the data)
- pipeline (quality control pipelines)
- electronic medical record
- profile (methylation profiling)
- specific (specific and critical quality control steps)
- critical (critical steps)
- integrity (sample integrity)
- distributional (distributional evaluation)
- respect (with respect to variables)
- ensure (will ensure that ...)
- rigorously (how rigorously to perform)
- reduction (data reduction)
- limit (limit the number of variables)
- single (in a single data set)
- initial (as an initial step)
- consider (when considering data with a vast number of independent variables)
- independent (independent variables)
- cross (cross-validation)
- validation (cross-validation)
- permutation (permutation testing)
- concern (address this concern)
- filter (filtering strategy)
- facilitate (facilitates data integration analyses)
- refine (more refined subset)
- subset (more refined subset)
- efficient (efficient computation)
- computation (efficient computation)
- burden (multiple-hypothesis testing burden)
- full (full dimensionality)
- consideration (computational time, memory and sample size considerations)
- exhaustive (in an exhaustive manner)
- combinatorial (combinatorial increase in models)
- respective (and their respective computation times)
- possible (all possible pairwise models)
- pairwise (all possible pairwise models)
- choose (by choosing 2 of the 5 million variables)
- GPU (GPU clusters)
- considerably (considerably faster)
- traditional (traditional computing processors)
- practicality (reaching the limits of practicality)
- mine (data mining)
- extrinsic, intrinsic (either extrinsic ... or intrinsic)
- external (using information external to the data set itself)
- prior (prior knowledge)
- domain (in the public domain)
- system (immune system)
- time (the knowledge of the field at the time)
- feature (remove biologically important features)
- threshold (on a chosen P value threshold)
- relevant (biologically relevant variants)
- annotation (based on a Biofilter annotation)
- drive (will drive the hypothesis that can be tested)
- dominant (dominant paradigm)
- paradigm (dominant paradigm)
- stratify (by stratifying the data by type)
- alternative (Hypothesis B is the alternative possibility)
- multiple (multiple levels of molecular variation)
- contribute (contribute to disease risk)
- interactive (in a nonlinear, interactive and complex way)
- subsequently (and subsequently performing analyses would inhibit ...)
- appropriate (would be more appropriate)
- particular (association with a particular outcome)
- spurious (spurious association)
- finding (interpretations of findings)
- demographic (genetic, environmental, demographic or other technical factors)
- technical (genetic, environmental, demographic or other technical factors)
- address (address population stratification)
- surrogate (surrogate variable)
- interest (other variables of interest)
- issue (overcome the potential issues with heterogeneity)
- heterogeneity (overcome the potential issues with heterogeneity)
- comprehensive (comprehensive data integration analyses)
AN OVERVIEW OF DATA INTEGRATION
- scale (using only two different scales at a time)
- refer (we refer to the numerical and categorical features)
- continuous (continuous values)
- reflect (this approach reflects Hypothesis A)
- fusion (fusion of scales)
- simultaneously (are combined simultaneously)
DATA INTEGRATION: MULTI-STAGED ANALYSIS
- suggest (as its name suggests)
- signal (signals are enriched)
- enrich (signals are enriched with each step of the analyses)
- deem (SNPs deemed significant)
- option (one option is to look for ...)
- binary (on a continuous or a binary dependent variable)
- respectively (linear or logistic regression (depending on a continuous or a binary dependent variable, respectively))
- rationale (the rationale of this approach)
- arbitrary (relatively arbitrary threshold)
- combat (combat multiple testing problems)
- functional (functional SNPs)
- inference (causal inference)
- key (key drivers)
- driver (key drivers)
- exploit ((something) that exploit the naturally occurring DNA variation)
- natural (naturally occurring)
- reactive (as an independent, causative or reactive function)
- likelihood (maximum likelihood)
- fairly (are fairly powerful)
- specific (allele-specific expression)
- organism (diploid organisms)
- preferential (preferentially expressed)
- modification (epigenetic modifications)
- product (gene product)
- extra (extra resources)
- resource (extra resources used for experimentally tagging the two alleles)
- tag (experimentally tagging the two alleles)
- extend (other extended methods)
- context (used in other contexts)
- state (chromatin state)
- domain (domain knowledge-guided approaches)
- guide (domain knowledge-guided approaches)
- consolidate (is consolidated by initiatives)
- initiative (initiatives such as ENCODE)
- input (the genomic regions of interest are inputs)
- unit (functional units)
- annotate (annotate them with domain knowledge from multiple public database resources)
- current (biased by current knowledge)
- perturbation (environmental perturbations)
- applicable (a multi-staged analysis would be applicable)
DATA INTEGRATION: META-DIMENSIONAL ANALYSIS
- concatenation (concatenation-based integration)
- transformation (transformation-based integration)
- joint (joint relationship)
- recurrence (time to recurrence)
- alteration (copy number alteration)
- via (via LASSO)
- meaningful (in a meaningful way)
- corresponding (values corresponding to the copies of a specific allele per individual)
- per (values corresponding to the copies of a specific allele per individual)
- inflate (can inflate high-dimensionality)
- intermediate (transforming each data type into an intermediate form)
- symmetrical (symmetrical ... matrix)
- positive (positive ... matrix)
- semi (semi-definite)
- definite (semi-definite)
- represent (a matrix represents the relative positions)
- position (the relative positions of all samples)
- merge (multiple graphs or kernels can then be merged)
- elaborate (before elaborating any models)
- preserve (the advantage of preserving data-type-specific properties)
- property (data-type-specific properties)
- representation (transformed into an appropriate intermediate representation)
- unifying (as long as the data contain a unifying feature, such as patient identifiers)
- identifier (patient identifiers)
- robust (robust to different data measurement scales)
- supervised (semi-supervised)
- learning (semi-supervised learning)
- space (original feature space)
- encompass (model-based integration encompasses methods)
- training (training set)
- final (a final model)
- phase (during the training phase)
- available (DNA sequence data may be available)
- suite (a suite of analysis tools)
- majority (majority voting)
- vote (majority voting)
- resistance (drug resistance)
- mutants (HIV protease mutants)
- complex (HIV protease-drug inhibitor complex)
- recognition (protein fold recognition)
- resulting (the resulting model)
- weighted (in a weighted voting scheme)
- scheme (in a weighted voting scheme)
- probabilistic (construct probabilistic causal networks)
- require (model-based integration requires a specific hypothesis)
- resultant (resultant DNA sequence model)
- incorporate (the only variables that are incorporated into the integrative analysis)
- ensemble (ensemble-based approaches)
- supervised (supervised learning)
- label (with known labels (outcome or phenotype))
- latent (latent variable)
- exploratory (exploratory learning)
CAVEATS AND LIMITATIONS
- caveat (caveats and limitations)
- theoretical (theoretical distributions from which power calculations can be performed)
- empirical (empirical power)
- apply (these power estimates will apply only to the data set or simulation at hand)
- at hand (these power estimates will apply only to the data set or simulation at hand)
- universal (the universal power of the approach)
- pitfall (potential pitfalls)
- prohibitive (as the computation time can be prohibitive)
- orthogonal (that extract orthogonal, or independent, relationships)
- essential (which primary variables are essential)
- gold standard (the 'gold standard' in human genetics is to look for replication of results)
- replication (the 'gold standard' in human genetics is to look for replication of results)
- stringent (more stringent protection)
- protection (more stringent protection from type 1 errors)
- underlie (underlying functional genomic units)
- unit (underlying functional genomic units)
- represent (represented by each variable)
- external (external replication)
- readily (independent data sets are not often readily available)
- internal (internal replication)
- extrinsic (extrinsic data)
- corroborate (to estimate the strength of the available corroborating evidence supporting a given association)
- validation (functional validation)
- viable (viable alternative to replication)
- bench (bench science)
- literature (text mining to find literature that supports or refutes the original findings)
- refute (text mining to find literature that supports or refutes the original findings)
- in silico (in silico modelling)
- series (a series of experiments)
- kinetic (kinetic experiments)
- differential (differential equations)
- within, between (highly correlated variables both within and between data types)
- sparse (sparse data matrices)
- metric (two metrics of the models are compared)
- fitness (fitness metric)
- parsimony (parsimony metric)
FUTURE DIRECTIONS
- crude (crude tissue extract)
- promise (showing promise)
- reductionist (reductionist paradigm)
- prevalent (less prevalent)
- affordable (readily available and affordable)
- prevail (will prevail as the dominant type of study design)
- isolation (the days of studying molecular data variability in isolation)
CONCLUSION
- emergence (emergence of new statistical and computational techniques)
- facilitate (the emergence ... will facilitate the search)
- compensatory (compensatory mechanisms)

Statistical Genetics, Kyoto University
