Statistical Genetics, Graduate School of Medicine, Kyoto University

  • Paper: Methods of integrating data to uncover genotype–phenotype interactions
  • This review paper was used in our lab's summer seminar (August 2015). The seminar site is here. Log in with username "guestsan" and password "guestsan" and feel free to look around.
  • Understanding the specialized terms of the field is important, but a precise understanding of ordinary English words is also useful.
  • The word list below is taken from the review above; understanding these words should be helpful.
  • If you can explain them in simple words to somebody outside the field (unfamiliar with biology, informatics or statistics), your understanding is fine!
  • The words range from ones found in high-school English textbooks to ones that appear frequently in graduate-school textbooks. Some words appear more than once.

------
  • TITLE, ABSTRACT and INTRODUCTION
    • integrate (integrating data)
    • genotype
    • phenotype
    • identification (identification of effective model)
    • predict (predict phenotypic traits)
    • trait (phenotypic traits)
    • elucidate (elucidating important biomarkers)
    • biomarker
    • insight (generate important insights)
    • heritability
    • complex trait
    • harness (harness the utility of ...)
    • emerge (emerging approaches for data integration)
    • meta (meta-dimensional)
    • dimension (meta-dimensional analyses)
    • multi (multi-staged)
    • stage (multi-staged analyses)
    • system (biological systems)
    • translational (translational bioinformatics)
    • complementary (complementary analysis)
    • family-based, population-based (family-based data and population-based data)
    • architecture (genetic architecture)
    • pathway (biological pathways)
    • aetiology (genetic aetiology)
    • interrogation (interrogation of genotype-phenotype associations)
    • compensate (compensate for missing or unreliable information)
    • principle (principles of meta-dimensional analysis and multi-staged analysis)
    • quantitative, categorical (quantitative or categorical outcome)
    • outcome (quantitative or categorical outcome)
    • challenge (analytical challenges)
    • perspective (provide our perspective on how such systems genomic analyses might develop in the future)
  • WHY INTEGRATE DATA?
    • predictor (predictor variables)
    • variable (predictor variables)
    • comprehensive (comprehensive modelling)
    • elaborate (the result of an elaborate interplay)
    • interplay (the result of an elaborate interplay)
    • informative (more informative model)
    • bridge (bridge the gap)
    • reflect (reflecting the complexity)
    • complexity (reflecting the complexity)
    • primary (primary motivation)
    • explain, predict (explain or predict disease risk)
    • risk (disease risk)
    • modest (The success ... has been modest)
    • limited (limited exploration)
    • exploration (limited exploration)
    • power (improved power)
    • mechanism (understanding of the mechanism)
    • causal (causal relationship)
    • stepwise (stepwise or hierarchical analysis)
    • hierarchical (hierarchical analysis)
    • refer (refers to the concept)
    • concept (the concept of integrating multiple different data types)
    • build (build a multivariate model)
    • multivariate (multivariate model)
    • given (a given outcome)
    • scientific (new scientific questions)
    • assemble (assembling all of these data types together)
    • diversity (diversity in the size of data sets)
    • size (diversity in the size)
    • pattern (patterns of missing data)
    • noise (noise across the different data types)
    • across (noise across the different data types)
    • correspondence (correspondence between measurements from different technologies)
    • measurements (correspondence between measurements from different technologies)
    • substantial (create substantial challenges)
    • single (no single analysis approach)
    • optimal (be optimal for all studies)
    • comprehensive (a comprehensive analysis toolbox)
    • expanded (an expanded analysis toolbox)
  • CHALLENGES WITH INDIVIDUAL DATA SETS
    • individual (individual data sets)
    • unique (unique challenges)
    • implement (before implementing multi-staged analyses)
    • quality (data quality)
    • scale (data scale)
    • dimensionality
    • potential (potential confounding of the data)
    • confounding (potential confounding of the data)
    • issue (these issues are not dealt)
    • each (each individual data type)
    • downstream (avoid downstream problems)
    • storage (computational power and storage capabilities)
    • capability (storage capabilities)
    • system (computing systems)
    • open-source (open-source to commercial packages)
    • commercial (commercial packages)
    • packages (commercial packages)
    • store (store these data)
    • track (track these data)
    • assurance (quality assurance)
    • control (quality control)
    • assay (low-throughput assays)
    • cluster (genotype clusters)
    • sample (any samples that did not cluster well)
    • rest (with the rest of the data set)
    • nature (large-scale nature of high-throughput data)
    • feasible (examining data individually is not feasible)
    • summary statistics (rely on summary statistics)
    • overview (broad overview of the data)
    • pipeline (quality control pipelines)
    • electronic medical record
    • profile (methylation profiling)
    • specific (specific and critical quality control steps)
    • critical (critical steps)
    • integrity (sample integrity)
    • distributional (distributional evaluation)
    • respect (with respect to variables)
    • ensure (will ensure that ...)
    • rigorously (how rigorously to perform)
    • reduction (data reduction)
    • limit (limit the number of variables)
    • single (in a single data set)
    • initial (as an initial step)
    • consider (when considering data with a vast number of independent variables)
    • independent (independent variables)
    • cross (cross-validation)
    • validation (cross-validation)
    • permutation (permutation testing)
    • concern (address this concern)
    • filter (filtering strategy)
    • facilitate (facilitates data integration analyses)
    • refine (more refined subset)
    • subset (more refined subset)
    • efficient (efficient computation)
    • computation (efficient computation)
    • burden (multiple-hypothesis testing burden)
    • full (full dimensionality)
    • consideration (computational time, memory and sample size considerations)
    • exhaustive (in an exhaustive manner)
    • combinatorial (combinatorial increase in models)
    • respective (and their respective computation times)
    • possible (all possible pairwise models)
    • pairwise (all possible pairwise models)
    • choose (by choosing 2 of the 5 million variables)
    • GPU (GPU clusters)
    • considerably (considerably faster)
    • traditional (traditional computing processors)
    • practicality (reaching the limits of practicality)
    • mine (data mining)
    • extrinsic, intrinsic (either extrinsic ... or intrinsic)
    • external (using information external to the data set itself)
    • prior (prior knowledge)
    • domain (in the public domain)
    • system (immune system)
    • time (the knowledge of the field at the time)
    • feature (remove biologically important features)
    • threshold (on a chosen P value threshold)
    • relevant (biologically relevant variants)
    • annotation (based on a Biofilter annotation)
    • drive (will drive the hypothesis that can be tested)
    • dominant (dominant paradigm)
    • paradigm (dominant paradigm)
    • stratify (by stratifying the data by type)
    • alternative (Hypothesis B is the alternative possibility)
    • multiple (multiple levels of molecular variation)
    • contribute (contribute to disease risk)
    • interactive (in a nonlinear, interactive and complex way)
    • subsequently (and subsequently performing analyses would inhibit ...)
    • appropriate (would be more appropriate)
    • particular (association with a particular outcome)
    • spurious (spurious association)
    • finding (interpretations of findings)
    • demographic (genetic, environmental, demographic or other technical factors)
    • technical (genetic, environmental, demographic or other technical factors)
    • address (address population stratification)
    • surrogate (surrogate variable)
    • interest (other variables of interest)
    • issue (overcome the potential issues with heterogeneity)
    • heterogeneity (overcome the potential issues with heterogeneity)
    • comprehensive (comprehensive data integration analyses)
  • AN OVERVIEW OF DATA INTEGRATION
    • scale (using only two different scales at a time)
    • refer (we refer to the numerical and categorical features)
    • continuous (continuous values)
    • reflect (this approach reflects Hypothesis A)
    • fusion (fusion of scales)
    • simultaneously (are combined simultaneously)
  • DATA INTEGRATION: MULTI-STAGED ANALYSIS
    • suggest (as its name suggests)
    • signal (signals are enriched)
    • enrich (signals are enriched with each step of the analyses)
    • deem (SNPs deemed significant)
    • option (one option is to look for ...)
    • binary (on a continuous or a binary dependent variable)
    • respectively (linear or logistic regression (depending on a continuous or a binary dependent variable, respectively))
    • rationale (the rationale of this approach)
    • arbitrary (relatively arbitrary threshold)
    • combat (combat multiple testing problems)
    • functional (functional SNPs)
    • inference (causal inference)
    • key (key drivers)
    • driver (key drivers)
    • exploit ((something) that exploits the naturally occurring DNA variation)
    • natural (naturally occurring)
    • reactive (as an independent, causative or reactive function)
    • likelihood (maximum likelihood)
    • fairly (are fairly powerful)
    • specific (allele-specific expression)
    • organism (diploid organisms)
    • preferential (preferentially expressed)
    • modification (epigenetic modifications)
    • product (gene product)
    • extra (extra resources)
    • resource (extra resources used for experimentally tagging the two alleles)
    • tag (experimentally tagging the two alleles)
    • extend (other extended methods)
    • context (used in other contexts)
    • state (chromatin state)
    • domain (domain knowledge-guided approaches)
    • guide (domain knowledge-guided approaches)
    • consolidate (is consolidated by initiatives)
    • initiative (initiatives such as ENCODE)
    • input (the genomic regions of interest are inputs)
    • unit (functional units)
    • annotate (annotate them with domain knowledge from multiple public database resources)
    • current (biased by current knowledge)
    • perturbation (environmental perturbations)
    • applicable (a multi-staged analysis would be applicable)
  • DATA INTEGRATION: META-DIMENSIONAL ANALYSIS
    • concatenation (concatenation-based integration)
    • transformation (transformation-based integration)
    • joint (joint relationship)
    • recurrence (time to recurrence)
    • alteration (copy number alteration)
    • via (via LASSO)
    • meaningful (in a meaningful way)
    • corresponding (values corresponding to the copies of a specific allele per individual)
    • per (values corresponding to the copies of a specific allele per individual)
    • inflate (can inflate high-dimensionality)
    • intermediate (transforming each data type into an intermediate form)
    • symmetrical (symmetrical ... matrix)
    • positive (positive ... matrix)
    • semi (semi-definite)
    • definite (semi-definite)
    • represent (a matrix represents the relative positions)
    • position (the relative positions of all samples)
    • merge (multiple graphs or kernels can then be merged)
    • elaborate (before elaborating any models)
    • preserve (the advantage of preserving data-type-specific properties)
    • property (data-type-specific properties)
    • representation (transformed into an appropriate intermediate representation)
    • unifying (as long as the data contain a unifying feature, such as patient identifiers)
    • identifier (patient identifiers)
    • robust (robust to different data measurement scales)
    • supervised (semi-supervised)
    • learning (semi-supervised learning)
    • space (original feature space)
    • encompass (model-based integration encompasses methods)
    • training (training set)
    • final (a final model)
    • phase (during the training phase)
    • available (DNA sequence data may be available)
    • suite (a suite of analysis tools)
    • majority (majority voting)
    • vote (majority voting)
    • resistance (drug resistance)
    • mutants (HIV protease mutants)
    • complex (HIV protease-drug inhibitor complex)
    • recognition (protein fold recognition)
    • resulting (the resulting model)
    • weighted (in a weighted voting scheme)
    • scheme (in a weighted voting scheme)
    • probabilistic (construct probabilistic causal networks)
    • require (model-based integration requires a specific hypothesis)
    • resultant (resultant DNA sequence model)
    • incorporate (the only variables that are incorporated into the integrative analysis)
    • ensemble (ensemble-based approaches)
    • supervised (supervised learning)
    • label (with known labels (outcome or phenotype))
    • latent (latent variable)
    • exploratory (exploratory learning)
  • CAVEATS AND LIMITATIONS
    • caveat (caveats and limitations)
    • theoretical (theoretical distributions from which power calculations can be performed)
    • empirical (empirical power)
    • apply (these power estimates will apply only to the data set or simulation at hand)
    • at hand (these power estimates will apply only to the data set or simulation at hand)
    • universal (the universal power of the approach)
    • pitfall (potential pitfalls)
    • prohibitive (as the computation time can be prohibitive)
    • orthogonal (that extract orthogonal, or independent, relationships)
    • essential (which primary variables are essential)
    • gold standard (the 'gold standard' in human genetics is to look for replication of results)
    • replication (the 'gold standard' in human genetics is to look for replication of results)
    • stringent (more stringent protection)
    • protection (more stringent protection from type 1 errors)
    • underlie (underlying functional genomic units)
    • unit (underlying functional genomic units)
    • represent (represented by each variable)
    • external (external replication)
    • readily (independent data sets are not often readily available)
    • internal (internal replication)
    • extrinsic (extrinsic data)
    • corroborate (to estimate the strength of the available corroborating evidence supporting a given association)
    • validation (functional validation)
    • viable (viable alternative to replication)
    • bench (bench science)
    • literature (text mining to find literature that supports or refutes the original findings)
    • refute (text mining to find literature that supports or refutes the original findings)
    • in silico (in silico modelling)
    • series (a series of experiments)
    • kinetic (kinetic experiments)
    • differential (differential equations)
    • within, between (highly correlated variables both within and between data types)
    • sparse (sparse data matrices)
    • metric (two metrics of the models are compared)
    • fitness (fitness metric)
    • parsimony (parsimony metric)
  • FUTURE DIRECTIONS
    • crude (crude tissue extract)
    • promise (showing promise)
    • reductionist (reductionist paradigm)
    • prevalent (less prevalent)
    • affordable (readily available and affordable)
    • prevail (will prevail as the dominant type of study design)
    • isolation (the days of studying molecular data variability in isolation)
  • CONCLUSION
    • emergence (emergence of new statistical and computational techniques)
    • facilitate (the emergence ... will facilitate the search)
    • compensatory (compensatory mechanisms)
