Input: A set of people with/without a disease (e.g., cancer) Measure a large set of genetic markers for each person (e.g., measurement of DNA at various.

Slides:



Advertisements
Similar presentations
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Advertisements

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
Feature: Identity Management - Login © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Feature: Reprint Outstanding Transactions Report © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product.
Feature: Purchase Requisitions - Requester © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
MIX 09 4/15/ :14 PM © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Feature: Payroll and HR Enhancements © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or.

Co- location Mass Market Managed Hosting ISV Hosting.
Windows 7 Training Microsoft Confidential. Windows ® 7 Compatibility Version Checking.
Multitenant Model Request/Response General Model.
Feature: Purchase Order Prepayments II © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are.
Announcing Demo Announcing.
Feature: OLE Notes Migration Utility
Feature: Web Client Keyboard Shortcuts © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are.
Feature: SmartList Usability Enhancements © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
Session 1.
Built by Developers for Developers…. © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
 Rico Mariani Architect Microsoft Corporation.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
Feature: Assign an Item to Multiple Sites © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
WinHEC /22/2017 © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
Feature: Print Remaining Documents © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or.
Connect with life Connect with life
NEXT: Overview – Sharing skills & code.
FonePlus Hugh Teegan Architect Mobile Devices Microsoft Corporation.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
Feature: Document Attachment –Replace OLE Notes © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product.
Feature: Suggested Item Enhancements – Sales Script and Additional Information © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows.
Feature: Customer Combiner and Modifier © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are.
Feature: Employee Self Service Timecard Entry © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
announcing Dev Manager Do I understand what we’ve built? Developer Can I bet on using this shared component? Testers What’s changed since I last.
Ian Ellison-Taylor General Manager Microsoft Corporation PC27.
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or.
demo Instance AInstance B Read “7” Write “8”

customer.
demo © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
demo Demo.
Feature: Void Historical/Open Transaction Updates © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product.
demo QueryForeign KeyInstance /sm:body()/x:Order/x:Delivery/y:TrackingId1Z
Feature: Suggested Item Enhancements – Analysis and Assignment © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and.
projekt202 © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are.
The CLR CoreCLRCoreCLR © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product.
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks.
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or.
Sr. Dir. – Systems Architecture Inlet Technologies.

IoCompleteRequest (Irp);... p = NULL; …f(p);
Ctrl-K, X Ctrl-K, S
MIX 09 4/17/2018 4:41 PM © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Возможности Excel 2010, о которых следует знать
Title of Presentation 11/22/2018 3:34 PM
Title of Presentation 12/2/2018 3:48 PM
8/04/2019 9:13 PM © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
4/27/17, Bell #8 What amount of net pay has been earned this period?
Виктор Хаджийски Катедра “Металургия на желязото и металолеене”
PENSACOLA ENERGY WORK PLAN OCTOBER 10, 2016
Title of Presentation 5/12/ :53 PM
Шитманов Дархан Қаражанұлы Тарих пәнінің
Title of Presentation 5/24/2019 1:26 PM
5/24/2019 6:44 PM 1/8/18 Bell #10 In a world governed by the gods, is there any room for human will? Do human choices make a difference? EXPLAIN © 2007.
日本初公開!? Vista の新機能を実演 とっちゃん わんくま同盟 7/23/2019 9:09 AM
Title of Presentation 7/24/2019 8:53 PM
Presentation transcript:

Input: A set of people with/without a disease (e.g., cancer) Measure a large set of genetic markers for each person (e.g., measurement of DNA at various points) -personalized medicine -new drug targets -screening & preventative measures -genetic counseling -disease mechanism understanding + Desired output: A list of genetic markers causing the disease

Hidden structure in the data leads to: 1. Loss of power to detect signal of interest 2. Spurious hits (i.e., false positives) due to unaccounted confounding signal + SPURIOUS HITS! e.g., UNKNOWN GENETIC RELATEDNESS

BUT…IF subjects: Are closely/distantly related to each othe. Comprise different ethnicities Have samples that contain batch effects (processed slightly differently, and not at random) etc. (unknown confounders we don’t yet know about) THEN… Spurious correlations induced giving spurious hits True signal swamped out, reducing power to detect true associations Fundamental assumption in most statistical tests is that the subjects are sampled independently from the same distribution SPURIOUS HITS!

Also, the larger the study (# people), the worse the problem, since the power to detect ‘spurious’ signal increases But large studies are needed to detect markers with weak effect (Balding, Nat Rev Genet. 2006) Suppose the set of cases has a different proportion of ethnicity X from control Then genetic markers that differ between X and other ethnicities in the study, Y, will appear artificially to be associated with disease Furthermore, these (often numerous and strong) spurious associations can swamp out the true signal of interest A A A A A A T A A A A A T T T TT T T T T T T T T A A A A A A A A A A A A A A A

When testing thousands of genetic markers for association with a disease, we expect very few of them to truly be associated with disease Key insight: the resulting distribution of test statistics should be close to a uniform p-value distribution deviation from null distribution due to unmodeled structure ~7500 SNPs, ~1000 people, contains multiple ethnicities and families)

When testing thousands of genetic markers for association with a disease, we expect very few of them to truly be associated with disease Key insight: the resulting distribution of test statistics should be close to a uniform p-value distribution ~7500 SNPs, ~1000 people, contains multiple ethnicities and families)

Use the large scale of the data set itself to infer hidden population structure i.e., Use the genetic markers themselves, in aggregate, to see how ‘similar’ every two people are, and incorporate this into the analysis Best current approaches are: 1. Principle Component Analysis –based 2. Linear Mixed Models genetic ‘similarity’ matrix

Regress target phenotype on each genetic marker e.g., regress blood pressure level on SNP (and do for each SNP) Evaluate SNP for association by comparing this model to one that ignores the SNP (e.g. use LRT statistical test) blood pressure level SNP learned regression weight (importance of SNP to blood pressure) gaussian noise

genetic similarity between every two people Find major ‘axes of variation’ of the high dimensional space (# markers) Project each person’s markers into the low dimensional space captured by the top few axes Add projections as covariates in a standard regression analysis that looks for associations between marker and phenotype projection in low-dim space learned regression weight SNP 1 SNP 2 SNP M  Works well to capture broad structure  Sensitive to outliers (bad!)  Cannot capture fine-grained structure (bad!)  Fast computations (good!)

Do *not* reduce space to a set of directions Use it in its entirety! Use similarity as a (Bayesian) prior over hidden regression coefficients that are integrated out within a standard regression analysis genetic similarity between every two people  Captures multiple levels of similarity: broad and fine (good!)  Not sensitive to outliers (good!)  Computationally expensive (bad!)

Mixed model works better than PCA approach here ~7500 SNPs, 1000 people, variety of ethnicities + people that are related

Learning similarity matrixes from the data and showing them to be better than prior known structure usually used (e.g. pedigree) Combining heterogeneous sources of ‘similarity’ to gain power and reduce spurious association Using approximation tricks to make the models as fast as Principle Components Analysis approaches

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.