case-control match

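Word in our group is that there is a MATLAB script that can do case-control matching, but three years on I have yet to see any trace of it. Better to rely on myself than on others, so I googled it.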

Principles of matching

Confounding implies that the confounding factor (which is one of the exposures) is not evenly distributed between cases and controls (or between exposed and unexposed). Therefore, to prevent confounding, the simplest solution is to design a study in which cases and controls (or exposed and unexposed) have an equal distribution of the confounding factor. This process is called matching.

Matching is most often applied in case-control studies; however, it may also be performed in cohort studies[1].

We usually identify two types of matching process, individual matching and frequency matching. Both individual and frequency matching have the same consequence: matching will have to be taken into account during the analysis.

Individual matching

In this first method, matching is performed subject by subject. This is called individual matching. For example, if age is a confounding factor, then for each case aged 30 years, one control of the same age is selected, and so on for all cases included in the study. The results are pairs of individuals belonging to the same study population and sharing one common characteristic (in this example, a specific age).

In individual matching, we may also select more than one control per case. Two or more controls then share the matching characteristic of the case. This gives triplets (one case and 2 controls), quadruplets (one case and 3 controls), etc.
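The individual matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: subjects are plain dicts, the matching variable must agree exactly, controls are drawn without replacement, and the function name `match_individually` and the toy data are hypothetical, not taken from any library mentioned in these notes.

```python
import random
from collections import defaultdict

def match_individually(cases, controls, key, k=1, seed=0):
    """Match each case to k controls with the same value of `key`.

    Controls are used at most once. Returns (matched, unmatched), where
    matched is a list of (case, [k controls]) and unmatched lists the
    cases for which fewer than k eligible controls were left.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for control in controls:
        pool[control[key]].append(control)
    for group in pool.values():
        rng.shuffle(group)  # random pick among equally eligible controls

    matched, unmatched = [], []
    for case in cases:
        candidates = pool[case[key]]
        if len(candidates) >= k:
            chosen = [candidates.pop() for _ in range(k)]
            matched.append((case, chosen))
        else:
            unmatched.append(case)
    return matched, unmatched

# Toy data (made up): k=2 builds triplets of one case and two controls.
cases = [{"id": "c1", "age": 30}, {"id": "c2", "age": 45}, {"id": "c3", "age": 62}]
controls = [
    {"id": "k1", "age": 30}, {"id": "k2", "age": 30}, {"id": "k3", "age": 45},
    {"id": "k4", "age": 45}, {"id": "k5", "age": 51},
]
matched, unmatched = match_individually(cases, controls, key="age", k=2)
for case, chosen in matched:
    print(case["id"], "->", sorted(c["id"] for c in chosen))
print("unmatched:", [c["id"] for c in unmatched])
```

With `k=1` the same function produces pairs. In practice a continuous variable such as age is often matched within a tolerance (for example ±2 years) rather than exactly; that variation is not shown here.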

Frequency matching

In the second type of matching process, matching is no longer done individually but for groups of subjects. A group of controls is matched to a group of cases with respect to a particular characteristic (the confounding factor). For example, if a case-control study with 50 cases includes 20 men and 30 women, we would select a control group with the same gender distribution: first 20 men from the male study population, then 30 women from the female study population.
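The 20 men / 30 women example can be written as a small sampling routine. This is a minimal sketch, assuming subjects are dicts and that controls are sampled at random within each level of the matching variable; `frequency_match` and the synthetic control pool are hypothetical names and data made up for illustration.

```python
import random
from collections import Counter

def frequency_match(cases, control_pool, key, ratio=1, seed=0):
    """Sample controls so that their distribution of `key` mirrors the cases.

    For every level of `key`, draws ratio * (number of cases at that level)
    controls at random, without replacement.
    """
    rng = random.Random(seed)
    wanted = Counter(case[key] for case in cases)
    selected = []
    for level, n_cases in wanted.items():
        eligible = [c for c in control_pool if c[key] == level]
        n_needed = n_cases * ratio
        if len(eligible) < n_needed:
            raise ValueError(f"not enough controls with {key}={level!r}")
        selected.extend(rng.sample(eligible, n_needed))
    return selected

# 50 cases: 20 men and 30 women, as in the example above.
cases = [{"sex": "M"}] * 20 + [{"sex": "F"}] * 30
# Synthetic pool of 200 potential controls (100 men, 100 women).
control_pool = [{"id": i, "sex": "M" if i % 2 else "F"} for i in range(200)]

controls = frequency_match(cases, control_pool, key="sex")
print(Counter(c["sex"] for c in controls))
```

Unlike individual matching, no control is tied to a particular case here; only the group-level distribution is forced to agree.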

Why matching?

Matching controls to cases is nothing more than stratifying in advance of the analysis. Instead of constituting strata at the time of the analysis, we prepare them before the study is done, at the time of control selection. When we select one control per case, each stratum includes one case and one control, so there are as many strata as pairs in the study. The objective of matching is to prepare the analysis. Matching optimizes the number of cases and controls per stratum: it avoids having no case or no control in a stratum, which can happen in a stratified analysis done afterwards and is the biggest source of inefficiency in such an analysis. This is why matching is frequently described as a way to improve the efficiency of an analysis by better distributing cases and controls between strata.
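To make "one stratum per pair" concrete, the sketch below computes a stratified odds ratio in which every matched pair is its own stratum. The Mantel-Haenszel estimator used here is one standard choice for stratified data, but it is an assumption of this example, since the text above does not name an estimator; the pair counts are made up.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio over 2x2 strata.

    Each stratum is (a, b, c, d):
      a = exposed cases,    b = unexposed cases,
      c = exposed controls, d = unexposed controls.
    Raises ZeroDivisionError if no stratum contributes to the denominator.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Matched pairs as (case_exposed, control_exposed); counts are illustrative.
pairs = [(1, 0)] * 12 + [(0, 1)] * 4 + [(1, 1)] * 6 + [(0, 0)] * 8

# One stratum per pair: exactly one case and one control in each.
strata = [(ce, 1 - ce, ke, 1 - ke) for ce, ke in pairs]
or_mh = mantel_haenszel_or(strata)

# Only discordant pairs contribute, so the estimate reduces to their ratio.
case_only = sum(1 for ce, ke in pairs if ce == 1 and ke == 0)
control_only = sum(1 for ce, ke in pairs if ce == 0 and ke == 1)
print(or_mh, case_only / control_only)  # 3.0 3.0
```

Pairs in which case and control have the same exposure add nothing to either sum, which is one way to see why a matched design has to be analysed as matched.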

Propensity score matching (PSM)

Wikipedia (Chinese)

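Propensity score matching is suited to two kinds of situations. First, in an observational study, the control group and the treatment group may contain very few directly comparable individuals. The overlap between the two groups is then small: for example, the healthiest 10% of the treated group may resemble the least healthy 10% of the untreated group, and comparing only these two overlapping subsets would give a heavily biased conclusion. Second, because individuals are described by many parameters, it becomes very hard to select from the control group a subset that is identical or close to the treatment group on every one of them. In ordinary matching we control only one or two variables (such as age and sex), so it is easy to pick a control subset with the same characteristics to compare against the treatment group. When the variables describing an individual are very numerous, however, finding an ideal subset becomes very difficult. What often happens is that some variables are controlled while the groups still differ widely on others, so that the treatment and control groups cannot be compared.

For example, suppose we are testing the effect of drug A on disease B. We run an experiment on 50 patients and give them the drug. We then need to find, in the general population, patients whose situation is similar to that of these 50 but who did not take the drug. To study the effectiveness of drug A, we need some way to find a control for each patient. This way of matching is called PSM.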

Propensity score matching uses a logistic regression model to determine the score.
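As a rough illustration of scoring with logistic regression, the sketch below fits the model by plain gradient descent using only the standard library. Everything specific in it is an assumption made for the example: the two covariates (a standardized age and a 0/1 sex indicator), the toy data, the learning rate, and the function names. A real analysis would normally use a statistics package instead.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(w, x):
    """Intercept w[0] plus the dot product of w[1:] with x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit logistic regression by batch gradient descent on the log-loss.

    Returns the weight vector [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity_scores(X, w):
    """Predicted probability of treatment for every row of X."""
    return [sigmoid(linear(w, xi)) for xi in X]

# Toy data (made up): columns are standardized age and sex (0/1).
X = [[-1.2, 0], [-0.8, 1], [-0.3, 0], [0.2, 0], [0.1, 1], [0.4, 0],
     [0.9, 1], [1.3, 0], [-0.5, 1], [0.7, 0], [1.1, 1]]
treated = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1]

w = fit_logistic(X, treated)
scores = propensity_scores(X, w)
for xi, t, s in zip(X, treated, scores):
    print(xi, t, round(s, 3))
```

The resulting score collapses all covariates into a single number per subject, which is what makes matching feasible when there are too many variables to match on directly.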

Wikipedia (English)

General procedure

1. Run logistic regression:
   - Dependent variable: Y = 1, if participate; Y = 0, otherwise.
   - Choose appropriate confounders (variables hypothesized to be associated with both treatment and outcome).
   - Obtain propensity score: predicted probability (p) or log[p/(1 − p)].
2. Check that propensity score is balanced across treatment and comparison groups, and check that covariates are balanced across treatment and comparison groups within strata of the propensity score.
   - Use standardized differences or graphs to examine distributions.
3. Match each participant to one or more nonparticipants on propensity score:
   - Nearest neighbor matching
   - Caliper matching
   - Mahalanobis metric matching in conjunction with PSM
   - Stratification matching
   - Difference-in-differences matching (kernel and local linear weights)
   - Exact matching
4. Verify that covariates are balanced across treatment and comparison groups in the matched or weighted sample.
5. Multivariate analysis based on new sample:
   - Use analyses appropriate for non-independent matched samples if more than one nonparticipant is matched to each participant.
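The matching and balance-checking steps of the procedure above can be sketched as follows. This is an illustration only: it assumes the propensity scores are already available, uses greedy 1:1 nearest-neighbor matching without replacement inside a caliper of 0.05 on the probability scale, and measures balance with the standardized mean difference (difference in means divided by the pooled standard deviation). The function names, the caliper value, and the scores and ages are all made up for the example.

```python
import statistics

def caliper_match(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Controls are used at most once. Returns {treated_index: control_index};
    treated units with no unused control within `caliper` stay unmatched.
    """
    available = set(range(len(control_scores)))
    matches = {}
    for t, ps_t in enumerate(treated_scores):
        if not available:
            break
        best = min(available, key=lambda c: abs(control_scores[c] - ps_t))
        if abs(control_scores[best] - ps_t) <= caliper:
            matches[t] = best
            available.remove(best)
    return matches

def standardized_difference(x_treated, x_control):
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2), with sample variances."""
    pooled_sd = ((statistics.variance(x_treated)
                  + statistics.variance(x_control)) / 2) ** 0.5
    return (statistics.mean(x_treated) - statistics.mean(x_control)) / pooled_sd

# Illustrative scores and one covariate (age) per subject.
treated_scores = [0.62, 0.35, 0.80, 0.51]
treated_age = [58, 41, 70, 50]
control_scores = [0.30, 0.64, 0.49, 0.10, 0.55, 0.33]
control_age = [38, 60, 49, 25, 53, 40]

matches = caliper_match(treated_scores, control_scores, caliper=0.05)
print(matches)  # {0: 1, 1: 5, 3: 2}

before = standardized_difference(treated_age, control_age)
after = standardized_difference(
    [treated_age[t] for t in matches],
    [control_age[c] for c in matches.values()],
)
print(round(before, 2), round(after, 2))
```

The treated subject with score 0.80 has no control within the caliper and is dropped, which is the usual price of caliper matching: better balance in exchange for a smaller matched sample. Greedy matching also depends on the order in which treated units are processed.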

Note: When you have multiple matches for a single treated observation, it is essential to use Weighted Least Squares rather than OLS.
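The role of the weights in the note above can be shown with a small numeric sketch. It assumes one common convention, which the note itself does not spell out: every treated unit gets weight 1 and each of its k matched controls gets weight 1/k. With only a treatment indicator in the model, weighted least squares reduces to a difference of weighted means, which is what the code computes. The function names and outcome values are made up for illustration.

```python
def weighted_mean(values, weights):
    """Weighted average of `values`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def matched_effect(matched_sets):
    """Weighted difference in mean outcome, treated minus matched controls.

    matched_sets is a list of (treated_outcome, [control_outcomes]).
    Assumed weights: 1 per treated unit, 1/k per control in a set of k.
    """
    y_t, w_t, y_c, w_c = [], [], [], []
    for treated_y, control_ys in matched_sets:
        y_t.append(treated_y)
        w_t.append(1.0)
        for y in control_ys:
            y_c.append(y)
            w_c.append(1.0 / len(control_ys))
    return weighted_mean(y_t, w_t) - weighted_mean(y_c, w_c)

# Three treated units matched to 1, 2 and 3 controls respectively.
matched_sets = [(10.0, [8.0]), (12.0, [9.0, 11.0]), (7.0, [6.0, 5.0, 7.0])]

effect = matched_effect(matched_sets)

# Unweighted comparison, for contrast: every control counts equally.
all_controls = [y for _, ys in matched_sets for y in ys]
naive = (sum(t for t, _ in matched_sets) / len(matched_sets)
         - sum(all_controls) / len(all_controls))
print(round(effect, 3), round(naive, 3))
```

Without weights, treated units that happen to have more matches pull the control mean toward their own controls, so the two numbers differ.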

SPSS implementation

Fuzzy matching

http://www.sific.com.cn/InsidePage/1000/80/2121.html

Python implementation

ctmatching

epydemiology

NearestNeighbors

R implementation

Description

Finds controls matching the cases as good as possible.

Usage

matchControls(formula, data = list(), subset, contlabel = "con",
              caselabel = NULL, dogrep = TRUE, replace = FALSE)

Things I stumbled upon

Using Fuzzywuzzy to find the closest-matching strings

https://dirkmjk.nl/en/2017/10/how-do-fuzzy-matching-python
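For a quick feel of what closest-string matching does, here is a sketch that deliberately uses Python's standard-library `difflib` instead of Fuzzywuzzy, so that it runs without installing anything. It is a stand-in, not the approach from the linked article; the helper name `closest_string` and the word list are made up.

```python
import difflib

def closest_string(query, choices, cutoff=0.6):
    """Return the choice most similar to `query`, or None if none reaches `cutoff`.

    Similarity is difflib's SequenceMatcher ratio, between 0 and 1.
    """
    hits = difflib.get_close_matches(query, choices, n=1, cutoff=cutoff)
    return hits[0] if hits else None

names = ["apple", "ape", "peach", "puppy"]
print(closest_string("appel", names))  # apple
print(closest_string("zzzz", names))   # None
```

Note that this is string matching for cleaning up labels or names, which is a different task from matching cases to controls.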


