GSoC 2021 project idea 11.1: Measure Quality of CerebUnit Validation tests

Harsh · March 21, 2021, 8:39am

@lungsi As I am currently focused on calculating power, as you suggested, I have some ideas. Please correct me if I am wrong, or let me know whether this is what relates to power.

One of the ways is

A Type 1 error is rejecting the null hypothesis even when it is true. Can we consider this a False Positive?
Similarly, can we consider a Type 2 error a False Negative?
Now, the higher the power, the lower the chance of making a Type 2 error, since β = 1 - power.
So P(True Positive) = 1 - P(False Negative), where β = P(False Negative).
So can we find P(True Positive) or P(False Negative)?
Does the Positive Predictive Value closely relate to P(True Positive)?

Another way (inspired by online resources):

Power = P(Reject H0) when H0 is False
= P(Z >=(x̅ - μ)/ (σ/root(n) ) ) when μ is (value when H0 is false)

Harsh · March 21, 2021, 9:00am

I have sent an email with a link to my project proposal draft. I have completed the first section of the proposal (About me). Please let me know if you want me to add anything.

lungsi · March 22, 2021, 11:40am

Yes. (See related comments at the bottom)

Yes, but one needs to be careful with this interpretation. Remember that α (the symbol for the probability of making a Type 1 error) and β (the symbol for the probability of making a Type 2 error) are conditional probabilities: α is the probability of wrongly rejecting the null hypothesis given that it is true, and β is the probability of wrongly rejecting the alternate hypothesis (that is, failing to reject the null hypothesis) given that the alternate hypothesis is true.

Also, α and β are the probabilities that a test makes the respective errors. That is, if the test makes an error, that error is a single false positive or false negative. Therefore, when you interpret α and β as P(false positive) and P(false negative) respectively, note that these are the probabilities that the single test outcome is a false positive or a false negative.

This is why the answer to your question below is no.

Furthermore, the Positive Predictive Value is mathematically defined as \frac{n_{TP}}{n_{TP}+n_{FP}}, where n_{TP} is the number of true positives, n_{FP} is the number of false positives, and n_{TP}+n_{FP} is the total number of positives. Unlike “numbers of things”, probability values range from 0 to 1. The number of true positives will depend on the prevalence, say of a disease.
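To make the dependence on prevalence concrete, here is a minimal sketch. It assumes we treat the test's sensitivity as its power (1 - β) and 1 - specificity as α; the function name and the numbers are illustrative only and are not part of CerebUnit.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(condition present | test positive), computed with Bayes' rule.

    Assumption for this sketch: sensitivity plays the role of power (1 - beta)
    and (1 - specificity) plays the role of alpha.
    """
    true_pos = sensitivity * prevalence                    # P(test+ and condition)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(test+ and no condition)
    return true_pos / (true_pos + false_pos)


# Same test (power = 0.80, alpha = 0.05) applied at different prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(0.80, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

The test's α and β are fixed throughout, yet the PPV changes with prevalence, which is one way to see that PPV and P(true positive) = 1 - β are different quantities.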

It is true that power viewed in terms of probability is P(rightly reject H_0 | H_0 is false) because

β = P(wrongly reject H_a | H_a is true) = 1 - P(rightly reject H_0 | H_0 is false)

However, the formula above is wrong. First of all, even if the expression inside the parentheses were correct, it would be the formula for β (which is what we ultimately want). Secondly, the correct expression is

\beta \approx P\left[ z <z_{\alpha*} - \frac{|\mu_0 - \mu_a|}{\sigma_{\bar y}}\right]

where \mu_0 denotes the hypothesized mean under H_0, \mu_a is the mean under the research hypothesis H_a, \sigma_{\bar y} is the standard deviation of the sample mean \bar y, and, for some specified α, the z-value is z_{\alpha*} = z_{\alpha} for a one-tailed test and z_{\alpha*} = z_{\alpha / 2} for a two-tailed test.
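As a rough sketch, the expression for β can be evaluated with Python's standard library. I am assuming here that z_{\alpha} denotes the upper-tail critical value (so that P(Z > z_{\alpha}) = α) and that \sigma_{\bar y} = \sigma / \sqrt{n}; the function name and the example numbers are hypothetical.

```python
from statistics import NormalDist


def beta_z_test(mu_0, mu_a, sigma, n, alpha=0.05, two_tailed=False):
    """Approximate Type 2 error probability for a z-test on a mean."""
    std_normal = NormalDist()                    # standard normal, mean 0, sd 1
    tail = alpha / 2 if two_tailed else alpha    # z_{alpha/2} or z_{alpha}
    z_crit = std_normal.inv_cdf(1 - tail)        # upper-tail critical value
    sigma_ybar = sigma / n ** 0.5                # sd of the sample mean
    return std_normal.cdf(z_crit - abs(mu_0 - mu_a) / sigma_ybar)


# Illustrative numbers only: H_0 mean 100, H_a mean 105, sigma 15, n = 36.
beta = beta_z_test(mu_0=100, mu_a=105, sigma=15, n=36)
print(f"beta  = {beta:.3f}")
print(f"power = {1 - beta:.3f}")
```

With these made-up numbers β comes out at roughly 0.36, so the power is roughly 0.64.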

Notice that the above formula for β is based on the z-score. There are other formulas too. Here’s one for a 1-sample t-test and another for a 2-sample test.

The point is that there is no single exact formula for computing power. However, from the above expression one can determine the factors affecting β:
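When a closed-form expression is inconvenient, power can also be estimated by simulation. The sketch below is my own illustration, not an existing CerebUnit routine: it repeatedly draws samples, applies a one-tailed z-test with known σ, and counts how often H_0 is rejected. The function name, seed, and numbers are assumptions.

```python
import random
from statistics import NormalDist, fmean


def simulated_rejection_rate(mu_0, mu_true, sigma, n,
                             alpha=0.05, trials=5000, seed=0):
    """Fraction of simulated samples in which a one-tailed z-test rejects H_0.

    With mu_true == mu_0 this estimates alpha; otherwise it estimates power.
    """
    rng = random.Random(seed)                    # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha)     # upper-tail critical value
    sigma_ybar = sigma / n ** 0.5                # sd of the sample mean
    rejections = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(mu_true, sigma) for _ in range(n))
        if (sample_mean - mu_0) / sigma_ybar > z_crit:
            rejections += 1
    return rejections / trials


print("estimated power:", simulated_rejection_rate(100, 105, 15, 36))
print("estimated alpha:", simulated_rejection_rate(100, 100, 15, 36))
```

The same approach could in principle be adapted to other tests by swapping the test statistic and the critical value.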

β decreases as the sample size, n, increases
β decreases as the level of significance, α, increases
β decreases as | null value - true value | increases

Consequently,

power increases with the sample size, n
power increases with the level of significance, α
power increases with | null value - true value |
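The three dependencies listed above can be checked numerically. This is a minimal sketch using the one-tailed z-test approximation, with power = 1 - β; the function name and the baseline values are made up for illustration.

```python
from statistics import NormalDist


def power_z_test(delta, sigma, n, alpha):
    """One-tailed z-test power, with delta = |null value - true value|."""
    z = NormalDist()
    beta = z.cdf(z.inv_cdf(1 - alpha) - delta / (sigma / n ** 0.5))
    return 1 - beta


# Hypothetical baseline; each line below changes exactly one factor.
baseline = dict(delta=5, sigma=15, n=36, alpha=0.05)
print("baseline      ", round(power_z_test(**baseline), 3))
print("larger n      ", round(power_z_test(**{**baseline, "n": 100}), 3))
print("larger alpha  ", round(power_z_test(**{**baseline, "alpha": 0.10}), 3))
print("larger delta  ", round(power_z_test(**{**baseline, "delta": 8}), 3))
```

Each of the three modified settings should report a higher power than the baseline.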

Therefore, a well-designed experiment will consider the question “Do we have sufficient power for a worthwhile hypothesis test?” Put another way, do we have enough data, i.e., a large enough sample size, for sufficient power?
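For the one-tailed z-test approximation, the “enough data” question can be answered by solving the expression for β for n, which gives n ≥ ((z_{\alpha} + z_{\beta}) σ / |μ_0 - μ_a|)^2. The sketch below assumes that setting; the function name, the default target power of 0.80, and the numbers are illustrative choices of mine.

```python
import math
from statistics import NormalDist


def required_sample_size(delta, sigma, alpha=0.05, target_power=0.80):
    """Smallest n reaching target_power for a one-tailed z-test.

    delta is |null value - true value|; sigma is the population sd.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)       # critical value for the chosen alpha
    z_beta = z.inv_cdf(target_power)     # quantile matching beta = 1 - power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Illustrative numbers only: detect a shift of 5 when sigma = 15.
print("required n:", required_sample_size(delta=5, sigma=15))
```

A smaller difference between the null and true values, or a stricter α, pushes the required sample size up.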

I will end it here to say that it is up to the user to decide which error (Type-1 or Type-2) is their main concern.

Regarding the proposal draft, it’s good that you have begun working on it. But at the moment I don’t have much to comment on, since you have yet to write the sections where you discuss the specifics of your project and your plan of action.

Harsh · March 28, 2021, 11:48am

@lungsi Can we have a short call?

lungsi · April 1, 2021, 7:41pm

Sure. How do you want to do it? Zoom?

Harsh · April 3, 2021, 12:57pm

@lungsi Via Google Meet. Tomorrow at 5:00 PM IST. Is that ok?

lungsi · April 3, 2021, 5:50pm

How are you doing? Hope you are feeling better. The deadline for submission is April 13 18:00 UTC, which is about 10 days from now, so don’t worry too much about the proposal. Prioritize taking care of your health for now.

5pm (your time) is fine with me. But, because of Easter weekend (holiday) the earliest I will be available for a chat is Tuesday April 6.

Let me know if April 6, 5pm (your date and time) works for you. Also, don’t forget to send me the link for Google Meet.

Davidyzy · April 6, 2021, 6:58am

Hi! This is Zhanyuan Ye (David). I am a senior student with a double major in mathematics and statistics and a minor in computer science. I am proficient in Python and statistics. I am very interested in this project and in discovering more about biology through coding and statistical methods. May I ask if there is still a seat available for this project?
Thank you.
David

lungsi · April 6, 2021, 10:21am

@Davidyzy Sure! There is no limit to the number of project proposals that can be submitted. So feel free to submit your proposal before the deadline April 13 18:00 UTC. And if you have questions or want feedback before you submit your proposal don’t hesitate to ask.

malin · April 6, 2021, 11:25am

Actually, in this year’s round students are limited to submitting three proposals (and that is probably good, for each one is a lot of work) @Davidyzy

lungsi · April 6, 2021, 11:28am

@malin

Thanks for the clarification. @Davidyzy This means each student may submit up to 3 proposals to the program. It does not mean that a project is limited to 3 proposals from different students.

Can I submit more than one proposal? GSoC FAQ
Yes, each student may submit up to three proposals. However, only one per student may be accepted. No more than one proposal per student will be accepted, no matter how many proposals you submit.

Yes

Davidyzy · April 6, 2021, 11:33am

What about the space in this specific project? For example, would you only accept a certain number of students into this project?

lungsi · April 6, 2021, 12:10pm

@malin may be the right person to answer regarding the final number of students (accepted) per project. But I imagine it would be about 1-3 students.

Google’s “maximum number of students” for our INCF organization in recent years has been up to 20ish slots. Dividing this by the number of teams in INCF gives the rough number of students per team (not per project); a team may have more than one project idea.

@Davidyzy As of now, as long as you have not already submitted proposals to three project ideas, you are free to submit a proposal for this project.

malin · April 6, 2021, 1:39pm

Hi, and yes, 1-3 depending on the nature of the project, that is, whether it can be split into independent parts that several students can work on separately.

lungsi · April 8, 2021, 8:54am

To all prospective proposal writers: there are just a few more days until the deadline. Although I mentioned it at the top of this thread, my interactions with those who contacted me indicate that most have not “studied” Writing a proposal | Google Summer of Code Guides.

I strongly recommend that you study it, not just read it. Pay particular attention to Elements of a Quality Proposal (making sure you attempt to check each box/point in your proposal) and Submit a Draft Proposal early.

Good Luck!

Harsh · April 8, 2021, 9:23am

@malin @lungsi I am trying to submit the draft proposal. What is the INCF proposal tag for this project?

The following dropdown options are available:

tvb_project
genn_project
brian_project
imagej_project
devoworm_project
osb_project
dipy_project
bids_project
brainbox_project
other_project
N/A

Which one should I choose?

lungsi · April 8, 2021, 9:41am

@malin Since this project idea is not under any of the listed INCF GSoC Working Groups, I think the proposal tag ought to be “other_project” or “N/A”. Am I right?

Harsh · April 8, 2021, 9:46am

OK, I have selected “other_project”. As mentioned on the form when submitting the draft proposal, it can be changed later.

Harsh · April 8, 2021, 9:50am

I have successfully shared my draft proposal on the GSoC portal with the tag “other_project” and the title “Measure Quality of CerebUnit Validation tests”.

Harsh · April 9, 2021, 4:29am

@lungsi Would you please review my Draft Proposal? Do let me know if you have any suggestions to make it better.