The statistics of the dbSNP data are well explained on the dbSNP summary page.
http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi?view+summary=view+summary&build_id=142
When I looked up the statistics of the dbSNP database, build 142 was the newest.
However, I needed the number of RefSNPs not only for Homo sapiens (human) but also for other organisms.
I struggled to work out how to get the total number of RefSNPs across all organisms...
I found that previous build releases contain data for other organisms.
Build 138 has data for 131 organisms.
So I combined the builds, and when the same organism appears in more than one build, such as Homo sapiens, I kept only the most recent data.
... Then I combined the RefSNP counts of all the organisms.
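The merge step described above could be sketched roughly as follows. This is only an illustration, not the code I actually used: the BuildCount struct and the keepLatest and totalRefSnp functions are hypothetical names, and it assumes the per-build summary rows have already been collected into a list of (organism, build, RefSNP count) entries.

```cpp
#include <map>
#include <string>
#include <vector>

// One row of a hypothetical per-build table (assumed layout, not a dbSNP format).
struct BuildCount {
    std::string organism;  // organism name as listed in the summary
    int build;             // dbSNP build number the row came from
    long long refsnp;      // number of RefSNP clusters in that build
};

// For each organism, keep only the row that comes from the highest build number.
std::map<std::string, BuildCount> keepLatest(const std::vector<BuildCount>& rows) {
    std::map<std::string, BuildCount> latest;
    for (const BuildCount& row : rows) {
        auto it = latest.find(row.organism);
        if (it == latest.end() || row.build > it->second.build) {
            latest[row.organism] = row;
        }
    }
    return latest;
}

// Add up the RefSNP counts of the merged table.
long long totalRefSnp(const std::map<std::string, BuildCount>& latest) {
    long long total = 0;
    for (const auto& entry : latest) {
        total += entry.second.refsnp;
    }
    return total;
}
```

With rows for the same organism from builds 138 and 142, only the build 142 row would survive the merge, so its count is the one that enters the total.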
To strip the bracketed part, (~), from each count and add up the numbers, I wrote a small C++ parser.
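#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    FILE* pFile;
    char mystring[100];
    char* tokens;
    vector<long long> v;
    long long sum = 0;

    // Each line of count.txt is expected to start with a RefSNP count,
    // optionally followed by a bracketed value.
    pFile = fopen("count.txt", "r");
    if (pFile == NULL) {
        perror("count.txt");
        return 1;
    }
    while (fgets(mystring, 100, pFile) != NULL) {
        // Keep only the text before the first '(' and convert it to a number.
        tokens = strtok(mystring, "(");
        if (tokens != NULL)
            v.push_back(atoll(tokens));
    }
    fclose(pFile);

    // Print every count and accumulate the total.
    for (size_t i = 0; i < v.size(); i++) {
        sum += v[i];
        cout << v[i] << endl;
    }
    cout << endl;
    cout << "Sum of RefSNP without homo sapiens: " << sum << endl;
    sum += 112736879; // Homo sapiens
    cout << "Sum of RefSNP with homo sapiens: " << sum << endl;
    return 0;
}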
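One caveat with the parser above: atoi stops at the first character that is not a digit, so a count copied with thousands separators, such as 112,736,879, would be read as 112. A minimal sketch of a more tolerant helper is shown below. The name parseLeadingCount is hypothetical, and it assumes each line has the form "count(bracketed value)", where the count may contain commas or spaces.

```cpp
#include <cctype>
#include <string>

// Read the count that precedes the first '(' on a line, skipping commas
// and whitespace. Returns -1 if no digit appears before the bracket.
long long parseLeadingCount(const std::string& line) {
    long long value = 0;
    bool sawDigit = false;
    for (char ch : line) {
        if (ch == '(') {
            break;  // everything from the bracket onwards is ignored
        }
        if (std::isdigit(static_cast<unsigned char>(ch))) {
            value = value * 10 + (ch - '0');
            sawDigit = true;
        } else if (ch != ',' && !std::isspace(static_cast<unsigned char>(ch))) {
            break;  // unexpected character: stop reading the count
        }
    }
    return sawDigit ? value : -1;
}
```

Calling this on each line read from the file, instead of strtok followed by atoi, would give the same result for plain digits and would also handle counts pasted with separators.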
Number of RefSNP Clusters
Homo sapiens: 112,736,879
Sum of 151 organisms including Homo sapiens: 466,071,153