Thursday, January 8, 2015

dbSNP statistics: Number of RefSNP Clusters

I'm writing this post on 2015-01-08.

The dbSNP statistics are well explained on the dbSNP summary page.

http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi?view+summary=view+summary&build_id=142

When I looked up the statistics of the dbSNP database, build 142 was the newest release.

However, I needed the number of RefSNPs not only for Homo sapiens (human) but also for other organisms.

I struggled with how to get the total number of RefSNPs across all organisms...

I found that previous build releases contain data for other organisms.

Build 138 has data for 131 organisms.

So I combined the builds, and when the same organism appears in more than one build, such as Homo sapiens, I kept only the most recent data.
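
The "keep only the most recent build per organism" rule can be sketched in C++ roughly as follows. This is only an illustration, not the procedure actually used for this post (the combining was done by hand from the summary pages). The `BuildEntry` struct and the `keepLatest` and `totalRefSnp` functions are hypothetical names introduced here, and the sketch assumes each table row has already been reduced to an organism name, a build number, and a count.

```cpp
#include <map>
#include <string>
#include <vector>

// One row of a per-build summary table (hypothetical layout, assumed for this sketch).
struct BuildEntry {
    std::string organism;  // organism name, e.g. "Homo sapiens"
    int build;             // dbSNP build number the row came from
    long long refsnp;      // number of RefSNP clusters reported in that build
};

// For each organism, keep only the row from the highest build number.
std::map<std::string, BuildEntry> keepLatest(const std::vector<BuildEntry>& rows) {
    std::map<std::string, BuildEntry> latest;
    for (const BuildEntry& row : rows) {
        auto it = latest.find(row.organism);
        if (it == latest.end() || row.build > it->second.build) {
            latest[row.organism] = row;
        }
    }
    return latest;
}

// Sum the RefSNP counts of the rows that survived keepLatest().
long long totalRefSnp(const std::map<std::string, BuildEntry>& latest) {
    long long total = 0;
    for (const auto& kv : latest) {
        total += kv.second.refsnp;
    }
    return total;
}
```

Calling `keepLatest` on the rows from every build and then `totalRefSnp` on the result gives one total in which each organism is counted once, using its newest build.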

... I combined the number of RefSNPs of each organism.
To strip the parenthesized part (~) from each count, I wrote C++ parsing code that also sums the counts.

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <vector>

using namespace std;

int main()
{
    // count.txt holds one count per line, followed by a parenthesized part.
    FILE* pFile = fopen("count.txt", "r");
    if (pFile == NULL) {
        perror("count.txt");
        return 1;
    }

    char line[100];
    vector<long long> counts;

    while (fgets(line, sizeof line, pFile) != NULL) {
        // Cut the line at the first '(' and convert the leading number.
        char* token = strtok(line, "(");
        if (token != NULL)
            counts.push_back(atoll(token));
    }
    fclose(pFile);

    long long sum = 0;
    for (size_t i = 0; i < counts.size(); i++) {
        sum += counts[i];
        cout << counts[i] << endl;
    }
    cout << endl;

    cout << "Sum of RefSNP without homo sapiens: " << sum << endl;
    sum += 112736879; // Homo sapiens
    cout << "Sum of RefSNP with homo sapiens: " << sum << endl;
    return 0;
}


Number of RefSNP Clusters

Homo sapiens: 112,736,879
Sum of 151 organisms including Homo sapiens: 466,071,153
