Tuesday, February 10, 2015

Variant and gene information from biomaRt in R


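# install (biocLite() has been retired; current Bioconductor uses BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")
# documentation
#browseVignettes("biomaRt")
# load & attach the package
library(biomaRt)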
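# list the available marts
listMarts()
# connect to the Ensembl variation (SNP) and gene marts
SNP <- useEnsembl(biomart = "snps", dataset = "hsapiens_snp")
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")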



With the Ensembl gene and SNP databases, biomaRt can retrieve many kinds of information, such as variant, gene, and transcript annotations.

1. SNP database information from a variant's rsID

attributes.snp <- c("refsnp_id", "chr_name", "allele", "chrom_start", "chrom_end",
                    "chrom_strand", "sift_score", "polyphen_score", "minor_allele",
                    "minor_allele_freq", "ensembl_gene_stable_id",
                    "ensembl_transcript_stable_id", "consequence_allele_string",
                    "consequence_type_tv", "ensembl_type")

rs <- "rs115940994"
List <- getBM(attributes = attributes.snp, filters = "snp_filter", values = rs, mart = SNP)
List

The result contains the chromosome, position, alleles, SIFT score, PolyPhen score, minor allele frequency (MAF), Ensembl gene (ENSG) and transcript (ENST) IDs, amino acid change, and variant type (consequence type).
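The `values` argument of getBM() also accepts a vector, so several variants can be looked up in one query instead of looping over rsIDs. This is a minimal sketch, assuming network access to Ensembl, a `useEnsembl()` connection to the SNP mart, and that the attribute and filter names used in this post are still valid in the current release. rs7412 and rs429358 (two APOE variants) are only added as extra examples.

```r
library(biomaRt)

# Assumption: Ensembl variation mart reachable via useEnsembl(); needs network access
snp_mart <- useEnsembl(biomart = "snps", dataset = "hsapiens_snp")

# Several rsIDs in one vector; getBM() sends them as a single filtered query
rs_ids <- c("rs115940994", "rs7412", "rs429358")

variants <- getBM(
  attributes = c("refsnp_id", "chr_name", "chrom_start", "allele", "minor_allele_freq"),
  filters    = "snp_filter",
  values     = rs_ids,
  mart       = snp_mart
)

# Usually one row per variant (more if a variant maps to several locations)
print(variants)
```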


These are the attributes I consider most important, but you can list all the other available attributes with this command:

SNP <- useEnsembl(biomart = "snps", dataset = "hsapiens_snp")
listAttributes(SNP)
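listAttributes() returns a data frame (with `name` and `description` columns), so instead of scrolling through hundreds of rows you can filter it with ordinary R. listFilters() works the same way and is handy for finding filter names such as `snp_filter`. A small sketch, assuming the same SNP mart connection and network access; the search keywords are only examples.

```r
library(biomaRt)

# Assumption: same Ensembl variation mart as above; needs network access
snp_mart <- useEnsembl(biomart = "snps", dataset = "hsapiens_snp")

# Keep only attributes whose name matches one of the example keywords
attrs <- listAttributes(snp_mart)
attr_hits <- attrs[grepl("sift|polyphen|freq", attrs$name, ignore.case = TRUE),
                   c("name", "description")]
print(attr_hits)

# Filters can be searched the same way, e.g. to find the rsID filter
filts <- listFilters(snp_mart)
filter_hits <- filts[grepl("snp", filts$name, ignore.case = TRUE),
                     c("name", "description")]
print(filter_hits)
```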




2. Ensembl gene database information from an Ensembl gene ID (ENSG)

ensembl_gene <- c("ENSG00000204406")
mart <- useDataset("hsapiens_gene_ensembl", ensembl)
attributes <- c("ensembl_gene_id", "hgnc_symbol", "ensembl_transcript_id", "ensembl_peptide_id",
                "refseq_mrna", "refseq_peptide", "phenotype_description")
List <- getBM(attributes = attributes, filters = "ensembl_gene_id", values = ensembl_gene, mart = mart)
List
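The same kind of query also works in the other direction: if you start from gene symbols rather than ENSG IDs, use `hgnc_symbol` as the filter. A minimal sketch, assuming network access to the Ensembl gene mart; BRCA1 and TP53 are just example symbols, and the coordinate attributes (`chromosome_name`, `start_position`, `end_position`) are standard gene-mart attributes not used elsewhere in this post.

```r
library(biomaRt)

# Assumption: Ensembl gene mart reachable via useEnsembl(); needs network access
gene_mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

symbols <- c("BRCA1", "TP53")  # example gene symbols

genes <- getBM(
  attributes = c("hgnc_symbol", "ensembl_gene_id", "chromosome_name",
                 "start_position", "end_position"),
  filters    = "hgnc_symbol",
  values     = symbols,
  mart       = gene_mart
)

# A symbol can map to more than one ENSG (e.g. copies on alternative haplotypes)
print(genes)
```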



I think the following attributes are the most important:
"ensembl_gene_id" "hgnc_symbol" "ensembl_transcript_id" "ensembl_peptide_id" "refseq_peptide" "uniprot_swissprot" "description" "go_id" "phenotype_description"
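Requesting all of these attributes in a single getBM() call is possible in principle, but every multi-valued attribute (transcripts, RefSeq IDs, GO terms, phenotypes) multiplies the number of returned rows, and Ensembl BioMart may reject queries that request too many external-reference attributes at once. One workaround, sketched below under the same assumptions (network access, current attribute names), is to split the list into smaller groups that all keep `ensembl_gene_id` as a join key. The particular grouping is an arbitrary, illustrative choice.

```r
library(biomaRt)

# Assumption: Ensembl gene mart reachable via useEnsembl(); needs network access
gene_mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
gene_id <- "ENSG00000204406"

# Illustrative attribute groups; each keeps ensembl_gene_id as a join key
attribute_groups <- list(
  gene      = c("ensembl_gene_id", "hgnc_symbol", "description"),
  products  = c("ensembl_gene_id", "ensembl_transcript_id", "ensembl_peptide_id"),
  xrefs     = c("ensembl_gene_id", "refseq_peptide", "uniprot_swissprot"),
  go        = c("ensembl_gene_id", "go_id"),
  phenotype = c("ensembl_gene_id", "phenotype_description")
)

# One getBM() call per group; results stay in a named list of data frames
results <- lapply(attribute_groups, function(attrs) {
  getBM(attributes = attrs, filters = "ensembl_gene_id",
        values = gene_id, mart = gene_mart)
})

# Rows returned per group
print(sapply(results, nrow))
```

Keeping the groups separate means each table only grows with its own attribute, and any two of them can still be joined later with merge() on `ensembl_gene_id`.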

You can list every other available attribute with these commands:
# ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")  # only if not created above
listAttributes(ensembl)
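The two marts can also be chained: the SNP query in section 1 returns `ensembl_gene_stable_id`, which can be passed directly to the gene mart as an `ensembl_gene_id` filter value. A minimal sketch, assuming network access and that both attribute names still exist in the current Ensembl release. Intergenic variants return no gene ID, so the second query is skipped when there is nothing to look up.

```r
library(biomaRt)

# Assumption: both Ensembl marts reachable via useEnsembl(); needs network access
snp_mart  <- useEnsembl(biomart = "snps",  dataset = "hsapiens_snp")
gene_mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

# Step 1: variant -> Ensembl gene ID(s)
snp_hits <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
                  filters = "snp_filter", values = "rs115940994", mart = snp_mart)

# Drop missing or empty IDs (variants outside genes have no gene ID)
gene_ids <- unique(snp_hits$ensembl_gene_stable_id)
gene_ids <- gene_ids[!is.na(gene_ids) & gene_ids != ""]

# Step 2: Ensembl gene ID -> gene annotation, then join back onto the variant
if (length(gene_ids) > 0) {
  genes <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol", "description"),
                 filters = "ensembl_gene_id", values = gene_ids, mart = gene_mart)
  annotated <- merge(snp_hits, genes,
                     by.x = "ensembl_gene_stable_id", by.y = "ensembl_gene_id")
} else {
  annotated <- snp_hits[0, ]  # empty result with the same columns
}
print(annotated)
```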
