Fly Factor Survey

» Database of Drosophila TF DNA-binding Specificities

(A collaboration with Dr. Brodsky and Dr. Wolfe)

The FlyFactorSurvey database summarizes a project using the bacterial one-hybrid method to systematically describe the binding site preferences of transcription factors in Drosophila melanogaster.

Zhu LJ, Christensen RG, Kazemian M, Hull CJ, Enuameh MS, Basciotta MD, Brasefield JA, Zhu C, Asriyan Y, Lapointe DS, Sinha S, Wolfe SA and Brodsky MH. (2010) FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res., 39(Database issue): D111-D117.


» ChIPpeakAnno: a Bioconductor package for annotating peaks identified from ChIP-seq, ChIP-chip, or any other high-throughput experiments

(A collaboration with Dr. Lawson and Dr. Green)

Batch annotation of peaks identified from either ChIP-seq or ChIP-chip experiments. The package includes functions to retrieve the sequences around each peak, obtain enriched Gene Ontology (GO) terms, and find the nearest gene, exon, miRNA, or custom feature, such as the most conserved elements or other transcription factor binding sites supplied by users. This package leverages the biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest and stats packages.
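The nearest-feature step described above can be illustrated with a minimal Python sketch. This is not ChIPpeakAnno's R interface: the function name `nearest_feature`, the tuple layouts, and the use of the peak midpoint as the reference position are all assumptions made for illustration.

```python
from bisect import bisect_left


def nearest_feature(peaks, features):
    """Annotate each peak with the closest feature on the same chromosome.

    peaks:    list of (chrom, start, end)
    features: list of (chrom, position, name), e.g. transcription start sites
    Returns a list of (chrom, start, end, feature_name, signed_distance).
    """
    # Group feature positions by chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, pos, name in features:
        by_chrom.setdefault(chrom, []).append((pos, name))
    for positions in by_chrom.values():
        positions.sort()

    annotated = []
    for chrom, start, end in peaks:
        candidates = by_chrom.get(chrom, [])
        if not candidates:
            annotated.append((chrom, start, end, None, None))
            continue
        midpoint = (start + end) // 2  # assumption: measure from the peak midpoint
        i = bisect_left(candidates, (midpoint, ""))
        # Only the features immediately left and right of the midpoint can be nearest.
        neighbours = candidates[max(i - 1, 0):i + 1]
        pos, name = min(neighbours, key=lambda f: abs(f[0] - midpoint))
        annotated.append((chrom, start, end, name, pos - midpoint))
    return annotated
```

Sorting once and using binary search keeps each lookup logarithmic in the number of features, which matters when annotating tens of thousands of peaks in a batch.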

Zhu LJ*, Gazin C, Lawson ND, Pages H, Lin SM, Lapointe DS and Green MR. (2010) [* denotes corresponding author] ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 2010, 11:237.


» cleanUpdTSeq: a Bioconductor package for classifying putative polyadenylation sites as true or false/internally oligo(dT)-primed

(A collaboration with Dr. Lawson)

This package uses a naïve Bayes classifier (from e1071) to assign probability values to putative polyadenylation sites (pA sites) based on training data from zebrafish. This allows users to separate true, biologically relevant pA sites from false, oligo(dT)-primed pA sites.
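To show how a naïve Bayes classifier turns training examples into a probability for each putative site, here is a minimal Python sketch. It is not the package's implementation: the class name `PositionalNaiveBayes`, the choice of per-position nucleotides as features, and the toy training sequences are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


class PositionalNaiveBayes:
    """Categorical naive Bayes over the nucleotide observed at each position."""

    def __init__(self, alphabet="ACGT", pseudocount=1.0):
        self.alphabet = alphabet
        self.pseudocount = pseudocount  # Laplace smoothing for unseen bases

    def fit(self, sequences, labels):
        self.class_counts = Counter(labels)
        self.base_counts = defaultdict(Counter)  # (label, position) -> base counts
        for seq, label in zip(sequences, labels):
            for pos, base in enumerate(seq):
                self.base_counts[(label, pos)][base] += 1
        return self

    def predict_proba(self, seq):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            for pos, base in enumerate(seq):
                num = self.base_counts[(label, pos)][base] + self.pseudocount
                den = n + self.pseudocount * len(self.alphabet)
                score += math.log(num / den)  # independent per-position likelihood
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long sequences.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(weights.values())
        return {k: w / z for k, w in weights.items()}


# Toy data (hypothetical): A-rich downstream sequence suggests internal priming.
train_seqs = ["TGTACC", "CGTTGC", "TGCATG", "AAAAAA", "AAAGAA", "AAAAAG"]
train_labels = ["true", "true", "true", "false", "false", "false"]
model = PositionalNaiveBayes().fit(train_seqs, train_labels)
```

Calling `model.predict_proba("AAAAAA")` returns a dictionary of class probabilities that sum to one; a threshold on the "true" probability then separates the two groups.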

Sheppard, S., Lawson ND* and Zhu LJ*. (2013) [* denotes co-corresponding authors] Accurate identification of polyadenylation sites from 3' end deep sequencing using a naïve Bayes classifier. Bioinformatics 2013.


» a Bioconductor package for analysis of high-throughput sequencing data processed by restriction enzyme digestion

(A collaboration with Dr. Fazzio)

The package includes functions to build a restriction enzyme cut site (RECS) map, distribute mapped sequences on the map using five different approaches, find enriched or depleted RECSs for a sample, and identify differentially enriched or depleted RECSs between samples.
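The first two steps, building a RECS map and distributing reads onto it, can be sketched in Python as follows. This is a simplified illustration, not the package's code: the function names, the 0-based coordinates, and the single nearest-site assignment rule are assumptions, and the rule shown is not claimed to be one of the five approaches the package provides.

```python
def build_recs_map(sequence, site, cut_offset):
    """Return 0-based cut positions for every occurrence of a recognition site.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    positions = []
    i = sequence.find(site)
    while i != -1:
        positions.append(i + cut_offset)
        i = sequence.find(site, i + 1)  # step by one so overlapping sites are found
    return positions


def assign_reads(read_starts, cut_positions, max_distance):
    """Count reads at their nearest cut site, ignoring reads farther than max_distance."""
    counts = {p: 0 for p in cut_positions}
    for start in read_starts:
        nearest = min(cut_positions, key=lambda p: abs(p - start), default=None)
        if nearest is not None and abs(nearest - start) <= max_distance:
            counts[nearest] += 1
    return counts


# Illustrative example: MspI recognises CCGG and cuts after the first C.
recs = build_recs_map("AACCGGTTTTCCGGAA", "CCGG", cut_offset=1)
counts = assign_reads([3, 4, 12, 30], recs, max_distance=2)
```

The per-site counts produced this way are the kind of input that an enrichment or depletion test between samples would consume.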

Chen PB, Zhu LJ, Hainer SJ, McCannell KN, Fazzio TG. Unbiased chromatin accessibility profiling by RED-seq uncovers unique features of nucleosome variants in vivo. BMC Genomics. 2014 Dec 15;15:1104. doi:10.1186/1471-2164-15-1104. PubMed PMID: 25494698; PubMed Central PMCID:PMC4378318.


» CRISPRseek: a Bioconductor package for design of target-specific guide RNAs in CRISPR-Cas9 genome-editing systems

(A collaboration with Dr. Brodsky)

The package includes functions to find potential guide RNAs (gRNAs) for input target sequences; optionally filter out gRNAs that lack a restriction enzyme cut site or a paired gRNA; search genome-wide for off-targets; score and rank them; fetch flanking sequences; and indicate whether the target and off-targets are located in exon regions. Potential gRNAs are annotated with the total score of the top 5 and top N off-targets, detailed top N mismatch sites, restriction enzyme cut sites, and paired gRNAs. If GeneRfold is installed, the summary file also includes the minimum free energy and bracket notation of the secondary structure of the gRNA and the gRNA backbone constant region. This package leverages the Biostrings and BSgenome packages.
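As a rough illustration of the first step, finding candidate gRNAs, the Python sketch below scans the forward strand of a target for a 20-nt protospacer followed by an NGG PAM (the PAM of the commonly used S. pyogenes Cas9) and counts mismatches against a candidate off-target. It is not CRISPRseek's interface: the function names are hypothetical, the reverse strand is ignored, and mismatch counting is far simpler than a real off-target score.

```python
import re


def find_grnas(target, spacer_len=20):
    """Return (start, protospacer, pam) for each forward-strand NGG site.

    A lookahead is used so that overlapping candidates are all reported.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(target.upper())]


def count_mismatches(grna, site):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(grna, site))


candidates = find_grnas("ACGTACGTACGTACGTACGTTGGA")
```

A fuller search would also scan the reverse complement and weight mismatches by their position relative to the PAM.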

Zhu LJ, Holmes BR, Aronin N and Brodsky MH (2014). “CRISPRseek: A Bioconductor Package to Identify Target-Specific Guide RNAs for CRISPR-Cas9 Genome-Editing Systems.” PLoS ONE, 9(9).


» Build Regulatory Network from ChIP-chip/ChIP-seq and Expression Data

(A collaboration with Dr. Heidi)

GeneNetworkBuilder (GNB) is a web application for discovering the transcriptional regulatory network of a given transcription factor (TF) in species such as Caenorhabditis elegans and Homo sapiens, using ChIP-chip or ChIP-seq data combined with gene expression profiles from either RNA-seq or expression microarray experiments.

An R/Bioconductor package is also available.


» a Bioconductor package to plot stacked logos for single or multiple DNA, RNA and amino acid sequences

(A collaboration with Dr. Brodsky and Dr. Wolfe)

motifStack draws amino acid sequence logos as easily as DNA/RNA sequence logos. It lets users select the font type and symbol colors, and it is designed for the graphical representation of multiple motifs.

This package is used in a pipeline for finding candidate binding sites for known transcription factors via sequence matching.
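For readers unfamiliar with how sequence logos are scaled, the Python sketch below computes letter heights under the standard information-content convention, in which each column's total height is log2(alphabet size) minus its Shannon entropy. This is a generic illustration with hypothetical function names, not motifStack's code, and it omits the small-sample correction that some logo tools apply.

```python
import math


def column_information(freqs):
    """Information content (bits) of one motif column given base frequencies."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(freqs)) - entropy


def letter_heights(freqs):
    """Height of each letter: its frequency times the column's information content."""
    info = column_information(freqs)
    return {base: p * info for base, p in freqs.items()}


conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
```

A fully conserved DNA column carries 2 bits and a uniform column carries 0 bits, which is why uninformative positions nearly vanish in a logo.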


» a Bioconductor package to find and visualize significantly enriched or depleted amino acid motifs or amino acid group patterns in proteomic datasets

(A collaboration with Dr. Acharya)

In addition to implementing iceLogo in R to visualize differential amino acid sequence patterns, dagLogo can test and visualize significant amino acid group patterns by classifying amino acids into groups according to properties such as charge, chemistry, and hydrophobicity.


» Identification of Novel alternative PolyAdenylation Sites (PAS)

(A collaboration with Dr. Green)

Alternative polyadenylation (APA) is an important post-transcriptional regulation mechanism that occurs in most human genes. InPAS facilitates the discovery of novel APA sites from RNA-seq data. It leverages cleanUpdTSeq to fine-tune identified APA sites.


» A Bioconductor package with minimalist design for plotting elegant track layers

(A collaboration with Dr. Wang)

Visualize mapped reads along with annotation as track layers for NGS datasets such as ChIP-seq, RNA-seq, miRNA-seq, and DNA-seq.


» search tool for RNAiCore

Analyzing composition of ZFP sites

» ZFN target site algorithm for identifying sites for selection using the bacterial one-hybrid system

This algorithm also aids in designing libraries for the target sites using a combination of design and selection.

» ZFN target site algorithm for identifying sites compatible with the Lawson-Wolfe modular assembly system

Zebrafish Genome Browser


Creates a motif logo of a transcription factor for preview.

© 2016 mccb@umassmed
This site is maintained by Jianhong Ou & Lihua Julie Zhu.
Questions or comments? email: phone: 508-856-5379