Glossary

An accession number is a unique identifier given to a biological polymer sequence (such as DNA or protein) when it is submitted to a sequence database.

A-DNA is one of the possible double helical structures which DNA can adopt. A-DNA is thought to be one of three biologically active double helical structures. It is a right-handed double helix with a shorter, more compact helical structure than the common B-DNA form, and its base pairs are not perpendicular to the helix axis.

An algorithm is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing.

Alignments are commonly represented both graphically and in text format. In almost all sequence alignment representations, sequences are written in rows arranged so that aligned residues appear in successive columns. In text formats, aligned columns containing identical or similar characters are indicated with a system of conservation symbols.

An allele is a variation of the same sequence of nucleotides at the same place on a long DNA molecule. The chromosomal or genomic location of a gene or any other genetic element is called a locus and alternative DNA sequences at a locus are called alleles.

An annotation is extra information associated with a particular point in a document or other piece of information. It can be a note that includes a comment or explanation. For example, in bioinformatics, it can describe coding sequences.

Computer architecture is the organisation of the components which make up a computer system and the meaning of the operations which guide its function.

A backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event.

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data.

A browser is an application for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Examples include Google Chrome, Mozilla Firefox, Microsoft Edge, Opera Mini, and Internet Explorer.

The candidate gene approach to conducting genetic association studies focuses on associations between genetic variation within pre-specified genes of interest and phenotypes or disease states.

The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for protein.

A chromosome is a long DNA molecule with part or all of the genetic material of an organism. These chromosomes display a complex three-dimensional structure, which plays a significant role in transcriptional regulation.

A client is a piece of computer hardware or software that accesses a service made available by a server as part of the client-server model of computer networks.

A code is a system of rules to convert information such as a letter, word, sound, image, or gesture into another form for communication or storage.

Coding is the process of creating and maintaining the source code of computer programs.

A codon is a sequence of three nucleotides which together form a unit of genetic code in a DNA or RNA molecule.
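
As a minimal sketch of the idea, a nucleotide sequence can be read as consecutive three-letter codons. The function name `split_codons` and the example sequence are illustrative assumptions, and the sketch simply drops any trailing bases that do not fill a complete codon.

```python
def split_codons(seq):
    """Split a nucleotide sequence into consecutive 3-letter codons.

    Trailing bases that do not form a full codon are dropped
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # length rounded down to a multiple of 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical example sequence: 11 bases, so the last two are ignored.
codons = split_codons("ATGGCCTAAGT")
print(codons)
```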

In computer programming, compilation is the translation of source code into object code by a compiler.

Consensus sequence is the order of nucleotide or amino acid residues most frequently found within a DNA, RNA, or protein sequence.
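
One common way to derive a consensus is to take the most frequent residue in each column of a set of already aligned sequences. The sketch below assumes equal-length aligned strings; the function name `consensus` and the sample sequences are made up for illustration, and real tools apply more elaborate rules for ties and gaps.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent character in each column of aligned sequences."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


# Hypothetical alignment of three short sequences.
result = consensus(["ACGT", "ACGA", "TCGT"])
print(result)
```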

Cyberspace is a concept describing a widespread interconnected digital technology.

Cascading Style Sheets (CSS) is a style sheet language used for describing the presentation of a document written in a markup language such as HTML or XML.

Data is a collection of discrete or continuous values that convey information, describing quantity, quality, fact, or statistics.

A database is an organized collection of data stored and accessed electronically through the use of a database management system.

A database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data.

Data definition language (DDL) is a syntax for creating and modifying database objects such as tables, indexes, and users.

Debugging is the process of finding and resolving bugs within computer programs, software, or systems.

Direct repeats are a type of genetic sequence that consists of two or more repeats of a specific sequence. Example:

5´ TTACGnnnnnnTTACG 3´
3´ AATGCnnnnnnAATGC 5´
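
The top strand of the example can be checked with a short sketch that lists every position where the repeated unit occurs. The helper name `find_occurrences` is an assumption, and the placeholder bases written as n above are represented by the letter N.

```python
def find_occurrences(seq, unit):
    """Return 0-based start positions of every occurrence of unit in seq."""
    positions = []
    start = seq.find(unit)
    while start != -1:
        positions.append(start)
        start = seq.find(unit, start + 1)  # continue searching after this hit
    return positions


top_strand = "TTACGNNNNNNTTACG"
hits = find_occurrences(top_strand, "TTACG")
print(hits)

# Two or more hits of the same unit on one strand indicate a direct repeat.
is_direct_repeat = len(hits) >= 2
```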

A data manipulation language (DML) is used for adding, deleting, and modifying data in a database.

Deoxyribonucleic acid (DNA) is a polymer composed of two polynucleotide chains that coil around each other to form a double helix.

Domain names are often used to identify services provided through the Internet, such as websites and email services.

Download means to receive data from a remote system, typically a server such as a web server, FTP server, or email server.

An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing.

In computing, a firewall is a network security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules.

The File Transfer Protocol (FTP) is a standard communication protocol used for the transfer of computer files from a server to a client on a computer network.

A gene is a basic unit of heredity; the molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA.

The GI number (GenInfo Identifier) is a sequence identifier that NCBI used for many years to track sequences in GenBank and many other databases.

A genetic marker is a gene or DNA sequence with a known location on a chromosome that can be used to identify individuals or species.

A genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or of RNA in RNA viruses).

G-quadruplex secondary structures (G4) are formed in nucleic acids by sequences that are rich in guanine.

A haplotype is a group of alleles in an organism that are inherited together from a single parent.

Homologous chromosomes are a set of one maternal and one paternal chromosome that pair up with each other inside a cell during meiosis.

A network host is a computer or other device connected to a computer network. A host may work as a server offering services to users or other hosts on the network.

A hyperlink is a digital reference to data that the user can follow or be guided to by clicking or tapping.

HTML (HyperText Markup Language) is the standard markup language for documents designed to be displayed in a web browser.

An integrated development environment (IDE) is a software application that provides comprehensive facilities for software development.

Sequence identity is a way to measure the similarity between two sequences, usually expressed as the proportion of aligned positions that match. For sequencing data, it is often thought of as the complement of the sequencing error rate.
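
A minimal sketch of one way to compute identity, assuming the two sequences are already aligned and of equal length. The function name `percent_identity` and the sample sequences are illustrative; conventions differ between tools, especially for how gap columns are counted, and this sketch treats every column alike.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length aligned sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical pair differing at one of eight positions.
identity = percent_identity("ACGTACGT", "ACGTTCGT")
print(identity)
```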

Input refers to any information or data that is sent to a computer for processing.

The Internet is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices.

An intranet is a computer network for sharing information, communication, collaboration tools, and services within an organization.

An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product.

An inverted repeat is a single-stranded sequence of nucleotides followed downstream by its reverse complement.
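
The definition can be sketched by computing the reverse complement of the upstream arm and comparing it with the downstream arm. The names `reverse_complement` and `is_inverted_repeat` are assumptions for this sketch, which handles only the four DNA letters A, C, G, and T.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left, right):
    """True if the downstream arm is the reverse complement of the upstream arm."""
    return right.upper() == reverse_complement(left)


# Hypothetical arms: TTACG ... CGTAA on the same strand.
rc = reverse_complement("TTACG")
print(rc, is_inverted_repeat("TTACG", "CGTAA"))
```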

An Internet Protocol address (IP address) is a numerical label assigned to a device connected to a computer network that uses the Internet Protocol for communication.

JavaScript, often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS.

Linux is a family of open-source Unix-like operating systems based on the Linux kernel.

A motif is a region of protein or DNA sequence that has a specific structure and may indicate functionally important sites.

Mirror repeats are stretches of a sequence followed by the reverse of that sequence on the same strand. A subset of mirror repeats can form triplex motifs.
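
Unlike an inverted repeat, a mirror repeat involves only reversal, with no complementing. A minimal sketch, with the function name `is_mirror_repeat` and the sample arms chosen for illustration:

```python
def is_mirror_repeat(left, right):
    """True if the downstream arm is the plain reverse of the upstream arm."""
    return right.upper() == left.upper()[::-1]


# Hypothetical arms: TTACG ... GCATT on the same strand.
mirror = is_mirror_repeat("TTACG", "GCATT")
print(mirror)
```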

MySQL is an open-source relational database management system (RDBMS).

In biology, a mutation is an alteration in the nucleic acid sequence of the genome of an organism, virus, or extrachromosomal DNA.

Nucleotides are composed of a nucleobase, a five-carbon sugar, and a phosphate group.

Open Reading Frames (ORFs) are defined as spans of DNA sequence between the start and stop codons.
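
A simplified sketch of an ORF scan: in each of the three forward reading frames, record the span from an ATG start codon to the next in-frame stop codon. The function name `find_orfs` and the test sequence are assumptions; the sketch looks at one strand only, uses the standard stop codons TAA, TAG, and TGA, and ignores refinements such as minimum length or alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq):
    """Return (start, end, orf_sequence) tuples for the three forward frames.

    start is 0-based and end is exclusive; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a new ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ORF in the same frame
    return orfs


# Hypothetical sequence with one ORF in the third reading frame.
orfs = find_orfs("CCATGAAATAGGG")
print(orfs)
```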

Perl is a high-level, general-purpose, interpreted, dynamic programming language.

Physical mapping is a technique used in molecular biology to find the order of, and physical distance between, DNA base pairs by means of DNA markers.

A platform is an environment in which a piece of software is executed, such as hardware, an operating system, or a web browser.

A server is a piece of computer hardware or software that provides functionality for other programs or devices, called clients.

Short tandem repeat (STR) analysis is a common molecular biology method used to compare allele repeats at specific loci in DNA.

Structured Query Language (SQL) is a domain-specific language used in programming and designed for managing data held in a relational database.
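
The difference between the data definition and data manipulation parts of SQL can be seen in a small sketch. It uses Python's built-in `sqlite3` module with an in-memory database rather than MySQL, purely so that it runs without a server; the table name `sequences`, its columns, and the accession values are invented for illustration.

```python
import sqlite3

# In-memory SQLite database (an assumption made for convenience).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create a database object (a table).
cur.execute("CREATE TABLE sequences (accession TEXT PRIMARY KEY, length INTEGER)")

# DML: add, modify, and delete rows.
cur.executemany(
    "INSERT INTO sequences VALUES (?, ?)",
    [("ACC0001", 1200), ("ACC0002", 850)],
)
cur.execute("UPDATE sequences SET length = ? WHERE accession = ?", (900, "ACC0002"))
cur.execute("DELETE FROM sequences WHERE accession = ?", ("ACC0001",))
conn.commit()

# Query: read back what remains.
rows = cur.execute(
    "SELECT accession, length FROM sequences ORDER BY accession"
).fetchall()
print(rows)

conn.close()
```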

A user is a person who utilizes a computer or network service.

The World Wide Web (WWW), commonly known as the Web, is an information system enabling information to be shared over the Internet.

Z-DNA is one of the many possible double helical structures of DNA. It is a left-handed double helical structure in which the helix winds to the left in a zigzag pattern.

Source: wikipedia.org