Legal Glossary: Document Review & eDiscovery Definitions

Common Document Review and eDiscovery Terms

This glossary serves as a Document Review 101 guide to the basic terms used in this portion of discovery. These key words and phrases will help you speak the same language as your account executives and project managers and give you a greater understanding of their questions, recommendations and decisions.


Archival Data

Digital information an organization holds for long-term storage, which is not immediately available or directly accessible to the user of a computer. This information is stored on backup tapes or other removable media for record keeping and disaster recovery purposes.


Bandwidth

Your connection speed: the amount of information or data that can be sent over a network connection within a certain period of time. This is typically measured in bits per second (bps) or megabits per second (Mbps).

Example: If you were to download 100 MB worth of documents, you could expect it to take approximately the following amount of time at these sample bandwidths:


Connection      Speed          Download time
56 K (Modem)    56,000 bps     4 hours, 9 minutes and 39 seconds
256 K (DSL)     256,000 bps    54 minutes and 36 seconds
T1, DS-1        1.544 Mbps     9 minutes and 3 seconds
T3, DS-3        44.736 Mbps    18 seconds
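
The arithmetic behind download-time estimates like these can be sketched in a few lines of Python. This is an illustration only: it assumes 1 MB means 1,048,576 bytes, ignores protocol overhead and network congestion, and drops fractional seconds. The function name `download_time` is made up for this example.

```python
def download_time(size_mb: float, bits_per_second: float) -> str:
    """Return the transfer time as 'H h M min S s', dropping fractional seconds."""
    # Assumption: 1 MB = 1,048,576 bytes, and each byte is 8 bits.
    total_bits = size_mb * 1024 * 1024 * 8
    seconds = int(total_bits / bits_per_second)
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours} h {minutes} min {secs} s"


# Sample connection speeds in bits per second.
for label, bps in [
    ("56 K modem", 56_000),
    ("256 K DSL", 256_000),
    ("DS-1", 1_544_000),
    ("DS-3", 44_736_000),
]:
    print(f"{label}: {download_time(100, bps)}")
```

Real-world transfers are usually slower than these theoretical figures because the connection is shared and some bandwidth is consumed by overhead.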

Bates Number

A unique number assigned to each page of a document within a paper or near-paper production set. Bates numbering, named for the Bates automatic numbering machine patented in the 1890s, allows any page within a document review to be referenced or retrieved. (See Hash Value.)
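
As a rough illustration of how sequential page numbers of this kind might be generated, the Python sketch below builds one identifier per page. The prefix "ABC", the six-digit width and the function name `bates_numbers` are all hypothetical choices for this example; actual numbering formats are set by the parties or the review platform.

```python
def bates_numbers(prefix: str, start: int, page_count: int, width: int = 6) -> list[str]:
    """Return one zero-padded identifier per page, numbered consecutively from `start`."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + page_count)]


# A three-page document whose first page is number 1 in the production.
pages = bates_numbers("ABC", 1, 3)
print(pages)                 # ['ABC000001', 'ABC000002', 'ABC000003']
print(pages[0], pages[-1])   # first and last numbers of the document
```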

Cloud Computing

In its simplest form, cloud computing describes services that are hosted over the internet and are generally pay-as-you-go. There are three main categories of cloud computing: 1) SaaS or Software as a Service, which is software offered by a third-party vendor that can be accessed over the internet; 2) PaaS or Platform as a Service, which allows the user to develop new applications that are hosted on the provider’s infrastructure and accessed through the internet; and 3) IaaS or Infrastructure as a Service, which can best be described as renting storage space or computing power through a secure internet connection. A cloud service can be either private or public. A private cloud provider sells its services only to a select group of users, while a public cloud provider sells its services to anyone over the internet.

Coding Documents

The process of capturing pertinent information from a document as it relates to your litigation. This is typically used to segregate documents into three categories: potentially privileged, potentially responsive or potentially non-responsive. Further issue codes specific to the matter can be captured at the same time.

De-duplication or De-duping

The process of comparing documents and other files based on their characteristics and removing duplicate or nearly identical records from the data set. De-duping keeps redundant files out of the review set, thus reducing the overall discovery cost.
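
One simple way to picture exact-duplicate removal is to compare a hash of each file's contents and keep only the first file seen with each hash. The Python sketch below is an assumption-laden illustration, not a description of any particular review tool: the function name `deduplicate` and the sample file names are invented, and it catches only byte-for-byte duplicates. Finding nearly identical records requires similarity techniques that are not shown here.

```python
import hashlib


def deduplicate(documents: dict[str, bytes]) -> dict[str, bytes]:
    """Keep the first document seen for each distinct SHA-256 content hash."""
    seen_hashes = set()
    unique = {}
    for name, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            unique[name] = content
    return unique


# Hypothetical collection: two of the three files have identical contents.
collection = {
    "memo.doc": b"Quarterly results attached.",
    "memo_copy.doc": b"Quarterly results attached.",
    "notes.txt": b"Call the client on Monday.",
}
print(list(deduplicate(collection)))  # ['memo.doc', 'notes.txt']
```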

Early Case Assessment (ECA)

Early Case Assessment (ECA) can have different meanings relative to where in the discovery lifecycle it is deployed. ECA as an investigational tool is a methodology or business process during which a party or potential party to a lawsuit evaluates the merits of the underlying matter. ECA in this setting would involve conducting a risk-benefit analysis, issuing legal holds and taking the necessary steps to preserve potentially relevant data (the Zubulake duty). Further down the discovery timeline, ECA deployment would involve the use of specialty software to help cull data collections so that fewer documents are processed for review and production. Collectively, the use of ECA as a business process and as a software solution can provide significant cost savings.

ESI - Electronically Stored Information

This is defined by the Federal Rules of Civil Procedure as computer-based information and other digitally stored data, including writings, drawings, graphs, charts, photographs, sound recordings, images, and other data or data compilations stored in any medium from which information can be obtained.

Extracted Text

The underlying text of a native file. Extracted text is more accurate than OCR and therefore provides highly defensible search results.

Family Relationship

The various connections between two or more documents.

Family Range

The range of documents from the first Bates production number of the first page of the parent document through the last Bates production number of the last page of the last related document.

Parent/Child Relationship

The chain of documents that originate from one email or file folder. Note that a document may be both a parent and child depending on its relationship with surrounding documents. Example: In an email with a file attached, the email is the “parent” and the attached file is the “child.”


Sibling

A document or file that shares a common parent with another document or file. Example: In an email with two files attached, the email is the “parent” and the attachments are the “siblings.”


Orphan

A document that was created as a child and subsequently separated from its parent. Example: An emailed attachment without the email or a faxed letter without its fax transmittal sheet.
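
These family relationships can be modeled with a very small data structure in which each document records the identifier of its parent. The Python sketch below is only an illustration; the `Doc` class, its field names and the sample identifiers are hypothetical and do not reflect any specific review platform's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    doc_id: str
    parent_id: Optional[str]  # None for a top-level document such as an email


def siblings(docs: list[Doc], doc_id: str) -> list[str]:
    """Return the other documents that share this document's parent."""
    target = next(d for d in docs if d.doc_id == doc_id)
    if target.parent_id is None:
        return []
    return [d.doc_id for d in docs
            if d.parent_id == target.parent_id and d.doc_id != doc_id]


def orphans(docs: list[Doc]) -> list[str]:
    """Return children whose parent is missing from the collection."""
    known_ids = {d.doc_id for d in docs}
    return [d.doc_id for d in docs
            if d.parent_id is not None and d.parent_id not in known_ids]


collection = [
    Doc("email-1", None),
    Doc("attach-1", "email-1"),
    Doc("attach-2", "email-1"),
    Doc("attach-3", "email-9"),  # its parent email was never collected
]
print(siblings(collection, "attach-1"))  # ['attach-2']
print(orphans(collection))               # ['attach-3']
```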

Forensic Collection

The collection of ESI using special software and techniques to create a bit-by-bit image of a computer or other electronic device. Careful planning and strict protocols are critical during this process to avoid spoliation.

Hash Value

Hashing computes a value from the contents of an electronic file that is, for practical purposes, unique to that file. Comparing hash values taken at different times can demonstrate (or even authenticate) that the file has not been modified. It is like a digital fingerprint for ESI. Hash values have slowly started to replace Bates numbering during document productions. (See Bates Number.)
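
To make the fingerprint idea concrete, the Python sketch below computes a file's hash with the standard `hashlib` module and compares it against a previously recorded value. SHA-256 is an assumption made for this example (other algorithms such as MD5 or SHA-1 are also seen in practice), and the helper names `file_hash` and `is_unmodified` are invented here.

```python
import hashlib


def file_hash(path: str, algorithm: str = "sha256") -> str:
    """Hash a file's contents in chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_unmodified(path: str, recorded_hash: str) -> bool:
    """True if the file's current hash matches the hash recorded earlier."""
    return file_hash(path) == recorded_hash
```

Recording `file_hash(path)` at collection time and calling `is_unmodified` later shows whether the contents have changed; altering even a single character produces a different hash value.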


Metadata

Commonly referred to as data about data, metadata is the various layers of information about a native file, such as author, date created or time sent. Think of this as the DNA code of a document. When document review software analyzes your files, it looks at the metadata that was automatically embedded in them when they were created.
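
For a sense of what reading metadata looks like in practice, the Python sketch below pulls the sender, recipient, subject and sent time out of an email's headers using the standard `email` package. The sample message, the addresses and the function name `email_metadata` are invented for illustration; review software extracts many more fields from many more file types.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message used only for this example.
RAW_EMAIL = """From: alice@example.com
To: bob@example.com
Subject: Q3 forecast
Date: Mon, 03 Mar 2014 09:15:00 -0500

See attached.
"""


def email_metadata(raw: str) -> dict:
    """Return a few header fields that describe the message rather than its body."""
    message = message_from_string(raw)
    return {
        "author": message["From"],
        "recipient": message["To"],
        "subject": message["Subject"],
        "sent": parsedate_to_datetime(message["Date"]),
    }


metadata = email_metadata(RAW_EMAIL)
print(metadata["author"], metadata["sent"].isoformat())
```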

Native File

A file saved in the format of its original application. Reviewing native files can save money on the per-page costs of traditional TIFF or PDF processing. Example: The native file of a PDF document that was originally created in Microsoft Word would have the .doc file extension.

Sample native files include:

  • Microsoft Word (.doc)
  • Microsoft PowerPoint (.ppt)
  • Adobe Photoshop (.psd)
  • Microsoft Excel (.xls)
  • Microsoft Publisher (.pub)
  • Adobe Illustrator (.ai)

Optical Character Recognition (OCR)

Technology used to scan static files and images to identify text. OCR software turns otherwise unsearchable letters, numbers and other characters into searchable, editable text. (See Extracted Text.)


Sampling

The process of pulling random documents from a data set and statistically testing them to assess the frequency or likelihood of relevant information. Sampling can be a critical step in establishing the validity of the search processes, determining which sets of data should be preserved and reviewed, forecasting the time it will take to perform a legal document review and subsequently projecting the cost of the review.
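
The statistical idea can be sketched in Python as follows: draw a random sample, count how many sampled documents are relevant, and report the observed rate with an approximate 95% margin of error. This is a simplified illustration that assumes a simple random sample and a normal approximation; the function name `estimate_prevalence` is invented, and the `is_relevant` callback stands in for a human reviewer's decision.

```python
import math
import random


def estimate_prevalence(population, is_relevant, sample_size, seed=0):
    """Estimate the share of relevant documents from a simple random sample.

    Returns (rate, low, high), where low and high bound an approximate
    95% confidence interval based on the normal approximation.
    """
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    sample = rng.sample(population, sample_size)
    hits = sum(1 for doc in sample if is_relevant(doc))
    rate = hits / sample_size
    margin = 1.96 * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


# Hypothetical data set: 10,000 document IDs, every tenth one relevant.
documents = list(range(10_000))
rate, low, high = estimate_prevalence(documents, lambda d: d % 10 == 0, 400)
print(f"Estimated relevant: {rate:.1%} (roughly {low:.1%} to {high:.1%})")
```

Multiplying the estimated rate by the size of the full data set gives a rough count of relevant documents, which can feed into time and cost forecasts for the review.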


Spoliation

The accidental or sometimes intentional destruction or alteration of evidence that may be relevant to a pending or reasonably anticipated lawsuit or investigation. In electronic discovery, spoliation concerns typically arise from how metadata is captured and preserved during collection and processing.

Static File

A fixed image of a native file, typically a TIFF or PDF, which reflects a picture of a native file at a particular point in time.

Sample static files include:

  • Bitmap / BMP
    A bitmap, or bitmap image, is a file format that stores color data for each pixel in an image without compressing it. Because the image is not compressed, the file size tends to be large. However, it is sometimes preferable to other image storage formats as it reproduces crisp, high-quality graphics. Files in this format have a .bmp extension.
  • JPEG – Joint Photographic Experts Group
    Named for the team that developed this standard, the JPEG format renders high-resolution photographs and other graphic images in a compressed format. Files in this format have a .jpg extension.
  • PDF – Portable Document Format
    A widely used format for sharing electronic documents. Documents are rarely created in PDF, but are instead saved in this format to preserve the image of what was created in another software program. PDF files may retain metadata from the native file. Files in this format have a .pdf extension.
  • TIFF – Tagged Image File Format
    A format in which scanned documents and electronic files are saved as images. Files in this format have a .tif extension.



© 2014 DTI -- All Rights Reserved

Trusted Advisors in Discovery Matters
Hudson Legal, a DTI Company, is a global full-service discovery firm offering end-to-end e-Discovery solutions and managed review services tailored to meet the needs of each case. Specializing in complete discovery process management, the experts at DTI bring clients a best-in-class combination of expert process design, technology, qualified teams of review attorneys and workspace logistics. Our approach ensures the highest quality results are delivered efficiently, accurately and affordably with a practical perspective.

Hudson Legal, a DTI Company, provides legal support services under the supervision of legal counsel representing our clients. DTI is not a law firm and does not practice law or provide legal services.