Largest number that can be stored in a 7-bit floating-point word

QUESTION: What is the largest base-10 positive number that can be stored using 7 bits, where the 1st bit is used for the sign of the number; the 2nd bit for the sign of the exponent; the next 3 bits for the mantissa; and the remaining 2 bits for the exponent?

ANSWER: Remember the base is 2.
1st bit will need to be zero as the number is positive.

2nd bit will need to be zero as that makes the exponent positive; 2 raised to a positive number gives a higher number than 2 raised to a negative number.

The mantissa bits will need to be 111 as you are looking for the largest number. That gives a mantissa of 1.111 in base 2 (the 1 before the radix point is automatic), which is 1*2^0+1*2^(-1)+1*2^(-2)+1*2^(-3)=1.875 in base 10.

Now the exponent: it uses 2 bits. This will need to be 11 in base 2 and that is 3 in base 10. So the exponent part is 2^(+3)=8.

Largest number is +1.875*8=15

Now think what will give you the smallest positive number.
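The arithmetic above can be checked with a few lines of Python. This is only a sketch under the layout stated in the question (sign bit, exponent-sign bit, 3 mantissa bits, 2 exponent bits); the function name `decode` and the string representation of the word are my own choices for illustration.

```python
def decode(word: str) -> float:
    """Decode a 7-bit string: sign, exponent sign, 3 mantissa bits, 2 exponent bits."""
    if len(word) != 7 or set(word) - {"0", "1"}:
        raise ValueError("expected a string of exactly 7 binary digits")
    sign = -1 if word[0] == "1" else 1
    exp_sign = -1 if word[1] == "1" else 1
    # Implicit leading 1; the three stored bits weigh 2^-1, 2^-2 and 2^-3.
    mantissa = 1 + int(word[2:5], 2) / 8
    exponent = exp_sign * int(word[5:7], 2)
    return sign * mantissa * 2.0 ** exponent


# Brute-force check over all 128 possible words.
largest = max(decode(format(w, "07b")) for w in range(128))
print(decode("0011111"), largest)  # 15.0 15.0
```

The same function can be used to check your own answer for the smallest positive number.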

_______________________________________________

This post is brought to you by

MOOC on Introduction to Matrix Algebra released

Introduction to Matrix Algebra is available as a MOOC now on Udemy. Of course, it is free! https://www.udemy.com/matrixalgebra/. You will have lifetime access to 177 lectures, 14+ hours of high-quality content, and 10 textbook chapters complete with multiple choice questions and their complete solutions.

Learning Objectives are

  • know vectors and their linear combinations and dot products
  • know why we need matrix algebra and differentiate between various special matrices
  • carry out unary operations on matrices
  • carry out binary operations on matrices
  • differentiate between inconsistent and consistent systems of linear equations via finding rank of matrices
  • differentiate between systems of equations that have unique and infinite solutions
  • use Gaussian elimination methods to find the solution to a system of equations
  • use LU decomposition to find the solution to a system of equations and know when to choose the method over Gaussian elimination
  • use Gauss-Seidel method to solve a system of equations iteratively
  • find quantitatively how adequate your solution is through the concept of condition numbers
  • find eigenvectors and eigenvalues of a square matrix

_____________________________________________________

A Floating Point Question Revisited

QUESTION: A machine stores floating point numbers in a 7-bit word. The first bit is used for the sign of the number, the next three for the biased exponent and the next three for the magnitude of the mantissa. You are asked to represent 33.35 in the above word. The error you will get in this case would be
(A) underflow
(B) overflow
(C) NaN
(D) No error will be registered

The solution to the problem is given here.

However a student asked me a follow up question, and here is the answer.

QUESTION: I was doing the multiple choice question and I am having trouble understanding it. I looked at the solution but I am still having trouble. I began by turning 33.35 into binary and I get 100001.01011. I just am having trouble putting it into the format. The max exponent value is 4 in this case but in the solutions it says you need 5. Maybe I do not understand what underflow and overflow are exactly.

ANSWER: The solution is given as you have pointed out.

The binary number in fixed format needs to be converted to floating point format. That would be 100001.01011=1.0000101011*2^5 as you move the radix point by 5 places to the left. We move it 5 places because that leaves exactly one non-zero digit to the left of the radix point. This is no different from the procedure you use for converting a decimal format to scientific format for base-10 numbers.
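If you want to check the exponent without shifting the radix point by hand, here is a small sketch in Python. The helper name `normalize` is mine; it leans on the standard `math.frexp`, which returns a mantissa between 0.5 and 1, so the result is rescaled to put one non-zero digit before the radix point.

```python
import math


def normalize(x: float):
    """Return (m, e) with x == m * 2**e and 1 <= m < 2, for positive x."""
    if x <= 0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)  # frexp gives 0.5 <= m < 1
    return 2 * m, e - 1   # rescale so that 1 <= m < 2


m, e = normalize(33.35)
print(m, e)  # the exponent printed is 5
```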

Now every floating point format has an upper limit on the numbers it can represent. Since the biased exponent has 3 bits, it can take values from 0 to 7, which means the unbiased exponent that can be represented is from -3 to 4 (biasing by +3, and unbiasing by -3). We need an unbiased exponent of 5, but the maximum unbiased exponent that can be represented is 4. So the number is larger than any number the word can represent. If you put 32 ounces of water in a 24-ounce cup, we say that the water overflowed. In the same way, the number will overflow as it is more than the word can handle.
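The exponent-range argument can be written out as a short check. This is a sketch that looks at the exponent only, with the 3 exponent bits and bias of 3 taken from the explanation above; the function name `classify_exponent` is hypothetical.

```python
EXP_BITS = 3  # bits available for the biased exponent
BIAS = 3      # bias used in the explanation above


def classify_exponent(unbiased: int) -> str:
    """Say whether an unbiased exponent fits in the biased 3-bit field."""
    lowest = 0 - BIAS                       # biased 000 -> unbiased -3
    highest = (2 ** EXP_BITS - 1) - BIAS    # biased 111 -> unbiased 4
    if unbiased > highest:
        return "overflow"
    if unbiased < lowest:
        return "underflow"
    return "representable"


print(classify_exponent(5))  # overflow
```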

You can see this in a different way as follows (looking at a solution a different way; that always helps the brain and your long-term memory).

The maximum number you can represent in the given 7-bit word is 0111111. The mantissa is 1.111 in base 2 and the biased exponent is 111 in base 2, so in base 10 the number is (1.875)*2^(7-3)=30 (the 3 is used for unbiasing the exponent). Since 33.35 is larger than 30, it would overflow, just like when you put 32 ounces of water in a 24-ounce cup.
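Here is the same check as a sketch in Python, assuming the word layout from the question (1 sign bit, 3 biased exponent bits, 3 mantissa bits) and a bias of 3. The name `decode_biased` is only for illustration.

```python
def decode_biased(word: str, bias: int = 3) -> float:
    """Decode a 7-bit string: sign, 3 biased exponent bits, 3 mantissa bits."""
    if len(word) != 7 or set(word) - {"0", "1"}:
        raise ValueError("expected a string of exactly 7 binary digits")
    sign = -1 if word[0] == "1" else 1
    exponent = int(word[1:4], 2) - bias   # unbias the stored exponent
    mantissa = 1 + int(word[4:7], 2) / 8  # implicit leading 1
    return sign * mantissa * 2.0 ** exponent


largest = decode_biased("0111111")
print(largest, 33.35 > largest)  # 30.0 True
```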

_____________________________________________________

Patriots football deflation given as a lesson learned and as an exercise in Numerical Methods

Deflate Gate is a great lesson in not jumping to conclusions. Physicist Neil deGrasse Tyson did not change gauge pressure to absolute pressure; Bill Nye, a mechanical engineer who calls himself the science guy, did not give convincing arguments; others did not change temperature to absolute temperature. Other variables were not accounted for either: water vapor pressure, the temperature of the compressed air used to inflate the balls (compressed air is hot), and the time interval between when the balls were inflated and when they were taken to the field.

Two exercises were given to students: http://nm.MathForCollege.com/experiments/deflategate.pdf
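To see why the two conversions matter, here is a rough sketch in Python. It assumes an ideal gas at constant volume and ignores water vapor and the other variables listed above. The function name and all the input numbers (12.5 psi gauge, 70 F, 50 F, 14.7 psi atmospheric pressure) are illustrative assumptions of mine and are not taken from the exercises.

```python
def gauge_after_cooling(gauge_psi, temp_f_start, temp_f_end, atm_psi=14.7):
    """Gauge pressure after a constant-volume temperature change (ideal gas)."""
    abs_start = gauge_psi + atm_psi         # gauge -> absolute pressure
    t_start = temp_f_start + 459.67         # Fahrenheit -> Rankine (absolute)
    t_end = temp_f_end + 459.67
    abs_end = abs_start * t_end / t_start   # P/T stays constant at fixed volume
    return abs_end - atm_psi                # back to gauge pressure


# Illustrative inputs only; they are assumptions, not measured values.
result = gauge_after_cooling(12.5, 70.0, 50.0)
print(round(result, 2))
```

Skipping both conversions and scaling the gauge pressure by the Fahrenheit temperatures directly would give 12.5*50/70, a much larger and incorrect drop.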

___________________

2014 in review

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 100,000 times in 2014. If it were an exhibit at the Louvre Museum, it would take about 4 days for that many people to see it.

An example of Gaussian quadrature rule by using two approaches

Here is an example of using the Gaussian quadrature rule through two approaches:

EITHER

by applying it to the original integrand, with the argument of the integrand updated to account for the limits of integration

OR

by applying it to the equivalent integrand obtained after changing the limits of integration to -1 to 1.

http://nm.MathForCollege.com/blog/3pointquadruleexample.pdf
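The two approaches can be sketched side by side in Python. This uses the standard three-point Gauss-Legendre nodes and weights; the function names and the test integrand x^5 on [0, 2] are my own assumptions and are not the example worked out in the PDF.

```python
import math

# Three-point Gauss-Legendre nodes and weights on [-1, 1].
NODES = (-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5))
WEIGHTS = (5 / 9, 8 / 9, 5 / 9)


def gauss3_updated_argument(f, a, b):
    """Approach 1: keep the original f, but evaluate it at shifted arguments."""
    half, mid = (b - a) / 2, (b + a) / 2
    return half * sum(c * f(half * t + mid) for c, t in zip(WEIGHTS, NODES))


def gauss3_equivalent_integrand(f, a, b):
    """Approach 2: build the equivalent integrand g on [-1, 1], then apply the rule."""
    half, mid = (b - a) / 2, (b + a) / 2

    def g(t):
        return f(half * t + mid) * half

    return sum(c * g(t) for c, t in zip(WEIGHTS, NODES))


def f(x):
    return x ** 5  # hypothetical integrand; exact integral on [0, 2] is 32/3


print(gauss3_updated_argument(f, 0.0, 2.0), gauss3_equivalent_integrand(f, 0.0, 2.0))
```

Both functions do the same arithmetic in a different order, so they agree; the three-point rule is exact for polynomials up to degree 5, so both values match 32/3 up to rounding.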

___________________

Open courseware for Matrix Algebra Released

The open courseware for “Introduction to Matrix Algebra” has been released. The topics include

  • Chapter 1: Introduction
  • Chapter 2: Vectors
  • Chapter 3: Binary Matrix Operations
  • Chapter 4: Unary Matrix Operations
  • Chapter 5: System of Equations
  • Chapter 6: Gaussian Elimination Method
  • Chapter 7: LU Decomposition
  • Chapter 8: Gauss-Seidel Method
  • Chapter 9: Adequacy of Solutions
  • Chapter 10: Eigenvalues and Eigenvectors

For more details go to http://tap.usf.edu/stories/open-courseware-released-for-introduction-to-matrix-algebra/

___________________________________________
