Wednesday, January 28, 2015

Linux script to replace environment variables

I searched and could not find a good solution for replacing all environment variables in a file on Linux. So below is my version (simple, reasonably fast). Enjoy.

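#!/bin/bash
# Usage: pass the input file as $1; its content is printed with $VAR / ${VAR} expanded.
# Caution: the generated file is sourced, so $(...) and backticks in the input are executed too.
infile=$1
tmpFile=$(mktemp) || exit 1

# Prepare the command file that replaces the environment variables
while IFS= read -r line || [ -n "$line" ]
do
    # Escape backslashes and double quotes so they survive inside the double-quoted string
    line=${line//\\/\\\\}
    line=${line//\"/\\\"}
    # Pass the line as an argument, not as the printf format, so a % in the input is harmless
    echo "printf '%s\n' \"$line\"" >> "$tmpFile"
done < "$infile"

source "$tmpFile"
rm -f "$tmpFile"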

Tuesday, December 2, 2014

The traditional solution for reservoir sampling provably picks each number with probability 1/n. The output, however, shows far less variation than you might expect, because the cached number only changes when the random draw matches the count. Given the constraint that only one number may be cached, I added a line that toggles the cached value, which raises the rate at which the cached value changes. This logic is applied after the return value has been decided. It is a separate decision: I got a new value, so do I keep the cached one, or swap it for the new one?

Run it and compare the two outputs.


import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReservoirSampling {

    public static void main(String[] args) {
        List<Integer> lst = new ArrayList<Integer>();
        List<Integer> lstToggled = new ArrayList<Integer>();
        ReservoirSampling r = new ReservoirSampling();
        ReservoirSampling rt = new ReservoirSampling();
        for (int i = 1; i <= 100; i++) {
            lst.add(r.run(i));
            lstToggled.add(rt.runToggled(i));
        }
        System.out.println(lst);
        System.out.println(lstToggled);
    }

    public ReservoirSampling() {
        _rdm = new Random(System.currentTimeMillis());
    }

    public int run(int number) {
        _count++;

        if (_count == 1) {
            _lastNumber = number;
            return _lastNumber;
        }

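        // nextInt(_count) is uniform over [0, _count), so the replacement
        // below happens with probability 1/_count, as reservoir sampling requires
        int n = _rdm.nextInt(_count);
        if (n == _count - 1) {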
            _lastNumber = number;
        }

        return _lastNumber;
    }

    public int runToggled(int number) {
        _count++;

        if (_count == 1) {
            _lastNumber = number;
            return number;
        }

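        // nextInt(_count) is uniform over [0, _count), so the replacement
        // below happens with probability 1/_count, as reservoir sampling requires
        int n = _rdm.nextInt(_count);
        if (n == _count - 1) {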
            _lastNumber = number;
            return number;
        }

        int result = _lastNumber;
        if (n % 2 == 0) // toggled
            _lastNumber = number;

        return result;
    }

    private int _count;
    private int _lastNumber; // previous random number cached
    private Random _rdm;
}
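
To put a number on "less variation", here is a minimal sketch that counts how often the returned value differs from the previous one, for the plain and the toggled variant. It is not the class above: ReservoirCompare, next and countChanges are names made up for this illustration, and it assumes the replacement step fires with probability 1/count (drawn as nextInt(count)). Under that assumption the plain variant should change about 1/2 + 1/3 + ... + 1/100, roughly 4.2 times over 100 items, while the toggled variant should change far more often.

```java
import java.util.Random;

public class ReservoirCompare {

    private final Random rdm;
    private int count;   // how many numbers have been seen so far
    private int cached;  // the single cached number

    public ReservoirCompare(Random rdm) {
        this.rdm = rdm;
    }

    // One step of the sampler. With toggle == false this is the plain variant;
    // with toggle == true it also applies the extra cache swap described in the post.
    public int next(int number, boolean toggle) {
        count++;
        if (count == 1) {
            cached = number;
            return number;
        }
        int n = rdm.nextInt(count);   // uniform over [0, count)
        if (n == count - 1) {         // happens with probability 1/count
            cached = number;
            return number;
        }
        int result = cached;
        if (toggle && n % 2 == 0) {   // swap the cache on even draws
            cached = number;
        }
        return result;
    }

    // Feeds 1..n through a fresh sampler and counts how many times
    // the returned value differs from the previously returned one.
    public static int countChanges(int n, boolean toggle, Random rdm) {
        ReservoirCompare r = new ReservoirCompare(rdm);
        int changes = 0;
        int previous = r.next(1, toggle);
        for (int i = 2; i <= n; i++) {
            int current = r.next(i, toggle);
            if (current != previous) {
                changes++;
            }
            previous = current;
        }
        return changes;
    }

    public static void main(String[] args) {
        Random rdm = new Random(42L); // fixed seed so runs are repeatable
        int trials = 1000;
        long plain = 0;
        long toggled = 0;
        for (int t = 0; t < trials; t++) {
            plain += countChanges(100, false, rdm);
            toggled += countChanges(100, true, rdm);
        }
        System.out.println("average changes, plain:   " + (double) plain / trials);
        System.out.println("average changes, toggled: " + (double) toggled / trials);
    }
}
```

Note that this only measures how lively the output is. It says nothing about whether the toggled variant still picks every number with equal probability; that would need a separate check.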

Friday, March 26, 2010

.Net: How much slower when calling PropertyInfo.SetValue?

Link: internals from the CLR team about this.

This question has been on my mind for a while, so I decided to test it.
The test is simple: set a property value directly versus set it through PropertyInfo. To make things more interesting, I also tested setting a value in a Dictionary.

The result: 232 times (27166 / 117) slower!
In 1 million test runs,
  • direct access value: 117 milliseconds
  • PropertyInfo: 27166 milliseconds
  • Dictionary: 266 milliseconds
My conclusion: unless the logic is in the UI, or you really have to, don't use reflection for large, performance-critical jobs.

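using System.Collections.Generic;
using System.Diagnostics;
using System.Reflection;
using Microsoft.VisualStudio.TestTools.UnitTesting;

class Employee
{
    private string _firstName;
    private string _lastName;

    public string FirstName
    {
        get { return _firstName; }
        set { _firstName = value; }
    }
    public string LastName
    {
        get { return _lastName; }
        set { _lastName = value; }
    }
}

[TestMethod]
public void PropertyPerformanceTest()
{
    int maxLoop = 1000000;
    long durationDirect = 0;
    long durationProperty = 0;
    long durationDictionary = 0;

    // 1. Set the properties directly
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < maxLoop; i++)
    {
        Employee e = new Employee();
        e.FirstName = "Joan";
        e.LastName = "Smith";
    }
    durationDirect = sw.ElapsedMilliseconds;

    // 2. Set the properties through PropertyInfo (SetValue calls reconstructed)
    PropertyInfo piLastName = typeof(Employee).GetProperty("LastName");
    PropertyInfo piFirstName = typeof(Employee).GetProperty("FirstName");
    sw.Reset();
    sw.Start();
    for (int i = 0; i < maxLoop; i++)
    {
        Employee e = new Employee();
        piFirstName.SetValue(e, "Joan", null);
        piLastName.SetValue(e, "Smith", null);
    }
    durationProperty = sw.ElapsedMilliseconds;

    // 3. Set the values in a Dictionary (value type and loop body reconstructed)
    Dictionary<string, string> employeeValue = new Dictionary<string, string>();
    employeeValue.Add("LastName", null);
    employeeValue.Add("FirstName", null);
    sw.Reset();
    sw.Start();
    for (int i = 0; i < maxLoop; i++)
    {
        Employee e = new Employee(); // same allocation as in the other two loops
        employeeValue["FirstName"] = "Joan";
        employeeValue["LastName"] = "Smith";
    }
    durationDictionary = sw.ElapsedMilliseconds;

    Assert.IsTrue(durationProperty > durationDictionary);
    Assert.IsTrue(durationDictionary > durationDirect);
}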

Monday, March 1, 2010

Software Engineer Productivity

Is this a myth?

Enough said.

Wouldn't it be easier to treat software engineers like building bricks and add / remove / replace them as needed? Whenever you want to grow, buy more; whenever you face financial pressure, remove some?

Unfortunately, that's not the way the system works.
It still takes quite some effort and experience to learn which parts of your internal system are core assets and which parts depend less on domain knowledge and engineering insight. The latter can be outsourced much more easily; the core assets cannot.

The other myth: as far as I have seen, software engineering is still an art, more or less. Yes, anyone can work as a carpenter after some training. But does every carpenter produce the same quality of work in a similar time?

'Reuse' Is Not Usable

When it comes to code sharing, it is often either not done at all, done only in a very minimal fashion, or done heavily but in a super complicated way -- meaning the code was shared once, but no one used it after that.

Is there a balance? I have always been battling between sharing code and keeping code simple. What are the principles or guidelines to follow?

When to share? When not? And what is not sharable?
Below is the list I came up with:
1. Interface sharing. If the interface can be defined clearly enough that any developer can come along and understand it right away, share it. If the shared interface has grown layer by layer, function upon function, specialized here and there for a particular implementation, don't share it.
2. Function sharing. Surprisingly, this is the best sharing technique. It provides two main benefits: a. Scalability. b. Focused logic (any function should fit within one page).
... to be filled

Rules of thumb
  1. Copying and pasting code is never a good idea.
  2. If you end up spending more time and tremendous effort to reuse some simple stuff, you have probably over-shared the code.
  3. Refer to the rules above and try to write reusable code as much as possible.
And I liked this link...